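Problem description
Absolute beginner in JS here. I need help extracting text from a DOM that looks like the markup below. The extraction itself can be done with querySelectorAll() or getElementsByTagName(). What I am looking for is a way to create an object in which each h2 element is a key and the spans that follow it are its value. I don't know how to achieve this. Any suggestions would be very helpful.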
<div class ="product-list">
<div class="row column">
<div class="column medium-9 large-10">
<h2 class="product-name">Products List 1</h2>
</div>
</div>
<div class="row">
<span>First Product</span>
</div>
<div class="row">
<span> Second Product</span>
</div>
.
.
.
<div class="row">
<span>
Nth Product
</span>
</div>
<div class="row column">
<div class="column medium-9 large-10">
<h2 class="product-name">Products List 2</h2>
</div>
</div>
<div class="row">
<span>Third Product</span>
</div>
<div class="row">
<span> Fourth Product</span>
</div>
.
.
.
<div class="row">
<span>
Nth Product
</span>
</div>
</div>
From this DOM I need to store the data as
[
Products List 1 :[First Product,Second Product...Nth Product],Products List 2 :[Third Product,Fourth Product...Nth Product]
]
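The notation above is not valid JavaScript, so as a sketch of what is presumably meant, here is the same shape written as a plain object keyed by heading text. The variable name `expected` is only illustrative, and only the product names that actually appear in the markup are listed; the elided entries are left out.

```javascript
// Assumed target shape: one key per h2 heading, each holding an array of span texts.
const expected = {
  'Products List 1': ['First Product', 'Second Product', 'Nth Product'],
  'Products List 2': ['Third Product', 'Fourth Product', 'Nth Product'],
};

// Walk the object as [heading, products] pairs.
for (const [listName, items] of Object.entries(expected)) {
  console.log(`${listName}: ${items.join(', ')}`);
}
```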
JS:
const products = await page.evaluate(() => {
  const productsArray = [];
  // The headings are selected here but never used, so the result is one flat list.
  const pdName1 = document.querySelectorAll('div.column > h2.product-name');
  const pdName2 = document.querySelectorAll('div.row > span');
  pdName2.forEach(query => {
    productsArray.push(query.innerText);
  });
  return productsArray;
});
Solution
You could try something like this:
import puppeteer from 'puppeteer';
const browser = await puppeteer.launch();
const html = `
<!doctype html>
<html>
<head><meta charset='UTF-8'><title>Test</title></head>
<body>
<div class ="product-list">
<div class="row column">
<div class="column medium-9 large-10">
<h2 class="product-name">Products List 1</h2>
</div>
</div>
<div class="row"><span>First Product</span></div>
<div class="row"><span> Second Product</span></div>
<div class="row"><span>Nth Product</span></div>
<div class="row column">
<div class="column medium-9 large-10">
<h2 class="product-name">Products List 2</h2>
</div>
</div>
<div class="row"><span>Third Product</span></div>
<div class="row"><span> Fourth Product</span></div>
<div class="row"><span>Nth Product</span></div>
</div>
</body>
</html>`;
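try {
  const [page] = await browser.pages();
  // setContent avoids having to URL-encode the markup for a data: URL.
  await page.setContent(html);

  const data = await page.evaluate(() => {
    // querySelectorAll returns matches in document order, so each span
    // comes after the h2 it belongs to.
    const elements = document.querySelectorAll('h2, div.row span');
    const list = {};
    let currentKey = null;

    for (const element of elements) {
      if (element.tagName === 'H2') {
        // A heading starts a new group.
        currentKey = element.innerText.trim();
        list[currentKey] = [];
      } else if (currentKey !== null) {
        // Skip any span that appears before the first heading.
        list[currentKey].push(element.innerText.trim());
      }
    }
    return list;
  });

  console.log(data);
} catch (err) {
  console.error(err);
} finally {
  await browser.close();
}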
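The grouping step does not depend on Puppeteer, so it can be tried without a browser. The following is a minimal sketch under that assumption: `groupByHeading` and the `{ tag, text }` item shape are hypothetical names introduced here for illustration, standing in for the elements returned by querySelectorAll() in document order.

```javascript
// Hypothetical helper: groups a flat, document-ordered list of items
// under the most recent heading seen.
function groupByHeading(items) {
  const grouped = {};
  let currentKey = null;

  for (const { tag, text } of items) {
    const value = text.trim();
    if (tag === 'H2') {
      // A heading starts a new group.
      currentKey = value;
      grouped[currentKey] = [];
    } else if (currentKey !== null) {
      // Items before the first heading have no group and are skipped.
      grouped[currentKey].push(value);
    }
  }
  return grouped;
}

// Sample input mirroring the order of elements in the markup.
const sample = [
  { tag: 'H2', text: 'Products List 1' },
  { tag: 'SPAN', text: 'First Product' },
  { tag: 'SPAN', text: ' Second Product' },
  { tag: 'H2', text: 'Products List 2' },
  { tag: 'SPAN', text: 'Third Product' },
  { tag: 'SPAN', text: ' Fourth Product' },
];

const result = groupByHeading(sample);
console.log(result);
```

Using a single pass with a "current key" variable works because the headings and spans are siblings in the markup rather than parent and child, so document order is the only link between a span and its heading.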