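Problem description
My code is a variation of this script. I am basically trying to scroll to the bottom of the page and grab all of the HTML. I tested it on two sites: it works fine on http://quotes.toscrape.com/scroll, but on the other one (https://scrapingclub.com/exercise/list_infinite_scroll/) it scrolls to a certain point and then stops.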
phantom.casperPath = "../../casperjs/";
phantom.injectJs(phantom.casperPath + "/bin/bootstrap.js");

var casper = require('casper').create({
    viewportSize: { width: 1680, height: 1050 }
});

casper.userAgent("Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.0 Safari/605.1.15");

casper.start('https://scrapingclub.com/exercise/list_infinite_scroll/'); // not working
//casper.start('http://quotes.toscrape.com/scroll'); // works

var scrolled = 0;    // keep track of how much we have scrolled
var scrollDelta = 0; // keep track of how much our new scroll position differs from our last

var getContent = function() {
    casper.wait(10000, function() {
        casper.scrollToBottom();
        var newScrolled = casper.evaluate(function() {
            //return __utils__.getDocumentHeight();
            return window.scrollY; // grab how far the window is scrolled
        });
        scrollDelta = newScrolled - scrolled;
        scrolled = newScrolled;
        // console.log("Now scrolled", scrolled);
        // console.log("Scroll Delta", scrollDelta);
    });
    casper.then(function() { // after we scroll to the bottom
        if (scrollDelta != 0) {
            getContent(); // if scrollDelta _has_ changed, recursively call getContent
        } else {
            casper.then(function() {
                var html = String(casper.getHTML());
                this.echo(html);
            });
        }
    });
};

getContent();
casper.run();
I tried replacing the casper.scrollToBottom() call with this:
this.page.scrollPosition = {
    top: this.page.scrollPosition["top"] + document.body.scrollHeight,
    left: 0
};
But I got the same result.
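One thing that may be worth checking in the replacement above: the assignment runs in the outer CasperJS/PhantomJS script, so `document.body.scrollHeight` there is probably not the height of the loaded page. The sketch below separates the arithmetic from the height lookup. `nextScrollPosition`, `fakePage`, and `heightFromPage` are hypothetical names used only for illustration, and the stand-in object replaces the real page so the snippet runs on its own.

```javascript
// Hypothetical helper: compute the next scroll position from the current one
// and a page height that was read inside the page, not in the outer script.
function nextScrollPosition(current, pageHeight) {
    return { top: current.top + pageHeight, left: 0 };
}

// Stand-in for the PhantomJS page object, for illustration only.
var fakePage = { scrollPosition: { top: 0, left: 0 } };

// Assumed value; in a real script it would be read from the loaded page.
var heightFromPage = 2400;

fakePage.scrollPosition = nextScrollPosition(fakePage.scrollPosition, heightFromPage);
console.log(JSON.stringify(fakePage.scrollPosition)); // prints {"top":2400,"left":0}
```

In the actual script, `heightFromPage` could be obtained with `this.evaluate(function() { return document.body.scrollHeight; })`, since `evaluate` runs its callback in the page context. Whether this alone changes the result on the second site is not confirmed.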
Solution
No verified solution has been found for this problem yet.
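One direction that may be worth trying, offered as a sketch and not a confirmed fix: the script in the question stops as soon as a single `window.scrollY` reading is unchanged, so one slow or missed load ends the loop. An alternative is to compare the document height between polls and give up only after several consecutive polls with no growth. `makeStallDetector` and `maxStalls` are hypothetical names, and the height readings below are simulated values, not measurements from either site.

```javascript
// Hypothetical helper: returns a function that reports whether scrolling
// should stop, tolerating up to (maxStalls - 1) polls without page growth.
function makeStallDetector(maxStalls) {
    var lastHeight = -1; // largest height seen so far
    var stalls = 0;      // consecutive polls without growth

    return function shouldStop(currentHeight) {
        if (currentHeight > lastHeight) {
            lastHeight = currentHeight;
            stalls = 0; // the page grew, so reset the counter
            return false;
        }
        stalls += 1;
        return stalls >= maxStalls;
    };
}

// Simulated height readings taken after each scroll (assumed values).
var shouldStop = makeStallDetector(3);
var readings = [1000, 2000, 2000, 3000, 3000, 3000, 3000];
var decisions = readings.map(function(height) {
    return shouldStop(height);
});

console.log(decisions.join(",")); // prints false,false,false,false,false,false,true
```

In the CasperJS script, the height could be read with `casper.evaluate(function() { return __utils__.getDocumentHeight(); })`, the call that is commented out in the question, and `getContent` would recurse while `shouldStop(height)` returns false. If the second site still stops growing, the cause may lie elsewhere, for example in how the page triggers its loading.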