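CURL write function is missing data

Problem description

I am using the CURLOPT_WRITEFUNCTION option to fetch a website's content and the CURLOPT_WRITEDATA option to specify my buffer. According to the documentation, when using C++ I have to define the write callback as a static class member function with the following signature, otherwise I get a segmentation fault: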

size_t write_callback(char *ptr,size_t size,size_t nmemb,void *userdata)

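I know that, according to the documentation, the maximum write size is defined by CURL_MAX_WRITE_SIZE. However, when I just print the contents of ptr, I get less than what CURL_MAX_WRITE_SIZE specifies. In practice, I get roughly 1/3 of the website's content.

When I do not set the CURLOPT_WRITEFUNCTION option, CURL by default uses fwrite() to print the data to stdout, and there I can see the entire content of the web page.

So my question is: why am I not getting all of the data from the web page in my write callback?

Below is a minimal reproducible example: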

#include <curl/curl.h>
#include <string>
#include <iostream>

class curl_write
{
public:
    static size_t write_data(void* ptr,size_t size,size_t nmemb,void* buffer)
    {
        ((std::string*) buffer)->append((char*) ptr,nmemb);
        return nmemb;
    }
};

int main()
{
    std::string buffer;

    CURL* handle = curl_easy_init();
    curl_easy_setopt(handle,CURLOPT_NOPROGRESS,1L);
    curl_easy_setopt(handle,CURLOPT_NOSIGNAL,1L);
    curl_easy_setopt(handle,CURLOPT_HTTPAUTH,CURLAUTH_ANY);
    curl_easy_setopt(handle,CURLOPT_PROXYAUTH,CURLAUTH_ANY);
    curl_easy_setopt(handle,CURLOPT_URL,"https://ensk.dict.cc/");
    curl_easy_setopt(handle,CURLOPT_WRITEFUNCTION,curl_write::write_data);
    curl_easy_setopt(handle,CURLOPT_WRITEDATA,&buffer);
    curl_easy_perform(handle);
    curl_easy_cleanup(handle);

    std::cout << buffer << std::endl;

    return 0;
}

This is the end of the expected output:

<br>
<div class="hl3" style="margin:25px 0px 3px 4px;">News &amp; History (<a href="//www.dict.cc/news_en.RSS.PHP" style="color:#36c">RSS</a>)</div>
</div><div style="width:728px; padding:0px 1px">
<div class="news"><em>2020-08-11:</em> Keyboard handling improved for the <a href="https://m.dict.cc/">mobile dict.cc website</a> when used on a PC/Mac. Just type your keyword, regardless of where the cursor is. Pressing the Tab key (followed by Enter) selects from the suggestions in most browsers.</div>
<div class="news"><em>2020-07-07:</em> The new vocabulary trainer (only web, not yet in the app) now displays a message if there are several possible answers. The message can be clicked to show the translations currently not being asked for. The trainer now also accepts any of the other answers and moves the corresponding card to the next box, but it will keep asking for the missing answer. To get help finding out which term is asked for, it's still possible to press the space bar for the next correct character.</div>
<div class="news"><em>2020-06-29:</em> The mobile website now offers audio buttons for both languages, saving the additional click on terms for re-translation.</div>
<div class="news"><em>2020-03-30:</em> The new vocabulary trainer is now officially released! The old version stays available for the time being and can be used via the "v1" or "old" links and menu items.</div>
<div class="news">» <a href="//www.dict.cc/german-english-dictionary.PHP"><b>English: News History</b></a><br>
» <a href="//www.dict.cc/deutsch-englisch-woerterbuch.PHP"><b>Deutsch: Verlauf der Mitteilungen</b></a></div>
</div><table border="0" cellpadding="2" cellspacing="0" width="728" style="margin-left:1px;height:22px"> 
        <tbody><tr><td class="td1"><a href="#top" name="bottom" style="color:white;">back to top</a> | <a href="https://ensk.dict.cc/" style="color:white;">home</a></td><td class="td1" align="right">© 2002 - 2020 <a href="https://www.hemetsberger.com" style="color:white;">Paul Hemetsberger</a> | <a href="//www.dict.cc/?s=about%3A#impressum" style="color:white;">contact / privacy</a></td></tr></tbody></table><img src="https://www4.dict.cc/img/hr4.gif" width="728" height="4" alt="" style="margin:0px 0px 0px 1px"><div class="aftertable">Slovak-English online dictionary (Anglicko-slovenský slovník) developed to help you share your knowledge with others. <a href="//www.dict.cc/?s=about%3A">More information</a><br>Links to this dictionary or to single translations are very welcome! <a href="//www.dict.cc/?s=about%3Afaq">Questions and Answers</a></div><script type="text/javascript">add_inputhasfocus_changer();</script><script type="text/javascript">var my_vocab_lists = [];
</script><div id="speechlayer" style="width:1px;height:1px"></div><div id="recthomebot" style="position: absolute; left: 0px; top: 0px;"><div style="text-align:center;width:300px"><span class="noline" style="font-size:10px;color:#999">Advertisement</span></div><!-- 
headerbidding part2
--><div id="snhb-detail_rectangle-0"></div><script type="text/javascript"> snhb.queue.push( function(){ snhb.startAuction(['detail_rectangle']); }); </script></div><script type="text/javascript">
            if (document.getElementById("recthome")) {
            document.getElementById("recthomebot").style.position="absolute";
            document.getElementById("recthomebot").style.left=getX(document.getElementById("recthome"))+"px";
            document.getElementById("recthomebot").style.top=getY(document.getElementById("recthome"))+"px";
            }
            </script></div><iframe name="__tcfapiLocator" style="display: none;"></iframe><iframe name="__uspapiLocator" style="display: none;"></iframe><iframe name="__cmpLocator" style="display: none;"></iframe></body></html>

The difference from the actual output is that the last 6.5 lines are missing.

Solution

The solution was to replace the carriage return characters with space characters.
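The replacement described above could look roughly like the following. This is only a sketch: the helper name `replace_carriage_returns` is made up for illustration, and it assumes the downloaded page is already held in a `std::string`. The post does not say why the carriage returns hid part of the output; a plausible reason is that a terminal moves the cursor back to the start of the line on `'\r'`, so later text overwrites earlier text.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not from the original post): returns a copy of
// `text` in which every carriage return ('\r') is replaced by a space.
// Line feeds ('\n') are left untouched.
std::string replace_carriage_returns(std::string text)
{
    std::replace(text.begin(), text.end(), '\r', ' ');
    return text;
}
```

With the example program from the question, this would be applied to `buffer` just before it is printed, for instance `std::cout << replace_carriage_returns(buffer) << std::endl;`.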

Answers

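*((std::string*) buffer) += (char*) ptr only works if the data at ptr is both null-terminated and free of embedded null bytes. The latter depends on what you are downloading, and the curl documentation explicitly states that the former is not the case.

You should use std::string::append instead: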

((std::string*) buffer)->append((char*) ptr,nmemb);
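To make the difference concrete, here is a small sketch that does not involve libcurl at all. The function names and the sample chunk `kChunk` are invented for illustration. The chunk contains an embedded null byte, and it carries a trailing terminator only because it comes from a string literal, which keeps the C-string variant well-defined here; data handed to a real write callback has no such guarantee.

```cpp
#include <cstddef>
#include <string>

// Treats the chunk as a C string: copying stops at the first '\0',
// and the data MUST be null-terminated for this to be safe.
std::string append_as_c_string(const char* ptr)
{
    std::string out;
    out += ptr;
    return out;
}

// Copies exactly nmemb bytes, regardless of any '\0' inside the chunk.
std::string append_with_length(const char* ptr, std::size_t nmemb)
{
    std::string out;
    out.append(ptr, nmemb);
    return out;
}

// Illustrative chunk: 5 payload bytes with a '\0' in the middle.
// sizeof includes the literal's trailing terminator, hence the - 1.
const char kChunk[] = "ab\0cd";
const std::size_t kChunkSize = sizeof(kChunk) - 1;
```

Here `append_as_c_string(kChunk)` keeps only the 2 bytes before the embedded null, while `append_with_length(kChunk, kChunkSize)` keeps all 5.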

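The documentation says:

Your callback should return the number of bytes actually taken care of. If that amount differs from the amount passed to your callback function, it will signal an error condition to the library. This will cause the transfer to get aborted

return nmemb + 1;

should be

return nmemb;

Otherwise, the transfer will be aborted.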
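The effect of the wrong return value can be simulated without a network connection. In the sketch below, `deliver` is a hypothetical stand-in for the library's transfer loop (it is not libcurl code) that applies the rule quoted above: it stops as soon as the callback reports a byte count different from the chunk size. The names `write_ok` and `write_off_by_one` are likewise made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Same parameter list as a libcurl write callback.
using write_cb = std::size_t (*)(char* ptr, std::size_t size,
                                 std::size_t nmemb, void* userdata);

// Appends the chunk and reports exactly the number of bytes received.
std::size_t write_ok(char* ptr, std::size_t, std::size_t nmemb, void* userdata)
{
    static_cast<std::string*>(userdata)->append(ptr, nmemb);
    return nmemb;
}

// Appends the chunk but reports one byte too many.
std::size_t write_off_by_one(char* ptr, std::size_t, std::size_t nmemb, void* userdata)
{
    static_cast<std::string*>(userdata)->append(ptr, nmemb);
    return nmemb + 1;
}

// Hypothetical stand-in for the transfer loop, NOT libcurl itself:
// hands each chunk to the callback and aborts on a size mismatch.
// Returns true if every chunk was accepted.
bool deliver(std::vector<std::string> chunks, write_cb cb, void* userdata)
{
    for (std::string& chunk : chunks) {
        if (cb(&chunk[0], 1, chunk.size(), userdata) != chunk.size()) {
            return false; // mismatch: stop delivering further chunks
        }
    }
    return true;
}
```

Delivering the chunks "abc", "def", "ghi" through `write_ok` collects all nine bytes, whereas `write_off_by_one` causes an abort after the first chunk, leaving only "abc" in the buffer. That matches the symptom of receiving just the first part of a page.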

You will receive the data across multiple callbacks, and a std::string does not need to be explicitly null-terminated. Like this:

#include <curl/curl.h>
#include <string>
#include <iostream>

class curl_write
{
public:
    static size_t write_data(void* ptr,size_t size,size_t nmemb,void* buffer)
    {
        ((std::string*) buffer)->append((char*) ptr,nmemb);

        return nmemb;
    }
};

int main()
{
    std::string buffer;

    CURL* handle = curl_easy_init();
    curl_easy_setopt(handle,CURLOPT_NOPROGRESS,1L);
    curl_easy_setopt(handle,CURLOPT_NOSIGNAL,1L);
    curl_easy_setopt(handle,CURLOPT_HTTPAUTH,CURLAUTH_ANY);
    curl_easy_setopt(handle,CURLOPT_PROXYAUTH,CURLAUTH_ANY);
    curl_easy_setopt(handle,CURLOPT_URL,"https://ensk.dict.cc/");
    curl_easy_setopt(handle,CURLOPT_WRITEFUNCTION,curl_write::write_data);
    curl_easy_setopt(handle,CURLOPT_WRITEDATA,&buffer);
    curl_easy_perform(handle);
    curl_easy_cleanup(handle);
    std::cout << buffer << std::endl;

    return 0;
}