A Detailed Look at Fetching Mailbox Addresses with PHP cURL
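cURL is the kind of tool you should never leave home without. Why put it that way? Because it is easy to use and covers a whole range of tasks: fetching pages, simulating logins, collecting data, and so on.
I remember that my first encounter with cURL was a task to fetch the contact list from a mailbox. To hit the deadline I did not study it closely; I just found some material online and got the feature working. I have now tidied up that code, and it still works.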
<div class="codetitle"><a style="CURSOR: pointer" data="46081" class="copybut" id="copybut46081" onclick="doCopy('code46081')"> The code is as follows:
<div class="codebody" id="code46081">
<?php
error_reporting ( 0 );
set_time_limit ( 0 );
header ( "Content-Type: text/html; charset=GB2312" );

// Mailbox username and password
$user = 'username';
$pass = 'password';

// Create a temporary file to store the cookie data
define ( "COOKIEJAR", tempnam ( ini_get ( "upload_tmp_dir" ), "cookie" ) );

$url = 'http://reg.163.com/logins.jsp?type=1&url=http://entry.mail.163.com/coremail/fcg/ntesdoor2?lightweight%3D1%26verifycookie%3D1%26language%3D-1%26style%3D-1';
$refer = 'http://mail.163.com';
$fields_post = array ( 'username' => $user, 'password' => $pass, 'verifycookie' => 1, 'style' => - 1, 'product' => 'mail163', 'selType' => - 1, 'secure' => 'on' );
$fields_string = http_build_query ( $fields_post );
// CURLOPT_HTTPHEADER expects a list of "Name: value" strings, not an associative array
$headers_login = array ( 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/2008052906 Firefox/3.0', 'Referer: http://www.163.com' );

// Log in
$ch = curl_init ( $url );
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt ( $ch, CURLOPT_HEADER, true );
curl_setopt ( $ch, CURLOPT_CONNECTTIMEOUT, 120 );
curl_setopt ( $ch, CURLOPT_POST, true );
curl_setopt ( $ch, CURLOPT_REFERER, $refer );
curl_setopt ( $ch, CURLOPT_COOKIESESSION, true );
curl_setopt ( $ch, CURLOPT_COOKIEJAR, COOKIEJAR );
curl_setopt ( $ch, CURLOPT_HTTPHEADER, $headers_login );
curl_setopt ( $ch, CURLOPT_POSTFIELDS, $fields_string );
$result = curl_exec ( $ch );
curl_close ( $ch );

// Follow the redirect, sending the cookies saved during login
$url = 'http://entry.mail.163.com/coremail/fcg/ntesdoor2?lightweight=1&verifycookie=1&language=-1&style=-1&username=loki_wuxi';
$headers = array ( 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/2008052906 Firefox/3.0' );
$ch = curl_init ( $url );
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, true );
// Keep the response headers in $result so the sid can be read from them
curl_setopt ( $ch, CURLOPT_HEADER, true );
curl_setopt ( $ch, CURLOPT_HTTPHEADER, $headers );
curl_setopt ( $ch, CURLOPT_COOKIEFILE, COOKIEJAR );
curl_setopt ( $ch, CURLOPT_COOKIEJAR, COOKIEJAR );
$result = curl_exec ( $ch );
curl_close ( $ch );

// Extract the sid; stop early if the login or redirect did not return one
if (! preg_match ( '/sid=[^\"].*/', $result, $location )) {
	unlink ( COOKIEJAR );
	exit ( 'sid not found' );
}
// Drop the leading "sid=" and the trailing character of the matched line
$sid = substr ( $location [0], 4, - 1 );

// Address book URL
$url = 'http://g4a30.mail.163.com/jy3/address/addrlist.jsp?sid=' . $sid . '&gid=all';
$headers = array ( 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9) Gecko/2008052906 Firefox/3.0' );
$ch = curl_init ( $url );
curl_setopt ( $ch, CURLOPT_RETURNTRANSFER, true );
curl_setopt ( $ch, CURLOPT_HTTPHEADER, $headers );
curl_setopt ( $ch, CURLOPT_COOKIEFILE, COOKIEJAR );
$result = curl_exec ( $ch );
curl_close ( $ch );
unlink ( COOKIEJAR );

// Extract the contacts from the page
preg_match_all ( '/<td class="Ibx_Td_addrName"><a[^>]*>(.*?)<\/a><\/td><td class="Ibx_Td_addrEmail"><a[^>]*>(.*?)<\/a><\/td>/i', $result, $infos, PREG_SET_ORDER );
// In each match, index 1 is the name and index 2 is the email address
print_r ( $infos );
?>
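The extraction step at the end of the script can be tried offline, without logging in to any mailbox. The following is a minimal sketch under stated assumptions: the function name `extractContacts` and the sample HTML are hypothetical and only mimic the two table-cell classes the script looks for; the real address book page may be structured differently.

```php
<?php
// Hypothetical helper (not part of the original script): returns a list of
// name/email pairs found in address-book style HTML.
function extractContacts(string $html): array
{
    // Same idea as the script's pattern: a name cell immediately followed
    // by an email cell, each wrapping its text in an <a> element.
    $pattern = '/<td class="Ibx_Td_addrName"><a[^>]*>(.*?)<\/a><\/td>'
             . '<td class="Ibx_Td_addrEmail"><a[^>]*>(.*?)<\/a><\/td>/i';

    $matches = array();
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $contacts = array();
    foreach ($matches as $m) {
        // $m[1] is the name, $m[2] is the email address
        $contacts[] = array('name' => $m[1], 'email' => $m[2]);
    }
    return $contacts;
}

// Made-up sample markup, used only for illustration.
$sample = '<tr><td class="Ibx_Td_addrName"><a href="#">Alice</a></td>'
        . '<td class="Ibx_Td_addrEmail"><a href="#">alice@example.com</a></td></tr>'
        . '<tr><td class="Ibx_Td_addrName"><a href="#">Bob</a></td>'
        . '<td class="Ibx_Td_addrEmail"><a href="#">bob@example.com</a></td></tr>';

print_r(extractContacts($sample));
```

Wrapping the pattern in a function keeps the scraping logic separate from the network calls, so the pattern can be adjusted against saved HTML when the page markup changes.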
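Later, on CSDN, I saw a post asking how to query courier tracking information. The poster wanted to put the tracking lookups of several large courier companies on a single page, which is indeed a handy little tool. Courier lookups require a CAPTCHA, though, and that reminded me of cURL again. I ended up helping the poster implement it. The idea is simple: use cURL to fetch the CAPTCHA image, display it on the page where the user submits the query, and save the cookie that came with the CAPTCHA so that it is sent along with the user's query. That keeps the cookies in sync. The source code is as follows: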