Re: Simulating a crawler with PHP curl functions (working with cookies)
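How do you use cookies with curl? Beginners find this a real headache: curl has a great many options, and four of them touch cookies.
Of course, the manual plainly lists only three cookie options (CURLOPT_COOKIE, CURLOPT_COOKIEFILE and CURLOPT_COOKIEJAR), but there is also the header option (CURLOPT_HTTPHEADER), and a header can carry a cookie too.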
The scraper behind 七巧游戏网 (http://7game.net.cn/) is built on the curl library.
curl is very pleasant to use. The main thing is to get familiar with curl_setopt.
curl_setopt ($ch, CURLOPT_COOKIE , $cookie );
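The cookie pairs here are separated by ; (conventionally followed by a space), not by &. The values do not have to be urlencoded, although encoding them is safer when they contain special characters.
$cookie = "a=b; c=d; name=方世玉";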
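As a small illustration of the format above, the cookie string can be assembled from an array. This is only a sketch: build_cookie_string is a hypothetical helper name, and percent-encoding the values with rawurlencode() is one possible choice that assumes the server decodes them the same way.

```php
<?php
// Hypothetical helper (not part of curl): joins name/value pairs into the
// "a=b; c=d" form that CURLOPT_COOKIE expects.
function build_cookie_string(array $pairs): string
{
    $parts = array();
    foreach ($pairs as $name => $value) {
        // rawurlencode() keeps ';', '=' and spaces inside a value from
        // breaking the pair list (assumption: the server decodes it again)
        $parts[] = $name . '=' . rawurlencode((string) $value);
    }
    return implode('; ', $parts);
}

$cookie = build_cookie_string(array('a' => 'b', 'c' => 'd', 'name' => 'x y'));
// $cookie is now "a=b; c=d; name=x%20y"

if (function_exists('curl_init')) {
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_COOKIE, $cookie);
}
```

Plain ASCII values pass through rawurlencode() unchanged, so simple cookies look exactly like the hand-written string.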
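Note that when CURLOPT_COOKIE is used, the $header array passed to curl_setopt ($ch, CURLOPT_HTTPHEADER , $header );
must not also contain a Cookie header. Otherwise the two overlap, and which cookies actually get sent becomes unpredictable.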
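One way to guard against that overlap is to filter the header list before handing it to curl. A minimal sketch, assuming the headers are plain "Name: value" strings; strip_cookie_headers is a hypothetical helper, not a curl function.

```php
<?php
// Hypothetical helper: returns $headers without any "Cookie:" line, so that
// cookies come only from CURLOPT_COOKIE.
function strip_cookie_headers(array $headers): array
{
    $kept = array();
    foreach ($headers as $line) {
        // Header names are case-insensitive, so match "cookie:" in any case
        if (stripos(ltrim($line), 'cookie:') !== 0) {
            $kept[] = $line;
        }
    }
    return $kept;
}

$header = array('Accept: text/html', 'Cookie: a=b', 'Referer: http://7game.net.cn/');
$header = strip_cookie_headers($header);
// $header now holds only the Accept and Referer lines
```

The filtered array can then be passed to CURLOPT_HTTPHEADER while the cookies themselves go through CURLOPT_COOKIE.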
The following code parses IE's cookies, that is, the text files under c:\document …..\cookie
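[code]function join_cookie($cook)
{
    // Build "name=value" pairs and join them with "; " for CURLOPT_COOKIE
    $d = array();
    foreach ($cook as $k => $v)
    {
        $d[] = $k . "=" . $v;
    }
    $data = implode("; ", $d);
    return $data;
}
function pase_cookie($cookFile, $encode = true)
{
    // Records in the cookie file are separated by a line holding "*";
    // the first two lines of each record are the cookie name and value.
    $cook = array();
    $cookie = file_get_contents($cookFile);
    $citem = explode("*\n", $cookie); //pr($citem);
    foreach ($citem as $c)
    {
        // array_pad() avoids a notice when a record has fewer than two lines
        list($ckey, $cvalue) = array_pad(explode("\n", $c), 2, '');
        if ($ckey != '') $cook[$ckey] = $cvalue;
    }
    return $cook;
    //pr($cook);
}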
// ++++++++++++++++++++++++++++++++++++++++
$cookie_jar = 'cookie.txt';
$data = array('mvfAdminMonths' => array('200706', '200707'),
    'mvfSiteProvinces' => 'Beijing',
    'whichFirst' => 'AS',
    '__act' => '__id.22.SeatsQuery.adp.actList',
    'submit.x' => '28',
    'submit.y' => '9');
$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, 'http://toefl.etest.edu.cn/cn/SeatsQuery');
// CURLOPT_POSTFIELDS cannot take a nested array, so encode it as a query string.
// http_build_query() sends the months as mvfAdminMonths[0], mvfAdminMonths[1];
// whether the server expects that form is an assumption.
curl_setopt($ch, CURLOPT_POSTFIELDS, http_build_query($data));
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body so it can be echoed
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar);
echo curl_exec($ch);
// The cookie jar is only written when the handle is released,
// so release it before the second request reads the file.
curl_close($ch);
unset($ch);
sleep(30);
// PHP array keys are unique, so only one mvfAdminMonths value can be listed here
$data = array('mvfAdminMonths' => '200707',
    'mvfSiteProvinces' => 'Beijing',
    'whichFirst' => 'AS',
    '__act' => '__id.22.SeatsQuery.adp.actList',
    'submit.x' => '28',
    'submit.y' => '9');
$ch2 = curl_init();
curl_setopt($ch2, CURLOPT_URL, 'http://toefl.etest.edu.cn/cn/SeatsQuery');
curl_setopt($ch2, CURLOPT_POSTFIELDS, http_build_query($data));
curl_setopt($ch2, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch2, CURLOPT_COOKIEFILE, $cookie_jar);
echo curl_exec($ch2);
curl_close($ch2);[/code]
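To see what the IE cookie parsing above produces, here is a self-contained sketch that runs the same splitting logic on an in-memory string instead of a file. The sample text is hypothetical and only mimics the layout the parser relies on (a name line, a value line, further lines that are ignored, then a "*" line); parse_cookie_text is an illustrative name, and "\n" line endings are assumed.

```php
<?php
// Hypothetical sample: two records, each ending with a "*" line.
$sample = "a\nb\nexample.com/\n1024\n*\nname\nvalue1\nexample.com/\n1024\n*\n";

// Illustrative stand-in for the file-based parser: same logic, string input.
function parse_cookie_text(string $text): array
{
    $cook = array();
    foreach (explode("*\n", $text) as $record) {
        $lines = explode("\n", $record);
        // Only the first two lines (name, value) of a record are used
        if (count($lines) >= 2 && $lines[0] !== '') {
            $cook[$lines[0]] = $lines[1];
        }
    }
    return $cook;
}

$pairs = parse_cookie_text($sample);

// Join the pairs into the string form used by CURLOPT_COOKIE
$parts = array();
foreach ($pairs as $k => $v) {
    $parts[] = $k . '=' . $v;
}
$cookie = implode('; ', $parts);
```

With this sample, $pairs holds the two name/value pairs and $cookie can be passed directly to CURLOPT_COOKIE.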
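To capture cookies, you have to specify where the cookie file lives,
like this:
curl_setopt($ch, CURLOPT_COOKIEJAR, $cookie_jar);
This sets where the received cookies are saved.
curl_setopt($ch2, CURLOPT_COOKIEFILE, $cookie_jar);
This sets where the cookies to be sent are read from.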
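Both options can also point at the same file on a single handle, so that one handle sends the stored cookies and saves any new ones. The sketch below only prepares the options and does not send a request; cookie_jar_options is a hypothetical helper, and the temporary-directory path is an assumption made for the example.

```php
<?php
// Hypothetical helper: options that make one handle both send cookies from
// $jar and save received cookies back to it.
function cookie_jar_options(string $jar): array
{
    return array(
        CURLOPT_COOKIEFILE     => $jar,  // read the cookies to send from this file
        CURLOPT_COOKIEJAR      => $jar,  // write received cookies here when the handle is released
        CURLOPT_RETURNTRANSFER => true,  // make curl_exec() return the body
    );
}

$jar  = sys_get_temp_dir() . '/cookie.txt';   // assumed location for the example
$opts = cookie_jar_options($jar);

$ch = curl_init('http://toefl.etest.edu.cn/cn/SeatsQuery');
curl_setopt_array($ch, $opts);
// $html = curl_exec($ch);   // uncomment to actually send the request
unset($ch);                  // releasing the handle writes the jar to disk
```

Keeping the options in one array makes it easy to apply the same cookie setup to every handle in a crawl.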