A multi-process, high-efficiency collection crawler implemented with PHP curl
We often use PHP curl with multiple processes to fetch data concurrently. The details are as follows.
code
run effect
main wrapper functions
multi_process ();
creates child processes according to its parameters.
highlight 1: it handles the various abnormal exits of a child process, such as a segmentation fault or "Allowed memory size exhausted". When a child process is interrupted, the parent process re-forks a new one to take its place and keep the number of child processes constant. If a child process finds that the whole job is complete (for example, it determines that TID has reached 10000), it can call exit (9); the parent process receives this exit status (9), then waits for all child processes to exit and finally exits itself.
highlight 2: together with the curl wrapper function, it implements a statistics feature; the main statistics are displayed after the program ends (the bottom of Figure 2).
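The re-fork behaviour described for multi_process () can be sketched roughly as follows. This is not the article's implementation: the function name supervise (), the EXIT_DONE constant and the returned fork count are assumptions made for illustration, and the sketch needs the pcntl extension under the PHP CLI.

```php
<?php
// Hypothetical sketch, not the article's multi_process(): keep $workers
// children alive, re-fork after any exit, and stop forking once a child
// exits with status 9.
const EXIT_DONE = 9; // the exit status the article uses for "job complete"

function supervise(int $workers, callable $task): int
{
    $running = 0;     // children currently alive
    $forks   = 0;     // total forks so far, a minimal statistic
    $done    = false; // becomes true once a child reports completion

    while (true) {
        // Top up the pool unless a child has already reported completion.
        while (!$done && $running < $workers) {
            $pid = pcntl_fork();
            if ($pid === -1) {
                throw new RuntimeException('fork failed');
            }
            if ($pid === 0) {
                // Child: run the task and use its integer result as exit status.
                exit($task());
            }
            $running++;
            $forks++;
        }
        if ($running === 0) {
            break; // nothing left to wait for
        }
        // Block until any child ends, whether by exit() or by a signal.
        $pid = pcntl_wait($status);
        if ($pid <= 0) {
            break;
        }
        $running--;
        if (pcntl_wifexited($status) && pcntl_wexitstatus($status) === EXIT_DONE) {
            $done = true; // stop re-forking, keep waiting for the others
        }
    }
    return $forks;
}
```

A child killed by a segmentation fault does not satisfy pcntl_wifexited (), so in this sketch it is simply replaced on the next pass of the loop, in the same way as a child that exits with any status other than 9.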
mp_counter ();
handles communication between the parent process and all child processes; it coordinates the tasks assigned to each child process and uses a locking mechanism. You can pass the 'init' parameter to reset the count, and you can set the value added on each update.
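One way to build a counter like this is a small file protected by flock (). The sketch below is an assumption about how such a counter could work, not the article's mp_counter (): the function name shared_counter (), the file-based storage and the exact meaning of the $init flag are all hypothetical.

```php
<?php
// Hypothetical sketch of a counter shared between parent and children.
// Storage in a file and the $init semantics are illustrative assumptions.
function shared_counter(string $file, int $step = 1, bool $init = false): int
{
    $fh = fopen($file, 'c+'); // create if missing, never truncate on open
    if ($fh === false) {
        throw new RuntimeException("cannot open $file");
    }
    flock($fh, LOCK_EX); // exclusive lock: one process updates at a time

    // Reset to 0 on init, otherwise add $step to the stored value.
    $value = $init ? 0 : ((int) stream_get_contents($fh)) + $step;

    ftruncate($fh, 0);
    rewind($fh);
    fwrite($fh, (string) $value);
    fflush($fh);

    flock($fh, LOCK_UN);
    fclose($fh);
    return $value;
}
```

Each child could call shared_counter ($file) to claim the next task ID, so that no two children work on the same ID; the parent would call it once with $init set to true before forking.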
curl_get ();
a wrapper around the curl functions with extensive error handling; it supports POST, GET, cookies, proxies, and downloads.
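A wrapper of this kind might look roughly like the sketch below. It is not the article's curl_get (): the function name fetch (), the keys of the $opt array and the shape of the returned array are assumptions, and download-to-file support is left out to keep it short.

```php
<?php
// Hypothetical sketch, not the article's curl_get(): the $opt keys and the
// returned array layout are illustrative assumptions.
function fetch(string $url, array $opt = []): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_CONNECTTIMEOUT => $opt['connect_timeout'] ?? 10,
        CURLOPT_TIMEOUT        => $opt['timeout'] ?? 30,
    ]);
    if (isset($opt['post'])) { // a string or array body switches to POST
        curl_setopt($ch, CURLOPT_POST, true);
        curl_setopt($ch, CURLOPT_POSTFIELDS, $opt['post']);
    }
    if (isset($opt['cookie'])) {
        curl_setopt($ch, CURLOPT_COOKIE, $opt['cookie']);
    }
    if (isset($opt['proxy'])) {
        curl_setopt($ch, CURLOPT_PROXY, $opt['proxy']);
    }

    $body  = curl_exec($ch);  // false on transport failure
    $errno = curl_errno($ch);

    // Report failures as data so that one bad URL does not kill the child.
    return [
        'ok'    => $body !== false && $errno === 0,
        'body'  => $body === false ? '' : $body,
        'code'  => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'error' => $errno === 0 ? '' : curl_error($ch),
    ];
}
```

Returning the error as part of the result, rather than throwing, lets the caller log it as a single line and move on to the next task.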
mp_msg ();
one of its implementation rules is to output only one line of information after each task is processed.
highlight: this function determines the height and width of the terminal and displays a line of statistics once per screenful of output (the purple row in Figure 1), which makes the program's execution easy to follow; it also controls the length of each line so that no message takes more than one line.
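The one-line-per-task rule and the once-per-screen statistics line can be sketched as below. The names format_line () and report () are hypothetical, and the terminal height and width are passed in as parameters because the article does not show how mp_msg () obtains them.

```php
<?php
// Hypothetical sketch, not the article's mp_msg(). $rows and $cols stand
// for the terminal height and width, however they are obtained.
function format_line(string $msg, int $width): string
{
    // Collapse newlines and runs of whitespace so one task is one line.
    $msg = preg_replace('/\s+/', ' ', trim($msg));
    // strlen() counts bytes; multibyte text would need a width-aware function.
    if (strlen($msg) > $width) {
        $msg = substr($msg, 0, max(0, $width - 3)) . '...';
    }
    return $msg;
}

function report(string $msg, string $stats, int $rows, int $cols): void
{
    static $printed = 0; // task lines printed so far by this process
    // At the start of each screenful, print the statistics line first.
    if ($printed % max(1, $rows - 1) === 0) {
        echo format_line($stats, $cols), PHP_EOL;
    }
    echo format_line($msg, $cols), PHP_EOL;
    $printed++;
}
```

With a 24-row terminal this prints one statistics line followed by 23 task lines, then repeats.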
rand_exit ();
it is well known that PHP has an inherent memory leak problem, so each child process exits after executing a certain number of tasks, and multi_process () automatically creates a new child process to replace it (such as the green line in Figure 1).
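A minimal sketch of this recycling idea follows. It is an assumption about how rand_exit () might decide when to quit, not the article's code: the names should_recycle () and worker_loop (), the randomised task limit and the optional memory cap are all hypothetical.

```php
<?php
// Hypothetical sketch, not the article's rand_exit(): decide whether a child
// should quit so that the parent can replace it with a fresh process.
function should_recycle(int $tasksDone, int $limit, int $memoryCap = 0): bool
{
    if ($tasksDone >= $limit) {
        return true;
    }
    // Optional second trigger: quit early once memory use reaches the cap.
    return $memoryCap > 0 && memory_get_usage(true) >= $memoryCap;
}

// Intended to run inside a child process; it never returns.
function worker_loop(callable $task, int $min = 80, int $max = 120): void
{
    // A random limit spreads restarts out so children do not all exit at once.
    $limit = random_int($min, $max);
    for ($n = 1; ; $n++) {
        $task();
        if (should_recycle($n, $limit)) {
            exit(0); // an ordinary exit; the parent is expected to re-fork
        }
    }
}
```

Exiting with a status other than 9 matters here, because the article reserves exit (9) for telling the parent that the whole job is finished.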