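I'm using the PHP cURL code from this answer: https://stackoverflow.com/a/46834320/12616388. When I run the script on localhost, I get the desired output. When I run it from a web server, I get a captcha asking me to verify that I'm not a robot instead. I'm new to this topic and would like to understand why. My code: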
$request = array();
//$request[] = 'host:www.amazon.com';
$request[] = 'Connection: keep-alive';
$request[] = 'Pragma: no-cache';
$request[] = 'Cache-Control: no-cache';
$request[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
$request[] = 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:98.0) Gecko/20100101 Firefox/98.0';//Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36';
$request[] = 'DNT: 1';
$request[] = 'Accept-Encoding: gzip, deflate';
$request[] = 'Accept-Language: en-US,en;q=0.8';
$url = 'https://www.amazon.de/Wenn-Dunkeln-Sterne-funkeln-Puste-Licht-Buch/dp/3480236529/ref=sr_1_3?keywords=buch&qid=1670662644&sr=8-3';
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_POST, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
curl_setopt($ch, CURLOPT_ENCODING,"");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_FAILONERROR,true);
$output = curl_exec($ch);
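Edit: I modified the code slightly (random user-agent strings and multiple cURL requests in a loop), but the problem is the same: no problem on localhost, but on the web server I get the captcha.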
$user_agents = array('Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/14.1.1 Safari/605.1.15', 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.67 Safari/537.36', 'Mozilla/5.0 (Windows NT 5.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.101 Safari/537.36', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.114 Safari/537.36', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:89.0) Gecko/20100101 Firefox/89.0', 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/61.0.3163.100 Safari/537.36', 'Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:59.0) Gecko/20100101 Firefox/59.0', 'Mozilla/5.0 (X11; Linux x86_64; rv:58.0) Gecko/20100101 Firefox/58.0', 'Mozilla/5.0 (Windows NT 6.2; WOW64; rv:17.0) Gecko/20100101 Firefox/17.0');
foreach ($products as $key => $value) {
$request = array();
$request[] = 'Connection: keep-alive';
$request[] = 'Pragma: no-cache';
$request[] = 'Cache-Control: no-cache';
$request[] = 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8';
$request[] = 'User-Agent: ' . $user_agents[array_rand($user_agents)];
$request[] = 'DNT: 1';
$request[] = 'Accept-Encoding: gzip, deflate';
$request[] = 'Accept-Language: en-US,en;q=0.8';
$url = $value['url'];
$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_POST, false);
curl_setopt($ch, CURLOPT_HTTPHEADER, $request);
curl_setopt($ch, CURLOPT_ENCODING,"");
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($ch, CURLOPT_TIMEOUT,10);
curl_setopt($ch, CURLOPT_FAILONERROR,true);
$output = curl_exec($ch);
...
}
2 Answers
jyztefdp1#
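Since the captcha is only triggered when you run the script on the server, it is probably keyed to the IP address. Could it be a reCAPTCHA?
Whatever kind of captcha it is, one thing that may help is solving it from the web server's IP address.
If the web server has a desktop environment, connect via VNC (or whatever you normally use to connect), open a browser, and solve the captcha.
If not, try setting up a VPN server on the web server (this one seems easy), connect to the VPN from your computer (so that you get the same IP address as the web server), open a browser, and solve the captcha.
Another option is to set up a proxy server, which achieves much the same result as the VPN.
Sadly, you will have to do this from time to time, because that is exactly what a captcha is for: preventing bots from automatically scraping websites.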
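Since the captcha can come back at any time, the script could at least notice when it has been served a challenge instead of the product page. The following is only a minimal sketch: `looksLikeCaptcha` is a hypothetical helper, and the marker strings are assumptions that should be checked against the HTML your server actually receives.

```php
<?php
// Hypothetical helper: guesses whether a response body is a captcha
// challenge rather than the expected page. The marker strings below are
// assumptions, not values confirmed for any particular site.
function looksLikeCaptcha(string $html): bool
{
    $markers = ['captcha', 'not a robot'];
    foreach ($markers as $marker) {
        // stripos() is case-insensitive and returns false when not found
        if (stripos($html, $marker) !== false) {
            return true;
        }
    }
    return false;
}

// Possible use with $output from curl_exec() in the question:
// if ($output !== false && looksLikeCaptcha($output)) {
//     error_log('Captcha served; solve it from the server IP and retry later.');
// }
```

A plain substring check like this can misfire on pages that merely mention the word, so it is best treated as a signal to stop and inspect the response, not as a reliable classifier.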
ndh0cuux2#
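To work around this, you can try including additional headers or cookies in the cURL request so that it looks more like a real user. For example, you can include a User-Agent header to specify the browser and operating system the request claims to come from, and a Cookie header containing the cookies a real user would normally send. For example: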
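As one possible illustration of the cookie part (a sketch, not the answerer's own code), cURL can store and resend cookies itself through `CURLOPT_COOKIEJAR` and `CURLOPT_COOKIEFILE`. The helper name `buildCookieOptions` and the file path are hypothetical.

```php
<?php
// Hypothetical helper: returns cURL options that persist cookies across
// requests by reading from and writing to the same file.
function buildCookieOptions(string $cookieFile): array
{
    return [
        // Cookies received from the server are written here when the
        // handle is released.
        CURLOPT_COOKIEJAR  => $cookieFile,
        // Cookies already stored in this file are sent with the request.
        CURLOPT_COOKIEFILE => $cookieFile,
    ];
}

// Possible use with the handle $ch from the question (path is an assumption):
// curl_setopt_array($ch, buildCookieOptions(sys_get_temp_dir() . '/cookies.txt'));
```

Whether resending cookies is enough to avoid the captcha is not guaranteed; if the challenge is tied to the server's IP address, as the other answer suggests, headers and cookies alone may not change the outcome.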