I am trying to scrape some real estate data from https://www.realestate.com.au/sold/in-brisbane+-+greater+region,+qld/list-1. Calling fetch('https://www.realestate.com.au/sold/in-brisbane+-+greater+region,+qld/list-1') in the Scrapy shell returns the following error:
[scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.realestate.com.au/sold/in-brisbane+-+greater+region,+qld/list-1> (failed 1 times): 429 Unknown Status
Does anyone know how to get around this? I have tried adjusting the options in settings.py, but to no avail.
2 Answers
yfjy0ee71#
In this case, the website is most likely blocking your requests because it does not recognize your client.
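Send the same Request Headers and Cookies that a browser sends. This is a way to scrape a site without getting banned, sometimes called reverse engineering the request. Open your browser's developer tools, copy the Request Headers and Cookies, and use them when you request the site. The code below worked for me (see the screenshot beneath the code). Let me know if you have any other questions.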
Happy scraping :)
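As a rough sketch of the copy-and-paste step described above (this is not the answerer's original snippet), the helper below turns a block of `Name: value` lines copied from the browser's network tab into a headers dict and a cookies dict. The function name `split_copied_headers` and all sample values are hypothetical placeholders; real values would come from your own browser session.

```python
from typing import Dict, Tuple


def split_copied_headers(raw: str) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a pasted 'Name: value' block into (headers, cookies)."""
    headers: Dict[str, str] = {}
    cookies: Dict[str, str] = {}
    for line in raw.strip().splitlines():
        # Split on the first colon only, so values such as URLs stay intact.
        name, sep, value = line.partition(":")
        name, value = name.strip(), value.strip()
        # Lines without a name (for example HTTP/2 pseudo-headers such as
        # ':authority') are skipped.
        if not sep or not name:
            continue
        if name.lower() == "cookie":
            # The Cookie header holds 'key=value' pairs separated by ';'.
            for pair in value.split(";"):
                key, eq, val = pair.strip().partition("=")
                if eq and key:
                    cookies[key] = val
        else:
            headers[name] = value
    return headers, cookies


# Placeholder text standing in for what you would paste from the browser.
RAW = """
User-Agent: Mozilla/5.0 (placeholder)
Accept-Language: en-AU,en;q=0.9
Cookie: session=abc123; theme=dark
"""

headers, cookies = split_copied_headers(RAW)
```

The two dicts can then be passed along with the request, for example `scrapy.Request(url, headers=headers, cookies=cookies)` in a spider or `requests.get(url, headers=headers, cookies=cookies)` in a plain script. Whether that is enough to avoid the 429 depends on the site's checks at the time.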
bxjv4tth2#
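I don't think the 429 error returned here actually means that too many requests were made; it is more likely an anti-scraping measure. That said, I was able to get the data using requests: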
Output:
'<!doctype html>\n<html lang="en-AU">\n<head>\n <meta charset="utf-8"/>\n <meta http-equiv="X-UA-Compatible" content="IE=edge" />\n <meta name="viewport" content="width=device-width,initial-scale=1,minimum-scale=1">\n <meta name="format-detection" content="telephone=no">\n <title data-react-helmet="true">Sold Property Prices & Auction Results in Brisbane - Greater Region, QLD - realestate.com.au</title> <link data-react-helmet="true" rel="canonical" href="https://www.realestate.com.au/sold/in-brisbane+-+greater+region,+qld/list-1"/><link data-react-helmet="true" href="https://m.realestate.com.au/sold/in-brisbane+-+greater+region,+qld/list-1" rel="alternate" media="only screen and (max-width: 640px)"/><link data-react-helmet="true" rel="next" href="https://www.realestate.com.au/sold/in-brisbane+-+greater+region,+qld/list-2"/> <meta data-react-helmet="true" name="description" content="282214 sold properties in Brisbane - Greater Region, QLD. View the latest property sold prices and auction results in Brisbane - Greater Region with realestate.com.au."/> <script data-react-helmet="true" type="application/ld+json">[{"@context":"http://schema.org","@type":"Residence","address":{"@type":"PostalAddress","addressLocality":"Kangaroo Point","addressRegion":"QLD","postalCode":"4169","streetAddress":"14/10 Park Avenue"},"name":"14/10 Park Avenue"},{"@context":"http://schema.org","@type":"Residence","address":{"@type":"PostalAddress","addressLocality":"Nundah","addressRegion":"QLD","postalCode":"4012","streetAddress":"3/38 Franklin Street"},"name":"3/38 Franklin Street"},{"@context":"http://schema.org","@type":"Residence","address":{"@type":"PostalAddress","addressLocality":"Bracken Ridge","addressRegion":"QLD","postalCode":"4017","streetAddress":"22 Rinnicrew Street"},"name":"22 Rinnicrew Street"},{"@context":"http://schema.org","@type":"Residence","address":{"@type":"PostalAddress","addressLocality":"Stafford","addressRegion":"QLD","postalCode":"4053","streetAddress":"8/66 
Gamelin Crescent"},"name":"8/66 Gamelin Crescent"},{"@context":"http://schema.org","@type":"Residence","address":{"@type":"PostalAddress","addressLocality":"Karana Downs","addressRegion":"QLD","postalCode":"4306","streetAddress":"6 Illawong Way"},"name":"6 Illawong Way"},{"@context":"http://schema.org","@type":"Residence","address":{"@type":"PostalAddress","addressLocality":"Durack","addressRegion":"QLD","postalCode":"4077","streetAddress":"9/80 Cintra Street"},"name":"9/80 Cintra Street"},{"@context":"http://schema.org","@type":"Residence","address":{"@type":"PostalAddress","addressLocality":"South Brisbane","addressRegion":"QLD","postalCode":"4101","streetAddress":"10809/22 Merivale Street"},"name":"10809/22 Merivale Street"},{"@context":"http://schema.org","@type":"Residence","address":{"@type":"PostalAddress","addressLocality":"Wavell Heights","addressRegion":"QLD","postalCode":"4012","streetAddress":"4/7 Rode Road"},"name":"4/7 Rode Road"},{"@context":"http://schema.org","@type":"Residence","address":
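That said, you probably don't need every header or cookie entry. My suggestion is to work out which ones are actually required and add only those to your request.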
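One way to find out which entries are actually needed is to drop them one at a time and check whether the request still succeeds. The sketch below assumes you already have a working headers dict. `prune_headers` and `page_loads` are hypothetical names, and treating a 200 status as success is an assumption about how the site signals a block.

```python
from typing import Callable, Dict

URL = "https://www.realestate.com.au/sold/in-brisbane+-+greater+region,+qld/list-1"


def prune_headers(
    headers: Dict[str, str],
    still_works: Callable[[Dict[str, str]], bool],
) -> Dict[str, str]:
    """Try removing each header in turn; keep it only if removal breaks the request."""
    kept = dict(headers)
    for name in list(headers):
        trial = {k: v for k, v in kept.items() if k != name}
        if still_works(trial):
            # The request succeeded without this header, so leave it out.
            kept = trial
    return kept


def page_loads(trial_headers: Dict[str, str]) -> bool:
    """Hypothetical check: does the page return 200 with these headers?"""
    import requests  # third-party; imported here so the sketch loads without it

    response = requests.get(URL, headers=trial_headers, timeout=10)
    return response.status_code == 200
```

Calling `prune_headers(full_headers, page_loads)` sends one real request per header, so it is worth pausing between probes. The same loop can be applied to the cookies dict with a check function that varies cookies instead.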