I am new to Scrapy and Python. I am scraping data from Aliexpress.com with the playwright method, and it returns (referer: None). Here is my code:
import scrapy
from scrapy_playwright.page import PageMethod


class AliSpider(scrapy.Spider):
    name = "aliex"

    def start_requests(self):
        # GET request
        search_value = 'phones'
        yield scrapy.Request(
            f"https://www.aliexpress.com/premium/{search_value}.html?spm=a2g0o.productlist.1000002.0&initiative_id=SB_20230118063054&dida=y",
            meta=dict(
                playwright=True,
                playwright_include_page=True,
                playwright_page_coroutines=[
                    PageMethod('wait_for_selector', '.list--gallery--34TropR')
                ],
            ),
        )

    async def parse(self, response):
        for data in response.xpath("//h1"):
            related_link = data.xpath(".//text()").get()
            yield {
                'related_link': related_link
            }
I am getting:
2023-01-18 19:56:55 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://www.aliexpress.com/wholesale?SearchText=phones&spm=a2g0o.productlist.1000002.0&initiative_id=SB_20230118063054&dida=y> (referer: None)
2023-01-18 19:56:55 [scrapy.core.engine] INFO: Closing spider (finished)
I tried both XPath and CSS selectors, but the result is the same. Can anyone help me?
1 Answer
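This is a complete solution using standalone Playwright with Python; it also works on Windows. The website loads its data dynamically via JavaScript, which is why I use the **page.evaluate()** method to execute JavaScript and scroll through the whole page. Otherwise, it would not scrape the complete result set.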
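A minimal sketch of that approach, assuming Playwright's sync API. The helper names `scroll_to_bottom` and `scrape_texts`, the step size, the pause, the round limit, and the `h1` selector are illustrative assumptions and not the original code. Playwright is imported inside `scrape_texts`, so the scrolling helper can be exercised on its own.

```python
import time


def scroll_to_bottom(page, step=1000, pause=0.5, max_rounds=50):
    """Scroll `page` down in fixed steps until the scroll position passes
    the current document height, or `max_rounds` steps have been made.

    Only `page.evaluate()` is used. Returns the number of scroll steps.
    """
    position = 0
    rounds = 0
    while rounds < max_rounds:
        position += step
        # Run JavaScript in the page to move the viewport down.
        page.evaluate("y => window.scrollTo(0, y)", position)
        time.sleep(pause)  # give lazily loaded items time to render
        rounds += 1
        # Re-read the height each round: it grows as new items load.
        height = page.evaluate("document.body.scrollHeight")
        if position >= height:
            break
    return rounds


def scrape_texts(url, selector="h1"):
    """Open `url`, scroll the whole page, and return the inner text of
    every element matching `selector` (a hypothetical default)."""
    # Imported here so that scroll_to_bottom() has no hard dependency.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        page = browser.new_page()
        page.goto(url)
        scroll_to_bottom(page)
        texts = page.locator(selector).all_inner_texts()
        browser.close()
    return texts
```

Calling `scrape_texts("https://www.aliexpress.com/wholesale?SearchText=phones")` would then collect the matching text after the page has been scrolled. The selector has to be adapted to the elements you actually want, and the site may still block or redirect automated browsers.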