I need to get all the links from Amazon, starting from this one:
https://www.amazon.com/s?k=guess+case&crid=2Q25FH0FOTCA4&sprefix=guess+case%2Caps%2C215&ref=nb_sb_noss
But I only need the Guess cases. The links must contain two values: "guess" and "phone". For example:
https://www.amazon.com/Guess-Scarlett-Collection-Hard-iPhone/dp/B00QTEP0B0/ref=sr_1_2?crid=2Q25FH0FOTCA4&keywords=guess+case&qid=1650550474&sprefix=guess+case%2Caps%2C215&sr=8-2
https://www.amazon.com/Guess-GUHCP13SPCUMABK-Marble-Collection-iPhone/dp/B09J94ZMZ3/ref=sr_1_3?crid=2Q25FH0FOTCA4&keywords=guess+case&qid=1650550474&sprefix=guess+case%2Caps%2C215&sr=8-3
How can I get these links with the help of the re library?
start_urls = ['https://www.amazon.com/s?k=guess+case&crid=2Q25FH0FOTCA4&sprefix=guess+case%2Caps%2C215&ref=nb_sb_noss/']
rules = [Rule(LinkExtractor(allow=r'???' , ))...
1 Answer
v440hwme1#
Just use an if statement...
if "guess" not in url.lower() or "phone" not in url.lower():
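Since the question asks about re, here is a minimal sketch of the same check as a regular expression. It rests on a few assumptions that are not in the original post: the helper name `is_guess_phone_link` and the constant `GUESS_PHONE` are made up for illustration, the third URL in the sample list is hypothetical, and the pattern only looks at the product slug (the path segment before `/dp/`), because every result link also carries `keywords=guess+case` in its query string. Note that in Python an expression like `"a" and "b" not in s` only tests `"b"`, so each word needs its own test; the two lookaheads below do that.

```python
import re

# Assumed pattern: both words must occur in the product slug, i.e. the
# path segment right after "amazon.com/" and before "/dp/".
# (?i) makes it case-insensitive, since the example URLs use "Guess" and "iPhone".
GUESS_PHONE = re.compile(
    r"(?i)amazon\.com/(?=[^/?]*guess)(?=[^/?]*phone)[^/?]+/dp/"
)


def is_guess_phone_link(url: str) -> bool:
    """Return True if the product slug contains both 'guess' and 'phone'."""
    return GUESS_PHONE.search(url) is not None


urls = [
    # Example link from the question.
    "https://www.amazon.com/Guess-Scarlett-Collection-Hard-iPhone/dp/B00QTEP0B0/ref=sr_1_2?keywords=guess+case",
    # The search page itself: no product slug, so it should be rejected.
    "https://www.amazon.com/s?k=guess+case&ref=nb_sb_noss",
    # Hypothetical non-phone product, for illustration only.
    "https://www.amazon.com/Guess-Leather-Wallet/dp/B000000000/ref=sr_1_9?keywords=guess+case",
]

wanted = [u for u in urls if is_guess_phone_link(u)]
for u in wanted:
    print(u)
```

As far as I know, Scrapy's LinkExtractor applies its `allow` patterns to the absolute URL with a regex search, so the same pattern string could probably be passed as `allow=` in the rule, but I have not tested that against the live site.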