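I used essentially the same code on lottery.net/powerball/numbers/#year without any problems. Why isn't it working this time? I have changed everything I needed to, such as the links and the XPath differences.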
import scrapy

class MegaMillionsDrawingsSpider(scrapy.Spider):
    name = 'mega_millions_drawings'
    allowed_domains = ['www.lottery.net']
    user_agent = # my user agent

    def start_request(self):
        start_urls = []
        for i in reversed(range(1996,2023)):
            current_url = 'http://www.lottery.net/mega-millions/numbers/'+ str(i)
            start_urls.append(current_url)
        for url in start_urls:
            yield scrapy.Request(
                url=url,
                callback=self.parse,
                headers={
                    'User-Agent': self.user_agent
                }
            )

    def parse(self, response):
        from scrapy.shell import inspect_response
        inspect_response(response, self)
        for drawing in response.xpath("//table[@class='prizes archive ']/tbody/tr"):
            yield {
                'date': drawing.xpath(".//td/a/text()[2]").get(),
                #'url': response.urljoin(drawing.xpath(".//")).get(),
                'first': drawing.xpath(".//td/ul[@class='multi results mega-millions']/li[@class='ball'][position() = 1]/text()").get(),
                'second': drawing.xpath(".//td/ul[@class='multi results mega-millions']/li[@class='ball'][position() = 2]/text()").get(),
                'third': drawing.xpath(".//td/ul[@class='multi results mega-millions']/li[@class='ball'][position() = 3]/text()").get(),
                'fourth': drawing.xpath(".//td/ul[@class='multi results mega-millions']/li[@class='ball'][position() = 4]/text()").get(),
                'fifth': drawing.xpath(".//td/ul[@class='multi results mega-millions']/li[@class='ball'][position() = 5]/text()").get(),
                'mega-ball': drawing.xpath(".//td/ul[@class='multi results mega-millions']/li[@class='mega-ball']/text()").get()
            }
1 Answer
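I can see a few problems. Some of your XPath expressions are off, the indentation is way off, and you are using http instead of https. The minor changes to the formatting and to the example methods that I made in the example below will fix these issues.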