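Fetching a small amount of data works fine, but when I try to fetch more, I get error 429. I looked at the Scrapy documentation, but it didn't help. I think the problem is the request rate: the response count reaches 210 within 6 seconds, and I don't know how to slow the crawl down. By the way, I tried DOWNLOAD_DELAY = [1], but it did not make much difference.
Here is the code: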
import scrapy


class WanikaniSpider(scrapy.Spider):
    name = 'japandict'
    allowed_domains = ['www.wanikani.com']
    url = 'https://www.wanikani.com/kanji/'
    start_urls = []
    kanjis = ["愛", "暗", "位", "偉", "易", "違", "育", "因", "引", "泳", "越", "園", "演", "煙", "遠", "押", "横", "王", "化", "加", "科", "果", "過", "解", "回", "皆", "絵", "害", "格", "確", "覚", "掛", "割", "活", "寒", "完", "官", "感", "慣", "観", "関", "顔", "願", "危", "喜", "寄", "幾", "期", "機", "規", "記", "疑", "議", "客", "吸", "求", "球", "給", "居", "許", "供", "共", "恐", "局", "曲", "勤", "苦", "具", "偶", "靴", "君", "係", "形", "景", "経", "警", "迎", "欠", "決", "件", "権", "険", "原", "現", "限", "呼", "互", "御", "誤", "交", "候", "光", "向", "好", "幸", "更", "構", "港", "降", "号", "合", "刻", "告", "込", "困", "婚", "差", "座", "最", "妻", "才", "歳", "済", "際", "在", "罪", "財", "昨", "察", "殺", "雑", "参", "散", "産", "賛", "残", "市", "師", "指", "支", "資", "歯", "似", "次", "治", "示", "耳", "辞", "式", "識", "失", "実", "若", "取", "守", "種", "酒"]
    liste = []
    # Build one start URL per kanji.
    for kanji in kanjis:
        liste.append(kanji)
        nurl = url + kanji
        start_urls.append(nurl)
    # Output files, opened once at class definition and shared by every response.
    file = open("n3kanji.txt", "w", encoding="utf-8")
    file1 = open("n3onyomi.txt", "w", encoding="utf-8")
    file2 = open("n3kunyomi.txt", "w", encoding="utf-8")
    file3 = open("n3meanings.txt", "w", encoding="utf-8")

    def parse(self, response):
        print(response.url)
        kanjiicon = response.xpath('//*[@id="main"]/body/div[1]/div[3]/div/div/header/h1/span/text()').getall()
        meanings = response.xpath('//*[@id="meaning"]/div[1]/p/text()').getall()
        # The original looped over //*[@id="reading"]/div but used absolute XPaths
        # inside the loop, so querying the response directly selects the same nodes.
        onyomi = response.xpath('//*[@id="reading"]/div/div[1]/p/text()').getall()
        kunyomi = response.xpath('//*[@id="reading"]/div/div[2]/p/text()').getall()
        for x in kanjiicon:
            yield {'kanjiicon': x.strip()}
            self.file.write(x + "\n")
        for y in onyomi:
            yield {'onyomi': y.strip()}
            self.file1.write(y + "\n" + "\r")
        for z in kunyomi:
            yield {'kunyomi': z.strip()}
            self.file2.write(z + "\n" + "\r")
        for m in meanings:
            yield {'meanings': m.strip()}
            self.file3.write(m + "\n")

    def closed(self, reason):
        # Scrapy calls closed() when the spider finishes. The original wrote
        # `self.file.close` without parentheses, which never closed the files.
        for f in (self.file, self.file1, self.file2, self.file3):
            f.close()
Thanks for your help.
1 Answer
9fkzdhlc #1
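You can slow the spider down in several ways by setting custom settings, either on the spider itself or in the project's main settings.py file. Some of the relevant settings are CONCURRENT_REQUESTS, DOWNLOAD_DELAY, CONCURRENT_REQUESTS_PER_DOMAIN, CONCURRENT_REQUESTS_PER_IP, and AUTOTHROTTLE_ENABLED.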
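As a rough sketch of how these settings could be combined, the dictionary below collects a conservative throttling profile. The name THROTTLE_SETTINGS and every value in it are assumptions chosen for illustration, not numbers taken from the question or recommended by the site; the keys are standard Scrapy setting names.

```python
# Hypothetical throttling profile; all values are illustrative assumptions.
THROTTLE_SETTINGS = {
    # Total number of requests Scrapy may have in flight at once.
    "CONCURRENT_REQUESTS": 1,
    # Requests allowed in parallel against a single domain.
    "CONCURRENT_REQUESTS_PER_DOMAIN": 1,
    # Seconds to wait between consecutive requests to the same site.
    "DOWNLOAD_DELAY": 2,
    # Let the AutoThrottle extension adapt the delay to observed latency.
    "AUTOTHROTTLE_ENABLED": True,
    # Initial delay (seconds) AutoThrottle starts from.
    "AUTOTHROTTLE_START_DELAY": 5,
    # Upper bound (seconds) on the delay AutoThrottle may apply.
    "AUTOTHROTTLE_MAX_DELAY": 60,
    # Average number of parallel requests AutoThrottle aims for per server.
    "AUTOTHROTTLE_TARGET_CONCURRENCY": 1.0,
}

# In a spider this would typically be attached as a class attribute:
#     custom_settings = THROTTLE_SETTINGS
# or the same keys could be written as module-level names in settings.py.
```

If 429 responses still appear with a profile like this, raising DOWNLOAD_DELAY further is probably the first thing to try, since the site's actual rate limit is not known here.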
For example: