How to run code when Scrapy finishes, or run it again after errors

Posted by vsaztqbk on 2022-11-09 in Other

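I am trying to run Scrapy again whenever an error file exists. I am scraping Amazon, and it sometimes blocks requests, so I use try/except to save the affected URLs to a designated error file. That part works well, but how do I run Scrapy again if the error file exists? Should I create another script for this?
The error file looks like this: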

URL
https://www.amazon.com/dp/B09XRCVVNG
https://www.amazon.com/dp/B097PZT7J3
......
https://www.amazon.com/dp/B0881YZJ45
https://www.amazon.com/dp/B01N6SEXI5

Main file:

ISBN,PERMALINK,Main Link,Brand,Price
B085K...,Razer-Raptor-..ble,https://www.a...47.FM,Razer,$619.95
B085...,Razer-Rap...e,https://www.am....,Razer,$619.95
B095...,Razer-...e,https://www.amazon.com/..,$797.49
B087...,A....r,https://www.amazon.com/A...,Alienware,

I am a beginner with Scrapy and I don't know what to search for. Please help me find a working solution.

Answer #1 (jm81lzqq)

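I suggest using a list: pop the first element, and if processing that element raises an error, append it back to the same list. You only need to cap the number of errors you allow.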

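import pandas as pd
import requests

max_error = 5  # the loop stops once more than this many errors have occurred

print("Start")

# The CSV has a single column of URLs.
df_urls = pd.read_csv('./input/urls.csv')
print("DataFrame")
print(df_urls)

# Keep the URLs as plain strings: list(df_urls.values) would give one-element
# arrays, and a re-queued string indexed with [0] would be cut down to its
# first character.
urls = df_urls.iloc[:, 0].tolist()
print("Vector")
print(urls)

error_count = 0
while len(urls) > 0 and error_count <= max_error:
    url = urls.pop(0)  # pop outside the try so url is always defined
    try:
        x = requests.get(url, timeout=10)
        print(x.status_code)
    except requests.exceptions.RequestException:
        urls.append(url)  # put the failed URL back at the end of the queue
        print("Error url:" + str(url))
        error_count += 1

print("Number of error: " + str(error_count))
print("Finish")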
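
A possible variation on the loop above, as a rough sketch rather than a drop-in replacement: instead of one global error counter, each URL carries its own attempt count, so a single permanently broken URL cannot use up the whole error budget. The names `process_with_retries`, `fetch`, and `max_attempts` are made up for this illustration; `fetch` is any function that raises an exception when a URL fails.

```python
from collections import deque


def process_with_retries(urls, fetch, max_attempts=3):
    """Call fetch(url) for each URL, re-queueing failures up to max_attempts.

    Returns (succeeded, failed), two lists of URLs.
    """
    # Each queue entry is (url, attempt number), starting at attempt 1.
    queue = deque((url, 1) for url in urls)
    succeeded, failed = [], []
    while queue:
        url, attempt = queue.popleft()
        try:
            fetch(url)
        except Exception:  # narrow this to the errors your fetch() can raise
            if attempt < max_attempts:
                # Put the URL back at the end of the queue for another try.
                queue.append((url, attempt + 1))
            else:
                # Give up on this URL; it would go into the error file.
                failed.append(url)
        else:
            succeeded.append(url)
    return succeeded, failed
```

With requests, `fetch` could be something like `lambda u: requests.get(u, timeout=10).raise_for_status()`. The `failed` list is what you would write to the error file at the end.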

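URL file (./input/urls.csv):

urls
https://stackoverflow.com/questions/73003565/how-do-run-code-when-scrapy-ended-or-run-again-for-error
https://stackoverflow.com/questions/73003565/how-do-run-code-when-scrapy-ended-or-run-again-for-error
https://stackoverhflow.com/questions/9399900358/is-there-a-decent-alternative-to-dynamic-cast

Output: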

Start
DataFrame
                                                urls
0  https://stackoverflow.com/questions/73003565/h...
1  https://stackoverflow.com/questions/73003565/h...
2  https://stackoverhflow.com/questions/939990035...
Vector
[array(['https://stackoverflow.com/questions/73003565/how-do-run-code-when-scrapy-ended-or-run-again-for-error'],
      dtype=object), array(['https://stackoverflow.com/questions/73003565/how-do-run-code-when-scrapy-ended-or-run-again-for-error'],
      dtype=object), array(['https://stackoverhflow.com/questions/9399900358/is-there-a-decent-alternative-to-dynamic-cast'],
      dtype=object)]
200
200
Error url:https://stackoverhflow.com/questions/9399900358/is-there-a-decent-alternative-to-dynamic-cast
Error url:h
Error url:h
Error url:h
Error url:h
Error url:h
Number of error: 6
Finish

Process finished with exit code 0
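
On the question of whether a second script is needed: one way to do it, sketched here under several assumptions, is a small driver script that keeps launching the crawl while an error file exists. Everything specific below is hypothetical: the spider name `amazon`, the `urls_file` spider argument, and the file names `errors.csv` and `retry.csv`. The sketch also assumes the spider writes a fresh `errors.csv` (with the `URL` header shown in the question) only when some requests failed, for example from its `closed()` method, which Scrapy calls when the spider finishes. Each crawl runs in its own process because a Scrapy crawl cannot simply be restarted inside the same Python process.

```python
import csv
import subprocess
from pathlib import Path


def read_urls(path):
    """Return the URLs listed under the one-column header in `path`."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    # Skip the header row ("URL") and any blank lines.
    return [row[0].strip() for row in rows[1:] if row and row[0].strip()]


def scrapy_crawl(retry_file):
    """Run one crawl in a fresh process.

    The spider name "amazon" and the "urls_file" argument are assumptions;
    replace them with whatever your project uses.
    """
    subprocess.run(
        ["scrapy", "crawl", "amazon", "-a", f"urls_file={retry_file}"],
        check=False,
    )


def rerun_failed(error_file, retry_file, run_crawl=scrapy_crawl, max_rounds=3):
    """Re-crawl failed URLs until none remain or max_rounds is reached."""
    error_file, retry_file = Path(error_file), Path(retry_file)
    rounds = 0
    while rounds < max_rounds and read_urls(error_file):
        # Move the error file aside so the next run starts with a clean slate
        # and only recreates it if something fails again.
        error_file.replace(retry_file)
        run_crawl(retry_file)
        rounds += 1
    return rounds
```

Calling `rerun_failed("errors.csv", "retry.csv")` after the first crawl would then retry at most three times. The `max_rounds` cap plays the same role as `max_error` in the answer: if Amazon keeps blocking the same URLs, the loop still ends.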
