scrapy: I want to store my URL in a variable named "url" so I can save the URL to an Excel/CSV file

4dbbbstv · published 2022-11-09 in Other

I want to store my URL in a variable named "url" so I can save the URL to an Excel/CSV file, but it gives me "UnboundLocalError: local variable 'url' referenced before assignment".
class NewsSpider(scrapy.Spider):
    name = "article"

    def start_requests(self):
        url = input("Enter the article url: ")
        yield scrapy.Request(url, callback=self.parse_dir_contents)

    def parse_dir_contents(self, response):
        url = url
        yield {
            'Category': Category,
            'Headlines': Headlines,
            'Author': Author,
            'Source': Source,
            'Publication Date': Published_Date,
            'Feature_Image': Feature_Image,
            'Skift Take': skift_take,
            'Article Content': Content
        }
        # =============== Data Store +++++++++++++++++++++
        Data = [[Category, Headlines, Author, Source, Published_Date, Feature_Image, Content, url]]
        try:
            df = pd.DataFrame(Data, columns=['Category', 'Headlines', 'Author', 'Source', 'Published_Date', 'Feature_Image', 'Content', 'URL'])
            print(df)
            with open('C:/Users/Public/pagedata.csv', 'a') as f:
                df.to_csv(f, header=False)
        except:
            df = pd.DataFrame(Data, columns=['Category', 'Headlines', 'Author', 'Source', 'Published_Date', 'Feature_Image', 'Content', 'URL'])
            print(df)
            df.to_csv('C:/Users/Public/pagedata.csv', mode='a')

ws51t4hk (answer 1)

You can just use response.url instead of url = url:

url = response.url

def parse_dir_contents(self, response):
    yield {
        'Category': Category,
        'Headlines': Headlines,
        'Author': Author,
        'Source': Source,
        'Publication Date': Published_Date,
        'Feature_Image': Feature_Image,
        'Skift Take': skift_take,
        'Article Content': Content,
        'url': response.url
    }
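To make the CSV-appending step concrete, here is a minimal sketch using only the standard library's csv module (the file path, field names, and placeholder URL are illustrative; inside the spider callback, response.url would supply the 'URL' value). Writing the header only on the first append avoids repeated header rows in the output file:

```python
import csv
import os

def append_row(path, row, fieldnames):
    """Append one scraped record to a CSV file, writing the header only once."""
    write_header = not os.path.exists(path)
    with open(path, "a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(row)

# Example usage with placeholder field values; in parse_dir_contents,
# row['URL'] would be response.url.
fieldnames = ['Category', 'Headlines', 'Author', 'Source',
              'Published_Date', 'Feature_Image', 'Content', 'URL']
row = {name: '' for name in fieldnames}
row['URL'] = 'https://example.com/article'  # stands in for response.url
append_row('pagedata.csv', row, fieldnames)
```

This sidesteps the try/except around pandas in the question entirely: appending with DictWriter is the same code path whether or not the file already exists.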
