How to scrape multiple tables using Scrapy?

Posted by raogr8fs on 2022-11-09 in Other

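Hello, I am trying to scrape multiple tables from this website, https://hs.e-to-china.com, and I want to loop through the tables and get the required information. The problem is that it only scrapes the first table and repeats it as many times as there are tables on the page. My question is how to move from one table to the next. Here is the code I am using: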

tables = response.xpath('//*[@class="tax-table"]').extract()
for table in tables:
    hs_code = response.xpath('//*[@class="hs-code"]//code/text()').extract_first()
    Unit = response.xpath('//*[@class="tax-table"]//tr[1]//td[1]/text()').extract_first()
    Gen_General_Tariff_Rate = response.xpath('//*[@class="tax-table"]//tr[1]//td[2]/text()').extract_first()
    MFN_Most_favored_Nation = response.xpath('//*[@class="tax-table"]//tr[1]//td[3]/text()').extract_first()

    TaxVAT_Value_added_Tax = response.xpath('//*[@class="tax-table"]//tr[2]//td[1]/text()').extract_first()
    Additional_Tariff_on_US_Imports = response.xpath('//*[@class="tax-table"]//tr[2]//td[2]/text()').extract_first()
    Export_Tax_Rebate = response.xpath('//*[@class="tax-table"]//tr[2]//td[3]/text()').extract_first()

    Regulations_and_Restrictions = response.xpath('//*[@class="tax-table"]//tr[3]//td[1]/text()').extract_first()
    Inspection_and_Quarantine = response.xpath('//*[@class="tax-table"]//tr[3]//td[2]/text()').extract_first()
    Consumption_Tax = response.xpath('//*[@class="tax-table"]//tr[3]//td[3]/text()').extract_first()

    FTA_Free_Trade_Agreement_Tax = response.xpath('//*[@class="tax-table"]//tr[4]//td[1]/text()').extract_first()
    CCC_Certificate = response.xpath('//*[@class="tax-table"]//tr[4]//td[2]/text()').extract_first()
    In_Quota_on_Imported_Goods = response.xpath('//*[@class="tax-table"]//tr[4]//td[3]/text()').extract_first()

    IT_Origin_Country_Tariff = response.xpath('//*[@class="tax-table"]//tr[5]//td[1]/text()').extract_first()
    Anti_Dumping_Anti_Subsidy = response.xpath('//*[@class="tax-table"]//tr[5]//td[2]/text()').extract_first()

scrapy

Source: https://stackoverflow.com/questions/72212719/how-to-scrape-multiple-tables-using-scrapy

1 Answer

htzpubme1#

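Be careful with XPath expressions that start with //.
That tells the engine to search from the document root. Inside a loop, start with .// so the expression is evaluated relative to the current context.
So instead of: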

hs_code = response.xpath('//*[@class="hs-code"]//code/text()').extract_first()

use:

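# Call .xpath() on the loop variable, not on response; this requires `tables`
# to hold selectors, so drop the .extract() when building it.
hs_code = table.xpath('.//*[@class="hs-code"]//code/text()').extract_first()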

2022-11-09
