I only need to scrape the table for the letter A. My code looks like this:
import scrapy


class ChallengeSpider(scrapy.Spider):
    name = "challenge"
    allowed_domains = ["laws.bahamas.gov.bs"]
    start_urls = ["http://laws.bahamas.gov.bs/cms/en/legislation/acts.html"]
The problem is that when I parse the page, HTML elements show up in the output. Here is my parse function:
    def parse(self, response):
        css_selector = ".hasTip"
        rows = response.css(css_selector)
        for row in rows:
            title = row.css(".hasTip").get()
            source_url = row.css(".hasTip").get()
            date = row.css(".hasTip").get()
            yield {
                "title": title,
                "source_url": source_url,
                "date": date,
            }
The output is:
[
{"title": "<div id=\"alphabet\" class=\"hasTip\" title=\"Alphabetical Selection\" rel=\"\n\t\t Click on one of the alphabetical buttons to select all Acts commencing with that letter. The selection will 'stick' even if you navigate to another page.\">\n <input type=\"submit\" id=\"submitX\" name=\"submit4\" class=\"btn btn-primary\" value=\"A\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"B\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"C\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"D\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"E\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"F\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"G\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"H\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"I\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"J\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"K\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"L\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"M\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"N\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"O\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"P\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"Q\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"R\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"S\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"T\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"U\"><input type=\"submit\" 
id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"V\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"W\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"X\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"Y\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"Z\"> <input type=\"hidden\" name=\"pointintime\" value=\"2023-07-26 00:00:00\">\n </div>", "source_url": "<div id=\"alphabet\" class=\"hasTip\" title=\"Alphabetical Selection\" rel=\"\n\t\t Click on one of the alphabetical buttons to select all Acts commencing with that letter. The selection will 'stick' even if you navigate to another page.\">\n <input type=\"submit\" id=\"submitX\" name=\"submit4\" class=\"btn btn-primary\" value=\"A\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"B\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"C\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"D\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"E\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"F\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"G\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"H\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"I\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"J\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"K\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"L\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"M\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"N\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"O\"><input type=\"submit\" id=\"submit4\" 
name=\"submit4\" class=\"btn\" value=\"P\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"Q\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"R\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"S\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"T\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"U\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"V\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"W\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"X\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"Y\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"Z\"> <input type=\"hidden\" name=\"pointintime\" value=\"2023-07-26 00:00:00\">\n </div>", "date": "<div id=\"alphabet\" class=\"hasTip\" title=\"Alphabetical Selection\" rel=\"\n\t\t Click on one of the alphabetical buttons to select all Acts commencing with that letter. 
The selection will 'stick' even if you navigate to another page.\">\n <input type=\"submit\" id=\"submitX\" name=\"submit4\" class=\"btn btn-primary\" value=\"A\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"B\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"C\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"D\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"E\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"F\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"G\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"H\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"I\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"J\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"K\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"L\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"M\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"N\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"O\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"P\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"Q\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"R\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"S\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"T\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"U\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"V\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"W\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" 
class=\"btn\" value=\"X\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"Y\"><input type=\"submit\" id=\"submit4\" name=\"submit4\" class=\"btn\" value=\"Z\"> <input type=\"hidden\" name=\"pointintime\" value=\"2023-07-26 00:00:00\">\n </div>"},
{"title": "<td class=\"hasTip minColumn hidden-phone\" title=\"Notes Relating to this Statute\" rel=\"\n
]
What I need to do is prepend http://laws.bahamas.gov.bs to the URLs of the PDF files and clean up the data I scraped. What else do I need to do to get what I need?
1 Answer
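It looks like your CSS selector matches more than you intended. .hasTip is a class that is present on every cell of the table, and, as the first item in your output shows, also on the alphabet <div> above it. Calling .get() on such a match returns the whole element as HTML, and because the same selector is used for all three fields, they all receive the same value. I think you can do something like this to get all the rows of interest: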
Then, while iterating over each row, you can get the information you need like this:
Hope this helps!