python-3.x: Getting text of children tags with CSS selector with Scrapy returns nothing

Posted by z0qdvdin on 2022-12-05 in Python

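Although this is a very common question, I have tried many different ways to recursively scrape all the text from the HTML below, and for some reason none of them worked: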

<span class="coupon__logo coupon__logo--for-shops">



      <span class="amount"><b>20</b>%</span>

      <span class="type">Cupom</span>


</span>

I tried:

p.css('span.coupon__logo coupon__logo--for-shops *::text').get()

p.css('span.amount ::text').get()

p.css('span.amount *::text').get()

And even XPath:

p.xpath('//span[@class="coupon__logo coupon__logo--for-shops"]//text()').get()
p.xpath('//span[@class="amount"]//text()').get()

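The best I got was p.css('span.amount *::text').getall(), but it pulls the text from every occurrence on the page, so I would need extra code to sort the pieces out. It would be much better to get only the text of the current item, especially because I am looping over many items and that approach breaks easily whenever the website changes.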

python-3.x

Source: https://stackoverflow.com/questions/74678565/getting-text-of-children-tags-with-css-selector-with-scrapy-returns-nothing

1 answer

yzuktlbb1#

You can get the text of specific children instead of getting all the text of all children of <span class="coupon__logo coupon__logo--for-shops">.
CSS:

scrapy shell file:///path/to/file.html

In [1]: ' '.join(response.css('span.coupon__logo.coupon__logo--for-shops span *::text').getall())
Out[1]: '20 % Cupom'

XPath:

scrapy shell file:///path/to/file.html

In [1]: ' '.join(response.xpath('//span[@class="coupon__logo coupon__logo--for-shops"]/span//text()').getall())
Out[1]: '20 % Cupom'

If you have several span tags and only need amount and type, you can use the following:
CSS:

scrapy shell file:///path/to/file.html

In [1]: ' '.join(response.css('span.coupon__logo.coupon__logo--for-shops span.amount *::text, span.coupon__logo.coupon__logo--for-shops span.type::text').getall())
Out[1]: '20 % Cupom'

XPath:

scrapy shell file:///path/to/file.html

In [1]: ' '.join(response.xpath('//span[@class="coupon__logo coupon__logo--for-shops"]/span[@class="amount" or @class="type"]//text()').getall())
Out[1]: '20 % Cupom'

Answered 2022-12-05
