Getting the full text of an anchor tag split by <wbr> in Scrapy

Posted by 5fjcxozz on 2023-04-06 in Other

How do I get all of the text inside the tag, i.e. "Digital Business Designer (m/w/d)", from a tag like this:

<a class="title">Digital Business Designer (m/<wbr>w/<wbr>d)</a>

I tried the code below, but it only returns "Digital Business Designer (m/":

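import re

async def parse(self, response):
    programs = response.css('#programslist')
    for program in programs.css('.title'):
        # .get() returns only the first matching text node: "Digital Business Designer (m/"
        title = program.css('::text').get()
        # Text nodes never contain the <wbr> tag itself, so this substitution changes nothing
        title = re.sub(r'<wbr>', '', title)
        yield {'title': title}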

Answer #1 (pjngdqdw)

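Combined with getall(), the XPath //text() step returns the text of the element and of all its descendants as a list. You can then use ''.join() to combine the pieces back into a single string.
For example: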

from scrapy.http import TextResponse

def parse(response):
    # //text() selects every text node under the <a>, including the ones after each <wbr>
    lst = response.xpath("//a[@class='title']//text()").getall()
    # Join the pieces back into a single string
    text = "".join(lst)
    print(text)

doc = """
<html>
    <body>
        <a class="title">Digital Business Designer (m/<wbr>w/<wbr>d)</a>
    </body>
</html>
""".encode("utf8")

response = TextResponse("https://example.com", body=doc, encoding="utf-8")
parse(response)

Output:

Digital Business Designer (m/w/d)
