html I need to scrape a web page with Scrapy and Python, but I can't resolve the path to the content I want to extract

Posted by inn6fuwd on 2022-12-09 in Python


I am trying to get data from a job posting page using Python in Jupyter. The page is this: computrabajo
Following a guide, I have managed to obtain the title, company and rating, but when I try to obtain the description of the posting, the following appears:

I think the reason is that I am not writing the path (the selector, or whatever it is called) correctly in the following code (DESCRIPTION_SELECTOR and extract_first()):

def parse(self, response):
    SET_SELECTOR = '.box_border'
    for brickset in response.css(SET_SELECTOR):
        NAME_SELECTOR = 'h1 ::text'
        EMPRESA_SELECTOR = './/p[text()]/a/text()'
        CALIFICACIÓN_SELECTOR = './/p[text()]/span/text()'
        DESCRIPTION_SELECTOR = './/p[text()]/text()'
        yield {
            'name': brickset.css(NAME_SELECTOR).extract_first(),
            'empresa': brickset.xpath(EMPRESA_SELECTOR).extract_first(),
            'calificacion': brickset.xpath(CALIFICACIÓN_SELECTOR).extract_first(),
            'descripcion': brickset.xpath(DESCRIPTION_SELECTOR).extract_first()
        }

This is what I want to get. If I use extract() it extracts everything, but at least I know that extraction is possible.
If it is not too much trouble, it would also help me a lot if someone knew how to save the records I get to a CSV file, as can be done with Beautiful Soup.


Source: https://stackoverflow.com/questions/70206137/i-need-to-scrape-a-web-with-scrapy-and-python-but-i-cant-resolve-the-address-to

1 Answer


1u4esq0p1#

There are multiple <p> tags under your SET_SELECTOR.
Try a more specific XPath selector, such as:

.//p[@class='fc_aux t_word_wrap mb10 hide_m']/text()

2022-12-09
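
On the CSV part of the question: Scrapy's built-in feed exports can write the yielded items directly, for example by running the spider with `-o jobs.csv` (as in `scrapy runspider spider.py -o jobs.csv`, where both file names are placeholders). If you would rather write the file yourself, as you would in a Beautiful Soup script, a minimal sketch with the standard `csv` module might look like this. The helper name `save_items_to_csv` and the sample record are hypothetical; the field names match the dict yielded in the question.

```python
import csv

FIELDS = ["name", "empresa", "calificacion", "descripcion"]


def save_items_to_csv(items, path, fieldnames=FIELDS):
    """Write an iterable of dicts (one per scraped record) to a CSV file."""
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(items)


# Invented sample record with the same keys as the spider's yielded dict.
sample_items = [
    {
        "name": "Data Analyst",
        "empresa": "Acme",
        "calificacion": "4,5",
        "descripcion": "Build reports. Remote position.",
    }
]

save_items_to_csv(sample_items, "jobs.csv")
```

`csv.DictWriter` quotes values that contain commas or line breaks, so multi-line descriptions stay in a single cell.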
