I need to scrape a web page with Scrapy and Python, but I can't work out the selector for the content I want to extract

Posted by inn6fuwd on 2022-12-09 in Python

I am trying to scrape data from a job posting page using Python in Jupyter. The page is this: computrabajo
I have managed to get the title, company and rating by following a guide, but when I try to get the description of the posting, the following appears:

I think the reason is that I am not writing the selector correctly in the following code (DESCRIPTION_SELECTOR and extract_first()):

def parse(self, response):
    SET_SELECTOR = '.box_border'
    for brickset in response.css(SET_SELECTOR):
        NAME_SELECTOR = 'h1 ::text'
        EMPRESA_SELECTOR = './/p[text()]/a/text()'
        CALIFICACIÓN_SELECTOR = './/p[text()]/span/text()'
        DESCRIPTION_SELECTOR = './/p[text()]/text()'
        yield {
            'name': brickset.css(NAME_SELECTOR).extract_first(),
            'empresa': brickset.xpath(EMPRESA_SELECTOR).extract_first(),
            'calificacion': brickset.xpath(CALIFICACIÓN_SELECTOR).extract_first(),
            'descripcion': brickset.xpath(DESCRIPTION_SELECTOR).extract_first()
        }

This is what I want to get. If I use extract() it extracts everything, but at least that shows the text can be extracted.
Also, if anyone knows how to save the records I get to a CSV file, as can be done with Beautiful Soup, it would help me a lot.

Answer #1, by 1u4esq0p

There are multiple <p> tags under your SET_SELECTOR.
Try a more specific XPath selector, such as:

.//p[@class='fc_aux t_word_wrap mb10 hide_m']/text()
