I am trying to scrape data from a job posting page using Python in Jupyter. The page is this: computrabajo
Following a guide, I managed to get the title, company and rating, but when I try to get the description of the posting, the following appears:
I think the reason is that I am not writing the path (or whatever it is called) correctly in the following code (see DESCRIPTION_SELECTOR and extract_first()):
def parse(self, response):
    SET_SELECTOR = '.box_border'
    for brickset in response.css(SET_SELECTOR):
        NAME_SELECTOR = 'h1 ::text'
        EMPRESA_SELECTOR = './/p[text()]/a/text()'
        CALIFICACIÓN_SELECTOR = './/p[text()]/span/text()'
        DESCRIPTION_SELECTOR = './/p[text()]/text()'
        yield {
            'name': brickset.css(NAME_SELECTOR).extract_first(),
            'empresa': brickset.xpath(EMPRESA_SELECTOR).extract_first(),
            'calificacion': brickset.xpath(CALIFICACIÓN_SELECTOR).extract_first(),
            'descripcion': brickset.xpath(DESCRIPTION_SELECTOR).extract_first()
        }
This is what I want to get. If I use extract() instead, it extracts everything, but at least that shows the text can be extracted.
If it is not too much trouble, it would also help me a lot if someone could explain how to save the records I get to a CSV file, as can be done with Beautiful Soup.
1 answer
There are multiple <p> tags under your SET_SELECTOR. Try using a more specific xpath selector, such as:
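As a minimal sketch of that idea: the markup below is hypothetical (the class names `empresa` and `descripcion` are invented for illustration, since the real page structure is not shown in the question), and the standard library's ElementTree is used as a stand-in for Scrapy's selectors so the example runs on its own. It shows why a generic `<p>` path picks up the wrong paragraph and how an attribute predicate narrows it down.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup standing in for one '.box_border' block.
# The class names are assumptions, not taken from the real page.
HTML = """
<div class="box_border">
  <h1>Analista de datos</h1>
  <p class="empresa"><a>Acme S.A.</a> <span>4.2</span></p>
  <p class="descripcion">Buscamos analista con experiencia en Python.</p>
</div>
"""

box = ET.fromstring(HTML)

# Generic path: matches every <p> in the block, so the first hit is the
# company paragraph, which has no leading text of its own.
all_paragraphs = box.findall(".//p")
first_text = (all_paragraphs[0].text or "").strip()

# More specific path: filter on an attribute so only the description
# paragraph matches (the attribute value is an assumption).
description = box.find(".//p[@class='descripcion']").text.strip()

print(len(all_paragraphs))
print(repr(first_text))
print(description)
```

In the spider itself, the same idea would look something like `brickset.xpath(".//p[@class='descripcion']/text()").extract_first()`, with the predicate adjusted to whatever actually distinguishes the description paragraph on the page. For the CSV part of the question, Scrapy's feed exports can write the yielded items to a file directly, for example by passing `-o` with a `.csv` filename when the spider is run from the command line.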