I'm using CSS class selectors to help me with a spider. In the Scrapy shell, if I run the following command, I get output for all the elements I need:
scrapy shell "https://www.fcf.cat/acta/2022/futbol-11/cadet-primera-divisio/grup-2/1c/la-salle-bonanova-ce-a/1c/lhospitalet-centre-esports-b"
Following the advice I received, I modified the spider:
import scrapy


class ActaSpider(scrapy.Spider):
    name = 'acta_spider'
    start_urls = [
        'https://www.fcf.cat/acta/2022/futbol-11/cadet-primera-divisio/grup-2/1c/la-salle-bonanova-ce-a/1c/lhospitalet-centre-esports-b',
    ]

    def parse(self, response):
        print("[ PARSE START ]")
        # Season label; default='' avoids calling .replace() on None
        temporada = response.css(".print-acta-temp::text").get(default='')
        temporada = temporada.replace('TEMPORADA ', '')
        print(temporada)
        # Competition line, split into words
        acta_comp = response.css(".print-acta-comp::text").get(default='')
        acta_comp_llista = acta_comp.split(' ')
        print(acta_comp_llista)
        # One item per row, across every table with class "acta-table"
        for actaelements in response.css('table.acta-table tbody tr'):
            yield {
                'name': actaelements.css('a::text').get(),
                'link': actaelements.css('a::attr(href)').get(default='Link Error'),
            }
Now I need to build a JSON file from the information in the 12 tables that make up the page.
{
  "DadesPartit": {
    "Temporada": temporada,
    "Categoria": acta_comp_llista[1],
    "Divisio": acta_comp_llista[2],
    "Grup": acta_comp_llista[6],
    "Jornada": 28
  },
  "TitularsCasa": [
    {
      "Nom": "IGNACIO",
      "Cognom": "FERNÁNDEZ ARTOLA",
      "Link": "https://.."
    },
    {
      "Nom": "JAIME",
      "Cognom": "FERNÁNDEZ ARTOLA",
      "Link": "https://.."
    },
    {
      "Nom": "BRUNO",
      "Cognom": "FERRÉ CORREA",
      "Link": "https://.."
    }
  ],
  "SuplentsCasa": [
    {
      "Nom": " MARC",
      "Cognom": "GIMÉNEZ ABELLA",
      "Link": "https://.."
    }
  ],
  "CosTecnicCasa": [
    {
      "Nom": " JORDI",
      "Cognom": "LORENTE VILLENA",
      "Llicencia": "E"
    }
  ],
  "TargetesCasa": [
    {
      "Nom": "IGNACIO",
      "Cognom": "FERNÁNDEZ ARTOLA",
      "Tipus": "Groga",
      "Minut": 65
    }
  ],
  "Arbitres": [
    {
      "Nom": "ALEJANDRO",
      "Cognom": "ALVAREZ MOLINA",
      "Delegacio": "Barcelona1"
    }
  ],
  "Gols": [
    {
      "Nom": "NATXO",
      "Cognom": "MONTERO RAYA",
      "Minut": 5,
      "Tipus": "Gol de penal"
    }
  ],
  "Estadi": {
    "Nom": "CAMP DE FUTBOL COL·LEGI LA SALLE BONANOVA",
    "Direccio": "C/ DE SANT JOAN DE LA SALLE, 33, BARCELONA"
  },
  "TitularsFora": [
    {
      "Nom": "MARTI",
      "Cognom": "MOLINA MARTIMPE",
      "Link": "https://.."
    },
    {
      "Nom": " XAVIER",
      "Cognom": "MORA AMOR",
      "Link": "https://.."
    },
    {
      "Nom": " IVAN",
      "Cognom": "ARRANZ MORALES",
      "Link": "https://.."
    }
  ],
  "SuplentsFora": [
    {
      "Nom": "OLIVER",
      "Cognom": "ALCAZAR SANCHEZ",
      "Link": "https://.."
    }
  ],
  "CosTecnicFora": [
    {
      "Nom": "RAFAEL",
      "Cognom": "ESPIGARES MARTINEZ",
      "Llicencia": "D"
    }
  ],
  "TargetesFora": [
    {
      "Nom": "ORIOL",
      "Cognom": "ALCOBA LAGE",
      "Tipus": "Groga",
      "Minut": 34
    }
  ]
}
I'd like to know how to build it.
Thanks, Joan
1 Answer
It is much simpler with requests and pandas. You can do the following:
You only need to index into table_fb to get each table. Here is another option:
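As a rough illustration of the final assembly step (a sketch, not the answerer's code): however the table rows are extracted, the target JSON can be built as a plain nested dictionary and serialized with the standard json module. The helper names split_name, players and build_acta are hypothetical, the sample rows and the season value are placeholders, and the "COGNOMS, NOM" name format is an assumption about how the page lists players.

```python
import json


def split_name(full_name):
    """Split 'COGNOMS, NOM' into (nom, cognom). The format is an assumption."""
    cognom, _, nom = full_name.partition(",")
    # strip() removes the space left after the comma
    return nom.strip(), cognom.strip()


def players(rows):
    """Turn (full_name, link) pairs into the player dicts of the target JSON."""
    result = []
    for full_name, link in rows:
        nom, cognom = split_name(full_name)
        result.append({"Nom": nom, "Cognom": cognom, "Link": link})
    return result


def build_acta(temporada, titulars_casa, suplents_casa):
    """Assemble part of the structure; the other sections follow the same pattern."""
    return {
        "DadesPartit": {"Temporada": temporada},
        "TitularsCasa": players(titulars_casa),
        "SuplentsCasa": players(suplents_casa),
    }


# Hypothetical rows standing in for whatever the table extraction returns.
acta = build_acta(
    "2021-2022",  # placeholder season value
    [("FERNÁNDEZ ARTOLA, IGNACIO", "https://..")],
    [("GIMÉNEZ ABELLA, MARC", "https://..")],
)

# ensure_ascii=False keeps accented characters readable in the output.
acta_json = json.dumps(acta, ensure_ascii=False, indent=2)
print(acta_json)
```

Stripping each name part also avoids the leading spaces visible in values such as " MARC" in the question's example.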