I have URLs (for web scraping) and municipality names stored in this list: muni = [("https://openbilanci.it/armonizzati/bilanci/filettino-comune-fr/entrate/dettaglio?year=2021&type=preventivo", "filettino"), ("https://openbilanci.it/armonizzati/bilanci/partanna-comune-tp/entrate/dettaglio?year=2021&type=preventivo","partanna"), ("https://openbilanci.it/armonizzati/bilanci/fragneto-labate-comune-bn/entrate/dettaglio?year=2021&type=preventivo", "fragneto-labate") ]
I am trying to create a separate dataset for each municipality. For example, the data scraped from the first URL should be saved as filettinodak.csv.
This is the code I am currently using:
import re
import json
import requests
import pandas as pd
import os
import random

os.chdir(r"/Users/aartimalik/Dropbox/data")

muni = [("https://openbilanci.it/armonizzati/bilanci/filettino-comune-fr/entrate/dettaglio?year=2021&type=preventivo", "filettino"),
        ("https://openbilanci.it/armonizzati/bilanci/partanna-comune-tp/entrate/dettaglio?year=2021&type=preventivo","partanna"),
        ("https://openbilanci.it/armonizzati/bilanci/fragneto-labate-comune-bn/entrate/dettaglio?year=2021&type=preventivo", "fragneto-labate")
        ]

for m in muni[1]:
    URL = m
    r = requests.get(URL)
    p = re.compile("var bilancio_tree = (.*?);")
    data = p.search(r.text).group(1)
    data = json.loads(data)
    all_data = []
    for d in data:
        for v in d["values"]:
            for kk, vv in v.items():
                all_data.append([d["label"], "-", kk, vv.get("abs"), vv.get("pc")])
        for c in d["children"]:
            for v in c["values"]:
                for kk, vv in v.items():
                    all_data.append(
                        [d["label"], c["label"], kk, vv.get("abs"), vv.get("pc")]
                    )
    df = pd.DataFrame(all_data, columns=["label 1", "label 2", "year", "abs", "pc"])
    df.to_csv(muni[2]+"dak.csv", index=False)
The error I get is:
Traceback (most recent call last):
  File "<stdin>", line 19, in <module>
TypeError: can only concatenate tuple (not "str") to tuple
I think I am doing something wrong with the municipality indexing, muni[i]. Any suggestions? Thank you very much!
1 Answer
vaqhlq811#
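If you adjust your for loop slightly, it should solve your problem. The change below loops over every entry in the muni list. On each iteration, it unpacks the first value of the tuple into URL and the second value into label. With this change, the last line of your code can become: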
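As a minimal sketch of that change (not necessarily the answerer's exact code), the loop can unpack each tuple directly and build the file name from label. The helper csv_name and the list filenames are hypothetical names introduced here for illustration. The scraping body from the question is left as a comment so the sketch runs without network access.

```python
# The list from the question: (URL, municipality name) pairs.
muni = [
    ("https://openbilanci.it/armonizzati/bilanci/filettino-comune-fr/entrate/dettaglio?year=2021&type=preventivo", "filettino"),
    ("https://openbilanci.it/armonizzati/bilanci/partanna-comune-tp/entrate/dettaglio?year=2021&type=preventivo", "partanna"),
    ("https://openbilanci.it/armonizzati/bilanci/fragneto-labate-comune-bn/entrate/dettaglio?year=2021&type=preventivo", "fragneto-labate"),
]


def csv_name(label, suffix="dak.csv"):
    """Hypothetical helper: build the output file name for one municipality."""
    return label + suffix


filenames = []  # collected only so the result can be inspected

# Unpack each tuple into its two parts instead of indexing muni[1] / muni[2].
for URL, label in muni:
    # ... fetch URL with requests and build df exactly as in the question ...
    # The final line of the loop body would then be:
    # df.to_csv(csv_name(label), index=False)
    filenames.append(csv_name(label))

print(filenames)
```

The original error comes from muni[2] being a whole (URL, name) tuple, so muni[2]+"dak.csv" tries to concatenate a tuple with a string. Likewise, for m in muni[1] iterates over the two elements of a single tuple rather than over the list of municipalities.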