How do I append XML data to empty lists to create a pandas DataFrame?

f4t66c6m  posted on 2023-01-28  in  Other

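I am trying to append XML data to empty lists in order to create a DataFrame. I can build all of the lists except three, because some tags have empty values. I have tried using the xpath function to get all the text from the tags I need.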

import pandas as pd
import requests
from lxml import objectify

URL = 'https://data.virginia.gov/api/views/xvir-sctz/rows.xml?accessType=DOWNLOAD'
response = requests.get(URL).content

root = objectify.fromstring(response)

Here are the empty lists I want to append to:

households_served = []
individuals_served = []
pounds_of_food_distributed = []
month = []

I tried building the lists like this, and it works:

pounds_of_food_distributed = root.xpath('//response/row/row/pounds_of_food_distributed/text()')
individuals_served = root.xpath('//response/row/row/individuals_served/text()')
households_served = root.xpath('//response/row/row/households_served/text()')
month = root.xpath('//response/row/row/month/text()')

But when I try to use pd.DataFrame with the following code, I get an error:

table = pd.DataFrame(
    {'Month': month,
     'House': households_served,
     'People': individuals_served,
     'Pounds' : pounds_of_food_distributed
    })

Any suggestions?

anhgbhbe1#

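Your problem is that some elements do not exist in every row, whereas month is always present, so the lists end up with different lengths.
One idea is to fill the missing data with 0, or with any other value you prefer.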

import requests
from lxml import objectify
import pandas as pd

URL = 'https://data.virginia.gov/api/views/xvir-sctz/rows.xml?accessType=DOWNLOAD'
response = requests.get(URL).content
root = objectify.fromstring(response)

households_served = []
individuals_served = []
pounds_of_food_distributed = []
month = []

for element in root.xpath('//row/row'):
    month.append(element["month"]) # month always exists
    individuals_served.append(element["individuals_served"] if hasattr(element, "individuals_served") else 0)
    households_served.append(element["households_served"] if hasattr(element, "households_served") else 0)
    pounds_of_food_distributed.append(element["pounds_of_food_distributed"] if hasattr(element, "pounds_of_food_distributed") else 0)

print(len(month))
print(len(individuals_served))
print(len(pounds_of_food_distributed))
print(len(households_served))

table = pd.DataFrame(
    {'Month': month,
     'House': households_served,
     'People': individuals_served,
     'Pounds' : pounds_of_food_distributed
    })

print(table)
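
To try the idea without downloading the feed, here is a minimal sketch on an inline sample. The sample XML and the names SAMPLE, FIELDS, per_tag_counts and records are made up for illustration, and it uses the standard library's xml.etree.ElementTree rather than lxml.objectify, so treat it as an approximation of the code above rather than a drop-in replacement. It shows why collecting each tag separately produces lists of different lengths (which is most likely what makes pd.DataFrame raise a ValueError), and how walking row by row keeps one entry per row.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample that mimics the response/row/row layout of the feed.
# The second inner row has no households_served or individuals_served.
SAMPLE = """
<response>
  <row>
    <row>
      <month>May</month>
      <households_served>270</households_served>
      <individuals_served>627</individuals_served>
      <pounds_of_food_distributed>67633</pounds_of_food_distributed>
    </row>
    <row>
      <month>October</month>
      <pounds_of_food_distributed>156644</pounds_of_food_distributed>
    </row>
  </row>
</response>
""".strip()

FIELDS = [
    "month",
    "households_served",
    "individuals_served",
    "pounds_of_food_distributed",
]

root = ET.fromstring(SAMPLE)
rows = root.findall("./row/row")

# Collecting each tag on its own skips rows that lack that tag,
# so the per-tag lists can have different lengths.
per_tag_counts = {f: len(root.findall(f"./row/row/{f}")) for f in FIELDS}

# Walking row by row keeps exactly one entry per row; findtext returns
# None when the child element is absent.
records = [{f: row.findtext(f) for f in FIELDS} for row in rows]

print(per_tag_counts)
print(records)
```

Passing records to pd.DataFrame(records) should then give one row per record, with None marking the gaps. The values stay strings unless you convert them, for example with pd.to_numeric.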

db2dz4w82#

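Another approach is to use pandas directly via pandas.read_xml, setting xpath to all row elements that are descendants of a row element and selecting the columns you need from the result. This also handles missing or empty elements in the XML structure: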

import pandas as pd

URL = 'https://data.virginia.gov/api/views/xvir-sctz/rows.xml?accessType=DOWNLOAD'

# Select the columns in the same order as the new names assigned below,
# so that House maps to households_served and People to individuals_served.
df = pd.read_xml(URL, xpath='row//row')[['month', 'households_served', 'individuals_served', 'pounds_of_food_distributed']]
df.columns = ['Month', 'House', 'People', 'Pounds']
print(df)

| | Month | House | People | Pounds |
|------|------|------|------|------|
| 0 | October | NaN | NaN | 156644 |
| 1 | April | NaN | NaN | 21602 |
| 2 | August | NaN | NaN | 51338 |
| 3 | May | 270 | 627 | 67633 |
| 4 | May | NaN | NaN | 54561 |
| ... | | | | |
| 4254 | August | 17 | 37 | 482661 |
| 4255 | August | 783 | 1974 | 29211 |
| 4256 | April | 259 | 485 | 16254.5 |
| 4257 | August | 8583 | 34986 | 561 |
| 4258 | June | 258 | 749 | 31560.7 |
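
To check how read_xml treats missing elements without hitting the network, here is a small sketch on an inline sample. The sample XML and the name SAMPLE are made up for illustration. It assumes parser="etree" so that lxml is not required, and it wraps the text in io.StringIO because recent pandas releases deprecate passing a literal XML string.

```python
import io

import pandas as pd

# Hypothetical sample that mimics the response/row/row layout of the feed.
# The second inner row has no households_served or individuals_served.
SAMPLE = """<response>
  <row>
    <row>
      <month>May</month>
      <households_served>270</households_served>
      <individuals_served>627</individuals_served>
      <pounds_of_food_distributed>67633</pounds_of_food_distributed>
    </row>
    <row>
      <month>October</month>
      <pounds_of_food_distributed>156644</pounds_of_food_distributed>
    </row>
  </row>
</response>"""

# Each inner <row> becomes one DataFrame row; child elements that are
# absent in a given row come back as NaN.
df = pd.read_xml(io.StringIO(SAMPLE), xpath="./row/row", parser="etree")

# Select and rename in matching order.
df = df[["month", "households_served", "individuals_served",
         "pounds_of_food_distributed"]]
df.columns = ["Month", "House", "People", "Pounds"]

print(df)
```

Because the gaps are NaN rather than 0, they stay distinguishable from real zero counts; df.fillna(0) can be applied afterwards if zeros are preferred.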
