csv: XML file returns a NoneType traceback error when parsed

Posted by hpxqektj 12 months ago in Other

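I have been trying to parse an XML file with ElementTree.parse, and I get a NoneType traceback error. I know there are many questions about this error, but the XML file I am trying to parse looks different from the ones in the questions I have seen on these platforms. Not that I have any idea what a standard XML file should look like, but I suspect my file may be the problem.
The file was obtained from www.example.com (archive.org); it contains data from Stack Exchange. Its contents are included here: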

<?xml version="1.0" encoding="utf-8"?>
<users>
  <row Id="-1" Reputation="1" CreationDate="2015-03-16T22:02:56.000" DisplayName="Community" LastAccessDate="2015-03-16T22:02:56.000" Location="on the server farm" AboutMe="&lt;p&gt;Hi, I'm not really a person.&lt;/p&gt;&#xD;&#xA;&lt;p&gt;I'm a background process that helps keep this site clean!&lt;/p&gt;&#xD;&#xA;&lt;p&gt;I do things like&lt;/p&gt;&#xD;&#xA;&lt;ul&gt;&#xD;&#xA;&lt;li&gt;Randomly poke old unanswered questions every hour so they get some attention&lt;/li&gt;&#xD;&#xA;&lt;li&gt;Own community questions and answers so nobody gets unnecessary reputation from them&lt;/li&gt;&#xD;&#xA;&lt;li&gt;Own downvotes on spam/evil posts that get permanently deleted&lt;/li&gt;&#xD;&#xA;&lt;li&gt;Own suggested edits from anonymous users&lt;/li&gt;&#xD;&#xA;&lt;li&gt;&lt;a href=&quot;http://meta.stackoverflow.com/a/92006&quot;&gt;Remove abandoned questions&lt;/a&gt;&lt;/li&gt;&#xD;&#xA;&lt;/ul&gt;" Views="22" UpVotes="74" DownVotes="991" AccountId="-1" />
  <row Id="1" Reputation="101" CreationDate="2015-03-17T14:49:42.463" DisplayName="Adam Lear" LastAccessDate="2022-09-20T19:44:00.090" Location="New York, NY" AboutMe="&#xA;&lt;p&gt;Developer at Stack Overflow focusing on public Q&amp;amp;A. Russian Canadian working in the American idiom.&lt;/p&gt;&#xA;&lt;p&gt;Once upon a time:&lt;/p&gt;&#xA;&lt;ul&gt;&#xA;&lt;li&gt;community manager at Stack Overflow&lt;/li&gt;&#xA;&lt;li&gt;elected moderator on Stack Overflow and Software Engineering&lt;/li&gt;&#xA;&lt;li&gt;desktop software developer ¯\_(ツ)_/¯&lt;/li&gt;&#xA;&lt;/ul&gt;&#xA;&lt;p&gt;Email me a link to your favorite Wikipedia article: &lt;code&gt;[email protected]&lt;/code&gt;.&lt;/p&gt;&#xA;" Views="76" UpVotes="0" DownVotes="0" AccountId="37099" />
  <row Id="3" Reputation="4186" CreationDate="2015-03-17T14:59:26.040" DisplayName="dfife" LastAccessDate="2022-03-01T21:34:56.797" WebsiteUrl="http://www.dustinfife.net" AboutMe="" Views="70" UpVotes="54" DownVotes="0" AccountId="136853" />
  <row Id="4" Reputation="1" CreationDate="2015-03-17T14:59:37.893" DisplayName="Conor Semler" LastAccessDate="2015-03-17T15:14:42.150" Views="3" UpVotes="0" DownVotes="0" AccountId="5091225" />
  <row Id="5" Reputation="443" CreationDate="2015-03-17T14:59:56.080" DisplayName="Niall C." LastAccessDate="2023-04-01T03:09:46.917" WebsiteUrl="http://diy.stackexchange.com/users/22" Location="Portland, OR" AboutMe="&lt;p&gt;I'm here because when I'm working on my house, I think of things that I'd like to make.&lt;/p&gt;&#xA;" Views="2" UpVotes="28" DownVotes="2" AccountId="10331" />
  <row Id="6" Reputation="525" CreationDate="2015-03-17T15:02:41.573" DisplayName="CoAstroGeek" LastAccessDate="2022-11-16T20:19:09.717" AboutMe="&lt;p&gt;Working in the space business in Colorado Springs for about 15 years with a focus on astrodynamics, orbit design, collision avoidance analysis and high performance computing.&lt;/p&gt;&#xA;" Views="3" UpVotes="46" DownVotes="1" AccountId="3341901" />
  <row Id="7" Reputation="101" CreationDate="2015-03-17T15:06:07.907" DisplayName="Shog9" LastAccessDate="2022-08-23T21:09:17.643" WebsiteUrl="http://shog9.com" Location="Frontier, WA, USA" AboutMe="&lt;p&gt;Well, fancy seeing you here!&lt;/p&gt;&#xA;&lt;p&gt;I work for &lt;a href=&quot;https://www.enterprisedb.com/&quot; rel=&quot;nofollow noreferrer&quot;&gt;EDB&lt;/a&gt;, assisting a bunch of really skilled people share their PostgreSQL expertise and experience with the world.&lt;/p&gt;&#xA;&lt;p&gt;Before that, I worked here - at Stack Overflow / Stack Exchange. Here, my duties &lt;em&gt;also&lt;/em&gt; involved helping a bunch of really skilled people share their knowledge. So you might encounter some posts from me which provide guidance and advice for the folks using this network of Q&amp;amp;A sites.&lt;/p&gt;&#xA;&lt;p&gt;I tend to write as though I know what I'm writing about... And sometimes I do... But, you should always use your own judgement: question everything, read the links to supporting materials, and draw your own conclusions. I'm usually happy to discuss anything I've written, so don't hesitate to raise concerns or point out when something is unclear!&lt;/p&gt;&#xA;&lt;p&gt;&lt;sup&gt;&lt;sub&gt;&lt;strong&gt;Whatsoever thy hand findeth to do, do it with thy might; for there is no work, nor device, nor knowledge, nor wisdom, in the grave, whither thou goest.&lt;/strong&gt;&lt;/sub&gt;&lt;/sup&gt;&lt;/p&gt;&#xA;" Views="0" UpVotes="13" DownVotes="0" AccountId="620" />
  <row Id="8" Reputation="101" CreationDate="2015-03-17T15:06:48.457" DisplayName="Gabe" LastAccessDate="2015-03-19T02:03:09.250" Views="0" UpVotes="1" DownVotes="0" AccountId="5960" />
  <row Id="9" Reputation="101" CreationDate="2015-03-17T15:07:35.310" DisplayName="mmmmmpie" LastAccessDate="2015-03-30T17:57:49.187" WebsiteUrl="http://[email protected]" Location="West Virginia" AboutMe="&lt;p&gt;Oracle DBA, Sys Admin, SQL Dev, and some radios and stuff.&lt;/p&gt;&#xA;" Views="0" UpVotes="0" DownVotes="0" AccountId="5034898" />
  <row Id="10" Reputation="3591" CreationDate="2015-03-17T15:07:41.150" DisplayName="drs" LastAccessDate="2022-01-12T20:43:49.603" Location="Rochester, NY, USA" AboutMe="&lt;p&gt;Engage in &lt;a href=&quot;http://woodworking.stackexchange.com&quot;&gt;Woodworking&lt;/a&gt;&lt;/p&gt;&#xA;" Views="57" UpVotes="2710" DownVotes="2" AccountId="1597949" />
  <row Id="11" Reputation="4798" CreationDate="2015-03-17T15:07:41.540" DisplayName="Steven" LastAccessDate="2022-08-17T13:54:36.727" WebsiteUrl="http://www.twitter.com/sberkovitz" Location="Toronto, Canada" Views="49" UpVotes="497" DownVotes="15" AccountId="79558" />
  <row Id="12" Reputation="101" CreationDate="2015-03-17T15:07:54.307" DisplayName="Mike" LastAccessDate="2015-03-17T15:07:54.307" Location="Earth, TX" Views="0" UpVotes="0" DownVotes="0" AccountId="154660" />
  <row Id="13" Reputation="631" CreationDate="2015-03-17T15:09:10.383" DisplayName="rgmrtn" LastAccessDate="2022-02-23T14:45:54.897" WebsiteUrl="" Location="62º 28'N, 114º 22' W" AboutMe="&lt;p&gt;sans tache; im certifiable&lt;/p&gt;&#xA;" Views="4" UpVotes="10" DownVotes="0" AccountId="3872225" />
  <row Id="14" Reputation="835" CreationDate="2015-03-17T15:09:14.263" DisplayName="Joe" LastAccessDate="2022-07-27T16:58:34.473" AboutMe="&lt;p&gt;SAS Programmer/Developer/Analyst/buzzword, when I'm not parenting a pair of rambunctious little boys.&lt;/p&gt;&#xA;&#xA;&lt;p&gt;#SOreadytohelp&lt;/p&gt;&#xA;" Views="13" UpVotes="31" DownVotes="0" AccountId="1780022" />
  <row Id="15" Reputation="101" CreationDate="2015-03-17T15:09:19.203" DisplayName="Chris Farmer" LastAccessDate="2016-01-05T13:13:25.113" WebsiteUrl="http://cfarmerga.myopenid.com/" Location="Nashville, TN" AboutMe="Always happy, never blue." Views="0" UpVotes="3" DownVotes="0" AccountId="323" />
  <row Id="16" Reputation="173" CreationDate="2015-03-17T15:09:25.003" DisplayName="Markie" LastAccessDate="2017-02-14T14:31:50.920" Location="UK" AboutMe="&lt;p&gt;Website developer&lt;/p&gt;&#xA;" Views="2" UpVotes="3" DownVotes="0" AccountId="2321310" />
  <row Id="17" Reputation="101" CreationDate="2015-03-17T15:09:50.447" DisplayName="Lumi" LastAccessDate="2015-04-11T01:46:50.457" Views="0" UpVotes="0" DownVotes="0" AccountId="3780251" />
</users>

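I tried to parse the XML file above so that I could load it into a CSV, and I got a NoneType error traceback. I know many questions cover the same error, but I think the XML files in those questions differ from mine. I am not sure what an XML file is supposed to look like. Here is the code: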

# Importing the required libraries
import xml.etree.ElementTree as Xet
import pandas as pd
  
cols = ["row Id", "Reputation", "CreationDate", "DisplayName", "LastAccessDate", "WebsiteUrl", "Location", "AboutMe", "Views", "UpVotes", "DownVotes", "AccountId"]
rows = []
  
# Parsing the XML file
xmlparse = Xet.parse('Users.xml')
root = xmlparse.getroot()

for r in root.findall("users"):
    
    print(rows)
        
    rowId = i.find("row Id").text
    print('Row Id: ' + str(rowId))
    reputation = i.find("Reputation").text
    creationDate = i.find("Creationdate").text
    displayname = i.find("Displayname").text
    lastAccessDate = i.find("LastAccessDate").text
    websiteUrl = i.find("WebsiteUel").text
    location = i.find("Location").text
    aboutMe = i.find("AboutMe").text
    views = i.find("Views").text
    upVotes = i.find("UpVotes").text
    downVotes = i.find("DownVotes").text
    accountId = i.find("AccountId").text
            
    rows.append({"rowId": rowId,
                "Reputation": reputation,
                "Creationdate": creationdate,
                "Displayname": displayname,
                "LastAccessDate": lastAccessDate,
                "WebsiteUrl": websiteUrl,
                "Location": location,
                "AboutMe": aboutMe,
                "Views": views,
                "UpVotes": upVotes,
                "DownVotes": downVotes,
                "AccountId": accountId
                })

            
    
df = pd.DataFrame(rows, columns=cols)
  
# Writing dataframe to csv
df.to_csv('users1.csv')
mnemlml8 1#

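Are you sure you need a CSV as output rather than a spreadsheet? The AboutMe field holds HTML content.
In any case, since you need all the fields, there is no need to hardcode them: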

import csv
import xml.etree.ElementTree as ET
from itertools import chain

root = ET.parse("Users.xml").getroot()

data = [dict(r.items()) for r in root.findall(".//row")]  # or simply root
    
with open("Users.csv", mode="w", newline="", encoding="utf-8") as f:
    
    flds = dict.fromkeys(chain.from_iterable([r.keys() for r in root])).keys()
    wr = csv.DictWriter(f, delimiter=",", fieldnames=flds) # adjust if needed

    wr.writeheader()
    wr.writerows(data)

Or, since you are already using pandas, you can simply call read_xml and then write a CSV (or a spreadsheet):

pd.read_xml("Users.xml").to_csv("Users.csv") # or to_excel("Users.xlsx")
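
Two details of the csv approach are worth checking on a small input: not every row carries every attribute (WebsiteUrl, Location and AboutMe are missing on some rows), and AboutMe contains HTML with embedded newlines and commas. The following is a minimal sketch using only the standard library. The two-row `SAMPLE` string and the `rows_to_csv` helper are made up for illustration and are not part of the original data or answer.

```python
import csv
import io
import xml.etree.ElementTree as ET
from itertools import chain

# Hypothetical two-row sample in the same shape as Users.xml.
SAMPLE = """<users>
  <row Id="4" DisplayName="Conor Semler" Views="3" />
  <row Id="5" DisplayName="Niall C." Location="Portland, OR" AboutMe="&lt;p&gt;Line one&lt;/p&gt;&#xA;&lt;p&gt;Line two&lt;/p&gt;" Views="2" />
</users>"""


def rows_to_csv(xml_text: str) -> str:
    """Convert every <row> element's attributes into CSV text."""
    root = ET.fromstring(xml_text)
    records = [dict(row.attrib) for row in root.iter("row")]

    # Union of all attribute names, in first-seen order.
    fieldnames = list(dict.fromkeys(chain.from_iterable(records)))

    buffer = io.StringIO(newline="")
    # restval fills in the columns that a given row does not have.
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = rows_to_csv(SAMPLE)

# Read the CSV back to confirm that quoting preserved the HTML field.
parsed = list(csv.DictReader(io.StringIO(csv_text, newline="")))
print(parsed[0]["Location"] == "")   # missing attribute became an empty cell
print(parsed[1]["AboutMe"])          # multi-line HTML survives the round trip
```

The csv module quotes fields that contain commas or line breaks, so the HTML stays in a single cell when the file is read back with a CSV-aware reader. Tools that split naively on commas or newlines will still misread it, which is one reason a spreadsheet format may be more convenient.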
ctrmrzij 2#

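This happens because you are using find, while the values you are looking for are attributes on each element. Attributes are read through the element's dict-like get API: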

# Iterate over the root directly
for child in root:
    print(rows)

    # get the attributes like this
    rowId = child.get('Id')

See the relevant documentation for reference.
element.find expects child nodes under the element, which is not the case here.
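
To make the difference concrete, here is a small sketch that can be run without the real file. The inline `SAMPLE` string and the `COLUMNS` list are assumptions chosen for illustration. It shows that find returns None for an attribute name (calling `.text` on that None is what produces the NoneType error), that `root.findall("users")` matches nothing because the root element already is `<users>`, and that get with a default handles rows where an attribute is absent.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the same shape as Users.xml.
SAMPLE = """<users>
  <row Id="3" Reputation="4186" DisplayName="dfife" WebsiteUrl="http://www.dustinfife.net" />
  <row Id="4" Reputation="1" DisplayName="Conor Semler" />
</users>"""

root = ET.fromstring(SAMPLE)
first = root[0]

# find() looks for a child *element* named "Reputation". <row> has no
# children, so this is None, and None.text raises AttributeError.
missing_child = first.find("Reputation")

# The root is already <users>, so searching for "users" below it finds nothing.
no_matches = root.findall("users")

# Attributes are read with get(); the second argument is the default
# returned when a row does not carry that attribute.
COLUMNS = ["Id", "Reputation", "DisplayName", "WebsiteUrl"]
rows = [{col: row.get(col, "") for col in COLUMNS} for row in root.findall("row")]

print(missing_child)   # None
print(no_matches)      # []
for record in rows:
    print(record)
```

The resulting list of dictionaries can be passed to `pd.DataFrame(rows, columns=COLUMNS)` as in the question, provided the dictionary keys match the column names exactly.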

vwkv1x7d 3#

Try a PowerShell script:

using assembly System.Xml.Linq

$inputFilename = "c:\temp\test.xml"
$outputFilename = "c:\temp\test.csv"

$doc = [System.Xml.Linq.XDocument]::Load($inputFilename)

$rows = $doc.Descendants("row")

$table =  [System.Collections.ArrayList]@()
foreach($row in $rows)
{
   $newRow = New-Object -TypeName pscustomobject
   foreach($att in $row.Attributes())
   {
      Write-Host $att.Name.LocalName $att.Value
      $newRow | Add-Member -NotePropertyName $att.Name.LocalName -NotePropertyValue $att.Value
      Write-Host "here : " $newRow 
   }
$newRow
   $table.Add($newRow) | out-null
}
$table | Export-CSV -path $outputFilename -NoTypeInformation
