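I'm using Python's SAX parser to parse the XML files in a folder and writing the output to a CSV with pandas, but the CSV only ever contains the data from the last file.
I'm new to Python, and this is my first attempt at SAX parsing.
File reading: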
for dirpath, dirs, files in os.walk(fp1):
    for filename in files:
        print(files)
        fname = os.path.join(dirpath, filename)
        if fname.endswith('.xml'):
            print(fname)
            #for count in files:
            parser.parse(fname)

def characters(self, content):
    rows = []
    cols = ["ReporterCite", "DecisionDate", "CaseName", "FileNum", "CourtLocation", "CourtName", "CourtAbbrv", "Judge", "CaseLength", "CourtCite", "ParallelCite", "CitedCount", "UCN"]
    #ReporteCite, DecisionDate, CaseName, FileNum, CourtLocation, CourtName, CourtAbbrv, Judge, CaseLength, CourtCite, ParallelCite, CitedCount, UCN
    rows.append({"ReporterCite": self.rc,
                 "DecisionDate": self.dd,
                 "CaseName": self.can,
                 "FileNum": self.fn,
                 "CourtLocation": self.loc,
                 "CourtName": self.cn,
                 "CourtAbbrv": self.ca,
                 "Judge": self.j,
                 "CaseLength": self.cl,
                 "CourtCite": self.cc,
                 "ParallelCite": self.pc,
                 "CitedCount": self.cd,
                 "UCN": self.rn})
    #print(rows)
    df = pd.DataFrame(rows, columns=cols)
    df.to_csv(fp2, index=False)
1 Answer
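I assume you are overwriting your previous results on every call. This is a pandas issue, not a SAX one: to_csv opens the file in write mode ('w') by default, so each call replaces whatever the previous call wrote. You want to append to the existing CSV, right? If so, you need to pass mode='a', for example: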
df.to_csv('filename.csv', mode='a')
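One caveat when appending in a loop: to_csv writes the column header by default, so every appended chunk would repeat the header row. A minimal sketch of one way around that, assuming a hypothetical helper append_rows, a shortened two-column layout, and a throwaway temp path (none of these come from the question):

```python
import os
import tempfile

import pandas as pd


def append_rows(rows, csv_path, cols):
    """Append rows to csv_path, writing the header only if the file is new."""
    df = pd.DataFrame(rows, columns=cols)
    write_header = not os.path.exists(csv_path)  # header once, on first write
    df.to_csv(csv_path, mode="a", header=write_header, index=False)


# Hypothetical, shortened subset of the question's columns.
cols = ["CaseName", "UCN"]
# Throwaway output location for the demonstration.
out_path = os.path.join(tempfile.mkdtemp(), "cases.csv")

# Simulate two parsed XML files: one append per file.
append_rows([{"CaseName": "A v. B", "UCN": "1"}], out_path, cols)
append_rows([{"CaseName": "C v. D", "UCN": "2"}], out_path, cols)

result = pd.read_csv(out_path)
print(result)
```

Because the file is opened in append mode, a CSV left over from an earlier run keeps growing; deleting it before the loop starts is probably what you want.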
For more options, see the pandas documentation for DataFrame.to_csv.
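An alternative to appending is to keep the rows on the handler for its whole lifetime, parse every file with the same handler, and write the CSV once at the end. The sketch below is only an illustration under assumptions: the element names (case, CaseName, UCN), the CaseHandler class, and parse_folder are hypothetical, since the question does not show the XML layout or the rest of the handler.

```python
import os
import tempfile
import xml.sax

import pandas as pd


class CaseHandler(xml.sax.ContentHandler):
    """Collects one dict per <case> element (tag names are hypothetical)."""

    FIELDS = ("CaseName", "UCN")

    def __init__(self):
        super().__init__()
        self.rows = []        # survives across all parsed files
        self._current = None  # row currently being built
        self._text = []       # text chunks of the element being read

    def startElement(self, name, attrs):
        if name == "case":
            self._current = {}
        self._text = []

    def characters(self, content):
        # SAX may deliver one element's text in several chunks: only buffer here.
        self._text.append(content)

    def endElement(self, name):
        if self._current is None:
            return
        if name in self.FIELDS:
            self._current[name] = "".join(self._text).strip()
        elif name == "case":
            self.rows.append(self._current)  # row complete
            self._current = None
        self._text = []


def parse_folder(folder):
    """Parse every .xml file under folder with one shared handler."""
    handler = CaseHandler()
    for dirpath, _dirs, files in os.walk(folder):
        for filename in sorted(files):
            if filename.endswith(".xml"):
                xml.sax.parse(os.path.join(dirpath, filename), handler)
    return pd.DataFrame(handler.rows, columns=list(CaseHandler.FIELDS))


# Demonstration with two small made-up files in a temp folder.
folder = tempfile.mkdtemp()
samples = {
    "a.xml": "<cases><case><CaseName>A v. B</CaseName><UCN>1</UCN></case></cases>",
    "b.xml": "<cases><case><CaseName>C v. D</CaseName><UCN>2</UCN></case></cases>",
}
for name, text in samples.items():
    with open(os.path.join(folder, name), "w", encoding="utf-8") as fh:
        fh.write(text)

df = parse_folder(folder)
csv_path = os.path.join(folder, "cases.csv")
df.to_csv(csv_path, index=False)  # single write, default mode is fine
print(df)
```

Writing once avoids both the overwrite problem and repeated headers, at the cost of holding all rows in memory until the last file is parsed.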