pandas: SAX parser in Python

Posted by biswetbf on 2023-02-02 in Python

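I'm using the Python SAX parser to parse the XML files in a folder and writing the output to CSV with pandas, but the CSV only contains data from the last file.
I'm new to Python, and this is my first attempt at SAX parsing.
File reading: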

for dirpath, dirs, files in os.walk(fp1):
    for filename in files:
        fname = os.path.join(dirpath, filename)
        # Only hand XML files to the SAX parser
        if fname.endswith('.xml'):
            print(fname)
            parser.parse(fname)
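A minimal sketch of the same file-reading step as a reusable function, assuming you only need the list of `.xml` paths under a root folder such as `fp1` (the name `find_xml_files` is hypothetical). It matches the extension case-insensitively and returns the paths in a stable order:

```python
import os


def find_xml_files(root):
    """Return the paths of all .xml files under root, recursively, sorted."""
    paths = []
    for dirpath, _dirs, files in os.walk(root):
        for filename in files:
            # Compare case-insensitively so "CASE.XML" is picked up too.
            if filename.lower().endswith(".xml"):
                paths.append(os.path.join(dirpath, filename))
    return sorted(paths)
```

Each returned path can then be passed to `parser.parse(...)` in turn.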
Content handler method:

def characters(self, content):
    rows = []
    cols = ["ReporterCite", "DecisionDate", "CaseName", "FileNum", "CourtLocation",
            "CourtName", "CourtAbbrv", "Judge", "CaseLength", "CourtCite",
            "ParallelCite", "CitedCount", "UCN"]

    # Build one row from the values collected so far
    rows.append({"ReporterCite": self.rc,
                 "DecisionDate": self.dd,
                 "CaseName": self.can,
                 "FileNum": self.fn,
                 "CourtLocation": self.loc,
                 "CourtName": self.cn,
                 "CourtAbbrv": self.ca,
                 "Judge": self.j,
                 "CaseLength": self.cl,
                 "CourtCite": self.cc,
                 "ParallelCite": self.pc,
                 "CitedCount": self.cd,
                 "UCN": self.rn})

    df = pd.DataFrame(rows, columns=cols)
    # Write the row to the output CSV (fp2)
    df.to_csv(fp2, index=False)
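One way to avoid rewriting the CSV on every callback, sketched here under assumptions: the element names (`Case` as the record element, with children named after the columns) and the names `CaseHandler` and `parse_files` are hypothetical, and only three columns are shown. Rather than writing inside `characters()`, which SAX may call several times for a single text node, the handler buffers the text, finishes a row in `endElement()`, keeps rows across all files, and the DataFrame is built once at the end:

```python
import xml.sax

import pandas as pd

# Hypothetical layout: each record is a <Case> element whose children are
# named after the CSV columns (only three shown here for brevity).
COLUMNS = ["ReporterCite", "DecisionDate", "CaseName"]


class CaseHandler(xml.sax.ContentHandler):
    """Collect one dict per <Case> element into self.rows."""

    def __init__(self):
        super().__init__()
        self.rows = []        # shared across every file parsed with this handler
        self._current = None  # row being built for the open <Case>
        self._field = None    # column name of the open child element, if any
        self._buffer = []     # text chunks; characters() may arrive in pieces

    def startElement(self, name, attrs):
        if name == "Case":
            self._current = {}
        elif self._current is not None and name in COLUMNS:
            self._field = name
            self._buffer = []

    def characters(self, content):
        if self._field is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if self._field is not None and name == self._field:
            self._current[name] = "".join(self._buffer).strip()
            self._field = None
        elif name == "Case" and self._current is not None:
            self.rows.append(self._current)
            self._current = None


def parse_files(paths):
    """Parse every file with one handler and return all rows as a DataFrame."""
    handler = CaseHandler()
    for path in paths:
        xml.sax.parse(path, handler)
    # Built once, after all files; fields missing from a record become NaN.
    return pd.DataFrame(handler.rows, columns=COLUMNS)
```

The result can then be written with a single `df.to_csv(fp2, index=False)` call, so the default `mode='w'` no longer loses earlier files.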
Answer #1 (2izufjch)

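I assume you are always overwriting your previous results. This is a pandas issue, not a SAX issue. You want to append to the existing CSV, right? If so, you have to pass mode='a', e.g. df.to_csv('filename.csv', mode='a'). For more options, see the docs: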

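  • 'w': open for writing, truncating the file first (the default)
  • 'x': open for exclusive creation, failing if the file already exists
  • 'a': open for writing, appending to the end of the file if it exists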
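Note that `mode='a'` on its own also repeats the header line on every call. A minimal sketch, assuming the output file should get its header only once (the helper name `append_rows` is hypothetical):

```python
import os

import pandas as pd


def append_rows(df, csv_path):
    """Append df to csv_path, writing the header only if the file is new."""
    write_header = not os.path.exists(csv_path)
    df.to_csv(csv_path, mode="a", header=write_header, index=False)
```

Because this appends to whatever is already there, delete the output file before a fresh run; otherwise rows from the previous run remain at the top.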
