python-3.x: Parsing XML with classes does not find the values

Posted by eimct9ow on 2023-01-06 in Python

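I split a large XML file into small branches and then parse only those parts. I am looking for the modification timestamp in the "mod_time" tag, which is available inside each "contact" tag, but the method call on my object does not find the value. In some contacts, some tags are also missing entirely.
I tried iterfind('tag_name'), iter() and findall('tag_name'), but my program shows no results, and after several hours I still cannot find where the fault is.
Here is my XML, reduced to two elements: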

<?xml version="1.0" encoding = "utf-8"?>
<phonebooks>
  <phonebook name="Telefonbuch">
   <contact>
      <category>0</category>
      <person>
        <realName>Dummy, Name, Street</realName>
      </person>
      <telephony nid="1">
        <number type="work" prio="1" id="0">012345678</number>
      </telephony>
      <services />
      <setup />
      <features doorphone="0" />
      <mod_time>1587477163</mod_time>
      <uniqueid>358</uniqueid>
    </contact>
    <contact>
      <category>0</category>
      <person>
        <realName>Foto Name</realName>
      </person>
      <telephony nid="1">
        <number type="home" prio="1" id="0">067856743</number>
      </telephony>
      <services />
      <setup />
      <features doorphone="0" />
      <mod_time>1547749691</mod_time>
      <uniqueid>68</uniqueid>
    </contact>
</phonebook>
</phonebooks>

Here is what I have done:

import timeit
import xml.etree.ElementTree as ET

class Phonebook:
    def __init__(self, xml_file, tag_node):
        """Split tree in contact branches """
        self.xml_file = xml_file
        self.tag_node = tag_node
        # For furter parsing
        contacts = []
        i = 0
        events =('start','end','start-ns','end-ns')
        for event, elem in ET.iterparse(self.xml_file, events=events):
            if event == 'end' and elem.tag == self.tag_node[0]:
                #print(elem.tag)
                contacts.append(elem)
                par = Contact(elem, i)
                par.parse_node(elem, i)
                i += 1
            elem.clear()
        print("Amount of contacts:", len(contacts))

class Contact:
    def __init__(self, branch, i):
        self.tree = branch
        #print(i, self.tree)
       
    def parse_node(self, branch, i):
        for node in branch.iterfind('.//mod_time'):
           print(node.text)               
         
def main():
    elem = Phonebook('new _dummy1.xml',['contact'])

    
if __name__ == '__main__':
    """ Input XML file definition """
    starttime=timeit.default_timer()
    main()
    print('Finished')
    print("Runtime:", timeit.default_timer()-starttime)

Output:
Amount of contacts: 2
Finished
Runtime: 0.0006361000050674193

Expected output:
1587477163
1547749691


ibps3vxo1#

Code

import timeit
import xml.etree.ElementTree as ET

class Phonebook:
    def __init__(self, xml_file, selector):
        self.xml_file = xml_file
        self.selector = selector
        root = ET.parse(xml_file)
        contacts = root.findall(selector)  
        print("Amount of contacts:", len(contacts))
        for mod_time in contacts:
            print(mod_time.text)

def main():
    Phonebook('./_dummy1.xml','.//contact/mod_time')

if __name__ == '__main__':
    starttime=timeit.default_timer()
    main()
    print('Finished')
    print("Runtime:", timeit.default_timer()-starttime)

Output

$ python test.py
Amount of contacts: 2
1587477163
1547749691
Finished
Runtime: 0.0006627999973716214
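
If the file is too large for ET.parse, the streaming approach from the question can probably be kept. The likely reason it printed nothing is that elem.clear() runs on every event, including the 'start' events and the 'end' events of child elements, so each contact has already been emptied by the time its own 'end' event is handled. Below is a minimal sketch, assuming only 'end' events are needed; the helper name iter_mod_times and the inline SAMPLE data are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Shortened stand-in for the phonebook file from the question.
SAMPLE = b"""<phonebooks>
  <phonebook name="Telefonbuch">
    <contact>
      <mod_time>1587477163</mod_time>
      <uniqueid>358</uniqueid>
    </contact>
    <contact>
      <mod_time>1547749691</mod_time>
      <uniqueid>68</uniqueid>
    </contact>
  </phonebook>
</phonebooks>"""


def iter_mod_times(source, tag="contact"):
    """Yield the mod_time text of each <contact> while streaming the input."""
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            # Read the child first, while it still holds its text.
            yield elem.findtext("mod_time")
            # Only now free the finished <contact> subtree.
            elem.clear()


mod_times = list(iter_mod_times(io.BytesIO(SAMPLE)))
print(mod_times)
```

The point of the sketch is the order of operations: the contact is read completely before it is cleared, and nothing else is cleared along the way.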

waxmsbnn2#

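I have now solved the handover of data between the objects. Instead of calling Contact from inside the Phonebook object, I now create an instance of Contact, which inherits from the parent class Phonebook through the super() function, drawing on a great page about it. I am posting my solution because it may be of interest to others who run into a similar problem. Thanks to everyone who tried to help!
My changed code: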

import psutil
import timeit

import xml.etree.ElementTree as ET

class Phonebook:
    def __init__(self, file_path):
        """Split tree in contact branches """
        self.file_path = file_path
    
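    def contacts_list(self, file_path):
        """Collect every <contact> element of the file in a list."""
        # Note: file_path is unused; the path stored in __init__ is read.
        contacts = []
        # Only 'end' events are needed: an element is complete once it ends.
        for event, elem in ET.iterparse(self.file_path, events=('end',)):
            if elem.tag == 'contact':
                contacts.append(elem)
        # Runs once after the loop, on the root element; the collected
        # contacts keep their own children and text.
        elem.clear()
        return contacts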
        
class Contact(Phonebook):
    def __init__(self, file_path):
        super().__init__(file_path)
               
    def search_node(self, contact, searched_tag):
        contact_template =['category','person', 'telephony', 'services', 'setup', 'features', 'mod_time', 'uniqueid' ]
        node_tag_list = []
        list_difference = []
        search_list = []
        for node in contact:
            if node.tag not in node_tag_list:
                node_tag_list.append(node.tag)
        for element in contact_template:
            if element not in node_tag_list:
                list_difference.append(element)
        
        for node in contact:
            if node.tag == searched_tag and node.tag not in list_difference:
                search_list.append(node.text)
                #print(node.text)
            else:
                if len(list_difference) != 0 and searched_tag in list_difference:
                    message = self.missed_tag(list_difference)
                    #print(message)
                    if message not in search_list:
                        search_list.append(message)                
        return  search_list
                        
    def missed_tag(self, list_difference):
        for m in list_difference:
            message = f'{m} - not assigned'
            return message
                    
         
def main():
    con = Contact('dummy.xml')
    contacts = con.contacts_list(('dummy.xml'))
    
    mod_time_list =[]
    for contact in contacts:
        mod_time = con.search_node(contact, 'mod_time')
        mod_time_list.append(mod_time)
    print(len(mod_time_list))
    print(mod_time_list)
    
if __name__ == '__main__':
    """ Input XML file definition """
    starttime=timeit.default_timer()
    main()
    print('Finished')
    # Getting % usage of virtual_memory ( 3rd field)
    print('RAM memory % used:', psutil.virtual_memory()[2])
    # Getting usage of virtual_memory in GB ( 4th field)
    print('RAM Used (GB):', psutil.virtual_memory()[3]/1000000000)
    print("Runtime:", timeit.default_timer()-starttime)
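
For the missing-tag case, a shorter route may be Element.findtext with a default value, which returns the placeholder whenever the tag is absent from a contact. This is only a sketch under the assumption that one value per contact is wanted; the helper name tag_values and the SAMPLE data are hypothetical, and the placeholder text is copied from the code above.

```python
import xml.etree.ElementTree as ET

# Two contacts; the second one has no <mod_time> tag.
SAMPLE = """<phonebook>
  <contact><mod_time>1587477163</mod_time><uniqueid>358</uniqueid></contact>
  <contact><uniqueid>68</uniqueid></contact>
</phonebook>"""


def tag_values(root, tag):
    """Return the text of `tag` for every contact, or a placeholder if absent."""
    return [
        contact.findtext(tag, default=f"{tag} - not assigned")
        for contact in root.iter("contact")
    ]


root = ET.fromstring(SAMPLE)
values = tag_values(root, "mod_time")
print(values)
```

Unlike search_node, this returns a flat list with one string per contact rather than a list of lists.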
