Parsing XML to CSV with Python

j2qf4p5b posted on 2021-07-14 in Java

I want to parse an XML file into CSV so that it is laid out in the following format:

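I have already managed to get the text of every element into the CSV file, but I want to match each namelink with its description so that they form one row, with each element's text in its own column, as shown in the table.
Original XML file: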

My current attempt:


# Importing the required libraries

import xml.etree.ElementTree as Xet
import pandas as pd

# Parsing the XML file

xmlparse = Xet.parse('NiktoReportTest.xml')
root = xmlparse.getroot()

cols = ["namelink", "description"]
rows = []

x = []
for elm in root.findall("./niktoscan/scandetails/item/namelink"):
    x.append(elm.text)

y = []
for value in root.findall("./niktoscan/scandetails/item/description"):
    y.append(value.text)

rows.append({"namelink": x,
             "description": y})

df = pd.DataFrame(rows, columns=cols)

# Writing dataframe to csv

df.to_csv('output.csv')

Current output of the CSV file:

,namelink,description
0,"['http://127.0.0.1:80/', 'http://127.0.0.1:80/', 'http://127.0.0.1:80/', 'http://127.0.0.1:80/', 'http://127.0.0.1:80/', 'http://127.0.0.1:80/./', 'http://127.0.0.1:80/./', 'http://127.0.0.1:80//', 'http://127.0.0.1:80//', 'http://127.0.0.1:80/%2e/', 'http://127.0.0.1:80/%2e/', 'http://127.0.0.1:80///etc/hosts', 'http://127.0.0.1:80///', 'http://127.0.0.1:80/server-status', 'http://127.0.0.1:80/?PageServices', 'http://127.0.0.1:80/?wp-cs-dump', 'http://127.0.0.1:80///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////', 'http://127.0.0.1:80///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////', 'http://127.0.0.1:80/wp-content/themes/twentyeleven/images/headers/server.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wordpresswp-content/themes/twentyeleven/images/headers/server.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wp-includes/Requests/Utility/content-post.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wordpresswp-includes/Requests/Utility/content-post.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wp-includes/js/tinymce/themes/modern/Meuhy.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/wordpresswp-includes/js/tinymce/themes/modern/Meuhy.php?filesrc=/etc/hosts', 'http://127.0.0.1:80/assets/mobirise/css/meta.php?filesrc=', 'http://127.0.0.1:80/login.cgi?cli=aa%20aa%27cat%20/etc/hosts', 'http://127.0.0.1:80/shell?cat+/etc/hosts']","['The anti-clickjacking X-Frame-Options header is not present.', 'The X-XSS-Protection header is not defined. This header can hint to the user agent to protect against some forms of XSS', 'The X-Content-Type-Options header is not set. 
This could allow the user agent to render the content of the site in a different fashion to the MIME type', '/: Directory indexing found.', 'Allowed HTTP Methods: POST, OPTIONS, HEAD, GET ', '/./: Directory indexing found.', ""/./: Appending '/./' to a directory allows indexing"", '//: Directory indexing found.', '//: Apache on Red Hat Linux release 9 reveals the root directory listing by default if there is no index page.', '/%2e/: Directory indexing found.', '/%2e/: Weblogic allows source code or directory listing, upgrade to v6.0 SP1 or higher. BID-2513.', ""///etc/hosts: The server install allows reading of any system file by adding an extra '/' to the URL."", '///: Directory indexing found.', '/server-status: This reveals Apache information. Comment out appropriate line in the Apache conf file or restrict access to allowed sources.', ""/?PageServices: The remote server may allow directory listings through Web Publisher by forcing the server to show all files via 'open directory browsing'. Web Publisher should be disabled. CVE-1999-0269."", ""/?wp-cs-dump: The remote server may allow directory listings through Web Publisher by forcing the server to show all files via 'open directory browsing'. Web Publisher should be disabled. 
CVE-1999-0269."", '///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////: Directory indexing found.', ""///////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////////: Abyss 1.03 reveals directory listing when \t /'s are requested."", '/wp-content/themes/twentyeleven/images/headers/server.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wordpresswp-content/themes/twentyeleven/images/headers/server.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wp-includes/Requests/Utility/content-post.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wordpresswp-includes/Requests/Utility/content-post.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wp-includes/js/tinymce/themes/modern/Meuhy.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/wordpresswp-includes/js/tinymce/themes/modern/Meuhy.php?filesrc=/etc/hosts: A PHP backdoor file manager was found.', '/assets/mobirise/css/meta.php?filesrc=: A PHP backdoor file manager was found.', '/login.cgi?cli=aa%20aa%27cat%20/etc/hosts: Some D-Link router remote command execution.', '/shell?cat+/etc/hosts: A backdoor was identified.']"
Answer #1, by bwleehnv

I modified your code to write the CSV file directly. There is no need for pandas if it is only used to write a CSV file.

import csv
import xml.etree.ElementTree as Xet

# Parsing the XML file

xmlparse = Xet.parse('test.xml')
root = xmlparse.getroot()

column_names = ["namelink", "description"]
column_values = {}

# Extract column data for all columns defined above

for column_name in column_names:
    column_values[column_name] = []
    for element in root.findall(f'./niktoscan/scandetails/item/{column_name}'):
        column_values[column_name].append(element.text)

# Pair the extracted values position-wise into rows (zip stops at the shortest column)

rows = zip(*column_values.values())

with open('output.csv', 'w', newline='') as f:  # newline='' as the csv module docs require; avoids blank lines on Windows
    writer = csv.writer(f)
    writer.writerow(column_names)
    writer.writerows(rows)
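One caveat with this approach: `zip` pairs values purely by position and stops at the shortest list, so if some `<item>` lacked a `<description>`, every following row would be shifted by one without any error. Below is a sketch of a more defensive variant that iterates over each `<item>` and reads its children with `Element.findtext`. It assumes the same element names as the question; the sample XML and the `default=""` fallback for a missing child are illustrative choices, not part of the original answer.

```python
import csv
import io
import xml.etree.ElementTree as Xet

# Hypothetical sample: the second <item> has no <description>.
SAMPLE_XML = """<niktoscans>
  <niktoscan>
    <scandetails>
      <item>
        <description>/: Directory indexing found.</description>
        <namelink>http://127.0.0.1:80/</namelink>
      </item>
      <item>
        <namelink>http://127.0.0.1:80/./</namelink>
      </item>
      <item>
        <description>/shell?cat+/etc/hosts: A backdoor was identified.</description>
        <namelink>http://127.0.0.1:80/shell?cat+/etc/hosts</namelink>
      </item>
    </scandetails>
  </niktoscan>
</niktoscans>"""

column_names = ["namelink", "description"]
root = Xet.fromstring(SAMPLE_XML)

# Build each row from a single <item>, so a missing child cannot shift later rows.
# findtext returns the default ("") when the child element does not exist.
rows = [
    [item.findtext(name, default="") for name in column_names]
    for item in root.iterfind("./niktoscan/scandetails/item")
]

buf = io.StringIO()  # swap for open("output.csv", "w", newline="") to write a file
writer = csv.writer(buf)
writer.writerow(column_names)
writer.writerows(rows)
print(buf.getvalue())
```

With the zip-based code, the same sample would produce only two data rows, and the second would wrongly pair `http://127.0.0.1:80/./` with the backdoor description.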
