I am trying to insert a header row into a CSV file with 35M rows using the following code:
import csv
with open('E:\\Dataset\\dataset1.csv') as infile:
    text = infile.read()
header = ['User IP','Top-level domain', 'Timestamp', 'Is Attack', 'Request',
          'Len(request) withou TLD', 'Subdomains_count', 'w_count', 'w_max',
          'entropy', 'w_max_ratio', 'w_count_ratio', 'digits_ratio', 'uppercase_ratio',
          'time_avg', 'time_stdev', 'size_avg', 'size stdev', 'throughput', 'unique', 'entropy_avg'
          'entropy_stdev']
with open('E:\\Dataset\\dataset2.csv', 'w') as outfile:
    # join the headers into a string with commas and add a newline
    outfile.write(f"{','.join(header)}\n")
    outfile.write(text)
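The code above reads all 35M rows into memory before writing them back. The sketch below avoids that by streaming the data after the header. It also checks that the header has one name per column of the first data row before writing anything.

Assumptions:
- The input is comma-separated and has no header row yet.
- `prepend_header` is a hypothetical helper name, not an existing library function.

```python
import csv
import shutil


def prepend_header(src_path, dst_path, header):
    """Hypothetical helper: copy src_path to dst_path with `header` as the first row.

    The data is streamed in chunks rather than read into memory at once.
    """
    with open(src_path, newline="") as infile:
        # Sanity check: one header name per column of the first data row.
        first_row = next(csv.reader(infile), [])
        if first_row and len(first_row) != len(header):
            raise ValueError(
                f"header has {len(header)} names, but the first data row "
                f"has {len(first_row)} columns"
            )
        infile.seek(0)  # rewind so the first data row is copied as well

        with open(dst_path, "w", newline="") as outfile:
            # csv.writer quotes any name that contains a comma or a quote character.
            csv.writer(outfile, lineterminator="\n").writerow(header)
            # Copy the original data chunk by chunk, line endings untouched.
            shutil.copyfileobj(infile, outfile)
```

With the paths from the question, the call would be `prepend_header(r"E:\Dataset\dataset1.csv", r"E:\Dataset\dataset2.csv", header)`. If the header and the data disagree on the column count, it raises `ValueError` instead of silently writing a misaligned file.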
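However, when I print the data with the header, the header is shifted by one column. Here is the head of the data as printed before adding the header (the file has no header row):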
186.169.253.58 surbl.org 1624438272607 False \
0 186.169.253.58 surbl.org 1624438272607 False
1 186.169.253.58 spamhaus.org 1624438273058 False
2 186.169.253.58 spamhaus.org 1624438273058 False
3 186.169.253.58 spamhaus.org 1624438273059 False
4 186.169.253.58 spamhaus.org 1624438273059 False
h.surbl.org 1 1.1 0 0.1 -0.0 0.0 \
0 f.surbl.org 1 1 0 0 -0.000000 0.000000
1 118.141.11.106.sbl.spamhaus.org 18 5 0 0 2.633731 0.000000
2 118.141.11.106.zen.spamhaus.org 18 5 1 3 2.633731 0.166667
3 128.141.11.106.sbl.spamhaus.org 18 5 0 0 2.863826 0.000000
4 128.141.11.106.zen.spamhaus.org 18 5 1 3 2.863826 0.166667
0.0.1 0.0.2 0.0.3 3.4444444444444446 9.59311095410544 1.5 \
0 0.000000 0.000000 0.0 0.222222 0.440959 1.0
1 0.000000 0.611111 0.0 55.555556 165.542375 17.2
2 0.055556 0.611111 0.0 0.333333 0.500000 17.2
3 0.000000 0.611111 0.0 0.333333 0.500000 17.3
4 0.055556 0.611111 0.0 0.333333 0.500000 17.4
1.5811388300841898 468.75 0.4444444444444444 0.25849625007211563 \
0 0.000000 3333.333333 0.555556 0.000000
1 0.421637 343.313373 0.000000 3.048277
2 0.421637 43000.000000 0.000000 2.983547
3 0.483046 43250.000000 0.000000 2.959741
4 0.516398 43500.000000 0.000000 2.935936
0.81743691684035
0 0.000000
1 0.177285
2 0.199622
3 0.198131
4 0.193400
And here is the head of the data after adding the header:
User IP Top-level domain Timestamp \
186.169.253.58 surbl.org 1624438272607 False
186.169.253.58 surbl.org 1624438272607 False
186.169.253.58 spamhaus.org 1624438273058 False
186.169.253.58 spamhaus.org 1624438273058 False
186.169.253.58 spamhaus.org 1624438273059 False
Is Attack Request \
186.169.253.58 h.surbl.org 1
186.169.253.58 f.surbl.org 1
186.169.253.58 118.141.11.106.sbl.spamhaus.org 18
186.169.253.58 118.141.11.106.zen.spamhaus.org 18
186.169.253.58 128.141.11.106.sbl.spamhaus.org 18
Len(request) withou TLD Subdomains_count w_count w_max \
186.169.253.58 1 0 0 -0.000000
186.169.253.58 1 0 0 -0.000000
186.169.253.58 5 0 0 2.633731
186.169.253.58 5 1 3 2.633731
186.169.253.58 5 0 0 2.863826
entropy w_max_ratio w_count_ratio digits_ratio \
186.169.253.58 0.000000 0.000000 0.000000 0.0
186.169.253.58 0.000000 0.000000 0.000000 0.0
186.169.253.58 0.000000 0.000000 0.611111 0.0
186.169.253.58 0.166667 0.055556 0.611111 0.0
186.169.253.58 0.000000 0.000000 0.611111 0.0
uppercase_ratio time_avg time_stdev size_avg \
186.169.253.58 3.444444 9.593111 1.5 1.581139
186.169.253.58 0.222222 0.440959 1.0 0.000000
186.169.253.58 55.555556 165.542375 17.2 0.421637
186.169.253.58 0.333333 0.500000 17.2 0.421637
186.169.253.58 0.333333 0.500000 17.3 0.483046
size stdev throughput unique entropy_avgentropy_stdev
186.169.253.58 468.750000 0.444444 0.258496 0.817437
186.169.253.58 3333.333333 0.555556 0.000000 0.000000
186.169.253.58 343.313373 0.000000 3.048277 0.177285
186.169.253.58 43000.000000 0.000000 2.983547 0.199622
186.169.253.58 43250.000000 0.000000 2.959741 0.198131
It looks like the first column is being used as the index.
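The printed header gives a clue: its last column is labelled entropy_avgentropy_stdev. The header list has no comma between 'entropy_avg' and 'entropy_stdev'. Python joins adjacent string literals into a single string, so the list holds 21 names for 22 data columns.

The pandas I/O documentation notes what happens in that case: when a file has one more column of data than the number of column names, read_csv uses the first column as the row index. That matches the shift shown above.

The check below uses only the question's own header list:

```python
# The header list exactly as written in the question
# (note: there is no comma after 'entropy_avg').
header = ['User IP', 'Top-level domain', 'Timestamp', 'Is Attack', 'Request',
          'Len(request) withou TLD', 'Subdomains_count', 'w_count', 'w_max',
          'entropy', 'w_max_ratio', 'w_count_ratio', 'digits_ratio', 'uppercase_ratio',
          'time_avg', 'time_stdev', 'size_avg', 'size stdev', 'throughput', 'unique', 'entropy_avg'
          'entropy_stdev']

print(len(header))  # 21: the last two literals were merged into one name
print(header[-1])   # entropy_avgentropy_stdev

# With the comma restored, there is one name per data column.
fixed = header[:-1] + ['entropy_avg', 'entropy_stdev']
print(len(fixed))   # 22
```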
1 Answer
If you can use it, use pandas: the header problem you are running into can be handled with its explicit import and export options.
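For example, read_csv can attach column names while reading, so the 35M-row file never has to be rewritten:
- `header=None` tells pandas that the first line is data.
- `names` supplies the column labels.

A minimal sketch on a toy three-column sample. The column subset and rows are copied from the question's output for illustration only.

```python
import io

import pandas as pd

# Illustrative subset of the question's column names (the real file has 22 columns).
names = ["User IP", "Top-level domain", "Timestamp"]

# Toy stand-in for a headerless CSV such as dataset1.csv.
raw = io.StringIO(
    "186.169.253.58,surbl.org,1624438272607\n"
    "186.169.253.58,spamhaus.org,1624438273058\n"
)

# header=None: there is no header row, so the first line is treated as data.
# names=...: the labels to use; their count should equal the number of data columns.
df = pd.read_csv(raw, header=None, names=names)
print(df)
```

With the real file this would be `pd.read_csv(r"E:\Dataset\dataset1.csv", header=None, names=header)`, assuming `header` lists all 22 column names.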
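Your question does not make clear whether you want the index in the output, or whether it is already present in the input. If it is present in the input, set index_col=0 in read_csv. If you want it in the output, set index=True in the to_csv call (this is the default; pass index=False to leave the index out).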
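A small sketch of both options on a toy two-row DataFrame. The two-column subset and the values are taken from the question's output and are only illustrative.

```python
import io

import pandas as pd

df = pd.DataFrame({
    "User IP": ["186.169.253.58", "186.169.253.58"],
    "Timestamp": [1624438272607, 1624438273058],
})

# Export: index=True (the default) writes the index as an unnamed first column;
# index=False leaves it out.
with_index = df.to_csv()
without_index = df.to_csv(index=False)
print(with_index.splitlines()[0])     # ,User IP,Timestamp
print(without_index.splitlines()[0])  # User IP,Timestamp

# Import: a file written with its index has an extra leading column.
# Without index_col it shows up as "Unnamed: 0"; index_col=0 makes it the index again.
plain = pd.read_csv(io.StringIO(with_index))
restored = pd.read_csv(io.StringIO(with_index), index_col=0)
print(list(plain.columns))     # ['Unnamed: 0', 'User IP', 'Timestamp']
print(list(restored.columns))  # ['User IP', 'Timestamp']
```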