Save a DataFrame to a csv/text file in pandas without row numbers

d7v8vwbk  posted 12 months ago in Other

I created a DataFrame in pandas from a text file.

df = pd.read_table('inputfile.txt',names=['Line'])

When I display df, I get:

Line
0   17/08/31 13:24:48 INFO spark.SparkContext: Run...
1   17/08/31 13:24:49 INFO spark.SecurityManager: ...
2   17/08/31 13:24:49 INFO spark.SecurityManager: ...
3   17/08/31 13:24:49 INFO spark.SecurityManager: ...
4   17/08/31 13:24:49 INFO util.Utils: Successfull...
5   17/08/31 13:24:49 INFO slf4j.Slf4jLogger: Slf4...
6   17/08/31 13:24:49 INFO Remoting: Starting remo...
7   17/08/31 13:24:50 INFO Remoting: Remoting star...
8   17/08/31 13:24:50 INFO Remoting: Remoting now ...
9   17/08/31 13:24:50 INFO util.Utils: Successfull...


Now I want to save this DataFrame as a CSV file:

df.to_csv('outputfile')


The output I get is:

0,17/08/31 13:24:48 INFO spark.SparkContext: Running Spark version 1.6.0
1,17/08/31 13:24:49 INFO spark.SecurityManager: Changing view acls to: user1
2,17/08/31 13:24:49 INFO spark.SecurityManager: Changing modify acls to: user1
3,17/08/31 13:24:49 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user1);
4,17/08/31 13:24:49 INFO util.Utils: Successfully started service 'sparkDriver' on port 17101.
5,17/08/31 13:24:49 INFO slf4j.Slf4jLogger: Slf4jLogger started
6,17/08/31 13:24:49 INFO Remoting: Starting remoting
7,17/08/31 13:24:50 INFO Remoting: Remoting started; listening on addresses :
8,17/08/31 13:24:50 INFO Remoting: Remoting now listens on addresses: 
9,17/08/31 13:24:50 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 100033.


The output I want is:

17/08/31 13:24:48 INFO spark.SparkContext: Running Spark version 1.6.0
17/08/31 13:24:49 INFO spark.SecurityManager: Changing view acls to: user1
17/08/31 13:24:49 INFO spark.SecurityManager: Changing modify acls to: user1
17/08/31 13:24:49 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(user1);
17/08/31 13:24:49 INFO util.Utils: Successfully started service 'sparkDriver' on port 17101.
17/08/31 13:24:49 INFO slf4j.Slf4jLogger: Slf4jLogger started
17/08/31 13:24:49 INFO Remoting: Starting remoting
17/08/31 13:24:50 INFO Remoting: Remoting started; listening on addresses :
17/08/31 13:24:50 INFO Remoting: Remoting now listens on addresses: 
17/08/31 13:24:50 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 100033.


I have tried the following, but I still get the same result rather than the output I want.

np.savetxt(r'np.txt', df.Line, fmt='%d')

df.to_csv(sep=' ', index=False, header=False)


lg40wkob1#

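James's answer may be correct in special cases. However, the standard behavior of pandas is to write the row numbers as a leading column with no header. To remove it, just set the index= parameter to False: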

df.to_csv("outfile.csv", index=False)

(Edit: corrected, as @Haagimus rightly pointed out, so as not to mislead anyone.)
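
As a minimal sketch of the point above, assuming a single-column DataFrame like the one in the question: `index=False` drops the row numbers, and `header=False` additionally drops the `Line` column name, which the desired output also omits. The two sample rows and the temporary output path are illustrative, not taken from a real run.

```python
import os
import tempfile

import pandas as pd

# Illustrative sample data; the real frame would come from pd.read_table(...)
df = pd.DataFrame({"Line": [
    "17/08/31 13:24:48 INFO spark.SparkContext: Running Spark version 1.6.0",
    "17/08/31 13:24:49 INFO Remoting: Starting remoting",
]})

# Hypothetical output location inside a fresh temporary directory
out_path = os.path.join(tempfile.mkdtemp(), "outputfile.csv")

# index=False omits the row numbers; header=False omits the "Line" column name
df.to_csv(out_path, index=False, header=False)

# Read the file back to inspect exactly what was written
with open(out_path) as f:
    written = f.read().splitlines()
```

One caveat with this approach: `to_csv` quotes any field that contains the separator, so a log line with a comma in it would be written wrapped in double quotes.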


dced5bon2#

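Christian was almost right. If you look at the documentation for the to_csv command, the index parameter is a boolean that defaults to True and controls whether row names (the index) are written.
I highly recommend the helper tool Kite for help with simple things like this.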

df.to_csv('outfile.csv', index=False)
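
To see the effect of the default, here is a small sketch that compares the two settings side by side. It relies on `to_csv` returning the CSV text as a string when no path is given; the two-row frame is a made-up stand-in for the log data.

```python
import pandas as pd

# Made-up stand-in for the log lines in the question
df = pd.DataFrame({"Line": ["first log line", "second log line"]})

# With no path argument, to_csv returns the CSV text instead of writing a file
with_index = df.to_csv()                # default: index=True
without_index = df.to_csv(index=False)  # row numbers omitted

# Split into lines so the comparison does not depend on the line terminator
with_index_lines = with_index.splitlines()
without_index_lines = without_index.splitlines()
```

With the default, every data row starts with its row number and the header row starts with an empty field; with `index=False`, only the `Line` header and the values remain.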



piztneat3#

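It looks like the numbers may be part of the strings in the Line column. You can replace the leading digits and spaces with nothing and write the result to a file without the index: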

df.Line.str.replace(r'^\d+ +', '', regex=True).to_csv('outputfile.csv', index=False, header=False)
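
A short sketch of that replacement, under the assumption that the row numbers really are embedded in the text (the sample frame below is constructed that way on purpose). `regex=True` is passed explicitly because newer pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

# Hypothetical case: the row numbers are part of the strings themselves
df = pd.DataFrame({"Line": [
    "0   17/08/31 13:24:48 INFO spark.SparkContext: Running Spark version 1.6.0",
    "1   17/08/31 13:24:49 INFO Remoting: Starting remoting",
]})

# Remove leading digits followed by one or more spaces from each string
cleaned = df["Line"].str.replace(r"^\d+ +", "", regex=True)
```

If the numbers are not part of the strings, the pattern leaves the data unchanged: a line starting with `17/08/31` has `/` rather than a space after its leading digits, so nothing matches.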
