Pandas puts trash digits after the real data when reading a CSV file

Posted by 6vl6ewon on 2022-12-06 in Other

I am reading a file with read_csv, but when I look at what ended up in the DataFrame, it shows many more digits than the original file contains.
Code:

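import pandas as pd

# `name` is the CSV file's base name; its definition is not shown in the question.
df = pd.read_csv(f'{name}.csv', sep=',', decimal='.', dtype={'col1': str, 'column_with_trash': float})

# Export the matching rows before and after the subtraction.
df[df['col1'] == '0001'].to_excel('1.xlsx')
df['column_with_trash'] = df['column_with_trash'] - 1524684.3740493
df[df['col1'] == '0001'].to_excel('2.xlsx')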

The CSV file looks like this:

col1,col2,col3,col4,col5,col6,column_with_trash
0001,TP,2021-12-31,T,N,2130875.40078,1524684.374049378
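
The report can be reproduced without a file on disk. The following is a minimal sketch, assuming pandas is installed; the sample row is inlined through `io.StringIO`, and the names `csv_text` and `value` are introduced here for illustration only. It prints the stored value with more digits than the CSV text contains, which is where the extra digits become visible.

```python
import io

import pandas as pd

# Sample header and row copied from the question, inlined for reproducibility.
csv_text = (
    "col1,col2,col3,col4,col5,col6,column_with_trash\n"
    "0001,TP,2021-12-31,T,N,2130875.40078,1524684.374049378\n"
)

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=",",
    decimal=".",
    dtype={"col1": str, "column_with_trash": float},
    float_precision="round_trip",  # one of the settings tried in the question
)

# Pick the value for the row whose col1 is '0001' (kept as text by dtype=str).
value = df.loc[df["col1"] == "0001", "column_with_trash"].iloc[0]

# Default display uses the shortest text that round-trips to the same float.
print(repr(float(value)))
# Asking for 20 decimals shows digits that were never in the CSV text.
print(format(value, ".20f"))
```

With `float_precision="round_trip"` the parsed value should match what Python's own `float()` returns for the same text, so any extra digits come from the float64 format rather than from the parser.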

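I don't even perform any arithmetic, yet the output already differs from the input. When I open 1.xlsx, I can subtract 1524684.3740493 from the output (it looks like the input; see pic at the end) and I get 0,00000007799826562404630000, because it contains digits that were not there at the beginning. In 2.xlsx I get the same result.
How is this possible? I tried float_precision with "high", "round_trip" and None, and none of them changed the result. The difference is in the 9th digit after the decimal point, which breaks my calculations. df['column_with_trash'] = df['column_with_trash'].round(9) should work, but it changes nothing in these outputs.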
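
One likely explanation is the float64 format itself rather than the CSV parser. The sketch below uses only the Python standard library (`math.ulp` requires Python 3.9 or later) and the two numbers from the question; the variable names are illustrative. It shows the exact value float64 stores, the spacing between neighbouring floats at this magnitude, why rounding to 9 decimals changes nothing, and what the subtraction yields when done with `decimal.Decimal` instead.

```python
import math
from decimal import Decimal

raw = "1524684.374049378"       # text exactly as it appears in the CSV
subtrahend = "1524684.3740493"  # constant subtracted in the question

as_float = float(raw)

# Decimal(float) exposes the exact binary value that float64 stores.
print("stored value:", Decimal(as_float))

# Gap between adjacent float64 values at this magnitude.
print("spacing     :", math.ulp(as_float))

# The float subtraction can only land on a multiple of that spacing.
float_diff = as_float - float(subtrahend)
print("float diff  :", float_diff)

# Rounding to 9 decimals maps back to the same nearest float, so it is a no-op.
print("round no-op :", round(as_float, 9) == as_float)

# Decimal arithmetic works on the decimal digits themselves.
decimal_diff = Decimal(raw) - Decimal(subtrahend)
print("decimal diff:", decimal_diff)
```

If exact decimal digits matter for the calculation, one option is to keep the column out of float64 altogether, for example by passing `converters={'column_with_trash': Decimal}` to `read_csv` so the text is parsed straight into `Decimal` objects. That trades speed for exactness, since the column then has object dtype.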
[image: pic]

[image: csv]

Source: https://stackoverflow.com/questions/73961588/pandas-put-trash-digits-after-real-data-while-reading-a-csv-file