Dividing one DataFrame by another in Python using pandas with float values

bvpmtnay posted on 2022-11-19 in Python

I have two separate DataFrames named df1 and df2, shown below:

    Scaffold  Position  Ref_Allele_Count  Alt_Allele_Count  Coverage_Depth  Alt_Allele_Frequency
0          1        11                 7                51              58              0.879310
1          1        16                20                95             115              0.826087
2          2         9                 9                33              42              0.785714
3          2        12                86                51             137              0.372263
4          2        67                41                98             139              0.705036
5          3         8                 0                 0               0              0.000000
6          4        99                32                26              58              0.448276
7          4       101               100                24             124              0.193548
8          4       115                69                26              95              0.273684
9          5         6                40                57              97              0.587629
10         5        19                53                87             140              0.621429
    Scaffold  Position  Ref_Allele_Count  Alt_Allele_Count  Coverage_Depth  Alt_Allele_Frequency
0          1        11                 7                64              71              0.901408
1          1        16                10                90             100              0.900000
2          2         9                79                86             165              0.521212
3          2        12                12                73              85              0.858824
4          2        67                54                96             150              0.640000
5          3         8                 0                 0               0              0.000000
6          4        99                86                28             114              0.245614
7          4       101                32                25              57              0.438596
8          4       115                97                16             113              0.141593
9          5         6                86                43             129              0.333333
10         5        19                59                27              86              0.313953

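I have already summed the allele counts and the coverage depth across df1 and df2, but I need to divide the summed Alt_Allele_Count by the summed Coverage_Depth to calculate the overall allele frequency (AF). When I tried converting the two variables to floats I got the error message "TypeError: float() argument must be a string or a number, not 'DataFrame'", and when I divided them directly I got this table: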

    Alt_Allele_Count  Coverage_Depth
0                NaN             NaN
1                NaN             NaN
2                NaN             NaN
3                NaN             NaN
4                NaN             NaN
5                NaN             NaN
6                NaN             NaN
7                NaN             NaN
8                NaN             NaN
9                NaN             NaN
10               NaN             NaN

My current code:

import csv
import pandas as pd
import numpy as np

df1 = pd.read_csv('C:/Users/Tom/Python_CW/file_pairA_1.csv')
df2 = pd.read_csv('C:/Users/Tom/Python_CW/file_pairA_2.csv')
print(df1)
print(df2)

Ref_Allele_Count = (df1[['Ref_Allele_Count']] + df2[['Ref_Allele_Count']])
print(Ref_Allele_Count)

Alt_Allele_Count = (df1[['Alt_Allele_Count']] + df2[['Alt_Allele_Count']])
print(Alt_Allele_Count)

Coverage_Depth = (df1[['Coverage_Depth']] + df2[['Coverage_Depth']]).astype(float)
print(Coverage_Depth)

AF = Alt_Allele_Count / Coverage_Depth

print(AF)

Answer 1 (fnvucqvd)

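The problem comes from the difference between a pandas Series and a DataFrame. A Series is a one-dimensional structure, like a single column, while a DataFrame is a two-dimensional object, like a table. Arithmetic between two Series aligns on the row index and returns a new Series of values. Arithmetic between two DataFrames also aligns on column labels, so dividing two single-column DataFrames whose column names differ gives NaN in every cell.
Selecting from a DataFrame can return either a Series or a DataFrame, depending on how you do it: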

df['column_name'] -> Series
df[['column_name', 'column_2']] -> DataFrame
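
A quick way to confirm this is to check the type and shape of each selection. The sketch below is only an illustration: the frame name df is made up, and its values are the first three rows of the first table in the question.

```python
import pandas as pd

# Small illustrative frame; values copied from the first rows of the question's data.
df = pd.DataFrame({
    "Alt_Allele_Count": [51, 95, 33],
    "Coverage_Depth": [58, 115, 42],
})

single = df["Alt_Allele_Count"]     # single brackets -> Series (1-D)
double = df[["Alt_Allele_Count"]]   # double brackets -> DataFrame (2-D, one column)

print(type(single).__name__, single.shape)
print(type(double).__name__, double.shape)
```

Note that double brackets return a DataFrame even when the list contains only one column name.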

So in this line:

Ref_Allele_Count = (df1[['Ref_Allele_Count']] + df2[['Ref_Allele_Count']])

df1[['Ref_Allele_Count']] is a single-column DataFrame rather than a Series. Use single brackets instead:

Ref_Allele_Count = (df1['Ref_Allele_Count'] + df2['Ref_Allele_Count'])

This should return the correct result. The same applies to the other columns you are adding together.
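
To see why the original division produced only NaN, it may help to reproduce it on a few rows. The sketch below uses two made-up frames, a and b, holding the first three rows of the question's tables (the names are assumptions for illustration). With double brackets the two operands are DataFrames that share no column label, so pandas has nothing to pair up; with single brackets the operands are Series and are matched on the row index only.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 (first three rows of the question's data).
a = pd.DataFrame({"Alt_Allele_Count": [51, 95, 33], "Coverage_Depth": [58, 115, 42]})
b = pd.DataFrame({"Alt_Allele_Count": [64, 90, 86], "Coverage_Depth": [71, 100, 165]})

# Double brackets: two single-column DataFrames with different column names.
alt_df = a[["Alt_Allele_Count"]] + b[["Alt_Allele_Count"]]
depth_df = a[["Coverage_Depth"]] + b[["Coverage_Depth"]]
bad = alt_df / depth_df   # columns do not match, so every cell is NaN
print(bad)

# Single brackets: two Series, aligned on the row index only.
alt = a["Alt_Allele_Count"] + b["Alt_Allele_Count"]
depth = a["Coverage_Depth"] + b["Coverage_Depth"]
af = alt / depth          # element-wise allele frequency
print(af)
```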

Answer 2 (disho6za)

When you select a single column from a pandas DataFrame, use one set of square brackets ('[]') rather than two; that resolves the problem.

import pandas as pd

df1 = pd.read_csv('C:/Users/Tom/Python_CW/file_pairA_1.csv')
df2 = pd.read_csv('C:/Users/Tom/Python_CW/file_pairA_2.csv')
print(df1)
print(df2)

# note that I changed your double brackets ([["col_name"]]) to single (["col_name"])
# this results in pd.Series objects instead of pd.DataFrame objects
Ref_Allele_Count = (df1['Ref_Allele_Count'] + df2['Ref_Allele_Count'])
print(Ref_Allele_Count)

Alt_Allele_Count = (df1['Alt_Allele_Count'] + df2['Alt_Allele_Count'])
print(Alt_Allele_Count)

Coverage_Depth = (df1['Coverage_Depth'] + df2['Coverage_Depth']).astype(float)
print(Coverage_Depth)

AF = Alt_Allele_Count / Coverage_Depth

print(AF)
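
Two details of the question's data may still need attention. One site has a Coverage_Depth of 0 in both files, and 0 / 0 in pandas gives NaN rather than the 0.000000 shown in the input tables. The code above also pairs rows by position in the file, which only works if both files list the same sites in the same order. The following is a hedged sketch of one way to handle both, not the only one: it matches rows on (Scaffold, Position) and fills the NaN with 0. The inline frames are stand-ins for the CSV files (rows 4 to 6 of the question's tables), and whether 0 is the right value for an uncovered site is an assumption you should check against your own analysis.

```python
import pandas as pd

# Hypothetical stand-ins for the two CSV files (rows 4-6 of the question's tables).
df1 = pd.DataFrame({
    "Scaffold": [2, 3, 4],
    "Position": [67, 8, 99],
    "Alt_Allele_Count": [98, 0, 26],
    "Coverage_Depth": [139, 0, 58],
})
df2 = pd.DataFrame({
    "Scaffold": [2, 3, 4],
    "Position": [67, 8, 99],
    "Alt_Allele_Count": [96, 0, 28],
    "Coverage_Depth": [150, 0, 114],
})

keys = ["Scaffold", "Position"]
cols = ["Alt_Allele_Count", "Coverage_Depth"]

# Index by site so rows are matched on (Scaffold, Position), not on row order.
total = df1.set_index(keys)[cols] + df2.set_index(keys)[cols]

# 0 / 0 yields NaN; fillna(0.0) is an assumption that mirrors the input tables.
total["Alt_Allele_Frequency"] = (
    total["Alt_Allele_Count"] / total["Coverage_Depth"]
).fillna(0.0)

print(total.reset_index())
```

If a site is present in only one file, the addition yields NaN for that site, which fillna(0.0) would then hide, so it is worth checking that both files cover the same sites before relying on the result.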
