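I am trying to compute joint probabilities for a CSV file and write a new CSV file with an extra column holding the joint probability. The problem is that my CSV file contains some NaN values, and I want the rows containing NaN to get a joint probability as well. My input CSV file looks like this (this is only a subset):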
Age,Salary
84.0,74198.0
25.5,57881.5
41.0,NaN
57.0,NaN
54.0,40286.0
The CSV output file currently looks like this:
Age,Salary,JointProbability
84.0,74198.0,0.04000000000000001
25.5,57881.5,0.04000000000000001
41.0,,0.0
57.0,,0.0
54.0,40286.0,0.04000000000000001
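Desired CSV output (the probabilities shown here are random; their sum must be 1). I also want the extra probability column to be named P(column names of the input CSV file), so the name changes depending on the input file. I also want NaN to be written out, rather than left blank as in the previous file.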
Age,Salary,P(Age,Salary)
84.0,74198.0,0.3
25.5,57881.5,0.1
41.0,NaN,0.1
57.0,NaN,0.2
54.0,40286.0,0.3
Code:
import pandas as pd
# Read the CSV file into a DataFrame
df = pd.read_csv('random_sampled_data.csv')
# Calculate joint probabilities
joint_probabilities = []
total_count = len(df)
# Calculate joint probability for each row
for _, row in df.iterrows():
    # Product of the per-column relative frequencies; this is what yields
    # the 0.2 * 0.2 = 0.04 values in the output above
    joint_probability = 1.0
    for column in df.columns:
        # NaN == NaN is False, so rows containing NaN get a probability of 0.0
        probability = len(df[df[column] == row[column]]) / total_count
        joint_probability *= probability
    joint_probabilities.append(joint_probability)
# Add joint probabilities as a new column to the DataFrame
df['JointProbability'] = joint_probabilities
# Save the updated DataFrame to a new CSV file
df.to_csv('output_with_joint_probabilities.csv', index=False)
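2 Answers
oxiaedzo #1
Create a new column containing the probabilities, taking rows with NaN values into account, and compute the joint probability for the CSV file.
I modified your code as follows: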
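A minimal sketch of one way to do this, not necessarily the code originally posted with this answer. It assumes that "joint probability" means the relative frequency of each combination of column values, and that NaN should count as an ordinary value. The function name `add_joint_probability` and the inline `csv_text` are hypothetical and used only for illustration.

```python
import io

import pandas as pd


def add_joint_probability(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with an extra P(col1,col2,...) column."""
    cols = list(df.columns)
    # astype(str) turns NaN into the string "nan", so rows with missing
    # values get a normal key and compare equal to each other.
    keys = df[cols].astype(str).apply("|".join, axis=1)
    # Number of rows sharing each key, aligned back to the original rows.
    counts = keys.map(keys.value_counts())
    out = df.copy()
    # Column name is built from the input columns, e.g. "P(Age,Salary)".
    out[f"P({','.join(cols)})"] = counts / len(df)
    return out


# Hypothetical inline copy of the sample data from the question.
csv_text = """Age,Salary
84.0,74198.0
25.5,57881.5
41.0,NaN
57.0,NaN
54.0,40286.0
"""

df = pd.read_csv(io.StringIO(csv_text))
result = add_joint_probability(df)
# na_rep="NaN" writes missing values as NaN instead of an empty field.
csv_out = result.to_csv(index=False, na_rep="NaN")
print(csv_out)
```

With this definition the probabilities sum to 1 over the distinct value combinations; in the sample every row is unique, so each row gets 0.2. Because the new column name contains a comma, `to_csv` puts it in quotes in the header line.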
zwghvu4y #2
Here is another answer:
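One possible alternative, sketched under assumptions rather than taken from the original post: count the value combinations with `DataFrame.value_counts(normalize=True, dropna=False)` and merge the result back onto the rows. This assumes pandas 1.3 or newer (where `DataFrame.value_counts` accepts `dropna`) and relies on `merge` matching NaN keys against each other. The function name `joint_probability_by_merge` is hypothetical.

```python
import io

import pandas as pd


def joint_probability_by_merge(df: pd.DataFrame) -> pd.DataFrame:
    """Attach the relative frequency of each row's value combination."""
    cols = list(df.columns)
    name = f"P({','.join(cols)})"
    # One row per distinct combination; dropna=False keeps combinations
    # that contain NaN, normalize=True turns counts into proportions.
    probs = (
        df.value_counts(normalize=True, dropna=False)
        .rename(name)
        .reset_index()
    )
    # A left merge keeps the original row order; NaN keys match NaN keys.
    return df.merge(probs, on=cols, how="left")


# Hypothetical inline copy of the sample data from the question.
csv_text = """Age,Salary
84.0,74198.0
25.5,57881.5
41.0,NaN
57.0,NaN
54.0,40286.0
"""

df = pd.read_csv(io.StringIO(csv_text))
merged = joint_probability_by_merge(df)
print(merged.to_csv(index=False, na_rep="NaN"))
```

Compared with building string keys, this keeps the original dtypes throughout, at the cost of depending on a newer pandas version.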