csv Creating a function to generate a set of Pearson correlations for a data file, without using libraries

6qftjkof  posted on 2023-01-15  in  Other

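I am trying to develop a function that generates the Pearson correlation coefficient for every pair of columns in a CSV dataset. The function needs to return a list of tuples, each containing the two column names followed by the Pearson correlation coefficient. However, I cannot use any external libraries, so I cannot import things like the csv reader or NumPy.
This is what I have tried so far, but I am struggling to see how to continue from here or how to approach it differently.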

def gen_pearson(file_name):
    col_names = data [0]
#create list of tuples of pairs of columns
    col_pairs = [(col_names[i], col_names[j]) for i in range(len(col_names)) for j in range (i+1, len(col_names))]

#calculate pearson correlation coefficient for each pair columns   
coefficients = []
for pair in col_pairs:
    col_1 = [row[pair[0]] for row in data[1:]]
    col_2 = [row[pair[1]] for row in data[1:]]

coefficient = sum((a - mean_col_1) * (b - mean_col_2) for (a,b) in zip (col_1, col_2)) / len (col_1)
stdev_x = (sum((a - mean_col_1) **2 for a in col_1)/len(col_1)) **0.5
stdev_y = (sum((b - mean_col_2) **2 for b in col_2)/len(col_2)) **0.5
pearson_result = cov/(stdev_x * stdev_y)

8yparm6h  1#

def gen_pearson(file_name):
    # Read the data from the file
    with open(file_name, 'r') as f:
        data = [row.split(',') for row in f.read().split('\n') if row]

    col_names = data[0]
    # Create a list of index pairs (i, j) with i < j, one per pair of columns
    col_pairs = [(i, j) for i in range(len(col_names)) for j in range(i + 1, len(col_names))]

    coefficients = []
    for i, j in col_pairs:
        # Each row is a list, so index it by column position, not by column name
        col_1 = [float(row[i]) for row in data[1:]]
        col_2 = [float(row[j]) for row in data[1:]]
        mean_col_1 = sum(col_1) / len(col_1)
        mean_col_2 = sum(col_2) / len(col_2)

        # Calculate the (population) covariance
        cov = sum((a - mean_col_1) * (b - mean_col_2) for (a, b) in zip(col_1, col_2)) / len(col_1)
        # Calculate the (population) standard deviations
        stdev_x = (sum((a - mean_col_1) ** 2 for a in col_1) / len(col_1)) ** 0.5
        stdev_y = (sum((b - mean_col_2) ** 2 for b in col_2) / len(col_2)) ** 0.5
        # Calculate the Pearson correlation coefficient
        # (raises ZeroDivisionError if either column is constant)
        pearson_result = cov / (stdev_x * stdev_y)
        coefficients.append((col_names[i], col_names[j], pearson_result))
    return coefficients
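
For reference, the quantity the loop computes for two columns $x$ and $y$ with $n$ rows can be written as below. This is the population form (dividing by $n$); the symbols $\bar{x}$ and $\bar{y}$ are notation introduced here for the column means, not names from the code.

```latex
r_{xy}
  = \frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
         {\sqrt{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2}\;
          \sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i,\quad
\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i
```

The factors of $\frac{1}{n}$ cancel between numerator and denominator, so dividing by $n - 1$ instead would give the same coefficient.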

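You need to be careful with data types: the code above assumes every value is numeric and converts it with float(). If a column holds anything else, you will need to convert it accordingly.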
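
One possible way to deal with that, as a rough sketch rather than a definitive implementation: skip any column that cannot be converted with float(), and return None instead of dividing by zero when a column is constant. The helper names `to_floats`, `pearson` and `pearson_pairs` are hypothetical and not part of the code above, and the sketch assumes the rows have already been split into lists of strings and that column names are unique.

```python
def to_floats(values):
    """Return the values as floats, or None if any value is not numeric."""
    try:
        return [float(v) for v in values]
    except ValueError:
        return None


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of floats.

    Returns None when there is no data or when either list is constant,
    because the coefficient is undefined in those cases.
    """
    n = len(xs)
    if n == 0:
        return None
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n factors cancel, so plain sums are enough here.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


def pearson_pairs(rows):
    """rows: list of lists of strings; rows[0] holds the column names."""
    names = [name.strip() for name in rows[0]]
    columns = {}
    for i, name in enumerate(names):
        floats = to_floats([row[i] for row in rows[1:]])
        if floats is not None:  # non-numeric columns are skipped
            columns[name] = floats
    numeric = list(columns)
    return [
        (a, b, pearson(columns[a], columns[b]))
        for k, a in enumerate(numeric)
        for b in numeric[k + 1:]
    ]


# Made-up sample data, for illustration only.
rows = [
    ["height", "weight", "label"],
    ["1", "2", "a"],
    ["2", "4", "b"],
    ["3", "6", "c"],
]
result = pearson_pairs(rows)
```

With this sample, the `label` column is skipped because its values are not numeric, leaving a single pair, `height` and `weight`. Whether to skip such columns silently or raise an error is a design choice that depends on your data.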
