numpy: plotting a histogram for large data [duplicate]

Posted by 7fhtutme on 2023-10-19 in Other

This question already has answers here:

Efficient way to partially read large numpy file? (1 answer)
How do I read a large csv file with pandas? (16 answers)
Closed last month.
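I'm trying to plot a histogram of a large dataset (nearly 7 million points) in Python, because I want to know the frequency of each value. I tried the code below, but it takes far too long: more than an hour without finishing! Any suggestions?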

import numpy as np
import matplotlib.pyplot as plt

file_path = "D:/results/planarity2.txt" 
data_array = []

with open(file_path, "r") as file:
    for line in file:
        value = line.strip()  
        data_array.append(value)
column_values = data_array 

unique_values, counts = np.unique(column_values, return_counts=True)

value_frequency = dict(zip(unique_values, counts))

x_values = list(value_frequency.keys())
y_values = list(value_frequency.values())

plt.bar(x_values, y_values, edgecolor='black', alpha=0.7)

plt.xlabel('Column Values')
plt.ylabel('Frequency')
plt.title('Frequency of Points Based on Column Values')
plt.show()

I also tried this, but it didn't help:

import numpy as np
import matplotlib.pyplot as plt

file_path = "D:/results/planarity2.txt" 
data_array = []

with open(file_path, "r") as file:
    for line in file:
        value = line.strip()  
        data_array.append(value)
column_values = data_array 
value_frequency = {}

for value in column_values:
    if value in value_frequency:
        value_frequency[value] += 1
    else:
        value_frequency[value] = 1

x_values = list(value_frequency.keys())
y_values = list(value_frequency.values())

plt.bar(x_values, y_values, edgecolor='black', alpha=0.7)

plt.xlabel('Column Values')
plt.ylabel('Frequency')
plt.title('Frequency of Points Based on Column Values')
plt.show()
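A small illustration of what both attempts above actually count, using made-up sample lines in place of planarity2.txt (an assumption: the real file's contents are not shown, only that it holds one value per line). Each line is kept as the raw string from line.strip(), so "1" and "1.0" are different keys and the keys sort as text rather than numerically. Converting to float first merges numerically equal values. Keeping text keys probably also contributes to the slowness: matplotlib treats string x-values passed to plt.bar as categories, one per distinct string. The function names below are hypothetical.

```python
import numpy as np

# Hypothetical sample lines; the real file is assumed to hold one number per line.
SAMPLE_LINES = ["1", "1.0", "10", "9", "2.50", "2.5"]


def count_as_strings(lines):
    """Count distinct raw strings, which is effectively what both attempts do."""
    return np.unique(lines, return_counts=True)


def count_as_numbers(lines):
    """Convert to float64 first, then count distinct numeric values."""
    return np.unique(np.asarray(lines, dtype=float), return_counts=True)


if __name__ == "__main__":
    for label, (values, counts) in [
        ("strings", count_as_strings(SAMPLE_LINES)),
        ("floats", count_as_numbers(SAMPLE_LINES)),
    ]:
        print(label, dict(zip(values.tolist(), counts.tolist())))
```

The string version reports six distinct values in text order ("1", "1.0", "10", "2.5", "2.50", "9"), while the float version reports four: 1.0 and 2.5 twice each, 9.0 and 10.0 once.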
Answer #1 (dtcbnfnu)

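I think your main problem is that you seem to be reading the file and leaving the values as strings, rather than converting them to numbers and holding them in a NumPy array (assuming your values are just numbers?). Having 7 million data points should not be a particular problem. The first thing to try is reading the file with NumPy's loadtxt function, which converts the values to floats as it reads them and returns a NumPy array. For example, instead of: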

file_path = "D:/results/planarity2.txt" 
data_array = []

with open(file_path, "r") as file:
    for line in file:
        value = line.strip()  
        data_array.append(value)
column_values = data_array

simply use:

file_path = "D:/results/planarity2.txt"
column_values = np.loadtxt(file_path)

and see if that helps.
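A slightly fuller sketch of the loadtxt approach, assuming the file holds one number per line with no header (the question does not show the format). Everything stays in NumPy arrays. np.unique(..., return_counts=True) gives exact per-value frequencies, and np.histogram / plt.hist group values into equal-width bins. Binning is usually what a histogram of ~7 million floating-point values needs, since nearly every value may be distinct and one bar per value would mean millions of bars. The helper names (load_values, exact_frequencies, binned_frequencies, plot_histogram) and the default of 100 bins are illustrative choices, not part of the original answer.

```python
import io

import numpy as np


def load_values(source):
    """Parse one number per line into a 1-D float64 array.

    `source` can be a path such as "D:/results/planarity2.txt" or a file-like
    object. ndmin=1 keeps the result 1-D even if the file has a single line.
    """
    return np.loadtxt(source, dtype=float, ndmin=1)


def exact_frequencies(values):
    """Return (distinct values, counts), computed by NumPy without a Python loop."""
    return np.unique(values, return_counts=True)


def binned_frequencies(values, bins=100):
    """Return (counts, bin_edges) for `bins` equal-width bins spanning the data."""
    return np.histogram(values, bins=bins)


def plot_histogram(values, bins=100):
    """Draw the binned histogram; matplotlib is imported only when plotting."""
    import matplotlib.pyplot as plt

    plt.hist(values, bins=bins, edgecolor="black", alpha=0.7)
    plt.xlabel("Column Values")
    plt.ylabel("Frequency")
    plt.title("Frequency of Points Based on Column Values")
    plt.show()


if __name__ == "__main__":
    # In-memory stand-in for the real file, so the example runs anywhere.
    data = load_values(io.StringIO("0.5\n0.5\n1.25\n3.0\n"))
    print(exact_frequencies(data))
    print(binned_frequencies(data, bins=2))
```

On the real data the call would be `plot_histogram(load_values("D:/results/planarity2.txt"))`; adjust `bins` to the resolution you need.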
