Error when calculating average scores with NumPy: "ufunc 'add' did not contain a loop"

n7taea2i, posted 12 months ago in Other

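I ran into a problem while trying to compute students' average scores with NumPy. My code produces the following error:
Traceback (most recent call last): average_scores = np.nanmean(numeric_columns, axis=1) ... numpy.core._exceptions._UFuncNoLoopError: ufunc 'add' did not contain a loop with signature matching types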

Code

import numpy as np

# Defining anything that could be missing in someone else's data
missing_values = ['N/A', 'NA', 'nan', 'NaN', 'NULL', '', '']

# Defining each of the data types
dtype = [('Student Name', 'U50'), ('Math', 'float'), 
         ('Science', 'float'), ('English', 'float'), 
         ('History', 'float'), ('Art', 'float')]

# Load data into a numpy array
data = np.genfromtxt('grades.csv', delimiter=',', 
                     names=True, dtype=dtype,
                     encoding=None, missing_values=missing_values,
                     filling_values=np.nan, ndmin=2)

# Get all the field names (column names) in the structured array
field_names = data.dtype.names

# Extract the numeric columns by checking their data type
numeric_columns = data[[field for field in field_names if data[field].dtype == float]]

# Calculate the average score for each student
average_scores = np.nanmean(numeric_columns, axis=1)

print(average_scores)


Here is the data in my "grades.csv" file:

Student Name,Math,Science,English,History,Art
Alice,90,88,94,85,78
Bob, 85,92,,88,90
Charlie,78,80,85,85,79
David,94,,90,92,84
Eve,92,88,92,90,88
Frank,,95,94,86,95

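What I tried: I loaded the data, filtered out the numeric columns, and used np.nanmean() to compute the average scores. I also made sure missing values are handled appropriately.
Expected: the code computes and prints each student's average score without errors.
Request for help: I would appreciate help understanding what causes the error and how to fix it.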

wz1wpwve #1

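np.nanmean() is the right function here, because it ignores NaN values (see its documentation).
In your example, numeric_columns is a heterogeneous (multi-type) array, that is, a structured array whose elements are records rather than plain floats. You can solve this by converting it to a homogeneous (single-type) array with array.astype().
Try this: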
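A minimal sketch of that conversion, assuming a small hand-made array in place of your numeric_columns (the field names and values are illustrative only). It uses numpy.lib.recfunctions.structured_to_unstructured rather than astype(); that is a swapped-in technique, chosen because it is documented for multi-field structured dtypes.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Hypothetical stand-in for numeric_columns: two records, three float fields.
numeric_columns = np.array(
    [(90.0, 88.0, 94.0), (85.0, np.nan, 92.0)],
    dtype=[('Math', 'f8'), ('Science', 'f8'), ('English', 'f8')],
)

# Turn the records into a plain 2-D float array (one row per student).
plain = rf.structured_to_unstructured(numeric_columns)

# nanmean now has an ordinary float64 array to reduce, skipping the NaN.
average_scores = np.nanmean(plain, axis=1)
print(average_scores)
```

If your array keeps the extra length-1 axis from ndmin=2, that axis would need to be dropped first (for example with numeric_columns[:, 0]).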



xurqigkl #2

Using your sample text:

In [2]: txt='''Student Name,Math,Science,English,History,Art
   ...: Alice,90,88,94,85,78
   ...: Bob, 85,92,,88,90
   ...: Charlie,78,80,85,85,79
   ...: David,94,,90,92,84
   ...: Eve,92,88,92,90,88
   ...: Frank,,95,94,86,95'''

Your handling of genfromtxt is as good as any I've seen on SO:

In [3]: # Defining anything that could be missing in someone else's data
   ...: missing_values = ['N/A', 'NA', 'nan', 'NaN', 'NULL', '', '']
   ...: 
   ...: # Defining each of the data types
   ...: dtype = [('Student Name', 'U50'), ('Math', 'float'), 
   ...:          ('Science', 'float'), ('English', 'float'), 
   ...:          ('History', 'float'), ('Art', 'float')]
   ...: 
   ...: # Load data into a numpy array
   ...: data = np.genfromtxt(txt.splitlines(), delimiter=',', 
   ...:                      names=True, dtype=dtype,
   ...:                      encoding=None, missing_values=missing_values,
   ...:                      filling_values=np.nan, ndmin=2)


data is a structured array; IPython displays its repr, so the dtype is shown:

In [4]: data
Out[4]: 
array([[('Alice', 90., 88., 94., 85., 78.)],
       [('Bob', 85., 92., nan, 88., 90.)],
       [('Charlie', 78., 80., 85., 85., 79.)],
       [('David', 94., nan, 90., 92., 84.)],
       [('Eve', 92., 88., 92., 90., 88.)],
       [('Frank', nan, 95., 94., 86., 95.)]],
      dtype=[('Student_Name', '<U50'), ('Math', '<f8'), ('Science', '<f8'), ('English', '<f8'), ('History', '<f8'), ('Art', '<f8')])

In [5]: field_names = data.dtype.names
   ...: # Extract the numeric columns by checking their data type
   ...: numeric_columns = data[[field for field in field_names if data[field].dtype == float]]

In [6]: numeric_columns
Out[6]: 
array([[(90., 88., 94., 85., 78.)],
       [(85., 92., nan, 88., 90.)],
       [(78., 80., 85., 85., 79.)],
       [(94., nan, 90., 92., 84.)],
       [(92., 88., 92., 90., 88.)],
       [(nan, 95., 94., 86., 95.)]],
      dtype={'names': ['Math', 'Science', 'English', 'History', 'Art'], 'formats': ['<f8', '<f8', '<f8', '<f8', '<f8'], 'offsets': [200, 208, 216, 224, 232], 'itemsize': 240})


This has shape (6,1), with 5 fields. The size-1 dimension is there because you specified ndmin=2. Without it, the shape would be (6,).
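To illustrate the shape point, here is a small sketch that loads a shortened, made-up version of the sample text without ndmin=2. The three-row text and the reduced column list are assumptions for brevity; the genfromtxt arguments otherwise follow the question.

```python
import numpy as np

# Shortened, hypothetical sample: a name column and two score columns.
txt = '''Student Name,Math,Science
Alice,90,88
Bob,85,
Charlie,78,80'''

dtype = [('Student Name', 'U50'), ('Math', 'float'), ('Science', 'float')]

# Same kind of call as in the question, but without ndmin=2.
data = np.genfromtxt(txt.splitlines(), delimiter=',', names=True,
                     dtype=dtype, encoding=None, filling_values=np.nan)

print(data.shape)     # (3,): one record per student, no extra axis
print(data['Math'])   # selecting a single field gives a plain float array
```

With the 1-D result, later conversions no longer need the [:, 0] indexing step.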
nanmean can't handle this compound dtype. There are several ways to convert it to a simple float array. astype/view may work, and so does:

In [7]: x=np.array(numeric_columns.tolist())
In [8]: x.shape
Out[8]: (6, 1, 5)
In [10]: np.nanmean(x[:,0,:], axis=1)
Out[10]: array([87.  , 88.75, 81.4 , 90.  , 90.  , 92.5 ])


Another converter:

In [12]: import numpy.lib.recfunctions as rf
In [13]: y=rf.structured_to_unstructured(numeric_columns[:,0])
In [14]: y
Out[14]: 
array([[90., 88., 94., 85., 78.],
       [85., 92., nan, 88., 90.],
       [78., 80., 85., 85., 79.],
       [94., nan, 90., 92., 84.],
       [92., 88., 92., 90., 88.],
       [nan, 95., 94., 86., 95.]])
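The view route mentioned earlier can be sketched as follows. This is a sketch under assumptions, not the only way to do it: the two-row array is made up, and rf.repack_fields is added here to drop the padding left behind by multi-field indexing (the offsets and itemsize visible in the repr above), which a direct view would otherwise pick up.

```python
import numpy as np
import numpy.lib.recfunctions as rf

# Hypothetical stand-in for data: a name field plus two float fields.
data = np.array(
    [('Alice', 90.0, 88.0), ('Bob', 85.0, np.nan)],
    dtype=[('Student_Name', 'U50'), ('Math', 'f8'), ('Science', 'f8')],
)

# Multi-field indexing keeps the original record layout, including
# the bytes that belong to the name field.
numeric_columns = data[['Math', 'Science']]

# Copy into a packed dtype so each record holds exactly two float64 values.
packed = rf.repack_fields(numeric_columns)

# Reinterpret the packed records as float64 and restore one row per student.
z = packed.view(np.float64).reshape(len(packed), -1)

print(np.nanmean(z, axis=1))
```

Compared with structured_to_unstructured, this only works when every selected field has the same dtype, so it is the more fragile of the two.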
