numpy: Preserving dtypes when extracting a row from a pandas DataFrame

Posted by uwopmtnx 12 months ago in Other


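Extracting a single row from a pandas DataFrame (e.g. using .loc or .iloc) yields a pandas Series. However, when the data is heterogeneous (i.e. the DataFrame's columns are not all the same dtype), all the values from the different columns in the row are coerced to a single dtype, because a Series can only have one dtype. Here is a simple example to show what I mean: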

import numpy
import pandas

a = numpy.arange(5, dtype='i8')
b = numpy.arange(5, dtype='u8')**2
c = numpy.arange(5, dtype='f8')**3
df = pandas.DataFrame({'a': a, 'b': b, 'c': c})
df.dtypes
# a      int64
# b     uint64
# c    float64
# dtype: object
df
#    a   b     c
# 0  0   0   0.0
# 1  1   1   1.0
# 2  2   4   8.0
# 3  3   9  27.0
# 4  4  16  64.0
df.loc[2]
# a    2.0
# b    4.0
# c    8.0
# Name: 2, dtype: float64

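All values in df.loc[2] have been converted to float64.
Is there a good way to extract a row without incurring this type conversion? I could imagine, for example, returning a numpy structured array, but I don't see a hassle-free way of creating such an array.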

numpy

Source: https://stackoverflow.com/questions/62647887/preserving-dtypes-when-extracting-a-row-from-a-pandas-dataframe

3 Answers


v8wbuo2f1#

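Another approach (though it feels slightly hacky):
Instead of using an integer with loc or iloc, you can use a slice of length 1. This returns a DataFrame of length 1, so iloc[0] on each column gives you your data. For example: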

In[1] : row2 = df[2:2+1]
In[2] : type(row2)
Out[2]: pandas.core.frame.DataFrame
In[3] : row2.dtypes
Out[3]: 
a      int64
b     uint64
c    float64
In[4] : a2 = row2.a.iloc[0]
In[5] : type(a2)
Out[5]: numpy.int64
In[6] : c2 = row2.c.iloc[0]
In[7] : type(c2)
Out[7]: numpy.float64

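To me this feels preferable to converting the data types twice (once during row extraction and again afterwards), and clearer than referring to the original DataFrame multiple times with the same row specification (which could be computationally expensive).
I think it would be better if pandas had a DataFrameRow type to handle this case.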
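
The length-1 slice idea can be wrapped in a small helper. This is only a sketch: row_scalars is a hypothetical name, not a pandas API, and it uses the explicit positional form df.iloc[pos:pos + 1] rather than df[2:2+1].

```python
import numpy as np
import pandas as pd

def row_scalars(frame, pos):
    """Hypothetical helper: return {column: scalar} for the row at position pos."""
    one_row = frame.iloc[pos:pos + 1]  # length-1 DataFrame, column dtypes are kept
    return {col: one_row[col].iloc[0] for col in one_row.columns}

df = pd.DataFrame({
    'a': np.arange(5, dtype='i8'),
    'b': np.arange(5, dtype='u8') ** 2,
    'c': np.arange(5, dtype='f8') ** 3,
})

row2 = row_scalars(df, 2)
# Each value is a NumPy scalar carrying the dtype of its source column.
row2_dtypes = {col: value.dtype for col, value in row2.items()}
```

Each scalar is read from its own column, so no common dtype has to be found across the row.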

Posted 12 months ago

sirbozc52#

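As you have already realized, a Series does not allow mixed dtypes. However, it does allow mixed data types if its dtype is object. So you can convert the DataFrame's dtypes to object. Every column will then have dtype object, but each value still keeps its own data type (int or float):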

df1 = df.astype('O')

Out[10]:
   a   b   c
0  0   0   0
1  1   1   1
2  2   4   8
3  3   9  27
4  4  16  64

In [12]: df1.loc[2].map(type)
Out[12]:
a      <class 'int'>
b      <class 'int'>
c    <class 'float'>
Name: 2, dtype: object

Otherwise, you need to convert the DataFrame to an np.recarray:

n_recs = df.to_records(index=False)

Out[22]:
rec.array([(0,  0,  0.), (1,  1,  1.), (2,  4,  8.), (3,  9, 27.),
           (4, 16, 64.)],
          dtype=[('a', '<i8'), ('b', '<u8'), ('c', '<f8')])
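
As a sketch of how a single row could then be pulled out of the record array (the variable names here are illustrative, not from the answer): indexing the result of to_records with an integer gives one record whose fields keep their per-column dtypes.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'a': np.arange(5, dtype='i8'),
    'b': np.arange(5, dtype='u8') ** 2,
    'c': np.arange(5, dtype='f8') ** 3,
})

records = df.to_records(index=False)  # structured record array, one field per column
row = records[2]                      # a single record

# Look up each field by name; every field keeps its own dtype.
row_dtypes = {name: row[name].dtype for name in records.dtype.names}
```

Fields are accessed by name (row['a']), so this behaves like the structured-array row the question asks about.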


Posted 12 months ago

elcex8rz3#

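As noted in the official documentation, passing a list of labels to .loc (double brackets, e.g. df.loc[[2]]) returns a DataFrame rather than a Series. This preserves the column dtypes. Using the original example: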

>>> import numpy
>>> import pandas
>>> a = numpy.arange(5, dtype='i8')
>>> b = numpy.arange(5, dtype='u8')**2
>>> c = numpy.arange(5, dtype='f8')**3
>>> df = pandas.DataFrame({'a': a, 'b': b, 'c': c})
>>> df.dtypes
a      int64
b     uint64
c    float64
dtype: object

>>> df
   a   b     c
0  0   0   0.0
1  1   1   1.0
2  2   4   8.0
3  3   9  27.0
4  4  16  64.0

>>> df.loc[[2]]
   a  b    c
2  2  4  8.0

>>> df.loc[[2]].dtypes
a      int64
b     uint64
c    float64
dtype: object

>>> df.loc[[2]].iloc[0].name 
2


Posted 12 months ago
