Selecting pandas columns by dtype

Posted by z8dt9xmd on 2022-12-29 in Other


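I'd like to know whether there is an elegant, concise way to select columns of a pandas DataFrame by data type (dtype), i.e. to select only the int64 columns of a DataFrame.
To elaborate, something along the lines of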

df.select_columns(dtype=float64)


Source: https://stackoverflow.com/questions/21271581/selecting-pandas-columns-by-dtype

9 Answers


yvfmudvl1#

As of pandas 0.14.1 there is a select_dtypes method, so you can do this more elegantly and more generally.

In [11]: df = pd.DataFrame([[1, 2.2, 'three']], columns=['A', 'B', 'C'])

In [12]: df.select_dtypes(include=['int'])
Out[12]:
   A
0  1

To select all numeric types, use the NumPy dtype numpy.number:

In [13]: df.select_dtypes(include=[np.number])
Out[13]:
   A    B
0  1  2.2

In [14]: df.select_dtypes(exclude=[object])
Out[14]:
   A    B
0  1  2.2
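
The include and exclude arguments can also be combined. The snippet below is a minimal, self-contained sketch of that idea; the column names and values are made up for illustration and are not part of the answer above.

```python
import numpy as np
import pandas as pd

# Illustrative frame (hypothetical data) with explicit dtypes.
df = pd.DataFrame({
    "A": np.array([1, 2], dtype="int64"),
    "B": np.array([2.2, 3.3], dtype="float64"),
    "C": ["three", "four"],
    "D": [True, False],
})

# Every numeric column (bool is not a subclass of np.number).
numeric = df.select_dtypes(include=[np.number])

# Numeric columns, minus anything floating-point.
ints_only = df.select_dtypes(include=[np.number], exclude=[np.floating])

print(list(numeric.columns))
print(list(ints_only.columns))
```

Using the abstract NumPy types (np.number, np.floating) rather than concrete ones keeps the selection independent of the exact bit width of each column.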


chhkpiq42#

df.loc[:, df.dtypes == np.float64]
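
As a rough illustration of how this boolean mask behaves, the sketch below uses a small made-up frame (the column names x, y, z are assumptions). Note that the comparison matches float64 exactly, so a float32 column is not selected.

```python
import numpy as np
import pandas as pd

# Hypothetical data with three different dtypes.
df = pd.DataFrame({
    "x": np.array([1.0, 2.0], dtype="float64"),
    "y": np.array([1.0, 2.0], dtype="float32"),
    "z": np.array([1, 2], dtype="int64"),
})

# df.dtypes is a Series indexed by column name; comparing it to a dtype
# gives a boolean Series that .loc can use as a column mask.
mask = df.dtypes == np.float64
exact = df.loc[:, mask]

print(mask.to_dict())
print(list(exact.columns))
```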


6xfqseft3#

df.select_dtypes(include=[np.float64])


eqoofvh94#

I'd like to extend the existing answers by adding options for selecting all floating-point dtypes or all integer dtypes.
Demo:

np.random.seed(1234)

df = pd.DataFrame({
        'a':np.random.rand(3), 
        'b':np.random.rand(3).astype('float32'), 
        'c':np.random.randint(10,size=(3)).astype('int16'),
        'd':np.arange(3).astype('int32'), 
        'e':np.random.randint(10**7,size=(3)).astype('int64'),
        'f':np.random.choice([True, False], 3),
        'g':pd.date_range('2000-01-01', periods=3)
     })

This yields:

In [2]: df
Out[2]:
          a         b  c  d        e      f          g
0  0.191519  0.785359  6  0  7578569  False 2000-01-01
1  0.622109  0.779976  8  1  7981439   True 2000-01-02
2  0.437728  0.272593  0  2  2558462   True 2000-01-03

In [3]: df.dtypes
Out[3]:
a           float64
b           float32
c             int16
d             int32
e             int64
f              bool
g    datetime64[ns]
dtype: object

Select all floating-point columns:

In [4]: df.select_dtypes(include=['floating'])
Out[4]:
          a         b
0  0.191519  0.785359
1  0.622109  0.779976
2  0.437728  0.272593

In [5]: df.select_dtypes(include=['floating']).dtypes
Out[5]:
a    float64
b    float32
dtype: object

Select all integer columns:

In [6]: df.select_dtypes(include=['integer'])
Out[6]:
   c  d        e
0  6  0  7578569
1  8  1  7981439
2  0  2  2558462

In [7]: df.select_dtypes(include=['integer']).dtypes
Out[7]:
c    int16
d    int32
e    int64
dtype: object

Select all numeric columns:

In [8]: df.select_dtypes(include=['number'])
Out[8]:
          a         b  c  d        e
0  0.191519  0.785359  6  0  7578569
1  0.622109  0.779976  8  1  7981439
2  0.437728  0.272593  0  2  2558462

In [9]: df.select_dtypes(include=['number']).dtypes
Out[9]:
a    float64
b    float32
c      int16
d      int32
e      int64
dtype: object
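
The same string aliases should extend to the non-numeric columns of a frame like the one above. The following is a small sketch under that assumption, using fixed values instead of random ones so the result is reproducible; the data is illustrative only.

```python
import numpy as np
import pandas as pd

# Deterministic stand-in for the demo frame (hypothetical values).
df = pd.DataFrame({
    "a": np.array([0.1, 0.2, 0.3], dtype="float64"),
    "c": np.array([6, 8, 0], dtype="int16"),
    "f": [True, False, True],
    "g": pd.date_range("2000-01-01", periods=3),
})

bool_cols = df.select_dtypes(include=["bool"])          # boolean columns
datetime_cols = df.select_dtypes(include=["datetime"])  # datetime64 columns
non_numeric = df.select_dtypes(exclude=["number"])      # everything else

print(list(bool_cols.columns))
print(list(datetime_cols.columns))
print(list(non_numeric.columns))
```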


t30tvxxf5#

To include several types at once, pass a list of types, e.g. float64 and int64:

df_numeric = df.select_dtypes(include=[np.float64,np.int64])


bxjv4tth6#

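df.select_dtypes(include=[int])

Note: older code spells this np.int, which was only an alias for the built-in int and was removed in NumPy 1.24.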


lskq00tm7#

If you want to select the int64 columns and then update them in place, you can use:

int64_cols = [col for col in df.columns if is_int64_dtype(df[col].dtype)]
df[int64_cols]

For example, note how I set all int64 columns in df to zero below:

In [1]:

    import pandas as pd
    from pandas.api.types import is_int64_dtype

    df = pd.DataFrame({'a': [1, 2] * 3,
                       'b': [True, False] * 3,
                       'c': [1.0, 2.0] * 3,
                       'd': ['red','blue'] * 3,
                       'e': pd.Series(['red','blue'] * 3, dtype="category"),
                       'f': pd.Series([1, 2] * 3, dtype="int64")})

    int64_cols = [col for col in df.columns if is_int64_dtype(df[col].dtype)] 
    print('int64 Cols: ',int64_cols)

    print(df[int64_cols])

    df[int64_cols] = 0

    print(df[int64_cols]) 

Out [1]:

    int64 Cols:  ['a', 'f']

           a  f
        0  1  1
        1  2  2
        2  1  1
        3  2  2
        4  1  1
        5  2  2
           a  f
        0  0  0
        1  0  0
        2  0  0
        3  0  0
        4  0  0
        5  0  0

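FYI:

df.loc[] and df.select_dtypes() return a copy of a slice of the DataFrame. This means that if you try to update values through df.select_dtypes(), you will get a SettingWithCopyWarning and df itself will not be updated.

For example, note that nothing happens to df when I try to update it through columns selected with .loc[] or .select_dtypes():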

In [2]:

    import numpy as np

    df = pd.DataFrame({'a': [1, 2] * 3,
                       'b': [True, False] * 3,
                       'c': [1.0, 2.0] * 3,
                       'd': ['red','blue'] * 3,
                       'e': pd.Series(['red','blue'] * 3, dtype="category"),
                       'f': pd.Series([1, 2] * 3, dtype="int64")})

    df_bool = df.select_dtypes(include='bool')
    df_bool.b[0] = False

    print(df_bool.b[0])
    print(df.b[0])

    df.loc[:, df.dtypes == np.int64].a[0]=7
    print(df.a[0])

Out [2]:

    False
    True
    1
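
One possible way around the copy problem described above is to take only the column labels from select_dtypes and then assign through df itself. This is a sketch of that variation, not the answer's own code; the frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one int64, one bool and one float64 column.
df = pd.DataFrame({
    "a": np.array([1, 2, 3], dtype="int64"),
    "b": [True, False, True],
    "c": [1.0, 2.0, 3.0],
})

# Keep only the labels of the int64 columns, not the sliced copy.
int64_cols = df.select_dtypes(include=["int64"]).columns

# Assigning through df with those labels modifies df itself.
df[int64_cols] = 0

print(df)
```

Because the assignment targets df directly, no intermediate copy is written to and the other columns are left untouched.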


nkkqxpd98#

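Alternatively, if you don't want to create a subset of the DataFrame in the process, you can iterate over the column dtypes directly.
I haven't benchmarked the code below, but I assume it will be faster if you are working with very large datasets.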

[col for col in df.columns.tolist() if df[col].dtype not in ['object','<M8[ns]']]
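
A variation on the same idea is sketched below: it walks over df.dtypes (so no column data is touched) and uses pandas.api.types.is_numeric_dtype as the test instead of a hand-written list of dtype strings. The frame and its column names are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical frame mixing numeric, string and datetime columns.
df = pd.DataFrame({
    "a": np.array([1, 2, 3], dtype="int64"),
    "b": np.array([0.5, 1.5, 2.5], dtype="float64"),
    "c": ["x", "y", "z"],
    "d": pd.date_range("2000-01-01", periods=3),
})

# df.dtypes.items() yields (column name, dtype) pairs.
numeric_cols = [name for name, dtype in df.dtypes.items() if is_numeric_dtype(dtype)]

print(numeric_cols)
```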


ryevplcw9#

You can use:

for i in x.columns[x.dtypes == 'object']:
    print(i)

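if you only want to display the column names of a particular DataFrame rather than a sliced DataFrame. I don't know whether Python has a function for this.
PS: replace object with whichever dtype you want.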
