Why do I need to convert with to_numpy(), otherwise loc assignment does not work?

Posted by new9mtju on 2023-10-19 in Other


I am using this CSV:

import pandas as pd
import numpy as np

real_estate = pd.read_csv('real_estate.csv',index_col=0)

buckets = pd.cut(real_estate['X2 house age'],4,labels=False)

for i in range(len(real_estate['X2 house age'])):
    real_estate.loc[i,'X2 house age'] = buckets[i]

It gives me:

KeyError: 0

on the line real_estate.loc[i,'X2 house age'] = buckets[i], and it fails right on the first iteration.
Why do I need to change the line to buckets = pd.cut(real_estate['X2 house age'],4,labels=False).to_numpy() to make it work?
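A minimal reproduction of the KeyError, assuming (as the first answer's output suggests) that index_col=0 gives an integer index starting at 1. The four ages below are made up; only the shape of the index matters:

```python
import pandas as pd

# Hypothetical stand-in for real_estate.csv: a 1-based integer index,
# made-up ages.
df = pd.DataFrame(
    {"X2 house age": [32.0, 19.5, 13.3, 5.0]},
    index=pd.Index([1, 2, 3, 4], name="No"),
)
buckets = pd.cut(df["X2 house age"], 4, labels=False)

error = None
try:
    # [] on a Series with an integer index is a *label* lookup,
    # so this asks for the row labelled 0, which does not exist.
    buckets[0]
except KeyError as exc:
    error = exc

print(repr(error))
```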

Tags: numpy

Source: https://stackoverflow.com/questions/77035987/why-do-i-need-to-convert-to-numpy-otherwise-loc-assignment-does-not-work

2 answers


Answer by 66bbxpm51

You don't need a loop, just use:

real_estate['X2 house age'] = pd.cut(real_estate['X2 house age'], 4, labels=False)

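Your current approach fails because your index is not a range index starting at 0: with index_col=0 the index starts at 1. With an integer index, buckets[i] is a label lookup, so buckets[0] asks for label 0, which does not exist, hence KeyError: 0. And when you assign to labels 0, 1, ... with .loc (as in the to_numpy() version), pandas does not find the matching rows and the data ends up shifted.
Output: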

X1 transaction date  X2 house age  X3 distance to the nearest MRT station  X4 number of convenience stores  X5 latitude  X6 longitude  Y house price of unit area
No                                                                                                                                                                   
1              2012.917             2                                84.87882                               10     24.98298     121.54024                        37.9
2              2012.917             1                               306.59470                                9     24.98034     121.53951                        42.2
3              2013.583             1                               561.98450                                5     24.98746     121.54391                        47.3
4              2013.500             1                               561.98450                                5     24.98746     121.54391                        54.8
5              2012.833             0                               390.56840                                5     24.97937     121.54245                        43.1
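The explanation above can be made concrete with a small sketch on a hypothetical four-row stand-in for the CSV (1-based index, made-up ages). The question's loop with to_numpy() runs without an error, but it writes each code into the wrong row, whereas the one-liner matches rows by label:

```python
import pandas as pd


def make_frame():
    # Hypothetical stand-in for real_estate.csv: 1-based "No" index, made-up ages.
    return pd.DataFrame(
        {"X2 house age": [32.0, 19.5, 13.3, 5.0]},
        index=pd.Index([1, 2, 3, 4], name="No"),
    )


# The question's loop with .to_numpy(): codes[i] is position i,
# but .loc[i, ...] is the row *labelled* i.
looped = make_frame()
codes = pd.cut(looped["X2 house age"], 4, labels=False).to_numpy()
for i in range(len(looped["X2 house age"])):
    looped.loc[i, "X2 house age"] = codes[i]

# The one-liner: assigning a Series to a column aligns on index labels.
direct = make_frame()
direct["X2 house age"] = pd.cut(direct["X2 house age"], 4, labels=False)

print(looped)
print(direct)
```

In looped, .loc appends a new row labelled 0 holding the code for label 1, labels 1 to 3 each hold the code of the following row, and label 4 keeps its raw age because no iteration reaches it. direct holds 3, 2, 1, 0 on labels 1 to 4.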

2023-10-19

Answer by vs3odd8k2

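Apart from the fact that we can assign the result directly to the column, the main problem is a confusion between positional and label-based indexing.
You should either iterate over real_estate.index, or use .iloc or .iat to address the data by position: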

# labeled indexing
for i in real_estate.index:
    real_estate.loc[i,'X2 house age'] = buckets[i]

or

# positional indexing
pos_house_age = real_estate.columns.get_loc('X2 house age')
for i in range(len(real_estate)):
    real_estate.iloc[i, pos_house_age] = buckets.iloc[i]

where

buckets = pd.cut(real_estate['X2 house age'], 4, labels=False)
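Both loops can be checked against each other on a small hypothetical stand-in frame (1-based index, made-up ages). This sketch uses .iat, which the answer mentions as an alternative to .iloc, for the positional version:

```python
import pandas as pd

# Hypothetical stand-in for real_estate.csv: 1-based "No" index, made-up ages.
df = pd.DataFrame(
    {"X2 house age": [32.0, 19.5, 13.3, 5.0]},
    index=pd.Index([1, 2, 3, 4], name="No"),
)
buckets = pd.cut(df["X2 house age"], 4, labels=False)

# Labeled indexing: walk the actual index labels (1, 2, 3, 4) on both sides.
by_label = df.copy()
for i in by_label.index:
    by_label.loc[i, "X2 house age"] = buckets[i]

# Positional indexing: walk positions 0..n-1 on both sides with .iat.
by_position = df.copy()
pos_house_age = by_position.columns.get_loc("X2 house age")
for i in range(len(by_position)):
    by_position.iat[i, pos_house_age] = buckets.iat[i]

print(by_label.equals(by_position))
```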

Using .to_numpy() discards the labeled index, after which buckets[i] is effectively positional indexing.
See also:
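A short sketch of that difference, using a hypothetical buckets Series with a 1-based index like the CSV's:

```python
import pandas as pd

# Hypothetical bucket codes indexed by a 1-based "No" index.
buckets = pd.Series([3, 2, 1, 0], index=pd.Index([1, 2, 3, 4], name="No"))
arr = buckets.to_numpy()  # plain ndarray: the "No" labels are dropped

by_array = arr[0]          # ndarray indexing is always positional
by_iloc = buckets.iloc[0]  # explicit positional access on the Series
by_label = buckets[1]      # [] on an integer index looks up the label 1

print(by_array, by_iloc, by_label)  # 3 3 3
```

So after to_numpy(), arr[i] means "the i-th row", while real_estate.loc[i, ...] still means "the row labelled i", which is why the two no longer refer to the same row.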

Selection by position
Selection by label

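P.S. Just in case: pandas.cut(..., labels=False) does not change the index of the returned Series; it only replaces the category labels with integer category codes.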
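A small check of that P.S. on made-up ages with a 1-based index: the index is carried over unchanged, and the integer codes match the category codes of the labelled result:

```python
import pandas as pd

# Made-up ages with a 1-based index, as in the CSV.
ages = pd.Series([32.0, 19.5, 13.3, 5.0], index=pd.Index([1, 2, 3, 4], name="No"))

codes = pd.cut(ages, 4, labels=False)  # integer bin numbers 0..3
intervals = pd.cut(ages, 4)            # categorical Series of Interval bins

print(codes.index.equals(ages.index))  # True
print(codes.tolist())                  # [3, 2, 1, 0]
print(intervals.cat.codes.tolist())    # [3, 2, 1, 0]
```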

2023-10-19
