pandas: how to name multiple subsets in sequence, each built from the columns whose names start with a given prefix

mwg9r5ms  posted on 2023-01-24 in Other

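I want to create 2 subsets whose column names start with radius_ and area_ respectively. Here is some dummy data. Sorry, I modified the code below a bit.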

import pandas as pd

data = {'radius_mean': [18, 21, 20, 11, 20],
        'radius_se': [1, 0.5, 0.7, 0.4, 0.8],
        'area_mean': [1001, 1326, 1203, 386, 1200],
        'area_se': [153, 75, 94, 27, 95]}
df = pd.DataFrame(data)
df1 = pd.DataFrame()
df2 = pd.DataFrame()
subsets = [df1, df2]
features = ['radius', 'area']
for subset, feature in zip(subsets, features):
    subcol = [col for col in df.columns if col.startswith(feature + '_')]
    print(subcol)
    subset = df[subcol]
    print(subset.head())

I expect df1 to be:

['radius_mean', 'radius_se']
   radius_mean  radius_se
0           18        1.0
1           21        0.5
2           20        0.7
3           11        0.4
4           20        0.8

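I expect df2 to look like the output below. The subsets are created and printed correctly inside the loop, but afterwards df1 and df2 (called data1 and data2 in the answer below) are still empty: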

['area_mean', 'area_se']
   area_mean  area_se
0       1001      153
1       1326       75
2       1203       94
3        386       27
4       1200       95
w1jd8yoj1#

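You're running into a problem with how references to DataFrames are handled. Your logic makes sense, but assigning to the loop variable only rebinds that name to a new DataFrame; it does not change the object that data1 or data2 refers to. So when you think you are updating the originals, you are actually updating a separate name, and data1 and data2 stay empty. You can fix this by creating data1 and data2 after the loop, as shown in the code further down.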

import pandas as pd
import io #you don't need this, it's just for me to read in the cancer table

#again you don't need this, this just lets me get the cancer table
cancer = pd.read_csv(io.StringIO("""
radius_mean  radius_se  radius_worst    area_mean  area_se  area_worst
        17.99     1.0950         25.38      1001.0   153.40      2019.0
        20.57     0.5435         24.99     1326.0    74.08      1956.0
        19.69     0.7456         23.57     1203.0    94.03      1709.0
        11.42     0.4956         14.91     386.1    27.23       567.7
        20.29     0.7572         22.54     1297.0    94.44      1575.0
"""), sep=r"\s+") #sep=r"\s+" replaces delim_whitespace=True, which is deprecated in newer pandas

data1=pd.DataFrame()
data2=pd.DataFrame()
dsets=[data1, data2] #the list holds references to the same two objects; nothing is copied

#assigning to a slot in dsets rebinds that slot only; the names data1 and data2 still point at the original empty DataFrames
dsets[0] = pd.DataFrame({'a':[1,2,3]}) #rebinds index 0 of the list, doesn't update data1
print(dsets[0]) #changed
print(data1) #not changed

#in your loop the same thing happens: dset is rebound on each iteration, so data1 and data2 don't get updated
features=['radius', 'area']
for dset, feature in zip(dsets,features): 
    subcol=[col for col in cancer.columns if col.startswith(feature+ '_')]
    dset=cancer[subcol]
    
print(data1) #still not updated
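The behaviour above can be reduced to a small sketch that does not depend on the cancer table. It only illustrates name rebinding in Python; the variable names same_before and still_empty are made up for this example.

```python
import pandas as pd

data1 = pd.DataFrame()
dsets = [data1]

# The list slot and the name data1 refer to the same object; nothing was copied.
same_before = dsets[0] is data1

# Assigning to the loop variable only changes what the local name dset refers to.
for dset in dsets:
    dset = pd.DataFrame({"a": [1, 2, 3]})

# data1 was never touched, so it is still the original empty DataFrame.
still_empty = data1.empty
print(same_before, still_empty)
```

Both values should come out as True: the list shares the object with data1, and the loop assignment never reaches it.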

Solution: collect the subsets in a list inside the loop, and only create data1 and data2 afterwards by unpacking that list

dsets = []
features=['radius', 'area']
for feature in features: 
    subcol=[col for col in cancer.columns if col.startswith(feature+ '_')]
    dsets.append(cancer[subcol])
    
data1,data2 = dsets

print(data1)
print(data2)
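As a possible variation, not part of the original answer, the subsets could be kept in a dict keyed by the prefix instead of in separately numbered variables. The sketch below assumes the dummy data from the question and uses DataFrame.filter with a regex anchored at the start of the column name; the name subsets is chosen just for this example.

```python
import pandas as pd

# Dummy data copied from the question.
df = pd.DataFrame({
    "radius_mean": [18, 21, 20, 11, 20],
    "radius_se": [1, 0.5, 0.7, 0.4, 0.8],
    "area_mean": [1001, 1326, 1203, 386, 1200],
    "area_se": [153, 75, 94, 27, 95],
})

features = ["radius", "area"]

# One subset per prefix; "^" anchors the match at the start of the column name.
subsets = {feature: df.filter(regex=f"^{feature}_") for feature in features}

print(list(subsets["radius"].columns))
print(list(subsets["area"].columns))
```

With a dict, adding a third prefix only means extending features, and each subset is looked up by name, for example subsets["area"].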
