pandas: the correct way to access elements in a pandas pivot table

Posted by to94eoyn on 2023-01-19 in Other

I have been trying to access elements of the following pivot table using the pandas DataFrame slicing .ix notation, but I get an error: KeyError.

pivot = c.pivot("date","stock_name","close").resample("A",how="ohlc")
pt = pd.DataFrame(pivot,index=pivot.index.year)
pt

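What is the correct way to slice out just one or more rows and/or columns from a pandas pivot table?
For example, what if I only want the Billabong prices for 2016?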

pivot["2016-12-31"]["BBG"]

jjhzyzn0 #1

You can use loc (see the pandas indexing docs):

print(c)
     date stock_name  close
0 2012-08-31        ibm      1
1 2013-08-31       aapl      1
2 2014-08-31       goog      1
3 2015-08-31        bhp      1
4 2016-08-31        bhp      1

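# resample(..., how="ohlc") was removed from pandas; call .ohlc() on the resampler instead.
# "YE" is the year-end alias in pandas 2.2 and later; use "A" on older versions.
pivot = c.pivot(index="date", columns="stock_name", values="close").resample("YE").ohlc()
print(pivot)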
           aapl                 bhp                goog                 ibm  \
           open high low close open high low close open high low close open   
date                                                                          
2012-12-31  NaN  NaN NaN   NaN  NaN  NaN NaN   NaN  NaN  NaN NaN   NaN    1   
2013-12-31    1    1   1     1  NaN  NaN NaN   NaN  NaN  NaN NaN   NaN  NaN   
2014-12-31  NaN  NaN NaN   NaN  NaN  NaN NaN   NaN    1    1   1     1  NaN   
2015-12-31  NaN  NaN NaN   NaN    1    1   1     1  NaN  NaN NaN   NaN  NaN   
2016-12-31  NaN  NaN NaN   NaN    1    1   1     1  NaN  NaN NaN   NaN  NaN   

           high low close  
date                       
2012-12-31    1   1     1  
2013-12-31  NaN NaN   NaN  
2014-12-31  NaN NaN   NaN  
2015-12-31  NaN NaN   NaN  
2016-12-31  NaN NaN   NaN  

# Rows: partial-string match on the year; columns: every second-level label under 'goog'
print(pivot.loc["2014", ('goog', slice(None))])
           goog               
           open high low close
date                          
2014-12-31    1    1   1     1
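
For readers on a current pandas release, here is a minimal sketch of the same kind of selection. It is not the original poster's code: the close values are made up, the names wide, yearly and closes are introduced only for illustration, and grouping by wide.index.year is used in place of resample so the example does not depend on a particular frequency alias. It also shows why pivot["2016-12-31"]["BBG"] fails: plain square brackets on a DataFrame look the key up among the columns, while .loc[row, column] addresses both axes.

```python
import pandas as pd

# Hypothetical sample data shaped like the frame `c` above (values are made up).
c = pd.DataFrame({
    "date": pd.to_datetime(["2012-08-31", "2013-08-31", "2014-08-31",
                            "2015-08-31", "2016-08-31"]),
    "stock_name": ["ibm", "aapl", "goog", "bhp", "bhp"],
    "close": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# One column per stock, one row per date.
wide = c.pivot(index="date", columns="stock_name", values="close")

# Yearly open/high/low/close per stock; the result has two column levels
# (stock name, then open/high/low/close) and an integer year index.
yearly = (
    wide.groupby(wide.index.year)
        .agg(["first", "max", "min", "last"])
        .rename(columns={"first": "open", "max": "high",
                         "min": "low", "last": "close"}, level=1)
)

# A single cell: row label first, then the full column key as a tuple.
bhp_close_2016 = yearly.loc[2016, ("bhp", "close")]

# All four values for one stock in one year (returns a Series).
goog_2014 = yearly.loc[2014, "goog"]

# One second-level column across all stocks, then ordinary row/column selection.
closes = yearly.xs("close", axis=1, level=1)
bhp_closes = closes.loc[[2015, 2016], "bhp"]

# Plain [] indexing searches the columns, so a row label raises KeyError.
try:
    yearly[2016]["bhp"]
except KeyError:
    print("yearly[2016] looks for a column named 2016 and raises KeyError")

print(bhp_close_2016)
print(goog_2014)
print(bhp_closes)
```

Passing the row and column keys to a single .loc call also avoids the chained lookup pattern, which is harder to read and can fail in confusing ways.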

sulc1iza #2

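In my example, I create a DataFrame of late shipments, group it by freight_cost_group, and take value_counts(). My goal is to compute the p-value and test H0 against HA. I use a pivot table and loc to access the result set.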
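
The counting step described here can be sketched on a tiny frame before looking at the full data. This is only an illustration under stated assumptions: the rows are made up, and shipments, counts and alt are hypothetical names. It builds the count table two ways (pivot_table with aggfunc="size", and groupby plus value_counts plus unstack) and reads single cells with one .loc[row, column] call.

```python
import pandas as pd

# Hypothetical toy data; freight_cost_usd is only there as an extra column.
shipments = pd.DataFrame({
    "freight_cost_group": ["expensive", "expensive", "expensive",
                           "reasonable", "reasonable"],
    "late": ["Yes", "No", "No", "No", "No"],
    "freight_cost_usd": [33279.83, 19056.13, 11372.23, 559.89, 360.00],
})

# Count rows for every (freight_cost_group, late) pair; missing pairs become 0.
counts = shipments.pivot_table(index="freight_cost_group", columns="late",
                               aggfunc="size", fill_value=0)

# The same table built from groupby + value_counts + unstack.
alt = (
    shipments.groupby("freight_cost_group")["late"]
             .value_counts()
             .unstack(fill_value=0)
)

# Read individual cells: row label, then column label, in one .loc call.
late_expensive = counts.loc["expensive", "Yes"]
on_time_expensive = counts.loc["expensive", "No"]
late_reasonable = counts.loc["reasonable", "Yes"]

print(counts)
print(late_expensive, on_time_expensive, late_reasonable)
```

Both tables are indexed by freight_cost_group with one column per value of late, so the same .loc lookups work on either.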

import io

import numpy as np
import pandas as pd

data="""id       country managed_by  fulfill_via vendor_inco_term  weight_kilograms  freight_cost_usd freight_cost_groups line_item_insurance_usd freight_cost_group late
36203.0       Nigeria   PMO-US  Direct_Drop              EXW    1426.0          33279.83           expensive                  373.83          expensive Yes
30998.0      Botswana   PMO-US  Direct_Drop              EXW    10.0            559.89          reasonable                    1.72         reasonable No
69871.0       Vietnam   PMO-US  Direct_Drop              EXW    3723.0          19056.13           expensive                  181.57          expensive No
17648.0  South_Africa   PMO-US  Direct_Drop              DDP    7698.0          11372.23           expensive                  779.41          expensive No
5647.0        Uganda   PMO-US  Direct_Drop              EXW    56.0            360.00          reasonable                    0.01         reasonable No
13608.0        Uganda   PMO-US  Direct_Drop              DDP    43.0            199.00          reasonable                   12.72         reasonable No
80394.0    Congo_DRC   PMO-US  Direct_Drop              EXW    99.0           2162.55          reasonable                   13.10         reasonable No
61675.0        Zambia   PMO-US  Direct_Drop              EXW    881.0          14019.38           expensive                  210.49          expensive Yes
39182.0  South_Africa   PMO-US  Direct_Drop              DDP    16234.0          14439.17           expensive                 1421.41          expensive No
5645.0    Botswana   PMO-US  Direct_Drop              EXW    46.0           1028.18          reasonable                   23.04         reasonable No
"""
# Raw string so that \s is passed to the regex engine rather than treated as a string escape
late_shipments = pd.read_csv(io.StringIO(data), sep=r'\s+', header=0, index_col=["id"])
#print(late_shipments.head())

#late_by_freight_cost_group = late_shipments.groupby("freight_cost_group")["late"].value_counts()
#results=(late_by_freight_cost_group.unstack(fill_value=0))
#print(results)

results=late_shipments.pivot_table(index=['freight_cost_group'], columns='late', aggfunc='size', fill_value=0)
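# "Success" here means a late shipment; a single .loc[row, column] call avoids chained indexing
success_expensive = results.loc["expensive", "Yes"]
fail_expensive = results.loc["expensive", "No"]
success_reasonable = results.loc["reasonable", "Yes"]
fail_reasonable = results.loc["reasonable", "No"]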

success_counts = np.array([success_expensive, success_reasonable])

n = np.array([success_expensive + fail_expensive, success_reasonable + fail_reasonable])

from statsmodels.stats.proportion import proportions_ztest

stat, p_value = proportions_ztest(count=success_counts, nobs=n,
                                  alternative="larger")

print(stat, p_value)
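
To make the test statistic less of a black box, here is a rough sketch of a pooled two-proportion z-test written with only the standard library. It is the textbook formula, and the assumption (not verified here) is that it corresponds to what proportions_ztest computes with its default settings and alternative="larger". The function name two_proportion_ztest_larger is made up for this example. The counts (2 late out of 5 expensive, 0 late out of 5 reasonable) are read off the sample data above.

```python
import math


def two_proportion_ztest_larger(successes, totals):
    """One-sided pooled z-test of H0: p1 == p2 against HA: p1 > p2.

    Hypothetical helper for illustration; not part of statsmodels.
    """
    (x1, x2), (n1, n2) = successes, totals
    p1, p2 = x1 / n1, x2 / n2

    # Under H0 both groups share one proportion, estimated from the pooled data.
    pooled = (x1 + x2) / (n1 + n2)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))

    z = (p1 - p2) / std_err
    # Upper-tail probability of the standard normal distribution at z.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# Late counts and group sizes taken from the sample data in this answer.
z, p = two_proportion_ztest_larger(successes=(2, 0), totals=(5, 5))
print(z, p)
```

With only five shipments per group the normal approximation is rough, so the p-value from this sample is illustrative rather than a reliable conclusion.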
