pandas Series sort_index with key

enyaitl3 posted on 2023-04-10 in Other


I am trying to sort a Series with sort_index(key = lambda idx: foo(idx)); the sort should move the first index label to the end. My sort function foo looks like this:

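import pandas as pd

def foo(idx):
    print("pre",idx)
    if idx.name == "pca_n":
        # rotate left by one: move the first label to the end
        ret = pd.Index(list(idx[1:]) + list(idx[:1]),name=idx.name)
    else:
        # any other index is passed through unchanged
        ret = idx.copy()
    print("post",ret)
    return ret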

I call it like this:

print("index before sort",byHyp.index)
byHyp = byHyp.sort_index(key = lambda x: foo(x))
print("index after sort",byHyp.index)

This produces the following output:

index before sort Int64Index([-1, 2, 5, 10, 20], dtype='int64', name='pca_n')
pre Int64Index([-1, 2, 5, 10, 20], dtype='int64', name='pca_n')
post Int64Index([2, 5, 10, 20, -1], dtype='int64', name='pca_n')
index after sort Int64Index([20, -1, 2, 5, 10], dtype='int64', name='pca_n')

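In other words, foo returns the index in the order I want, but the Series does not end up in that order. (I expected [2, 5, 10, 20, -1], since that is what foo returns.) Perhaps I am misunderstanding how the key parameter of sort_index works?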

pandas

Source: https://stackoverflow.com/questions/75931355/pandas-series-sort-index-with-key

2 answers


vbkedwbf1#

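The docs explain:
key: callable, optional
If not None, apply the key function to the index values before sorting.
In other words, foo is called and returns the index [2, 5, 10, 20, -1]; the index of byHyp is then sorted according to foo's output, not replaced by it: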

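In your example, foo's output is already almost sorted: only its last element, -1, has to move to the front.
Each key value stays paired with the original label at the same position, so moving that last key to the front also moves the last label, 20, to the front. The index therefore goes from [-1, 2, 5, 10, 20] to [20, -1, 2, 5, 10], which is exactly what the output shows.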
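The behaviour described above can be reproduced by hand. The following is a minimal sketch, assuming a made-up Series s in place of byHyp (only the index matters) and using np.argsort to stand in for the sorting step that sort_index performs on the key values:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for byHyp; the values are invented for illustration.
s = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5],
              index=pd.Index([-1, 2, 5, 10, 20], name="pca_n"))

# What the question's foo returns: the index rotated left by one.
key_values = np.array([2, 5, 10, 20, -1])

# Sort the key values, then apply that permutation to the original rows.
order = np.argsort(key_values, kind="stable")
manual = s.iloc[order]

# The same thing through sort_index with a key.
via_key = s.sort_index(key=lambda idx: pd.Index(key_values, name=idx.name))

print(list(manual.index))   # [20, -1, 2, 5, 10]
print(list(via_key.index))  # [20, -1, 2, 5, 10]
```

Both approaches give the same order as in the question, which suggests the key is used as a set of sort values rather than as the new index.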

I think what you want is not to sort the index but to reorder it using foo, like this:

print("index before-ordering",byHyp.index)
byHyp = byHyp.loc[foo(byHyp.index), :]
print("index after re-ordering",byHyp.index)

...or, as the OP pointed out in the comments, if the input is a Series:

byHyp = byHyp[foo(byHyp.index)]

Output:

index before-ordering Int64Index([-1, 2, 5, 10, 20], dtype='int64', name='pca_n')
pre Int64Index([-1, 2, 5, 10, 20], dtype='int64', name='pca_n')
post Int64Index([2, 5, 10, 20, -1], dtype='int64', name='pca_n')
index after re-ordering Int64Index([2, 5, 10, 20, -1], dtype='int64', name='pca_n')
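If sort_index with a key is still preferred, the key has to return sort values rather than the target order. A minimal sketch, assuming the goal is "ascending, with -1 last" (sentinel_last and the sample Series are hypothetical names and data, not from the question):

```python
import pandas as pd

# Hypothetical stand-in for byHyp; the values are invented for illustration.
s = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5],
              index=pd.Index([-1, 2, 5, 10, 20], name="pca_n"))

def sentinel_last(idx):
    # Map -1 to +inf so it sorts after every other label;
    # all other labels keep their own value as the sort key.
    return idx.map(lambda v: float("inf") if v == -1 else float(v))

result = s.sort_index(key=sentinel_last)
print(list(result.index))  # [2, 5, 10, 20, -1]
```

Note that this sorts the remaining labels instead of rotating them, so it only matches foo's output when the other labels are already in ascending order.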

2023-04-10

vsnjm48y2#

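If you simply return the order you want as a regular list and then pass that list to df.loc, the rows come out in that order. Note that the index in my example runs from 1912 to 1916, but df.loc[your_new_order] lets you rearrange it into any order.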
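A minimal sketch of that idea, assuming a small made-up DataFrame indexed by the years 1912 to 1916 (the column name, values, and new_order are hypothetical):

```python
import pandas as pd

# Hypothetical frame; the values are invented for illustration.
df = pd.DataFrame({"value": [10, 20, 30, 40, 50]},
                  index=[1912, 1913, 1914, 1915, 1916])

# The desired row order as a plain Python list of labels.
new_order = [1913, 1914, 1915, 1916, 1912]

# .loc with a list of labels returns the rows in the order listed.
reordered = df.loc[new_order]
print(list(reordered.index))  # [1913, 1914, 1915, 1916, 1912]
```

Every label in the list has to exist in the index, otherwise .loc raises a KeyError.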