Access index in pandas.Series.apply

vmjh9lq9 posted on 2023-10-14 in Other

Suppose I have a MultiIndex Series s:

>>> s
     values
a b
1 2  0.1 
3 6  0.3
4 4  0.7

I want to apply a function that uses the index of each row:

def f(x):
   # conditions or computations using the indexes
   if x.index[0] and ...: 
   other = sum(x.index) + ...
   return something

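How can I do s.apply(f) for such a function? What is the recommended way to do this kind of operation? I expect to obtain a new Series that holds the result of applying this function to each row and keeps the same MultiIndex.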

pandas

Source: https://stackoverflow.com/questions/18316211/access-index-in-pandas-series-apply

8 Answers

kuuvgm7e1#

I don't believe apply has access to the index; it treats each row as a numpy object, not a Series, as you can see:

In [27]: s.apply(lambda x: type(x))
Out[27]: 
a  b
1  2    <class 'numpy.float64'>
3  6    <class 'numpy.float64'>
4  4    <class 'numpy.float64'>

To get around this limitation, promote the index levels to columns, apply your function, and recreate a Series with the original index.

pd.Series(s.reset_index().apply(f, axis=1).values, index=s.index)

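Other approaches might use s.index.get_level_values, which often gets a little ugly in my opinion, or iterate with s.items(), which is likely to be slower, perhaps depending on exactly what f does.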
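
As a rough sketch of the get_level_values route, assuming the Series from the question with index levels named a and b, and a purely illustrative computation (a + b + value) standing in for the real f:

```python
import pandas as pd

# Rebuild the Series from the question; the level names "a" and "b" are assumed.
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index, name="values")

# Pull each index level out as its own array-like.
a = s.index.get_level_values("a")
b = s.index.get_level_values("b")

# Hypothetical computation: combine both levels with the value, vectorized.
result = pd.Series(a.to_numpy() + b.to_numpy() + s.to_numpy(), index=s.index)
print(result)
```

This only works when the computation can be written on whole arrays; for arbitrary per-row logic the reset_index approach above is more general.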

vfh0ocws2#

Make it a frame, and return scalars if you want (so the result is a Series).
Setup

In [11]: s = pd.Series([1,2,3],dtype='float64',index=['a','b','c'])

In [12]: s
Out[12]: 
a    1
b    2
c    3
dtype: float64

Printing function

In [13]: def f(x):
   ....:     print(type(x), x)
   ....:     return x
   ....: 

In [14]: pd.DataFrame(s).apply(f)
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
<class 'pandas.core.series.Series'> a    1
b    2
c    3
Name: 0, dtype: float64
Out[14]: 
   0
a  1
b  2
c  3

Since you can return anything here, just return scalars (access the index via the name attribute).

In [15]: pd.DataFrame(s).apply(lambda x: 5 if x.name == 'a' else x[0], axis=1)
Out[15]: 
a    5
b    2
c    3
dtype: float64

e3bfsja23#

Convert to a DataFrame and apply along the rows. You can access the index as x.name. x is now also a Series with one value.

s.to_frame(0).apply(f, axis=1)[0]
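
For the MultiIndex case in the question, x.name should arrive as a tuple of the level values. A minimal sketch, assuming the question's data and a hypothetical f that adds both index levels to the value:

```python
import pandas as pd

# The Series from the question; level names "a" and "b" are assumed.
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

def f(row):
    # row.name is the index label of this row: a tuple for a MultiIndex.
    a, b = row.name
    # row holds a single value, so take it by position.
    return a + b + row.iloc[0]

# f returns a scalar, so apply yields a Series that keeps the MultiIndex.
result = s.to_frame().apply(f, axis=1)
print(result)
```

Because f returns a scalar here, no trailing [0] selection is needed on the result.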

ldfqzlk84#

You may find it faster to use where rather than apply here:

In [11]: s = pd.Series([1., 2., 3.], index=['a' ,'b', 'c'])

In [12]: s.where(s.index != 'a', 5)
Out[12]: 
a    5
b    2
c    3
dtype: float64

You can also use numpy-style logic and functions on any of the parts:

In [13]: (2 * s + 1).where((s.index == 'b') | (s.index == 'c'), -s)
Out[13]: 
a   -1
b    5
c    7
dtype: float64

In [14]: (2 * s + 1).where(s.index != 'a', -s)
Out[14]: 
a   -1
b    5
c    7
dtype: float64

I recommend testing for speed (how it compares with apply will depend on the function). That said, I find apply more readable...
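
One way to run such a comparison is with the standard-library timeit module. This is only a sketch: the 1000-element Series, the k0 label and the helper names with_where and with_apply are all made up for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import pandas as pd

# Hypothetical test data: 1000 float values with string labels k0..k999.
s = pd.Series(range(1000), index=[f"k{i}" for i in range(1000)], dtype="float64")

def with_where():
    # Vectorized: keep values where the label is not "k0", otherwise use 5.0.
    return s.where(s.index != "k0", 5.0)

def with_apply():
    # Row-wise: go through a one-column DataFrame so row.name is the label.
    return s.to_frame(0).apply(
        lambda row: 5.0 if row.name == "k0" else row.iloc[0], axis=1
    )

# Time a few repetitions of each variant.
t_where = timeit.timeit(with_where, number=5)
t_apply = timeit.timeit(with_apply, number=5)
print(f"where: {t_where:.4f}s  apply: {t_apply:.4f}s")
```

Both helpers should produce the same Series, so the timings compare like with like.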

9fkzdhlc5#

If you use DataFrame.apply() instead of Series.apply(), you can access the whole row as a parameter inside the function.

import numpy as np
import pandas as pd

def f1(row):
    # row is a Series holding one row of the DataFrame
    return 0 if row['I'] < 0.5 else 1

def f2(row):
    return 0 if row['N1'] == 1 else 1

df4 = pd.DataFrame(np.random.rand(6, 1), columns=list('I'))
df4['N1'] = df4.apply(f1, axis=1)
df4['N2'] = df4.apply(f2, axis=1)

3vpjnl9f6#

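Use reset_index() to convert the Series to a DataFrame and the index to a column, then apply your function to the DataFrame.
The tricky part is knowing how reset_index() names the columns, so here are a couple of worked examples.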

Series with a single index

s=pd.Series({'idx1': 'val1', 'idx2': 'val2'})

def use_index_and_value(row):
    return 'I made this with index {} and value {}'.format(row['index'], row[0])

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# The new Series has an auto-index;
# You'll want to replace that with the index from the original Series
s2.index = s.index
s2

Output:

idx1    I made this with index idx1 and value val1
idx2    I made this with index idx2 and value val2
dtype: object

Series with a MultiIndex

Same concept here, but you'll need to access the index values as row['level_*'], because that is where Series.reset_index() places them.

s=pd.Series({
    ('idx(0,0)', 'idx(0,1)'): 'val1',
    ('idx(1,0)', 'idx(1,1)'): 'val2'
})

def use_index_and_value(row):
    return 'made with index: {},{} & value: {}'.format(
        row['level_0'],
        row['level_1'],
        row[0]
    )

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# Replace auto index with the index from the original Series
s2.index = s.index
s2

Output:

idx(0,0)  idx(0,1)    made with index: idx(0,0),idx(0,1) & value: val1
idx(1,0)  idx(1,1)    made with index: idx(1,0),idx(1,1) & value: val2
dtype: object

If your Series or its index levels have names, you will need to adjust accordingly.
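
For instance, if the Series is named values and its index levels are named a and b (names assumed here to mirror the question), reset_index() should use those names as the column labels. A small sketch with an illustrative function:

```python
import pandas as pd

# Named Series with named index levels (names assumed from the question).
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index, name="values")

def use_index_and_value(row):
    # The columns now carry the level names and the Series name.
    return row["a"] + row["b"] + row["values"]

s2 = s.reset_index().apply(use_index_and_value, axis=1)

# Replace the auto index with the index from the original Series.
s2.index = s.index
print(s2)
```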

kupeojn67#

Series implements the items() method, which lets you use a list comprehension to map keys (i.e. index values) and values.
Given a Series:

In[1]: seriesA = pd.Series([4, 2, 3, 7, 9], name="A")
In[2]: seriesA
Out[2]:
0    4
1    2
2    3
3    7
4    9
Name: A, dtype: int64

Now, assume a function f that accepts a key and a value:

def f(key, value):
    return key + value

We can now create a new Series using a list comprehension:

In[3]: pd.Series(data=[f(k,v) for k, v in seriesA.items()], index=seriesA.index)
Out[3]:
0     4
1     3
2     5
3    10
4    13
dtype: int64

Of course, this doesn't take advantage of any numpy performance benefits, but for some operations it makes sense.
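
The same pattern should carry over to the MultiIndex Series from the question, where each key yielded by items() is a tuple of level values. A sketch, with a hypothetical f that multiplies the two levels and adds the value:

```python
import pandas as pd

# The Series from the question; level names "a" and "b" are assumed.
index = pd.MultiIndex.from_tuples([(1, 2), (3, 6), (4, 4)], names=["a", "b"])
s = pd.Series([0.1, 0.3, 0.7], index=index)

def f(key, value):
    # For a MultiIndex, key is a tuple with one entry per level.
    a, b = key
    return a * b + value

# Build the new Series on the original index so the MultiIndex is preserved.
result = pd.Series([f(k, v) for k, v in s.items()], index=s.index)
print(result)
```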

wribegjk8#

Another dirty solution is to use regular expressions.
First, reset the index to create a DataFrame.

df = s.reset_index()

df
   a  b  values
0  1  2     0.1
1  3  6     0.3
2  4  4     0.7

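Then create a Series of strings that concatenates the index columns and the values, as shown below. Just make sure to use separators that are easy to isolate during pattern matching. In my example, I use 'first_wall' and 'second_wall'.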

concatenated_series = df['a'].astype(str)+'first_wall'+df['b'].astype(str)+'second_wall'+df['values'].astype(str)

concatenated_series

0    1first_wall2second_wall0.1
1    3first_wall6second_wall0.3
2    4first_wall4second_wall0.7
dtype: object

Then create the function:

import re

def f(x):
    # recover both index levels and the value from the concatenated string
    first_index = int(re.search(r'^(.+)first_wall', x).group(1))
    second_index = int(re.search(r'first_wall(.+)second_wall', x).group(1))
    value = float(re.search(r'second_wall(.+)$', x).group(1))
    # do whatever you like with them here
    return first_index + second_index + value

Then apply it to the concatenated Series.

concatenated_series.apply(f)

0    3.1
1    9.3
2    8.7
dtype: float64

Cheers!
