Pandas column slices with mypy

Posted by 8i9zcol2 on 2023-04-10 in Other


I recently ran into an odd situation that I cannot resolve on my own.
Consider this MWE:

import pandas
import numpy as np

data = pandas.DataFrame(np.random.rand(10, 5), columns=list("abcde"))

observations = data.loc[:, :"c"]
features = data.loc[:, "c":]

print(data)
print(observations)
print(features)

According to this answer, the slicing itself is correct, and it works in that it prints the expected result. But when I run mypy, I get these errors:

mypy.exe .\t.py
t.py:1: error: Skipping analyzing "pandas": module is installed, but missing library stubs or py.typed marker
t.py:1: note: See https://mypy.readthedocs.io/en/stable/running_mypy.html#missing-imports
t.py:6: error: Slice index must be an integer or None
t.py:7: error: Slice index must be an integer or None
Found 3 errors in 1 file (checked 1 source file)

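The message is accurate too, since the slice bounds are not integers. How can I satisfy or disable the "Slice index must be an integer or None" error?
Of course, you could use iloc[:, :3] to work around this, but that feels like bad practice, because with iloc we depend on the order of the columns (in this example loc also depends on the order, but only to keep the MWE short).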


Source: https://stackoverflow.com/questions/75926634/pandas-column-slices-with-mypy

1 Answer


9gm1akwq1#

This is an open issue (#GH2410).
As a workaround, you can look up the column's integer position with get_loc and slice with iloc instead:

col_idx = data.columns.get_loc("c")

observations = data.iloc[:, :col_idx+1]
features = data.iloc[:, col_idx:]

Output:

           a         b         c  # <- observations
0   0.269605  0.497063  0.676928
1   0.526765  0.204216  0.748203
2   0.919330  0.059722  0.422413
..       ...       ...       ...
7   0.056050  0.521702  0.727323
8   0.635477  0.145401  0.258166
9   0.041886  0.812769  0.839979

[10 rows x 3 columns]

           c         d         e  # <- features
0   0.676928  0.672298  0.177933
1   0.748203  0.995165  0.136659
2   0.422413  0.222377  0.395179
..       ...       ...       ...
7   0.727323  0.291441  0.056998
8   0.258166  0.219025  0.405838
9   0.839979  0.923173  0.431298

[10 rows x 3 columns]
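
The workaround above assumes that "c" occurs exactly once among the columns. When a label is duplicated, get_loc returns a slice or a boolean mask rather than an integer, and the arithmetic col_idx+1 no longer applies. A minimal sketch of a guarded version follows; the helper name split_at_column is hypothetical and not part of pandas:

```python
import numpy as np
import pandas as pd


def split_at_column(df: pd.DataFrame, label: str) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Return (columns up to and including label, columns from label onward)."""
    pos = df.columns.get_loc(label)
    # get_loc yields an integer only for a unique label; for duplicated labels
    # it yields a slice or a boolean mask, which this sketch rejects.
    if not isinstance(pos, (int, np.integer)):
        raise ValueError(f"column label {label!r} is not unique")
    idx = int(pos)
    # Both slices use plain integer bounds, so no string appears in slice syntax.
    return df.iloc[:, : idx + 1], df.iloc[:, idx:]


data = pd.DataFrame(np.random.rand(10, 5), columns=list("abcde"))
observations, features = split_at_column(data, "c")
print(list(observations.columns))  # ['a', 'b', 'c']
print(list(features.columns))      # ['c', 'd', 'e']
```

Because the position is looked up by label at call time, this still follows the column if it moves, which addresses the concern about hard-coding iloc[:, :3].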
