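I want to remove the people who were born in 1900 and have not yet died.
The code below works, but it needs two filters to drop that one kind of row. Is there a simpler way to remove the rows with a single filter?
Minimal code to reproduce: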
import pandas as pd
data = [
    (1900, None,),  # needs to be removed
    (1900, 2000,),
    (2000, None,),
    (2000, 2020,),
]
df = pd.DataFrame(data, columns=['birth', 'death'])
df.to_parquet('test.parquet')
# Rows which do not match the filter predicate will be removed
filters = [
    [
        ('birth', '!=', 1900),
    ],
    [
        ('birth', '=', 1900),
        ('death', 'not in', [None]),
    ]
]
df2 = pd.read_parquet('test.parquet', filters=filters)
df2.head()
Documentation: https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_table.html#pyarrow.parquet.read_table
1 answer
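You don't actually need the ('birth', '=', 1900) condition. You can keep the rows that satisfy (NOT birth == 1900) OR (death NOT IN [None]), which by De Morgan's law is equivalent to NOT (birth == 1900 AND death IN [None]).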
You can also use:
Output: