Removing IDs under a specific condition in Python

hk8txs48 posted on 2023-08-08 in Python

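I have a dataset from which I need to completely remove any ID that was flagged before or after a certain date, and I'm having trouble doing it.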
df =

ID      Date        Flagged 
 101    6/4/2023       0
 101    7/23/2023      0
 102    4/28/2023      1
 102    5/2/2023       1
 102    6/30/2023      1
 102    7/11/2023      1
 103    6/23/2023      1
 103    7/12/2023      1
 104    4/17/2023      0 
 104    5/12/2023      1
 104    6/17/2023      1
 104    7/22/2023      1

I want to completely remove every ID that was flagged before 5/1/2023. I tried:

import datetime

today = datetime.datetime.today()
x_days = today - datetime.timedelta(days=90)
filtered_df = df[(df['Flagged'] == 1) & (df['Date'] >= x_days)]


When I run this, I still get the IDs I want to remove entirely. Here is the desired output:
df =

ID      Date      Flagged 
 103   6/23/2023     1
 103   7/12/2023     1
 104   5/12/2023     1
 104   6/17/2023     1
 104   7/22/2023     1


Any help would be great, thanks!
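The attempt above keeps unwanted IDs because its boolean mask is evaluated row by row: it drops ID 102's April row but keeps that ID's later flagged rows. A minimal sketch, rebuilding the sample data from the question and assuming the fixed cutoff of 5/1/2023 rather than the 90-day window, contrasts that row-level filter with an ID-level one:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [101, 101, 102, 102, 102, 102, 103, 103, 104, 104, 104, 104],
    "Date": ["6/4/2023", "7/23/2023", "4/28/2023", "5/2/2023", "6/30/2023", "7/11/2023",
             "6/23/2023", "7/12/2023", "4/17/2023", "5/12/2023", "6/17/2023", "7/22/2023"],
    "Flagged": [0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
cutoff = pd.Timestamp(2023, 5, 1)  # assumption: fixed cutoff taken from the question

# Row-level filter: each row is judged on its own, so ID 102's rows after the cutoff survive
row_level = df[(df["Flagged"] == 1) & (df["Date"] >= cutoff)]

# ID-level filter: collect IDs with any flagged row before the cutoff, then drop all their rows
bad_ids = df.loc[(df["Flagged"] == 1) & (df["Date"] < cutoff), "ID"].unique()
id_level = df[(df["Flagged"] == 1) & ~df["ID"].isin(bad_ids)]

print(sorted(row_level["ID"].unique().tolist()))  # [102, 103, 104]
print(sorted(id_level["ID"].unique().tolist()))   # [103, 104]
```

The key difference is that the ID-level version decides per ID first and only then filters rows, which is the pattern all three answers below follow.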

ohtdti5x (answer #1)

Try this:

import pandas as pd
from datetime import datetime

# Convert 'Date' column to datetime format (to be sure)
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# Define cutoff date
cutoff_date = datetime(2023, 5, 1)

# Create a mask for rows flagged before the cutoff date
mask = (df['Flagged'] == 1) & (df['Date'] < cutoff_date)

# Get the list of IDs to be removed
ids_to_remove = df.loc[mask, 'ID'].unique()

# Drop those IDs entirely, then keep only the flagged rows
filtered_df = df[~df['ID'].isin(ids_to_remove) & (df['Flagged'] == 1)]


roqulrg3 (answer #2)

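First find any IDs that were flagged before the given date, then keep only the rows that are flagged and do not belong to one of those bad IDs: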

import pandas as pd
import datetime 

# create dataframe:
df = pd.DataFrame(
    {'ID' : [101, 101, 102, 102, 102, 102, 103, 103, 104, 104, 104, 104],
     'Date' : ['6/4/2023' ,'7/23/2023','4/28/2023','5/2/2023' ,'6/30/2023','7/11/2023','6/23/2023','7/12/2023','4/17/2023','5/12/2023','6/17/2023','7/22/2023'],
     'Flagged' : [0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1]})

# Convert dates
df['Date'] = pd.to_datetime(df.Date, format = '%m/%d/%Y')

# Find bad IDs (ones that had a flag before specific date)
bad_ids = df[(df.Flagged.eq(1)) & (df.Date < datetime.datetime(2023, 5, 1))].ID.unique()

# Find all values that are flagged and do not have a bad ID
df[(df.Flagged.eq(1)) & ~(df.ID.isin(bad_ids))]

Output:

     ID       Date  Flagged
6   103 2023-06-23        1
7   103 2023-07-12        1
9   104 2023-05-12        1
10  104 2023-06-17        1
11  104 2023-07-22        1
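The same "find bad IDs, then filter" idea can also be written with `groupby(...).transform("any")`, which broadcasts "this ID has a flagged row before the cutoff" back onto every row of that ID. This is a swapped-in alternative, not what the answer uses; the helper name `drop_ids_flagged_before` is hypothetical, and the sketch assumes the column names and fixed 5/1/2023 cutoff from the question:

```python
import pandas as pd


def drop_ids_flagged_before(df: pd.DataFrame, cutoff: pd.Timestamp) -> pd.DataFrame:
    """Hypothetical helper: drop every ID that has a flagged row before `cutoff`,
    then keep only the flagged rows of the remaining IDs."""
    # True on rows that are flagged before the cutoff
    early_flag = (df["Flagged"] == 1) & (df["Date"] < cutoff)
    # True on every row whose ID has at least one early flagged row
    id_is_bad = early_flag.groupby(df["ID"]).transform("any")
    return df[~id_is_bad & (df["Flagged"] == 1)]


# Sample data from the question
df = pd.DataFrame({
    "ID": [101, 101, 102, 102, 102, 102, 103, 103, 104, 104, 104, 104],
    "Date": ["6/4/2023", "7/23/2023", "4/28/2023", "5/2/2023", "6/30/2023", "7/11/2023",
             "6/23/2023", "7/12/2023", "4/17/2023", "5/12/2023", "6/17/2023", "7/22/2023"],
    "Flagged": [0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

result = drop_ids_flagged_before(df, pd.Timestamp(2023, 5, 1))
print(result.index.tolist())  # [6, 7, 9, 10, 11]
```

Compared with collecting `bad_ids` and calling `isin`, the `transform` version keeps everything as one aligned boolean Series, which can be convenient when the "bad ID" rule grows more conditions; both give the same rows here.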

blmhpbnm (answer #3)

Use boolean indexing:

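import pandas as pd

#convert to datetime if needed
df["Date"] = pd.to_datetime(df["Date"],format="%m/%d/%Y")

#get flagged IDs to ignore (the 90-day cutoff is relative to today, so the result depends on when this runs)
flagged = df[df["Flagged"].eq(1)&df["Date"].lt(pd.Timestamp.today()-pd.DateOffset(days=90))]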

>>> df[~df["ID"].isin(flagged["ID"])&df["Flagged"].eq(1)]

     ID       Date  Flagged
6   103 2023-06-23        1
7   103 2023-07-12        1
9   104 2023-05-12        1
10  104 2023-06-17        1
11  104 2023-07-22        1
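Because this answer computes the cutoff as today minus 90 days, the output shown only matches the desired result for run dates near the question's. A small worked check, assuming the code was run on the question's posting date (2023-08-08); the helper `bad_ids` is hypothetical and mirrors the `flagged` step above:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "ID": [101, 101, 102, 102, 102, 102, 103, 103, 104, 104, 104, 104],
    "Date": ["6/4/2023", "7/23/2023", "4/28/2023", "5/2/2023", "6/30/2023", "7/11/2023",
             "6/23/2023", "7/12/2023", "4/17/2023", "5/12/2023", "6/17/2023", "7/22/2023"],
    "Flagged": [0, 0, 1, 1, 1, 1, 1, 1, 0, 1, 1, 1],
})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")


def bad_ids(df: pd.DataFrame, cutoff: pd.Timestamp) -> list:
    """Hypothetical helper: sorted IDs that have a flagged row before `cutoff`."""
    mask = df["Flagged"].eq(1) & df["Date"].lt(cutoff)
    return sorted(df.loc[mask, "ID"].unique().tolist())


# Assumption: run on the question's posting date
cutoff_then = pd.Timestamp(2023, 8, 8) - pd.DateOffset(days=90)
print(cutoff_then)           # 2023-05-10 00:00:00
print(bad_ids(df, cutoff_then))  # [102]

# A later run date, e.g. 2024-01-01, moves the cutoff past every ID's first flag
cutoff_later = pd.Timestamp(2024, 1, 1) - pd.DateOffset(days=90)
print(bad_ids(df, cutoff_later))  # [102, 103, 104]
```

With the 2023-08-08 run date the cutoff lands on 5/10/2023, so only ID 102 (flagged 4/28 and 5/2) is dropped and ID 104's 5/12 row survives, which reproduces the output above. Passing a fixed `pd.Timestamp(2023, 5, 1)` instead, as the question asks, keeps the result stable regardless of when the code runs.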

