I have a large dataframe of crime incidents, df, with four columns. INCIDENT_DATE has a datetime dtype, and Type takes one of three values (Violent, Property, and non-index).
| ID | Crime | INCIDENT_DATE | Type |
| ------------ | ------------ | ------------ | ------------ |
| XL123445 | Aggravated Assault | 2018-12-29 | Violent |
| XL123445 | Simple Assault | 2018-12-29 | Violent |
| XL123445 | Theft | 2018-12-30 | Property |
| TX56784 | Theft | 2018-04-28 | Property |
| ... | ... | | |
| CA45678 | Sexual Assault | 1991-10-23 | Violent |
| LA356890 | Burglary | 2018-12-21 | Property |
I want to create a new dataframe with the monthly counts (for each ID) of the Property and Violent types, plus a column for the total number of incidents for that ID during that month.
So I would want something like:
| ID | Year_Month | Violent | Property | Total |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| XL123445 | 2018-08 | 19654 | 500 | 20154 |
| TX56784 | 2011-07 | 17 | 15 | 32 |
| ... | ... | ... | | |
| CA45678 | 1992-06 | 100 | 100 | 200 |
| LA356890 | 1993-05 | 0 | 50 | 50 |
I previously created a dataframe with a 'Year_Month' column that only held the aggregated count of crime incidents for each ID and ignored 'Type'. I did this with:
df1 = (df.value_counts(['ID', df['INCIDENT_DATE'].dt.to_period('M').rename('Year_Month')])
         .rename('Count').reset_index())
Is there a way to carry over the same logic while creating the two additional columns, as desired?
1 Answer
IIUC, you were close:
On the sample data:
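A minimal sketch of one way to do this, assuming the sample rows from the question and assuming Total should be the sum of the Violent and Property counts (non-index incidents are left out, which is a guess at the intent). It groups on the same `Year_Month` period as the question's `value_counts` call, adds `Type` as a third key, and unstacks it into columns:

```python
import pandas as pd

# Sample rows copied from the question (the "..." rows are omitted).
df = pd.DataFrame({
    "ID": ["XL123445", "XL123445", "XL123445", "TX56784", "CA45678", "LA356890"],
    "Crime": ["Aggravated Assault", "Simple Assault", "Theft",
              "Theft", "Sexual Assault", "Burglary"],
    "INCIDENT_DATE": pd.to_datetime([
        "2018-12-29", "2018-12-29", "2018-12-30",
        "2018-04-28", "1991-10-23", "2018-12-21",
    ]),
    "Type": ["Violent", "Violent", "Property", "Property", "Violent", "Property"],
})

# Same monthly period key as in the question.
year_month = df["INCIDENT_DATE"].dt.to_period("M").rename("Year_Month")

# Count rows per (ID, Year_Month, Type), then pivot Type into columns.
counts = (
    df.groupby(["ID", year_month, "Type"])
      .size()
      .unstack("Type", fill_value=0)
      # Keep only the two requested types; a type that never occurs becomes 0.
      .reindex(columns=["Violent", "Property"], fill_value=0)
)

# Assumption: Total = Violent + Property (non-index incidents are excluded).
counts["Total"] = counts.sum(axis=1)

out = counts.reset_index().rename_axis(columns=None)
print(out)
```

With these six rows, XL123445 in 2018-12 ends up with 2 violent and 1 property incident, for a total of 3. If Total should instead cover every incident including non-index ones, compute it before the `reindex` step.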