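I am trying to develop a program that creates multiple rows and columns for each row, based on the value in one of its columns.
Here is my data: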
import numpy as np
import pandas as pd
df = pd.read_excel("test data.xlsx")
| Id | # of weeks | Labor hours | Start date | End date | Starting Year | Starting Week period |
| --- | --- | --- | --- | --- | --- | --- |
| aaa | 2 | 10 | 15/01/2023 | 29/01/2023 | 2023 | 3 |
| bbb | 3 | 12 | 12/02/2023 | 05/03/2023 | 2023 | 7 |
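The table needs to be expanded so that each row is repeated once per week, according to its # of weeks value. I also need to add a column for the labor hours per week and a column that counts the weeks for each Id.
The result should look like this: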
| Id | # of weeks | Labor hours | Start date | End date | Starting Year | Starting Week period | Week Count | Labor | Week # |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| aaa | 2 | 10 | 15/01/2023 | 29/01/2023 | 2023 | 3 | 1 | 5 | 3 |
| aaa | 2 | 10 | 15/01/2023 | 29/01/2023 | 2023 | 3 | 2 | 5 | 4 |
| bbb | 3 | 12 | 12/02/2023 | 05/03/2023 | 2023 | 7 | 1 | 4 | 7 |
| bbb | 3 | 12 | 12/02/2023 | 05/03/2023 | 2023 | 7 | 2 | 4 | 8 |
| bbb | 3 | 12 | 12/02/2023 | 05/03/2023 | 2023 | 7 | 3 | 4 | 10 |
I have been able to get the table into the required shape by doing the following:
# Expand the number of rows by the number of weeks for each job record
df = df.loc[df.index.repeat(df["# of weeks"])].reset_index(drop=True)
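For readers without the Excel file, the expansion can be tried on a hand-built frame. This is only a minimal sketch: the column names `Labor hours` and `Labor` are assumed renderings of the table headers, the date columns are left out for brevity, and the even split of hours across weeks is inferred from the expected output rather than stated anywhere.

```python
import pandas as pd

# Hand-built stand-in for "test data.xlsx" (date columns omitted for brevity).
sample = pd.DataFrame({
    "Id": ["aaa", "bbb"],
    "# of weeks": [2, 3],
    "Labor hours": [10, 12],          # assumed column name
    "Starting Year": [2023, 2023],
    "Starting Week period": [3, 7],
})

# Repeat each row once per week of the job record.
expanded = sample.loc[sample.index.repeat(sample["# of weeks"])].reset_index(drop=True)

# Spread the total hours evenly across the weeks (assumed rule).
expanded["Labor"] = expanded["Labor hours"] / expanded["# of weeks"]

print(expanded[["Id", "# of weeks", "Labor"]])
```

`Index.repeat` accepts one repeat count per row, so a column of week counts can be passed directly.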
However, there are still some problems.
I added the following columns:
# Add column for cumulative number of weeks for each expanded job record row
df['Week Count'] = df.groupby(['Id']).cumcount() + 1
# Add column for year for each job record row
df['Year'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
(df['Starting Year'] + 1),
df['Starting Year'])
# Add column for the week number for the calendar year for each job record row
df['Week #'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
(df['Starting Week period'] + df['Week Count']-53),
df['Starting Week period'] + df['Week Count']-1)
# Add a column Period which concatenates the Year and Week # columns
df['Period'] = df['Year'].astype(str) + "-" + df['Week #'].astype(str)
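This causes a problem: the Year and Week # columns only reset correctly when a record runs into one additional calendar year. If a record spans two or more additional calendar years, they do not reset again.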
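The limitation can be seen with a made-up record. The sketch below uses a hypothetical Id `ccc` that starts in week 50 of 2023 and runs for 110 weeks (neither value comes from my data) and applies the same `np.where` logic as above.

```python
import numpy as np
import pandas as pd

# Hypothetical record: starts in week 50 of 2023 and runs for 110 weeks.
demo = pd.DataFrame({
    "Id": ["ccc"] * 110,
    "Starting Year": 2023,
    "Starting Week period": 50,
})
demo["Week Count"] = demo.groupby(["Id"]).cumcount() + 1

# Week number counted from week 1 of the starting year, without any reset.
running_week = demo["Starting Week period"] + demo["Week Count"] - 1

# Same single-rollover logic as the columns above.
demo["Year"] = np.where(running_week > 52, demo["Starting Year"] + 1, demo["Starting Year"])
demo["Week #"] = np.where(running_week > 52, running_week - 52, running_week)

print(demo["Year"].max(), demo["Week #"].max())
```

The last row ends up with Year 2024 and Week # 107, even though week 107 does not exist and the record has long since moved past 2024.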
I tried the following:
# Add column for number of week for each expanded job record row
df['Week Count'] = df.groupby(['Id']).cumcount() + 1
# Add column for year for each job record row
from math import floor
df['Year'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
df['Starting Year'] + floor((df['Starting Week period'] + df['Week Count'])/52),
df['Starting Year'])
# Add column for the number of week for the calendar year for each job record row
df['Week #'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
(df['Starting Week period'] + df['Week Count']-53),
df['Starting Week period'] + df['Week Count']-1)
# Add leading 0 to the Week # Column
df['Week #'] = df['Week #'].astype(str).str.pad(2, side = 'left', fillchar = '0')
# Add a column Period which concatenates the Year and Week # columns
df['Period'] = df['Year'].astype(str) + "-" + df['Week #'].astype(str)
However, this gives me the following error:
TypeError Traceback (most recent call last)
Cell In[6], line 7
4 # Add column for year for each job record row
5 from math import floor
6 df['Year'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
----> 7 df['Starting Year'] + floor((df['Starting Week period'] + df['Week Count'])/52),
8 df['Starting Year'])
10 # Add column for the number of week for the calendar year for each job record row
11 df['Week #'] = np.where((df['Starting Week period'] + df['Week Count']-1) > 52,
12 (df['Starting Week period'] + df['Week Count']-53),
13 df['Starting Week period'] + df['Week Count']-1)
File /opt/anaconda3/lib/python3.9/site-packages/pandas/core/series.py:191, in _coerce_method.<locals>.wrapper(self)
189 if len(self) == 1:
190 return converter(self.iloc[0])
--> 191 raise TypeError(f"cannot convert the series to {converter}")
TypeError: cannot convert the series to <class 'float'>
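The error can be reproduced in isolation. This is a minimal sketch with made-up values: `math.floor` expects a single number, so it asks pandas to coerce the whole Series to one float, which fails for any Series with more than one element.

```python
from math import floor

import pandas as pd

weeks = pd.Series([10, 60, 120])  # made-up values

try:
    floor(weeks / 52)  # math.floor needs a scalar, not a Series
except TypeError as exc:
    message = str(exc)
    print(message)
```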
1 Answer
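You are applying `math.floor`, a function that expects a single float, to a pandas Series. These are different types, which is why the conversion fails.
I suggest using `.astype(int)` instead, which rounds the same way as `math.floor` for non-negative values (it truncates toward zero).
You could also use the `numpy` library to apply a different type or rounding, but in your case you would still have to apply `.astype(int)` afterwards, because `np.floor` does not change the Series to an integer dtype, and the resulting floats would affect your results.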
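Here is a minimal sketch of that idea, assuming (as the question's code does) that every year has exactly 52 weeks. The sample values are hypothetical. Working with a zero-based week offset lets the division give the number of years to add and the remainder give the week number, so the reset works across any number of years.

```python
import pandas as pd

df = pd.DataFrame({
    "Id": ["ccc"] * 4,
    "Starting Year": 2023,
    "Starting Week period": 50,
    "Week Count": [1, 3, 4, 108],  # a few hand-picked weeks of one record
})

# Zero-based offset from week 1 of the starting year.
offset = df["Starting Week period"] + df["Week Count"] - 2

# .astype(int) truncates, which matches floor for non-negative values.
df["Year"] = df["Starting Year"] + (offset / 52).astype(int)
df["Week #"] = offset % 52 + 1

# Zero-pad the week number so periods sort correctly as text.
df["Period"] = df["Year"].astype(str) + "-" + df["Week #"].astype(str).str.zfill(2)
print(df[["Week Count", "Period"]])
```

With these inputs the periods come out as 2023-50, 2023-52, 2024-01 and 2026-01. Integer division (`offset // 52`) would give the same years without the intermediate float.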
Hope this helps!