How to pivot a pandas DataFrame where each column header is a date and each column holds multiple values

Posted by cyvaqqii on 2023-02-28 in Other

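I am using Python with BeautifulSoup and Selenium to extract data from a Jira timesheet so that I can get the logged work per resource.
This is what I get when I print my DataFrame:
| Resources Hours | We1/2 | Th2/2 |
| --- | --- | --- |
| aaa | 8.0 | 8.0 |
| bbb | 8.0 | 8.0 |
| ccc | 8.0 | 8.0 |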
But the result I want is:
| date | Resources | value |
| --- | --- | --- |
| We1/2 | aaa | 8.0 |
| We1/2 | bbb | 8.0 |
| We1/2 | ccc | 8.0 |
| Th2/2 | aaa | 8.0 |
| Th2/2 | bbb | 8.0 |
| Th2/2 | ccc | 8.0 |
Is there a way to loop over the DataFrame header and append the cell values?
Here is my Python script so far:

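import time

import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

chromedriver_path = r"C:\selinum drivers\chromedriver.exe"

# Selenium 4 expects the driver path to be wrapped in a Service object
driver = webdriver.Chrome(service=Service(chromedriver_path))

# Login credentials
username = "username"
password = "pwd"

# Log in to the website (find_element_by_id was removed in Selenium 4.3)
driver.get("http://*******/login.jsp")
driver.find_element(By.ID, "login-form-username").send_keys(username)
driver.find_element(By.ID, "login-form-password").send_keys(password)
driver.find_element(By.ID, "login-form-submit").click()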

# URL to retrieve table
url = "http://********/secure/projecttimesheet!project.jspa"
# Navigate to the URL
driver.get(url)

# Open the dropdown menu
dropdown_menu_button = driver.find_element(By.XPATH, '//button[@ng-init="ts.getFilterProject();"]')
dropdown_menu_button.click()

checkbox_div = driver.find_element(By.CLASS_NAME, "toggleProject")
checkbox_div.click()

# Click on the body of the page to close the dropdown menu
body = driver.find_element(By.TAG_NAME, "body")
body.click()

# Wait for the table to load
time.sleep(2)

resources_button = driver.find_element(By.ID, "sp-group-by-resources")
resources_button.click()

# Wait for the table to load
time.sleep(2)

# Parse the HTML content
soup = BeautifulSoup(driver.page_source, 'html.parser')

# Close the browser
driver.close()

# Find the table element in the HTML
table = soup.find('table')

# Read the table into a pandas DataFrame, using the second row as the header
df = pd.read_html(str(table), decimal=',', thousands='.', header=1)[0]

# Remove the trailing summary rows (this slice drops the last 4 rows)
df = df.iloc[:-4]

# Remove the "Unnamed: 1", "Unnamed: 22", "∑ Hours", and "∑ Days" columns
df = df.drop(columns=["Unnamed: 1" , "Unnamed: 22", "∑ Hours", "∑ Days"])

# Replace NaN values with 0
df = df.fillna(0)
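
The cleanup at the end of the script depends on exact column names ("Unnamed: 22") and a fixed number of trailing rows, both of which can shift when the timesheet covers a different date range. The sketch below shows one possible way to select the unwanted columns by name pattern instead. The frame `raw` is made-up stand-in data, not real Jira output, and the assumption that every helper column starts with "Unnamed" or "∑" comes only from the column names visible in the script.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the scraped timesheet table (not real Jira data).
raw = pd.DataFrame(
    {
        "Resources Hours": ["aaa", "bbb", "Total"],
        "Unnamed: 1": [np.nan, np.nan, np.nan],
        "We1/2": [8.0, np.nan, 8.0],
        "Th2/2": [8.0, 8.0, 16.0],
        "∑ Hours": [16.0, 8.0, 24.0],
    }
)

# Flag helper columns by name pattern instead of listing each one by hand.
unwanted = raw.columns.str.startswith("Unnamed") | raw.columns.str.startswith("∑")
clean = raw.loc[:, ~unwanted]

# Drop the trailing summary row (one row in this toy frame) and replace NaN with 0.
clean = clean.iloc[:-1].fillna(0)
print(clean)
```

Only the resource column and the date columns remain, which is the shape the reshaping step expects.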

Answer #1 (bmp9r5qi)

I converted your first DataFrame into the second one. I think this solves it:

import pandas as pd

Prepare the data:

header = 'Resources Hours,We1/2,Th2/2'.split(',')
header
d = (('aaa', 8.0, 8.0), ('bbb', 8.0, 8.0), ('ccc', 8.0, 8.0))
d1 = pd.DataFrame(columns = header, data = d)

This reproduces your first DataFrame as d1.
Next I prepare the result, starting with its header:

header2 = ('date Resources value').split()

Then I convert the data into the desired layout, building one (date, resource, value) tuple per cell:

d = [(wd, r, x) for wd in header[1:3] for x, r in zip(d1[wd],d1['Resources Hours'])]
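
Since the question asks about looping over the header and appending cell values, the comprehension above can also be written as an explicit nested loop. This is a sketch of the same logic on the same sample data. The name `rows` is introduced here for illustration and holds the same tuples as `d`. It iterates over `d1.columns[1:]`, which assumes the first column is always the resource column.

```python
import pandas as pd

# Same sample data as in the answer.
header = ["Resources Hours", "We1/2", "Th2/2"]
d1 = pd.DataFrame(
    columns=header,
    data=[("aaa", 8.0, 8.0), ("bbb", 8.0, 8.0), ("ccc", 8.0, 8.0)],
)

rows = []
# Outer loop: every date column (everything after the resource column).
for wd in d1.columns[1:]:
    # Inner loop: pair each resource name with its value in that column.
    for r, x in zip(d1["Resources Hours"], d1[wd]):
        rows.append((wd, r, x))

print(rows)
```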

Put d into a DataFrame:

d2 = pd.DataFrame(columns = header2, data = d)
d2

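Output: d2 has the columns date, Resources and value, with one row per date and resource, in the layout requested in the question.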
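
As an alternative to building the tuples by hand, pandas ships a built-in wide-to-long reshape, `DataFrame.melt`. The sketch below applies it to the same sample data. It is not part of the original answer, and `long_df` is a name chosen here for illustration.

```python
import pandas as pd

# Same sample data as in the answer.
d1 = pd.DataFrame(
    columns=["Resources Hours", "We1/2", "Th2/2"],
    data=[("aaa", 8.0, 8.0), ("bbb", 8.0, 8.0), ("ccc", 8.0, 8.0)],
)

# Keep the resource column as the identifier; every other column header
# becomes a value in "date" and its cells go into "value".
long_df = d1.melt(
    id_vars="Resources Hours",
    var_name="date",
    value_name="value",
)

# Rename and reorder the columns to match the layout asked for in the question.
long_df = long_df.rename(columns={"Resources Hours": "Resources"})
long_df = long_df[["date", "Resources", "value"]]
print(long_df)
```

Because `melt` treats every column not listed in `id_vars` as a value column, it keeps working when the timesheet has more date columns than the two in this sample.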
