I'm using Python with BeautifulSoup and Selenium to extract data from a Jira timesheet, in order to get the logged work per resource.
This is the result when I print my DataFrame:
| Resource | We 1/2 | Th 2/2 |
| --- | --- | --- |
| aaa | 8.0 | 8.0 |
| bbb | 8.0 | 8.0 |
| ccc | 8.0 | 8.0 |
But the result I want is:
| Date | Resource | Value |
| --- | --- | --- |
| We 1/2 | aaa | 8.0 |
| We 1/2 | bbb | 8.0 |
| We 1/2 | ccc | 8.0 |
| Th 2/2 | aaa | 8.0 |
| Th 2/2 | bbb | 8.0 |
| Th 2/2 | ccc | 8.0 |
Is there a way to loop over the DataFrame headers and append the cell values to each one?
Here is my Python script so far:
import time
from io import StringIO
import pandas as pd
from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
chromedriver_path = r"C:\selinum drivers\chromedriver.exe"
# Selenium 4 expects the driver path wrapped in a Service object
driver = webdriver.Chrome(service=Service(chromedriver_path))
# Login credentials
username = "username"
password = "pwd"
# Login to the website
driver.get("http://*******/login.jsp")
driver.find_element(By.ID, "login-form-username").send_keys(username)
driver.find_element(By.ID, "login-form-password").send_keys(password)
driver.find_element(By.ID, "login-form-submit").click()
# URL to retrieve table
url = "http://********/secure/projecttimesheet!project.jspa"
# Navigate to the URL
driver.get(url)
# Open the dropdown menu
dropdown_menu_button = driver.find_element(By.XPATH, '//button[@ng-init="ts.getFilterProject();"]')
dropdown_menu_button.click()
# Tick the project checkbox inside the dropdown
checkbox_div = driver.find_element(By.CLASS_NAME, "toggleProject")
checkbox_div.click()
# Click on the body of the page to close the dropdown menu
body = driver.find_element(By.TAG_NAME, "body")
body.click()
# Wait for the table to load
time.sleep(2)
# Group the timesheet by resource
resources_button = driver.find_element(By.ID, "sp-group-by-resources")
resources_button.click()
# Wait for the table to load
time.sleep(2)
# Parse the HTML content
soup = BeautifulSoup(driver.page_source, 'html.parser')
# Close the browser
driver.close()
# Find the table element in the HTML
table = soup.find('table')
# Read the table data into a pandas dataframe, using the second row as the header
df = pd.read_html(StringIO(str(table)), decimal=',', thousands='.', header=1)[0]
# Remove the last 4 rows
df = df.iloc[:-4]
# Remove the "Unnamed: 1", "Unnamed: 22", "∑ Hours", and "∑ Days" columns
df = df.drop(columns=["Unnamed: 1", "Unnamed: 22", "∑ Hours", "∑ Days"])
# Replace NaN values with 0
df = df.fillna(0)
1 Answer
I converted your first DataFrame into the second one. I think this solves it:
Prepare the data:
I replicated your first df in d1:
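A minimal sketch of this preparation step, assuming the column labels `Resource`, `We 1/2` and `Th 2/2` and the resource names `aaa`, `bbb`, `ccc` from the question's first table. `d1` is the name used in the answer; `df1` is a hypothetical name for the frame built from it.

```python
import pandas as pd

# Assumed reconstruction of the wide table printed in the question
d1 = {
    "Resource": ["aaa", "bbb", "ccc"],
    "We 1/2": [8.0, 8.0, 8.0],
    "Th 2/2": [8.0, 8.0, 8.0],
}

# df1 is a hypothetical name; the answer only mentions d1
df1 = pd.DataFrame(d1)
print(df1)
```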
Then I prepared the result, starting with the headers:
Next, I converted the data into the desired format:
Finally, I loaded d into df:
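The three steps above could look roughly like the sketch below. It assumes the same sample data and column labels as the question's first table, and that `d` is a dictionary of lists keyed by the target headers; `df1` is a hypothetical name, and the loop is one plausible way to do the conversion rather than a definitive version.

```python
import pandas as pd

# Assumed sample data matching the question's wide table
d1 = {
    "Resource": ["aaa", "bbb", "ccc"],
    "We 1/2": [8.0, 8.0, 8.0],
    "Th 2/2": [8.0, 8.0, 8.0],
}
df1 = pd.DataFrame(d1)  # hypothetical name for the wide frame

# Step 1: prepare the result, starting with the target headers
d = {"Date": [], "Resource": [], "Value": []}

# Step 2: loop over the date headers (every column except the first),
# then over the rows, appending one entry per (date, resource) pair
for date in df1.columns[1:]:
    for resource, value in zip(df1["Resource"], df1[date]):
        d["Date"].append(date)
        d["Resource"].append(resource)
        d["Value"].append(value)

# Step 3: load d into df
df = pd.DataFrame(d)
```

Applied to the scraped frame from the question, the same loop should work as long as the first remaining column holds the resource names and every other column is a date.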
Output:
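The output can be reproduced, and the loop cross-checked, with pandas' built-in `DataFrame.melt`, which performs the same wide-to-long reshaping without an explicit loop. This is a sketch under the same assumed sample data; `wide` and `long_df` are hypothetical names, and `melt` is offered as an alternative, not necessarily what the answer used.

```python
import pandas as pd

# Assumed sample data matching the question's wide table
wide = pd.DataFrame({
    "Resource": ["aaa", "bbb", "ccc"],
    "We 1/2": [8.0, 8.0, 8.0],
    "Th 2/2": [8.0, 8.0, 8.0],
})

# Keep "Resource" as the identifier; every other column header becomes
# a value in "Date" and its cells go into "Value"
long_df = wide.melt(id_vars="Resource", var_name="Date", value_name="Value")

# Reorder the columns to match the desired table
long_df = long_df[["Date", "Resource", "Value"]]

print(long_df.to_string(index=False))
```

The printed frame has six rows, one per date and resource, in the same order as the desired table in the question.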