Replacing NaN values in a DataFrame column by shifting columns to the right

fruv7luv  posted on 2022-10-02  in  Python

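I would like to split the data in one column of a DataFrame and shift it over into the column that contains NaN.

After reading the data from a PDF and building the table, the output is: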

Output
   |       Summary       |  Prior Years|1|2|3|4| 5 |6|7|8|9|10|  Total  |
   |---------------------|-------------|-|-|-|-|---|-|-|-|-|--|---------|
   |Total Value 3,700,000|     110     |-|-|-|5|NaN|-|-|-|-|--|3,815,000|  
   |Total Value 320,000  |     110     |-|-|-|5|NaN|-|-|-|-|--| 435,000 |

Because the source is a PDF, reading the data causes the first column to combine 'Total Value' with the value that belongs under 'Prior Years'.

Expected Output
   |  Summary  |Prior Years| 1 |2|3|4|5|6|7|8|9|10|  Total  |
   |-----------|-----------|---|-|-|-|-|-|-|-|-|--|---------|
   |Total Value| 3,700,000 |110|-|-|-|5|-|-|-|-|--|3,815,000|  
   |Total Value|  320,000  |110|-|-|-|5|-|-|-|-|--| 435,000 |

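All of the data after column '5', which contains the NaN, is as expected.

Is there a way to split the data in the 'Summary' column and shift everything over until the NaN column is filled?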


wqnecbli1#

Just forward-fill along axis=1:

df.ffill(axis=1)

After that, you can easily replace the value in column '4' with np.nan:

df['4'] = df['4'].replace(5,np.nan)
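
To see what the forward fill does, here is a minimal sketch on a hypothetical subset of the question's columns (the frame and the assumption that the values arrive as strings are illustrative, not taken from the original data). Note that `ffill(axis=1)` copies the nearest non-missing value from the left into the NaN cell; it does not move the other columns or split 'Summary'.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's table; values are kept as strings,
# as they might come out of a PDF extraction (an assumption).
df = pd.DataFrame({
    "Summary": ["Total Value 3,700,000", "Total Value 320,000"],
    "Prior Years": ["110", "110"],
    "4": ["5", "5"],
    "5": [np.nan, np.nan],
})

# Forward-fill along each row: a NaN takes the value of the nearest
# non-missing cell to its left.
filled = df.ffill(axis=1)

# Column '5' now holds a copy of column '4'; columns to the left are unchanged.
print(filled)
```

Because the fill only copies, column '4' still holds its old value afterwards, which is why this answer follows up with a separate `replace` step.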

t2a7ltrp2#

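Here is one way to do it. It relies on the stated assumption that the values up to column 5 are shifted one column to the left, and that the Summary column needs to be split into two columns.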


# shift the values from 'Prior Years' through column '4' one column to the right,
# so that they land in columns '1' through '5'
df.loc[:,"1":"5"]=df.loc[:,"Prior Years":"5"].shift(axis=1)

# split the summary into the text (Total Value) and the value, and assign them
# to the Summary and Prior Years columns
df[['Summary','Prior Years']]=df['Summary'].str.strip().str.extract(r'(\D*).*?([\d,.]*)')
df
       Summary Prior Years    1  2  3  4  5  6  7  8  9  10      Total
0  Total Value   3,700,000  110  -  -  -  5  -  -  -  -  --  3,815,000
1  Total Value     320,000  110  -  -  -  5  -  -  -  -  --    435,000
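
For anyone who wants to try this approach end to end, the following is a self-contained sketch. The sample frame is a hypothetical reconstruction of the question's table with all values as strings, and the regular expression is my own variant (an assumption) that also trims the space between the label and the number.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the table from the question (strings assumed).
df = pd.DataFrame({
    "Summary": ["Total Value 3,700,000", "Total Value 320,000"],
    "Prior Years": ["110", "110"],
    "1": ["-", "-"], "2": ["-", "-"], "3": ["-", "-"],
    "4": ["5", "5"],
    "5": [np.nan, np.nan],
    "6": ["-", "-"],
    "Total": ["3,815,000", "435,000"],
})

# Shift the block 'Prior Years'..'5' one column to the right, then write the
# shifted values back into columns '1'..'5'.
target_cols = ["1", "2", "3", "4", "5"]
shifted = df.loc[:, "Prior Years":"5"].shift(axis=1)
df[target_cols] = shifted[target_cols]

# Split Summary into a non-digit label and a trailing number.
parts = df["Summary"].str.strip().str.extract(r"(\D*?)\s*([\d,.]+)$")
df["Summary"] = parts[0]
df["Prior Years"] = parts[1]

print(df)
```

Writing the shifted block back with an explicit column list avoids relying on label alignment during assignment, which makes the intent easier to follow.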

ghg1uchk3#

I'm not sure what dtype each of your columns has, but something like the code below should work:

import pandas as pd
import numpy as np

# example dataframe (leaves off the last few columns, which aren't relevant)
df = pd.DataFrame({'Summary': ['Total Value 3,700,000', 'Total Value 320,000'], 'Prior Years': [110, 110],
                   '1': ['-', '-'], '2': ['-', '-'], '3': ['-', '-'], '4': [5, 5], '5': [np.nan, np.nan],
                   '6': ['-', '-']})

# build a rename map that moves each name from 'Prior Years' through '4' one
# position to the right, drop the NaN column '5', then apply the renames
columns = df.columns.to_list()
new_col_dict = {columns[i]: columns[i + 1] for i in range(1, 6)}
df = df.drop(columns=['5']).rename(columns=new_col_dict)

# split up the Summary column (based on spaces): the third token is the number,
# the first two tokens form the label
parts = df['Summary'].str.split(' ')
df['Prior Years'] = parts.str[2]
df['Summary'] = parts.str[0] + ' ' + parts.str[1]

# if you want the "Prior Years" column to be int type
# (plain column assignment, so the new dtype actually takes effect):
df['Prior Years'] = df['Prior Years'].str.replace(',', '').astype(int)

# re-order dataframe columns, if you care to
df = df[['Summary', 'Prior Years', '1', '2', '3', '4', '5', '6']]

fslejnso4#

Have you tried using df.shift()?

df.shift(periods_to_be_shifted, axis = 1)

In your case periods_to_be_shifted = 1, so try the following:

df.shift(1, axis = 1)
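
One thing to watch for: shifting the whole frame moves every column, including the ones that are already correct. The sketch below, on a hypothetical four-column subset with string values (both assumptions), contrasts that with shifting only the block of columns that ends at the NaN column.

```python
import numpy as np
import pandas as pd

# Hypothetical subset of the question's table (strings assumed).
df = pd.DataFrame({
    "Prior Years": ["110", "110"],
    "4": ["5", "5"],
    "5": [np.nan, np.nan],
    "Total": ["3,815,000", "435,000"],
})

# Shifting the whole frame: every column moves right by one, so 'Total'
# receives the NaN from '5' and its own values fall off the end.
whole = df.shift(1, axis=1)

# Shifting only the block that ends at the NaN column leaves 'Total' alone.
block = df.loc[:, "Prior Years":"5"].shift(1, axis=1)

print(whole)
print(block)
```

The shifted block can then be assigned back to the matching columns of the original frame, while the 'Summary' split still has to be handled separately.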
