I'm trying to build a dataframe by reading 100s of csv files and keeping the last row of each csv via .tail(1) and then pd.concat(). The current result is a df that includes the header row with each row of data.
I'm hoping for guidance on an approach to read the last row of each csv and build a dataframe that has the header row at top and then only data rows after that.
Here's my current code:
count = 0
with open('names.txt', 'r') as my_file:
newline_break = ""
for readline in my_file:
line_strip = readline.strip()
newline_break += line_strip
count +=1
try:
df = pd.read_csv('~/' + line_strip + '.csv',
index_col=None,
)
df2 = df.tail(1)
df3 = pd.concat([df2])
print(df3)
except Exception as e:
exc_type, exc_obj, exc_tb = sys.exc_info()
fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
print(exc_type, fname, exc_tb.tb_lineno)
The .txt file is a simple list of names that selects the .csv file for df.read_csv step.
Here's the current output:
| | Unnamed: 0 | Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 532 | 532 | 2022-12-02 | Jones | 2.2 | 0.03 | 234 | 17.0 | 800 | 1.2 | 23.34 | 15.28 |
| | Unnamed: 0 | Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 674 | 674 | 2022-12-02 | Smith | 3.81 | 4.08 | 3.75 | 3.99 | 16 | 2.832 | 3.97 | 4.05 |
| | Unnamed: 0 | Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 674 | 674 | 2022-12-02 | Grove | 28.42 | 28.57 | 28.42 | 28.55 | 72 | 0.04 | 2.67 | 6.8 |
| | Unnamed: 0 | Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 674 | 674 | 2022-12-02 | Injo | 3.09 | 3.16 | 3.08 | 3.1 | 462 | 0.94 | 2.93 | 2.90 |
| | Unnamed: 0 | Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 674 | 674 | 2022-12-02 | Solas | 1.26 | 14.83 | 18.69 | 3.32 | 500 | 0.31 | 13.07 | 17.92 |
| | Unnamed: 0 | Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 674 | 674 | 2022-12-02 | Resto | 1.84 | 1.04 | 1.04 | 3.77 | 100 | 0.1 | 9.9 | 7.7 |
This is the desired output:
| Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 2022-12-02 | Jones | 2.2 | 0.03 | 234 | 17.0 | 800 | 1.2 | 23.34 | 15.28 |
| 2022-12-02 | Smith | 3.81 | 4.08 | 3.75 | 3.99 | 16 | 2.832 | 3.97 | 4.05 |
| 2022-12-02 | Grove | 28.42 | 28.57 | 28.42 | 28.55 | 72 | 0.04 | 2.67 | 6.8 |
| 2022-12-02 | Injo | 3.09 | 3.16 | 3.08 | 3.1 | 462 | 0.94 | 2.93 | 2.90 |
| 2022-12-02 | Solas | 1.26 | 14.83 | 18.69 | 3.32 | 500 | 0.31 | 13.07 | 17.92 |
| 2022-12-02 | Resto | 1.84 | 1.04 | 1.04 | 3.77 | 100 | 0.1 | 9.9 | 7.7 |
- NB: Removing the additional index columns would be great also . . . :-)
Grateful for your guidance.
1条答案
按热度按时间kwvwclae1#
试着在循环之前示例化一个空 Dataframe 来重构你的代码,并将每一个新行与它连接起来,如下所示:
在with语句之后和之外,设置一个新索引: