Building a DataFrame by reading the last row of 100s of csv files in a loop

kpbwa7wx, posted on 2022-12-15 in Other

I'm trying to build a dataframe by reading 100s of csv files and keeping the last row of each csv via .tail(1) and then pd.concat(). The current result is a df that includes the header row with each row of data.
I'm hoping for guidance on an approach to read the last row of each csv and build a dataframe that has the header row at top and then only data rows after that.
Here's my current code:

import os
import sys

import pandas as pd

count = 0

with open('names.txt', 'r') as my_file: 
    newline_break = "" 
    for readline in my_file: 
        line_strip = readline.strip() 
        newline_break += line_strip 
        count +=1
        
        try:

            df = pd.read_csv('~/' + line_strip + '.csv', 
                             index_col=None,
                            )
            
            df2 = df.tail(1)
            
            df3 = pd.concat([df2])
            
            print(df3)
            
        except Exception as e: 
            exc_type, exc_obj, exc_tb = sys.exc_info()
            fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
            print(exc_type, fname, exc_tb.tb_lineno)

The .txt file is a simple list of names, one per line; each name selects the .csv file passed to pd.read_csv.
Here's the current output:
| | Unnamed: 0 | Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 532 | 532 | 2022-12-02 | Jones | 2.2 | 0.03 | 234 | 17.0 | 800 | 1.2 | 23.34 | 15.28 |
| | Unnamed: 0 | Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 674 | 674 | 2022-12-02 | Smith | 3.81 | 4.08 | 3.75 | 3.99 | 16 | 2.832 | 3.97 | 4.05 |
| | Unnamed: 0 | Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 674 | 674 | 2022-12-02 | Grove | 28.42 | 28.57 | 28.42 | 28.55 | 72 | 0.04 | 2.67 | 6.8 |
| | Unnamed: 0 | Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 674 | 674 | 2022-12-02 | Injo | 3.09 | 3.16 | 3.08 | 3.1 | 462 | 0.94 | 2.93 | 2.90 |
| | Unnamed: 0 | Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 674 | 674 | 2022-12-02 | Solas | 1.26 | 14.83 | 18.69 | 3.32 | 500 | 0.31 | 13.07 | 17.92 |
| | Unnamed: 0 | Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 674 | 674 | 2022-12-02 | Resto | 1.84 | 1.04 | 1.04 | 3.77 | 100 | 0.1 | 9.9 | 7.7 |
This is the desired output:
| Date | name | field1 | field2 | field3 | field4 | field5 | field6 | field7 | field8 |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 2022-12-02 | Jones | 2.2 | 0.03 | 234 | 17.0 | 800 | 1.2 | 23.34 | 15.28 |
| 2022-12-02 | Smith | 3.81 | 4.08 | 3.75 | 3.99 | 16 | 2.832 | 3.97 | 4.05 |
| 2022-12-02 | Grove | 28.42 | 28.57 | 28.42 | 28.55 | 72 | 0.04 | 2.67 | 6.8 |
| 2022-12-02 | Injo | 3.09 | 3.16 | 3.08 | 3.1 | 462 | 0.94 | 2.93 | 2.90 |
| 2022-12-02 | Solas | 1.26 | 14.83 | 18.69 | 3.32 | 500 | 0.31 | 13.07 | 17.92 |
| 2022-12-02 | Resto | 1.84 | 1.04 | 1.04 | 3.77 | 100 | 0.1 | 9.9 | 7.7 |

  • NB: Removing the additional index columns would be great also . . . :-)

Grateful for your guidance.

kwvwclae #1

Try refactoring your code by instantiating an empty DataFrame before the loop and concatenating each new row onto it, like this:

import os
import sys

import pandas as pd

count = 0

with open("names.txt", "r") as my_file:

    df = pd.DataFrame()

    newline_break = ""
    for readline in my_file:
        line_strip = readline.strip()
        newline_break += line_strip
        count += 1

        try:

            df = pd.concat(
                [
                    df,
                    pd.read_csv(
                        "~/" + line_strip + ".csv",
                        index_col=None,
                    )
                    .drop(columns=["Unnamed: 0"])
                    .tail(1),
                ],
            )

        except Exception as e:
            exc_type, exc_obj, exc_tb = sys.exc_info()
            fname = os.path.split(exc_tb.tb_frame.f_code.co_filename)[1]
            print(exc_type, fname, exc_tb.tb_lineno)

Then, after and outside the with statement, set a new index:

df = df.set_index("Date")
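
As a variation on the same idea, here is a minimal sketch that collects the one-row frames in a list and calls pd.concat once after the loop, which avoids rebuilding the growing DataFrame on every iteration. The repeated header in the question most likely comes from printing a separate one-row DataFrame inside the loop, so printing only the combined result should show the header once. The function name last_rows and its csv_dir parameter are hypothetical, and the sketch assumes the first column of each csv is the saved index (the one that shows up as "Unnamed: 0"), which index_col=0 reads back as the index instead of a data column.

```python
from pathlib import Path

import pandas as pd


def last_rows(names_file, csv_dir):
    """Return one DataFrame holding the last row of each csv listed in names_file.

    names_file: text file with one base name per line (hypothetical layout,
    matching the names.txt described in the question).
    csv_dir: directory that contains <name>.csv for each listed name.
    """
    rows = []
    with open(names_file, "r") as fh:
        for line in fh:
            name = line.strip()
            if not name:
                continue  # skip blank lines in the names file
            path = Path(csv_dir) / f"{name}.csv"
            try:
                # index_col=0 treats the unnamed first column as the index,
                # so no "Unnamed: 0" data column is created.
                frame = pd.read_csv(path, index_col=0)
            except (FileNotFoundError, pd.errors.EmptyDataError) as exc:
                print(f"skipping {path}: {exc}")
                continue
            rows.append(frame.tail(1))  # keep only the last data row

    if not rows:
        return pd.DataFrame()
    # A single concat at the end; ignore_index discards the per-file row numbers.
    return pd.concat(rows, ignore_index=True)
```

With the files from the question, something like df = last_rows("names.txt", Path.home()) followed by print(df.set_index("Date")) should print the header once, followed by one row per csv.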
