Basic statistics on a CSV file without using Pandas

Posted by yr9zkbsy on 2023-07-31 in Other

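I'm trying to compute some basic statistics in my program without using numpy or pandas. I want to calculate the overall average loan length for the books. However, len() on total_loans does not return the total number of books that were loaned, so my program returns 1, which is incorrect.
Any suggestions would be greatly appreciated!
My code below should return the overall average number of days a book was borrowed, counting a book only if it appears in both bookloans.csv and books.csv, matched on the common book_id field (only books.csv has a header row; bookloans.csv has none).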

import csv

total_loan_days = 0
total_loans = 0

# Open the books file and read the title and authors into a list
with open('books.csv', 'r', encoding='utf-8-sig') as csv_books_file:
    books_reader = csv.reader(csv_books_file)
    next(books_reader)  # skip the header row
    for row in books_reader:
        books = row[0]

with open('bookloans.csv', 'r', encoding='utf-8-sig') as csv_bookloans_file:
    loans_reader = csv.reader(csv_bookloans_file)
    for rows in loans_reader:
        book_number = row[0]
        return_date = rows[3]
        start_date = rows[2]
        date_diff = int(return_date) - int(start_date)
        
        
        total_loan_days += date_diff
        total_loans = len(book_number)

       
#Calculate the overall average loan days for all books
    if date_diff > 1:
        overall_average_loan_days = total_loan_days / total_loans
    else:
        overall_average_loan_days = 1

    print(f"Overall Average Loan Days: {overall_average_loan_days}")
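A few things in the question's code explain the wrong result. len(book_number) returns the number of characters in the ID string (1 for an ID such as "7"), not a number of loans, and because it is assigned with = instead of added with +=, total_loans only ever reflects one row. The loans loop also reads row[0], a leftover from the books loop, instead of rows[0], and books is overwritten on every row, so no matching on book_id actually happens. Below is a minimal sketch of the same approach with those points fixed. It assumes, as the question's code does, that bookloans.csv has no header and stores book_id in column 0 and the start and return days as integers in columns 2 and 3; the function name overall_average_loan_days is hypothetical.

```python
import csv
from typing import Iterable, Optional


def overall_average_loan_days(
    books_lines: Iterable[str], loans_lines: Iterable[str]
) -> Optional[float]:
    """Average loan length, in days, over loans whose book_id is in books.csv.

    Assumed layout (taken from the question's code, not verified):
    books.csv has a header row and book_id in column 0; bookloans.csv has
    no header, book_id in column 0, start day in column 2, return day in column 3.
    """
    books_reader = csv.reader(books_lines)
    next(books_reader, None)  # skip the header row
    # Keep every book_id in a set rather than overwriting one variable per row.
    book_ids = {row[0] for row in books_reader if row}

    total_loan_days = 0
    total_loans = 0
    for row in csv.reader(loans_lines):
        if not row or row[0] not in book_ids:
            continue  # skip blank lines and loans of books missing from books.csv
        total_loan_days += int(row[3]) - int(row[2])
        total_loans += 1  # count one loan per row (len() would count ID characters)

    if total_loans == 0:
        return None  # nothing matched, so there is no average to report
    return total_loan_days / total_loans
```

With the real files, pass the open file objects, for example open('books.csv', newline='', encoding='utf-8-sig') and the same for bookloans.csv. Because the function accepts any iterable of lines, it can also be tried on in-memory text.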


Answer by zysjyyx4

I'd start by reading the master list of books into a dict, keyed by the book's ID:

import csv

books: dict[str, list[str]] = {}

with open("books.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)
    header = next(reader)
    assert header == ["ID", "Title", "Author"], f"{header} != {['ID', 'Title', 'Author']}"

    for row in reader:
        books[row[0]] = row

After reading this CSV:

ID,Title,Author
1,A Closed and Common Orbit,Becky Chambers
2,The Clan of the Cave Bear,Jean M. Auel
3,Sum,David Eagleman
4,the 13 1/2 lives of Captain Bluebear,Walter Moers


the dict looks like:

{
    "1": ["1", "A Closed and Common Orbit", "Becky Chambers"],
    "2": ["2", "The Clan of the Cave Bear", "Jean M. Auel"],
    "3": ["3", "Sum", "David Eagleman"],
    "4": ["4", "the 13 1/2 lives of Captain Bluebear", "Walter Moers"],
}
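The question opens its files with encoding='utf-8-sig', which suggests they may begin with a UTF-8 byte-order mark. If such a file is read as plain utf-8, its first header cell comes back as '\ufeffID', so the header assert above would fail. A sketch of the same lookup that decodes with utf-8-sig (which strips the mark when present and otherwise behaves like utf-8) and uses csv.DictReader to address columns by name. The expected header ID, Title, Author comes from the sample above, and load_books is a hypothetical helper name. Note that the values become dicts, so later code would use row["Title"] instead of row[1].

```python
import csv


def load_books(path: str) -> dict[str, dict[str, str]]:
    """Map each book ID to its row, keyed by column name.

    'utf-8-sig' strips a leading byte-order mark if present and otherwise
    behaves like 'utf-8'.
    """
    books: dict[str, dict[str, str]] = {}
    with open(path, newline="", encoding="utf-8-sig") as f:
        reader = csv.DictReader(f)
        # Assumed header, taken from the sample books.csv in this answer.
        expected = ["ID", "Title", "Author"]
        if reader.fieldnames != expected:
            raise ValueError(f"{reader.fieldnames} != {expected}")
        for row in reader:
            books[row["ID"]] = row
    return books
```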


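Then I'd read the loans CSV into a separate dict and parse the checked-out and returned timestamps into real date values (this makes the math and statistics easier and more accurate):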

from datetime import date, datetime, timedelta

DATE_FMT = r"%Y/%m/%d"

loans: dict[str, list[tuple[date, date]]] = {}

with open("bookloans.csv", newline="", encoding="utf-8") as f:
    reader = csv.reader(f)

    for row in reader:
        book_id = row[0]
        checked_out = datetime.strptime(row[1], DATE_FMT).date()
        returned = datetime.strptime(row[2], DATE_FMT).date()

        if book_id not in loans:
            loans[book_id] = []

        loans[book_id].append((checked_out, returned))


After reading this loans CSV:

1,2000/01/01,2000/01/13
2,2000/01/01,2000/01/10
3,2000/01/04,2000/01/16
2,2000/01/11,2000/02/01
3,2000/01/12,2000/01/13
1,2000/01/12,2000/01/26


the loans dict looks like:

{
    "1": [(date(2000, 1, 1), date(2000, 1, 13)), (date(2000, 1, 12), date(2000, 1, 26))],
    "2": [(date(2000, 1, 1), date(2000, 1, 10)), (date(2000, 1, 11), date(2000, 2, 1))],
    "3": [(date(2000, 1, 4), date(2000, 1, 16)), (date(2000, 1, 12), date(2000, 1, 13))],
}


Each book ID maps to a list of date pairs (the checked-out and returned dates of each loan).

Note: book 4, Captain Bluebear, was never loaned out.
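The grouping step can also be written with collections.defaultdict(list), which creates the empty list the first time a book ID is seen, so the "if book_id not in loans" check goes away. Because the values are real date objects, subtracting them yields a timedelta that handles month boundaries: the loan from 2000/01/11 to 2000/02/01 comes out as 21 days, whereas subtracting the digits as integers (20000201 - 20000111) would give 90. This sketch assumes the same three-column layout as the sample loans CSV above (book_id, checked out, returned); group_loans is a hypothetical name.

```python
import csv
from collections import defaultdict
from datetime import date, datetime
from typing import Iterable

DATE_FMT = "%Y/%m/%d"  # same format as the sample loans CSV


def group_loans(lines: Iterable[str]) -> dict[str, list[tuple[date, date]]]:
    """Group (checked_out, returned) date pairs by book ID.

    Assumes three columns per row: book_id, checked-out date, returned date.
    """
    loans: defaultdict[str, list[tuple[date, date]]] = defaultdict(list)
    for row in csv.reader(lines):
        if not row:
            continue  # ignore blank lines
        book_id, out_str, ret_str = row
        checked_out = datetime.strptime(out_str, DATE_FMT).date()
        returned = datetime.strptime(ret_str, DATE_FMT).date()
        loans[book_id].append((checked_out, returned))  # list is created on first use
    return dict(loans)  # plain dict: looking up an unknown ID raises KeyError
```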

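Now that you have the full list of available books and, for each book that was actually loaned out, its list of loan dates, you can apply whatever logic or math you need.
I went with: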

for book_id, row in books.items():
    title = row[1]
    author = row[2]

    if book_id not in loans:
        print(f"'{title}' by {author} was never loaned out. 🙁")
        continue

    loan_dates = loans[book_id]
    first_checked_out = date.max
    last_returned = date.min
    loan_durations: list[timedelta] = []

    for checked_out, returned in loan_dates:
        if checked_out < first_checked_out:
            first_checked_out = checked_out
        if returned > last_returned:
            last_returned = returned
        loan_durations.append(returned - checked_out)

    # https://stackoverflow.com/a/3617540/246801, give sum() a starting value of timedelta(0)
    avg_loan = sum(loan_durations, timedelta(0)) / len(loan_durations)

    print(
        f"'{title}', by {author}, was loaned {len(loan_durations)} times.  It was first checked out on {first_checked_out} and last returned on {last_returned}.  The average loan was {avg_loan.days} days."
    )


It prints:

'A Closed and Common Orbit', by Becky Chambers, was loaned 2 times.  It was first checked out on 2000-01-01 and last returned on 2000-01-26.  The average loan was 13 days.
'The Clan of the Cave Bear', by Jean M. Auel, was loaned 2 times.  It was first checked out on 2000-01-01 and last returned on 2000-02-01.  The average loan was 15 days.
'Sum', by David Eagleman, was loaned 2 times.  It was first checked out on 2000-01-04 and last returned on 2000-01-16.  The average loan was 6 days.
'the 13 1/2 lives of Captain Bluebear' by Walter Moers was never loaned out. 🙁
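The question asks for one overall average across all loans rather than one per book. A sketch using Python's standard-library statistics module (no numpy or pandas) over the books and loans dicts built above: it keeps only loans whose ID also appears in books, works in whole days, and adds the median as another basic statistic. overall_loan_stats is a hypothetical name. Also note that avg_loan.days in the loop above drops the fractional part: Sum's two loans (12 and 1 days) average 6.5 days, which prints as 6.

```python
import statistics
from datetime import date


def overall_loan_stats(
    books: dict[str, list[str]], loans: dict[str, list[tuple[date, date]]]
) -> dict[str, float]:
    """Loan count, mean and median loan length in days over loans of known books."""
    durations = [
        (returned - checked_out).days
        for book_id, pairs in loans.items()
        if book_id in books  # join on the shared book ID
        for checked_out, returned in pairs
    ]
    if not durations:
        raise ValueError("no loans match any book in books")
    return {
        "loans": len(durations),
        "mean_days": statistics.mean(durations),
        "median_days": statistics.median(durations),
    }
```

With the sample data above this reports 6 loans, a mean of 11.5 days and a median of 12 days. The mean is taken over individual loans, so a book loaned many times weighs more than one loaned once; averaging the per-book averages instead would be a different statistic.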
