Processing a CSV file with nearly identical records that differ only in time: need to group them into one record

Posted by z2acfund on 2023-04-03 in Other

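I am trying to solve the lab below and am having trouble. The problem involves CSV input, and the solution has to meet the criteria listed here. Any help or hints would be greatly appreciated. My code, along with my output, is at the end of the question.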

Each row contains the title, rating, and all showtimes of a unique movie.
A space is placed before and after each vertical separator ('|') in each row.
Column 1 displays the movie titles and is left justified with a minimum of 44 characters.
If the movie title has more than 44 characters, output the first 44 characters only.
Column 2 displays the movie ratings and is right justified with a minimum of 5 characters.
Column 3 displays all the showtimes of the same movie, separated by a space.

Here is the input:

16:40,Wonders of the World,G
20:00,Wonders of the World,G
19:00,End of the Universe,NC-17
12:45,Buffalo Bill And The Indians or Sitting Bull's History Lesson,PG
15:00,Buffalo Bill And The Indians or Sitting Bull's History Lesson,PG
19:30,Buffalo Bill And The Indians or Sitting Bull's History Lesson,PG
10:00,Adventure of Lewis and Clark,PG-13
14:30,Adventure of Lewis and Clark,PG-13
19:00,Halloween,R

Here is the expected output:

Wonders of the World                         |     G | 16:40 20:00
End of the Universe                          | NC-17 | 19:00
Buffalo Bill And The Indians or Sitting Bull |    PG | 12:45 15:00 19:30
Adventure of Lewis and Clark                 | PG-13 | 10:00 14:30
Halloween                                    |     R | 19:00

My code so far:

import csv
rawMovies = input()
repeatList = []

with open(rawMovies, 'r') as movies:
    moviesList = csv.reader(movies)
    for movie in moviesList:
        time = movie[0]
        #print(time)
        show = movie[1]
        if len(show) > 45:
            show = show[0:44]
        #print(show)
        rating = movie[2]
        #print(rating)
        print('{0: <44} | {1: <6} | {2}'.format(show, rating, time))

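My output does not right-justify the ratings, and I don't know how to filter out the duplicate movies without dropping the time part of the list: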

Wonders of the World                         | G      | 16:40
Wonders of the World                         | G      | 20:00
End of the Universe                          | NC-17  | 19:00
Buffalo Bill And The Indians or Sitting Bull | PG     | 12:45
Buffalo Bill And The Indians or Sitting Bull | PG     | 15:00
Buffalo Bill And The Indians or Sitting Bull | PG     | 19:30
Adventure of Lewis and Clark                 | PG-13  | 10:00
Adventure of Lewis and Clark                 | PG-13  | 14:30
Halloween                                    | R      | 19:00

wqlqzqxt1#

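You can collect the input data in a dictionary, using (title, rating) tuples as keys and lists of showtimes as values, and then print the merged information. For example (you will have to adjust the file name):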

import csv

movies = {}
with open("data.csv", "r") as file:
    for showtime, title, rating in csv.reader(file):
        movies.setdefault((title, rating), []).append(showtime)
for (title, rating), showtimes in movies.items():
    print(f"{title[:44]: <44} | {rating: >5} | {' '.join(showtimes)}")

Output:

Wonders of the World                         |     G | 16:40 20:00
End of the Universe                          | NC-17 | 19:00
Buffalo Bill And The Indians or Sitting Bull |    PG | 12:45 15:00 19:30
Adventure of Lewis and Clark                 | PG-13 | 10:00 14:30
Halloween                                    |     R | 19:00
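
One property of the dictionary approach that may be worth spelling out: since Python 3.7 a dict keeps its keys in insertion order, so movies are printed in the order they first appear, and rows for the same movie do not have to be adjacent in the file. The following is a minimal sketch of that behaviour. The `SAMPLE` data and the `group_showtimes` helper are made up for illustration and are not part of the original answer.

```python
import csv
import io

# Hypothetical sample: the "Wonders of the World" rows are deliberately not adjacent.
SAMPLE = """16:40,Wonders of the World,G
19:00,Halloween,R
20:00,Wonders of the World,G
"""

def group_showtimes(lines):
    """Return {(title, rating): [showtimes]} in first-seen order."""
    movies = {}
    for showtime, title, rating in csv.reader(lines):
        # setdefault creates the list on first sight of a key, then reuses it.
        movies.setdefault((title, rating), []).append(showtime)
    return movies

grouped = group_showtimes(io.StringIO(SAMPLE))
for (title, rating), showtimes in grouped.items():
    print(f"{title[:44]: <44} | {rating: >5} | {' '.join(showtimes)}")
```

Here both "Wonders of the World" showtimes end up on one line even though another movie sits between them in the input.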

Since the input appears to arrive in contiguous blocks, you could also use itertools.groupby (from the standard library) and print while reading:

import csv
from itertools import groupby
from operator import itemgetter

with open("data.csv", "r") as file:
    for (title, rating), group in groupby(
        csv.reader(file), key=itemgetter(1, 2)
    ):
        showtimes = " ".join(time for time, *_ in group)
        print(f"{title[:44]: <44} | {rating: >5} | {showtimes}")
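
To make the two less obvious pieces of this snippet concrete, here is a small sketch run against in-memory data instead of a file. `SAMPLE` and `format_rows` are hypothetical names introduced only for this illustration. It shows that `itemgetter(1, 2)` turns each row into a `(title, rating)` key and that `time, *_` keeps only the first field of each row.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter

# Hypothetical in-memory stand-in for data.csv.
SAMPLE = """16:40,Wonders of the World,G
20:00,Wonders of the World,G
19:00,End of the Universe,NC-17
"""

def format_rows(lines):
    """Build one formatted line per run of rows sharing (title, rating)."""
    result = []
    # itemgetter(1, 2) maps a row [time, title, rating] to the tuple (title, rating).
    for (title, rating), group in groupby(csv.reader(lines), key=itemgetter(1, 2)):
        # `time, *_` binds the first field of each row and discards the rest.
        showtimes = " ".join(time for time, *_ in group)
        result.append(f"{title[:44]: <44} | {rating: >5} | {showtimes}")
    return result

rows = format_rows(io.StringIO(SAMPLE))
print("\n".join(rows))
```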

i7uaboj42#

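For the alignment, take the maximum length of the rating string and subtract the length of the current rating from it. Build a string of that many spaces and append the rating to it. So basically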

your_desired_str = ' '*(5-len(Rating))+Rating  # 5 is the column width required by the task

Also, just replace

'somestr {value}'.format(value=value)

with an f-string, which is easier to read:

f'somestr {value}'
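
As a quick check of the padding idea, the sketch below compares the manual approach with `str.rjust` and with the `>` alignment in a format spec. It assumes a column width of 5, taken from the requirements in the question. `pad_manual` is a hypothetical helper name used only here.

```python
def pad_manual(rating, width=5):
    """Right-justify by prepending spaces, as described above."""
    # A negative repeat count yields an empty string, so longer ratings are left intact.
    return " " * (width - len(rating)) + rating

for rating in ["G", "PG", "PG-13", "NC-17"]:
    manual = pad_manual(rating)
    # str.rjust and the '>' format spec produce the same padding.
    assert manual == rating.rjust(5) == f"{rating:>5}"
    print(f"[{manual}]")
```

All three spellings give the same string, so the format spec is probably the most convenient choice when the value is already being placed in an f-string.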

ehxuflar3#

Here is what I came up with after some hints from the community.

import csv

rawMovies = input()
outputList = []

with open(rawMovies, 'r') as movies:
    moviesList = csv.reader(movies)
    movieold = [' ', ' ', ' ']
    for movie in moviesList:
        if movieold[1] == movie[1]:
            # Same title as the previous row: append this showtime to the last entry.
            outputList[-1][2] += ' ' + movie[0]
        else:
            time = movie[0]
            show = movie[1][:44]  # keep at most the first 44 characters of the title
            rating = movie[2]
            outputList.append([show, rating, time])
            movieold = movie

for movie in outputList:
    # '<' left-justifies the title, '>' right-justifies the rating.
    print('{0: <44} | {1: >5} | {2}'.format(movie[0], movie[1], movie[2]))

unftdfkk4#

I would use Python's groupby() function (from itertools) for this. It groups consecutive rows that share the same value.
For example:

import csv
from itertools import groupby

with open('movies.csv') as f_movies:
    csv_movies = csv.reader(f_movies)
    
    for title, entries in groupby(csv_movies, key=lambda x: x[1]):
        movies = list(entries)
        showtimes = ' '.join(row[0] for row in movies)
        rating = movies[0][2]
        
        print(f"{title[:44]: <44} | {rating: >5} | {showtimes}")

This gives you:

Wonders of the World                         |     G | 16:40 20:00
End of the Universe                          | NC-17 | 19:00
Buffalo Bill And The Indians or Sitting Bull |    PG | 12:45 15:00 19:30
Adventure of Lewis and Clark                 | PG-13 | 10:00 14:30
Halloween                                    |     R | 19:00

How does groupby() work?

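When reading a CSV file, you get one row at a time. What groupby() does is group the rows into mini-lists of rows that have the same value. The value it compares is given by the key parameter. In this case, the lambda function is passed one row at a time and returns the current value of x[1], which is the title. groupby() keeps reading rows until that value changes. It then returns the current group as entries, in the form of an iterator.
This approach does assume that the rows you want to group are consecutive in the file. You could even write your own group-by generator function: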

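def group_by_title(rows):
    title = None
    entries = []
    
    for row in rows:
        # A change of title closes the current group.
        if title is not None and row[1] != title:
            yield title, entries
            entries = []
        
        title = row[1]
        entries.append(row)
    
    # Emit the final group, unless the input was empty.
    if entries:
        yield title, entries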

with open('movies.csv') as f_movies:
    csv_movies = csv.reader(f_movies)
    
    for title, entries in group_by_title(csv_movies):
        showtimes = ' '.join(row[0] for row in entries)
        rating = entries[0][2]
        
        print(f"{title[:44]: <44} | {rating: >5} | {showtimes}")
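
The caveat about consecutive rows can be seen in a small sketch. The `rows` list and the `title_of` helper below are made up for illustration. When the same title appears in two separate runs, groupby() reports it twice. Sorting by the same key first is one common workaround, though it gives up the original file order.

```python
from itertools import groupby

# Hypothetical rows in which the same title appears in two separate runs.
rows = [
    ["16:40", "Wonders of the World", "G"],
    ["19:00", "Halloween", "R"],
    ["20:00", "Wonders of the World", "G"],
]

def title_of(row):
    return row[1]

# Without sorting, groupby starts a new group every time the key changes.
unsorted_keys = [title for title, _ in groupby(rows, key=title_of)]

# Sorting by the same key first makes equal titles adjacent.
# sorted() is stable, so showtimes keep their relative order within a title.
sorted_groups = {
    title: [row[0] for row in group]
    for title, group in groupby(sorted(rows, key=title_of), key=title_of)
}

print(unsorted_keys)
print(sorted_groups)
```

If the output has to follow the order in which movies first appear in the file, a dictionary keyed by title is likely the simpler option than sorting.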

kiayqfof5#

file_name = input()
my_movies = {}

with open(file_name, 'r') as f:
    for row in f:
        showtime, title, rating = row.strip().split(",")
        if title in my_movies:
            my_movies[title]["showtimes"].append(showtime)
        else:
            my_movies[title] = {"rating": rating, "showtimes": [showtime]}

for movie, item in my_movies.items():
    showtimes = " ".join(item["showtimes"])
    rating = item["rating"]
    title = movie[:44]
    print(f'{title:<44} | {rating:>5} | {showtimes}')
