Python: how to remove items from a list along with their associated items in other lists

Posted by inn6fuwd on 2021-08-20 in Java
Follows (0) | Answers (5) | Views (269)

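I'm working on a project in which I need to remove any duplicates from a main list. I have three lists here, and I'm trying to eliminate the duplicates in the flight ID list. I managed to do that, but unfortunately I can't remove the elements of the other lists that are associated with the elements removed from the flight_ID list.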


# All lists have a length of 20

flight_ID = ['1064662221', '1064617390', '1064614152', '1064614152',
             '1064775880', '1064645826', '1064645826', '1064664535', '1064659772',
             '1064659772', '1064614050', '1064614050', '1064614286', '1064614286',
             '1064614286', '1064614286', '1064614286', '1064614286', '1064614286', '1064646536']

flight_number = ['1827', '1585', '8409', '1465', '30', '9188', '2232', '3760', '579', '3309', '1259', '2193', '6566', '2231', '5214', '8601', '3169', '1601', '7832', '335']

airline_Code = ['TK', 'AY', 'DL', 'AF', 'FX', 'UA', 'LH', 'U2', 'SK', 'A3', 'AF', 'KL', 'VS', 'UX', 'G3', 'UU', 'KQ', 'AF', 'AR', 'LO']

I use the following function to remove the duplicates from the main list:

def remove_dup(a):
    # Delete every later copy of a[i]; j only advances when nothing was deleted
    i = 0
    while i < len(a):
        j = i + 1
        while j < len(a):
            if a[i] == a[j]:
                del a[j]
            else:
                j += 1
        i += 1

remove_dup(flight_ID)

# OUTPUT

['1064662221', '1064617390', '1064614152', '1064775880', '1064645826', '1064664535', '1064659772', '1064614050', '1064614286', '1064646536']

# 10 elements have been removed.

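Now, as described above, I need to do the same for the other lists, so that the items matching those removed from the main list (flight_ID) are removed as well.
Note: although the main list contains duplicate items, the items in the other lists are not duplicates.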
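One way to stay close to the asker's remove_dup is to delete the same index from every associated list whenever a duplicate is found. A minimal sketch, assuming all lists have the same length; the name remove_dup_parallel and the short sample data are hypothetical:

```python
def remove_dup_parallel(main, *others):
    """Remove later duplicates from `main`, deleting the same positions
    from every list in `others` (all lists are modified in place)."""
    i = 0
    while i < len(main):
        j = i + 1
        while j < len(main):
            if main[i] == main[j]:
                # Delete position j from the main list and every associated list
                del main[j]
                for other in others:
                    del other[j]
            else:
                j += 1
        i += 1


# Hypothetical sample data
ids = ['a', 'b', 'a', 'c', 'b']
numbers = ['1', '2', '3', '4', '5']
codes = ['TK', 'AY', 'DL', 'AF', 'FX']
remove_dup_parallel(ids, numbers, codes)
print(ids, numbers, codes)  # ['a', 'b', 'c'] ['1', '2', '4'] ['TK', 'AY', 'AF']
```

Like the original function, this keeps the first occurrence of each ID. The nested loop is quadratic, so for long lists a set-based approach such as the ones in the answers is faster.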

Answer #1

I'd suggest using pandas if you plan to do more work with data formatted the way you describe, because it makes operations like removing duplicates easy:

import pandas as pd

# Make a DataFrame

flight_ID = ['1064662221', '1064617390', ...]
flight_number = ['1827', '1585', '8409', ...]
airline_Code = ['TK', 'AY', 'DL', ...]

df = pd.DataFrame({'flight_ID': flight_ID,
                   'flight_number': flight_number,
                   'airline_Code': airline_Code})

# Remove duplicates - just one line!

df.drop_duplicates('flight_ID', inplace=True)

The resulting DataFrame looks like this:

     flight_ID flight_number airline_Code
0   1064662221          1827           TK
1   1064617390          1585           AY
2   1064614152          8409           DL
4   1064775880            30           FX
5   1064645826          9188           UA
7   1064664535          3760           U2
8   1064659772           579           SK
10  1064614050          1259           AF
12  1064614286          6566           VS
19  1064646536           335           LO

Answer #2

First, change the representation so that related items are linked together, instead of using parallel lists.

flight_list = zip(flight_ID, flight_number, airline_Code)

This makes it easier to remove the three related items together.
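Two details are worth knowing about this representation. In Python 3, zip() returns an iterator that can only be consumed once, so wrap it in list() if the records will be traversed more than once; and zip(*records) transposes the records back into separate sequences when you need parallel lists again. A sketch with hypothetical sample data:

```python
flight_ID = ['1', '2']
flight_number = ['10', '20']
airline_Code = ['TK', 'AY']

# In Python 3, zip() returns a one-shot iterator
pairs = zip(flight_ID, flight_number, airline_Code)
print(list(pairs))  # [('1', '10', 'TK'), ('2', '20', 'AY')]
print(list(pairs))  # [] (already exhausted)

# list() materializes the records so they can be traversed repeatedly
flight_list = list(zip(flight_ID, flight_number, airline_Code))

# zip(*...) transposes the records back into per-field sequences
ids, numbers, codes = map(list, zip(*flight_list))
print(ids, numbers, codes)  # ['1', '2'] ['10', '20'] ['TK', 'AY']
```

Unpacking into three names this way raises ValueError if the record list is empty.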
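Now remove the duplicates with any of the standard methods. Whichever one you use, build a new list: changing the object you are iterating over is a bad idea, as documented in many posts on this site. Keeping it at the level of programming you've shown: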

unique_flight = []
found_ID = set()
for flight in flight_list:
    if flight[0] not in found_ID:
        found_ID.add(flight[0])
        unique_flight.append(flight)

for flight in unique_flight:
    print(flight)

Output:

('1064662221', '1827', 'TK')
('1064617390', '1585', 'AY')
('1064614152', '8409', 'DL')
('1064775880', '30', 'FX')
('1064645826', '9188', 'UA')
('1064664535', '3760', 'U2')
('1064659772', '579', 'SK')
('1064614050', '1259', 'AF')
('1064614286', '6566', 'VS')
('1064646536', '335', 'LO')
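As for the warning about modifying the list you are iterating over: removing an item shifts every later item one slot to the left, so the loop silently skips the item that moved into the current position. A small hypothetical demonstration with a single list:

```python
ids = ['a', 'a', 'a', 'a', 'b']

# Buggy: removes items from the list being iterated over
buggy = list(ids)
for x in buggy:
    if buggy.count(x) > 1:
        buggy.remove(x)  # removes the first match; later items shift left
print(buggy)  # ['a', 'a', 'b']: a duplicate survives

# Safe: build a new list instead
seen = set()
safe = []
for x in ids:
    if x not in seen:
        seen.add(x)
        safe.append(x)
print(safe)  # ['a', 'b']
```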

Answer #3

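There are several ways to do this, but I would consider using a class to represent this kind of data (similar to how the namedtuple example works).
Adding the flight IDs to a dict as keys makes them unique; use the values as indices: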

# Later duplicates overwrite the stored index, so each ID maps to its last occurrence
flight_ID_inds = {f: i for i, f in enumerate(flight_ID)}
flight_ID = list(flight_ID_inds.keys())
flight_number = [flight_number[i] for i in flight_ID_inds.values()]
airline_Code = [airline_Code[i] for i in flight_ID_inds.values()]
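Because later keys overwrite earlier values in a dict comprehension, this mapping pairs each ID with the data of its last occurrence, while the key order still follows the first occurrence. If you want the associated values from the first occurrence instead (as the question's remove_dup keeps), dict.setdefault only stores the first index it sees. A sketch with hypothetical sample data:

```python
flight_ID = ['A', 'B', 'A']
flight_number = ['1', '2', '3']
airline_Code = ['TK', 'AY', 'DL']

# Last occurrence wins: 'A' maps to index 2
last = {f: i for i, f in enumerate(flight_ID)}

# First occurrence wins: setdefault never overwrites an existing key
first = {}
for i, f in enumerate(flight_ID):
    first.setdefault(f, i)

print(last)   # {'A': 2, 'B': 1}
print(first)  # {'A': 0, 'B': 1}
print([flight_number[i] for i in first.values()])  # ['1', '2']
```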

Similarly, you can use tuples of the other lists' data as the values instead of indices:

# Later duplicates overwrite the stored tuple, so each ID keeps the data of its last occurrence
dic = {fid: (fn, ac) for fid, fn, ac in zip(flight_ID, flight_number, airline_Code)}
flight_ID = list(dic.keys())
flight_number = [x[0] for x in dic.values()]
airline_Code = [x[1] for x in dic.values()]

Using named tuples (a list of dicts would also work):

from collections import namedtuple

flight_nt = namedtuple("Flight", "flight_ID, flight_number, airline_Code")

flights = [flight_nt(fid, fn, ac) for fid, fn, ac in zip(flight_ID, flight_number, airline_Code)]
uniq_ids = set()
uniq_flights = []
for f in flights:
    if f.flight_ID not in uniq_ids:
        uniq_ids.add(f.flight_ID)
        uniq_flights.append(f)
flight_ID = [x.flight_ID for x in uniq_flights]
flight_number = [x.flight_number for x in uniq_flights]
airline_Code = [x.airline_Code for x in uniq_flights]
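The answer notes that a list of dicts would work too. A minimal sketch of that variant, using the same seen-set loop; the sample data is hypothetical:

```python
flight_ID = ['A', 'B', 'A']
flight_number = ['1', '2', '3']
airline_Code = ['TK', 'AY', 'DL']

# One dict per flight, keyed by field name
flights = [
    {'flight_ID': fid, 'flight_number': fn, 'airline_Code': ac}
    for fid, fn, ac in zip(flight_ID, flight_number, airline_Code)
]

# Keep only the first record seen for each flight ID
uniq_ids = set()
uniq_flights = []
for f in flights:
    if f['flight_ID'] not in uniq_ids:
        uniq_ids.add(f['flight_ID'])
        uniq_flights.append(f)

print(uniq_flights)
```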

For this kind of problem I would recommend an object-oriented approach (a class or a dataclass):

class Flight:
    def __init__(self, flight_id, flight_number, airline_code):
        self.flight_id = flight_id
        self.flight_number = flight_number
        self.airline_code = airline_code

    def __hash__(self):
        return hash(self.flight_id)

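    def __eq__(self, other):
        # Only compare against other Flight objects
        if not isinstance(other, Flight):
            return NotImplemented
        return other.flight_id == self.flight_id

flights = [Flight(fid, fn, ac) for fid, fn, ac in zip(flight_ID, flight_number, airline_Code)]
# Flights with the same flight_id collapse to one element; a set does not keep the original order
uniq_flights = set(flights)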
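The answer mentions dataclasses but only shows a plain class. A sketch of the dataclass variant, assuming Python 3.7+: marking the other fields with field(compare=False) makes equality and hashing depend only on flight_id, and list(dict.fromkeys(...)) keeps the first of each group of equal items in their original order, which a set does not. The sample data is hypothetical:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flight:
    flight_id: str
    # compare=False: these fields are ignored by __eq__ and __hash__
    flight_number: str = field(compare=False)
    airline_code: str = field(compare=False)


flight_ID = ['A', 'B', 'A']
flight_number = ['1', '2', '3']
airline_Code = ['TK', 'AY', 'DL']

flights = [Flight(fid, fn, ac) for fid, fn, ac in zip(flight_ID, flight_number, airline_Code)]

# dict keys are unique and keep insertion order; an equal later key does not replace the first
uniq_flights = list(dict.fromkeys(flights))
print(uniq_flights)
```

frozen=True is what lets the generated class be hashable while eq=True (the default) is in effect.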

Answer #4

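@Prune has a better solution, but you can always use enumerate(). Iterate over a reversed snapshot so that deleting an entry never shifts the indices still to be visited:

# Iterate over a reversed snapshot so deletions never shift indices still to be visited
for index, fid in reversed(list(enumerate(flight_ID))):
    # Drop this entry if the same ID appears again later in the list
    if fid in flight_ID[index + 1:]:
        del flight_ID[index]
        del flight_number[index]
        del airline_Code[index]

Note that this keeps the last occurrence of each ID, so it does not preserve the original order; if you want that, you have to find the value's index in the slice.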
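If you want to keep the first occurrence of each ID (and therefore the original order), one option is to check the part of the list before the current index instead, still walking backwards so deletions don't disturb indices not yet visited. A sketch with hypothetical sample data; it is quadratic, so it only suits short lists:

```python
flight_ID = ['A', 'B', 'A', 'C', 'B']
flight_number = ['1', '2', '3', '4', '5']
airline_Code = ['TK', 'AY', 'DL', 'AF', 'FX']

# Walk backwards so deleting an entry never shifts an index still to be visited
for index in range(len(flight_ID) - 1, -1, -1):
    # Delete this entry if the same ID already appeared earlier
    if flight_ID[index] in flight_ID[:index]:
        del flight_ID[index]
        del flight_number[index]
        del airline_Code[index]

print(flight_ID, flight_number, airline_Code)  # ['A', 'B', 'C'] ['1', '2', '4'] ['TK', 'AY', 'AF']
```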

Answer #5

You can first determine which elements to keep and which to remove, and then remove them with itertools.compress:

import itertools as it

keep = []
seen = set()
for x in flight_ID:
    keep.append(x not in seen)
    seen.add(x)

flight_ID = list(it.compress(flight_ID, keep))
flight_number = list(it.compress(flight_number, keep))
airline_Code = list(it.compress(airline_Code, keep))
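The same mask can be applied to any number of lists at once. A small helper sketch; the name dedupe_by_first and the sample data are hypothetical:

```python
import itertools as it


def dedupe_by_first(key_list, *others):
    """Return new lists with later duplicates of key_list removed,
    dropping the same positions from every list in `others`."""
    seen = set()
    keep = []
    for x in key_list:
        keep.append(x not in seen)
        seen.add(x)
    # compress() yields only the items whose mask entry is True
    return [list(it.compress(lst, keep)) for lst in (key_list, *others)]


flight_ID, flight_number, airline_Code = dedupe_by_first(
    ['A', 'B', 'A'], ['1', '2', '3'], ['TK', 'AY', 'DL']
)
print(flight_ID, flight_number, airline_Code)  # ['A', 'B'] ['1', '2'] ['TK', 'AY']
```

Because it returns new lists instead of deleting in place, the inputs are left untouched.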

However, since this data logically seems to belong together, it would be better to create a dedicated container class for it, e.g. via namedtuple:

from collections import namedtuple
import operator as op  # used for itemgetter in the groupby example

FlightData = namedtuple('FlightData', 'id number code')

data = [FlightData(*x) for x in zip(flight_ID, flight_number, airline_Code)]

Another approach is then to use itertools.groupby:

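# Stable sort by ID only, so next(g) is each ID's first occurrence (result is ordered by ID)
unique_data = [next(g) for k, g in it.groupby(sorted(data, key=op.itemgetter(0)), key=op.itemgetter(0))]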
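The sort is needed because itertools.groupby only groups consecutive items with equal keys. A small demonstration with hypothetical tuples:

```python
import itertools as it
import operator as op

data = [('A', '1'), ('B', '2'), ('A', '3')]

# Without sorting, the two 'A' records are not adjacent, so they form separate groups
unsorted_keys = [k for k, g in it.groupby(data, key=op.itemgetter(0))]
print(unsorted_keys)  # ['A', 'B', 'A']

# After a stable sort by ID, equal IDs are adjacent and keep their original relative order
sorted_data = sorted(data, key=op.itemgetter(0))
unique = [next(g) for k, g in it.groupby(sorted_data, key=op.itemgetter(0))]
print(unique)  # [('A', '1'), ('B', '2')]
```

Sorting by the ID alone, rather than by the whole tuple, is what makes next(g) the first occurrence rather than the smallest record in each group.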
