Python: compare two lists and create a new list

wlzqhblo posted on 2023-05-05 in Python

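So here is my problem:
I have a list of countries (all countries) and a list of the letters of the alphabet.
What needs to happen is: when a country (from the first list) contains one or more letters from the alphabet (the second list),
remove those letters from the alphabet and add that country to a new list.
Keep doing this until some number x of countries has been used and every letter of the alphabet has been removed/used.
Also make sure the list of countries has fewer than 14 countries.
Return the list of countries.
Here is my current code: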

def alphabet_set(countries):
    list_of_letters = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
    matching_countries = []
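    for country in countries:
        for letter in list_of_letters:
            # stop at the first still-unused letter this country contains
            if letter in country.lower():
                # the country is added, but only this one letter is removed,
                # so each added country uses up exactly one letter
                matching_countries.append(country)
                list_of_letters.remove(letter)
                break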

    list_of_countries = list(matching_countries)
    # print(f"Matching countries: \n{matching_countries}")
    print(f"Matching countries: \n{list_of_countries}")
    print(f"Remaining characters: \n{list_of_letters}")

    return list_of_countries

I know this isn't a good approach, because right now I end up with a list of more than 14 countries.
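The list grows past 14 because each matching country removes exactly one letter from list_of_letters (the inner loop breaks after the first match), so covering all 26 letters takes 26 countries. A minimal sketch of a change, written as a hypothetical alphabet_set_first_fit, removes every still-unused letter a country contains. It still takes countries in input order (first fit), so it is not guaranteed to keep the list short; on this sample of 12 countries it needs all 12.

```python
from string import ascii_lowercase


def alphabet_set_first_fit(countries):
    """Hypothetical variant of the question's alphabet_set: remove every
    still-unused letter a country contains, not just the first one.
    Countries are taken in the given order (first fit)."""
    remaining = set(ascii_lowercase)
    used = []
    for country in countries:
        # letters of this country that have not been used yet
        new_letters = set(country.lower()) & remaining
        if new_letters:
            used.append(country)
            remaining -= new_letters
        if not remaining:
            break  # every letter of the alphabet has been used
    return used, remaining


countries = ["india", "china", "france", "spain", "germany", "qatar",
             "mexico", "japan", "brazil", "vanuatu", "switzerland",
             "united kingdom"]
used, remaining = alphabet_set_first_fit(countries)
print(len(used), sorted(remaining))  # 12 []
```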

Answer #1 (ryevplcw):

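The approach you need is a greedy one: at each iteration, find the country that removes the most remaining letters from the alphabet. A greedy choice is not guaranteed to give the smallest possible list, but it tends to give a short one. Python sets are useful for this. If you create a set from a string, it contains the unique characters of that string as an unordered collection (so the printed order may vary), for example: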

country = set("india")
print(country)
{'a', 'n', 'd', 'i'}

Sets also make it easy to compute unions (the | operator), intersections (the & operator) and differences (the - operator) between sets.
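A small illustration of these three operators on two country names, plus the combination the greedy loop relies on: intersect a country's letters with the letters still unused, then subtract them. The variable names here are only for illustration; sorted() is used so the printed order is deterministic.

```python
from string import ascii_lowercase

india = set("india")
china = set("china")

print(sorted(india | china))  # union        -> ['a', 'c', 'd', 'h', 'i', 'n']
print(sorted(india & china))  # intersection -> ['a', 'i', 'n']
print(sorted(india - china))  # difference   -> ['d']

# Once "india" has been used, which letters would "china" still add?
remaining = set(ascii_lowercase) - india
new_letters = china & remaining
print(sorted(new_letters))    # ['c', 'h']
```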
Here is one approach you could take (using a shorter list of countries):

# import the list of lowercase ascii Latin alphabet characters
from string import ascii_lowercase

# create a set of the alphabet
alphabet = set(ascii_lowercase)

# a list of countries
countries = [
    "india",
    "china",
    "france",
    "spain",
    "germany",
    "qatar",
    "mexico",
    "japan",
    "thailand",
    "argentina",
    "brazil",
    "nigeria",
    "chad",
    "chile",
    "switzerland",
    "togo",
    "new zealand",
    "canada",
    "vanuatu",
    "slovakia",
    "jamaica",
    "sudan",
    "peru",
    "united kingdom",
    "egypt",
    "ukraine",
    "greece",
]

# list that will be filled in with country names
matching_countries = []

# keep picking countries until every alphabet character has been used
# (or there are no countries left to pick from)
while alphabet and countries:
    # create a list of sets of country letters that are in the
    # current alphabet
    countrysets = [
        set(country.replace(" ", "").lower()) & alphabet
        for country in countries
    ]

    # get the number of still-unused characters each country would cover
    setlengths = [len(country) for country in countrysets]

    # get the maximum length
    maxlen = max(setlengths)

    # stop if no remaining country covers any remaining character
    if maxlen == 0:
        break

    # get the list index of the country with the largest length (the first
    # one if there are several equal length ones)
    argmax = setlengths.index(maxlen)

    # remove used characters from the alphabet
    alphabet = alphabet - countrysets[argmax]

    # put the country into the matching countries list
    matching_countries.append(countries.pop(argmax))

print(matching_countries)

['switzerland', 'united kingdom', 'china', 'japan', 'france', 'germany', 'qatar', 'mexico', 'brazil', 'vanuatu']
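The same greedy loop can be written more compactly with max(..., key=...), which returns the first maximal item on ties, so the choices match the index-based version. The sketch below also encodes the question's "fewer than 14 countries" rule through a hypothetical max_countries parameter and returns None when the alphabet cannot be covered within that limit; the function name greedy_countries is likewise only illustrative.

```python
from string import ascii_lowercase


def greedy_countries(countries, max_countries=13):
    """Greedy cover of the alphabet; returns None if the alphabet cannot
    be covered using at most max_countries countries (hypothetical API)."""
    remaining = set(ascii_lowercase)
    pool = list(countries)  # copy, so the caller's list is not modified
    chosen = []
    while remaining and pool:
        # country covering the most still-unused letters; on ties, max()
        # returns the first such country in pool order
        best = max(pool, key=lambda c: len(set(c.lower()) & remaining))
        gained = set(best.lower()) & remaining  # spaces are filtered out here
        if not gained:
            break  # no remaining country adds a new letter
        chosen.append(best)
        remaining -= gained
        pool.remove(best)
    if remaining or len(chosen) > max_countries:
        return None
    return chosen


countries = [
    "india", "china", "france", "spain", "germany", "qatar", "mexico",
    "japan", "thailand", "argentina", "brazil", "nigeria", "chad", "chile",
    "switzerland", "togo", "new zealand", "canada", "vanuatu", "slovakia",
    "jamaica", "sudan", "peru", "united kingdom", "egypt", "ukraine", "greece",
]
result = greedy_countries(countries)
print(result)
```

Returning None rather than a partial list makes it explicit to the caller that the "fewer than 14" requirement could not be met.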

To double-check that matching_countries contains every letter of the Latin alphabet, we can do:

set("".join(country.replace(" ", "") for country in matching_countries)) == set(ascii_lowercase)
True
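Picking the country that covers the most remaining letters is a heuristic for the set cover problem, and it does not always find the smallest list. A tiny made-up example (three hypothetical "words" instead of country names, with the alphabet reduced to "abcdef") shows this by comparing the greedy choice with an exhaustive search over combinations. The exhaustive search is only practical for very small inputs, because the number of combinations grows exponentially with the list length.

```python
from itertools import combinations


def covers(words, universe):
    """True if the words together contain every character in universe."""
    return set(universe) <= set("".join(words))


def greedy_cover(words, universe):
    """Repeatedly pick the word adding the most still-uncovered characters."""
    remaining, pool, chosen = set(universe), list(words), []
    while remaining and pool:
        best = max(pool, key=lambda w: len(set(w) & remaining))
        if not set(best) & remaining:
            break  # nothing left adds a new character
        chosen.append(best)
        remaining -= set(best)
        pool.remove(best)
    return chosen


def smallest_cover(words, universe):
    """Exhaustive search by increasing size: only for tiny inputs."""
    for size in range(1, len(words) + 1):
        for combo in combinations(words, size):
            if covers(combo, universe):
                return list(combo)
    return None


words = ["abcd", "abe", "cdf"]
print(greedy_cover(words, "abcdef"))    # ['abcd', 'abe', 'cdf']
print(smallest_cover(words, "abcdef"))  # ['abe', 'cdf']
```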
