Counting repeated words in Python

6ie5vjzr  posted on 2023-01-22  in Python

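I have a problem: in Python (v3.4.1) I need to count how many times each word is repeated and then report each count in a sentence. I used Counter, but I don't know how to get the output in the order shown below. The input is: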

mysentence = "As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"

I split this into a list.
The output should look like this:

"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.

This is what I have so far:

x = input('Enter your sentence: ')
words = x.split()
for word in sorted(words):
    print(word)

ut6juiuv1#

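I can see what sort buys you: once the words are sorted, you can reliably tell when you have hit a new word and keep a running count for each unique word. What you really want, though, is a hash (dictionary) to track the counts, because dictionary keys are unique. For example: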

words = sentence.split()
counts = {}
for word in words:
    if word not in counts:
        counts[word] = 0
    counts[word] += 1

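Now you have a dictionary whose keys are the words and whose values are the number of times each word appears. You can simplify this with collections.defaultdict(int), which lets you increment a value without initializing it first: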

counts = collections.defaultdict(int)
for word in words:
    counts[word] += 1

But there is something even better: collections.Counter, which turns your list of words into a dictionary of counts (it is actually a subclass of dict).

counts = collections.Counter(words)
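
As a quick check that the three approaches above agree, here is a minimal sketch on a shortened sample sentence (the sample string and the names plain, dd, and counter are made up for illustration):

```python
import collections

words = "as far as they are certain".split()

# Plain dict: initialise missing keys by hand.
plain = {}
for word in words:
    if word not in plain:
        plain[word] = 0
    plain[word] += 1

# defaultdict(int): a missing key starts at 0 automatically.
dd = collections.defaultdict(int)
for word in words:
    dd[word] += 1

# Counter: does the counting in a single call.
counter = collections.Counter(words)

# All three hold the same word -> count mapping.
same = plain == dict(dd) == dict(counter)
print(same)
print(isinstance(counter, dict))
```

One practical difference: looking up a word that never occurred returns 0 on a Counter instead of raising KeyError, and it does not add the key.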

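From here, you need the words and their counts in sorted order so you can print them. items() gives you the (word, count) pairs, and sorted orders them by the first item of each tuple (the word, in this case) by default, which is exactly what you want.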

import collections
sentence = """As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"""
words = sentence.split()
word_counts = collections.Counter(words)
for word, count in sorted(word_counts.items()):
    print('"%s" is repeated %d time%s.' % (word, count, "s" if count > 1 else ""))
Output:
"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.
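
A side note on why "As" comes before "are" in that output: the default string comparison goes by code point, and uppercase letters come before lowercase ones. The sketch below shows this on a shortened sample, plus a case-insensitive ordering as an optional variant (the key function is my own assumption, not something the question asked for):

```python
import collections

words = "As far as they are".split()
counts = collections.Counter(words)

# Default ordering compares code points: "A" (65) sorts before "a" (97).
default_order = [word for word, _ in sorted(counts.items())]

# Hypothetical variant: ignore case first, then break ties by original spelling.
folded_order = [
    word
    for word, _ in sorted(counts.items(), key=lambda kv: (kv[0].lower(), kv[0]))
]

print(default_order)
print(folded_order)
```

The default order matches what the question expects, so no key is needed there; the variant only matters if you would rather see "are" listed before "As".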

kt06eoxx2#

To print the repeated words in a string in sorted order:

from itertools import groupby 

mysentence = ("As far as the laws of mathematics refer to reality "
              "they are not certain as far as they are certain "
              "they do not refer to reality")
words = mysentence.split() # get a list of whitespace-separated words
for word, duplicates in groupby(sorted(words)): # sort and group duplicates
    count = len(list(duplicates)) # count how many times the word occurs
    print('"{word}" is repeated {count} time{s}'.format(
            word=word, count=count,  s='s'*(count > 1)))

"As" is repeated 1 time
"are" is repeated 2 times
"as" is repeated 3 times
"certain" is repeated 2 times
"do" is repeated 1 time
"far" is repeated 2 times
"laws" is repeated 1 time
"mathematics" is repeated 1 time
"not" is repeated 2 times
"of" is repeated 1 time
"reality" is repeated 2 times
"refer" is repeated 2 times
"the" is repeated 1 time
"they" is repeated 3 times
"to" is repeated 2 times
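
The sorted() call in the groupby approach matters, because groupby only merges equal items that sit next to each other. A small sketch on a made-up three-word sample:

```python
from itertools import groupby

words = "as far as".split()

# Unsorted: the two "as" entries are not adjacent, so they land in separate groups.
unsorted_groups = [(word, len(list(group))) for word, group in groupby(words)]

# Sorted: equal words become neighbours and are counted together.
sorted_groups = [(word, len(list(group))) for word, group in groupby(sorted(words))]

print(unsorted_groups)
print(sorted_groups)
```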

6g8kf2rb3#

Hey, I tried this on Python 2.7 (Mac) since that is the version I have, so just try to get the logic.

from collections import Counter

mysentence = """As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"""

counts = Counter(mysentence.split())
for word in sorted(counts):
    # use "times" for counts above 1, as in the expected output
    suffix = ' times.' if counts[word] > 1 else ' time.'
    print ('"' + word + '" is repeated ' + str(counts[word]) + suffix)

I hope this is what you are looking for; if not, ping me, I'm happy to learn something new.

"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.

l7wslrjt4#

Here is a very bad example showing that you can do this using nothing but lists:

x = "As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"
words = x.split(" ")
words.sort()

words_copied = x.split(" ")
words_copied.sort()

for word in words:
    count = 0
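    while True:
        try:
            # remove one occurrence of the word per iteration and count it
            index = words_copied.index(word)
            count += 1
            del words_copied[index]
        except ValueError:
            # no occurrences left; print only the first time the word is seen
            if count != 0:
                print(word + " is repeated " + str(count) + " times.")
            break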

Edit: here is a better approach:

x = "As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"
words = x.split(" ")
words.sort()

last_word = ""
for word in words:
    if word != last_word:
        count = [i for i, w in enumerate(words) if w == word]
        print(word + " is repeated " + str(len(count)) + " times.")
    last_word = word
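
Since the list is already sorted, the rescan with enumerate can be avoided by counting each run of equal words in a single pass. This is only a sketch of that idea, still using nothing but lists and tuples; count_sorted_runs is a hypothetical helper name:

```python
def count_sorted_runs(words):
    """Return (word, count) pairs from one pass over the sorted words."""
    results = []
    previous = None
    count = 0
    for word in sorted(words):
        if word == previous:
            count += 1  # still inside the current run
        else:
            if previous is not None:
                results.append((previous, count))  # close the finished run
            previous = word
            count = 1
    if previous is not None:
        results.append((previous, count))  # close the last run
    return results


x = "As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"
for word, count in count_sorted_runs(x.split(" ")):
    print('"' + word + '" is repeated ' + str(count) + (" times." if count > 1 else " time."))
```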

x33g5p2x5#

A solution based on numpy arrays and on the post How do I count the occurrence of a certain item in an ndarray?:

mysentence = """As far as the laws of mathematics refer to reality they are not certain as far as they are certain they do not refer to reality"""
import numpy as np
mysentence = np.array(mysentence.split(" "))
words, frq = np.unique(mysentence, return_counts=True)

for word, count in zip(words, frq):
    # use "times" for counts above 1, as in the expected output
    print(f'"{word}" is repeated {count} time{"s" if count > 1 else ""}.')

Output:

"As" is repeated 1 time.
"are" is repeated 2 times.
"as" is repeated 3 times.
"certain" is repeated 2 times.
"do" is repeated 1 time.
"far" is repeated 2 times.
"laws" is repeated 1 time.
"mathematics" is repeated 1 time.
"not" is repeated 2 times.
"of" is repeated 1 time.
"reality" is repeated 2 times.
"refer" is repeated 2 times.
"the" is repeated 1 time.
"they" is repeated 3 times.
"to" is repeated 2 times.

8hhllhi26#

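If the string is "miamimimimimimimimimimimimimimimimimiami" or "San FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan Francisco", that is, one unit repeated back to back with no separator, you can find the repeated unit like this: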

import re

String="San FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan FranciscoSan Francisco"
word=""
for i in String:
    word+=i
    # stop at the shortest prefix whose repetitions rebuild the whole string
    if String=="".join(re.findall(re.escape(word),String)):
        print(word)
        break
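
The same idea also works without regular expressions, by testing each prefix whose length divides the string length. This is a sketch of an alternative, not the method used above; smallest_repeating_unit is a hypothetical helper name:

```python
def smallest_repeating_unit(text):
    """Return the shortest prefix that rebuilds text when repeated."""
    n = len(text)
    for size in range(1, n + 1):
        # Only prefix lengths that divide n can tile the string exactly.
        if n % size == 0 and text[:size] * (n // size) == text:
            return text[:size]
    return text  # only reached for the empty string


print(smallest_repeating_unit("San Francisco" * 9))
```

If the string is not a repetition of anything shorter, the function returns the whole string.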
