python: Group a list of 4-word strings into a list of pairs

cclgggtu posted on 2023-02-07 in Python

I have the following list of strings:

['word1 word2 word3 word4', 'word5 word6 word7 word8']

(I'm only showing two strings, but there can be many.) I want to create a new list that looks like this:

['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']

I tried the following:

lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']
[[word1 + ' ' + word2, word3 + ' ' + word4] for line in lines for word1, word2, word3, word4 in line.split()]

But it gives the following error:

ValueError: too many values to unpack (expected 4)
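The error comes from the inner loop: for word1, word2, word3, word4 in line.split() iterates over the individual words, so Python tries to unpack each word string (for example 'word1', which has five characters) into four names. The sketch below reproduces that failure in isolation and then shows one way to keep the comprehension shape: iterate over a one-element list so the four names are bound once per line. This is a common idiom, not something taken from the question or the answers.

```python
lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']

# What the original inner loop effectively attempts for the first word:
# 'word1' is a 5-character string, and unpacking it into 4 names fails.
try:
    word1, word2, word3, word4 = 'word1'
except ValueError as exc:
    print('ValueError:', exc)

# Fix: unpack the whole split line once per line. Wrapping line.split()
# in a one-element list makes the inner "for" bind all four words at once,
# and the last "for" flattens each pair into a single flat result list.
result = [pair
          for line in lines
          for word1, word2, word3, word4 in [line.split()]
          for pair in (word1 + ' ' + word2, word3 + ' ' + word4)]
print(result)
```

This version still assumes exactly four words per line; a line with any other word count raises ValueError again.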

What is the most Pythonic way to do this?

Answer #1

Use a short regular-expression match:

import re

lst = ['word1 word2 word3 word4', 'word5 word6 word7 word8']
res = [pair for words in lst for pair in re.findall(r'\S+ \S+', words)]
print(res)

['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']

  • \S+ \S+ matches two consecutive "words" (runs of non-whitespace characters separated by a single space).
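One caveat worth checking: the pattern \S+ \S+ requires exactly one space between the two words, and a line with an odd number of words silently loses its last word. The sketch below compares it with a variant pattern (an assumption of this note, not part of the answer) that also accepts a lone trailing word; the input string is made up for illustration.

```python
import re

line = 'word1 word2 word3'  # hypothetical input with an odd word count

# Original pattern: the unpaired 'word3' is dropped without any error.
original = re.findall(r'\S+ \S+', line)
print(original)

# Variant: one word, optionally followed by whitespace and a second word,
# so a trailing single word is kept as its own item.
variant = re.findall(r'\S+(?:\s+\S+)?', line)
print(variant)
```

Because the variant uses \s+, a pair separated by several spaces is returned with those spaces intact.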

Answer #2

A corrected version of @jsbueno's earlier answer, which was slightly wrong:

>>> lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']
>>> words = [item for line in lines for item in line.split()]
>>> words
['word1', 'word2', 'word3', 'word4', 'word5', 'word6', 'word7', 'word8']
>>> [words[i] + ' ' + words[i+1] for i in range(0, len(words), 2)]
['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']
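The index arithmetic can also be avoided by zipping the even and odd slices of the word list. The sketch below uses that swapped-in technique; it is not the answer's code. One behavioural difference: the indexing version raises IndexError when the total word count is odd, whereas zip stops at the shorter slice and drops the unpaired last word.

```python
lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']

# Flatten every line into one list of words.
words = [item for line in lines for item in line.split()]

# words[::2] holds the 1st, 3rd, 5th, ... words and words[1::2] holds the
# 2nd, 4th, 6th, ...; zip pairs them up position by position.
pairs = [first + ' ' + second for first, second in zip(words[::2], words[1::2])]
print(pairs)
```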

Answer #3

"Pythonic" does not mean "fewer lines". This is easy to do with a simple for loop:

result = []
for line in lines:
    words = line.split()
    result.append(' '.join(words[:2]))
    result.append(' '.join(words[2:]))

This gives the result you want:

['word1 word2', 'word3 word4', 'word5 word6', 'word7 word8']

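If you want to make this approach work more generally for strings with more words, you can write a function that yields chunks of the desired size and pass each chunk to str.join: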

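def chunks(iterable, chunk_size):
    # Collect items until a full chunk is ready, then yield it.
    c = []
    for item in iterable:
        c.append(item)
        if len(c) == chunk_size:
            yield c
            c = []

    # Yield any leftover items as a final, shorter chunk.
    if c:
        yield c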

result = []
for line in lines:
    words = line.split()
    for chunk in chunks(words, 2):
        result.append(' '.join(chunk))
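Since line.split() already returns a list, slicing gives a shorter equivalent of the chunks generator. In this sketch the helper name chunked_lines is hypothetical and the chunk size is just a parameter; like chunks(), it keeps a final chunk that is shorter than chunk_size.

```python
def chunked_lines(lines, chunk_size):
    """Split each line into words and join every chunk_size words."""
    result = []
    for line in lines:
        words = line.split()
        # Step through the word list chunk_size words at a time; the last
        # slice may be shorter if the word count is not a multiple.
        for start in range(0, len(words), chunk_size):
            result.append(' '.join(words[start:start + chunk_size]))
    return result


lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']
print(chunked_lines(lines, 2))
print(chunked_lines(lines, 3))
```

With a chunk size of 3, each four-word line yields one three-word string and one leftover single word.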


Answer #4

An optimized solution that pushes all of the per-item work down to the C layer:

from itertools import chain

lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8']
words = chain.from_iterable(map(str.split, lines))
paired = list(map('{} {}'.format, words, words))
print(paired)

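chain.from_iterable(map(str.split, lines)) creates an iterator over the individual words. map('{} {}'.format, words, words) maps over the same iterator twice, so the words are put back together in pairs (map(' '.join, zip(words, words)) would have the same effect, but with an extra intermediate product, a tuple for each pair; feel free to test which is faster in practice). The list wrapper consumes the iterator to produce the final result.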
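The "same iterator twice" trick is easy to check on its own. In the sketch below (the input lines are made up, with five words in total), each call pulls one item per argument from the single shared iterator, so consecutive words end up together. It also shows two side effects: pairs are formed across line boundaries, and with an odd word count the unpaired last word is silently dropped, because map and zip stop as soon as one argument is exhausted.

```python
from itertools import chain

lines = ['word1 word2 word3', 'word4 word5']  # hypothetical: 5 words in total

words = chain.from_iterable(map(str.split, lines))
# Both arguments are the *same* iterator, so the first call receives
# word1 and word2, the second receives word3 and word4, and so on.
paired = list(map('{} {}'.format, words, words))
print(paired)

# The zip-based variant mentioned above behaves the same way.
words = chain.from_iterable(map(str.split, lines))
joined = list(map(' '.join, zip(words, words)))
print(joined)
```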
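This avoids all per-item work at the Python layer (no extra bytecode is executed as the input grows) and avoids one of Python's surprisingly expensive operations (indexing and simple integer arithmetic), which is how it beats the existing answers.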
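To check the speed claim against your own data, here is a rough timeit sketch comparing the chain/map answer with the for-loop answer. The input size (10,000 copies of the sample lines) and the run count are arbitrary assumptions, and the timings will vary by machine and Python version, so no numbers are given here.

```python
import timeit
from itertools import chain

# Hypothetical benchmark input: the sample lines repeated many times.
lines = ['word1 word2 word3 word4', 'word5 word6 word7 word8'] * 10_000


def with_chain_map(lines):
    # chain/map answer: per-word work happens inside map and chain.
    words = chain.from_iterable(map(str.split, lines))
    return list(map('{} {}'.format, words, words))


def with_for_loop(lines):
    # for-loop answer: explicit loop assuming four words per line.
    result = []
    for line in lines:
        words = line.split()
        result.append(' '.join(words[:2]))
        result.append(' '.join(words[2:]))
    return result


# Both versions must agree before their speed is worth comparing.
assert with_chain_map(lines) == with_for_loop(lines)

for func in (with_chain_map, with_for_loop):
    seconds = timeit.timeit(lambda: func(lines), number=20)
    print(f'{func.__name__}: {seconds:.3f} s for 20 runs')
```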
