numpy - Split a list based on when a pattern of consecutive numbering stops

Posted by am46iovg on 2023-01-30 in Other


I have an existing list. I would like to split it into separate lists whenever a subsequent number is not equal to its previous value.

x = [1,4,4,5,5,8,8,10,10,25,25,70,70,90,90,100,2,3,3,4,4,5,5,8,8,9,20,21,21,22,23]

The desired output should look like this:

a = [1,4,4,5,5,8,8,10,10,25,25,70,70,90,90,100]

b = [2,3,3,4,4,5,5,8,8,9]

c = [20,21,21,22]

d = [23]

numpy

Source: https://stackoverflow.com/questions/19027975/split-list-based-on-when-a-pattern-of-consecutive-numbering-stops

5 answers


mfpqipee1#

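To answer your question:
"I have [...] a list. I would like to split it into separate lists whenever the subsequent number is not equal to its previous value."
Have a look at itertools.groupby.
Example: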

import itertools
l = [38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13]
for x, v in itertools.groupby(l):
    # `v` is an iterator that yields all subsequent elements
    # that have the same value
    # `x` is that value
    print(list(v))

The output is:

$ python test.py
[38]
[1200, 1200]
[306, 306]
[391, 391]
[82, 82]
[35, 35]
[902, 902]
[955, 955]
[13]

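Apparently this is what you want?
As for your pattern, here is a generator function that at least produces the output you expect for the given input: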

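import itertools

def split_sublists(input_list):
    sublist = []
    for _, run in itertools.groupby(input_list):
        run = list(run)
        # A run that is not a pair closes the current sublist,
        # unless the sublist is still empty (then the run opens it)
        closes = bool(sublist) and len(run) != 2
        sublist += run
        if closes:
            yield sublist
            sublist = []
    # Emit the trailing sublist, but never an empty one
    if sublist:
        yield sublist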

input_list = [1,4,4,5,5,8,8,10,10,25,25,70,70,90,90,100,2,3,3,4,4,5,5,8,8,9,20,21,21,22,23]
for sublist in split_sublists(input_list):
    print(sublist)

Output:

$ python test.py
[1, 4, 4, 5, 5, 8, 8, 10, 10, 25, 25, 70, 70, 90, 90, 100]
[2, 3, 3, 4, 4, 5, 5, 8, 8, 9]
[20, 21, 21, 22]
[23]
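The rule this generator relies on is easier to see in the run lengths that groupby produces. The sketch below is an illustration added here, not part of the answer: it pairs each value of the question's list with the length of its run. Each desired sublist starts at a run of length 1 and closes at the next run of length 1 (or at the end of the list); every run in between has length 2.

```python
import itertools

x = [1, 4, 4, 5, 5, 8, 8, 10, 10, 25, 25, 70, 70, 90, 90, 100,
     2, 3, 3, 4, 4, 5, 5, 8, 8, 9, 20, 21, 21, 22, 23]

# groupby yields (value, iterator over that run); len(list(...)) is the run length
runs = [(value, len(list(group))) for value, group in itertools.groupby(x)]
print(runs)
```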


afdcj2ne2#

A numpy version:

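>>> import numpy as np
>>> inds = np.where(np.diff(x))[0]
>>> # np.diff(inds) == 1 is one element shorter than inds, hence inds[:-1]
>>> out = np.split(x, inds[:-1][np.diff(inds) == 1][0::2] + 2)
>>> for n in out:
...     print(n)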

[  38 1200 1200  306  306  391  391   82   82   35   35  902  902  955  955
   13]
[955 847 847 835 835 698 698 777 777 896 896 923 923 940 940 569 569  53
  53 411]
[  53 1009 1009 1884]
[1009  878]
[ 923  886  886  511  511  942  942 1067 1067 1888 1888  243  243 1556]

Your new case works the same way:

>>> inds = np.where(np.diff(x))[0]
>>> out = np.split(x, inds[:-1][np.diff(inds) == 1][0::2] + 2)
>>> for n in out:
...     print(n)
...
[  1   4   4   5   5   8   8  10  10  25  25  70  70  90  90 100]
[2 3 3 4 4 5 5 8 8 9]
[20 21 21 22]
[23]
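The one-liner is easier to follow with its intermediate arrays named. The sketch below is added for illustration and runs on the question's list: inds holds every position i with x[i] != x[i+1]; a gap of 1 between consecutive entries means x[i+1] differs from both neighbours; keeping every second such position and adding 2 gives the split indices. The mask np.diff(inds) == 1 is one element shorter than inds, so it is applied to inds[:-1]; recent NumPy versions raise an IndexError for a boolean index whose length does not match the array. Taking every second hit relies on the layout of the question's data, where an unpaired value that closes a sublist is immediately followed by one that opens the next.

```python
import numpy as np

x = [1, 4, 4, 5, 5, 8, 8, 10, 10, 25, 25, 70, 70, 90, 90, 100,
     2, 3, 3, 4, 4, 5, 5, 8, 8, 9, 20, 21, 21, 22, 23]

# Positions i where x[i] != x[i + 1]
inds = np.where(np.diff(x))[0]
# A gap of 1 between consecutive change positions means x[i + 1] differs
# from both neighbours; the mask is one shorter than inds, hence inds[:-1]
before_isolated = inds[:-1][np.diff(inds) == 1]
# Keep every second hit and shift by 2 to get the split indices
split_points = before_isolated[0::2] + 2
out = np.split(x, split_points)

print(inds)
print(split_points)
for part in out:
    print(part.tolist())
```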

Starting with x as a list:

%timeit inds = np.where(np.diff(x))[0];out = np.split(x,inds[:-1][np.diff(inds)==1][0::2]+2)
10000 loops, best of 3: 169 µs per loop

If x is a numpy array:

%timeit inds = np.where(np.diff(arr_x))[0];out = np.split(arr_x,inds[:-1][np.diff(inds)==1][0::2]+2)
10000 loops, best of 3: 135 µs per loop

For larger inputs, you can expect numpy to perform better than pure Python.


luaexgnf3#

Here is my ugly solution:

x = [38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13, 955, 847, 847, 835, 83, 5698, 698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53, 411]

def weird_split(alist):
    sublist = []
    for i, n in enumerate(alist[:-1]):
        sublist.append(n)
        # make sure we only create a new list if the current one is not empty
        if len(sublist) > 1 and n != alist[i-1] and n != alist[i+1]:
            yield sublist
            sublist = []
    # always add the last element
    sublist.append(alist[-1])
    yield sublist

for sublist in weird_split(x):
    print(sublist)

And the output:

[38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13]
[955, 847, 847, 835]
[83, 5698]
[698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53, 411]
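The test inside weird_split closes a sublist at an element that differs from both of its neighbours. As an illustration (not part of the answer), the sketch below lists those interior positions for the question's list. They are 15, 16, 25, 26 and 29 (values 100, 2, 9, 20 and 22); because the len(sublist) > 1 guard keeps a freshly opened sublist from closing at once, for this list only every second one (100, 9, 22) ends a sublist, which gives the same split as the desired output.

```python
x = [1, 4, 4, 5, 5, 8, 8, 10, 10, 25, 25, 70, 70, 90, 90, 100,
     2, 3, 3, 4, 4, 5, 5, 8, 8, 9, 20, 21, 21, 22, 23]

# Interior positions whose value differs from both the previous and the next value
isolated = [i for i in range(1, len(x) - 1)
            if x[i] != x[i - 1] and x[i] != x[i + 1]]
print(isolated)
print([x[i] for i in isolated])
```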


cwdobuhd4#

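First, you haven't defined the behaviour for [1, 0, 0, 1, 0, 0, 1], so this splits it into [1, 0, 0, 1], [0, 0] and [1].
Second, there are a lot of corner cases to get right, so this is longer than you might expect. It would also be shorter if it put things directly into lists, but generators are a good thing, so I made sure not to do that.
I'm using the full iterator interface rather than the yield shortcut, because it allows better sharing of state between the outer and inner iterators without creating a new subsection generator on every iteration. Nested defs with yields might manage this in less space, but in this case I think the verbosity is acceptable.
So, the setup: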

class repeating_sections:
    def __init__(self, iterable):
        self.iter = iter(iterable)

        try:
            self._cache = next(self.iter)
            self.finished = False
        except StopIteration:
            self.finished = True

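We need to define a sub-iterator that yields items until it finds a mismatched pair. Because the second item of that pair has already been taken from the iterator, it has to be yielded on the next call to _subsection, so we store it in _cache.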

    def _subsection(self):
        yield self._cache

        try:
            while True:
                item1 = next(self.iter)

                try:
                    item2 = next(self.iter)
                except StopIteration:
                    yield item1
                    raise

                if item1 == item2:
                    yield item1
                    yield item2

                else:
                    yield item1
                    self._cache = item2
                    return

        except StopIteration:
            self.finished = True

__iter__ should return self, since the object is its own iterator:

    def __iter__(self):
        return self

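__next__ returns a subsection unless iteration has finished. Note that each subsection must be exhausted before the next one is produced if the behaviour is to be reliable.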

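    def __next__(self):
        # Drain the previously returned subsection (if any) before moving on,
        # so the shared iterator and _cache stay aligned even when the caller
        # did not consume it fully; draining may also set self.finished
        previous = getattr(self, "_current", None)
        if previous is not None:
            for item in previous:
                pass

        if self.finished:
            raise StopIteration

        self._current = self._subsection()
        return self._current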
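Python's own itertools.groupby runs into the same shared-iterator issue: all of its groups read from one underlying iterator, and its documentation notes that once the groupby object is advanced, the previous group is no longer visible. The sketch below is added for comparison and is not part of this answer. On recent CPython versions, consuming the groups in order works, while materialising the outer iterator first leaves every group empty; this is the kind of surprise that exhausting each section in order is meant to avoid.

```python
import itertools

data = [1, 1, 2, 2, 3]

# Consuming each group before advancing the outer iterator works as expected
in_order = [list(group) for _, group in itertools.groupby(data)]

# Materialising the outer iterator first advances past every group,
# so by the time each group is read it is no longer visible
groups = list(itertools.groupby(data))
too_late = [list(group) for _, group in groups]

print(in_order)
print(too_late)
```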

Some tests:

for item in repeating_sections(x):
    print(list(item))
#>>> [38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13]
#>>> [955, 847, 847, 835, 835, 698, 698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53, 411]
#>>> [53, 1009, 1009, 1884]
#>>> [1009, 878]
#>>> [923, 886, 886, 511, 511, 942, 942, 1067, 1067, 1888, 1888, 243, 243, 1556]

for item in repeating_sections([1, 0, 0, 1, 0, 0, 1]):
    print(list(item))
#>>> [1, 0, 0, 1]
#>>> [0, 0]
#>>> [1]

Some timings to show this isn't completely pointless:

SETUP="
x = [38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13, 955, 847, 847, 835, 83, 5698, 698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53, 411]
x *= 5000

class repeating_sections:
    def __init__(self, iterable):
        self.iter = iter(iterable)

        try:
            self._cache = next(self.iter)
            self.finished = False
        except StopIteration:
            self.finished = True

    def _subsection(self):
        yield self._cache

        try:
            while True:
                item1 = next(self.iter)

                try:
                    item2 = next(self.iter)
                except StopIteration:
                    yield item1
                    raise

                if item1 == item2:
                    yield item1
                    yield item2

                else:
                    yield item1
                    self._cache = item2
                    return

        except StopIteration:
            self.finished = True

    def __iter__(self):
        return self

    def __next__(self):
        if self.finished:
            raise StopIteration

        subsection = self._subsection()
        return subsection

        for item in subsection:
            pass

def weird_split(alist):
    sublist = []
    for i, n in enumerate(alist[:-1]):
        sublist.append(n)
        # make sure we only create a new list if the current one is not empty
        if len(sublist) > 1 and n != alist[i-1] and n != alist[i+1]:
            yield sublist
            sublist = []
    # always add the last element
    sublist.append(alist[-1])
    yield sublist
"

python -m timeit -s "$SETUP" "for section in repeating_sections(x):" "    for item in section: pass"
python -m timeit -s "$SETUP" "for section in weird_split(x):"        "    for item in section: pass"

Results:

10 loops, best of 3: 150 msec per loop
10 loops, best of 3: 207 msec per loop

The difference isn't huge (150 msec vs 207 msec per loop), but it is faster.


sdnqo3pr5#

def group(l,skip=0):
    prevind = 0
    currind = skip+1
    for val in l[currind::2]:
        if val != l[currind-1]:
            if currind-prevind-1 > 1: yield l[prevind:currind-1]
            prevind = currind-1
        currind += 2
    if prevind != currind:
        yield l[prevind:currind]

For the list you defined, calling it with skip=1 returns:

[38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955]
[13, 955, 847, 847, 835, 835, 698, 698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53]
[411, 53, 1009, 1009]
[1884, 1009]
[878, 923, 886, 886, 511, 511, 942, 942, 1067, 1067, 1888, 1888, 243, 243, 1556]

Here is a simple example with the list [1,1,3,3,2,5]:

l2 = [1, 1, 3, 3, 2, 5]
for g in group(l2):
    print(g)

[1, 1, 3, 3]
[2, 5]

The reason skip is an optional argument is that in your example 38 is included even though it is not equal to 1200. If that is a mistake, simply remove skip and initialise currind to 1.

Explanation:

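In a list [a,b,c,d,e,...] we want to compare elements two at a time, i.e. a == b, c == d, and whenever a comparison does not return True, capture all the preceding elements (excluding those already captured). To do this we need to track where the last capture happened, which starts at 0 (i.e. nothing captured yet). We then check each pair by iterating over every second element of the list, starting at currind, which is 1 when no elements are skipped. Each value val taken from l[currind::2] is compared with the value before it, l[currind-1]; inside the loop, currind is the index of that val, starting from its initial value (1 by default). If the values do not match, we perform a capture, but only when the resulting slice is long enough: l[prevind:currind-1] has length currind-prevind-1, so the check currind-prevind-1 > 1 requires it to contain at least two elements. l[prevind:currind-1] performs the capture, from the index where the last mismatch occurred (or 0 by default) up to, but not including, the first value of the mismatched pair (a,b or c,d, etc.). prevind is then set to currind-1, the index of that first value, so the next capture starts there. We then increment currind by 2 to reach the index of the next val. Finally, after the loop, whatever remains from prevind onwards is yielded.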
So, for [1,1,3,3,2,5]:

val is 1, at index 1. comparing to value at 0, i.e. 1
make currind the index of last element of the next pair
val is 3, at index 3. comparing to value at 2, i.e. 3
make currind the index of last element of the next pair
val is 5, at index 5. comparing to value at 4, i.e. 2
not equal so get slice between 0,4
[1, 1, 3, 3]
make currind the index of last element of the next pair
[2, 5]  #happens after the for loop
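The pairing that group walks through by index can also be seen by zipping two strided slices. The helper pair_matches below is hypothetical, added only to illustrate the explanation above; group itself does not build these tuples. With skip=0 it pairs l[0] with l[1], l[2] with l[3], and so on, while skip=1 shifts the pairing by one, matching how group starts by comparing l[skip+1] against l[skip].

```python
def pair_matches(l, skip=0):
    # Hypothetical helper: pair l[skip] with l[skip + 1], l[skip + 2] with
    # l[skip + 3], and so on; zip stops at the shorter slice, so a trailing
    # unpaired element is ignored
    pairs = zip(l[skip::2], l[skip + 1::2])
    return [(a, b, a == b) for a, b in pairs]

print(pair_matches([1, 1, 3, 3, 2, 5]))
print(pair_matches([38, 1200, 1200, 306, 306, 13], skip=1))
```

For [1,1,3,3,2,5] the third pair, (2, 5), is the first mismatch, which is where group captures [1, 1, 3, 3].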

