numpy: Split a list based on when a pattern of consecutive numbers stops

am46iovg  posted on 2022-12-23  in Other

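I have an existing list. I want to break it up into separate lists whenever a following number is not equal to the value before it.

x = [1,4,4,5,5,8,8,10,10,25,25,70,70,90,90,100,2,3,3,4,4,5,5,8,8,9,20,21,21,22,23]

The desired output should look like this:

a = [1,4,4,5,5,8,8,10,10,25,25,70,70,90,90,100]

b = [2,3,3,4,4,5,5,8,8,9]

c = [20,21,21,22]

d = [23]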

mfpqipee1#

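To answer your question:
I have [...] a list. I want to break it up into separate lists whenever a following number is not equal to the value before it.
Have a look at itertools.groupby.
Example: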

import itertools
l = [38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13]
for x, v in itertools.groupby(l):
    # `v` is an iterator that yields all subsequent elements
    # that have the same value
    # `x` is that value
    print(list(v))

Output:

$ python test.py
[38]
[1200, 1200]
[306, 306]
[391, 391]
[82, 82]
[35, 35]
[902, 902]
[955, 955]
[13]

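That is apparently what you asked for?
As for your pattern, here is a generator function that at least produces the output you expect for the given input: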

import itertools

def split_sublists(input_list):
    sublist = []
    for val, l in itertools.groupby(input_list):
        l = list(l)
        if not sublist or len(l) == 2:
            sublist += l
        else:
            sublist += l
            yield sublist
            sublist = []
    if sublist:  # avoid yielding an empty trailing list
        yield sublist

input_list = [1,4,4,5,5,8,8,10,10,25,25,70,70,90,90,100,2,3,3,4,4,5,5,8,8,9,20,21,21,22,23]
for sublist in split_sublists(input_list):
    print(sublist)

Output:

$ python test.py
[1, 4, 4, 5, 5, 8, 8, 10, 10, 25, 25, 70, 70, 90, 90, 100]
[2, 3, 3, 4, 4, 5, 5, 8, 8, 9]
[20, 21, 21, 22]
[23]

afdcj2ne2#

A numpy version:

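>>> import numpy as np
>>> inds = np.where(np.diff(x))[0]
>>> # inds[:-1] keeps the boolean mask the same length as the indexed array
>>> out = np.split(x,inds[:-1][np.diff(inds)==1][0::2]+2)
>>> for n in out:
...     print(n)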

[  38 1200 1200  306  306  391  391   82   82   35   35  902  902  955  955
   13]
[955 847 847 835 835 698 698 777 777 896 896 923 923 940 940 569 569  53
  53 411]
[  53 1009 1009 1884]
[1009  878]
[ 923  886  886  511  511  942  942 1067 1067 1888 1888  243  243 1556]

The same works for your new case:

>>> inds = np.where(np.diff(x))[0]
>>> out = np.split(x,inds[:-1][np.diff(inds)==1][0::2]+2)
>>> for n in out:
...     print(n)
...
[  1   4   4   5   5   8   8  10  10  25  25  70  70  90  90 100]
[2 3 3 4 4 5 5 8 8 9]
[20 21 21 22]
[23]
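
The one-liner is dense, so here is a plain-Python sketch of the same index arithmetic, without numpy. The names adjacent, cuts and bounds are illustrative and not part of the answer, and the approach assumes the data follows the pattern in the question (pairs of equal values, with lone values only at the boundaries), so treat it as a walkthrough rather than a general solution.

```python
x = [1, 4, 4, 5, 5, 8, 8, 10, 10, 25, 25, 70, 70, 90, 90, 100,
     2, 3, 3, 4, 4, 5, 5, 8, 8, 9, 20, 21, 21, 22, 23]

# Indices i where x[i] != x[i + 1]; this mirrors np.where(np.diff(x))[0].
inds = [i for i in range(len(x) - 1) if x[i] != x[i + 1]]

# Change indices that are immediately followed by another change index,
# i.e. positions where a lone value sits between two changes.
adjacent = [a for a, b in zip(inds, inds[1:]) if b - a == 1]

# Around each boundary these come in twos (e.g. 90->100 and 100->2),
# so keep every second one and shift by 2 to reach the start of the next sublist.
cuts = [i + 2 for i in adjacent[0::2]]

# Slice between consecutive cut points, like np.split does.
bounds = [0] + cuts + [len(x)]
out = [x[a:b] for a, b in zip(bounds, bounds[1:])]

for part in out:
    print(part)
```

For the list from the question, cuts comes out as [16, 26, 30], which gives the four sublists shown above.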

Starting with x as a list:

%timeit inds = np.where(np.diff(x))[0];out = np.split(x,inds[:-1][np.diff(inds)==1][0::2]+2)
10000 loops, best of 3: 169 µs per loop

If x is a numpy array:

%timeit inds = np.where(np.diff(arr_x))[0];out = np.split(arr_x,inds[:-1][np.diff(inds)==1][0::2]+2)
10000 loops, best of 3: 135 µs per loop

For larger inputs, you can expect numpy to perform better than pure Python.


luaexgnf3#

Here is my ugly solution:

x = [38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13, 955, 847, 847, 835, 83, 5698, 698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53, 411]

def weird_split(alist):
    sublist = []
    for i, n in enumerate(alist[:-1]):
        sublist.append(n)
        # make sure we only create a new list if the current one is not empty
        if len(sublist) > 1 and n != alist[i-1] and n != alist[i+1]:
            yield sublist
            sublist = []
    # always add the last element
    sublist.append(alist[-1])
    yield sublist

for sublist in weird_split(x):
    print(sublist)

And the output:

[38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13]
[955, 847, 847, 835]
[83, 5698]
[698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53, 411]

cwdobuhd4#

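First, you haven't defined the behavior for [1, 0, 0, 1, 0, 0, 1], so this splits it into [1, 0, 0, 1], [0, 0], [1].
Second, there are a lot of corner cases to get right, so it is longer than you might expect. It would also be shorter if it put things straight into lists, but generators are good, so I made sure not to do that.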
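
For comparison, the pairing rule can be written as a short list-based function. This is only a sketch of that rule under my reading of it, not the answer's code, and split_sections is a made-up name: the first element opens a section, then items are read two at a time; a mismatched pair closes the section after its first item, and its second item opens the next section.

```python
def split_sections(items):
    """List-based sketch of the pairing rule (hypothetical helper)."""
    items = list(items)
    if not items:
        return []

    sections = []
    current = [items[0]]              # the first element always opens a section
    i = 1
    while i < len(items):
        current.append(items[i])      # first item of the pair stays in this section
        if i + 1 == len(items):
            break                     # lone trailing item: nothing left to pair with
        if items[i + 1] == items[i]:
            current.append(items[i + 1])   # matching pair, keep going
        else:
            sections.append(current)       # a mismatch closes the section
            current = [items[i + 1]]       # and its second item opens the next one
        i += 2
    sections.append(current)
    return sections


print(split_sections([1, 0, 0, 1, 0, 0, 1]))
# [[1, 0, 0, 1], [0, 0], [1]]
```

Unlike the generator version, this builds every section in memory before returning.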
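To start with, this uses the full iterator interface instead of the yield shortcut, because that makes it easier to share state between the outer and inner iterators without having to define a new subsection generator on every iteration. A nested def with yields might do it in less space, but I think the verbosity is acceptable in this case.
So, the setup: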

class repeating_sections:
    def __init__(self, iterable):
        self.iter = iter(iterable)

        try:
            self._cache = next(self.iter)
            self.finished = False
        except StopIteration:
            self.finished = True

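We need to define a sub-iterator that yields until it finds a non-matching pair. Because the second item of that pair has already been taken from the iterator, it has to be yielded on the next call to _subsection, so it is stored in _cache.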

    def _subsection(self):
        yield self._cache

        try:
            while True:
                item1 = next(self.iter)

                try:
                    item2 = next(self.iter)
                except StopIteration:
                    yield item1
                    raise

                if item1 == item2:
                    yield item1
                    yield item2

                else:
                    yield item1
                    self._cache = item2
                    return

        except StopIteration:
            self.finished = True

__iter__ should return self for an iterator:

    def __iter__(self):
        return self

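__next__ returns a subsection unless finished. Note that for the behavior to be reliable, each subsection must be fully consumed before the next one is requested.

    def __next__(self):
        if self.finished:
            raise StopIteration

        # Every subsection reads from the shared self.iter, so the caller
        # must exhaust it before asking for the next one.
        return self._subsection()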
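
If callers cannot be relied on to exhaust each subsection, the iterator could do it for them. The following is a sketch of one way to do that, not the answer's code: the class name DrainingSections and the _current attribute are made up for illustration, and _subsection is restructured as a for loop. Before handing out a new subsection, __next__ runs the previous one to completion so that _cache and finished are up to date.

```python
class DrainingSections:
    """Hypothetical variant that drains an unfinished subsection itself."""

    def __init__(self, iterable):
        self.iter = iter(iterable)
        self._current = None          # the subsection handed out most recently
        try:
            self._cache = next(self.iter)
            self.finished = False
        except StopIteration:
            self.finished = True

    def _subsection(self):
        yield self._cache
        for item1 in self.iter:
            yield item1
            try:
                item2 = next(self.iter)
            except StopIteration:
                break                 # lone trailing item, input exhausted
            if item1 != item2:
                self._cache = item2   # opens the next subsection
                return
            yield item2
        self.finished = True

    def __iter__(self):
        return self

    def __next__(self):
        # Finish whatever the caller left unread in the previous subsection.
        if self._current is not None:
            for _ in self._current:
                pass
        if self.finished:
            raise StopIteration
        self._current = self._subsection()
        return self._current


data = [1, 0, 0, 1, 0, 0, 1]
print([list(s) for s in DrainingSections(data)])   # [[1, 0, 0, 1], [0, 0], [1]]
print([next(s) for s in DrainingSections(data)])   # [1, 0, 1]
```

The second call reads only the first item of each subsection and still terminates. The trade-off is that a subsection is emptied as soon as the next one is requested, so it cannot be read afterwards.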

Some tests:

for item in repeating_sections(x):
    print(list(item))
#>>> [38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13]
#>>> [955, 847, 847, 835, 835, 698, 698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53, 411]
#>>> [53, 1009, 1009, 1884]
#>>> [1009, 878]
#>>> [923, 886, 886, 511, 511, 942, 942, 1067, 1067, 1888, 1888, 243, 243, 1556]

for item in repeating_sections([1, 0, 0, 1, 0, 0, 1]):
    print(list(item))
#>>> [1, 0, 0, 1]
#>>> [0, 0]
#>>> [1]

Some timings to show this is not entirely pointless:

SETUP="
x = [38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955, 13, 955, 847, 847, 835, 83, 5698, 698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53, 411]
x *= 5000

class repeating_sections:
    def __init__(self, iterable):
        self.iter = iter(iterable)

        try:
            self._cache = next(self.iter)
            self.finished = False
        except StopIteration:
            self.finished = True

    def _subsection(self):
        yield self._cache

        try:
            while True:
                item1 = next(self.iter)

                try:
                    item2 = next(self.iter)
                except StopIteration:
                    yield item1
                    raise

                if item1 == item2:
                    yield item1
                    yield item2

                else:
                    yield item1
                    self._cache = item2
                    return

        except StopIteration:
            self.finished = True

    def __iter__(self):
        return self

    def __next__(self):
        if self.finished:
            raise StopIteration

        return self._subsection()

def weird_split(alist):
    sublist = []
    for i, n in enumerate(alist[:-1]):
        sublist.append(n)
        # make sure we only create a new list if the current one is not empty
        if len(sublist) > 1 and n != alist[i-1] and n != alist[i+1]:
            yield sublist
            sublist = []
    # always add the last element
    sublist.append(alist[-1])
    yield sublist
"

python -m timeit -s "$SETUP" "for section in repeating_sections(x):" "    for item in section: pass"
python -m timeit -s "$SETUP" "for section in weird_split(x):"        "    for item in section: pass"

Results:

10 loops, best of 3: 150 msec per loop
10 loops, best of 3: 207 msec per loop

Not a huge difference, but it is still faster (150 msec versus 207 msec per loop).


sdnqo3pr5#

def group(l,skip=0):
    prevind = 0
    currind = skip+1
    for val in l[currind::2]:
        if val != l[currind-1]:
            if currind-prevind-1 > 1: yield l[prevind:currind-1]
            prevind = currind-1
        currind += 2
    if prevind != currind:
        yield l[prevind:currind]

For the list you defined, calling it with skip=1 returns:

[38, 1200, 1200, 306, 306, 391, 391, 82, 82, 35, 35, 902, 902, 955, 955]
[13, 955, 847, 847, 835, 835, 698, 698, 777, 777, 896, 896, 923, 923, 940, 940, 569, 569, 53, 53]
[411, 53, 1009, 1009]
[1884, 1009]
[878, 923, 886, 886, 511, 511, 942, 942, 1067, 1067, 1888, 1888, 243, 243, 1556]

Here is a simple example with the list [1,1,3,3,2,5]:

l2 = [1,1,3,3,2,5]
for g in group(l2):
    print(g)

[1, 1, 3, 3]
[2, 5]

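The reason skip is an optional argument to the function is that, in your example, 38 is included even though it is not equal to 1200. If that was a mistake, just remove skip and initialise currind to 1.

Explanation:

In a list [a,b,c,d,e,...] we want to compare elements two at a time, i.e. a == b, c == d, and so on, and whenever a comparison does not return True, capture all the preceding elements (excluding those already captured). To do that we need to keep track of where the last capture happened, which is initially 0 (i.e. no capture yet).

We then check each pair by walking over every second element of the list, starting at currind, which is 1 when no elements are skipped. Each value obtained from l[currind::2] is compared with the value before it, l[currind-1]. If the values do not match, we perform a capture, but only if the resulting slice is long enough: l[prevind:currind-1] has currind-prevind-1 elements, and the check currind-prevind-1 > 1 requires at least two.

l[prevind:currind-1] performs the capture, starting at the index of the last non-matching comparison (or 0 by default) and stopping just before the first value of the current comparison pair (a,b or c,d, etc.). prevind is then set to currind-1, the index of the first element that was not captured. After that, currind is incremented by 2 to reach the index of the next val. Finally, if anything is left over, we extract it.
So, for [1,1,3,3,2,5]: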

val is 1, at index 1. comparing to value at 0 i.e 1
make currind the index of last element of the next pair
val is 3, at index 3. comparing to value at 2 i.e 3
make currind the index of last element of the next pair
val is 5, at index 5. comparing to value at 4 i.e 2
not equal so get slice between 0,4
[1, 1, 3, 3]
make currind the index of last element of the next pair  #happens after the for loop
[2, 5]
