pandas: How to assign numeric labels to all elements in a list/series/array based on numbers in a different list?

Posted by nvbavucw on 2023-03-11 in Other


I have two lists, each containing a sequence of numbers, for example:

A = [1.0, 2.9, 3.4, 4.2, 5.5....100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]

I want to create another list of labels based on which interval of list A (any of them) each element of list B falls into.

C = [group_1, group_1, group_1, group_1, group_2, group_2, group_3]

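That is, 1.1, 1.2, 1.3 and 2.5 all fall in the interval 1.0 - 2.9 of list A, so they are group_1; 3.0 and 3.1 both fall in the interval 2.9 - 3.4, so they are group_2; and 5.2 falls in the interval 4.2 - 5.5, so it is group_3, and so on.
It does not matter which interval of list A a number from list B falls into; what matters is that all elements of list B are grouped/labeled consecutively.
The original data is large, so assigning labels/groups to the elements of list B by hand is not feasible.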

pandas

Source: https://stackoverflow.com/questions/75668272/how-to-assign-numeric-labels-to-all-elements-in-a-list-series-array-based-on-num

5 Answers


zf9nrax11#

So, assuming A is already sorted, you can use binary search, which is already available in the Python standard library in the (rather clunky) bisect module:

>>> import bisect
>>> A = [1.0, 2.9, 3.4, 4.2, 5.5]
>>> B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]
>>> [bisect.bisect_left(A, b) for b in B]
[1, 1, 1, 1, 2, 2, 4]

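This takes O(len(B) * log(len(A))) time: one binary search over A for each element of B.
Note: read the documentation carefully for how bisect_left and bisect_right behave when a value in B equals a value in A, and for items that do not fall into any interval.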
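
To make the boundary caveat concrete, here is a minimal sketch of the three cases: a value inside an interval, a value equal to an element of A, and values outside A's range. The variable names `inside`, `on_edge` and `outside` are only illustrative.

```python
import bisect

A = [1.0, 2.9, 3.4, 4.2, 5.5]

# Strictly inside an interval: both functions return the same index.
inside = (bisect.bisect_left(A, 3.0), bisect.bisect_right(A, 3.0))

# Equal to an element of A: bisect_left returns the position before the
# equal element, bisect_right the position after it.
on_edge = (bisect.bisect_left(A, 2.9), bisect.bisect_right(A, 2.9))

# Outside the range of A: the result is 0 or len(A), which does not
# correspond to any real interval and needs separate handling.
outside = (bisect.bisect_left(A, 0.5), bisect.bisect_left(A, 9.9))

print(inside, on_edge, outside)  # (2, 2) (1, 2) (0, 5)
```

Which function to pick depends on whether a boundary value should belong to the interval on its left or on its right.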

2023-03-11

wsxa1bj12#

You can try this O(n) solution (assuming both lists are sorted and every number must lie within one of the intervals of A):

A = [1.0, 2.9, 3.4, 4.2, 5.5, 100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]

grp = 0
i1, i2 = iter(A), iter(B)
a, b = next(i1), next(i2)

out = []
while True:
    try:
        if a < b:
            a = next(i1)
            grp += 1
        else:
            out.append(grp)
            b = next(i2)
    except StopIteration:
        break

print(out)

Prints:

[1, 1, 1, 1, 2, 2, 4]
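
The indices above identify the interval of A, so they skip numbers when an interval contains no element of B (here 3 is missing). The question asks for consecutive labels instead. A minimal sketch of one way to renumber them, assuming the raw indices come from bisect_left as in the first answer; the names `raw`, `group_of` and `labels` are illustrative only:

```python
import bisect

A = [1.0, 2.9, 3.4, 4.2, 5.5, 100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]

# Raw interval indices, with gaps for empty intervals.
raw = [bisect.bisect_left(A, b) for b in B]

# Give each distinct interval index the next unused group number,
# in order of first appearance.
group_of = {}
labels = []
for idx in raw:
    group_of.setdefault(idx, len(group_of) + 1)
    labels.append(f"group_{group_of[idx]}")

print(labels)
# ['group_1', 'group_1', 'group_1', 'group_1', 'group_2', 'group_2', 'group_3']
```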

2023-03-11

h4cxqtbf3#

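You can solve this in linear time, O(len(A) + len(B)), with the following code, assuming both lists are sorted and every element of B lies within the range of A: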

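A = [1.0, 2.9, 3.4, 4.2, 5.5, 100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]

C = [0] * len(B)
i, j = 0, 0

while i < len(B):
    # Half-open test so a value equal to A[j] still matches an interval.
    if A[j] <= B[i] < A[j + 1]:
        C[i] = j  # B[i] lies in [A[j], A[j+1])
        i += 1
    else:
        j += 1  # move on to the next interval of A

print(C)  # [0, 0, 0, 0, 1, 1, 3]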

2023-03-11

dgenwo3n4#

I think itertools.groupby plus a small stateful "key function" fits very well (especially if the requirements might change, or if you need this pattern elsewhere):

import itertools

class ThresholdIndexer:
    """Callable that returns the index of the last threshold <= arg.

    Preconditions:
      - thresholds is not empty.
      - thresholds is sorted.
      - For all calls, `thresholds[0] <= call[i].arg <= thresholds[-1]`.
      - For all calls, `call[i - 1].arg <= call[i].arg`.
    """

    def __init__(self, thresholds):
        self.thresholds = thresholds
        self.i = 0

    def __call__(self, arg):
        while not (self.thresholds[self.i] <= arg <= self.thresholds[self.i + 1]):
            self.i += 1
        return self.i 

A = [1.0, 2.9, 3.4, 4.2, 5.5, 100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]

for group_key, group_items in itertools.groupby(B, key=ThresholdIndexer(A)):
    print(f'{group_key}: {", ".join(str(i) for i in group_items)}')

"""Output:
0: 1.1, 1.2, 1.3, 2.5
1: 3.0, 3.1
3: 5.2
"""

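This approach is O(NA + NB).
You can relax these preconditions by binary-searching for the correct index in __call__ instead of assuming that some later index is "definitely" correct, but the complexity then rises to O(NB × log NA).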
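
A minimal sketch of that binary-search variant, using a plain function instead of the class. The name `interval_index` and the choice to return -1 for values below the first threshold are assumptions made for illustration, not part of the answer above. The thresholds still have to be sorted, and groupby still only merges consecutive items with the same key.

```python
import bisect
import itertools

A = [1.0, 2.9, 3.4, 4.2, 5.5, 100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]

def interval_index(arg, thresholds=A):
    """Return the index of the last threshold <= arg (-1 if there is none)."""
    return bisect.bisect_right(thresholds, arg) - 1

# Stateless key: each call does its own binary search over the thresholds.
groups = [
    (key, list(items))
    for key, items in itertools.groupby(B, key=interval_index)
]

for key, items in groups:
    print(f'{key}: {", ".join(str(i) for i in items)}')
# 0: 1.1, 1.2, 1.3, 2.5
# 1: 3.0, 3.1
# 3: 5.2
```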

2023-03-11

wqnecbli5#

Try this:

import numpy as np

A = [1.0, 2.9, 3.4, 4.2, 5.5, 100.3]
B = [1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2]
A_arr = np.array(A)
B_arr = np.array(B)
# searchsorted accepts a whole array, so no Python-level loop is needed.
C = np.searchsorted(A_arr, B_arr).tolist()
print(C)

Output:

[1, 1, 1, 1, 2, 2, 4]
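
Since the question is tagged pandas, the same idea can be sketched with pandas itself. This is an illustrative alternative, not something proposed in the answers above: it assumes pd.cut with A as the bin edges, followed by pd.factorize to renumber the bins consecutively. Note that pd.cut uses intervals that are open on the left and closed on the right by default, and values outside the range of A become NaN.

```python
import pandas as pd

A = [1.0, 2.9, 3.4, 4.2, 5.5, 100.3]
B = pd.Series([1.1, 1.2, 1.3, 2.5, 3.0, 3.1, 5.2])

# labels=False returns the integer index of the bin each value falls into.
bin_idx = pd.cut(B, bins=A, labels=False)

# factorize renumbers the distinct bins 0, 1, 2, ... in order of first
# appearance, which removes the gaps left by empty intervals.
codes, _ = pd.factorize(bin_idx)
C = [f"group_{c + 1}" for c in codes]

print(C)
# ['group_1', 'group_1', 'group_1', 'group_1', 'group_2', 'group_2', 'group_3']
```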

2023-03-11
