寻找一种在pandas dataframe中根据OR、AND和括号拆分字符串的方法

zpgglvta  于 2023-05-12  发布在  其他
关注(0)|答案(1)|浏览(139)

所以我在pandas中有一个数据框,它由一个包含Components的列和一个包含Constraints的行组成。这些约束决定了必须在什么类别中过滤组件。现在这些约束不是很直接,所以我正在寻找一种方法将它们拆分为多个更小、更可读的约束。例如,如果一个约束是'A and(B or C)',我想分成两行'A and B'和'A and C'。但并非所有约束都像本例一样简单。
下面是dataframe的一小部分可能的样子:
| 成分|约束|
| --------------|--------------|
| 一百二十三|A和(B或C)|
| 四五六|((MIRROR ='ELECTRIC' and MIRRORCAMERA!='NO')或(MIRROR ='MANUAL'和(MIRROR_RIGHT!='NO'或MIRROR_LEFT!='NO')))和STEERWHEEL_LOCK='NO'|
| 七八九|LENGTH='122' or(LENGTH='135' and BATTERY='551')or LENGTH='149' or(LENGTH='181' and(BATTERY='674' or(BATTERY='551' and CHARGER!='NO')))|

import pandas as pd
dataex = {'Component': [123, 
                        456, 
                        789], 
          'Constraint': ["A and (B or C)", 
                         "((MIRROR='ELECTRIC' and MIRRORCAMERA!='NO') or (MIRROR='MANUAL' and (MIRROR_RIGHT!='NO' or MIRROR_LEFT!='NO'))) and STEERWHEEL_LOCK='NO'", 
                         "LENGTH='122' or (LENGTH='135' and BATTERY='551') or LENGTH='149' or (LENGTH='181' and (BATTERY='674' or (BATTERY='551' and CHARGER!='NO')))"]}
df_example = pd.DataFrame(data=dataex)

就像我说的,我希望根据约束中的and和or和括号将所有这些拆分为多行(如果需要的话)。因此,我有以下结果:
| 组件|约束|
| --------------|--------------|
| 一百二十三|A和B|
| 一百二十三|A和C|
| 四五六|STEERWHEEL_LOCK ='NO'和MIRROR ='ELECTRIC'和MIRRORCAMERA!='没有'|
| 四五六|STEERWHEEL_LOCK ='NO'和MIRROR ='MANUAL'和MIRROR_RIGHT!='否'|
| 四五六|STEERWHEEL_LOCK ='NO'和MIRROR ='MANUAL'和MIRROR_LEFT!='没有'|
| 七八九|长度='122'|
| 七八九|长度='135'和电池='551'|
| 七八九|长度='149'|
| 七八九|长度='181'和电池='674'|
| 七八九|长度='181'和电池='551'和充电器!='否'|

import pandas as pd
datares = {'Component':[123, 123, 456, 456, 456, 789, 789, 789, 789, 789],
           'Constraint':["A and B",
                         "A and C",
                         "STEERWHEEL_LOCK='NO' and MIRROR='ELECTRIC' and MIRRORCAMERA!='NO'",
                         "STEERWHEEL_LOCK='NO' and MIRROR='MANUAL' and MIRROR_RIGHT!='NO'",
                         "STEERWHEEL_LOCK='NO' and MIRROR='MANUAL' and MIRROR_LEFT!='NO'",
                         "LENGTH='122'",
                         "LENGTH='135' and BATTERY='551'",
                         "LENGTH='149'",
                         "LENGTH='181' and BATTERY='674'",
                         "LENGTH='181' and BATTERY='551' and CHARGER!='NO'"
                        ]}
df_result = pd.DataFrame(data=datares)

我试过拆分'or'上的约束,然后将它们分成数组,然后在它们上循环以获得结果,但对于一些更困难的约束,你会得到数组中的数组中的数组,然后过一段时间就会变得非常混乱。我也试过做一种逻辑树,但我还没有让它在Python中工作。
我希望你们中的一些人可能有一个好主意或模块来帮助我解决我的问题。谢谢!

kq4fsx7k

kq4fsx7k1#

从你的描述中,我认为你需要把表达式放在“析取范式”(DNF)中,它看起来像Or(And(v1,v2),And(v1,v4),...)。下面的代码将使用一个额外的包来完成这一点(DNF在电子设计自动化中很常见)。要安装软件包,请执行pip3 install pyeda
将表达式拆分为相应的And表达式的代码如下所示。
备注:

  • 如果过滤器中有超过数字/字母(如问号等),则匹配过滤器的正则表达式可能需要调整。
  • 你没有例子来说明如何否定一个独立变量。我用的是“不是(A)”
  • 如果没有独立变量(只有比较变量),代码会简单得多
  • 如果你有更多的操作符(比如小于,等等),代码会稍微复杂一些

总体思路:

  • 将你的表达式转换成布尔表达式(x & y)|这是通过使所有比较成为变量来完成的。
  • 将布尔表达式转换为DNF
  • 用原始比较替换回变量。
from pyeda.inter import *
import re
import pandas as pd
dataex = {'Component': [123, 
                        456, 
                        789], 
          'Constraint': ["A and (B or C)", 
                         "((MIRROR='ELECTRIC' and MIRRORCAMERA!='NO') or (MIRROR='MANUAL' and (MIRROR_RIGHT!='NO' or MIRROR_LEFT!='NO'))) and STEERWHEEL_LOCK='NO'", 
                         "LENGTH='122' or (LENGTH='135' and BATTERY='551') or LENGTH='149' or (LENGTH='181' and (BATTERY='674' or (BATTERY='551' and CHARGER!='NO')))"]}
df_example = pd.DataFrame(data=dataex)

def transform_expr(input_expression):

    def move_not_before(x):
        if '!=' in x.group(1):
            return '~'+x.group(1).replace("!=","=")
        else:
            return x.group(1)

    expr1 = (re.sub("([a-zA-Z0-9'_]*!=[a-zA-Z0-9'_]*)", move_not_before,input_expression))

    variables = {}
    values = {}
    current_key = 0
    expr2 = ""
    last_idx = 0
    # Make transformations to reach a boolean form
    for idx in re.finditer("([a-zA-Z0-9'_]*=[a-zA-Z0-9'_]*)", expr1):
        expr2 += expr1[last_idx:idx.span(1)[0]]
        if idx[1] in values:
            expr2 += values[idx[1]]
        else:
            expr2 += f'v{current_key}'
            variables[f'v{current_key}']=idx[1]
            values[idx[1]] = f'v{current_key}'
            current_key+=1
        last_idx = idx.span(1)[1]
    expr2 += expr1[last_idx:]

    expr3 = re.sub("and", "&", expr2)
    expr4 = re.sub("or", "|", expr3)
    expr5 = expr(expr4).to_dnf()

    result = []
    # We know expr5 is like Or(And(...),And(...)...), xs has the children
    for v in expr5.xs:
        # We remove the And(...)
        if "," in str(v):
            arr = str(v)[4:-1].replace(" ","").split(",")
        else:
            # For cases in which you have only one variable
            arr = [str(v)]
        r = []
        for x in arr:
            if x[0]=="~":
                variable_name = x[1:]
            else:
                variable_name = x

            if variable_name not in variables:
                # How do we negate a standalone variable?
                if x[0]=="~":
                    variable_value = f"not({variable_name})"
                else:
                    variable_value = variable_name
            else:
                variable_value = variables[variable_name]

            if x[0]=="~":
                r.append(variable_value.replace("=","!="))
            else:
                r.append(variable_value)

        result.append(" and ".join(r))

    return result

df_example['Constraint'] = df_example['Constraint'].map(transform_expr)

df_result = df_example.explode('Constraint')

pd.set_option("max_colwidth", None)
print(df_result)

它将打印:

Component                                                         Constraint
0        123                                                            A and B
0        123                                                            A and C
1        456  MIRROR='ELECTRIC' and MIRRORCAMERA!='NO' and STEERWHEEL_LOCK='NO'
1        456    MIRROR='MANUAL' and MIRROR_RIGHT!='NO' and STEERWHEEL_LOCK='NO'
1        456     MIRROR='MANUAL' and MIRROR_LEFT!='NO' and STEERWHEEL_LOCK='NO'
2        789                                                       LENGTH='122'
2        789                                                       LENGTH='149'
2        789                                     LENGTH='135' and BATTERY='551'
2        789                                     LENGTH='181' and BATTERY='674'
2        789                   BATTERY='551' and LENGTH='181' and CHARGER!='NO'

相关问题