regex: Using a dict comprehension together with an included string split operation

Posted by e5nqia27 on 2023-06-07 in Other

Consider a small snippet of a properties parser:

testx="""var1 = foo
         var2 = bar"""

dd = { l.split('=')[0].strip():l.split('=')[1].strip() for l in testx.split('\n')} 
print(dd)
# {'var1': 'foo', 'var2': 'bar'}

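This works, but it is ugly because split is called twice in l.split('=')[0].strip():l.split('=')[1].strip(). How can the dict comprehension be changed so that the split happens only once and each dictionary entry is then built as: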

l[0].strip():l[1].strip()

Does this refactoring require a nested comprehension, or is there a different way to structure a single-level comprehension?

ryhaxcpt1#

If you are on Python >= 3.8, this is exactly the use case assignment expressions were added for:

>>> {(parts:=l.split('='))[0].strip(): parts[1].strip() for l in testx.split("\n")}
{'var1': 'foo', 'var2': 'bar'}

Before that, you could do the following:

>>> {key.strip():value.strip() for l in testx.split('\n') for key, value in [l.split("=")]}
{'var1': 'foo', 'var2': 'bar'}

Which, honestly, I find more readable.
But honestly, both of these are fairly unreadable to me. At the end of the day, I don't think you can beat:

>>> result = {}
>>> for l in testx.split("\n"):
...     key, value = l.split("=")
...     result[key.strip()] = value.strip()
...
>>> result
{'var1': 'foo', 'var2': 'bar'}
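
If the plain loop is kept, it can be wrapped in a small function. The following is only a sketch under assumptions that go beyond the original post: the name parse_properties is hypothetical, lines without an "=" are skipped, and only the first "=" on a line separates key from value (via str.partition).

```python
def parse_properties(text):
    """Parse 'key = value' lines into a dict (hypothetical helper)."""
    result = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # assumption: silently skip blank or malformed lines
        # partition splits once, at the first "=", so values may contain "="
        key, _, value = line.partition("=")
        result[key.strip()] = value.strip()
    return result


sample = "var1 = foo\n\n  var2 = a=b  "  # illustrative input
parsed = parse_properties(sample)
print(parsed)  # {'var1': 'foo', 'var2': 'a=b'}
```

Unlike key, value = l.split("="), this does not raise ValueError on an empty line or on a value that itself contains "=".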

Edit

Note that the for <target list> in [<expression>] idiom was actually optimized in Python 3.9:
https://docs.python.org/3/whatsnew/3.9.html#optimizations
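Optimized the idiom for assigning a temporary variable in comprehensions. Now for y in [expr] in comprehensions is as fast as a simple assignment y = expr. For example:
sums = [s for s in [0] for x in data for s in [s + x]]
Unlike the := operator, this idiom does not leak a variable to the outer scope.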
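
To make the quoted note concrete, here is a small sketch (Python 3.8+ is needed for :=). The data list and the scope_demo helper are illustrative names, not part of the documentation or the original answer. It runs the sums example and then contrasts how the two forms treat the temporary variable.

```python
# Running sums with the "for s in [expr]" idiom from the quoted note.
data = [1, 2, 3]  # illustrative input
sums = [s for s in [0] for x in data for s in [s + x]]
print(sums)  # [1, 3, 6]


def scope_demo(values):
    """Hypothetical helper contrasting := with the for ... in [...] idiom."""
    # := binds t in the enclosing function scope (values must be non-empty).
    doubled_walrus = [(t := v * 2) for v in values]
    leaked = t  # still holds the last value assigned inside the comprehension

    # The loop target y stays local to the comprehension.
    doubled_idiom = [y for v in values for y in [v * 2]]
    try:
        y
        y_visible = True
    except NameError:
        y_visible = False
    return doubled_walrus, leaked, doubled_idiom, y_visible


print(scope_demo(data))  # ([2, 4, 6], 6, [2, 4, 6], False)
```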
Comparing the bytecode produced by Python 3.8 and Python 3.9, you will notice that the Python 3.9 version has no nested iteration:
Python 3.8:

Python 3.8.1 (default, Jan  8 2020, 16:15:59)
[Clang 4.0.1 (tags/RELEASE_401/final)] :: Anaconda, Inc. on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis('{k:v for l in "a b|c d".split("|") for k,v in [l.split()]}')
  1           0 LOAD_CONST               0 (<code object <dictcomp> at 0x7fdbd6249d40, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('<dictcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_CONST               2 ('a b|c d')
              8 LOAD_METHOD              0 (split)
             10 LOAD_CONST               3 ('|')
             12 CALL_METHOD              1
             14 GET_ITER
             16 CALL_FUNCTION            1
             18 RETURN_VALUE

Disassembly of <code object <dictcomp> at 0x7fdbd6249d40, file "<dis>", line 1>:
  1           0 BUILD_MAP                0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                30 (to 36)
              6 STORE_FAST               1 (l)
              8 LOAD_FAST                1 (l)
             10 LOAD_METHOD              0 (split)
             12 CALL_METHOD              0
             14 BUILD_TUPLE              1
             16 GET_ITER
        >>   18 FOR_ITER                14 (to 34)
             20 UNPACK_SEQUENCE          2
             22 STORE_FAST               2 (k)
             24 STORE_FAST               3 (v)
             26 LOAD_FAST                2 (k)
             28 LOAD_FAST                3 (v)
             30 MAP_ADD                  3
             32 JUMP_ABSOLUTE           18
        >>   34 JUMP_ABSOLUTE            4
        >>   36 RETURN_VALUE

Compared with Python 3.9:

Python 3.9.0 | packaged by conda-forge | (default, Oct 14 2020, 22:56:29)
[Clang 10.0.1 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import dis
>>> dis.dis('{k:v for l in "a b|c d".split("|") for k,v in [l.split()]}')
  1           0 LOAD_CONST               0 (<code object <dictcomp> at 0x7fb3587d1870, file "<dis>", line 1>)
              2 LOAD_CONST               1 ('<dictcomp>')
              4 MAKE_FUNCTION            0
              6 LOAD_CONST               2 ('a b|c d')
              8 LOAD_METHOD              0 (split)
             10 LOAD_CONST               3 ('|')
             12 CALL_METHOD              1
             14 GET_ITER
             16 CALL_FUNCTION            1
             18 RETURN_VALUE

Disassembly of <code object <dictcomp> at 0x7fb3587d1870, file "<dis>", line 1>:
  1           0 BUILD_MAP                0
              2 LOAD_FAST                0 (.0)
        >>    4 FOR_ITER                22 (to 28)
              6 STORE_FAST               1 (l)
              8 LOAD_FAST                1 (l)
             10 LOAD_METHOD              0 (split)
             12 CALL_METHOD              0
             14 UNPACK_SEQUENCE          2
             16 STORE_FAST               2 (k)
             18 STORE_FAST               3 (v)
             20 LOAD_FAST                2 (k)
             22 LOAD_FAST                3 (v)
             24 MAP_ADD                  2
             26 JUMP_ABSOLUTE            4
        >>   28 RETURN_VALUE

uujelgoq2#

Use re.findall:

import re
testx="""var1 = foo
         var2 = bar"""

dct = dict(re.findall(r'(\S+)\s*=\s*(\S+)', testx))
print(dct)
# {'var1': 'foo', 'var2': 'bar'}
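
One caveat worth illustrating: because the value is captured with \S+, it stops at the first whitespace. The sketch below uses made-up input to show that, plus one possible line-anchored variant. The second pattern is an assumption of this example rather than part of the answer, and it treats everything after the first "=" up to the end of the line as the value.

```python
import re

# Illustrative input with a value that contains a space.
text = """name = John Smith
         path = /tmp/x"""

# Pattern from the answer: \S+ cuts the value at the first whitespace.
short = dict(re.findall(r'(\S+)\s*=\s*(\S+)', text))
print(short)  # {'name': 'John', 'path': '/tmp/x'}

# Assumed variant: anchor each match to one line with re.MULTILINE and use
# [ \t]* instead of \s* so that matches cannot run across newlines.
line_pattern = r'^[ \t]*(\S+)[ \t]*=[ \t]*(.*?)[ \t]*$'
full = dict(re.findall(line_pattern, text, flags=re.MULTILINE))
print(full)  # {'name': 'John Smith', 'path': '/tmp/x'}
```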
