Python: using lark to parse reST-style markup such as sections

ebdffaop posted on 2023-01-19 in Python

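To get started with lark, I want to define a basic grammar for reST-like sections. Here is my M(not)WE (minimal, not-quite-working example).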

from lark import Lark

GRAMMAR = r"""
?start: _NL* (day_heading)*

day_heading : "==" _NL day_nb _NL "==" _NL+ (paragraph _NL)*
day_nb      : /\d{2}/
// [^\n=]: any character except a newline or "=" ({2} inside [...] would be matched literally)
paragraph   : /[^\n=]+/ (_NL+ paragraph)*
_NL         : /(\r?\n[\t ]*)+/
"""

parser = Lark(GRAMMAR)

tree = parser.parse("""

==
12
==

Bla, bla
Bli, Bli


Blu, Blu

==
10
==

Blo, blo

    """)

print(tree.pretty())

This prints:

start
  day_heading
    day_nb      12
    paragraph
      Bla, bla
      paragraph
        Bli, Bli
        paragraph       Blu, Blu
  day_heading
    day_nb      10
    paragraph   Blo, blo

The tree I would like to get instead looks like this (in YAML format):

- section:
  - day: 12
  - contents:
    - paragraph: |
        Bla, bla
        Bli, Bli
    - paragraph: "Blu, Blu"

- section:
  - day: 10
  - contents:
    - paragraph: "Blo, blo"

How should I modify my EBNF?

Answer #1, by s3fp2yjn

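Here is a possible answer. Renaming `_NL` to `NL` keeps the newlines in the tree, because lark drops terminals whose names start with an underscore.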

from lark import Lark

GRAMMAR = r"""
?start: _NL* (day_heading)*

day_heading : "==" _NL day_nb _NL "==" _NL+ (paragraph)+
day_nb      : /\d{2}/

paragraph : (line _NL)+

// [^\n=]: any character except a newline or "=" ({2} inside [...] would be matched literally)
line : /[^\n=]+/
_NL  : /(\r?\n[\t ]*)+/
"""

parser = Lark(GRAMMAR)

tree = parser.parse("""

==
12
==

Bla, bla
Bli, Bli


Blu, Blu

==
10
==

Blo, blo

    """)

print(tree.pretty())

This produces:

start
  day_heading
    day_nb      12
    paragraph
      line      Bla, bla
      line      Bli, Bli
      line      Blu, Blu
  day_heading
    day_nb      10
    paragraph
      line      Blo, blo
