I have an input file with multiple lines, each containing several space-separated fields. My definition files are as follows.
scanner.xrl:
Definitions.
DIGIT = [0-9]
ALPHANUM = [0-9a-zA-Z_]
Rules.
(\s|\t)+ : skip_token.
\n : {end_token, {new_line, TokenLine}}.
{ALPHANUM}+ : {token, {string, TokenLine, TokenChars}}.
Erlang code.
parser.yrl:
Nonterminals line.
Terminals string.
Rootsymbol line.
Endsymbol new_line.
line -> string : ['$1'].
line -> string line: ['$1'|'$2'].
Erlang code.
When I run this as is, the first line is parsed and then parsing stops:
1> A = <<"a b c\nd e\nf\n">>.
2> {ok, T, _} = scanner:string(binary_to_list(A)).
{ok,[{string,1,"a"},
{string,1,"b"},
{string,1,"c"},
{new_line,1},
{string,2,"d"},
{string,2,"e"},
{new_line,2},
{string,3,"f"},
{new_line,3}],
4}
3> parser:parse(T).
{ok,[{string,1,"a"},{string,1,"b"},{string,1,"c"}]}
If I remove the Endsymbol line from parser.yrl and change scanner.xrl to:
Definitions.
DIGIT = [0-9]
ALPHANUM = [0-9a-zA-Z_]
Rules.
(\s|\t|\n)+ : skip_token.
{ALPHANUM}+ : {token, {string, TokenLine, TokenChars}}.
Erlang code.
then all of my lines are parsed as a single term:
1> A = <<"a b c\nd e\nf\n">>.
<<"a b c\nd e\nf\n">>
2> {ok, T, _} = scanner:string(binary_to_list(A)).
{ok,[{string,1,"a"},
{string,1,"b"},
{string,1,"c"},
{string,2,"d"},
{string,2,"e"},
{string,3,"f"}],
4}
3> parser:parse(T).
{ok,[{string,1,"a"},
{string,1,"b"},
{string,1,"c"},
{string,2,"d"},
{string,2,"e"},
{string,3,"f"}]}
What is the correct way to tell the parser that each line should be treated as a separate term? I would like my result to look like this:
{ok,[[{string,1,"a"},
{string,1,"b"},
{string,1,"c"}],
[{string,2,"d"},
{string,2,"e"}],
[{string,3,"f"}]]}
1 Answer
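Here is a working lexer/parser pair that does the job with only one shift/reduce conflict. I think it will solve your problem; you will just need to clean up the tokens to your liking.
I am fairly sure there are simpler and faster ways to do this, but during my own "lexer-fighting days" it was hard to find any information at all, so I hope this gives you an idea of how to proceed with parsing in Erlang.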
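As a rough illustration only (this is not the listing from this answer), one way to get one list per line is to treat new_line as an ordinary terminal rather than as the Endsymbol, and to add a rule that collects lines. The sketch assumes the first scanner.xrl from the question is kept unchanged and that scanner:string/1 is used, which returns the new_line tokens inside the token list, as the question's shell session shows. The nonterminal name lines is my own choice.

```erlang
%% parser.yrl (sketch; the nonterminal `lines` is a name chosen here)
Nonterminals lines line.
Terminals string new_line.
Rootsymbol lines.
%% No Endsymbol declaration: yecc then falls back to its default end
%% symbol, so new_line can be used as an ordinary terminal.

%% The input is one or more lines, each terminated by new_line.
lines -> line new_line : ['$1'].
lines -> line new_line lines : ['$1'|'$3'].

%% A line is one or more string tokens.
line -> string : ['$1'].
line -> string line : ['$1'|'$2'].

Erlang code.
```

With the token list T from the question, parser:parse(T) should then return the nested {ok, [[...], [...], [...]]} shape that was asked for. Under this grammar an empty line, or a final line without a trailing newline, is reported as a syntax error, so the rules would need extending if the input can contain either.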
scanner.xrl
parser.yrl
Output
The parser flow looks like this:
Allow me to comment on some of the problems I found in the original code.