haskell 解析许多后面没有符号的术语

izkcnapc  于 2023-06-23  发布在  其他
关注(0)|答案(1)|浏览(113)

我正在尝试开发一个lambda演算解释器,它支持术语的定义。为此,我使用Haskell中的Megaparsec库。
我有以下语法:

term := var | lambda var . term | term term
def := var = term
instruction := :e t | def

我的解析器应该返回一个instruction的列表,其中:e t将计算为t的任何缩减,def将允许我有命名的定义。
到目前为止,我已经设法解析了一个非常相似的语法,除了我使用分号来分隔每个指令:

...
instruction := :e t; | def;

我想去掉那些分号的必要性,但我没有运气。我的代码目前是:

sc :: Parser ()
sc = L.space space1 (L.skipLineComment "--") (L.skipBlockComment "{-" "-}")

lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

symbol :: Text -> Parser Text
symbol = L.symbol sc

semicolon :: Parser Text
semicolon = symbol ";"

lowercaseWord :: Parser [Char]
lowercaseWord = some lowerChar

variable :: Parser Term
variable = lexeme lowercaseWord <&> Var

lamAbs :: Parser Term
lamAbs = do
  symbol "lambda"
  space
  v <- lexeme lowercaseWord
  space
  symbol "."
  space
  t <- term
  space
  return $ Abs v t

term :: Parser Term
term = foldl1 App <$> sepBy1 (parens term <|> try lamAbs <|> variable) sc
  where parens = between (symbol "(") (symbol ")")

definition :: Parser (Instruction Term)
definition = do
  n <- lexeme lowercaseWord
  space
  symbol "="
  space
  t <- term
  return $ IDef n t

commandEval :: Parser (Instruction Term)
commandEval = do
  symbol ":e"
  t <- term
  optional newline
  return $ IEval t

program :: Parser [Instruction Term]
program = sc *> sepEndBy (definition <|> commandEval) semicolon <* eof

parse :: ProgramParser (Instruction Term)
parse text = case runParser program "" text of
  Left err  -> Left $ errorBundlePretty err
  Right ast -> Right ast

当我试图通过将program的定义更改为:

program = sc *> some (definition <|> commandEval) <* eof

解析器将不能弄清楚,一旦它发现类似var = ...的东西,它需要停止解析term,以继续解析下一个定义。
例如,以下输入:

a = (lambda x . x)
    x

:e b a

b = lambda y . y

返回以下错误:

6:3:
  |
6 | b = lambda y . y
  |   ^
unexpected '='
expecting ":e", "lambda", '(', end of input, lowercase letter, or newline

我尝试更改term的定义,以明确它后面不应该跟一个小写单词和一个=。但这也不管用:

term = foldl1 App <$> sepBy1 ((parens term <|> try lamAbs <|> variable) <* notFollowedBy newdef) sc
  where parens = between (symbol "(") (symbol ")")
        newdef = (lexeme lowercaseWord *> space) *> symbol "="

我如何才能使它不需要分号,并且我的解析器在找到var = term后自动停止解析术语应用程序?
提前感谢您的帮助。

ftf50wuq

ftf50wuq1#

正如@Poselsky所指出的,使用Text.Megaparsec.Char.Lexer的要点是,您不需要(也不应该)在更高级别的解析器中处理白色。因此,我首先删除lamAbsdefinition中的space的每次使用,从commandEval中删除optional newline,并在term中将sepBy1 xxx sc替换为some xxx。(在所有这些情况下,这些解析器不执行任何功能。在前面的词素解析器已经吸收了所有尾随空格之后,它们在输入流上运行。)
我也会把lexeme调用移到lowercaseWord中:

lowercaseWord :: Parser [Char]
lowercaseWord = lexeme (some lowerChar)

然后,除了原始解析器(现在是symbollowercaseWord),唯一需要处理空白的地方是顶级解析器program开始处的sc。应该假设所有其他解析器都从非空白开始并吸收尾随的空白,并且这些解析器可以按顺序组合以创建遵循相同规则的另一个解析器,因此不需要杂散的spacescnewline调用。
这将简化您的代码,尽管它不会解决您的问题。
你的尝试修复是正确的想法。问题是它不能解析newdef * 之前的最后一个变量,因为variable * 后面是newdef
实际上,您不需要尝试解析整个定义。你真正需要做的就是确保variable后面没有symbol "=",你需要把它 Package 在try块中,以回溯variable解析的名称:

term :: Parser Term
term = foldl1 App <$> some (parens term <|> try lamAbs <|>
                            try (variable <* notFollowedBy (symbol "=")))
  where parens = between (symbol "(") (symbol ")")

请注意,您的解析器仍然存在一些问题。例如,它接受以下输入,这可能是您不期望的:

x = lambdalicious. man

(Note这与删除space调用没有任何关系--您原来的解析器也解析了它。
问题是symbol "lambda"可以匹配较大单词的前缀,因此"lambdalicious"被解析为"lambda licious"。如果您不希望":eval"解析为":e val",则symbol ":e"也存在类似的问题。
通常的解决方案是定义:

reserved :: Text -> Parser Text
reserved s = lexeme $ try (string s <* notFollowedBy lowerChar)

然后按预期进行以下工作:

reserved "lambda"
reserved ":e"

以下是我的完整测试程序,包含了所有上述修订:

{-# LANGUAGE OverloadedStrings #-}

module LambdaParser where

import Data.Void

import Data.Functor
import Data.Text (Text)
import Text.Megaparsec hiding (parse)
import Text.Megaparsec.Char
import qualified Text.Megaparsec.Char.Lexer as L

type Parser = Parsec Void Text

data Term = Var String | Instruction Term | Abs String Term | App Term Term deriving (Show)
data Instruction t = IDef String t | IEval Term deriving (Show)

sc :: Parser ()
sc = L.space space1 (L.skipLineComment "--") (L.skipBlockComment "{-" "-}")

lexeme :: Parser a -> Parser a
lexeme = L.lexeme sc

symbol :: Text -> Parser Text
symbol = L.symbol sc

reserved :: Text -> Parser Text
reserved s = lexeme $ try (string s <* notFollowedBy lowerChar)

semicolon :: Parser Text
semicolon = symbol ";"

lowercaseWord :: Parser [Char]
lowercaseWord = lexeme (some lowerChar)

variable :: Parser Term
variable = lowercaseWord <&> Var

lamAbs :: Parser Term
lamAbs = do
  reserved "lambda"
  v <- lowercaseWord
  symbol "."
  t <- term
  return $ Abs v t

term :: Parser Term
term = foldl1 App <$> some (parens term <|> try lamAbs <|>
                            try (variable <* notFollowedBy (symbol "=")))
  where parens = between (symbol "(") (symbol ")")

definition :: Parser (Instruction Term)
definition = do
  n <- lowercaseWord
  symbol "="
  t <- term
  return $ IDef n t

commandEval :: Parser (Instruction Term)
commandEval = do
  reserved ":e"
  t <- term
  return $ IEval t

program :: Parser [Instruction Term]
program = sc *> some (definition <|> commandEval) <* eof

parse :: Text -> Either String [Instruction Term]
parse text = case runParser program "" text of
  Left err  -> Left $ errorBundlePretty err
  Right ast -> Right ast

main = do
  let ex1 = "a = (lambda x . x)\n   x\n\n:e b a\n\nb = lambda y . y"
  case parse ex1 of
    Left err -> putStrLn err
    Right pgm -> print pgm

相关问题