c++ AST与非终结语法分析器的合成属性

vsnjm48y  于 2022-11-27  发布在  其他
关注(0)|答案(1)|浏览(116)

我正在尝试使用Boost spirit x3将树解析器的结果存储到递归结构中。
我的语法很好,AST也很好。但是我很难理解它们是如何联系在一起的。
下面是AST:

#include <boost/spirit/home/x3.hpp>
#include <boost/spirit/home/x3/support/ast/variant.hpp>

#include <boost/fusion/adapted/struct/adapt_struct.hpp>
#include <boost/fusion/include/adapt_struct.hpp>

#include <optional>

namespace ast
{

  struct node
  {
    // are the optional necessary here?
    std::optional<std::vector<node>> children;
    std::optional<std::string> name;
    std::optional<double> length;

  };
}

BOOST_FUSION_ADAPT_STRUCT(ast::node, children, name, length)

然后,这是用来合成这些节点的语法:(我添加了一些注解,说明我认为非终结符解析器正在合成的内容)。

#include <boost/spirit/home/x3.hpp>
#include "ast.h"
#include <boost/fusion/include/vector.hpp>
#include <boost/fusion/include/std_pair.hpp>

  namespace parser
  {
    namespace x3 = boost::spirit::x3;

    // synthesize: it should be a simple ast::node, but the grammar says otherwise???
    x3::rule<class tree, ast::node> const tree{"tree"};
    x3::rule<class branch> branch{"branch"};

    // synthesize: std::string
    auto name     = x3::lexeme[x3::alpha >> *x3::alnum]; // to be improved later
    // synthesize: double
    auto length   = ':' >> x3::double_;
    // synthesize: std::optional<std::string>
    auto leaf     = -name;
    // synthesize: std::pair< std::vector<...>, std::optional<std::string> >
    auto internal = '(' >> (branch % ',') >> ')' >> -name;
    // synthesized attribute: std::variant<node, std::optional<std::string>
    auto subtree  = internal | leaf;
    // synthesize: std::pair<node, std::optional<double>>
    auto const tree_def   = x3::skip(x3::blank)[subtree >> -length >> ';' >> x3::eoi];
    // synthesize: std::pair<node, std::optional<double>>
    auto const branch_def = subtree >> -length;

    BOOST_SPIRIT_DEFINE(branch, tree);
  } // end namespace parser

  // What is that for?
  auto tree()
  {
    return parser::tree;
  }

尽管多次尝试“修复”它,但它无法用/boost/spirit/home/x3/operator/detail/sequence.hpp:143:9: error: static_assert failed due to requirement 'actual_size <= expected_size' "Size of the passed attribute is bigger than expected."编译
我的猜测是AST与语法不兼容(我没有看到语法中出现任何类似简单tuple<vector<node>, string, double>的东西)。如果是这样,我应该将AST复杂化,或者找到一种简单的方法来表达语法,还是将其作为语义动作的用例?

e0uiprwp

e0uiprwp1#

首先,事情很少是语义动作的用例(Boost Spirit: "Semantic actions are evil"?)¹。
其次,我的直觉告诉我,optional对于向量/字符串字段来说可能是不必要的。在我看来,序列容器已经是optional的泛化(其中optional就像最大大小为1的容器)。
第三,据我所知,std::optional支持可能有限制。* 似乎我记错了variant支持将Boost Spirit解析器从boost::variant转换为std::variant*。

传播规则

我建议你记住两件事
1.总是尽可能地将AST声明与规则声明匹配。特别是,AST不区分叶节点和内部节点,这会导致混乱,因为您希望两个规则都“神奇地”兼容。

  1. x3::rule就是这样做的,您可以显式地使用它。我经常使用的x3设备之一是as<T>[p](有时是它的功能版本:as<T>(p, name)):
template <typename T> struct as_type {
    auto operator[](auto p) const {
        struct Tag {};
        return x3::rule<Tag, T>{"as"} = p;
    }
};

template <typename T> static inline as_type<T> as{};

请注意,您的某些(大多数?)注解行

// synthesize: std::string
// synthesize: double
// synthesize: std::optional<std::string>
// synthesize: std::pair< std::vector<...>, std::optional<std::string> >
// synthesized attribute: std::variant<node, std::optional<std::string>
// synthesize: std::pair<node, std::optional<double>>
// synthesize: std::pair<node, std::optional<double>>

不是很准确。例如,alpha >> *alnum的属性类型为fusion::deque<char, std::string> >。请注意,传播机制在将x3::move_to示例化到实际绑定引用时可能会消除细微的差异。
事实证明确实如此,请参阅下面的简化、清理
但随着是,这涟漪落了下来:leaf现在不是optional<string>了等等。
因为您定义了branch,但没有属性类型,所以它的属性类型实际上是x3::unused_type。这意味着subtree也不包含vector<...>。我必须检查,所以做了一个工具:

template <typename ExpectedAttributeType> void check(auto p) {
    static_assert(
        std::is_same_v<
            ExpectedAttributeType,
            typename x3::traits::attribute_of<decltype(p), x3::unused_type>::type>);
};

而现在我们可以看到:

check<std::string>(name); // assuming a fixed `name` rule that coerces std::string
check<std::vector<std::string>>('(' >> name % ',' >> ')');

但是对于公开unused_type的解析器:

check<x3::unused_type>(x3::omit[name]);
check<std::vector<x3::unused_type>>('(' >> x3::omit[name] % ',' >> ')'); // FAILS!
check<x3::unused_type>('(' >> x3::omit[name] % ',' >> ')'); // Passes

看到我说的保持紧密控制的意思了吗?连锁React使得属性类型与你所期望的完全不同。
此外,pair<node, optional<double>>将不与您的任何AST类型兼容(当然,它碰巧 * 只是 * ast::node)。

展示,不要告诉别人

我知道你学得很快,但我不知道如何最好地解释我一步一步地进行的转换,我将向你展示将上述原则应用于问题代码的结果:

x3::rule<class branch, ast::node> node{"node"};

auto name     = as<std::string>[x3::lexeme[x3::alpha >> *x3::alnum]];
auto length   = as<double>[':' >> x3::double_];
auto leaf     = as<std::string>[name | x3::attr(std::string{})];
auto children = '(' >> (node % ',') >> ')' | x3::attr(ast::nodes{});
auto node_def = children >> leaf >> -length;
auto tree     = x3::skip(x3::blank)[node >> ';' >> x3::eoi];

BOOST_SPIRIT_DEFINE(node);

你可以自己期望:

void checks() {
    check<double>(length);
    check<std::string>(leaf);
    check<std::string>(name);
    check<ast::nodes>('(' >> node % ',' >> ')'); // Passes
    //check<ast::nodes>(children); // FAILS, actually variant<ast::nodes, ast::nodes> (!!)
    check<ast::nodes>(as<ast::nodes>[children]); // Trivially compatible
    check<ast::node>(tree);
}

要知道,它的内部机制是多么的微妙,平心而论,灵气的可预测性和包容性,比X3要高得多,我怀疑这其中的差别,可能是有利于可维护性和编译时间?
测试
从Grammar借用一些测试用例,使用Boost Spirit X3解析树失败,并调试打印解析后的AST:
打印

============ running internal tests:
"(,)"    PASS -> node { [node { [], "" }, node { [], "" }], "" }
"(A,B)F"     PASS -> node { [node { [], "A" }, node { [], "B" }], "F" }
"(A:10,B:10)F"   PASS -> node { [node { [], "A":10 }, node { [], "B":10 }], "F" }
============ running tree tests:
";"  PASS -> node { [], "" }
"(,);"   PASS -> node { [node { [], "" }, node { [], "" }], "" }
"(,,(,));"   PASS -> node { [node { [], "" }, node { [], "" }, node { [node { [], "" }, node { [], "" }], "" }], "" }
"(A,B,(C,D));"   PASS -> node { [node { [], "A" }, node { [], "B" }, node { [node { [], "C" }, node { [], "D" }], "" }], "" }
"(A,B,(C,D)E)F;"     PASS -> node { [node { [], "A" }, node { [], "B" }, node { [node { [], "C" }, node { [], "D" }], "E" }], "F" }
"(:0.1,:0.2,(:0.3,:0.4):0.5);"   PASS -> node { [node { [], "":0.1 }, node { [], "":0.2 }, node { [node { [], "":0.3 }, node { [], "":0.4 }], "":0.5 }], "" }
"(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;"   PASS -> node { [node { [], "":0.1 }, node { [], "":0.2 }, node { [node { [], "":0.3 }, node { [], "":0.4 }], "":0.5 }], "":0 }
"(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);"   PASS -> node { [node { [], "A":0.1 }, node { [], "B":0.2 }, node { [node { [], "C":0.3 }, node { [], "D":0.4 }], "":0.5 }], "" }
"(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;"     PASS -> node { [node { [], "A":0.1 }, node { [], "B":0.2 }, node { [node { [], "C":0.3 }, node { [], "D":0.4 }], "E":0.5 }], "F" }
"((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;"    PASS -> node { [node { [node { [], "B":0.2 }, node { [node { [], "C":0.3 }, node { [], "D":0.4 }], "E":0.5 }], "F":0.1 }], "A" }

简化、清理

在完成所有工作后,我通常会删除不必要的规则示例化。
毫无疑问,AST与规则1:1的匹配使得所有内容都可以简单兼容,而无需强制。在您的例子中,我可能还会删除optional<double>,如果您感兴趣的话,让我向您展示一下:

struct node {
    nodes       children;
    std::string name;
    double      length;
};

规则变成:

x3::rule<class branch, ast::node> node{"node"};

auto name     = x3::lexeme[x3::alpha >> *x3::alnum];
auto length   = ':' >> x3::double_ | x3::attr(0.0);
auto leaf     = name | x3::attr(std::string{});
auto children = '(' >> (node % ',') >> ')' | x3::attr(ast::nodes{});
auto node_def = children >> leaf >> -length;
auto tree     = x3::skip(x3::blank)[node >> ';' >> x3::eoi];

BOOST_SPIRIT_DEFINE(node);

为了更好地衡量,让我们重新编写checks,以便仅验证预期的兼容性:

template <typename ExpectedAttributeType> void compatible(auto p) {
    static_assert(
        std::is_same_v<ExpectedAttributeType,
                       typename x3::traits::attribute_of<
                           decltype(x3::rule<struct _, ExpectedAttributeType>{} = p),
                           x3::unused_type>::type>);
};

void checks() {
    compatible<double>(length);
    compatible<std::string>(leaf);
    compatible<std::string>(name);
    compatible<ast::nodes>(children);
    compatible<ast::node>(tree);
}

完整列表


#include <boost/fusion/include/adapted.hpp>
#include <boost/spirit/home/x3.hpp>
#include <iomanip>
#include <iostream>

namespace ast {
    using nodes = std::vector<struct node>;
    struct node {
        nodes       children;
        std::string name;
        double      length;
    };

    // for debug output
    static inline std::ostream& operator<<(std::ostream& os, node const& n) {
        os << "node { [";

        for (auto sep = ""; auto& c : n.children)
            os << std::exchange(sep, ", ") << c;

        os << "], " << quoted(n.name) ;

        if (n.length != 0)
            os << ":" << n.length;
        return os << " }";
    }
} // namespace ast

BOOST_FUSION_ADAPT_STRUCT(ast::node, children, name, length)

#include <boost/fusion/include/vector.hpp>
#include <boost/fusion/include/std_pair.hpp>

namespace parser {
    namespace x3 = boost::spirit::x3;

    static x3::rule<class branch, ast::node> const node{"node"};

    static auto const name     = x3::lexeme[x3::alpha >> *x3::alnum];
    static auto const length   = ':' >> x3::double_ | x3::attr(0.0);
    static auto const leaf     = name | x3::attr(std::string{});
    static auto const children = '(' >> (node % ',') >> ')' | x3::attr(ast::nodes{});
    static auto const node_def = children >> leaf >> -length;
    static auto const tree     = x3::skip(x3::blank)[node >> ';' >> x3::eoi];

    BOOST_SPIRIT_DEFINE(node);

    template <typename ExpectedAttributeType> void compatible(auto p) {
        static_assert(
            std::is_same_v<ExpectedAttributeType,
                           typename x3::traits::attribute_of<
                               decltype(x3::rule<struct _, ExpectedAttributeType>{} = p),
                               x3::unused_type>::type>);
    };

    void checks() {
        compatible<double>(length);
        compatible<std::string>(leaf);
        compatible<std::string>(name);
        compatible<ast::nodes>(children);
        compatible<ast::node>(tree);
    }
} // namespace parser

namespace test {
    void run_tests(auto name, auto p, std::initializer_list<char const*> cases) {
        std::cerr << "============ running " << name << " tests:\n";
        for (std::string const input : cases)
        {
            ast::node n;
            auto      ok = parse(begin(input), end(input), p, n);

            std::cout << quoted(input) << "\t " << (ok ? "PASS" : "FAIL");
            if (ok)
                std::cout << " -> " << n << std::endl;
            else
                std::cout << std::endl;
        }
    }

    void internal() {
        run_tests("internal", parser::node,
                  {
                      "(,)",
                      "(A,B)F",
                      "(A:10,B:10)F",
                  });
    }

    void tree() {
        run_tests("tree", parser::tree,
            {
                ";",
                "(,);",
                "(,,(,));",
                "(A,B,(C,D));",
                "(A,B,(C,D)E)F;",
                "(:0.1,:0.2,(:0.3,:0.4):0.5);",
                "(:0.1,:0.2,(:0.3,:0.4):0.5):0.0;",
                "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);",
                "(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;",
                "((B:0.2,(C:0.3,D:0.4)E:0.5)F:0.1)A;",
            });
    }
} // namespace test

int main() {
    test::internal();
    test::tree();
}

具有相同的输出(除了调试输出中抑制的0.0长度)。
公平地说,在X3中,我会更高兴地考虑它,因为写它们已经变得更自然了。

相关问题