regex: How to capture C++ operators and separators in a regular expression?

Posted by 7rfyedvj on 2022-11-18 in Other


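I'm working on a university project in which I'm trying to build a Python JIT compiler in C++. I'm at the tokenisation step, and I have managed to extract all the strings and comments from the code. What I need now is to break the code into a stream of lexemes delimited by Python operators (+, -, /, etc.) and separators (commas, semicolons and dots). It is essentially a string split that also keeps the delimiters. Following this question, I thought of using a regular expression to capture all the symbols, whether or not they are separators. My only problem is how to write a regular expression that: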

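matches operators made of several characters (-=, //, !=);
matches characters that are themselves regex metacharacters, such as [, ], (, ), etc.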

Thanks in advance for your replies.
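The two requirements above could be met by building the pattern from a plain list of operators. What follows is only a minimal sketch under stated assumptions: `escapeForRegex` and `buildOperatorPattern` are hypothetical helper names, the operator list is whatever the caller passes in, and the default ECMAScript grammar of `std::regex` is assumed. Each operator is backslash-escaped so that `[`, `(`, `+` and friends are taken literally, and longer operators are placed first because ECMAScript alternation tries alternatives from left to right.

```cpp
#include <algorithm>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: backslash-escape every character of `text` that is a
// metacharacter in the ECMAScript grammar used by std::regex by default.
std::string escapeForRegex(const std::string& text) {
    static const std::string meta = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : text) {
        if (meta.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical helper: join the operators into one alternation, longest
// first, so that "//" is tried before "/" and "-=" before "-".
std::string buildOperatorPattern(std::vector<std::string> ops) {
    std::stable_sort(ops.begin(), ops.end(),
                     [](const std::string& a, const std::string& b) {
                         return a.size() > b.size();
                     });
    std::string pattern;
    for (const auto& op : ops) {
        if (!pattern.empty()) pattern += '|';
        pattern += escapeForRegex(op);
    }
    return pattern;
}
```

Passing the result to the `std::regex` constructor, for example `std::regex re(buildOperatorPattern({"-", "-=", "/", "//", "!=", "[", "]", "(", ")"}));`, should give a pattern that matches `//` as one token rather than two `/` tokens.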

#include <iostream>
#include <list>
#include <regex>
#include <string>
using namespace std;

/// @brief Breaks the line down into a list of lexemes by
/// the delimiters, preserving the delimiters themselves.
/// @param line The line to be tokenised.
/// @return List of lexemes ready to be parsed.
list<string> breakDown(const string& line){
    list<string> lexemes;
    //const char expression[] = "[=-,;()[]]";
    regex delimiters("(=|(|)|[|])|(=|(|)|[|])+)"); //This one doesn't work.
    sregex_iterator it(line.begin(), line.end(), delimiters);
    sregex_iterator end;
    size_t last = 0; // offset just past the previous match
    while (it != end) {
        auto match = *it;
        cout << "Match : " << match.str() << "\n";
        // Text between the previous delimiter and this one.
        size_t pos = static_cast<size_t>(match.position());
        lexemes.push_back(line.substr(last, pos - last));
        lexemes.push_back(match.str());
        last = pos + match.length();
        ++it;
    }
    lexemes.push_back(line.substr(last)); // text after the last delimiter
    return lexemes;
}
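For comparison, here is one way the splitting step might look with a pattern that compiles. This is a sketch under assumptions, not a definitive tokenizer: `breakDownSketch` is a hypothetical name, the operator set is only an illustrative subset of Python's, and whitespace is left inside the lexemes. The pattern is written as a raw string literal so the backslashes do not have to be doubled; multi-character operators are listed before the single-character class, and `[`, `]`, `-` are escaped inside that class.

```cpp
#include <cstddef>
#include <list>
#include <regex>
#include <string>

// Sketch: split `line` on operators and separators, keeping them as lexemes.
std::list<std::string> breakDownSketch(const std::string& line) {
    // Assumed operator subset. Longer operators come first because
    // ECMAScript alternation tries alternatives from left to right.
    static const std::regex delimiters(
        R"re(//=|\*\*=|//|\*\*|==|!=|<=|>=|\+=|-=|\*=|/=|[+\-*/%=<>,;.:()\[\]{}])re");

    std::list<std::string> lexemes;
    std::size_t last = 0;  // offset just past the previous delimiter
    for (std::sregex_iterator it(line.begin(), line.end(), delimiters), end;
         it != end; ++it) {
        const auto pos = static_cast<std::size_t>(it->position());
        // Text between two delimiters; skipped when delimiters are adjacent.
        if (pos > last) lexemes.push_back(line.substr(last, pos - last));
        lexemes.push_back(it->str());
        last = pos + static_cast<std::size_t>(it->length());
    }
    // Whatever follows the last delimiter.
    if (last < line.size()) lexemes.push_back(line.substr(last));
    return lexemes;
}
```

The input string is never modified while the iterator is alive; the function only tracks an offset into it. Because `.` is treated as a separator here, as in the question, a literal such as `3.14` would be split into three lexemes, so numeric literals would need to be extracted beforehand, in the same way as strings and comments.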


Source: https://stackoverflow.com/questions/74279837/how-to-capture-c-operators-and-separators-in-regular-expression