php: Troubleshooting regular expressions that do not work in a MediaWiki extension

Posted by 4ktjp1zp on 2023-06-04 in PHP

The regular expression replacements in the onParserBeforePreprocess function of an extension I am building do not work, and I cannot figure out why.
Let me describe the problem in detail.
extension.json:

{
    "name": "EnhanceMarkup",
    "description": "Provides enhanced markup functionalities",
    "version": "1.0",
    "author": [
        "Jeong Gaon"
    ],
    "url": "https://www.gaon.xyz/mw_extensions",
    "type": "other",
    "license-name": "Apache-2.0",
    "AutoloadClasses": {
        "EnhanceMarkupHooks": "includes/EnhanceMarkupHooks.php"
    },
    "ResourceModules": {
        "ext.EnhanceMarkup.styles": {
            "styles": "resources/ext.EnhanceMarkup.styles.css",
            "localBasePath": "",
            "remoteExtPath": "EnhanceMarkup"
        },
        "ext.EnhanceMarkup.scripts": {
            "scripts": ["resources/ext.EnhanceMarkup.scripts.js", "resources/lib/math.js"],
            "localBasePath": "",
            "remoteExtPath": "EnhanceMarkup"
        }
    },
    "Hooks": {
        "InternalParseBeforeLinks": "EnhanceMarkupHooks::onInternalParseBeforeLinks",
        "ParserFirstCallInit": "EnhanceMarkupHooks::onParserFirstCallInit",
        "BeforePageDisplay": "EnhanceMarkupHooks::onBeforePageDisplay"
    },
    "manifest_version": 2
}

includes/EnhanceMarkupHooks.php:

<?php
class EnhanceMarkupHooks
{
    public static function onBeforePageDisplay(OutputPage &$out, Skin &$skin)
    {
        $out->addModules("ext.EnhanceMarkup.styles");
        $out->addModules("ext.EnhanceMarkup.scripts");
        return true;
    }

    public static function onParserFirstCallInit(Parser $parser)
    {
        // Register each of your custom parser functions with the parser
        $parser->setHook("random", [self::class, "randomRender"]);

        return true;
    }

    public static function onInternalParseBeforeLinks(Parser &$parser, &$text)
    {
        // Replace a line consisting of 3-9 '-' characters with a horizontal rule
        $text = preg_replace('/^([-]{3,9})$/m', "<hr>", $text);

        // [pagecount] show all count of page
        // Replace [pagecount] with the total number of pages
        $text = preg_replace_callback(
            "/\[pagecount\]/",
            function ($matches) use ($parser) {
                $dbr = wfGetDB(DB_REPLICA);
                $count = $dbr->selectRowCount("page");
                return $count;
            },
            $text
        );

        // Replace [*A text] with <ref group="A">text</ref>
        $text = preg_replace(
            "/\[\*\s+([^ ]+)\s+(.*?)\]/",
            '<ref group="$1">$2</ref>',
            $text
        );

        // Replace [*A] with <ref group="A" />
        $text = preg_replace(
            "/\[\*\s+([^ ]+)\s*\]/",
            '<ref group="$1" />',
            $text
        );

        // Replace [* text] with <ref>text</ref>
        $text = preg_replace("/\[\*\s+(.*?)\]/", '<ref>$1</ref>', $text);

        // Replace [include text] with {{text}}
        $text = preg_replace("/\[include\s+(.*?)\]/", '{{$1}}', $text);

        // Replace [br] with <br>
        $text = str_replace("[br]", "<br>", $text);

        // Font Size up {{{+1 (content) }}} - Range: 1~5
        $text = preg_replace_callback('/\{\{\{\+([1-5])\s*(.*?)\s*\}\}\}/s', function($matches) {
            return '<span style="font-size:'.(1 + $matches[1]).'em;">'.$matches[2].'</span>';
        }, $text);
        
        // Font Size down {{{-1 (content) }}} - Range: 1~5
        $text = preg_replace_callback('/\{\{\{-([1-5])\s*(.*?)\s*\}\}\}/s', function($matches) {
            return '<span style="font-size:'.(1 - $matches[1]/10).'em;">'.$matches[2].'</span>';
        }, $text);

        return true;
    }

    // Random
    // <random range="50">True|False</random>
    public static function randomRender(
        $input,
        array $args,
        Parser $parser,
        PPFrame $frame
    ) {
        // Disable caching
        $parser->getOutput()->updateCacheExpiry(0);

        // Parse the input
        $parts = explode("|", $input);

        // Get the range from args
        $range = isset($args["range"]) ? $args["range"] : 2; // default to 2

        // Generate a random number within the range
        $randomNumber = mt_rand(1, $range);

        // Choose the output based on the random number
        if ($randomNumber <= $range / 2) {
            // If the random number is in the first half of the range, return the first part
            return $parts[0];
        } else {
            // Otherwise, return the second part if it exists, or the first part if it doesn't
            return isset($parts[1]) ? $parts[1] : $parts[0];
        }
    }
}

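Looking at the code, nothing seems obviously wrong. If it worked, typing something like [* texts] in the wiki should create a footnote with the text texts, but for some reason it is output literally.
For example, if you type 'hello[br] world', you should see world on the line below hello, but nothing happens.
My MediaWiki site is at https://www.gaonwiki.com
Let me know if you need more information and I will provide it. Thank you.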

hyrbngr7

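**A)** To match a reference written as [*A Text], I would correct the pattern like this:

/\[\*(?<group>\w+)\s+(?<text>[^\]]+)\]/
The idea is to use named capture groups, (?<group_name>...pattern...), and to be more precise: \w+ matches word characters, \s+ then matches one or more whitespace characters, and [^\]]+ matches any run of characters that are not a closing bracket.
Replace with <ref group="$1">$2</ref>. PHP's preg_replace() only accepts numbered references such as $1 in the replacement string, so the named groups group and text are referred to by position.
Here are some tests: https://regex101.com/r/vueNcM/2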
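
As a minimal sketch of step A in PHP: the sample input and variable names below are made up for illustration and are not taken from the question. The replacement string uses numbered references because preg_replace() does not expand named ones.

```php
<?php
// Hypothetical sample wikitext, not taken from the question.
$text = 'Water is wet.[*A See the appendix] Fire is hot.';

// The named groups document the pattern; the replacement still uses
// $1 and $2 because preg_replace() only expands numbered references.
$result = preg_replace(
    '/\[\*(?<group>\w+)\s+(?<text>[^\]]+)\]/',
    '<ref group="$1">$2</ref>',
    $text
);

echo $result, PHP_EOL;
// Prints: Water is wet.<ref group="A">See the appendix</ref> Fire is hot.
```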

**B)** Step 2: to match only [*A], I would use /\[\*(?<group>\w+)\]/ and replace with <ref group="$1" />

Here are the tests: https://regex101.com/r/gYFOzO/2
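
Step B can be sketched the same way. The input is again a hypothetical sample; it contains one bare group marker and one marker with text, to show that this pattern leaves the second form untouched.

```php
<?php
// Hypothetical sample: a bare group marker and a marker that carries text.
$text = 'First claim.[*A] Second claim.[*B with text]';

// Only "[*" + word characters + "]" matches; a marker followed by
// whitespace and text is left for the step A pattern to handle.
$result = preg_replace(
    '/\[\*(?<group>\w+)\]/',
    '<ref group="$1" />',
    $text
);

echo $result, PHP_EOL;
// Prints: First claim.<ref group="A" /> Second claim.[*B with text]
```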

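**C)** Step 3: to replace [* text] with <ref>text</ref>, I would first use /\[\*\s+(?<text>[^\]]+)\]/ and replace with <ref>$1</ref>.

Tests are available here: https://regex101.com/r/aYTOH9/1
However, if you want to allow escaped brackets in the text (in case users need brackets inside it), use /\[\*\s+(?<text>(?:\\\]|[^\]])+)\]/
Tests: https://regex101.com/r/aYTOH9/2
In that case you have to call preg_replace_callback() instead of a plain preg_replace(), because the brackets have to be unescaped: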

$text = preg_replace_callback(
    '/\[\*\s+(?<text>(?:\\\\\]|[^\]])+)\]/',
    function ($matches) {
        return '<ref>' .
            preg_replace('/\\\\([\[\]])/', '$1', $matches['text']) .
            '</ref>';
    },
    $text
);

Test the PHP here: https://onlinephp.io/c/2b5249
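
To see the unescaping in action, the callback above can be wrapped in a small function and run on an input with escaped brackets. This is only a sketch: the function name convertPlainRefs and the sample sentence are hypothetical.

```php
<?php
// Hypothetical helper wrapping the step C callback shown above.
function convertPlainRefs(string $text): string
{
    $result = preg_replace_callback(
        '/\[\*\s+(?<text>(?:\\\\\]|[^\]])+)\]/',
        function (array $matches): string {
            // Turn "\[" and "\]" back into plain brackets.
            $unescaped = preg_replace('/\\\\([\[\]])/', '$1', $matches['text']);
            return '<ref>' . $unescaped . '</ref>';
        },
        $text
    );

    // preg_replace_callback() returns null on a regex error; keep the input then.
    return $result ?? $text;
}

echo convertPlainRefs('Fact.[* An \[important\] reference]'), PHP_EOL;
// Prints: Fact.<ref>An [important] reference</ref>
```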

Security concerns when creating filters

What happens if a user enters this?

Shit happens with [* <script>alert('I got you')</script>]

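Is there another filter in place that prevents XSS attacks?
If the input is not safely escaped, replace all preg_replace() calls with preg_replace_callback(), as in example C) above, and sanitize the captured values: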

// Replace [* Some text] by <ref>Some text</ref>
// Also handle escaped brackets in text, such as [* An \[important\] reference]
$text = preg_replace_callback(
    // The regex itself is /\[\*\s+(?<text>(?:\\\]|[^\]])+)\]/, which is how it
    // would be written in a JavaScript literal. In a single-quoted PHP string
    // a literal backslash is written as \\, so \\\] becomes \\\\\] below,
    // which makes the pattern less readable :-(
    '/\[\*\s+(?<text>(?:\\\\\]|[^\]])+)\]/',
    function ($matches) {
        // 1) Unescape "\[" or "\]" by "[" and respectively "]".
        // 2) As we are creating HTML, the text should be sanitized as it may
        // contain some stuff like <strong>Bold</strong> or worse some JavaScript
        // <script>alert('XSS attack')</script>.
        return '<ref>' .
            htmlspecialchars(
                preg_replace('/\\\\([\[\]])/', '$1', $matches['text'])
            ) .
            '</ref>';
    },
    $text
);

The PHP code is here: https://onlinephp.io/c/8a7f8
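
As a quick check of the sanitizing version, the sketch below runs it on a script-injection input like the one above. The function name convertPlainRefsSafely is hypothetical, and the explicit ENT_QUOTES and 'UTF-8' arguments are an assumption added here so that the output does not depend on the PHP version's defaults.

```php
<?php
// Hypothetical wrapper around the sanitizing callback shown above.
function convertPlainRefsSafely(string $text): string
{
    $result = preg_replace_callback(
        '/\[\*\s+(?<text>(?:\\\\\]|[^\]])+)\]/',
        function (array $matches): string {
            // Unescape "\[" and "\]" first, then escape HTML special characters.
            $unescaped = preg_replace('/\\\\([\[\]])/', '$1', $matches['text']);
            // Explicit flags keep the output identical across PHP versions.
            return '<ref>' . htmlspecialchars($unescaped, ENT_QUOTES, 'UTF-8') . '</ref>';
        },
        $text
    );

    // preg_replace_callback() returns null on a regex error; keep the input then.
    return $result ?? $text;
}

$out = convertPlainRefsSafely("Oops [* <script>alert('I got you')</script>]");
echo $out, PHP_EOL;
// Prints: Oops <ref>&lt;script&gt;alert(&#039;I got you&#039;)&lt;/script&gt;</ref>
```

The script tag ends up as inert text inside the footnote instead of being emitted as markup.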
