PHP regex: split a string into an array on words wrapped in tildes, keeping those words

Posted by q5iwbnjs on 2023-03-04 in PHP

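It's late and I think I've been staring at this too long to figure it out, but: I have a bunch of raw text where anything between tildes (~) is a title and everything else is plain text. For example:
Title and text on the same line:

~THE BURGER MINI~A tiny little burger patty in a tiny little bun.

Title and text on separate lines: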

~THE BURGER MAX~
A gigantic hunk of steak in between two toasted baguettes, each stuffed with beef & cheese

A combination of both:

~THE BURGER ZERO~
No burger, no bun, just air.

~THE BURGER ITALIANO~
A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.~NOTE~This is basically giant ravioli.

Ultimately, the result I'm after is:

Array
(
    [0] => Array
        (
            [title] => THE BURGER ZERO
        )

    [1] => Array
        (
            [text] => No burger, no bun, just air.
        )

    [2] => Array
        (
            [title] => THE BURGER ITALIANO
        )

    [3] => Array
        (
            [text] => A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.
        )

    [4] => Array
        (
            [title] => NOTE
        )

    [5] => Array
        (
            [text] => This is basically giant ravioli.
        )

)

...so that I can tell titles and text apart, but, crucially, *in the order they appear*.
I can split the string on line breaks into an array like this:

// Split on line breaks (\R), trimming whitespace around each break and dropping empty lines
$tempArray = preg_split('/\s*\R\s*/', trim($str), -1, PREG_SPLIT_NO_EMPTY);

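But after that I'm stuck. Using preg_split on any group in tildes (preg_split('/~(.*?)~/uim', $line);) gives me all the paragraph text but loses the titles, because they are consumed as the delimiters. I've been banging my head against various forms of preg_match and preg_match_all, but all I get is a headache.
Is there a straightforward way to get what I want that works for all of the examples above?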
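For reference, preg_split() can keep the delimiters it splits on: with the PREG_SPLIT_DELIM_CAPTURE flag, whatever a capture group in the pattern matched is returned as its own element. A minimal sketch, assuming tildes always come in balanced pairs and never appear inside a title; the function name splitTitles() is hypothetical:

```php
<?php
// Hypothetical helper: splitTitles().
// Assumes "~" only ever appears in balanced pairs around titles.
function splitTitles(string $str): array
{
    // The capture group makes preg_split() return each title as its own piece,
    // so pieces alternate: text, title, text, title, ...
    $parts = preg_split('/~([^~]*)~/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

    $result = [];
    foreach ($parts as $i => $part) {
        $part = trim($part);
        if ($part === '') {
            continue; // skip the empty piece before a leading title, blank gaps, etc.
        }
        // With exactly one capture group, captured titles sit at the odd indices.
        $result[] = ($i % 2 === 1) ? ['title' => $part] : ['text' => $part];
    }
    return $result;
}

$input = "~THE BURGER ZERO~\nNo burger, no bun, just air.\n\n"
    . "~THE BURGER ITALIANO~\n"
    . "A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta."
    . "~NOTE~This is basically giant ravioli.";

print_r(splitTitles($input));
```

PREG_SPLIT_NO_EMPTY is deliberately left out here: dropping empty pieces inside preg_split() would shift the indices and break the odd/even rule, so empty and whitespace-only pieces are filtered afterwards instead.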


Answer #1 (cetgtptt)

preg_match_all('/~([^~]+)~\n*([^~\n]+)/', $str, $match);

So: match a tilde, then one or more characters that are not a tilde, then another tilde, capturing what's between the tildes:

~([^~]+)~

followed by zero or more newlines:

\n*

followed by one or more characters that are neither a tilde nor a newline, capturing them:

([^~\n]+)

This gives you the titles in $match[1] and the descriptions in $match[2]:

print_r($match[1]);

Array
(
    [0] => THE BURGER ZERO
    [1] => THE BURGER ITALIANO
    [2] => NOTE
)

print_r($match[2]);

Array
(
    [0] => No burger, no bun, just air.
    [1] => A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.
    [2] => This is basically giant ravioli.
)

You can then combine them into a single array:

$items = array_combine($match[1], $match[2]);
print_r($items);

Array
(
    [THE BURGER ZERO] => No burger, no bun, just air.
    [THE BURGER ITALIANO] => A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.
    [NOTE] => This is basically giant ravioli.
)
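One caveat with array_combine(): the titles become array keys, so a repeated title (two ~NOTE~ sections, say) overwrites the earlier entry, and the result is not the ordered list of title/text entries shown in the question. A minimal sketch that keeps this answer's pattern but builds that list with PREG_SET_ORDER, assuming each title's text fits on a single line (as the original pattern already does). The helper name titleTextPairs() and the \R / \r tweaks for Windows line endings are assumptions added here, not part of the answer:

```php
<?php
// Hypothetical helper: titleTextPairs(). Same idea as the pattern above, except:
//  - \R* accepts "\n", "\r\n" or "\r" between a title and its text;
//  - [^~\r\n]+ keeps a stray "\r" out of the captured text;
//  - PREG_SET_ORDER groups each match as [full match, title, text].
function titleTextPairs(string $str): array
{
    preg_match_all('/~([^~]+)~\R*([^~\r\n]+)/', $str, $matches, PREG_SET_ORDER);

    $result = [];
    foreach ($matches as $m) {
        // Appending title then text keeps the order in which they appear
        // and, unlike array_combine(), does not collapse duplicate titles.
        $result[] = ['title' => trim($m[1])];
        $result[] = ['text' => trim($m[2])];
    }
    return $result;
}

print_r(titleTextPairs("~THE BURGER MAX~\r\nA gigantic hunk of steak in between two toasted baguettes, each stuffed with beef & cheese"));
```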

Answer #2 (wqsoz72f)

<?php
$input = '~THE BURGER ZERO~
No burger, no bun, just air.

~THE BURGER ITALIANO~
A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.~NOTE~This is basically giant ravioli.';

$splittedText = array_values(array_filter(explode ("~", $input)));

foreach($splittedText as $key => $value){
    if (ctype_upper(str_replace(' ', '', $value))){
        $splittedText[$key] = ['title' => $value];
    }
    else{
        $splittedText[$key] = ['text' => $value];
    }
}

print_r($splittedText);

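This solution doesn't use any regular expressions.
It works by:

  • first exploding the whole string on the tilde
  • then removing empty entries from the array, re-indexing the keys, and iterating over it
  • checking whether the current value is entirely uppercase once spaces are removed; if it is, we use the key "title", otherwise "text", as in the expected output.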

The output is:

Array
(
    [0] => Array
        (
            [title] => THE BURGER ZERO
        )

    [1] => Array
        (
            [text] => 
No burger, no bun, just air.

        )

    [2] => Array
        (
            [title] => THE BURGER ITALIANO
        )

    [3] => Array
        (
            [text] => 
A soft mix of ground beef & mozzarella stuffed between two pillowy pieces of pasta.
        )

    [4] => Array
        (
            [title] => NOTE
        )

    [5] => Array
        (
            [text] => This is basically giant ravioli.
        )

)
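The ctype_upper() check is a heuristic: a title containing digits or punctuation (say ~THE BURGER 2.0~) is classified as text, and all-caps text right after a title, such as the SOLD OUT in ~NOTE~SOLD OUT, is classified as a title. Because explode() keeps empty pieces, everything that sat between a pair of tildes always lands at an odd index, as long as those empty pieces are not filtered out before classifying. A minimal sketch that decides by position instead, assuming tildes are balanced; explodeTitles() is a hypothetical name:

```php
<?php
// Hypothetical helper: explodeTitles(). No regex, like the answer above,
// but each piece is classified by its position rather than by ctype_upper().
// Assumes "~" only ever appears in balanced pairs around titles.
function explodeTitles(string $input): array
{
    $result = [];
    foreach (explode('~', $input) as $i => $piece) {
        // trim() removes the leading/trailing newlines visible in the output above.
        $piece = trim($piece);
        if ($piece === '') {
            continue;
        }
        // explode() keeps empty pieces, so whatever sat between a pair of
        // tildes always ends up at an odd index.
        $result[] = ($i % 2 === 1) ? ['title' => $piece] : ['text' => $piece];
    }
    return $result;
}

// The ctype_upper() check would call "THE BURGER 2.0" text and "SOLD OUT" a title.
print_r(explodeTitles("~THE BURGER 2.0~\nDouble patty.~NOTE~SOLD OUT"));
```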
