regex - How to preg_match an entire <a> tag with href URL, data attribute and text groups

gmxoilav posted on 2023-05-01 in Other

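In a given piece of HTML content, I need to preg_match_all the <a> tags and capture the href URL, the link text and the data-name attribute. Below is what I have so far, which is where I'm stuck. Can anyone help?
HTML content: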

<a data-name="something" href="google.ru">test</a>
<a href="http://link.com">text2</a>
<a class="external" href="https://example.com">text 4</a>
<a href='sterium.com'>text 66</a><a href="sterium.com" data-name="">aaa</a>

Expected output:

$match[0]= '<a data-name="something" href="google.ru">test</a>';
$match[0][0] = 'google.ru';
$match[0][1] = 'test';
$match[0][2] = 'something';
$match[1]= '<a href="http://link.com">text2</a>';
$match[1][0] = 'http://link.com';
$match[1][1] = 'text2';
$match[2]= '<a class="external" href="https://example.com">text 4</a>';
$match[2][0] = 'https://example.com';
$match[2][1] = 'text 4';
$match[3]= '<a href=\'sterium.com\'>text 66</a>';
$match[3][0] = 'sterium.com';
$match[3][1] = 'text 66';
$match[4]= '<a href="sterium.com" data-name="">aaa</a>';
$match[4][0] = 'sterium.com';
$match[4][1] = 'aaa';
$match[4][2] = '';

My current attempt (regex101):

$re = '#<a.*(?:href=["|\'](.*?[^"\'])["|\'])>(.*)</a>#';
$str = '<a data-name="something" href="google.ru">test</a>
<a href="http://link.com">text2</a>
<a class="external" href="https://example.com">text 4</a>
<a href=\'sterium.com\'>text 66</a><a href="sterium.com" data-name="">aaa</a>';

preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);

// Print the entire match result
var_dump($matches);

Answer:

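Don't use regular expressions to parse HTML. Use the built-in DOMDocument class instead, which is far more robust. Once the string is loaded into a DOMDocument, you can find all the a tags and extract their text (nodeValue) along with their href and data-name attributes: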

$doc = new DOMDocument();
$doc->loadHTML($str);
// Collect every <a> element in the parsed document
$anchors = $doc->getElementsByTagName('a');
$matches = [];
foreach ($anchors as $a) {
    // getAttribute() returns '' when the attribute is missing, so anchors
    // without href or data-name still produce a value instead of an error
    $matches[] = [$a->nodeValue, $a->getAttribute('href'), $a->getAttribute('data-name')];
}

Output (for the sample data):

Array
(
    [0] => Array
        (
            [0] => test
            [1] => google.ru
            [2] => something
        )
    [1] => Array
        (
            [0] => text2
            [1] => http://link.com
            [2] => 
        )
    [2] => Array
        (
            [0] => text 4
            [1] => https://example.com
            [2] => 
        )
    [3] => Array
        (
            [0] => text 66
            [1] => sterium.com
            [2] => 
        )
    [4] => Array
        (
            [0] => aaa
            [1] => sterium.com
            [2] => 
        )
)

Demo on 3v4l.org
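The DOMDocument approach above returns the text, href and data-name, but not the full tag string that the question's expected output also asks for. Below is a minimal sketch, assuming PHP 7+ with the standard libxml-backed DOM extension, that additionally returns each tag via DOMDocument::saveHTML($node) and, like the expected output, only includes data-name when the attribute is actually present. The function name extractAnchors and the array keys are hypothetical choices for illustration. Note that the tag string is re-serialized by libxml rather than copied verbatim from the input, so it can differ cosmetically (for example, single-quoted attribute values come back double-quoted).

```php
<?php
// Hypothetical helper: returns each <a> element as an associative array with
// the re-serialized tag, the href, the link text and (only if present) data-name.
function extractAnchors(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $result = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        $item = [
            // saveHTML($node) serializes only this element, not the whole document.
            'tag'  => $doc->saveHTML($a),
            'href' => $a->getAttribute('href'),
            'text' => $a->textContent,
        ];
        // Mirror the expected output: add data-name only when the attribute
        // exists, even if its value is empty (data-name="").
        if ($a->hasAttribute('data-name')) {
            $item['data-name'] = $a->getAttribute('data-name');
        }
        $result[] = $item;
    }
    return $result;
}

$str = '<a data-name="something" href="google.ru">test</a>
<a href="http://link.com">text2</a>
<a class="external" href="https://example.com">text 4</a>
<a href=\'sterium.com\'>text 66</a><a href="sterium.com" data-name="">aaa</a>';

print_r(extractAnchors($str));
```

Using hasAttribute() instead of getAttribute() for data-name is what lets the result distinguish a missing attribute (no key at all) from an empty one (data-name="" gives an empty string), which is the difference between $match[1] and $match[4] in the expected output.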
