Using PHP regex to remove attributes from HTML tag elements

Posted by hts6caw3 on 2023-01-10 in PHP


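I want to remove every attribute inside HTML tags. I think this can be done with a regex, but I'm not good at regex.
I tried str_replace, but that isn't the right approach. I searched for similar questions but couldn't find any.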

Example:

I have HTML markup like this in a variable:

$str = '
<p class="class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</p>
<span class="another_class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</span>
<ul class="another_class_style" style="background:#006;"></ul>
<li class="another_class_style" style=" list-style:circle; color:#930;">content</li>';

Calling some preg_match():

$new_str = preg_match('', $str)

Expected output:

$new_str = '
<p>content</p>
<span>content</span>
<ul></ul>
<li>content</li>';

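Note that I do not intend to strip the HTML tags themselves; I only need to remove the attributes from inside the tags.

PHP's strip_tags() isn't an option.

I would appreciate your help.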


Source: https://stackoverflow.com/questions/18897072/using-php-regex-to-remove-attributes-from-html-tag-elements

3 Answers

zvms9eto1#

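Although a regex can do this job, it is generally recommended to use DOM functions for filtering or other HTML manipulation. Below is a reusable class that uses DOM methods to remove unwanted attributes. You just set the HTML tags and attributes you want to allow, and it filters out the unwanted parts of the HTML.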

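class allow_some_html_tags {
    private $doc = null;                    // DOMDocument holding the parsed HTML
    private $xpath = null;                  // DOMXPath used to select every element
    private $allowed_tags = "";             // "<p><span>..." string in the format strip_tags() expects
    private $allowed_properties = array();  // attribute name => 1, used as a lookup set

    // Strip disallowed tags, then parse what is left. Call setAllowed() first.
    function loadHTML( $html ) {
        $this->doc = new DOMDocument();
        $html = strip_tags( $html, $this->allowed_tags );
        @$this->doc->loadHTML( $html ); // @ suppresses warnings about malformed markup
        $this->xpath = new DOMXPath( $this->doc );
    }
    // Register the tag names and attribute names that should survive filtering.
    function setAllowed( $tags = array(), $properties = array() ) {
        foreach( $tags as $allow ) $this->allowed_tags .= "<{$allow}>";
        foreach( $properties as $allow ) $this->allowed_properties[$allow] = 1;
    }
    // Collect the attribute names first so they can be removed safely afterwards.
    function getAttributes( $tag ) {
        $r = array();
        for( $i = 0; $i < $tag->attributes->length; $i++ )
            $r[] = $tag->attributes->item($i)->name;
        return $r;
    }
    // Remove every attribute that is not allowed and return the cleaned HTML.
    function getCleanHTML() {
        $tags = $this->xpath->query("//*");
        foreach( $tags as $tag ) {
            $a = $this->getAttributes( $tag );
            foreach( $a as $attribute ) {
                if( !isset( $this->allowed_properties[$attribute] ) )
                    $tag->removeAttribute( $attribute );
            }
        }
        // The second strip_tags() call drops the wrapper tags added by DOMDocument.
        return strip_tags( $this->doc->saveHTML(), $this->allowed_tags );
    }
}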

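This class uses strip_tags twice: once to quickly remove unwanted tags, and again, after the attributes have been removed from the remaining tags, to remove the extra tags inserted by the DOM functions (doctype, html, body).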

$comments = new allow_some_html_tags();
$comments->setAllowed( array( "p", "span", "ul", "li" ), array("tabindex") );
$comments->loadHTML( $str );
$clean = $comments->getCleanHTML();

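The setAllowed function takes two arrays: a set of allowed tags and a set of allowed attributes (in case you decide to keep some later). I modified your input string to include an added tabindex attribute on the <ul> to illustrate the filtering. The output of $clean is: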

<p>content</p>
<span>content</span>
<ul tabindex="3"></ul><li>content</li>
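When no attribute needs to be kept, the same DOM idea can be sketched in a few lines. This is a minimal sketch, not the class above: the helper name strip_all_attributes is hypothetical, and it assumes the input is an ASCII HTML fragment (DOMDocument::loadHTML may need an encoding hint for other input). It reads the cleaned nodes back out of the implied body element instead of calling strip_tags a second time.

```php
<?php
// Hypothetical helper (not part of the answer above): removes every
// attribute from every element of an HTML fragment using DOM functions.
function strip_all_attributes(string $html): string
{
    if ($html === '') {
        return ''; // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of using the @ operator.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html); // the parser wraps the fragment in <html><body>
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Select every attribute node in the document and detach it from its element.
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//@*') as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }

    // Serialize only the children of <body>, which skips the doctype,
    // <html> and <body> wrappers added by the parser.
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }
    $out = '';
    foreach ($body->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Usage with a shortened version of the markup from the question.
$str = '
<p class="class_style" style="font-size: medium;">content</p>
<span class="another_class_style" style="line-height: normal;">content</span>
<ul class="another_class_style" style="background:#006;"></ul>
<li class="another_class_style" style=" list-style:circle; color:#930;">content</li>';

$clean = strip_all_attributes($str);
echo $clean;
```

The whitespace between elements in the result depends on the parser, so compare the output on tag content rather than on exact line breaks.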

Posted 2023-01-10

6jjcrrmo2#

$str = '
<p class="class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</p>
<span class="another_class_style" style="font-size: medium; line-height: normal; letter-spacing: normal;">content</span>
<ul class="another_class_style" style="background:#006;"></ul>
<li class="another_class_style" style=" list-style:circle; color:#930;">content</li>';

$clean = preg_replace('/ .*".*"/', '', $str);

echo $clean;

This will return:

<p>content</p>
<span>content</span>
<ul></ul>
<li>content</li>

But please do not use regular expressions to parse HTML; use a DOM parser.
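The pattern above relies on the sample input having no spaces or double quotes in the text content. The sketch below shows where it breaks and one possible alternative that only matches inside an opening tag. The alternative pattern is an assumption of this note, not part of the answer, and it still fails on attribute values that contain ">" and on tag names with hyphens, which is one more reason to prefer a DOM parser.

```php
<?php
// Sample input: the second paragraph has double quotes in its text content.
$html = '<p class="a" style="color:red;">content</p>' . "\n"
      . '<p class="b">say "hi"</p>';

// Pattern from the answer: the greedy .* runs to the last double quote on
// each line, so on line 2 it also swallows the end of the opening tag.
$greedy = preg_replace('/ .*".*"/', '', $html);
// $greedy is now:
// <p>content</p>
// <p</p>

// Alternative sketch: capture the tag name, skip everything up to the
// closing ">", and keep an optional "/" for self-closing tags.
$scoped = preg_replace('/<([a-z][a-z0-9]*)\b[^>]*?(\/?)>/i', '<$1$2>', $html);
// $scoped is now:
// <p>content</p>
// <p>say "hi"</p>

echo $greedy, "\n---\n", $scoped, "\n";
```

Closing tags such as </p> are left alone by the second pattern because the character after "<" must be a letter.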

Posted 2023-01-10

jei2mxaa3#

The easiest way to remove HTML tags in PHP is strip_tags(),
or you can do it with:

$clean = preg_replace("/<.*?>/", "", $str);
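To make the effect of these two calls concrete, here is a small sketch on one line of the question's markup (the variable names are illustrative). Both calls remove the tags themselves and leave only the text content, so the elements are not preserved.

```php
<?php
// One element taken from the question's sample markup.
$html = '<p class="class_style" style="font-size: medium;">content</p>';

// strip_tags() with no allow-list removes every tag and keeps the text.
$viaStripTags = strip_tags($html);

// The lazy pattern matches each "<...>" run and deletes it.
$viaRegex = preg_replace("/<.*?>/", "", $html);

echo $viaStripTags, "\n"; // prints "content"
echo $viaRegex, "\n";     // prints "content"
```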

Posted 2023-01-10
