PHP DOMDocument::loadHTML(): warning - htmlParseEntityRef: no name in Entity

Posted by prdp8dxp on 2023-03-07 in PHP


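I have found several similar questions, but so far none of them has helped me.
I am trying to output the "src" of every image in a block of HTML, so I am using DOMDocument(). The approach does work, but on some pages I get a warning and I cannot work out why. Some posts suggest suppressing the warning, but I would rather understand why it is generated.
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: no name in Entity, line: 10
One example of a post->post_content that generates the error is -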

On Wednesday 21st November specialist rights of way solicitor Jonathan Cheal of Dyne Drewett will be speaking at the Annual Briefing for Rural Practice Surveyors and Agricultural Valuers in Petersfield.
<br>
Jonathan is one of many speakers during the day and he is specifically addressing issues of public rights of way and village greens.
<br>
Other speakers include:-
<br>
<ul>
<li>James Atrrill, Chairman of the Agricultural Valuers Associates of Hants, Wilts and Dorset;</li>
<li>Martin Lowry, Chairman of the RICS Countryside Policies Panel;</li>
<li>Angus Burnett, Director at Martin & Company;</li>
<li>Esther Smith, Partner at Thomas Eggar;</li>
<li>Jeremy Barrell, Barrell Tree Consultancy;</li>
<li>Robin Satow, Chairman of the RICS Surrey Local Association;</li>
<li>James Cooper, Stnsted Oark Foundation;</li>
<li>Fenella Collins, Head of Planning at the CLA; and</li>
<li>Tom Bodley, Partner at Batcheller Monkhouse</li>
</ul>

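If it helps, I can post more examples of what post->post_content contains.
I have temporarily allowed access to a development site so you can see some examples [Note - the link is no longer accessible because the question has been answered] -

Any suggestions on how to resolve this? Thanks.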

$dom = new DOMDocument();
$dom->loadHTML(apply_filters('the_content', $post->post_content)); // Have tried stripping all tags but <img>, still generates warning
$nodes = $dom->getElementsByTagName('img');
foreach($nodes as $img) :
    $images[] = $img->getAttribute('src');
endforeach;


Source: https://stackoverflow.com/questions/14648442/domdocumentloadhtml-warning-htmlparseentityref-no-name-in-entity

9 Answers


oogrdqng1#

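This correct answer comes from a comment by @lonesomeday.
My best guess is that there is an unescaped ampersand (&) somewhere in the HTML. This makes the parser think it is inside an entity reference (e.g. &copy;). When it reaches a ";", it thinks the entity has ended. It then realises that what it has does not conform to an entity, so it emits a warning and returns the content as plain text.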
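To see the effect described above, here is a minimal sketch that parses one list item from the question twice, once with the bare "&" and once with it written as "&amp;". The helper name `parse_messages` is made up for this illustration. Whether the first call reports "htmlParseEntityRef: no name" depends on the libxml2 version PHP was built against, so no particular message is assumed here.

```php
<?php
// Collect libxml parse errors in memory instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: returns the parser messages produced for a fragment.
function parse_messages(string $html): array
{
    libxml_clear_errors();
    $dom = new DOMDocument();
    $dom->loadHTML($html);

    $messages = [];
    foreach (libxml_get_errors() as $error) {
        $messages[] = trim($error->message);
    }
    libxml_clear_errors();

    return $messages;
}

// Same markup as in the question: once with a bare "&", once escaped.
$bare    = parse_messages('<ul><li>Angus Burnett, Director at Martin & Company;</li></ul>');
$escaped = parse_messages('<ul><li>Angus Burnett, Director at Martin &amp; Company;</li></ul>');

print_r($bare);     // any entity-related message shows up here
print_r($escaped);  // the escaped version should parse cleanly
```

If the bare version lists an entity error and the escaped one does not, the ampersand is the culprit.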


0x6upsns2#

As mentioned here:
Warning: DOMDocument::loadHTML(): htmlParseEntityRef: expecting ';' in Entity,
you can use:

libxml_use_internal_errors(true);

See http://php.net/manual/en/function.libxml-use-internal-errors.php
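A slightly fuller sketch of this approach, for anyone who wants to inspect the errors rather than discard them. The function name `image_sources` and the sample markup are assumptions for illustration; `libxml_get_errors()` and `libxml_clear_errors()` are the standard companions to `libxml_use_internal_errors()`.

```php
<?php
// Hypothetical helper: extract <img> src values while capturing parser errors.
function image_sources(string $html, array &$errors = []): array
{
    // Switch to internal error handling and remember the previous setting.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $dom->loadHTML($html);

    // Keep the collected LibXMLError objects for the caller, then reset state.
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($dom->getElementsByTagName('img') as $img) {
        $sources[] = $img->getAttribute('src');
    }

    return $sources;
}

// Sample input (made up): a bare "&" plus two images.
$html    = '<p>Martin & Company <img src="/a.png"> <img src="/b.png"></p>';
$sources = image_sources($html, $errors);

print_r($sources);
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
```

This only hides the warning; the unescaped "&" is still in the content, so it is worth logging the collected errors at least once to find out where they come from.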


ggazkfy83#

I don't have the reputation required to comment above, but using htmlspecialchars solved my problem:

// Note: htmlspecialchars() also escapes < and >, so any tags in the content become literal text.
$inputHTML = htmlspecialchars($post->post_content);
$dom = new DOMDocument();
$dom->loadHTML(apply_filters('the_content', $inputHTML)); // Have tried stripping all tags but <img>, still generates warning
$nodes = $dom->getElementsByTagName('img');
foreach($nodes as $img) :
    $images[] = $img->getAttribute('src');
endforeach;

For my purposes I also use strip_tags($inputHTML, "<strong><em><br>"), so all image tags are stripped as well. I'm not sure whether this would be a problem otherwise.


cbjzeqam4#

Check for a "&" character anywhere in your HTML code. That was the cause when I ran into this problem.


dsf9zpds5#

There is an unescaped "&" somewhere in the HTML; replace each "&" with "&amp;". Here is my solution:

$html = preg_replace('/&(?!amp)/', '&amp;', $html);

It replaces a single "&" with "&amp;", while any existing "&amp;" is left unchanged.
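One caveat: the pattern above only skips "&amp", so other entities already present in the markup, such as "&copy;", would be rewritten to "&amp;copy;". A possible variant, sketched here with a hypothetical helper name, leaves anything that already looks like a named or numeric entity untouched and escapes only the remaining ampersands.

```php
<?php
// Hypothetical helper, not part of the original answer.
function escape_bare_ampersands(string $html): string
{
    // Match "&" unless it starts something shaped like an entity:
    //   &name;   &#123;   &#x1F;
    $pattern = '/&(?!(?:[a-zA-Z][a-zA-Z0-9]*|#[0-9]+|#[xX][0-9a-fA-F]+);)/';

    return preg_replace($pattern, '&amp;', $html);
}

echo escape_bare_ampersands('Martin & Company'), "\n";
// prints: Martin &amp; Company

echo escape_bare_ampersands('&copy; 2012 &amp; &#169;'), "\n";
// prints: &copy; 2012 &amp; &#169;
```

The check is purely syntactic: it does not verify that a name such as "&foo;" is a real HTML entity, so text like "AT&T;" would be left alone as well.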


yws3nbqq6#

I finally solved this the right way, using tidy:

// Configuration
$config = array(
    'indent'         => true,
    'output-xhtml'   => true,
    'wrap'           => 200);

// Tidy to avoid errors during load html
$tidy = new tidy;
$tidy->parseString($bill->bill_text, $config, 'utf8');
$tidy->cleanRepair();

$domDocument = new DOMDocument();
$domDocument->loadHTML(mb_convert_encoding($tidy, 'HTML-ENTITIES', 'UTF-8'));


bxgwgixi7#

For Laravel,
use {{ }} instead of {!! !!}.
I faced this problem and managed to solve it this way.


mjqavswn8#

I found an error in my table markup. There was an extra </td>, which I removed, and bingo.


kxkpmulp9#

Replace "&" with "and" in your string. Do the same for all other symbols.
