How do I access embedded nodes with XPath and SimpleXML in PHP?

Posted by gr8qqesn on 2023-05-21 in PHP

I am trying to access all of the text inside the Text node of the following XML document:

<Section>
 <Subsection lims:inforce-start-date="2003-07-01" lims:fid="182941" lims:id="182941">
  <Label>(2)</Label>
  <Text>
   In subsection (1),
    <DefinedTermEn>beer</DefinedTermEn>
   and
    <DefinedTermEn>malt liquor</DefinedTermEn>
   have the meaning assigned by section 4.
   </Text>
 </Subsection>
</Section>

With XPath, $xml->xpath("Body/Section/Subsection") returns the following:

object(SimpleXMLElement)#7 (3) {
    
    ["Label"]=>
    string(3) "(2)"
    ["Text"]=>
    string(64) "In subsection (1),  and  have the meaning assigned by section 4."

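The inner nodes disappear. Is there a way to "flatten" the content of all child nodes within a node, so that I get one continuous piece of text? For example: In subsection (1), beer and malt liquor have the meaning assigned by section 4.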

ni65a41a  #1

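Mixed child nodes are too complex for SimpleXML, so use DOM. The DOMNode::$textContent property returns the text content of any node. For element nodes, this includes the text content of all descendant nodes. DOMXpath::evaluate() also supports XPath expressions that return scalar values. If an expression converts a node list to a string, for example with string(), the result is the text content of the first node in the list.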

// bootstrap DOM
$document = new DOMDocument();
$document->loadXML(getXML());
$xpath = new DOMXpath($document);

// iterate the subsection element nodes
foreach ($xpath->evaluate('//Subsection') as $subsection) {
    var_dump(
        [
            // text content of the "Label" child element
            'label' => $xpath->evaluate('string(Label)', $subsection),
            // text content of the "Text" child element
            'text' => $xpath->evaluate('string(Text)', $subsection),
        ]
    );
}

function getXML() {
  return <<<'XML'
<Section xmlns:lims="urn:lims">
 <Subsection lims:inforce-start-date="2003-07-01" lims:fid="182941" lims:id="182941">
  <Label>(2)</Label>
  <Text>
   In subsection (1),
    <DefinedTermEn>beer</DefinedTermEn>
   and
    <DefinedTermEn>malt liquor</DefinedTermEn>
   have the meaning assigned by section 4.
   </Text>
 </Subsection>
</Section>
XML;
}

Output:

array(2) {
  ["label"]=>
  string(3) "(2)"
  ["text"]=>
  string(101) "
   In subsection (1),
    beer
   and
    malt liquor
   have the meaning assigned by section 4.
   "
}
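
If a single-line result is wanted straight from XPath, normalize-space() can take the place of string() in the expressions above. The following is a minimal sketch under the same document structure. The shortened attribute list and the variable names ($xml, $texts) are illustrative assumptions, not part of the original answer.

```php
<?php
// Sketch: let XPath collapse the whitespace via normalize-space().
// The XML is a trimmed copy of the sample; "urn:lims" is a placeholder namespace URI.
$xml = <<<'XML'
<Section xmlns:lims="urn:lims">
 <Subsection lims:id="182941">
  <Label>(2)</Label>
  <Text>
   In subsection (1),
    <DefinedTermEn>beer</DefinedTermEn>
   and
    <DefinedTermEn>malt liquor</DefinedTermEn>
   have the meaning assigned by section 4.
  </Text>
 </Subsection>
</Section>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$texts = [];
foreach ($xpath->evaluate('//Subsection') as $subsection) {
    // normalize-space() trims both ends of the string value and
    // collapses every run of whitespace into a single space
    $texts[] = $xpath->evaluate('normalize-space(Text)', $subsection);
}

echo implode(PHP_EOL, $texts), PHP_EOL;
```

Because normalize-space() only collapses whitespace that already exists in the source, it does not insert a space between adjacent inline elements that have none between them.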

n6lpvg4x  #2

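The answer posted by @ThW explains why DOM is better suited to this, but that approach can leave you with whitespace problems. You may want to write a function that recurses through the node tree inside the Text element and builds a string, trimming the whitespace of each text node so that only a single line remains.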

<?php

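// The lims: prefix must be declared, otherwise loadXML() emits namespace warnings.
// "urn:lims" is a placeholder URI, as in the first answer.
$input = <<<END
<Body>
<Section xmlns:lims="urn:lims">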
 <Subsection lims:inforce-start-date="2003-07-01" lims:fid="182941" lims:id="182941">
  <Label>(2)</Label>
  <Text>
   In subsection (1),
    <DefinedTermEn>beer</DefinedTermEn>
   and
    <DefinedTermEn>malt liquor</DefinedTermEn>
   have the meaning assigned by section 4.
   </Text>
 </Subsection>
</Section>
</Body>
END;

// Create DOMDocument instance and load XML
$dom = new DOMDocument();
$dom->loadXML($input);

// Instantiate XPath with our document
$xpath = new DOMXPath($dom);

// Get the Text elements
$textElements = $xpath->query("/Body/Section/Subsection/Text");

/**
 * Recursive function to build a string from the text content within
 * a DOMNode and its children. Whitespace is trimmed.
 *
 * @param DOMNode $node
 * @return string
 */
function getBranchText(DOMNode $node ) : string
{
    $buffer = [];

    if($node->nodeType == XML_TEXT_NODE || $node->nodeType == XML_CDATA_SECTION_NODE)
    {
        $buffer[] = trim($node->nodeValue);
    }
    elseif ($node->nodeType == XML_ELEMENT_NODE)
    {
        foreach($node->childNodes as $currChild)
        {
            $buffer[] = getBranchText($currChild);
        }
    }

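    // Drop empty pieces (whitespace-only text nodes) so no double spaces appear
    return implode(' ', array_filter($buffer, 'strlen'));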
}

$output = getBranchText($textElements[0]);

echo $output.PHP_EOL;

Output:

In subsection (1), beer and malt liquor have the meaning assigned by section 4.
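
Since the question starts from SimpleXML, it may be worth noting that the two APIs can be combined. The sketch below navigates with SimpleXML and switches to DOM only to read the flattened text through dom_import_simplexml(). It assumes Body is the root element of the loaded string (so the XPath is relative to it), and the variable names ($root, $flattened) are illustrative.

```php
<?php
// Sketch: navigate with SimpleXML, read the text through DOM.
// "urn:lims" is a placeholder namespace URI for the lims: prefix.
$xml = <<<'XML'
<Body>
 <Section xmlns:lims="urn:lims">
  <Subsection lims:id="182941">
   <Label>(2)</Label>
   <Text>
    In subsection (1),
     <DefinedTermEn>beer</DefinedTermEn>
    and
     <DefinedTermEn>malt liquor</DefinedTermEn>
    have the meaning assigned by section 4.
   </Text>
  </Subsection>
 </Section>
</Body>
XML;

$root = simplexml_load_string($xml);

$flattened = [];
foreach ($root->xpath('Section/Subsection/Text') as $text) {
    // dom_import_simplexml() exposes the same node as a DOMElement,
    // whose textContent includes the text of all descendants
    $raw = dom_import_simplexml($text)->textContent;
    // collapse every whitespace run (including newlines) to one space
    $flattened[] = trim(preg_replace('/\s+/', ' ', $raw));
}

echo implode(PHP_EOL, $flattened), PHP_EOL;
```

Unlike the recursive function, this variant does not add a space between adjacent elements that have no whitespace between them in the source, so which one fits depends on how the documents are formatted.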
