regex: How do I convert all tag names in an XML file to lowercase without changing the case of attribute values?

j2cgzkjk  posted on 2023-03-20 in Other

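I have inherited some XML files in which all the tags are uppercase. I would like to convert them to lowercase, either with a regex or via XSLT. It would be handy to know both approaches. Unfortunately, I sometimes find regex and XSLT syntax confusing, but I'm working on it. :)
(Edit: added the contrived example below.)
Before: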

<?xml version="1.0"?>
<NOVEL TITLE="Now That's A Novel Title" AUTHOR="Harry Handelbar">
  <PREFACE>  <!-- XHTML FORMATTED TEXT -->
    <P>It would be remiss of me to neglect to thank the bottle.</P>
  </PREFACE>
  <CHAPTER TITLE="" TYPE="NUM">
    <PROLOGUE>Success, like death, marks the end of... </PROLOGUE>
      <MAINTEXT> <!-- XHTML FORMATTED TEXT -->
      <P>It seems a violent betrayal, me divulging how...</P>
      <P>The years had not been kind Felix Lake. His constant...</P>
    </MAINTEXT>
  </CHAPTER>
  <CHAPTER TITLE="" TYPE="NUM">
  <MAINTEXT> <!-- XHTML FORMATTED TEXT -->
    <P>As luck would not have it, he did.</P>
    <!-- ECT ECT ECT -->
 </MAINTEXT>
  </CHAPTER>
</NOVEL>

After:

<?xml version="1.0"?>
<novel title="Now That's A Novel Title" author="Harry Handelbar">
  <preface>  <!-- XHTML FORMATTED TEXT -->
    <p>It would be remiss of me to neglect to thank the bottle.</p>
  </preface>
  <chapter title="" type="NUM">
    <prologue>Success, like death, marks the end of... </prologue>
      <maintext> <!-- XHTML FORMATTED TEXT -->
      <p>It seems a violent betrayal, me divulging how...</p>
      <p>The years had not been kind Felix Lake. His constant...</p>
    </maintext>
  </chapter>
  <chapter title="" type="NUM">
  <maintext> <!-- XHTML FORMATTED TEXT -->
    <p>As luck would not have it, he did.</p>
    <!-- ECT ECT ECT -->
 </maintext>
  </chapter>
</novel>

Hope that helps.
(Edit: my mistake with the P tags. In the After sample they should be lowercase as well.)

sirbozc5 #1

Try this (untested):

XSLT 2.0:

<xsl:stylesheet version="2.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:template match="*">
    <xsl:element name="{lower-case(local-name())}" namespace="{namespace-uri()}">
        <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
</xsl:template>

<xsl:template match="@*">
    <xsl:attribute name="{lower-case(local-name())}" namespace="{namespace-uri()}">
        <xsl:value-of select="."/>
    </xsl:attribute>
</xsl:template>

<xsl:template match="comment() | text() | processing-instruction()">
    <xsl:copy/>
</xsl:template>

</xsl:stylesheet>

An XSLT 1.0 version of the above looks like this:

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<xsl:variable name="uppercase" select="'ABCDEFGHIJKLMNOPQRSTUVWXYZ'" />
<xsl:variable name="lowercase" select="'abcdefghijklmnopqrstuvwxyz'" />

<xsl:template match="*">
    <xsl:element name="{translate(local-name(), $uppercase, $lowercase)}" namespace="{namespace-uri()}">
        <xsl:apply-templates select="@*|node()"/>
    </xsl:element>
</xsl:template>

<xsl:template match="@*">
    <xsl:attribute name="{translate(local-name(), $uppercase, $lowercase)}" namespace="{namespace-uri()}">
        <xsl:value-of select="."/>
    </xsl:attribute>
</xsl:template>

<xsl:template match="comment() | text() | processing-instruction()">
    <xsl:copy/>
</xsl:template>

</xsl:stylesheet>

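Note, however, that this assumes your element and attribute names contain no uppercase characters other than the 26 listed explicitly (i.e. no Russian, Greek, accented letters, etc.).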
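To make that caveat concrete, here is a small Python sketch. Python is not part of the answer above; it is used here only to mimic the behaviour. `str.translate` with a 26-letter table stands in for XSLT 1.0's `translate()`, and `str.lower()` stands in (roughly) for XSLT 2.0's `lower-case()`. The element name `ÉLÉMENT` is a made-up example.

```python
# Sketch only: Python stand-ins for the two XSLT functions, not XSLT itself.
UPPERCASE = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
LOWERCASE = "abcdefghijklmnopqrstuvwxyz"
ASCII_TABLE = str.maketrans(UPPERCASE, LOWERCASE)


def ascii_only_lower(name: str) -> str:
    """Mimic translate(name, $uppercase, $lowercase) from the XSLT 1.0 version."""
    return name.translate(ASCII_TABLE)


def unicode_lower(name: str) -> str:
    """Mimic lower-case(name) from the XSLT 2.0 version (Unicode-aware)."""
    return name.lower()


if __name__ == "__main__":
    # "ÉLÉMENT" is a hypothetical element name with non-ASCII capitals.
    for name in ["CHAPTER", "ÉLÉMENT"]:
        print(name, ascii_only_lower(name), unicode_lower(name))
```

For plain ASCII names such as `CHAPTER` the two functions agree. They differ only on letters outside the 26-character table, which the table-based version leaves in uppercase.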

ulydmbyx #2

Try this regular expression:

<(\/?[a-zA-Z]*)\b.*?>

Online tester: http://regex101.com/#PCRE
Enjoy your code.
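The pattern above only finds tags; it does not say how to rewrite them. One possible way to apply it is sketched below in Python, which is an assumption, since the answer names no language. The trailing `.*?` is wrapped in a second capture group, a change made here so the rest of the tag can be copied back unchanged. The function name `lowercase_tag_names` is hypothetical.

```python
import re

# The answer's pattern, with ".*?" wrapped in a second group (an addition made
# for this sketch) so everything after the tag name is preserved verbatim.
TAG = re.compile(r"<(/?[a-zA-Z]*)\b(.*?)>")


def lowercase_tag_names(xml_text: str) -> str:
    """Lowercase element names only; attributes and text are left untouched."""
    return TAG.sub(
        lambda m: "<" + m.group(1).lower() + m.group(2) + ">",
        xml_text,
    )


if __name__ == "__main__":
    sample = '<CHAPTER TITLE="" TYPE="NUM"><P>As luck would not have it, he did.</P></CHAPTER>'
    print(lowercase_tag_names(sample))
```

This sketch lowercases tag names but leaves attribute names such as `TITLE` in uppercase, so it covers only part of the After sample in the question. It also assumes that each tag sits on a single line and that no attribute value contains a `>` character.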

ie3xauqp #3

With PHP, you can do it like this...

<?php

// Match an opening "<name" or a closing "</name" so that only the tag name is lowercased.
$pattern= '/<\\w+|<\/\\w+/';
$fp = fopen("/Applications/XAMPP/htdocs/test/test.xml", "r") or die("can't open the XML file");
while (!feof($fp)) {
    $line = fgets($fp);
    $line = preg_replace_callback(
        $pattern,
        function ($matches) {
            return strtolower($matches[0]);
        },
        $line
    );
    echo htmlentities($line);
}
fclose($fp);
?>

It works fine ;)

fwzugrvs #4

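It seems to me that you need two regexes: one to convert the tag names and another to convert the names in the variable number of attribute-value pairs.
Here is what I did: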

blah:tmp shreyas$ cat old.xml | perl -pe "s|(</?)([^> ]+)(.*?>)|\1\L\2\E\3|g" | perl -pe "s|(\w+)( ?= ?\".*?\")|\L\1\E\2|g" > processed.xml
blah:tmp shreyas$ diff new.xml processed.xml 
4c4
<     <P>It would be remiss of me to neglect to thank the bottle.</P>
---
>     <p>It would be remiss of me to neglect to thank the bottle.</p>
9,10c9,10
<       <P>It seems a violent betrayal, me divulging how...</P>
<       <P>The years had not been kind Felix Lake. His constant...</P>
---
>       <p>It seems a violent betrayal, me divulging how...</p>
>       <p>The years had not been kind Felix Lake. His constant...</p>
15c15
<     <P>As luck would not have it, he did.</P>
---
>     <p>As luck would not have it, he did.</p>

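old.xml is your Before XML and new.xml is your After XML. processed.xml is the file produced by the command.
As you can see, the P tags in your After XML are still uppercase. I wasn't sure whether they were typos or deliberate exceptions. I treated them as typos, since you said you wanted all tags converted to lowercase.
With minor modifications, you can run these commands over the whole set of XML files you inherited and convert them quickly.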
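For readers who would rather not shell out to Perl, the same two-pass idea can be sketched in Python (a swapped-in language, not the answer's own method). The two patterns are copied from the Perl one-liners above. The names `lowercase_markup` and `convert_directory`, and the choice to write results to a separate output directory, are assumptions made for illustration.

```python
import re
from pathlib import Path

# Pass 1: "<" or "</", then the tag name (everything up to a space or ">").
TAG_NAME = re.compile(r"(</?)([^> ]+)(.*?>)")
# Pass 2: an attribute name followed by ="...".
ATTR_NAME = re.compile(r'(\w+)( ?= ?".*?")')


def lowercase_markup(line: str) -> str:
    """Apply both substitutions to one line, like the two perl -pe passes."""
    line = TAG_NAME.sub(
        lambda m: m.group(1) + m.group(2).lower() + m.group(3), line
    )
    return ATTR_NAME.sub(lambda m: m.group(1).lower() + m.group(2), line)


def convert_directory(src_dir: str, dst_dir: str) -> None:
    """Convert every *.xml file in src_dir and write the result to dst_dir.

    Both directory arguments are hypothetical; originals are left untouched.
    """
    out = Path(dst_dir)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src_dir).glob("*.xml")):
        text = path.read_text(encoding="utf-8")
        converted = "".join(
            lowercase_markup(line) for line in text.splitlines(keepends=True)
        )
        (out / path.name).write_text(converted, encoding="utf-8")


if __name__ == "__main__":
    print(lowercase_markup('<NOVEL TITLE="Now That\'s A Novel Title" AUTHOR="Harry Handelbar">'))
```

Like the Perl version, this works line by line and shares its limitations: the second pattern would also lowercase a word directly followed by `="..."` inside ordinary text or a comment, and tags that span several lines are not handled.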
