Remove extra lines and spaces from an XML file using a shell script

Posted by dluptydi on 2023-10-23 in Shell

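I have an XML file containing a lot of data. Some elements are split across several lines instead of sitting on a single line. I need to fix this with a shell script.
Input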

<lineid>Product 
testing machine 
</lineid>

Expected output

<lineid>Product testing machine </lineid>

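In the input above I added an extra line, because otherwise the input was displayed the same way as the output.
The input data is not on a single line. I want it on a single line, and I want the change made in the same file.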

shell

Source: https://stackoverflow.com/questions/76993557/remove-extra-lines-and-spaces-from-xml-file-using-shell-script

2 answers

ia2d9nvy1#

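This should put everything on one line and remove the extra spaces. It takes a file name as its argument, so if you save this script as formatter.sh and your input file as input.txt, you can call it as: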

./formatter.sh input.txt

The output is saved to the same file, so be sure to try it on a copy first!

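#!/bin/bash

input_file="$1"  # Path to the input file, passed as the first argument

if [ -f "$input_file" ]; then
    # Read the whole file first so that it can safely be overwritten below.
    input=$(cat "$input_file")
    # Turn newlines into spaces (deleting them would glue together words on
    # adjacent lines), strip trailing spaces, then squeeze runs of spaces.
    formatted=$(printf '%s\n' "$input" | tr '\n' ' ' | sed -e 's/ *$//' -e 's/  */ /g')
    printf '%s\n' "$formatted" > "$input_file"
else
    echo "Input file not found: $input_file" >&2
    exit 1
fi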
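
Because the script overwrites its own input, a failure halfway through could leave the file empty. One possible variation, sketched below, writes to a temporary file next to the input and only replaces the original once the pipeline has finished. The function names collapse_ws and rewrite_in_place are hypothetical and not part of the answer above. This sketch also treats tabs and carriage returns as whitespace, which is an assumption about the input.

```shell
#!/bin/sh

# collapse_ws (hypothetical helper): reads stdin and prints it as a single
# line, with every run of spaces, tabs, CRs and newlines reduced to one space.
collapse_ws() {
    # The command substitution drops trailing newlines; printf adds exactly one.
    printf '%s\n' "$(tr '\t\r\n' '   ' | tr -s ' ' | sed -e 's/^ //' -e 's/ $//')"
}

# rewrite_in_place FILE (hypothetical helper): applies collapse_ws to FILE,
# going through a temporary file so the original is replaced only on success.
rewrite_in_place() {
    file=$1
    [ -f "$file" ] || { echo "Input file not found: $file" >&2; return 1; }
    tmp=$(mktemp "${file}.XXXXXX") || return 1
    if collapse_ws < "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Called as rewrite_in_place input.txt, this should leave the question's sample as a single line. Note that mktemp creates the temporary file with restrictive permissions, so the rewritten file may end up with a different mode than the original.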

Answered 2023-10-23

63lcw9qa2#

As I understand your requirement, the tags of a simple XML file can be condensed with something like this:

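#!/bin/bash
# Note: "sed -i" without a suffix and the \r and \n escapes rely on GNU sed.

if [ $# -lt 1 ]; then echo "no file provided" >&2; exit 1; fi
xml_input="$1"
if [ ! -r "${xml_input}" ]; then echo "file not readable" >&2; exit 1; fi
# Use only the base name so an input path containing directories still works.
xml_temp="$(mktemp "/tmp/$(basename "${xml_input}").XXXXXXXXX")" || exit 1

tr '\n' ' ' < "${xml_input}" > "${xml_temp}"   # join all lines
sed -i 's/\r/ /g' "${xml_temp}"                # replace carriage returns (CRLF files)
sed -i 's/  */ /g' "${xml_temp}"               # squeeze runs of spaces
sed -i 's/?> /?>/g' "${xml_temp}"              # no space after the XML declaration
sed -i 's/?>/?>\n/g' "${xml_temp}"             # line break after the declaration
sed -i 's/> </>\n</g' "${xml_temp}"            # one element per line
mv "${xml_temp}" "${xml_input}"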

This converts:

<?xml version="1.0" encoding="UTF-8"?><root>

<lineid>
     Product  
     testing machine  
     
     </lineid>
                    <lineid>Product testing machine

                    </lineid>
    </root>

into:

<?xml version="1.0" encoding="UTF-8"?>
<root>
<lineid> Product testing machine </lineid>
<lineid>Product testing machine </lineid>
</root>

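But a shell script that handles every XML case would be huge, or would just be a caller for an actual parser written in another language. There are plenty of good explanations: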
https://stackoverflow.com/a/8577108/1919793
Can you provide some examples of why it is hard to parse XML and HTML with a regex?
Why it's not possible to use regex to parse HTML/XML: a formal explanation in layman's terms
Many text editors will do this better for you:
How do I format XML in Notepad++?
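
As a rough illustration of the "caller for an actual parser" idea, the sketch below is a shell function that hands the file to Python's standard-library xml.etree.ElementTree and collapses whitespace runs inside text nodes. It assumes python3 is installed; the function name normalize_xml_text is hypothetical. It is only a sketch: ElementTree drops comments and the XML declaration by default, and collapsing whitespace may not be appropriate for every document.

```shell
#!/bin/sh

# normalize_xml_text FILE (hypothetical helper): prints FILE to stdout with
# every run of whitespace in element text and tails reduced to one space.
# Assumes python3 is available on PATH.
normalize_xml_text() {
    python3 - "$1" <<'PY'
import re
import sys
import xml.etree.ElementTree as ET

tree = ET.parse(sys.argv[1])
for elem in tree.iter():
    # elem.text is the text before the first child; elem.tail follows the end tag.
    if elem.text:
        elem.text = re.sub(r"\s+", " ", elem.text)
    if elem.tail:
        elem.tail = re.sub(r"\s+", " ", elem.tail)
tree.write(sys.stdout, encoding="unicode")
PY
}
```

Something like normalize_xml_text input.xml > output.xml would write the result to a new file. Redirecting straight back onto the input would truncate it before the parser reads it, so go through a separate file.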

Answered 2023-10-23
