Shell: Processing a specific part of a text based on a pattern in an AWK script

piv4azn7, posted on 2022-11-16 in Shell

I am developing an awk script that converts a TeX document to HTML according to my preferences.

#!/bin/awk -f

BEGIN {
    FS="\n";
    print "<html><body>"
}
# Print every field of the current record wrapped in the given tag
# (e.g. 'th', 'td' or 'p'). 'i' is declared as an extra parameter so it stays local.
function printRow(tag,    i) {
    for (i = 1; i <= NF; i++) print "<" tag ">" $i "</" tag ">";
}

NR>1 {
   [conditions]
   printRow("p")
}

END {
    print "</body></html>"
}

As you can see, it is at a very early stage of development.

\documentclass[a4paper, 11pt, titlepage]{article}
\usepackage{fancyhdr}
\usepackage{graphicx}
\usepackage{imakeidx}
[...]

\begin{document}

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla placerat lectus sit amet augue facilisis, eget viverra sem pellentesque. Nulla vehicula metus risus, vel condimentum nunc dignissim eget. Vivamus quis sagittis tellus, eget ullamcorper libero. Nulla vitae fringilla nunc. Vivamus id suscipit mi. Phasellus porta lacinia dolor, at congue eros rhoncus vitae. Donec vel condimentum sapien. Curabitur est massa, finibus vel iaculis id, dignissim nec nisl. Sed non justo orci. Morbi quis orci efficitur sem porttitor pulvinar. Duis consectetur rhoncus posuere. Duis cursus neque semper lectus fermentum rhoncus.

\end{document}

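What I want is for the script to interpret only the lines between \begin{document} and \end{document}. Everything before them is imports of libraries, variables and so on, which I am not interested in for now.
How can I make it process only the text inside that pattern?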

Answer #1 (wko9yo5t)

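AWK has a feature called range patterns (it is part of standard POSIX awk, not only GNU AWK): when you give two conditions separated by a comma, the action is applied only to the lines from the one matching the first condition through the one matching the second, both included. Consider the following simple example. Let file.txt contain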

junk
\begin{document}
desired text
more desired text
\end{document}
more junk

then

awk '$0=="\\begin{document}",$0=="\\end{document}"{print}' file.txt

gives the output

\begin{document}
desired text
more desired text
\end{document}

(tested in gawk 4.2.1)
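
The range pattern above also prints the two delimiter lines. If those are not wanted, one possible variation (a minimal sketch, not part of the original answer) is to skip them inside the action. The temporary file and the shell variable names `sample` and `body` are made up for this illustration.

```shell
# Hypothetical sample input, mirroring file.txt from the answer.
sample=$(mktemp)
cat > "$sample" <<'EOF'
junk
\begin{document}
desired text
more desired text
\end{document}
more junk
EOF

# The range selects the lines from \begin{document} through \end{document};
# inside the action, the two delimiter lines themselves are skipped.
body=$(awk '
$0 == "\\begin{document}", $0 == "\\end{document}" {
    if ($0 == "\\begin{document}" || $0 == "\\end{document}") next
    print
}' "$sample")

printf '%s\n' "$body"
rm -f "$sample"
```

With this input, only the two "desired text" lines should be printed. Comparing `$0` against a string, as the answer does, also avoids having to think about how `{` and `}` are treated inside a regular expression.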

Answer #2 (wwtsj6pe)

Use regular expressions to set a flag, then print based on that flag:

awk '/^\\begin{document}/{flag=1} 
flag
/^\\end{document}/{flag=0}' file

This prints everything between the start string and the end string, including both of them:

\begin{document}

Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla placerat lectus sit amet augue facilisis, eget viverra sem pellentesque. Nulla vehicula metus risus, vel condimentum nunc dignissim eget. Vivamus quis sagittis tellus, eget ullamcorper libero. Nulla vitae fringilla nunc. Vivamus id suscipit mi. Phasellus porta lacinia dolor, at congue eros rhoncus vitae. Donec vel condimentum sapien. Curabitur est massa, finibus vel iaculis id, dignissim nec nisl. Sed non justo orci. Morbi quis orci efficitur sem porttitor pulvinar. Duis consectetur rhoncus posuere. Duis cursus neque semper lectus fermentum rhoncus.

\end{document}

If you only want the text between the start string and the end string, without the two strings themselves:

awk '
/^\\begin{document}/{flag=1; next} 
/^\\end{document}/{flag=0}
flag' file

Prints:

# leading blank line printed...
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla placerat lectus sit amet augue facilisis, eget viverra sem pellentesque. Nulla vehicula metus risus, vel condimentum nunc dignissim eget. Vivamus quis sagittis tellus, eget ullamcorper libero. Nulla vitae fringilla nunc. Vivamus id suscipit mi. Phasellus porta lacinia dolor, at congue eros rhoncus vitae. Donec vel condimentum sapien. Curabitur est massa, finibus vel iaculis id, dignissim nec nisl. Sed non justo orci. Morbi quis orci efficitur sem porttitor pulvinar. Duis consectetur rhoncus posuere. Duis cursus neque semper lectus fermentum rhoncus.
# ending blank line printed...
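
As a rough sketch of how the flag approach might plug into the asker's converter, the example below wraps each non-empty line of the document body in a `<p>` tag. It makes several assumptions that are not in the original posts: the sample input is invented, the flag is named `inside`, blank lines are dropped, the delimiters are matched by string comparison rather than by regex, and no HTML escaping of `<`, `>` or `&` is attempted.

```shell
# Hypothetical TeX input for illustration only.
tex=$(mktemp)
cat > "$tex" <<'EOF'
\documentclass{article}
\usepackage{graphicx}
\begin{document}

First paragraph.

Second paragraph.
\end{document}
trailing junk
EOF

html=$(awk '
BEGIN { print "<html><body>" }
# Turn the flag on after \begin{document} and skip the delimiter line itself.
$0 == "\\begin{document}" { inside = 1; next }
# Turn the flag off at \end{document}; the rule below then no longer fires.
$0 == "\\end{document}"   { inside = 0 }
# Inside the body, wrap every non-empty line (NF > 0) in a paragraph tag.
inside && NF { print "<p>" $0 "</p>" }
END { print "</body></html>" }
' "$tex")

printf '%s\n' "$html"
rm -f "$tex"
```

Because the default field separator is kept here, `NF` is zero on blank lines, which is what filters out the leading and trailing blank lines mentioned above.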
