ubuntu - How to find all <a> tag links and names in an HTML file

Posted by xpcnnkqh on 2023-02-03 in Other


Below is a test file that contains links and names inside <a></a> tags.

/tmp/test_html.txt

<tr>
<td><a href="http://www.example.com/link1">example link 1</a></td>
</tr>
<tr>
<td><a href="http://www.example.com/link2">example link 2</a></td>
</tr>
<tr>
<td><a href="http://www.example.com/link3">example link 3</a></td>
</tr>
<tr>
<td><a href="https://www.example.com/4/0/1/40116601-1FDC-real-world-link/bar" target="_blank" class="real-world-class">Real World Link</a>&nbsp;</td>
</tr>

The following command (from the question "How to strip out all of the links of an HTML file in Bash or grep or batch and store them in a text file") finds all links in the file, but it cannot print the link and the name together:

# sed -n 's/.*href="\([^"]*\).*/\1/p' /tmp/test_html.txt
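To see why the name is lost, here is the command above run on two rows copied from the test file (a small illustration, not part of the original question). The single capture group stops at the closing quote of the href value, and the trailing `.*` swallows the rest of the line, link text included.

```shell
#!/bin/sh
# Run the question's original command on two sample rows.
# \([^"]*\) captures only the href value; the trailing .* then consumes
# the rest of the line (including the link text), so only URLs are printed.
out=$(printf '%s\n' \
  '<td><a href="http://www.example.com/link1">example link 1</a></td>' \
  '<td><a href="https://www.example.com/4/0/1/40116601-1FDC-real-world-link/bar" target="_blank" class="real-world-class">Real World Link</a>&nbsp;</td>' |
  sed -n 's/.*href="\([^"]*\).*/\1/p')
printf '%s\n' "$out"
```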

I would like the command to print all links line by line, with the name first and then the href.

Here is the expected output:

# sed <...command....> /tmp/test_html.txt

example link 1 | http://www.example.com/link1
example link 2 | http://www.example.com/link2
example link 3 | http://www.example.com/link3
Real World Link | https://www.example.com/4/0/1/40116601-1FDC-real-world-link/bar

How should I write the sed command?

Tags: ubuntu

Source: https://stackoverflow.com/questions/75296351/how-to-find-out-all-a-tag-links-and-names-from-html-file

2 answers

Answer #1 (pftdvrlh)

This might work for you (GNU sed):

sed -En 's/.*href="([^"]*)"[^>]*>([^<]*)<.*/\2 | \1/p' file

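The -n option suppresses automatic printing, so together with the p flag only lines where the substitution succeeds are printed. The -E option enables extended regular expressions, which simplifies the regexp (the parentheses need no backslashes).
The pattern matches lines that contain an href followed by the link's inner text, and back references (\1, \2) reorder the two parts into the required format.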
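A quick way to check the GNU sed answer above is to feed it the question's sample rows directly. This is a sketch that assumes GNU sed (recent BSD/macOS sed also accept -E); the `sample` variable simply holds the test file contents instead of reading /tmp/test_html.txt.

```shell
#!/bin/sh
# Sketch: run answer #1's command on the question's sample input.
sample='<tr>
<td><a href="http://www.example.com/link1">example link 1</a></td>
</tr>
<tr>
<td><a href="http://www.example.com/link2">example link 2</a></td>
</tr>
<tr>
<td><a href="http://www.example.com/link3">example link 3</a></td>
</tr>
<tr>
<td><a href="https://www.example.com/4/0/1/40116601-1FDC-real-world-link/bar" target="_blank" class="real-world-class">Real World Link</a>&nbsp;</td>
</tr>'

# \1 = href value; [^>]* skips any further attributes (target, class);
# \2 = link text up to the next '<'. The <tr>/</tr> lines contain no href,
# so the substitution fails there and -n keeps them from being printed.
out=$(printf '%s\n' "$sample" |
  sed -En 's/.*href="([^"]*)"[^>]*>([^<]*)<.*/\2 | \1/p')
printf '%s\n' "$out"
```

This relies on each <a> element sitting on a single line, as in the sample. If one line held two links, the leading greedy `.*` would skip ahead to the last href, so only the last link on that line would be printed.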

Answered 2023-02-03

Answer #2 (n53p2ov0)

This solution seems to work; please mark it as correct, or leave a comment explaining why it is not. Thanks!

sed -n 's/^.*<a href="\(.*\)">example link\( [0-9][0-9]*\)<\/a><\/td>$/example link\2 | \1/p' input3
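Running the command above on one plain row and the "real world" row from the question shows its scope (an illustration using rows copied from the test file). The pattern hard-codes the literal text "example link" followed by digits and requires the line to end in </a></td>, so only the first three rows of the expected output are produced.

```shell
#!/bin/sh
# Answer #2's command on two sample rows. The real-world row has different
# link text ("Real World Link") and ends in "&nbsp;</td>", so it never
# matches and nothing is printed for it.
out=$(printf '%s\n' \
  '<td><a href="http://www.example.com/link1">example link 1</a></td>' \
  '<td><a href="https://www.example.com/4/0/1/40116601-1FDC-real-world-link/bar" target="_blank" class="real-world-class">Real World Link</a>&nbsp;</td>' |
  sed -n 's/^.*<a href="\(.*\)">example link\( [0-9][0-9]*\)<\/a><\/td>$/example link\2 | \1/p')
printf '%s\n' "$out"
```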

Answered 2023-02-03
