Shell: get page titles from a list of URLs

Posted by wfsdck30 on 2023-08-07 in Shell

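I have a list of URLs, and I need to save the page titles in another list. wget or curl seems to be the right way to go, but I don't know exactly how to do it. Can you help? Thanks.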

shell

Source: https://stackoverflow.com/questions/55842311/get-page-titles-from-a-list-of-urls

2 answers

wlzqhblo1#

Do you mean something like this?

wget_title_from_filelist.sh

#!/bin/bash
while read -r URL; do
    echo -n "$URL --> "
    wget -q -O - "$URL" | \
       tr "\n" " " | \
       sed 's|.*<title>\([^<]*\).*</head>.*|\1|;s|^\s*||;s|\s*$||'
    echo
done

filelist.txt

https://stackoverflow.com
https://cnn.com
https://reddit.com
https://archive.org

Usage

./wget_title_from_filelist.sh < filelist.txt

Output

https://stackoverflow.com --> Stack Overflow - Where Developers Learn, Share, &amp; Build Careers
https://cnn.com --> CNN International - Breaking News, US News, World News and Video
https://reddit.com --> reddit: the front page of the internet
https://archive.org --> Internet Archive: Digital Library of Free &amp; Borrowable Books, Movies, Music &amp; Wayback Machine
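The titles in the output above still contain HTML entities such as &amp;, because sed copies the text between the tags verbatim. If decoded titles are wanted, a small filter can be appended to the pipeline. This is only a minimal sketch: decode_entities is a hypothetical helper name, and it covers just five common entities, not the full HTML entity set.

```shell
#!/bin/bash
# decode_entities (hypothetical helper): read text on stdin and decode a
# handful of common HTML entities. &amp; is handled last so that input
# such as "&amp;lt;" becomes "&lt;" and is not decoded twice.
decode_entities() {
    sed -e 's/&lt;/</g' -e 's/&gt;/>/g' -e 's/&quot;/"/g' \
        -e "s/&#39;/'/g" -e 's/&amp;/\&/g'
}

# Example with one of the titles from the output above.
printf '%s\n' 'Stack Overflow - Where Developers Learn, Share, &amp; Build Careers' |
    decode_entities
```

In the script, the filter would go after the sed step, e.g. `... | sed '...' | decode_entities`.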

Explanation

tr "\n" " "     # replace every \n with a space, so sed sees the page as one line

sed 's|.*<title>\([^<]*\).*</head>.*|\1|;   # keep only the text after <title> (which must precede </head>)
s|^\s*||;                                   # remove leading whitespace
s|\s*$||'                                   # remove trailing whitespace
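To see these steps in action without any network access, the same pipeline can be run on an inline HTML sample. This is a minimal sketch: the sample page is made up, and `\s` is a GNU sed extension, so GNU sed is assumed.

```shell
#!/bin/bash
# Made-up sample page; the <title> text is spread over several lines.
html='<html>
<head>
<title>
  Example Page
</title>
</head>
<body><p>Hello</p></body>
</html>'

# Same pipeline as in the answer, fed from a variable instead of wget.
title=$(printf '%s\n' "$html" |
    tr "\n" " " |
    sed 's|.*<title>\([^<]*\).*</head>.*|\1|;s|^\s*||;s|\s*$||')

# Brackets make any leftover whitespace visible.
echo "[$title]"
```

Without the tr step, sed would process the page line by line, and a title that spans several lines, as in this sample, would never match the pattern.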

Answered 2023-08-07

6xfqseft2#

An improvement on @utlox's answer that also handles title tags with attributes (<title k=v>):

#!/bin/bash
while read -r URL; do
    echo -n "$URL --> "
    wget -q -O - "$URL" | \
       tr "\n" " " | \
       sed 's|.*<title[^>]*>\([^<]*\).*</head>.*|\1|;s|^\s*||;s|\s*$||'
    echo
done
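Since the question asks for the titles to end up in another list and also mentions curl, the loop can be wrapped in a function that prints one title per line and nothing else. The following is a sketch under assumptions, not a definitive implementation: extract_title, titles_from_list and the FETCH variable are hypothetical names, GNU sed is assumed for `\s`, and a page with no <title> before </head> would be printed whole rather than skipped.

```shell
#!/bin/bash
# extract_title (hypothetical): read HTML on stdin, print the <title> text.
extract_title() {
    tr "\n" " " |
        sed 's|.*<title[^>]*>\([^<]*\).*</head>.*|\1|;s|^\s*||;s|\s*$||'
}

# FETCH (hypothetical hook): command used to download a URL to stdout.
# curl -s silences progress output, -L follows redirects.
FETCH=${FETCH:-curl -sL}

# titles_from_list (hypothetical): read URLs on stdin, print one title per
# line. A failed download yields an empty line, so the output stays aligned
# with the input list.
titles_from_list() {
    while IFS= read -r url; do
        [ -z "$url" ] && continue      # skip blank lines in the list
        $FETCH "$url" </dev/null | extract_title
        echo                           # sed's output has no trailing newline
    done
}
```

After sourcing the file, `titles_from_list < filelist.txt > titles.txt` would write the titles to titles.txt in the same order as the URLs.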

Answered 2023-08-07
