My first Perl script: using the "get($url)" method in a loop?

Posted by l2osamch on 2022-12-19 in Perl


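It seemed simple enough: use a series of nested loops to walk through a large number of URLs ordered by year/month/day and download the XML files. Since this is my first script, I started with the loops, something familiar from any language. I ran it just printing the constructed URLs, and it worked perfectly.
Then I wrote the code that downloads the content and saves it, separately, and that also worked fine with a sample URL across multiple test cases.
But when I combined the two pieces of code, it broke: the program just gets stuck and does nothing.
So I ran the debugger, and as I stepped through, it got stuck on this line: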

warnings::register::import(/usr/share/perl/5.10/warnings/register.pm:25):25:vec($warnings::Bits{$k}, $warnings::LAST_BIT, 1) = 0;

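If I just hit r to return from the subroutine, it works and continues to another point further up the call stack, where something similar happens over and over for a while. The stack trace: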

warnings::register::import('warnings::register') called from file `/usr/lib/perl/5.10/Socket.pm' line 7
Socket::BEGIN() called from file `/usr/lib/perl/5.10/Socket.pm' line 7
eval {...} called from file `/usr/lib/perl/5.10/Socket.pm' line 7
require 'Socket.pm' called from file `/usr/lib/perl/5.10/IO/Socket.pm' line 12
IO::Socket::BEGIN() called from file `/usr/lib/perl/5.10/Socket.pm' line 7
eval {...} called from file `/usr/lib/perl/5.10/Socket.pm' line 7
require 'IO/Socket.pm' called from file `/usr/share/perl5/LWP/Simple.pm' line 158
LWP::Simple::_trivial_http_get('www.aDatabase.com', 80, '/sittings/1987/oct/20.xml') called from file `/usr/share/perl5/LWP/Simple.pm' line 136
LWP::Simple::_get('http://www.aDatabase.com/1987/oct/20.xml') called from file `xmlfetch.pl' line 28

As you can see, it gets stuck in this "get($url)" call, and I have no idea why. Here is my code:

#!/usr/bin/perl

use LWP::Simple;

$urlBase = 'http://www.aDatabase.com/subheading/';
$day=1;
$month=1;
@months=("list of months","jan","feb","mar","apr","may","jun","jul","aug","sep","oct","nov","dec");
$year=1987;
$nullXML = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<nil-classes type=\"array\"/>\n";
    
while($year<=2006)
    {
    $month=1;
    while($month<=12)
        {
        $day=1;
        while($day<=31)
            {
            $newUrl = "$urlBase$year/$months[$month]/$day.xml";
            $content = get($newUrl);
            if($content ne $nullXML)
                {
                $filename = "$year-$month-$day.xml";
                open(FILE, ">$filename");
                print FILE $content;
                close(FILE);
                }
            $day++;
            }
        $month++;
        }
    $year++;
    }

I'm almost certain it's something tiny that I just don't know about, but Google hasn't turned up anything.

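**EDIT:** It's official: it just hangs in the get method for a long time, runs through a few loop iterations, then hangs again for a while. But it's still a problem. Why is this happening?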

perl

Source: https://stackoverflow.com/questions/467104/my-first-perl-script-using-geturl-method-in-a-loop

3 Answers


bqf10yzr1#

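Since http://www.adatabase.com/1987/oct/20.xml is a 404 (and isn't something your program could generate anyway, as there is no 'subheading' in the path), I'm assuming that isn't the real link you're using, which makes it hard for us to test. As a general rule, please use example.com instead of made-up hostnames; that's what it's reserved for.
You really should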

use strict;
use warnings;

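in your code; it will help highlight any scoping issues you may have (I'd be surprised if that were the case, but there's a chance that part of the LWP code is messing with your $urlBase or something). I think it should be enough to change the initial variable declarations (and $newUrl, $content and $filename) by putting 'my' in front of them to make your code strict.
If using strict and warnings doesn't get you any closer to a solution, you could warn out the link you're about to fetch on each loop iteration, so that when it sticks you can try that link in a browser and see what happens. Alternatively, a packet sniffer (such as Wireshark) could give you some clues.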
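A minimal sketch of that advice, assuming the URL layout from the question: everything is declared with `my` under strict and warnings, and each URL is written to STDERR before it is fetched, so the last line printed names the request that stalled. The helper names `urls_for_year` and `fetch_logged` are hypothetical, as is the example.com base URL; the fetcher is passed in as a code reference so the logging can be tried without touching the network.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base URL (example.com is reserved for documentation).
my $url_base = 'http://www.example.com/subheading/';
my @months   = qw(jan feb mar apr may jun jul aug sep oct nov dec);

# Build every candidate URL for one year: 12 months x days 1..31,
# mirroring the loops in the question (invalid dates included).
sub urls_for_year {
    my ($year) = @_;
    my @urls;
    for my $month (@months) {
        for my $day (1 .. 31) {
            push @urls, "$url_base$year/$month/$day.xml";
        }
    }
    return @urls;
}

# Log the URL to STDERR, then hand it to the supplied fetcher.
# $fetch is any code reference that takes a URL and returns content.
sub fetch_logged {
    my ($url, $fetch) = @_;
    warn "fetching $url\n";
    return $fetch->($url);
}
```

With `use LWP::Simple;` loaded, a call such as `fetch_logged($url, \&LWP::Simple::get)` for each element of `urls_for_year(1987)` would log and then fetch that URL.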


7gyucuyw2#

(2006 - 1986) * 12 * 31 is more than 7000. Requesting web pages without a pause between them is not good.
A slightly more Perl-like version (code-style wise):

#!/usr/bin/perl
use strict;
use warnings;

use LWP::Simple qw(get);    

my $urlBase = 'http://www.example.com/subheading/';
my @months  = qw/jan feb mar apr may jun jul aug sep oct nov dec/;
my $nullXML = <<'NULLXML';
<?xml version="1.0" encoding="UTF-8"?>
<nil-classes type="array"/>
NULLXML

for my $year (1987..2006) {
    for my $month (0..$#months) {
        for my $day (1..31) {
            my $newUrl = "$urlBase$year/$months[$month]/$day.xml";
            my $content = "abc";    # XXX stub for testing; replace with get($newUrl)
            if ($content ne $nullXML) {
               my $filename = "$year-@{[$month+1]}-$day.xml";
               open my $fh, '>', $filename
                   or die "Can't open '$filename': $!";
               print $fh $content;
               # $fh implicitly closed
            }
        }
    }
}
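The version above does not yet pause between requests. One possible way to add that, sketched under a few assumptions: the helper name `fetch_politely` and its parameters are hypothetical, the delay is in whole seconds (fractional delays would need Time::HiRes), and the fetcher is passed in as a code reference such as `\&LWP::Simple::get`. It also checks for `undef`, which is what LWP::Simple's `get` returns when a request fails.

```perl
use strict;
use warnings;

# Fetch each URL in turn, pausing $delay seconds after every request.
# $urls  : array reference of URLs
# $fetch : code reference that takes a URL and returns content or undef
# Returns a hash reference mapping each successfully fetched URL to its content.
sub fetch_politely {
    my ($urls, $fetch, $delay) = @_;
    my %content_for;
    for my $url (@$urls) {
        my $content = $fetch->($url);
        if (defined $content) {
            $content_for{$url} = $content;
        }
        else {
            # A failed request yields undef; report it and move on.
            warn "failed: $url\n";
        }
        sleep $delay if $delay;    # be gentle with the remote server
    }
    return \%content_for;
}
```

Called as `fetch_politely(\@urls, \&LWP::Simple::get, 1)`, this would wait one second after each request; at roughly 7000 URLs that is about two hours, which is the price of not hammering the server.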


pokxtpni3#

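LWP::Simple has a getstore function that does most of the fetching and saving work for you. You could also check out LWP::Parallel::UserAgent, which gives you more control over how you hit the remote site.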
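A short sketch of how `getstore` might be used here. `getstore` and `is_success` are exported by LWP::Simple; the wrapper name `save_url` is hypothetical. `getstore` writes the response body to the named file and returns the HTTP status code, which `is_success` can then check.

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Download $url into $filename in one step.
# Returns true when the response status is a success code (2xx).
sub save_url {
    my ($url, $filename) = @_;
    my $status = getstore($url, $filename);    # HTTP status code of the response
    warn "GET $url failed with status $status\n" unless is_success($status);
    return is_success($status);
}
```

Inside the loops from the question, `save_url($newUrl, $filename)` would replace the separate `get`, `open`, `print` and `close` steps, although it would also save the "nil-classes" placeholder documents that the original code filters out.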
