I need help learning how to extract data from an XML/text file. My XML/TXT file contains a huge amount of data in the format shown below.
<authorList>
<author>
<fullName>Oliver LA</fullName>
<firstName>L A</firstName>
<lastName>Oliver</lastName>
<initials>LA</initials>
<authorAffiliationDetailsList>
<authorAffiliation>
<affiliation>University of Liverpool, Liverpool, UK. Electronic address: l.oliver@liverpool.ac.uk.</affiliation>
</authorAffiliation>
</authorAffiliationDetailsList>
</author>
<author>
<fullName>Hutton DP</fullName>
<firstName>D P</firstName>
<lastName>Hutton</lastName>
<initials>DP</initials>
<authorAffiliationDetailsList>
<authorAffiliation>
<affiliation>North West Radiotherapy Operational Delivery Network, The Christie Hospital, Manchester, UK; University of Liverpool, Liverpool, UK.</affiliation>
</authorAffiliation>
</authorAffiliationDetailsList>
</author>
<author>
<fullName>Hall T</fullName>
<firstName>T</firstName>
<lastName>Hall</lastName>
<initials>T</initials>
<authorAffiliationDetailsList>
<authorAffiliation>
<affiliation>North West Radiotherapy Operational Delivery Network, The Christie Hospital, Manchester, UK.</affiliation>
</authorAffiliation>
</authorAffiliationDetailsList>
</author>
<author>
<fullName>Cain M</fullName>
<firstName>M</firstName>
<lastName>Cain</lastName>
<initials>M</initials>
<authorAffiliationDetailsList>
<authorAffiliation>
<affiliation>Clatterbridge Cancer Centre, Liverpool, UK.</affiliation>
</authorAffiliation>
</authorAffiliationDetailsList>
</author>
<author>
<fullName>Bates M</fullName>
<firstName>M</firstName>
<lastName>Bates</lastName>
<initials>M</initials>
<authorAffiliationDetailsList>
<authorAffiliation>
<affiliation>East of England Radiotherapy Network, Norfolk &amp; Norwich University Hospital, Norwich, UK.</affiliation>
</authorAffiliation>
</authorAffiliationDetailsList>
</author>
<author>
<fullName>Cree A</fullName>
<firstName>A</firstName>
<lastName>Cree</lastName>
<initials>A</initials>
<authorAffiliationDetailsList>
<authorAffiliation>
<affiliation>Clatterbridge Cancer Centre, Liverpool, UK.</affiliation>
</authorAffiliation>
</authorAffiliationDetailsList>
</author>
<author>
<fullName>Mullen E</fullName>
<firstName>E</firstName>
<lastName>Mullen</lastName>
<initials>E</initials>
<authorAffiliationDetailsList>
<authorAffiliation>
<affiliation>Clatterbridge Cancer Centre, Liverpool, UK.</affiliation>
</authorAffiliation>
</authorAffiliationDetailsList>
</author>
</authorList>
The output I need consists of the email, first name, last name, and affiliation, and it should be exported to a text file.
Using Perl, I wrote the code below.
#!usr/bin/perl
use strict;
use warnings;
open(FILEHANDLE, "<data.xml")|| die "Can't open";
my @line;
my @affi;
my @lines;
my $ct =1 ;
print "Enter the start position:-";
my $start= <STDIN>;
print "Enter the end position:-";
my $end = <STDIN>;
print "Processing your data...\n";
my $i =0;
my $t =0;
while(<FILEHANDLE>)
{
    if($ct>$end)
    {
        close(FILEHANDLE);
        exit;
    }
    if($ct>=$start)
    {
        $lines[$t] = $_;
        $t++;
    }
    if($ct == $end)
    {
        my $i = 0;
        my $j = 0;
        my @last;
        my @first;
        my $l = @lines;
        my $s = 0;
        while($j<$l)
        {
            if ($lines[$j] =~m/@/)
            {
                $line[$i] = $lines[$j];
                $s = $j-3;
                $first[$i]=$lines[$s];
                $s--;
                $last[$i] = $lines[$s];
                #$j = $j+3;
                #$last[$i]= $lines[$j];
                #$j++;
                #$first[$i] = $lines[$j];
                $i++;
            }
            $j++;
        }
        my $k = 0;
        foreach(@line)
        {
            $line[$k] =~ s/<.*>(.* )(.*@.*)<.*>/$2/;
            $affi[$k] = $1;
            $line[$k] = $2;
            $line[$k] =~ s/\.$//;
            $k++;
        }
        my $u = 0;
        foreach(@first)
        {
            $first[$u] =~s/<firstName>(.*)<.*>/$1/;
            $first[$u]=$l;
            $u++
        }
        my $m = 0;
        foreach(@last)
        {
            $last[$m] =~s/<lastName>(.*)<.*>/$1/;
            $last[$m] = $1;
            $m++
        }
        my $q=@line;
        open(FILE,">RAVI.txt")|| die "can't open";
        my $p;
        for($p =0; $p<$q; $p++)
        {
            print FILE "$line[$p],$first[$p],$last[$p],$affi[$p]\n";
        }
        close(FILE);
    }
    $ct++;
}
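With this code I get output in the form email,,lastName,affiliation.
I cannot get the firstName out of the given data with this code. I am new to Perl. Please help me fix the errors in my code. Thank you in advance.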
2 Answers
mpbci0fu #1
As I said in the comments, it is better to use an established XML parser. One of them is XML::XPath:
Output
Usage
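A minimal sketch of what the XML::XPath approach could look like. It is an illustration under assumptions, not necessarily the answerer's listing: the sample XML is embedded as a string (two authors copied from the question), the helper name `extract_authors` is hypothetical, and each author is assumed to have a single `affiliation` element.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::XPath;

# Two authors copied from the question's sample data.
my $xml = <<'XML';
<authorList>
  <author>
    <fullName>Oliver LA</fullName>
    <firstName>L A</firstName>
    <lastName>Oliver</lastName>
    <initials>LA</initials>
    <authorAffiliationDetailsList>
      <authorAffiliation>
        <affiliation>University of Liverpool, Liverpool, UK. Electronic address: l.oliver@liverpool.ac.uk.</affiliation>
      </authorAffiliation>
    </authorAffiliationDetailsList>
  </author>
  <author>
    <fullName>Hall T</fullName>
    <firstName>T</firstName>
    <lastName>Hall</lastName>
    <initials>T</initials>
    <authorAffiliationDetailsList>
      <authorAffiliation>
        <affiliation>North West Radiotherapy Operational Delivery Network, The Christie Hospital, Manchester, UK.</affiliation>
      </authorAffiliation>
    </authorAffiliationDetailsList>
  </author>
</authorList>
XML

# Hypothetical helper: returns one [firstName, lastName, affiliation]
# array reference per <author> element.
sub extract_authors {
    my ($xml_text) = @_;
    my $xp = XML::XPath->new( xml => $xml_text );
    my @rows;
    for my $author ( $xp->findnodes('/authorList/author') ) {
        my @row;
        for my $path ( 'firstName', 'lastName',
            'authorAffiliationDetailsList/authorAffiliation/affiliation' )
        {
            # Evaluate the path relative to the current <author> node and
            # force the result into a plain string.
            my $value = $xp->findvalue( $path, $author );
            push @row, "$value";
        }
        push @rows, \@row;
    }
    return @rows;
}

# Tab-separated output, because the affiliation text contains commas.
print join( "\t", @$_ ), "\n" for extract_authors($xml);
```

To read the real file instead of the embedded string, `XML::XPath->new( filename => 'data.xml' )` can be used, and the script's output can be redirected to a text file from the shell.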
zour9fqk #2
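Your mistake is trying to write your own XML parser. That is a very hard thing to get right. It is far better to use a parser that has already been written.
I always use XML::LibXML (its documentation is terrible, but there is a great tutorial online).
A first attempt at your program would look something like this:
Note that I put all of the output in quotes; this is because the affiliation node contains embedded commas.
In practice, you will need to do more processing on the affiliation data to extract the email address, but I hope this helps you get to a solution.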
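As a minimal sketch of such a first attempt with XML::LibXML (an illustration under assumptions, not necessarily the answerer's exact code): the XML is embedded as a string with two authors from the question, the output file name `authors.txt` and the helpers `quote_field` and `author_lines` are hypothetical, and the ampersand in the data is assumed to be escaped as `&amp;`, since a bare `&` would make the parser reject the document.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Two authors copied from the question's sample data.
my $xml = <<'XML';
<authorList>
  <author>
    <fullName>Oliver LA</fullName>
    <firstName>L A</firstName>
    <lastName>Oliver</lastName>
    <initials>LA</initials>
    <authorAffiliationDetailsList>
      <authorAffiliation>
        <affiliation>University of Liverpool, Liverpool, UK. Electronic address: l.oliver@liverpool.ac.uk.</affiliation>
      </authorAffiliation>
    </authorAffiliationDetailsList>
  </author>
  <author>
    <fullName>Bates M</fullName>
    <firstName>M</firstName>
    <lastName>Bates</lastName>
    <initials>M</initials>
    <authorAffiliationDetailsList>
      <authorAffiliation>
        <affiliation>East of England Radiotherapy Network, Norfolk &amp; Norwich University Hospital, Norwich, UK.</affiliation>
      </authorAffiliation>
    </authorAffiliationDetailsList>
  </author>
</authorList>
XML

# Hypothetical helper: wrap a value in double quotes, doubling any
# embedded quotes, so commas inside the value do not split the field.
sub quote_field {
    my ($value) = @_;
    $value =~ s/"/""/g;
    return qq{"$value"};
}

# Hypothetical helper: one quoted, comma-separated line per <author>.
sub author_lines {
    my ($xml_text) = @_;
    my $doc = XML::LibXML->load_xml( string => $xml_text );
    my @lines;
    for my $author ( $doc->findnodes('/authorList/author') ) {
        my @fields = map { $author->findvalue($_) }
            ( 'firstName', 'lastName', './/affiliation' );
        push @lines, join( ',', map { quote_field($_) } @fields );
    }
    return @lines;
}

# Write the result to a text file (authors.txt is an assumed name).
open my $out, '>', 'authors.txt' or die "Can't open authors.txt: $!";
print {$out} "$_\n" for author_lines($xml);
close $out or die "Can't close authors.txt: $!";
```

For a real input file, `XML::LibXML->load_xml( location => 'data.xml' )` parses from disk instead of from a string.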
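One way that extra processing might look, as a sketch only: it assumes the email always follows the phrase "Electronic address:" as in the sample data, the helper name `split_affiliation` is hypothetical, and the pattern is a deliberately simple approximation rather than a full email validator.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split an affiliation string into (email, affiliation).
# The email is an empty string when the text contains no address.
sub split_affiliation {
    my ($text) = @_;

    # Simplified pattern: local part, '@', then dot-separated domain labels.
    # A trailing sentence-ending period is not captured, because every dot
    # in the domain must be followed by at least one more character.
    my $email = '';
    if ( $text =~ /([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)/ ) {
        $email = $1;
    }

    # Drop the "Electronic address: ..." tail from the affiliation text.
    ( my $affiliation = $text ) =~ s/\s*Electronic address:.*$//s;

    return ( $email, $affiliation );
}

my ( $email, $affiliation ) = split_affiliation(
    'University of Liverpool, Liverpool, UK. Electronic address: l.oliver@liverpool.ac.uk.'
);
print "$email\n$affiliation\n";
```

Authors without an address, such as the Clatterbridge entries in the sample, come back with an empty email and an unchanged affiliation.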