I'm writing a script to retrieve data from GenBank. I only need the annotation that comes before the COMMENT section.
Here is my input:
LOCUS mitochondrion_genome 19524 bp DNA HTG 17-DEC-2022
DEFINITION Drosophila melanogaster primary_assembly mitochondrion_genome BDGP6.32 full
sequence 1..19524 reannotated via EnsEMBL
ACCESSION primary_assembly:BDGP6.32:mitochondrion_genome:1:19524:1
VERSION mitochondrion_genomeBDGP6.32
KEYWORDS .
SOURCE fruit fly
ORGANISM Drosophila melanogaster
Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria;
Protostomia; Ecdysozoa; Panarthropoda; Arthropoda; Mandibulata;
Pancrustacea; Hexapoda; Insecta; Dicondylia; Pterygota; Neoptera;
Endopterygota; Diptera; Brachycera; Muscomorpha; Eremoneura;
Cyclorrhapha; Schizophora; Acalyptratae; Ephydroidea;
Drosophilidae; Drosophilinae.
COMMENT This sequence was annotated by FlyBase (https://www.flybase.org). Please visit the
Ensembl or EnsemblGenomes web site, http://www.ensembl.org/ or
http://www.ensemblgenomes.org/ for more information.
Current script:
$genbank = <STDIN>;
chomp ($genbank);
open (READ, "<$genbank") or die;
@data = <READ>;
close READ;
$end= $#data;
for ($line= 0; $line<= $end; $line++){
    if ($data[$line] =~ /LOCUS/){
        @annotation = (@annotation, $data[$line]);
        until ($data[$line] =~ /COMMENT/){
            $line++;
            @annotation = (@annotation, $data[$line]);
        }
    }
}
print @annotation;
Its output:
LOCUS mitochondrion_genome 19524 bp DNA HTG 17-DEC-2022
DEFINITION Drosophila melanogaster primary_assembly mitochondrion_genome BDGP6.32 full
sequence 1..19524 reannotated via EnsEMBL
ACCESSION primary_assembly:BDGP6.32:mitochondrion_genome:1:19524:1
VERSION mitochondrion_genomeBDGP6.32
KEYWORDS .
SOURCE fruit fly
ORGANISM Drosophila melanogaster
Eukaryota; Opisthokonta; Metazoa; Eumetazoa; Bilateria;
Protostomia; Ecdysozoa; Panarthropoda; Arthropoda; Mandibulata;
Pancrustacea; Hexapoda; Insecta; Dicondylia; Pterygota; Neoptera;
Endopterygota; Diptera; Brachycera; Muscomorpha; Eremoneura;
Cyclorrhapha; Schizophora; Acalyptratae; Ephydroidea;
Drosophilidae; Drosophilinae.
COMMENT This sequence was annotated by FlyBase (https://www.flybase.org). Please visit the
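As you can see, this approach has a problem: the first line of the COMMENT section ends up in the output.
How can I modify the code so that it retrieves the data but stops at COMMENT, without retrieving that whole line?
The first line of every GenBank file starts with LOCUS, and I think that could be used to write better code (so that it could be done without a regex match). I don't know how to do that. I'd really appreciate your input!!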
1 Answer
This looks a lot more complicated than it needs to be. I would use something like this:
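A minimal sketch of one such simplification, not necessarily the exact code this answer had in mind. It walks the lines once and keeps a flag that is switched on at a LOCUS line and off at a COMMENT line, so the COMMENT line itself is never collected. The sub name annotation_before_comment is hypothetical, and taking the file name from the command line instead of STDIN is an assumption made to keep the example short.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: takes a reference to an array of lines and returns
# every line from a LOCUS line up to, but not including, the next COMMENT line.
sub annotation_before_comment {
    my ($lines) = @_;
    my @annotation;
    my $in_annotation = 0;    # true while we are between LOCUS and COMMENT

    for my $line (@$lines) {
        if    ($line =~ /^LOCUS\b/)   { $in_annotation = 1 }
        elsif ($line =~ /^COMMENT\b/) { $in_annotation = 0 }
        push @annotation, $line if $in_annotation;
    }
    return @annotation;
}

# Assumption: the GenBank file name is passed as the first argument.
if (@ARGV) {
    my $genbank = shift @ARGV;
    open(my $fh, '<', $genbank) or die "Cannot open '$genbank': $!";
    my @data = <$fh>;
    close $fh;
    print annotation_before_comment(\@data);
}
```

Because the COMMENT test runs before the push, that line is skipped instead of being appended after the fact, which is what goes wrong in the until loop of the question. If you really want to avoid regexes, index($line, 'COMMENT') == 0 tests for the same line-start match.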