Trouble retrieving multiple sequences from NCBI in a single list query

Trouble retrieving multiple sequences from NCBI in a single list query

by jluis.lavin

Hello all,

I'm a newbie who is having terrible trouble trying to retrieve a list of
multiple sequences from NCBI and write them to a single file in FASTA
format.
The code I've written seems to read my list and retrieve the sequences, but
it kind of overwrites them, so that I only get the last sequence on the list.
I've been told to ask the people on this mailing list for help, since you
may have come across this problem too, or at least will know how to solve
it...

Here is my code. It basically reads the name of the list file from STDIN,
reads the list into an array, and then loops over it (stopping when the
list ends), retrieving one sequence on each pass and writing it to a FASTA
file. I only get one sequence back, although it seems to perform the
retrieval for every sequence in the list...


#!/usr/bin/perl -w
use strict;
use Bio::DB::GenPept;
use Bio::DB::GenBank;
use Bio::SeqIO;

# Ask for the name of the file holding the accession list
print "Enter your list name:";
my $archivo = <STDIN>;
chomp $archivo;
die("Can't open input\n") unless (open(INFILE, $archivo));
my @lista = <INFILE>;

# Fetch each accession from GenPept and write it to the FASTA file
foreach my $seq (@lista) {
    if ($seq eq '') {
        die("empty list");
    }
    else {
        my $db = new Bio::DB::GenPept("-format" => "Fasta");
        my $seqobj = $db->get_Seq_by_acc($seq);
        my $out = new Bio::SeqIO(-file => ">extracted_seqs.fasta",
                                 -format => 'fasta');
        $out->write_seq($seqobj);
    }
}
exit;


Here is an example list of sequence accessions:

YP_003107578.1
YP_003106103.1
YP_003106552.1
YP_003106560.1
YP_003107053.1
YP_003107450.1
YP_003108000.1
YP_003105023.1
YP_003105264.1

Thanks in advance for your help ;)

--
José Luis Lavín Trueba, PhD

Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN


_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l

Re: Trouble retrieving multiple sequences from NCBI in a single list query

by Dave Messina

>
> The code I´ve written seems to read mylist and retrive the sequences, but
> it kinda overwrites them so that I only get the last sequence on the list.
>

With this line

my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta",
                          -format => 'fasta');


you are opening the filehandle for the output file inside your loop, so each
time through the loop the ">" mode replaces the previous file with an empty
one. Then you write a single sequence to that file with this line

$out->write_seq($seqobj);


So when you are done, you just have the last sequence in the output file.

If you move the opening of the output filehandle outside the loop (it needs
to be done only once), then it should work as you expect.
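
To see the effect without contacting NCBI, here is a minimal core-Perl sketch. It writes three placeholder records (accessions from the list, each with a dummy sequence "MSEQ") twice: once reopening the file with ">" on every pass, as the original script does, and once opening it a single time before the loop. The count_records() helper and the temporary directory are illustrative assumptions, not BioPerl code; the ">" at the start of the -file argument is the same Perl open-mode prefix, so the behaviour should carry over to Bio::SeqIO.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Placeholder data: three accessions from the list, each given a dummy sequence.
my @ids = qw(YP_003107578.1 YP_003106103.1 YP_003106552.1);
my $dir = tempdir(CLEANUP => 1);    # scratch directory, deleted on exit

# Count FASTA records by counting header lines that start with '>'.
sub count_records {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't read $path: $!\n";
    my $n = grep { /^>/ } <$fh>;
    close $fh;
    return $n;
}

# Pattern from the original script: the file is reopened with '>' on every
# pass, and '>' truncates it, so only the last record survives.
my $inside = "$dir/inside.fasta";
for my $id (@ids) {
    open my $out, '>', $inside or die "Can't write $inside: $!\n";
    print $out ">$id\nMSEQ\n";
    close $out;
}

# Suggested pattern: open once before the loop, write every record, close once.
my $once = "$dir/once.fasta";
open my $out, '>', $once or die "Can't write $once: $!\n";
print $out ">$_\nMSEQ\n" for @ids;
close $out;

print "opened inside the loop: ", count_records($inside), " record(s)\n";    # 1
print "opened once:            ", count_records($once),   " record(s)\n";    # 3
```

Run as-is, it should report one record for the file reopened inside the loop and three for the file opened once.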

Also, I notice the newline characters are not being removed from your
sequence IDs (actually I'm a little surprised that the sequences are being
retrieved). Just to be safe, you may want to add the line

chomp @lista;


after

my @lista = <INFILE>;
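
Why chomp matters here: lines read with <INFILE> keep their trailing newline, so in the original loop the test $seq eq '' can never be true, not even for a blank line. This core-Perl sketch hard-codes what @lista might contain (two accessions and one blank line, an assumption standing in for the real file) and counts empty entries before and after chomp.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Simulate what "my @lista = <INFILE>;" returns: every element keeps its
# trailing "\n", and a blank line in the file comes back as "\n", not ''.
my @lista = ("YP_003107578.1\n", "YP_003106103.1\n", "\n");

my $empty_before = grep { $_ eq '' } @lista;    # 0: the eq '' test never matches
chomp @lista;                                   # strip the newline from every element
my $empty_after  = grep { $_ eq '' } @lista;    # 1: the blank line is now ''

print "empty entries before chomp: $empty_before\n";
print "empty entries after chomp:  $empty_after\n";
print "first ID is now '$lista[0]'\n";

# After chomping, skip blank lines instead of dying on them.
my @ids = grep { length } @lista;
print "IDs to fetch: @ids\n";
```

Once the IDs are chomped, a blank line would make the original die("empty list") fire partway through the list; the last two lines of the sketch assume that skipping blank lines is what you actually want.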




Dave
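
Putting both suggestions together, a corrected script might look like the sketch below. It is untested against live NCBI and makes a few assumptions beyond what the thread says: the list file is taken from the command line instead of STDIN, read_accessions() and fetch_to_fasta() are hypothetical helper names, blank lines and surrounding whitespace are skipped rather than treated as errors, a failed lookup is reported with warn instead of stopping the run, and the unused Bio::DB::GenBank import is dropped. The GenPept handle and the Bio::SeqIO output are each created once, before the loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read accessions, one per line, dropping newlines, surrounding whitespace
# (including a stray "\r" from Windows-edited files) and blank lines.
sub read_accessions {
    my ($path) = @_;
    open my $in, '<', $path or die "Can't open $path: $!\n";
    my @accs;
    while (my $line = <$in>) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;
        push @accs, $line if length $line;
    }
    close $in;
    return @accs;
}

# Fetch every accession from NCBI and write all records to ONE FASTA file.
sub fetch_to_fasta {
    my ($out_file, @accs) = @_;

    # Loaded here rather than with "use" so read_accessions() can be used
    # (and tested) on a machine without BioPerl installed.
    require Bio::DB::GenPept;
    require Bio::SeqIO;

    # Create the database handle and the output stream once, outside the loop.
    my $db  = Bio::DB::GenPept->new(-format => 'Fasta');
    my $out = Bio::SeqIO->new(-file => ">$out_file", -format => 'fasta');

    for my $acc (@accs) {
        # Wrapped in eval in case the lookup dies (e.g. an unknown accession).
        my $seqobj = eval { $db->get_Seq_by_acc($acc) };
        if ($seqobj) {
            $out->write_seq($seqobj);
        }
        else {
            warn "Could not retrieve $acc\n";
        }
    }
}

# Usage: perl fetch_list.pl accession_list.txt
if (@ARGV) {
    my @lista = read_accessions($ARGV[0]);
    die "empty list\n" unless @lista;
    fetch_to_fasta('extracted_seqs.fasta', @lista);
}
```

Run it as "perl fetch_list.pl accessions.txt" (the script name is arbitrary); all retrieved sequences should end up in extracted_seqs.fasta, and it needs network access to NCBI.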

Re: Trouble retrieving multiple sequences from NCBI in a single list query

by Hotz, Hans-Rudolf

Hi

try

my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
                                     ^

This way you no longer overwrite your existing file, but append each new
sequence to it.
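
The ">" and ">>" prefixes in the -file argument are Perl's ordinary open modes: ">" truncates the file, while ">>" appends and creates it if needed. One thing to watch with ">>": nothing empties the file between runs, so running the script a second time adds a second copy of every sequence. The core-Perl sketch below simulates two runs each way; append_record() and run_script() are hypothetical stand-ins for the loop body and the whole script, and MSEQ is a placeholder sequence. Deleting the output once before the loop (with unlink) is one simple way to start each run clean.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/extracted_seqs.fasta";
my @ids  = qw(YP_003107578.1 YP_003106103.1);

# Append one record; '>>' creates the file if it does not exist yet.
sub append_record {
    my ($path, $id) = @_;
    open my $out, '>>', $path or die "Can't append to $path: $!\n";
    print $out ">$id\nMSEQ\n";    # MSEQ is a placeholder sequence
    close $out;
}

# Stand-in for one run of the script; optionally start from an empty file.
sub run_script {
    my ($fresh) = @_;
    unlink $file if $fresh;        # remove output left over from a previous run
    append_record($file, $_) for @ids;
}

# Count FASTA records by counting header lines that start with '>'.
sub count_records {
    open my $fh, '<', $file or die "Can't read $file: $!\n";
    return scalar grep { /^>/ } <$fh>;
}

run_script(0) for 1 .. 2;
my $stale = count_records();       # 4: the second run appended to the first
run_script(1) for 1 .. 2;
my $fresh = count_records();       # 2: each run starts from an empty file

print "append only: $stale records; unlink first: $fresh records\n";
```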

Regards, Hans



On 11/4/09 9:43 AM, "jluis.lavin@..." <jluis.lavin@...>
wrote:

> The code I've written seems to read my list and retrieve the sequences, but
> it kind of overwrites them so that I only get the last sequence on the list.

Re: Trouble retrieving multiple sequences from NCBI in a single list query

by jluis.lavin

Thank you very, very much, Dave.
I've had a really frustrating time trying to find out what I was doing
wrong; it was so frustrating that I was about to quit Bioperl.
Now I can try to focus on BLAST parsing for my comparative genomic analysis.

You're great on this mailing list, because you give fast and neat advice
to all the questions asked here by newbies like me ;)


On Wed, 4 November 2009, 10:52, Dave Messina wrote:

> If you move the opening of the output filehandle outside the loop (it
> needs to be done only once), then it should work as you expect.

Re: Trouble retrieving multiple sequences from NCBI in a single list query

by jluis.lavin

Thank you very much for your answer, Hans!!!
It works perfectly; also a neat and fast solution, like Dave's.

Blessings to you all ;)

On Wed, 4 November 2009, 11:05, Hotz, Hans-Rudolf wrote:

> try
>
> my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
>                                      ^

Re: Trouble retrieving multiple sequences from NCBI in a single list query

by Dave Messina

Aw shucks, José, glad I could be of help. There are plenty of people who
answer questions around here, but my timezone sometimes gives me an
advantage for the European ones. :)


Dave

Re: Trouble retrieving multiple sequences from NCBI in a single list query

by Mark A. Jensen

True, Dave, you compete only with crazed east coast core developers who're doing
"just one more thing" at 2am....
----- Original Message -----
From: "Dave Messina" <David.Messina@...>
To: <jluis.lavin@...>
Cc: <bioperl-l@...>
Sent: Wednesday, November 04, 2009 9:11 AM
Subject: Re: [Bioperl-l] Trouble retrieving multiple sequences from NCBI in a
single list query


> Aw shucks, José, glad I could be of help. There are plenty of people who
> answer questions around here, but my timezone sometimes gives me an
> advantage for the European ones. :)