Trouble retrieving multiple sequences from NCBI in a single list query

Hello all,

I'm a newbie having terrible trouble trying to retrieve a list of multiple sequences from NCBI and write them to a single file in FASTA format.
The code I've written seems to read my list and retrieve the sequences, but it kind of overwrites them, so that I only get the last sequence on the list.
I've been told to ask the people on this mailing list for help, since you may have come across this problem too, or at least will know how to solve it...

Here is my code. It reads the list name from STDIN, reads the list into an array, and loops over it (stopping when the list ends), retrieving one sequence per iteration and writing it to a FASTA file. I only get one sequence back, although it seems to perform the retrieval for each of the sequences on the list...

#!/usr/bin/perl -w
use strict;
use Bio::DB::GenPept;
use Bio::DB::GenBank;
use Bio::SeqIO;
print "Enter your list name:";
my $archivo=<STDIN>;
chomp $archivo;
die ("Can't open input\n") unless (open(INFILE, $archivo));
my @lista = <INFILE>;
foreach my $seq (@lista) {
    if ($seq eq '') {
        die ("empty list")
    }
    else {
        my $db = new Bio::DB::GenPept("-format" => "Fasta");
        my $seqobj = $db->get_Seq_by_acc($seq);
        my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta",
                                  -format => 'fasta');
        $out->write_seq($seqobj);
    }
}
exit;

An example list of sequences can be this one:

YP_003107578.1
YP_003106103.1
YP_003106552.1
YP_003106560.1
YP_003107053.1
YP_003107450.1
YP_003108000.1
YP_003105023.1
YP_003105264.1

Thanks in advance for your help ;)

-- 
José Luis Lavín Trueba, PhD
Dpto. de Producción Agraria
Grupo de Genética y Microbiología
Universidad Pública de Navarra
31006 Pamplona
Navarra
SPAIN
_______________________________________________
Bioperl-l mailing list
Bioperl-l@...
http://lists.open-bio.org/mailman/listinfo/bioperl-l
Re: Trouble retrieving multiple sequences from NCBI in a single list query

> The code I've written seems to read my list and retrieve the sequences, but
> it kind of overwrites them, so that I only get the last sequence on the list.

With this line

my $out = new Bio::SeqIO (-file => ">extracted_seqs.fasta", -format => 'fasta');

you are opening the filehandle for the output file inside your loop, so on every iteration the existing file is replaced with an empty one. Then you write a single sequence to that file with this line

$out->write_seq($seqobj);

So when you are done, you just have the last sequence in the output file.

If you move the opening of the output filehandle outside the loop (it needs to be done only once), then it should work as you expect.

Also, I notice the newline characters are not being removed from your sequence IDs (actually I'm a little surprised that the sequences are being retrieved). Just to be safe, you may want to add the line

chomp @lista;

after

my @lista = <INFILE>;

Dave
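Putting Dave's two suggestions together (open the output once, before the loop, and chomp the IDs), the script might look roughly like the sketch below. This is only an illustration, not code posted in the thread: the helper name fetch_to_file, taking the list file from the command line instead of prompting on STDIN, skipping blank lines rather than dying on them, and wrapping each retrieval in eval are all assumptions made here. The BioPerl modules are loaded with require only when a file name is given, so the helper itself can be tried without BioPerl or network access.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): fetch every accession in @$ids
# through $db and write each one to the already-open $out.
# $db must provide get_Seq_by_acc(); $out must provide write_seq().
sub fetch_to_file {
    my ($db, $out, $ids) = @_;
    my $written = 0;
    for my $id (@$ids) {
        next unless length $id;                  # skip blank lines in the list
        # eval covers both failure styles: a thrown exception or an undef return
        my $seqobj = eval { $db->get_Seq_by_acc($id) };
        unless ($seqobj) {
            warn "Could not retrieve $id\n";
            next;
        }
        $out->write_seq($seqobj);
        $written++;
    }
    return $written;
}

# Usage (assumed): perl fetch_seqs.pl mylist.txt
if (@ARGV) {
    require Bio::DB::GenPept;
    require Bio::SeqIO;

    open(my $in, '<', $ARGV[0]) or die "Can't open $ARGV[0]: $!\n";
    chomp(my @lista = <$in>);                    # strip the newline from each ID
    close $in;
    die "empty list\n" unless @lista;

    # Both objects are created once, before the loop, so the output file
    # is opened (and truncated) a single time.
    my $db  = Bio::DB::GenPept->new(-format => 'Fasta');
    my $out = Bio::SeqIO->new(-file   => '>extracted_seqs.fasta',
                              -format => 'fasta');

    my $n = fetch_to_file($db, $out, \@lista);
    print "Wrote $n of ", scalar(@lista), " sequences\n";
}
```

Because the database handle and the output stream are passed in, the loop can be checked with stand-in objects before running it against NCBI.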
Re: Trouble retrieving multiple sequences from NCBI in a single list query

Hi

try

my $out = new Bio::SeqIO (-file => ">>extracted_seqs.fasta",
                                    ^

this way you no longer overwrite your existing file, but append the next sequence.

Regards, Hans
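The difference between the two prefixes can be seen with plain Perl filehandles, without BioPerl. The sketch below assumes that the ">" and ">>" prefixes in the -file argument behave like the corresponding modes of Perl's open; the helper name write_in_loop and the temporary file names are made up for this illustration. It reopens the file once per record, as the original loop did, first in truncate mode and then in append mode.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: reopen $path with $mode for every record (mimicking
# an open inside the loop), then return the lines the file ends up holding.
sub write_in_loop {
    my ($mode, $path, @records) = @_;
    for my $rec (@records) {
        open(my $fh, $mode, $path) or die "Can't open $path: $!\n";
        print {$fh} "$rec\n";
        close $fh or die "Can't close $path: $!\n";
    }
    open(my $in, '<', $path) or die "Can't read $path: $!\n";
    chomp(my @lines = <$in>);
    close $in;
    return @lines;
}

my $dir = tempdir(CLEANUP => 1);
my @ids = qw(YP_003107578.1 YP_003106103.1 YP_003106552.1);

# '>' truncates on every open; '>>' keeps what is already in the file.
my @truncated = write_in_loop('>',  "$dir/truncate.txt", @ids);
my @appended  = write_in_loop('>>', "$dir/append.txt",   @ids);

print "'>'  kept: @truncated\n";
print "'>>' kept: @appended\n";

# A second run in append mode adds to the lines written by the first run.
my @rerun = write_in_loop('>>', "$dir/append.txt", @ids);
print "'>>' after a second run: ", scalar(@rerun), " lines\n";
```

With ">" only the last ID survives, while ">>" keeps all three. The second append run leaves six lines in the file, so with the append approach an output file left over from an earlier run needs to be deleted first, or the new sequences land after the old ones.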
Re: Trouble retrieving multiple sequences from NCBI in a single list query

Thank you very, very much, Dave,

I've had a really frustrating time trying to find out what I was doing wrong; it was so frustrating that I was about to quit BioPerl.
Now I can try to focus on BLAST parsing for my comparative genomic analysis.

You're great on this mailing list, because you give fast and neat advice on all the questions asked here by newbies like me ;)

-- 
Dr. José Luis Lavín Trueba
Re: Trouble retrieving multiple sequences from NCBI in a single list query

Thank you very much for your answer, Hans!!!

It works perfectly, and it is also a neat and fast solution, like Dave's.

Blessings to you all ;)

-- 
Dr. José Luis Lavín Trueba
Re: Trouble retrieving multiple sequences from NCBI in a single list query

Aw shucks, José, glad I could be of help. There are plenty of people who answer questions around here, but my timezone sometimes gives me an advantage for the European ones. :)

Dave
Re: Trouble retrieving multiple sequences from NCBI in a single list query

True, Dave, you compete only with crazed east coast core developers who're doing "just one more thing" at 2am....