I searched the web and found the answer. Quoting it here:
The error was thrown by my Bio::ASN1::EntrezGene module because it
expects a text file, while you fed it with a binary file. To use
gzipped ASN binary file from NCBI, download the NCBI gene2xml
(ftp://ftp.ncbi.nih.gov/toolbox/ncbi_tools/converters/by_program/gene2xml),
then use this syntax to run my parser on the binary files:
    my $parser = Bio::ASN1::EntrezGene->new(
        'file' => "gene2xml -i Homo_sapiens.ags.gz -c -x -b | "
    );
    # Homo_sapiens.ags.gz is the gzipped binary file directly downloaded from NCBI
Same syntax should be used when you're using SeqIO (thus SeqIO::entrezgene).
Mingyi
But there is still one thing: I want to parse "gene_info.gz" from NCBI Gene (ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/gene_info.gz).
It doesn't work. Does that mean "gene_info.gz" (tab-delimited, one line per GeneID, with the column header line as the first line of the file) is not the right format for Bio::ASN1::EntrezGene?
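Since the file is described as plain tab-delimited text rather than ASN.1, one option might be to read it with a simple split-based loop instead of Bio::ASN1::EntrezGene. Below is a minimal sketch under that assumption. The sub names parse_gene_info_line and read_gene_info are hypothetical (they are not part of BioPerl), the column names are taken from whatever the header line contains, and the leading '#' on the header line is an assumption that is stripped only if present.

```perl
use strict;
use warnings;
use IO::Uncompress::Gunzip qw($GunzipError);

# Hypothetical helper: split one tab-delimited record into a hash
# keyed by the column names read from the header line.
sub parse_gene_info_line {
    my ($columns, $line) = @_;
    chomp $line;
    my @fields = split /\t/, $line, -1;   # -1 keeps trailing empty fields
    my %record;
    @record{@$columns} = @fields;
    return \%record;
}

# Hypothetical helper: read a gzipped tab-delimited file, treat the
# first line as the column header, and call $callback once per record.
sub read_gene_info {
    my ($path, $callback) = @_;
    my $in = IO::Uncompress::Gunzip->new($path)
        or die "Cannot open $path: $GunzipError\n";

    my $header = $in->getline;
    die "Empty file: $path\n" unless defined $header;
    chomp $header;
    $header =~ s/^#\s*//;                 # assumption: header may start with '#'
    my @columns = split /\t/, $header;

    while (defined(my $line = $in->getline)) {
        next if $line =~ /^#/;            # skip any further comment lines
        $callback->(parse_gene_info_line(\@columns, $line));
    }
    $in->close;
    return \@columns;
}
```

For example, read_gene_info('gene_info.gz', sub { my $r = shift; print "$r->{GeneID}\t$r->{Symbol}\n" }) would print two columns per gene, assuming the header line names them GeneID and Symbol.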
zoujing wrote:
I am a green hand in BioPerl. When I ran perl with "parse_entrez_gene_example.pl Sus_scrofa.ags", I got this error message:
Data Error: none conforming data found on line 1 in Sus_scrofa.ags.
But Sus_scrofa.ags was downloaded from NCBI in ASN.1 format, so it should be the same as the Homo_sapiens file in the example. There should be no error, since the code is Mingyi's example.
I wonder why this happens. Should I change something about the file?