You want to try Bio::SearchIO, I think. It's not quite clear what you
for a nice introduction to the Bio::SearchIO system by its authors. They
You didn't waste your time writing regexps, by the way. For a Perl
> Hi,
> I need a little help parsing a file. I searched the BioPerl modules,
> but there are a lot of them and I don't know where to start. I found
> modules for all kinds of databases and web sites, but not for my
> favorite, PDBsum....so I parsed a lot of things on my own, even though
> I'm new to Perl....but now I need help...because I need to parse a
> FASTA file that results from aligning sequences...I need to extract
> the aligned sequences, only for the PDB entries in my list....
>
>
> my fasta file is like:
>
> Query: /ebi/research/thornton/tmp/sas307986/seq.fasta
> 1>>>Sequence 3e7e:A - 333 aa
> Library: /ebi/research/thornton/www/databases/html/pdbsum/data/pdblib
> 17840403 residues in 79353 sequences
>
> opt E()
> < 20 286 0:===
> 22 1 0:= one = represents 135 library sequences
> 24 1 0:=
> 26 0 2:*
> 28 21 18:*
> 30 36 109:*
> 32 237 421:== *
> 34 956 1140:========*
> 36 1924 2342:=============== *
> 38 3591 3871:=========================== *
> 40 4904 5400:===================================== *
> 42 6750 6600:================================================*=
> 44 7145 7281:=====================================================*
> 46 8047 7416:======================================================*=====
> .........
>
>>>2np8:A (159 aa)
> initn: 125 init1: 72 opt: 136 Z-score: 168.6 bits: 38.5 E(): 0.011
> Smith-Waterman score: 136; 26.0% identity (57.1% similar) in 154 aa
> overlap (59-204:13-153)
>
> 10 20 30 40 50 60
> Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
> ::
> 2np8:A QWALEDFEIGRPLG
> 10
>
> 70 80 90 100 110
> Sequen EGAFAQVYEATQNKQKFVL--KVQKPANPWEFYIGTQLMER--LKPSMQH-MFMKFYSAH
> .: :..:: : ....::.: :: :. . . :: .. .. ..: ....:.
> 2np8:A KGKFGNVYLAREKQSKFILALKVLFKAQLEKAGVEHQLRREVEIQSHLRHPNILRLYG--
> 20 30 40 50 60 70
>
> 120 130 140 150 160 170
> Sequen LFQNGS--VLVGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEII
> :.... :. : ::. .. .. :. . .. .. . :. ..:
> 2np8:A YFHDATRVYLILEYAPLGTVYRELQKLSKFDEQR-----TATYITELANALSYCHSKRVI
> 80 90 100 110 120
>
> 180 190 200 210 220 230
> Sequen HGDIKPDNFILGNGFLEQSAG-LALIDLGQSIDMKLFPKGTIFTAKCETSGFQCVEMLSN
> : ::::.:..:: ::: : . :.: :.
> 2np8:A HRDIKPENLLLG------SAGELKIADFGWSVHAPSSR
> 130 140 150
>
> 240 250 260 270 280 290
> Sequen KPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLNIP
>
> 300 310 320 330
> Sequen DCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC
>
>>>2ojg:A (337 aa)
> initn: 85 init1: 53 opt: 140 Z-score: 168.1 bits: 39.5 E(): 0.012
> Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
> overlap (46-252:1-204)
>
> 10 20 30 40 50 60
> Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
> :..: . . . .. :
> 2ojg:A FDVGPRYTNLSYI-G
> 10
>
> 70 80 90 100 110
> Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
> :::...: : .: .: . ..: .:.: : ....: ....: ...
> 2ojg:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
> 20 30 40 50 60
>
> 120 130 140 150 160 170
> Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
> .... . ..: :... .::: . . . . : ...: .. .:. ..
> 2ojg:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
> 70 80 90 100 110 120
>
> 180 190 200 210 220 230
> Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
> .: :.::.:..:.. . : . :.: . . . ..: : .. : ::
> 2ojg:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
> 130 140 150 160 170 180
>
> 240 250 260 270 280 290
> Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
> ..: .. .:: ..:. . ::
> 2ojg:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
> 190 200 210 220 230 240
>
> 300 310 320 330
> Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC
>
> 2ojg:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
> 250 260 270 280 290 300
>
> 2ojg:A DEPIAEAPFKFELDDLPKEKLKELIFEETARFQPG
> 310 320 330
>
>>>2oji:A (344 aa)
> initn: 85 init1: 53 opt: 140 Z-score: 168.0 bits: 39.5 E(): 0.012
> Smith-Waterman score: 140; 20.3% identity (56.2% similar) in 217 aa
> overlap (46-252:5-208)
>
> 10 20 30 40 50 60
> Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
> :..: . . . .. :
> 2oji:A RGQVFDVGPRYTNLSYI-G
> 10
>
> 70 80 90 100 110
> Sequen EGAFAQVYEATQNKQKFVLKVQKPANPWEFYIGTQ-LMERLKPSMQHMFMKFYSAHLFQN
> :::...: : .: .: . ..: .:.: : ....: ....: ...
> 2oji:A EGAYGMVCSAYDNVNKVRVAIKK-ISPFEHQTYCQRTLREIK-----ILLRFRHENIIGI
> 20 30 40 50 60 70
>
> 120 130 140 150 160 170
> Sequen GSVL-------VGELYSYGTLLNAINLYKNTPEKVMPQGLVISFAMRMLYMIEQVHDCEI
> .... . ..: :... .::: . . . . : ...: .. .:. ..
> 2oji:A NDIIRAPTIEQMKDVYIVQDLMET-DLYKLLKTQHLSNDHICYFLYQILRGLKYIHSANV
> 80 90 100 110 120 130
>
> 180 190 200 210 220 230
> Sequen IHGDIKPDNFILGNGFLEQSAGLALIDLGQS-IDMKLFPKGTIFTAKCETSGFQCVE-ML
> .: :.::.:..:.. . : . :.: . . . ..: : .. : ::
> 2oji:A LHRDLKPSNLLLNT-----TCDLKICDFGLARVADPDHDHTGFLTEYVATRWYRAPEIML
> 140 150 160 170 180
>
> 240 250 260 270 280 290
> Sequen SNKPWNYQIDYFGVAATVYCMLFGTYMKVKNEGGECKPEGLFRRLPHLDMWNEFFHVMLN
> ..: .. .:: ..:. . ::
> 2oji:A NSKGYTKSIDIWSVGCILAEMLSNRPIFPGKHYLDQLNHILGILGSPSQEDLNCIINLKA
> 190 200 210 220 230 240
>
> 300 310 320 330
> Sequen IPDCHHLPSLDLLRQKLKKVFQQHYTNKIRALRNRLIVLLLEC
>
> 2oji:A RNYLLSLPHKNKVPWNRLFPNADSKALDLLDKMLTFNPHKRIEVEQALAHPYLEQYYDPS
> 250 260 270 280 290 300
>
> 2oji:A DEPIAEAPFKFDMELDDLPKEKLKELIFEETARFQPGY
> 310 320 330 340
>
> .......
> I show only part of the file...what if I want, for example, only those
> two alignments? Are there modules to parse this...because I've tried
> to parse it with regexes but....without results :-(....
> If anyone has suggestions for modules or anything else, I'll be very
> happy to learn
> thanks
> Paola
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l@...
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
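
For the task in the quoted message (keeping only the alignments for the PDB IDs in a list), here is a minimal plain-Perl sketch using regexps. It is not the Bio::SearchIO route and makes no claim to handle every variant of the format. The names `wanted_records` and `aligned_rows` are hypothetical helpers. The sample text is a short excerpt of the quoted output. The sketch assumes every hit starts with a line like ">>2np8:A (159 aa)" and that each sequence row starts with a label of at most six characters ("Sequen" for the query, the hit ID for the hit). It drops the padding at the start of each row, so the two strings are no longer lined up column by column.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

# Hypothetical helper: split the report into per-hit records and keep only
# the hits whose ID is a key of %$wanted.
# Assumption: each hit starts with a header line such as ">>2np8:A (159 aa)".
# The last kept record runs to the end of the input.
sub wanted_records {
    my ($fh, $wanted) = @_;
    my (%records, $current);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^>>(\S+)/) {
            # New hit: remember its ID only if it is in the list.
            $current = $wanted->{$1} ? $1 : undef;
            $records{$current} = [] if defined $current;
        }
        push @{ $records{$current} }, $line if defined $current;
    }
    return \%records;
}

# Hypothetical helper: join all sequence rows of one record that carry the
# given label. Assumption: row labels are cut to six characters.
sub aligned_rows {
    my ($record, $label) = @_;
    my $prefix = quotemeta(substr($label, 0, 6));
    my $joined = '';
    for my $line (@$record) {
        $joined .= $1 if $line =~ /^$prefix\s+(\S+)/;
    }
    return $joined;
}

# Excerpt of the output from the quoted message, used as in-memory input.
my $sample = <<'END';
>>2np8:A (159 aa)
initn: 125 init1: 72 opt: 136 Z-score: 168.6 bits: 38.5 E(): 0.011
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
2np8:A QWALEDFEIGRPLG
>>2ojg:A (337 aa)
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
2ojg:A FDVGPRYTNLSYI-G
>>2oji:A (344 aa)
Sequen NFIVGNPWDDKLIFKLLSGLSKPVSSYPNTFEWQCKLPAIKPKTEFQLGSKLVYVHHLLG
2oji:A RGQVFDVGPRYTNLSYI-G
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $records = wanted_records($fh, { '2np8:A' => 1, '2oji:A' => 1 });
close $fh;

for my $id (sort keys %$records) {
    print "$id\n";
    print "  query: ", aligned_rows($records->{$id}, 'Sequen'), "\n";
    print "  hit:   ", aligned_rows($records->{$id}, $id), "\n";
}
```

To run it on a real file, open that file instead of the in-memory sample and build the wanted-ID hash from your own list. If the column positions matter, keep the raw lines stored in each record rather than the joined strings.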