protein graft, pdb cleanup

View: New views
7 Messages — Rating Filter:   Alert me  

protein graft, pdb cleanup

by wkadlo :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message


i have grafted a cap of a protein onto another uncapped protein, and
now the pdb file is not consistent.

i am about to write a perl script to fix up the numbering and remove
the redundant terminus.

are there already tools to do such a thing?

_______________________________________________
Molmodel mailing list
Molmodel@...
http://www.bio.net/biomail/listinfo/molmodel

Re: protein graft, pdb cleanup

by Raoul Fleckman :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 2006-11-24, wkadlo <gkowadlo@...> wrote:
>
> i have grafted a cap of a protein onto another uncapped protein, and
> now the pdb file is not consistent.
>
> i am about to write a perl script to fix up the numbering and remove
> the redundant terminus.
>
> are there already tools to do such a thing?

there might be such a script available, but that would depend greatly
what you mean by "grafted".  it's not clear if you did that grafting
at the petrie-dish or the in-silico level. and if it is small change,
(oh, let's say less than 25 residues), it might be easier, and even
less error-prone to just perform that edit once in an editor.

if you didn't perform this graft at the petrie-dish level (followed by
a new structural data set), how do you know the new xyz coordinates?
_______________________________________________
Molmodel mailing list
Molmodel@...
http://www.bio.net/biomail/listinfo/molmodel

Re: protein graft, pdb cleanup

by wkadlo :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

I did the grafting with a hypothetical protein in the program Discovery
(Accelrys).
so i have all the 3D coordinates, but the numbering in the file is
wrong (each fragment has retained the numbering from the original pdb's
that they came from).



Raoul Fleckman wrote:

> On 2006-11-24, wkadlo <gkowadlo@...> wrote:
> >
> > i have grafted a cap of a protein onto another uncapped protein, and
> > now the pdb file is not consistent.
> >
> > i am about to write a perl script to fix up the numbering and remove
> > the redundant terminus.
> >
> > are there already tools to do such a thing?
>
> there might be such a script available, but that would depend greatly
> what you mean by "grafted".  it's not clear if you did that grafting
> at the petrie-dish or the in-silico level. and if it is small change,
> (oh, let's say less than 25 residues), it might be easier, and even
> less error-prone to just perform that edit once in an editor.
>
> if you didn't perform this graft at the petrie-dish level (followed by
> a new structural data set), how do you know the new xyz coordinates?

_______________________________________________
Molmodel mailing list
Molmodel@...
http://www.bio.net/biomail/listinfo/molmodel

Re: protein graft, pdb cleanup

by Raoul Fleckman :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 2006-11-27, wkadlo <gkowadlo@...> wrote:
> I did the grafting with a hypothetical protein in the program Discovery
> (Accelrys).
> so i have all the 3D coordinates, but the numbering in the file is
> wrong (each fragment has retained the numbering from the original pdb's
> that they came from).

you did "grafting" with "Discovery" which somehow placed the modeled in
bit into a reasonable 3D context (we hope...) and yet the software
having done this failed to simply renumber the atom-sequence-numbers
(and the residue sequence numbers for you)?  hmm...

ok..whatever. please keep in mind that the PDB format is old. it has its
roots in punch cards, (please don't ask what that is, if you're too
young to know).  therefore, its fields are column number sensitive; and
it is even possible to have numeric fields get large enough so that no
white space is left between fields.  therefore, the attached perl script
is designed in an odd (FORTRANish) way to deal with that.  if you just
run it on a PDB input without modification, it should just output its
input.  that is, you need to figure out what the new atom and residue
sequence numbers ($asn and $rsn, respectively in the script) should be
to continue on after the unchanging sequence and edit the script
accordingly - as per the comments in the script.

here's a second warning: i have observed software that connects residue
N to residue N+1 no matter how many Angstroms might separate them; and
i've seen software that refuses to do so.  so if your "i have all the 3D
coordinates" up there isn't quite right, get ready to view some
pretzels.

here's the script appended. it's probably only worth slightly more than
you paid for it; but maybe it'll save you some time.  good luck:

#-----------
#!/usr/bin/perl
#         1         2         3         4         5         6         7
#1234567890123456789012345678901234567890123456789012345678901234567890123456
#ATOM      1  CB  ASP A   8       -3.48   27.95   19.71  1.00  36.1      A    
#ATOM   1439  OW  WAT A1259      38.232  42.977   2.328  5.81  7.20   8    10
#ATOM      1  CB  ASP     8      -3.220  27.537  20.293  0.00 49.28   6
# put $cid in $segid if $segid is blank or $segid in $cid if $cid is blank
# change nothing if already there  
# assumes segid and cid are BOTH one character
# add one to each residue number
# 1- 6  6 $id    card id
# 7-11  5 $asn   atom sequence number **
#12     1 void
#13-16  4 $anam  atom name
#17     1 $ali   alternate location indicator
#18-20  3 $rnam  residue name
#21     1 void
#22     1 $cid   chain identifier
#23-26  4 $rsn   residue sequence number **
#27     1 $ci    code for insertions (e.g. 66A, 66B)
#28-30  3 void   !! is this right ?? there's something up there!!
#31-38  8 $x     x coordinate
#39-46  8 $y     y coordinate
#47-54  8 $z     z coordinate
#55-60  6 $occ   occupany
#61-66  6 $bval  temperature factor
#67     1 void
#68-70  3 $fn    footnote
#71-72  2 void
#73     1 $segid
my $intemp = 'A6 A5 x A4 A A3 x A A4 A x3 A8 A8 A8 A6 A6 x A3 x2 A' ;
#since x generates a zero byte pack must use a different output pattern
my $outemp = 'A6 A5 A A4 A A3 A A A4 A A3 A8 A8 A8 A6 A6 A A3 A2 A A3';
my $K = ' ';
my $K2 = ' ' x 2;
my $K3 = ' ' x 3;
my $sol = $K3;
#a smoother way around this *might* be to make all fields gobble the x before
#it, that is, instead of 'A5 x A4' have 'A5 A5'

while(<>) {
  unless (/^ATOM/) {
    print;
    next;
  }
  chomp;
  my $len = length $_;
  if($len < 76) { $_ .= (' ' x (76 - $len)); }

  ($id,$asn,$anam,$ali,$rnam,$cid,$rsn,$ci,$x,$y,$z,$occ,
   $bval,$fn,$segid) = unpack($intemp,$_);

  # ----  here alter stuff with the fields (most likely: $asn and $rsn)
  #
  # for example, if the unchanging part of the protein ends with its
  # final residue number ($rsn) equal to 197, then you want to replace
  # the $rsn below with "$rsnx++" and initialize it before the above
  # "while(<>) {" with "my $rsnx = 198"
  # ... and quite similarly for the atom sequence number ($asn)
  #
  # then you might be able to do something like:
  #
  # this_script.pl new_attachment.pdb >> copy_of_constant_chain.pdb
  #
  # ----

  my $card = pack($outemp,$id,$asn,$K,$anam,$ali,$rnam,$K,$cid,
    $rsn,$ci,$K3,$x,$y,$z,$occ,$bval,$K,$K3,$K2,$segid,$sol);
  print $card, "\n" ;  
}



_______________________________________________
Molmodel mailing list
Molmodel@...
http://www.bio.net/biomail/listinfo/molmodel

Re: protein graft, pdb cleanup

by Raoul Fleckman :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

On 2006-11-27, Raoul Fleckman <raoul.fleckman@...> wrote:

> On 2006-11-27, wkadlo <gkowadlo@...> wrote:
>
>   # ----  here alter stuff with the fields (most likely: $asn and $rsn)
>   #
>   # for example, if the unchanging part of the protein ends with its
>   # final residue number ($rsn) equal to 197, then you want to replace
>   # the $rsn below with "$rsnx++" and initialize it before the above
>   # "while(<>) {" with "my $rsnx = 198"
>   # ... and quite similarly for the atom sequence number ($asn)
>...

bad form replying to myself... but the residue number needs to be set as
an increment not on each line as i indicated here, but rather as an
offset of whatever the input is: "$rsn += xxx"; where "xxx" will change
to old residue sequence number to one beyond the last one seen in the
constant part of the protein listing.  ...sorry about that.  again, good
luck.


_______________________________________________
Molmodel mailing list
Molmodel@...
http://www.bio.net/biomail/listinfo/molmodel

Re: protein graft, pdb cleanup

by wkadlo :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

hi, thanks a lot for taking the time to reply with that very helpful
advice and script.
i'll give it a go! :-)


Raoul Fleckman wrote:

> On 2006-11-27, Raoul Fleckman <raoul.fleckman@...> wrote:
> > On 2006-11-27, wkadlo <gkowadlo@...> wrote:
> >
> >   # ----  here alter stuff with the fields (most likely: $asn and $rsn)
> >   #
> >   # for example, if the unchanging part of the protein ends with its
> >   # final residue number ($rsn) equal to 197, then you want to replace
> >   # the $rsn below with "$rsnx++" and initialize it before the above
> >   # "while(<>) {" with "my $rsnx = 198"
> >   # ... and quite similarly for the atom sequence number ($asn)
> >...
>
> bad form replying to myself... but the residue number needs to be set as
> an increment not on each line as i indicated here, but rather as an
> offset of whatever the input is: "$rsn += xxx"; where "xxx" will change
> to old residue sequence number to one beyond the last one seen in the
> constant part of the protein listing.  ...sorry about that.  again, good
> luck.

_______________________________________________
Molmodel mailing list
Molmodel@...
http://www.bio.net/biomail/listinfo/molmodel

Re: protein graft, pdb cleanup

by wkadlo :: Rate this Message:

Reply to Author | View Threaded | Show Only this Message

thanks again, that was really helpful, and improved my perl as well.


wkadlo wrote:

> hi, thanks a lot for taking the time to reply with that very helpful
> advice and script.
> i'll give it a go! :-)
>
>
> Raoul Fleckman wrote:
>
> > On 2006-11-27, Raoul Fleckman <raoul.fleckman@...> wrote:
> > > On 2006-11-27, wkadlo <gkowadlo@...> wrote:
> > >
> > >   # ----  here alter stuff with the fields (most likely: $asn and $rsn)
> > >   #
> > >   # for example, if the unchanging part of the protein ends with its
> > >   # final residue number ($rsn) equal to 197, then you want to replace
> > >   # the $rsn below with "$rsnx++" and initialize it before the above
> > >   # "while(<>) {" with "my $rsnx = 198"
> > >   # ... and quite similarly for the atom sequence number ($asn)
> > >...
> >
> > bad form replying to myself... but the residue number needs to be set as
> > an increment not on each line as i indicated here, but rather as an
> > offset of whatever the input is: "$rsn += xxx"; where "xxx" will change
> > to old residue sequence number to one beyond the last one seen in the
> > constant part of the protein listing.  ...sorry about that.  again, good
> > luck.

_______________________________________________
Molmodel mailing list
Molmodel@...
http://www.bio.net/biomail/listinfo/molmodel