Aromatization & SDF bond type

by Vincent Le Guilloux

Hi,

I would like to talk a bit about aromaticity and SDF interpretation in
the CDK.

When I load a molecule, say benzene, that follows the SDF format for
bond types (orders 1 and 2), the CDK loads the molecule as is, and
sets the aromatic flag to 1 when aromaticity detection is performed.

When I load the same molecule but with all bond types defined at 4
(aromatic), the CDK will directly set the aromatic flag to 1, but will
define all bonds as SINGLE bonds.

Now if I pass a benzene molecule with bond types defined at 4 through
the CDK, and if I regenerate the SDF file without hydrogen, I will
get... cyclohexane. If I add explicit hydrogen, I will get something
that could be interpreted as benzene missing all double bonds, or
cyclohexane missing one hydrogen for each carbon atom. Below is a
small example snippet.
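
The round trip described above can be illustrated with a short, self-contained sketch. It is plain Python rather than CDK code; the `Bond` class and both helper functions are hypothetical names chosen for illustration. It assumes the V2000 bond-line layout (three-character columns for the two atom indices and the bond type) and a writer that emits only the stored order.

```python
from dataclasses import dataclass


@dataclass
class Bond:
    a: int          # 1-based index of the first atom
    b: int          # 1-based index of the second atom
    order: int      # stored bond order (1, 2 or 3)
    aromatic: bool  # aromatic flag, kept separately from the order


def read_bond_line(line: str) -> Bond:
    """Parse the first three 3-character fields of a V2000 bond line."""
    a, b, bond_type = int(line[0:3]), int(line[3:6]), int(line[6:9])
    if bond_type == 4:
        # Behaviour reported in this thread: flag set, order falls back to single.
        return Bond(a, b, order=1, aromatic=True)
    return Bond(a, b, order=bond_type, aromatic=False)


def write_bond_line(bond: Bond) -> str:
    """Write only the stored order, as a writer that ignores the flag would."""
    return f"{bond.a:3d}{bond.b:3d}{bond.order:3d}"


# Benzene with every ring bond given as type 4 (aromatic).
benzene_type4 = ["  1  2  4", "  2  3  4", "  3  4  4",
                 "  4  5  4", "  5  6  4", "  6  1  4"]

bonds = [read_bond_line(line) for line in benzene_type4]
round_trip = [write_bond_line(bond) for bond in bonds]
print(round_trip)  # every bond now carries type 1
```

Without hydrogens, the written bond block is indistinguishable from that of cyclohexane, even though every in-memory bond still has its aromatic flag set.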

Maybe I'm doing something wrong, in which case I'm talking for
nothing. If not, as such bond definition is not that uncommon (at
least to my modest knowledge), I think this is an important issue.

One possible solution would be to "dearomatize" the molecules, but the
current CDK implementation is limited to benzene, pyridine & pyrrole.

Just one question aside from this discussion: why not put the aromatic
bond flag in the output in such cases?

I also would have one suggestion on how to improve the CDK: defining a
clear (customizable), documented standardization protocol which would
include issues like aromatization/dearomatization, ionization,
tautomerization, 2D/3D cleaning... The best example to my knowledge is
ChemAxon's Standardizer:
http://chemaxon.com/jchem/doc/user/Standardizer.html

Well, one suggestion among many others I guess :)

Cheers!
Vincent



_______________________________________________
Cdk-user mailing list
Cdk-user@...
https://lists.sourceforge.net/lists/listinfo/cdk-user

Re: Aromatization & SDF bond type

by Rajarshi Guha-4


On Nov 3, 2009, at 11:06 AM, Vincent Le Guilloux wrote:

> When I load the same molecule but with all bond types defined at 4
> (aromatic), the CDK will directly set the aromatic flag to 1, but will
> define all bonds as SINGLE bonds.
>
> Now if I pass a benzene molecule with bond types defined at 4 through
> the CDK, and if I regenerate the SDF file without hydrogen, I will
> get... cyclohexane. If I add explicit hydrogen, I will get something
> that could be interpreted as benzene missing all double bonds, or
> cyclohexane missing one hydrogen for each carbon atom. Below is given
> a small snippet example.
>
> Maybe I'm doing something wrong, in which case I'm talking for
> nothing. If not, as such bond definition is not that uncommon (at
> least to my modest knowledge), I think this is an important issue.

Indeed, this is an important issue, and a number of bugs in different
subsystems occur because of it. Even two molecules read from SMILES for
the same structure, one written in aromatic form and one in Kekulé form,
will differ, even after aromaticity detection on both of them (because
the single/double bond assignments are not fixed).

The problem is that any method that looks (or needs to look) at bond
order will be confused by this. I suppose such methods should also be
updated to check aromaticity, if it's relevant.
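
A minimal sketch of what "also check aromaticity" could mean for a method that compares bonds. This is plain Python, not CDK code; `effective_bond_type` is a hypothetical helper, and the two bond lists are hand-written stand-ins for benzene read from Kekulé input and from aromatic input, both after aromaticity detection.

```python
def effective_bond_type(order: int, aromatic: bool) -> str:
    """Report 'aromatic' whenever the flag is set, otherwise the stored order."""
    if aromatic:
        return "aromatic"
    return {1: "single", 2: "double", 3: "triple"}[order]


# (order, aromatic flag) for the six ring bonds of benzene.
kekule_input = [(1, True), (2, True), (1, True), (2, True), (1, True), (2, True)]
aromatic_input = [(1, True)] * 6  # orders defaulted to single on input

# Comparing raw orders says the two molecules differ ...
same_by_order = [o for o, _ in kekule_input] == [o for o, _ in aromatic_input]

# ... while comparing flag-aware bond types says they are the same.
same_by_type = (
    [effective_bond_type(o, f) for o, f in kekule_input]
    == [effective_bond_type(o, f) for o, f in aromatic_input]
)

print(same_by_order, same_by_type)
```

The flag-aware comparison treats the stored order of an aromatic bond as irrelevant, which is one way to make the two input forms agree.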

> Just one question beside this discussion: why not putting the aromatic
> bond flag in the output in such cases?

After doing aromaticity detection, aromatic bonds are marked as such.  
I'm not sure I understand what you mean here.

>
> I also would have one suggestion on how to improve the CDK: defining a
> clear (customizable), documented standardization protocol which would
> include issues like aromatization/dearomatization, ionization,
> tautomerization, 2D/3D cleaning... The best example to my knowledge is
> the chemaxon's standardizer:
> http://chemaxon.com/jchem/doc/user/Standardizer.html


Indeed. This is useful and necessary. There is some code here at the  
NCGC that can be used for this. It is based on ChemAxon code, but  
should be convertible to the CDK (though it'll require some additional  
CDK methods to be implemented)

----------------------------------------------------
Rajarshi Guha        | NIH Chemical Genomics Center
http://www.rguha.net | http://ncgc.nih.gov
----------------------------------------------------
Every nonzero finite dimensional inner product
space has an orthonormal basis.
It makes sense, when you don't think about it.




Re: Aromatization & SDF bond type

by Vincent Le Guilloux

Quoting Rajarshi Guha <rajarshi.guha@...>:

>
> On Nov 3, 2009, at 11:06 AM, Vincent Le Guilloux wrote:
>
>> When I load the same molecule but with all bond types defined at 4
>> (aromatic), the CDK will directly set the aromatic flag to 1, but will
>> define all bonds as SINGLE bonds.
>>
>> Now if I pass a benzene molecule with bond types defined at 4 through
>> the CDK, and if I regenerate the SDF file without hydrogen, I will
>> get... cyclohexane. If I add explicit hydrogen, I will get something
>> that could be interpreted as benzene missing all double bonds, or
>> cyclohexane missing one hydrogen for each carbon atom. Below is given
>> a small snippet example.
>>
>> Maybe I'm doing something wrong, in which case I'm talking for
>> nothing. If not, as such bond definition is not that uncommon (at
>> least to my modest knowledge), I think this is an important issue.
>
> Indeed, this is an important issue and a number of bugs in different
> subsystems occur because of this. Even, two molecules input from the
> same SMILES but one is aromatic and one is kekule, will differ - even
> after aromaticity detection on both of them (because the single/double
> bond assignments are not fixed)
>
> The problem is that if a method (needs to) looks at bond order then it
> will be confused due to this.

Yes, but they will also perceive the wrong bond order. Setting bond
orders to 1 when a bond is aromatic is just wrong to my knowledge, and
dangerous, as illustrated.

> I suppose they should also be updated to check aromaticity, if it's relevant.
>

I think so.

In such cases, the bond order should be set to 'aromatic' instead of  
single, even if it's not really a bond order, strictly speaking. As  
you said, methods working on bond types should then check for  
aromaticity.

And this would fix the problem of the output, see below.

>> Just one question beside this discussion: why not putting the aromatic
>> bond flag in the output in such cases?
>
> After doing aromaticity detection, aromatic bonds are marked as such.
> I'm not sure I understand what you mean here.
>

I'm talking about SDF output:

If the user gives an aromatic structure as input, with aromatic bonds
defined using bond type 4 instead of single/double bonds, then, since
there is currently no satisfactory way to "dearomatize" a molecule, the
SDF output should write those bonds with the same type rather than as
single bonds. As illustrated, writing them as single bonds turns the
input structure into a different one.

This is, I think, a much better solution than setting all bonds to single.

>>
>> I also would have one suggestion on how to improve the CDK: defining a
>> clear (customizable), documented standardization protocol which would
>> include issues like aromatization/dearomatization, ionization,
>> tautomerization, 2D/3D cleaning... The best example to my knowledge is
>> the chemaxon's standardizer:
>> http://chemaxon.com/jchem/doc/user/Standardizer.html
>
>
> Indeed. This is useful and necessary. There is some code here at the
> NCGC that can be used for this. It is based on ChemAxon code, but
> should be convertible to the CDK (though it'll require some additional
> CDK methods to be implemented)
>

ChemAxon's code isn't open source. To my knowledge there is no
free/open-source equivalent of the ChemAxon Standardizer, though it's
an essential issue. I really think this would be a valuable feature
for the CDK.

But, so much to do, so little time... :)




Re: Aromatization & SDF bond type

by Egon Willighagen-5

On Tue, Nov 3, 2009 at 5:06 PM, Vincent Le Guilloux
<vincent.le-guilloux@...> wrote:
> When I load the same molecule but with all bond types defined at 4
> (aromatic)

Bond order 4 in MDL molfiles is only meant to be used for SSS
queries, not for storage of chemical graphs...

The CDK must not write it, nor do the current MDL readers actually
read queries... the fact that they read 4 as aromatic flag = 1 is
actually incorrect...

> the CDK will directly set the aromatic flag to 1, but will define all bonds as SINGLE bonds.

The latter is because the IBond.Order structure does not have the
concept of an unknown bond order (yet)... SINGLE is used to indicate
bonding, but is clearly imprecise too...

This is a problem and must be fixed:

* support for unknown bond orders
* warn about query content in MDL molfiles (RELAXED mode)
* fail on query content in MDL molfiles (STRICT mode)
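
The three points above could fit together roughly as follows. This is a sketch in plain Python, not CDK code: `Mode`, `Order` and `read_bond_type` are hypothetical names, and treating molfile bond types 4 to 8 as query-only follows the V2000 bond type table.

```python
import warnings
from enum import Enum


class Mode(Enum):
    RELAXED = "relaxed"
    STRICT = "strict"


class Order(Enum):
    UNKNOWN = 0  # the value the data model is said to be missing
    SINGLE = 1
    DOUBLE = 2
    TRIPLE = 3


# 4 = aromatic, 5 = single or double, 6 = single or aromatic,
# 7 = double or aromatic, 8 = any (all intended for queries).
QUERY_BOND_TYPES = {4, 5, 6, 7, 8}


def read_bond_type(bond_type: int, mode: Mode):
    """Return (order, aromatic_flag) for one molfile bond type."""
    if bond_type in QUERY_BOND_TYPES:
        if mode is Mode.STRICT:
            raise ValueError(f"query bond type {bond_type} not allowed in STRICT mode")
        warnings.warn(f"query bond type {bond_type} found; bond order left unknown")
        # Keep the aromatic hint, but do not pretend the bond is single.
        return Order.UNKNOWN, bond_type == 4
    return Order(bond_type), False


print(read_bond_type(2, Mode.STRICT))
```

In RELAXED mode a type 4 bond comes back as `(Order.UNKNOWN, True)` with a warning; in STRICT mode the same input raises.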

Egon

--
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers


Re: Aromatization & SDF bond type

by Nina Jeliazkova

Hello All,

Vincent Le Guilloux wrote:
> Quoting Rajarshi Guha <rajarshi.guha@...>:
>
>> On Nov 3, 2009, at 11:06 AM, Vincent Le Guilloux wrote:
>>
>>> When I load the same molecule but with all bond types defined at 4
>>> (aromatic), the CDK will directly set the aromatic flag to 1, but will
>>> define all bonds as SINGLE bonds.
>>>
>>> Now if I pass a benzene molecule with bond types defined at 4 through
>>> the CDK, and if I regenerate the SDF file without hydrogen, I will
>>> get... cyclohexane. If I add explicit hydrogen, I will get something
>>> that could be interpreted as benzene missing all double bonds, or
>>> cyclohexane missing one hydrogen for each carbon atom. Below is given
>>> a small snippet example.
>>>
>>> Maybe I'm doing something wrong, in which case I'm talking for
>>> nothing. If not, as such bond definition is not that uncommon (at
>>> least to my modest knowledge), I think this is an important issue.
>>
>> Indeed, this is an important issue and a number of bugs in different
>> subsystems occur because of this. Even, two molecules input from the
>> same SMILES but one is aromatic and one is kekule, will differ - even
>> after aromaticity detection on both of them (because the single/double
>> bond assignments are not fixed)
>>
>> The problem is that if a method (needs to) looks at bond order then it
>> will be confused due to this.
>
> Yes but they will also perceive wrong bond order. Setting bond orders
> to 1 when a bond is aromatic is just wrong to my knowledge, and
> dangerous as illustrated.
>
>> I suppose they should also be updated to check aromaticity, if it's relevant.
>
> I think so.
>
> In such cases, the bond order should be set to 'aromatic' instead of
> single, even if it's not really a bond order, strictly speaking. As
> you said, methods working on bond types should then check for
> aromaticity.
>
> And this would fix the problem of the output, see below.
>
>>> Just one question beside this discussion: why not putting the aromatic
>>> bond flag in the output in such cases?
>>
>> After doing aromaticity detection, aromatic bonds are marked as such.
>> I'm not sure I understand what you mean here.
>
> I'm talking about SDF output:
>
> If the user gives an aromatic structure as input, with aromatic bonds
> defined using the aromatic flag (4) instead of single/double bonds, as
> there is currently no satisfying way to "dearomatize" a molecule, the
> SDF output should define the output aromatic bonds with this same
> flag, instead of single order bond, which, as illustrated, transforms
> the input structure to another one at SDF output.
>
> This is, I think, a much better solution than setting all bonds to single.

This has been discussed many times, especially since it was decided to
remove the "aromatic" bond type from the CDK core and introduce the
aromatic flag.

(One of my recent findings is that HINReader doesn't work properly when
reading aromatic bonds.)

>>> I also would have one suggestion on how to improve the CDK: defining a
>>> clear (customizable), documented standardization protocol which would
>>> include issues like aromatization/dearomatization, ionization,
>>> tautomerization, 2D/3D cleaning... The best example to my knowledge is
>>> the chemaxon's standardizer:
>>> http://chemaxon.com/jchem/doc/user/Standardizer.html
>>
>> Indeed. This is useful and necessary. There is some code here at the
>> NCGC that can be used for this. It is based on ChemAxon code, but
>> should be convertible to the CDK (though it'll require some additional
>> CDK methods to be implemented)
>
> The chemaxon's code isn't opensource. To my knowledge there is no
> equivalent to the chemaxon standardizer as free/opensource tools,
> though it's an essential issue. I really think this would be a
> valuable feature to the CDK.
>
> But, so much to do, so little time... :)

Having struggled with this issue many times, I am definitely interested
in contributing to a "CDK standardizer". As a start, the algorithm for
dearomatization should be improved. Are there any pointers or published
algorithms one can start with?
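
One common way to frame dearomatization (kekulization) is as a perfect-matching problem: each aromatic atom must take part in exactly one double bond. The sketch below is a small backtracking search in plain Python, not CDK code and not a published algorithm from this thread. `kekulize` is a hypothetical name, and the assumption that every atom needs one double bond holds for benzene-like carbons but not, for example, for the nitrogen in pyrrole.

```python
def kekulize(n_atoms, bonds):
    """Assign orders 1/2 to aromatic bonds so each atom has one double bond.

    bonds: list of (i, j) pairs with 0-based atom indices.
    Returns {(i, j): order}, or None if no such assignment exists.
    """
    neighbours = {i: [] for i in range(n_atoms)}
    for i, j in bonds:
        neighbours[i].append(j)
        neighbours[j].append(i)

    partner = {}  # atom -> atom it shares its double bond with

    def solve():
        # Pick the first atom that has no double bond yet.
        free = next((a for a in range(n_atoms) if a not in partner), None)
        if free is None:
            return True  # every atom is matched
        for nb in neighbours[free]:
            if nb not in partner:
                partner[free], partner[nb] = nb, free
                if solve():
                    return True
                del partner[free], partner[nb]  # undo and try the next neighbour
        return False

    if not solve():
        return None
    return {(i, j): 2 if partner[i] == j else 1 for i, j in bonds}


benzene = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]
print(kekulize(6, benzene))
```

Backtracking is exponential in the worst case, so this only illustrates the idea on small ring systems; a production implementation would more likely use a polynomial-time matching algorithm.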

Best regards,
Nina Jeliazkova



Re: Aromatization & SDF bond type

by Egon Willighagen-5

On Wed, Nov 4, 2009 at 9:53 AM, Vincent Le Guilloux
<vincent.le-guilloux@...> wrote:
> In such cases, the bond order should be set to 'aromatic' instead of
> single, even if it's not really a bond order, strictly speaking. As
> you said, methods working on bond types should then check for
> aromaticity.

Originally, the CDK had an aromatic bond order... but that gave too
many problems. However, when that was changed to an 'aromatic' flag,
we 'forgot' to accommodate input that may not have bond orders set...
your query-flavored MDL molfile is such input (though arguably invalid
input), but we see the same problem with SMILES, such as 'c1ccccc1',
where no bond order information is given...

We really need to add the IBond.Order.UNKNOWN ASAP, and fix all
available readers accordingly.

>> After doing aromaticity detection, aromatic bonds are marked as such.
>> I'm not sure I understand what you mean here.
>
> I'm talking about SDF output:

We must also fix the MDLWriter to not use query functionality in its
output, and it should likely throw an exception when unsupported bond
orders are found in the IMolecule passed, including
IBond.Order.QUADRUPLE and the future IBond.Order.UNKNOWN...

> If the user gives an aromatic structure as input, with aromatic bonds
> defined using the aromatic flag (4) instead of single/double bonds, as
> there is currently no satisfying way to "dearomatize" a molecule,

What is your experience with the DeduceBondOrder tool? Not so good, it
seems... ?

> the
> SDF output should define the output aromatic bonds with this same
> flag, instead of single order bond, which, as illustrated, transforms
> the input structure to another one at SDF output.

See above. The MDL writer does not support queries, and it should
not be writing bond order 4.

> This is, I think, a much better solution than setting all bonds to single.

Yes, the current way it works is severely hampered by limitations in
our data model.

Egon


--
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers


Re: Aromatization & SDF bond type

by Vincent Le Guilloux


>
> We must also fix the MDLWriter to not use query functionality in its
> output, and it should likely throw an exception when unsupported bond
> orders are found in the IMolecule passed, including
> IBond.Order.QUADRUPLE and the future IBond.Order.UNKNOWN...
>

I would suggest a relaxed and a strict mode, as you pointed out
previously, which would be consistent with the relaxed/strict modes for
input. The relaxed output mode would allow bond type 4 (aromatic) to be
written when it was given as input and no bond order has been
redefined.

The strict mode would throw an exception.
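
A minimal sketch of this proposal, in plain Python rather than CDK code. `bond_type_for_output` is a hypothetical helper, and it assumes the data model can represent "no order assigned" (here `None`), which the CDK of the time reportedly cannot.

```python
def bond_type_for_output(order, aromatic, strict):
    """Choose the molfile bond type to write for one bond.

    order: 1, 2 or 3, or None when no order was ever assigned.
    aromatic: the bond's aromatic flag.
    strict: True for strict output mode, False for relaxed.
    """
    if order in (1, 2, 3):
        return order  # a defined order is always written as is
    if aromatic and not strict:
        return 4      # relaxed mode: echo the aromatic type from the input
    raise ValueError("bond order is undefined and cannot be written")


print(bond_type_for_output(None, True, strict=False))  # relaxed output keeps type 4
```

With this rule, a benzene read from type 4 bonds is written back unchanged in relaxed mode, and strict mode refuses to write it rather than silently turning it into cyclohexane.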

>> If the user gives an aromatic structure as input, with aromatic bonds
>> defined using the aromatic flag (4) instead of single/double bonds, as
>> there is currently no satisfying way to "dearomatize" a molecule,
>
> What is your experience with the DeduceBondOrder tool? Not so good, it
> seems... ?

Well, I have no experience with it: I only know the DearomatizationTool
for this purpose. I guess I will give it a try :)




Re: Aromatization & SDF bond type

by Rajarshi Guha-4


On Nov 4, 2009, at 3:53 AM, Vincent Le Guilloux wrote:

> Quoting Rajarshi Guha <rajarshi.guha@...>:
>>
>> On Nov 3, 2009, at 11:06 AM, Vincent Le Guilloux wrote:
>>
> The chemaxon's code isn't opensource.

Correct - I was referring to NCGC code which will be public domain. It  
doesn't use the ChemAxon standardizer directly, but does use some  
ChemAxon methods which are absent in the CDK

> But, so much to do, so little time... :)


Too true :(

----------------------------------------------------
Rajarshi Guha        | NIH Chemical Genomics Center
http://www.rguha.net | http://ncgc.nih.gov
----------------------------------------------------
Entropy requires no maintenance.
        -- Markoff Chaney


