SMARTS question

Hello,

for my Orchem project I'm looking into CDK SMARTS matching inside the database. Things work pretty well, but I just encountered a particular (non-aromatic) compound that does not seem to match against itself. The explicit hydrogen seems to be the reason; see the example below.

The standalone program creates an atom container from a SMILES string and a SMARTS query tool from that same string, then checks whether the SMILES container matches itself as a SMARTS query. It doesn't work, but does work if you remove the [H]. The Universal Isomorphism Tester looks to be the culprit, getSubgraphMaps does not give a list back. I was wondering if this is expected behaviour.

thanks,
Mark

___________________________

Output
------
smiles is:[H][C@@]1(CCC(C)=CC1=O)C(C)=C
smarts is:[H][C@@]1(CCC(C)=CC1=O)C(C)=C
no match
smiles is:[C@@]1(CCC(C)=CC1=O)C(C)=C
smarts is:[C@@]1(CCC(C)=CC1=O)C(C)=C
match

Class
-----
package uk.ac.ebi.orchem.scratch;

import org.openscience.cdk.interfaces.IMolecule;
import org.openscience.cdk.nonotify.NoNotificationChemObjectBuilder;
import org.openscience.cdk.smiles.SmilesParser;
import org.openscience.cdk.smiles.smarts.SMARTSQueryTool;
import org.openscience.cdk.tools.manipulator.AtomContainerManipulator;

public class UITTest {

    public static void main(String[] args) throws Exception {
        test("[H][C@@]1(CCC(C)=CC1=O)C(C)=C");
        test("[C@@]1(CCC(C)=CC1=O)C(C)=C");
    }

    public static void test(String smiles) throws Exception {
        System.out.println("smiles is:" + smiles);
        SmilesParser sp = new SmilesParser(NoNotificationChemObjectBuilder.getInstance());
        IMolecule mol = sp.parseSmiles(smiles);
        AtomContainerManipulator.percieveAtomTypesAndConfigureAtoms(mol);

        // Use the same string as the SMARTS query and match it against its own molecule.
        SMARTSQueryTool querytool = new SMARTSQueryTool(smiles);
        System.out.println("smarts is:" + querytool.getSmarts());
        if (querytool.matches(mol)) {
            System.out.println("match");
        } else {
            System.out.println("no match");
        }
    }
}

_______________________________________________
Cdk-user mailing list
Cdk-user@...
https://lists.sourceforge.net/lists/listinfo/cdk-user
Re: SMARTS question

Can you replicate this for other cases with an explicit [H]?

On Wed, Nov 4, 2009 at 11:23 AM, Mark Rijnbeek <markr@...> wrote:

--
Rajarshi Guha
NIH Chemical Genomics Center
Re: SMARTS question

On Nov 4, 2009, at 11:23 AM, Mark Rijnbeek wrote:
>
> It doesn't work, but does work if you remove the [H]. The Universal
> Isomorphism Tester looks to be the culprit, getSubgraphMaps does not
> give a list back. I was wondering if this is expected behaviour.

I think this is also an issue with the SMARTS query tool itself: [H] does not match [H], and that case does not invoke the UIT.

----------------------------------------------------
Rajarshi Guha | NIH Chemical Genomics Center
http://www.rguha.net | http://ncgc.nih.gov
----------------------------------------------------
"I'd love to go out with you, but my favorite commercial is on TV."
Re: SMARTS question

On Wed, Nov 11, 2009 at 4:07 PM, Mark Rijnbeek <markr@...> wrote:
> There is something related in AnyAtom.java..
>
>     public boolean matches(IAtom atom) {
>         if (atom.getSymbol().equals("H")) {
>             Integer massNumber = atom.getMassNumber();
>             return massNumber != null;
>         }
>         return true;
>     }
>
> So it looks to me that not matching [H] to itself is intentional,
> considering what I see in AnyAtom. Perhaps Egon can comment?

Not sure I can... those lines were actually added by Rajarshi:

    Updated code to handle the * SMARTS pattern so that it ignores H's
    unless they have an isotopic mass specification. This means * no
    longe...

See commit 02516de8553779d33bda1bcbd76027882b81aeca
(git show 02516de8553779d33bda1bcbd76027882b81aeca)

Egon

--
Post-doc @ Uppsala University
Homepage: http://egonw.github.com/
Blog: http://chem-bla-ics.blogspot.com/
PubList: http://www.citeulike.org/user/egonw/tag/papers
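To make the quoted AnyAtom behaviour concrete, here is a minimal, self-contained sketch. It does not use CDK: the `Atom` class is a hypothetical stand-in for `IAtom` (only a symbol and an optional mass number), and `anyAtomMatches` simply mirrors the logic of the snippet quoted in the message.

```java
public class AnyAtomSketch {

    /** Hypothetical stand-in for CDK's IAtom: a symbol plus an optional mass number. */
    static final class Atom {
        final String symbol;
        final Integer massNumber; // null when no isotope is given, as in [H]

        Atom(String symbol, Integer massNumber) {
            this.symbol = symbol;
            this.massNumber = massNumber;
        }
    }

    /** Mirrors the AnyAtom.matches() logic quoted in the thread. */
    static boolean anyAtomMatches(Atom atom) {
        if (atom.symbol.equals("H")) {
            // Hydrogen is accepted only when an isotopic mass is specified.
            return atom.massNumber != null;
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("[H]  matches *: " + anyAtomMatches(new Atom("H", null)));
        System.out.println("[2H] matches *: " + anyAtomMatches(new Atom("H", 2)));
        System.out.println("C    matches *: " + anyAtomMatches(new Atom("C", null)));
    }
}
```

Under these assumptions a plain hydrogen is rejected by the wildcard while an isotopically labelled one is accepted, which is consistent with the commit message about `*` ignoring H's unless they carry a mass specification.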
Re: SMARTS question

On Nov 11, 2009, at 10:30 AM, Egon Willighagen wrote:
> On Wed, Nov 11, 2009 at 4:07 PM, Mark Rijnbeek <markr@...> wrote:
>> There is something related in AnyAtom.java..
>>
>>     public boolean matches(IAtom atom) {
>>         if (atom.getSymbol().equals("H")) {
>>             Integer massNumber = atom.getMassNumber();
>>             return massNumber != null;
>>         }
>>         return true;
>>     }
>>
>> So it looks to me that not matching [H] to itself is intentional,
>> considering what I see in AnyAtom. Perhaps Egon can comment?
>
> Not sure I can... those lines were actually added by Rajarshi:
>
> Updated code to handle the * SMARTS pattern so that it ignores H's
> unless they have an isotopic mass specification. This means * no
> longe...

Aah, indeed. I think this was based on some discussion on the OpenBabel or BO lists regarding interpretation of SMARTS matching for H's. I know that h<n> was deprecated in favor of H<n>, and this may be related to that.

One general solution might be to have 2 modes, like Daylight: normal matching (ignore H's) and explicit H matching.

----------------------------------------------------
Rajarshi Guha | NIH Chemical Genomics Center
http://www.rguha.net | http://ncgc.nih.gov
----------------------------------------------------
In matrimony, to hesitate is sometimes to be saved.
-- Butler
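One possible reading of the two-mode idea can be sketched as follows. Everything here is hypothetical and independent of CDK: the `HydrogenMode` enum, the `Atom` stand-in, and the `atomsToMatch` helper are illustrative names, and the sketch assumes that "ignore H's" means dropping plain hydrogens (no mass number) before matching, while "explicit" mode keeps every atom. How Daylight or CDK would actually implement such modes is not established in this thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class HydrogenModeSketch {

    /** Hypothetical switch between the two suggested matching modes. */
    enum HydrogenMode { NORMAL, EXPLICIT }

    /** Hypothetical stand-in for an atom: a symbol plus an optional mass number. */
    static final class Atom {
        final String symbol;
        final Integer massNumber; // null when no isotope is given

        Atom(String symbol, Integer massNumber) {
            this.symbol = symbol;
            this.massNumber = massNumber;
        }
    }

    /** Returns the atoms that would take part in matching under the given mode. */
    static List<Atom> atomsToMatch(List<Atom> atoms, HydrogenMode mode) {
        List<Atom> kept = new ArrayList<>();
        for (Atom atom : atoms) {
            boolean plainHydrogen = atom.symbol.equals("H") && atom.massNumber == null;
            if (mode == HydrogenMode.NORMAL && plainHydrogen) {
                continue; // assumption: normal mode skips plain explicit hydrogens
            }
            kept.add(atom);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Atom> atoms = Arrays.asList(
                new Atom("H", null), new Atom("C", null),
                new Atom("O", null), new Atom("H", 2));
        System.out.println("normal mode keeps   " + atomsToMatch(atoms, HydrogenMode.NORMAL).size() + " atoms");
        System.out.println("explicit mode keeps " + atomsToMatch(atoms, HydrogenMode.EXPLICIT).size() + " atoms");
    }
}
```

Filtering up front keeps the per-atom matchers free of mode checks; an alternative would be to pass the mode into each matcher instead.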
Re: SMARTS question

>> Not sure I can... those lines were actually added by Rajarshi:
>>
>> Updated code to handle the * SMARTS pattern so that it ignores H's
>> unless they have an isotopic mass specification. This means * no
>> longe...
>
> Aah, indeed. I think this was based on some discussion on the OpenBabel
> or BO lists regarding interpretation of SMARTS matching for H's. I know
> that h<n> was deprecated in favor of H<n>, and this may be related to that.
>
> One general solution might be to have 2 modes, like Daylight: normal
> matching (ignore H's) and explicit H matching.

The problem seems to me that the hydrogen in for example [H][C@@]1(CCC(C)=CC1=O)C(C)=C isn't "ignored". Method HydrogenAtom.match() returns false, and that makes the match false. Perhaps I don't get it, but I'd say to ignore would mean to return true.

Mark
Re: SMARTS question

On Nov 11, 2009, at 11:18 AM, Mark Rijnbeek wrote:
>>> Not sure I can... those lines were actually added by Rajarshi:
>>>
>>> Updated code to handle the * SMARTS pattern so that it ignores H's
>>> unless they have an isotopic mass specification. This means * no
>>> longe...
>>
>> Aah, indeed. I think this was based on some discussion on the
>> OpenBabel or BO lists regarding interpretation of SMARTS matching
>> for H's. I know that h<n> was deprecated in favor of H<n>, and this
>> may be related to that.
>>
>> One general solution might be to have 2 modes, like Daylight:
>> normal matching (ignore H's) and explicit H matching.
>
> The problem seems to me that the hydrogen in for example
> [H][C@@]1(CCC(C)=CC1=O)C(C)=C isn't "ignored". Method
> HydrogenAtom.match() returns false, and that makes the match false.
> Perhaps I don't get it, but I'd say to ignore would mean to return
> true.

Aah, indeed.

----------------------------------------------------
Rajarshi Guha | NIH Chemical Genomics Center
http://www.rguha.net | http://ncgc.nih.gov
----------------------------------------------------
There is no truth to the allegation that statisticians are mean. They are just your standard normal deviates.
Re: SMARTS question

>>>> Not sure I can... those lines were actually added by Rajarshi:
>>>>
>>>> Updated code to handle the * SMARTS pattern so that it ignores H's
>>>> unless they have an isotopic mass specification. This means * no
>>>> longe...
>>>
>>> Aah, indeed. I think this was based on some discussion on the
>>> OpenBabel or BO lists regarding interpretation of SMARTS matching for
>>> H's. I know that h<n> was deprecated in favor of H<n>, and this may be
>>> related to that.
>>>
>>> One general solution might be to have 2 modes, like Daylight: normal
>>> matching (ignore H's) and explicit H matching.
>>
>> The problem seems to me that the hydrogen in for example
>> [H][C@@]1(CCC(C)=CC1=O)C(C)=C isn't "ignored". Method
>> HydrogenAtom.match() returns false, and that makes the match false.
>> Perhaps I don't get it, but I'd say to ignore would mean to return true.
>
> Aah, indeed.

What do you think the best patch would be? I'm a bit lost, because I don't know what was discussed on the OpenBabel lists.

Would this change tackle it?

    if ( atom.getMassNumber() == null ||
         (atom.getMassNumber() != null && atom.getMassNumber() > 1)) {
        return true;
    }

The match is then true for all these test cases:

    test("[C@@]1(CCC(C)=CC1=O)C(C)=C[H]");
    test("[H][C@@]1(CCC(C)=CC1=O)C(C)=C");
    test("[H][H]");
    test("[2H]");
    test("[H]");

thanks
Mark
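The condition in the proposed patch can be checked in isolation. The sketch below is plain Java without CDK; the class and method names are hypothetical, and it only evaluates the boolean expression for a few mass numbers. It also shows that the inner null check is redundant once the first test has failed, so the expression can be shortened without changing its result. What the surrounding method does when the condition is false is not shown in the thread and is not modelled here.

```java
public class ProposedPatchSketch {

    /** The condition exactly as written in the proposed patch. */
    static boolean proposed(Integer massNumber) {
        return massNumber == null || (massNumber != null && massNumber > 1);
    }

    /** Shortened form: the second null check can never change the outcome. */
    static boolean simplified(Integer massNumber) {
        return massNumber == null || massNumber > 1;
    }

    public static void main(String[] args) {
        Integer[] massNumbers = { null, 1, 2, 3 };
        for (Integer m : massNumbers) {
            System.out.println("massNumber=" + m
                    + " proposed=" + proposed(m)
                    + " simplified=" + simplified(m));
        }
    }
}
```

Both forms accept a hydrogen without a mass number ([H]) and one with mass number 2 ([2H]); neither accepts mass number 1, a case that is not among the listed tests.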
Re: SMARTS question

On Nov 12, 2009, at 4:40 AM, Mark Rijnbeek wrote:
>>>>> Not sure I can... those lines were actually added by Rajarshi:
>>>>>
>>>>> Updated code to handle the * SMARTS pattern so that it ignores
>>>>> H's unless they have an isotopic mass specification. This means
>>>>> * no longe...
>>>>
>>>> Aah, indeed. I think this was based on some discussion on the
>>>> OpenBabel or BO lists regarding interpretation of SMARTS matching
>>>> for H's. I know that h<n> was deprecated in favor of H<n>, and
>>>> this may be related to that.
>>>>
>>>> One general solution might be to have 2 modes, like Daylight:
>>>> normal matching (ignore H's) and explicit H matching.
>>>
>>> The problem seems to me that the hydrogen in for example
>>> [H][C@@]1(CCC(C)=CC1=O)C(C)=C isn't "ignored". Method
>>> HydrogenAtom.match() returns false, and that makes the match
>>> false. Perhaps I don't get it, but I'd say to ignore would mean to
>>> return true.
>>
>> Aah, indeed.
>
> What do you think the best patch would be? I'm a bit lost, because I
> don't know what was discussed on the OpenBabel lists.
>
> Would this change tackle it?
>
>     if ( atom.getMassNumber() == null ||
>          (atom.getMassNumber() != null && atom.getMassNumber() > 1)) {
>         return true;
>     }
>
> The match is then true for all these test cases:
>
>     test("[C@@]1(CCC(C)=CC1=O)C(C)=C[H]");
>     test("[H][C@@]1(CCC(C)=CC1=O)C(C)=C");
>     test("[H][H]");
>     test("[2H]");
>     test("[H]");

I think this is correct. Does this patch change the number of failures in SMARTSSearchTest? (There are currently 9 failures, I think.) If not, then this is likely OK.

I wonder, wouldn't a simple patch just be: if atom symbol == 'H' return true?

I don't remember the reason for the multiple conditions being checked in HydrogenAtom.match() (which was written by Dazhi).

----------------------------------------------------
Rajarshi Guha | NIH Chemical Genomics Center
http://www.rguha.net | http://ncgc.nih.gov
----------------------------------------------------
"Nothing spoils fun like finding out it builds character" -Calvin
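The difference between the simple symbol check and the mass-number condition can be sketched side by side. This is a CDK-free illustration with hypothetical method names; it assumes the mass-number condition is only ever applied to atoms whose symbol is "H". Note that in Java the symbol comparison has to use `equals`, since `==` on strings compares references.

```java
public class SimplePatchSketch {

    /** The simple variant: any hydrogen matches, whatever its mass number. */
    static boolean simplePatch(String symbol) {
        return "H".equals(symbol);
    }

    /** The mass-number variant, assumed here to apply to hydrogens only. */
    static boolean massNumberPatch(String symbol, Integer massNumber) {
        return "H".equals(symbol) && (massNumber == null || massNumber > 1);
    }

    public static void main(String[] args) {
        Integer[] massNumbers = { null, 1, 2 };
        for (Integer m : massNumbers) {
            System.out.println("H with massNumber=" + m
                    + " simple=" + simplePatch("H")
                    + " massNumberPatch=" + massNumberPatch("H", m));
        }
    }
}
```

Under these assumptions the two variants agree for [H] and [2H] and differ only for a hydrogen whose mass number is explicitly 1, so the choice between them comes down to how that case should be treated.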