RDKit-discuss Roundup #1

3 minute read

Undefined Stereo Depiction, Tautomer InChIKeys, and Double Bond Stereo in MOL files

In this series, I highlight issues posted in the RDKit-discuss mailing list which I found interesting and responded to. I reflect briefly on lessons learned here.

Unspecified Stereo Depiction

I posted this question, as it came up in my work trying to depict different stereoisomers of the same molecule. More concretely, I expected molecules with no stereochemistry specified to be depicted as having crossed double bonds or single squiggly bonds.

Turns out, you need to specify STEREOANY as a property of the bond in order for it to be depicted as crossed, as shown by Paolo Tosco. (Big thanks to him for such a detailed tutorial.)

Importantly, keep in mind the distinction between stereochemistry unknown vs. unspecified, as mentioned by Greg Landrum.

There is no way to express these two things differently in a SMILES, and therefore in my opinion, the depiction should likewise be identical for both cases (instead of randomly giving a stereochemistry in the depiction when stereochemistry is in fact unspecified, which is currently what happens and was a source of confusion when I was depicting my molecules).

The issue is still open on GitHub. For now, important to bear in mind that the depiction for unspecified stereo in RDKit can give weird unexpected results and it is best to cross-check with John May’s CDKDepict.

Tautomer InChIKeys

Gustavo asked why two isomers of the same compound (cis vs. trans on sp2 N) were getting the same InChIKey, as he was trying to generate all stereoisomers for molecules in a certain database, then filter possible duplicates by InChIKey.

In a gist, I managed to reproduce his issue, but couldn’t find a good answer (only that InChIKey auxiliary info was different for the different isomers). As a workaround, I suggested using Canonical SMILES for deduplication.

Luckily, others came to the rescue. Igor explained that Standard InChIs inherently give the same result for different tautomers, and the solution would be to use Non-standard InChIs or a /FixedH option upon InChIKey generation via e.g., Chem.inchi.MolToInchi(mol, options="/FixedH") as shown by Paolo.

All in all, good to know: Standard InChIKeys for different tautomers are identical!

Double Bond Stereo in MOL Files

This question was an interesting one: why does bond stereochemistry for a double bond with 2 identical substituents on one end (i.e., is symmetrical) get a bond stereo value = 3 in the Bond block of a MOL file?

My first time dealing with MOL files, it was intriguing that bond stereo for double bonds is limited to two options: 0 = Use x-, y-, z-coords from atom block to determine cis or trans and 3 = Cis or trans (either) double bond.

James, the OP, brought up a further concept - STEREOANY vs. STEREONONE - which are bond property options in the RDKit. He explained that he expected the double bond in question to be STEREONONE because of its symmetry, and therefore have a bond stereo value = 0.

However, I’m not so sure these concepts can be equated. In my gist, I first tried to inspect the behaviour of this bond stereo value within the Bond block of the MOL file by trying some simpler examples, then successfully reproduced his example.

I even tried playing around with the 2D and 3D coordinates. However, I was still getting bond stereo value = 3.

Personally, I suspect that having symmetry on the double bond means the molecule gets treated as a case of undefined stereochemistry (like FC=FC as shown in my gist), and therefore gives bond stereo value = 3.

However, the issue remains unsolved and I’m curious to know what other RDKitters will have to say.

Interestingly, adding explicit Hs makes a difference to the double bond stereo value of the exact same molecule (explicit Hs gives value 3, implicit gives value 0). Not sure why…do you? :) Leave a comment!

Hits

Updated:

Leave a comment

Your email address will not be published. Required fields are marked *

Loading...