Showing posts with label computational chemistry. Show all posts

Monday, May 05, 2025

InChI metal-reconnected layer

While I wasn’t looking, the InChI folks implemented the metal-reconnected layer. Isn’t it nice? I discovered it quite by chance thanks to the Beilstein-Institut ChemInfo Labs page. You can see how it works on InChI Web Demo.

Consider ferrocyanide (a):

(a)
  1. [Fe(CN)6]4−
    hexacyanidoferrate(4−) (additive)
    ferrocyanide (trivial)

Its standard InChI is:

InChI=1S/6CN.Fe/c6*1-2;/q;;;;;;-4 (1)

The main layer contains two types of entities: 6CN (i.e. six CN molecules) and Fe (one iron atom). If we try to convert (1) back to structure, using the same Web Demo tool, we get six free-floating CN radicals and a separate Fe4− anion. Ew. But if we tick the “Include Bonds to Metal” box in the Web Demo tool, we have

InChI=1/6CN.Fe/c6*1-2;/q;;;;;;-4/rC6FeN6/c8-1-7(2-9,3-10,4-11,5-12)6-13/q-4 (2)

where the metal-reconnected layer (/r) appears. It looks like an alternative InChI added directly after the standard one, with its own connectivity (/c) and charge (/q) sublayers. In this layer, there is only one entity: C6FeN6, i.e. [Fe(CN)6]. The string (2) is correctly converted back to the structure (a).
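To make the layer structure of string (2) easier to see, here is a minimal sketch in plain Python that splits an InChI into its main part and its /r part. The helper names `split_inchi` and `parse_layers` are made up for this illustration, and the code assumes that "/r" occurs only as the reconnected-layer prefix, which holds for the strings in this post but is not checked against the full InChI specification.

```python
def parse_layers(chunks):
    """Turn ['6CN.Fe', 'c6*1-2;', 'q;;;;;;-4'] into a dict keyed by layer prefix."""
    formula, *rest = chunks
    layers = {"formula": formula}          # the formula layer has no prefix letter
    for chunk in rest:
        layers[chunk[0]] = chunk[1:]       # e.g. 'c' -> connectivity, 'q' -> charge
    return layers


def split_inchi(inchi):
    """Split an InChI string into version, main layers and (optional) /r layers.

    Hypothetical helper: assumes '/r' appears only as the reconnected-layer prefix.
    """
    body = inchi.split("=", 1)[1]          # drop the 'InChI=' prefix
    main, found, reconnected = body.partition("/r")
    version, *main_chunks = main.split("/")
    result = {"version": version, "main": parse_layers(main_chunks)}
    if found:
        result["reconnected"] = parse_layers(reconnected.split("/"))
    return result


ferrocyanide = (
    "InChI=1/6CN.Fe/c6*1-2;/q;;;;;;-4"
    "/rC6FeN6/c8-1-7(2-9,3-10,4-11,5-12)6-13/q-4"
)
layers = split_inchi(ferrocyanide)
print(layers["main"]["formula"])           # 6CN.Fe
print(layers["reconnected"]["formula"])    # C6FeN6
print(layers["reconnected"]["q"])          # -4
```

The main part lists seven disconnected components, while the /r part describes a single C6FeN6 entity carrying the whole 4− charge.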

Now let’s look at the structure of a salt known as Prussian Blue (b):

(b)
  1. Fe4[Fe(CN)6]3
    iron(3+) hexacyanidoferrate(4−) (additive)
    ferric ferrocyanide (trivial)
    Prussian Blue (trivial)

Its standard InChI is:

InChI=1S/18CN.7Fe/c18*1-2;;;;;;;/q;;;;;;;;;;;;;;;;;;3*-4;4*+3 (3)

The main layer contains two types of entities: 18CN (i.e. 18 CN molecules) and 7Fe (seven iron atoms). Converting (3) to structure brings about a horrible mess. With “bonds to metal”, however, we get

InChI=1/18CN.7Fe/c18*1-2;;;;;;;/q;;;;;;;;;;;;;;;;;;3*-4;4*+3/r3C6FeN6.4Fe/c3*8-1-7(2-9,3-10,4-11,5-12)6-13;;;;/q3*-4;4*+3 (4)

In the metal-reconnected layer (/r) we see two different types of entities: 3C6FeN6 (i.e. three [Fe(CN)6]) and 4Fe. The string (4) is correctly converted back to the structure (b).
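As a sanity check on the bookkeeping, the formula and charge layers can be expanded by hand. The sketch below is plain Python; `expand`, `total_charge` and `component_counts` are hypothetical helpers written for this post, and they only handle the simple "n*x" and "nFormula" shorthands seen in the strings above.

```python
import re


def expand(layer):
    """Expand 'n*x' shorthand in a ';'-separated layer into one item per component."""
    items = []
    for chunk in layer.split(";"):
        match = re.fullmatch(r"(\d+)\*(.*)", chunk)
        if match:
            items.extend([match.group(2)] * int(match.group(1)))
        else:
            items.append(chunk)
    return items


def total_charge(q_layer):
    """Sum the charges in a /q layer (given without the leading 'q').

    Empty entries stand for neutral components and are skipped.
    """
    return sum(int(item) for item in expand(q_layer) if item)


def component_counts(formula_layer):
    """Map each formula in a '.'-separated formula layer to its multiplicity."""
    counts = {}
    for chunk in formula_layer.split("."):
        multiplier, formula = re.fullmatch(r"(\d*)(.+)", chunk).groups()
        counts[formula] = counts.get(formula, 0) + int(multiplier or 1)
    return counts


print(component_counts("18CN.7Fe"))      # {'CN': 18, 'Fe': 7}
print(component_counts("3C6FeN6.4Fe"))   # {'C6FeN6': 3, 'Fe': 4}
print(total_charge("3*-4;4*+3"))         # 0
print(total_charge(";;;;;;-4"))          # -4
```

Three 4− anions and four 3+ cations add up to zero, as they should for a neutral salt, whereas the ferrocyanide string (1) carries a net charge of −4.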

Years ago, I was complaining (to the universe) about different InChIs for the same molecular entity, viz. chromate (c–e). I’ve revisited it with the new version of InChI.

(c) [Cr(O)2(O-)2]
(d) [Cr(O)4]2-
(e) [Cr(2+)(O-)4]
    [CrO4]2−
    chromate (trivial)
    tetraoxidochromate(2−) (additive)
    tetraoxidochromate(VI) (additive)

Alas, the standard InChIs for the representations (c), (d) and (e) remain different. Try to convert them back to structures: they also are all different and all wrong (all have extra hydrons). Nevertheless, I see a sign of progress: the metal-reconnected layers for the corresponding strings (5), (6) and (7) are identical!

(c) InChI=1/Cr.4O/q;;;2*-1/rCrO4/c2-1(3,4)5/q-2 (5)
(d) InChI=1/Cr.4O/q-2;;;;/rCrO4/c2-1(3,4)5/q-2 (6)
(e) InChI=1/Cr.4O/q+2;4*-1/rCrO4/c2-1(3,4)5/q-2 (7)

Moreover, all three strings, (5), (6) and (7), are converted back to the structure (c).
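The comparison can be done mechanically. The following is a minimal sketch in plain Python, assuming (as above) that everything after "/r" belongs to the reconnected layer; `split_reconnected` is a hypothetical helper, not part of any InChI library.

```python
chromates = {
    "c": "InChI=1/Cr.4O/q;;;2*-1/rCrO4/c2-1(3,4)5/q-2",
    "d": "InChI=1/Cr.4O/q-2;;;;/rCrO4/c2-1(3,4)5/q-2",
    "e": "InChI=1/Cr.4O/q+2;4*-1/rCrO4/c2-1(3,4)5/q-2",
}


def split_reconnected(inchi):
    """Return (main part, reconnected part) of an InChI string."""
    main, _, reconnected = inchi.partition("/r")
    return main, reconnected


main_parts = {key: split_reconnected(value)[0] for key, value in chromates.items()}
reconnected_parts = {key: split_reconnected(value)[1] for key, value in chromates.items()}

distinct_main = len(set(main_parts.values()))
distinct_reconnected = len(set(reconnected_parts.values()))
print(distinct_main, distinct_reconnected)   # 3 1
```

So the /r part alone could serve as a lookup key that treats the drawings (c), (d) and (e) as the same entity, while the main part cannot.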

What about our old friend ferrocene? It depends on how you draw it. I’ll stick to ChEBI’s decacoordinate-iron representation (f):

ferrocene with 10-coordinate iron
(f)
  1. bis(η5-cyclopentadienyl)iron (additive)
    ferrocene (trivial)

The standard InChI for ferrocene is:

InChI=1S/2C5H5.Fe/c2*1-2-4-5-3-1;/h2*1-5H; (8)

Converting (8) to structure results in two standalone cyclopentadienyl radicals and a neutral iron atom. With “bonds to metal”:

InChI=1/2C5H5.Fe/c2*1-2-4-5-3-1;/h2*1-5H;/rC10H10Fe/c1-2-4-5-3(1)11(1,2,4,5)6-7(11)9(11)10(11)8(6)11/h1-10H (9)

In the /r layer we see a single entity, C10H10Fe, i.e. [Fe(C5H5)2]. The string (9) is correctly converted back to the structure (f).

Tuesday, August 26, 2014

Ogres are not like cakes

I was intrigued by the article in New Scientist which starts with the question, “Do you speak chemistry?” [1]. So much so that I asked my friend to send me the original paper [2], authored by the Bartosz Grzybowski group of Northwestern University in Evanston, Illinois. It makes for curious reading.

Don’t get me wrong. I have nothing against the analogies. I love the analogies. If the linguistic analogy works for chemistry, it’s fine by me. As long as everybody understands that it is just an analogy.

The authors try to “demonstrate that a natural language such as English and organic chemistry have the same structure in terms of the frequency of, respectively, text fragments and molecular fragments”. How do they do that? They start by looking at the maximum common substrings (MCS) found in 100 sentences randomly chosen from English Wikipedia.

“Perhaps not surprisingly, the most common fragment of the sentences is ‘e’, followed by ‘a’ and ‘o’.”

That is surprising to me though, considering that only “a” is a word in English. I wouldn’t be surprised if it happened to be Spanish Wikipedia. Are the authors talking about letter frequency, perchance? But the “top three” letters in English (from most to least common) are known to be E, T, A, while in Spanish they are E, A, O. Anyway, they show that the distribution of the fragments, whatever they are, follows a power law. Then they show that the distribution of the common molecular fragments, derived from a corpus of organic molecules, also follows a power law. Big deal: so do earthquake magnitudes, populations of cities and stock market crashes [3]. Cadeddu et al. do not seem to be bothered by that at all:

“We have just shown that there exists a set of molecular fragments with which organic molecules can be described akin to a language.”

So far so bad; whether you are a linguist, a computational chemist or an organic chemist, both the methodology and the conclusions of this paper are bound to make you cringe. So, my immediate reaction was to dismiss it altogether. Ogres are not like cakes. Organic molecules are not like a language. End of story.
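For the curious, here is one possible reading of the “common substring” tally, sketched in plain Python. This is my own guess at the procedure, not the algorithm from the paper: it takes the longest common substring of every pair of sentences and counts how often each fragment turns up. The function names and the toy sentences are invented for the illustration.

```python
from collections import Counter
from itertools import combinations


def longest_common_substring(a, b):
    """Return the longest substring shared by a and b (first one found in a)."""
    best_len, best_end = 0, 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common run ending at a[i-2], b[j-2]
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len, best_end = curr[j], i
        prev = curr
    return a[best_end - best_len:best_end]


def fragment_counts(sentences):
    """Tally the longest common substring over all pairs of sentences."""
    return Counter(
        longest_common_substring(a, b) for a, b in combinations(sentences, 2)
    )


print(repr(longest_common_substring("the cat sat", "a cat nap")))   # ' cat '
print(fragment_counts(["the cat sat", "a cat nap", "dogs bark"]))
```

With unrelated sentences the shared fragment quickly shrinks to a single character, which would explain why a tally of this kind ends up looking like a letter-frequency table.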

But could it be that I am missing something? On the one hand, the language of chemistry (whether we are talking trivial names, systematic names, or graphical diagrams) is very much like any other language: a system of communication. On the other hand, the molecules themselves are not, unless they are informational macromolecules. The message encoded in a single DNA molecule can be very much abstracted from its chemical structure. Without any doubt, the genetic code is a communication system, and therefore a language, although not a man-made one.

It’s interesting that the authors view organic molecules as “sentences” rather than “words”; the latter would be the nomenclaturist’s approach. I guess it depends on your taste, or language preferences. Most systematic chemical names look alien in English but would fit rather nicely in German or Finnish. I personally view any chemical name as a noun phrase describing a corresponding molecular entity; a molecular entity itself is not a noun phrase. However, in natural languages, there is rarely any confusion regarding the boundaries of a word:

“a word is the smallest element that may be uttered in isolation with semantic or pragmatic content (with literal or practical meaning).”

By contrast, Grzybowski’s “words” are molecular fragments, which do not exist in isolation. It is also worth noting that in the world of biopolymers, say nucleic acids, each monomer (as complex as any of Grzybowski’s “sentences”) is often represented as a letter, while an entire bacterial genome (still a single DNA molecule) could be considered a War and Peace (or Crime and Punishment).

Cadeddu et al. further claim that the linguistic approach identifies the symmetry/repeat units in molecules such as α-cyclodextrin and porphyrin:

“We emphasize that this is not a small feat given we have not even considered any (x, y, z) coordinates of the atoms making up these molecules and performed no linear-algebra analyses to find symmetries which, incidentally, can be a computationally intensive procedure involving manipulation of matrices.”
I find this modest remark regarding the size of the “feat” within the body of a scientific article in a respectable journal really cute. Are the authors even aware that there are chemical similarity/substructure search engines? You don’t need atomic coordinates to identify the fragments with the same connectivity.
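To illustrate the point about connectivity, here is a minimal sketch in plain Python of Morgan-style iterative refinement: atoms are grouped into candidate equivalence classes using nothing but element labels and a neighbour list. The function name and the toy graphs are invented for this post. The refinement is a necessary test rather than a complete symmetry analysis (some non-equivalent atoms in highly regular graphs can end up in the same class), so treat it as a sketch of the idea, not as what any particular search engine does.

```python
def equivalence_classes(adjacency, labels):
    """Group atoms by repeatedly refining each atom's class with its neighbours' classes.

    adjacency: dict mapping atom index -> list of neighbour indices
    labels:    dict mapping atom index -> starting label (e.g. element symbol)
    Returns a dict mapping atom index -> integer class id.
    """
    classes = dict(labels)
    while True:
        # An atom's signature is its own class plus the sorted classes of its neighbours
        signatures = {
            atom: (classes[atom], tuple(sorted(classes[n] for n in adjacency[atom])))
            for atom in adjacency
        }
        palette = {sig: i for i, sig in enumerate(sorted(set(signatures.values())))}
        refined = {atom: palette[signatures[atom]] for atom in adjacency}
        # Refinement can only split classes; stop once no class splits any further
        if len(set(refined.values())) == len(set(classes.values())):
            return refined
        classes = refined


# Six-membered carbon ring: every atom has the same environment
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
ring_classes = equivalence_classes(ring, {i: "C" for i in ring})
print(len(set(ring_classes.values())))    # 1

# Three-carbon chain: the two ends are equivalent, the middle atom is not
chain = {0: [1], 1: [0, 2], 2: [1]}
chain_classes = equivalence_classes(chain, {i: "C" for i in chain})
print(chain_classes)                      # {0: 0, 1: 1, 2: 0}
```

No coordinates and no matrices are involved; the repeat units fall out of the connection table alone.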

Which brings me to the final point. What is “chemical linguistics” anyway? If the “words” of chemistry, as postulated in [2], are nothing but molecular fragments, or substructures, then chemoinformaticians have been doing substructure searches of chemical databases for donkey’s years without knowing that it is called chemical linguistics. I am aware of a completely different use of this term, in the sense of “mining of natural language texts for chemical information” [4, 5]. This latter use is well established, and I think applying the name “chemical linguistics” to an unrelated area will only confuse everybody.

  1. Aron, J. (2014) Language of chemistry is unveiled by molecular make-up. New Scientist no. 2981, p. 8.
  2. Cadeddu, A., Wylie, E.K., Jurczak, J., Wampler-Doty, M. and Grzybowski, B.A. (2014) Organic chemistry as a language and the implications of chemical linguistics for structural and retrosynthetic analyses. Angewandte Chemie 126, 8246–8250.
  3. Buchanan, M. (2000) Ubiquity, Weidenfeld & Nicolson, London.
  4. Goebels, L., Grotz, H., Lawson, A.L., Roller, S. and Wisniewski, J. (2005) Method and software for extracting chemical data. Patent DE 102005020083 A1.
  5. Day, N.E., Corbett, P.T. and Murray-Rust, P. (2007) Semantic chemical publishing. ACS National Meeting #233, Chicago.

Monday, June 30, 2014

Phycocyanin against Alzheimer’s?

Could the light-harvesting protein phycocyanin be used as a novel drug against Alzheimer’s disease (AD) [1, 2]?

“In the present study, intact hexameric phycocyanin was isolated and crystallized from the cyanobacterium Leptolyngbya sp. N62DM, and the structure was solved to a resolution of 2.6 Å. Molecular docking studies show that the phycocyanin αβ-dimer interacts with the enzyme β-secretase, which catalyzes the proteolysis of the amyloid precursor protein to form plaques. The molecular docking studies suggest that the interaction between phycocyanin and β-secretase is energetically more favorable than previously reported inhibitor-β-secretase interactions. Transgenic Caenorhabditis elegans worms, with a genotype to serve as an AD-model, were significantly protected by phycocyanin. Therefore, the present study provides a novel structure-based molecular mechanism of phycocyanin-mediated therapy against AD.”

  1. Singh, N.K., Hasan, S.S., Kumar, J., Raj, I., Pathan, A.A., Parmar, A., Shakil, S., Gourinath, S. and Madamwar, D. (2014) Crystal structure and interaction of phycocyanin with β-secretase: A putative therapy for Alzheimer's disease. CNS Neurol. Disord. Drug Targets 13, 691–698.
  2. PDB:4L1E

Saturday, January 18, 2014

Sodium chloride revisited

Everybody knows that the formula of sodium chloride is NaCl. Right? Right. But recently, the team of Artem Oganov at Stony Brook University have shown that there are other stable types of crystalline sodium chloride. They have predicted several thermodynamically stable compounds: Na3Cl, Na2Cl, Na3Cl2, NaCl3, and NaCl7. Moreover, by utilising high-pressure techniques, they synthesised cubic and orthorhombic NaCl3 and two-dimensional tetragonal Na3Cl [1, 2].

NaCl3 (space group Pm3̄n)
Na3Cl (space group P4/mmm)

“One of these materials — Na3Cl — has a fascinating structure”, says Oganov. “It is comprised of layers of NaCl and layers of pure sodium. The NaCl layers act as insulators; the pure sodium layers conduct electricity” [3].

  1. Zhang, W., Oganov, A.R., Goncharov, A.F., Zhu, Q., Boulfelfel, S.E., Lyakhov, A.O., Stavrou, E., Somayazulu, M., Prakapenka, V.B. and Konôpková, Z. (2013) Unexpected stable stoichiometries of sodium chlorides. Science 342, 1502–1505; arXiv:1310.7674v1
  2. Ibáñez Insa, J. (2013) Reformulating table salt under pressure. Science 342, 1459–1460.
  3. SBU team discovers new compounds that challenge the foundation of chemistry. Stony Brook University Newsroom, December 19, 2013.

Sunday, February 05, 2012

Carbon–carbon quadruple bond

Quadruple and higher order metal–metal bonds are known for transition metals, lanthanoids and actinoids. But for main group elements? Using four different computational methods, Shaik et al. [1] show that

“C2 and its isoelectronic molecules CN+, BN and CB (each having eight valence electrons) are bound by a quadruple bond. The bonding comprises not only one σ- and two π-bonds, but also one weak ‘inverted’ bond, which can be characterized by the interaction of electrons in two outwardly pointing sp hybrid orbitals.”

According to Shaik, the existence of the fourth bond in C2 suggests that it is not really the diradical C22• [2]:

“If C2 were a diradical it would immediately form higher clusters. I think the fact that you can isolate C2 tells you it has a barrier, small as it may be, to prevent that.”

  1. Shaik, S., Danovich, D., Wu, W., Su, P., Rzepa, H.S. and Hiberty, P.C. (2012) Quadruple bonding in C2 and analogous eight-valence electron species. Nature Chemistry 4, 195–200.
  2. Extance, A. Calculations reveal carbon-carbon quadruple bond. Chemistry World, 29 January 2012.

Friday, September 11, 2009

Charge-shift bonding

Here’s something one doesn’t see in a chemistry textbook. The recent perspective paper in Nature Chemistry deals with a distinct class of electron-pair bonding called “charge-shift” (CS) bonding, which exists alongside classical covalent and ionic bonding. And in not-so-exotic molecules.

“[A] striking example is the difference between H2 and F2; two homonuclear bonds that by all criteria should be classified as covalent bonds, but exhibit fundamental differences. Consider the energy curves (Fig. 1) of the two bonds calculated recently. Figure 1a shows that the H–H bond is indeed covalent; its covalent structure accounts for most of the bonding energy (relative to the ‘exact’ curve). By contrast, for the F–F bond in Fig. 1b, the covalent structure is entirely repulsive, and what determines the bonding energy and the equilibrium distance is the covalent–ionic mixing. This mixing leads to a resonance energy stabilization, which we have termed the ‘charge-shift resonance energy’ (RECS). Thus, despite their apparent similarity, the two bonds are very different; whereas the H–H bond is a true covalent bond, the F–F bond is a CS bond that is completely determined by the RECS quantity.”

A no less striking example is the so-called inverted C–C bond in [1.1.1]propellane (described in a paper from the same group of authors), which “closely resembles the single bond of difluorine”.

Saturday, January 17, 2009

Circumnames

I came across these “circumnames” quite by chance: several compounds were mentioned in this paper and some others can be found in this astrochemistry database. Is there an elegant way to name them systematically?

(a) circumpyrene
(b) ovalene
(c) pyrene

For instance, ACD/Name gives molecule (a) the systematic name dinaphtho[2,1,8,7-hijk:2',1',8',7'-stuv]ovalene, i.e. two naphtho groups are fused to the top and bottom of the ovalene (b) molecule, while the non-systematic name “circumpyrene” means that the pyrene (c) core is completely encircled by fused benzene rings. Unfortunately, I was unable to generate ACD/Names for bigger molecules, such as circumcoronene (d) [i.e. coronene (e) encircled by fused benzene rings] and circumovalene (f). Apparently, ACD/Name cannot name compounds with more than 15 fused rings.

(d) circumcoronene
(e) coronene
(f) circumovalene

If the “core” structure is surrounded by two rows of fused benzene rings, the doubly “circum” names like circumcircumpyrene (g) and circumcircumcoronene (h) appear.

(g) circumcircumpyrene
(h) circumcircumcoronene
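The ring counts behind that 15-ring limit are easy to work out. The sketch below is plain Python and leans on two textbook relations for benzenoid hydrocarbons CnHs that are not taken from any of the sources above: the number of hexagons is (n − s)/2 + 1, and circumscribing CnHs with one row of benzene rings gives C(n+2s+6)H(s+6). The molecular formulae of the cores (pyrene C16H10, coronene C24H12, ovalene C32H14) are supplied by me, and the function names are made up.

```python
def ring_count(carbons, hydrogens):
    """Number of fused six-membered rings in a benzenoid hydrocarbon CnHs."""
    return (carbons - hydrogens) // 2 + 1


def circumscribe(carbons, hydrogens):
    """Formula (n, s) after surrounding a benzenoid CnHs with one row of rings."""
    return carbons + 2 * hydrogens + 6, hydrogens + 6


# Assumed molecular formulae of the cores, as (carbons, hydrogens)
cores = {"pyrene": (16, 10), "coronene": (24, 12), "ovalene": (32, 14)}

rings = {}
for name, formula in cores.items():
    rings[name] = ring_count(*formula)
    once = circumscribe(*formula)
    rings["circum" + name] = ring_count(*once)
    twice = circumscribe(*once)
    rings["circumcircum" + name] = ring_count(*twice)

for name, count in rings.items():
    print(f"{name}: {count} rings")
```

Under these assumptions circumpyrene has 14 rings, just under the limit, while circumcoronene (19) and circumovalene (24) are over it, which matches what I saw with ACD/Name.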