package molenc

Molecular encoder/featurizer using rdkit and OCaml

Sources

v1.0.0.tar.gz
sha256=4147bb8c00bacbbf73710f6556b96bde7c8851419d06894b386c4f64feed07d4
md5=7e09a62f569eb45d601525f5b1433a5a

Description

Chemical fingerprints are lossy encodings of molecules. molenc encodes molecules as unfolded, counted fingerprints, i.e. potentially very long but sparse vectors of positive integers.
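
A minimal sketch of what "unfolded, counted" means in practice: instead of hashing features into a fixed-length bit vector, each distinct feature keeps its own entry together with its number of occurrences, and absent features implicitly count 0. The snippet assumes features have already been encoded as strings; the names `of_features` and `count` are hypothetical and are not molenc's API.

```ocaml
(* Sparse, unfolded, counted fingerprint: feature key -> occurrence count.
   Only features that occur are stored, so every stored count is >= 1. *)
module StrMap = Map.Make (String)

type fingerprint = int StrMap.t

(* Hypothetical helper: count how many times each feature occurs. *)
let of_features (features : string list) : fingerprint =
  List.fold_left
    (fun acc f ->
      let c = match StrMap.find_opt f acc with Some c -> c | None -> 0 in
      StrMap.add f (c + 1) acc)
    StrMap.empty features

(* Hypothetical helper: count of one feature, 0 when it never occurred. *)
let count (fp : fingerprint) (f : string) : int =
  match StrMap.find_opt f fp with Some c -> c | None -> 0

let () =
  let fp = of_features [ "C"; "O"; "C"; "C" ] in
  Printf.printf "C=%d O=%d N=%d distinct=%d\n"
    (count fp "C") (count fp "O") (count fp "N") (StrMap.cardinal fp)
```

Running it prints `C=3 O=1 N=0 distinct=2`. Because nothing is folded, two different features can never collide into the same entry, at the cost of a vector whose length grows with the number of distinct features seen.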

Currently, only Faulon fingerprints are supported; atom pair fingerprints might be added in the future. Atoms are currently typed by the quadruplet (#pi-electrons, element symbol, #HA (heavy-atom) neighbors, formal charge). Pharmacophore features, a more abstract and fuzzier atom typing scheme, might be supported in the future.
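
The atom-type quadruplet can be pictured as a small record whose four fields are turned into one string key; two atoms share a key exactly when all four fields agree. This is a sketch under assumptions: the record type, its field names and the `to_key` encoding are hypothetical illustrations, not molenc's actual definitions.

```ocaml
(* Hypothetical record mirroring the quadruplet
   (#pi-electrons, element symbol, #heavy-atom neighbors, formal charge). *)
type atom_type = {
  pi_electrons : int;
  element : string;
  heavy_neighbors : int;
  formal_charge : int;
}

(* Encode an atom type as a compact string key, e.g. "0,C,2,0". *)
let to_key (t : atom_type) : string =
  Printf.sprintf "%d,%s,%d,%d" t.pi_electrons t.element t.heavy_neighbors
    t.formal_charge

let () =
  (* A neutral carbon with no pi electrons and two heavy-atom neighbors. *)
  let c =
    { pi_electrons = 0; element = "C"; heavy_neighbors = 2; formal_charge = 0 }
  in
  print_endline (to_key c)
```

This prints `0,C,2,0`. Such keys can then serve as the feature strings counted in a sparse fingerprint.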

Bibliography:

Carhart, R. E., Smith, D. H., & Venkataraghavan, R. (1985). Atom pairs as molecular features in structure-activity studies: definition and applications. Journal of Chemical Information and Computer Sciences, 25(2), 64-73.

Kearsley, S. K., Sallamack, S., Fluder, E. M., Andose, J. D., Mosley, R. T., & Sheridan, R. P. (1996). Chemical similarity using physiochemical property descriptors. Journal of Chemical Information and Computer Sciences, 36(1), 118-127.

Faulon, J. L., Visco, D. P., & Pophale, R. S. (2003). The signature molecular descriptor. 1. Using extended valence sequences in QSAR and QSPR studies. Journal of Chemical Information and Computer Sciences, 43(3), 707-720.

James, C. A., et al. (2016). OpenSMILES specification, v1.0 (2016-05-15). http://opensmiles.org/opensmiles.html

Published: 20 Jun 2019

README

molenc

Molecular encoder using rdkit and OCaml.

OUTDATED DESCRIPTION: The implemented fingerprint is J.-L. Faulon's "Signature Molecular Descriptor", a counted, unfolded molecular fingerprint.
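
In Faulon's descriptor, each atom is described by the tree of atoms within a given height around it, and the molecular signature counts how often each atomic signature occurs. The sketch below is a deliberately simplified height-1 version: it ignores bond orders and the full canonicalization rules of the published descriptor, and the `graph` type and function names are hypothetical, not molenc's implementation.

```ocaml
module StrMap = Map.Make (String)

(* Hypothetical toy graph: atom i has label labels.(i) (its atom type,
   already encoded as a string) and neighbours adj.(i). Bonds are unordered
   and their orders are ignored in this simplified sketch. *)
type graph = { labels : string array; adj : int list array }

(* Height-1 atomic signature: root label followed by the sorted neighbour
   labels, e.g. "C(C,O)". Sorting makes the string independent of the order
   in which neighbours are stored. *)
let atom_signature (g : graph) (i : int) : string =
  let neigh = List.sort compare (List.map (fun j -> g.labels.(j)) g.adj.(i)) in
  g.labels.(i) ^ "(" ^ String.concat "," neigh ^ ")"

(* Molecular signature: occurrence count of each distinct atomic signature,
   i.e. an unfolded, counted fingerprint. *)
let molecule_signature (g : graph) : int StrMap.t =
  let n = Array.length g.labels in
  let rec loop i acc =
    if i = n then acc
    else
      let s = atom_signature g i in
      let c = match StrMap.find_opt s acc with Some c -> c | None -> 0 in
      loop (i + 1) (StrMap.add s (c + 1) acc)
  in
  loop 0 StrMap.empty

let () =
  (* Heavy-atom skeleton of ethanol, C-C-O, with plain element labels. *)
  let ethanol =
    { labels = [| "C"; "C"; "O" |]; adj = [| [ 1 ]; [ 0; 2 ]; [ 1 ] |] }
  in
  StrMap.iter (Printf.printf "%s -> %d\n") (molecule_signature ethanol)
```

For the C-C-O skeleton this prints `C(C) -> 1`, `C(C,O) -> 1` and `O(C) -> 1`. Increasing the height would extend each atomic signature to neighbours of neighbours, making features more specific and the fingerprint longer.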

The fingerprint can be computed using either atom types (#pi-electrons, element symbol, #HA neighbors, formal charge) or, for a fuzzier description of your molecules, rdkit pharmacophore features (Donor, Acceptor, PosIonizable, NegIonizable, Aromatic, Hydrophobe; TODO, not yet implemented).
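
To illustrate why pharmacophore typing is fuzzier, the sketch below keys an atom only by the feature families it matches, so atoms with different elements or neighbor counts can end up with the same key. It assumes each atom's feature families were assigned elsewhere (not shown); the variant type and `pharmacophore_key` are hypothetical, and the README marks this scheme as not yet implemented.

```ocaml
(* The six rdkit pharmacophore feature families listed above, as a variant. *)
type pharmacophore =
  | Donor
  | Acceptor
  | PosIonizable
  | NegIonizable
  | Aromatic
  | Hydrophobe

let string_of_pharmacophore = function
  | Donor -> "Donor"
  | Acceptor -> "Acceptor"
  | PosIonizable -> "PosIonizable"
  | NegIonizable -> "NegIonizable"
  | Aromatic -> "Aromatic"
  | Hydrophobe -> "Hydrophobe"

(* An atom may match several families at once, so its key is the sorted,
   duplicate-free list of family names joined with "+". *)
let pharmacophore_key (features : pharmacophore list) : string =
  features
  |> List.map string_of_pharmacophore
  |> List.sort_uniq compare
  |> String.concat "+"

let () = print_endline (pharmacophore_key [ Donor; Acceptor; Donor ])
```

This prints `Acceptor+Donor`. Compared with the quadruplet keys, far fewer distinct keys are possible, which trades specificity for a more abstract view of each molecule.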

Dependencies (7)

  1. ocaml >= "4.04.0" & < "5.0"
  2. conf-python-3
  3. conf-rdkit
  4. minicli
  5. dolog < "4.0.0"
  6. batteries
  7. dune < "3.0"

Dev Dependencies

None

Used by (2)

  1. linwrap >= "9.0.3"
  2. rankers < "2.0.9"

Conflicts

None