This page describes mining for molecules. Since molecules can be represented by molecular graphs, molecule mining is closely related to graph mining and structured data mining. The main problem is how to represent molecules so that the data instances can be discriminated from one another. One way to do this is with chemical similarity metrics, which have a long tradition in the field of cheminformatics.
Typical approaches to calculating chemical similarity use chemical fingerprints, but a fingerprint loses the underlying information about the molecular topology. Mining the molecular graphs directly avoids this problem. So does the inverse QSAR problem, which is preferable for vectorial mappings.
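As a minimal sketch of the fingerprint approach, the Tanimoto (Jaccard) coefficient is one commonly used similarity measure on fingerprints. The fingerprints below are hypothetical sets of "on" bit positions, not the output of any real fingerprinting tool, and the function name `tanimoto` is chosen here for illustration only.

```python
def tanimoto(fp_a: frozenset, fp_b: frozenset) -> float:
    """Tanimoto coefficient between two fingerprints given as sets of 'on' bits."""
    if not fp_a and not fp_b:
        # Assumed convention: two empty fingerprints count as identical.
        return 1.0
    shared = len(fp_a & fp_b)
    return shared / (len(fp_a) + len(fp_b) - shared)


# Hypothetical fingerprints (bit positions are made up for illustration).
fp_x = frozenset({1, 4, 7, 9})
fp_y = frozenset({1, 4, 8, 9})
fp_z = frozenset({1, 4, 7, 9})  # same bits as fp_x

sim_xy = tanimoto(fp_x, fp_y)  # 3 shared bits out of 5 distinct bits
sim_xz = tanimoto(fp_x, fp_z)  # identical bit sets

print(sim_xy, sim_xz)
```

Because the measure only sees which bits are set, two fingerprints with identical bits score 1.0 regardless of how the underlying atoms are connected, which is the loss of topological information described above.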
Coding(Molecule_i, Molecule_j≠i)
Kernel methods
- Marginalized graph kernel
- Optimal assignment kernel
- Pharmacophore kernel
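To illustrate the general idea of a kernel that compares two molecular graphs directly, here is a simplified walk-count kernel. It is not the marginalized graph kernel or any other kernel listed above; it only counts the label sequences of short walks and takes a dot product of those counts. The molecule encoding (a dict of atom labels plus a list of bonds, heavy atoms only) and the function names are assumptions made for this sketch.

```python
from collections import Counter


def walk_features(labels, bonds, max_len=2):
    """Count label sequences of all walks containing 1..max_len atoms."""
    neighbours = {atom: [] for atom in labels}
    for u, v in bonds:
        neighbours[u].append(v)
        neighbours[v].append(u)

    counts = Counter()
    walks = [(atom,) for atom in labels]  # walks of a single atom
    for _ in range(max_len):
        for walk in walks:
            counts[tuple(labels[a] for a in walk)] += 1
        # Extend every walk by one bonded neighbour (atoms may repeat).
        walks = [w + (n,) for w in walks for n in neighbours[w[-1]]]
    return counts


def walk_kernel(mol_a, mol_b, max_len=2):
    """Dot product of the walk-count feature vectors of two molecules."""
    feats_a = walk_features(*mol_a, max_len)
    feats_b = walk_features(*mol_b, max_len)
    return sum(count * feats_b[key] for key, count in feats_a.items())


# Heavy-atom graphs: same atoms, different connectivity.
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])          # C-C-O
dimethyl_ether = ({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)])   # C-O-C

k_cross = walk_kernel(ethanol, dimethyl_ether)
k_self = walk_kernel(ethanol, ethanol)
print(k_cross, k_self)
```

Both molecules contain two carbons and one oxygen, yet the kernel value between them is lower than the self-similarity of ethanol, because the bonded label pairs differ.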
Maximum Common Graph methods
- MCS-HSCS (Highest Scoring Common Substructure (HSCS) ranking strategy for single MCS)
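The notion of a maximum common subgraph can be sketched with a brute-force backtracking search on tiny labelled graphs. This is not the MCS-HSCS method; it is an illustrative exhaustive search that finds a largest atom mapping preserving atom labels and bond/non-bond relations (an induced common subgraph, which may be disconnected). The molecule encoding and the function name are assumptions for this sketch, and the search is exponential, so it is only suitable for very small examples.

```python
def max_common_subgraph(labels_a, bonds_a, labels_b, bonds_b):
    """Return a largest atom mapping {atom_in_a: atom_in_b} that preserves
    atom labels and the presence or absence of bonds between mapped atoms."""
    edges_a = {frozenset(e) for e in bonds_a}
    edges_b = {frozenset(e) for e in bonds_b}
    atoms_a = list(labels_a)
    best = {}

    def extend(i, mapping):
        nonlocal best
        if len(mapping) > len(best):
            best = dict(mapping)
        if i == len(atoms_a):
            return
        # Prune: even mapping every remaining atom cannot beat the best.
        if len(mapping) + (len(atoms_a) - i) <= len(best):
            return
        a = atoms_a[i]
        for b in labels_b:
            if b in mapping.values() or labels_a[a] != labels_b[b]:
                continue
            compatible = all(
                (frozenset((a, a2)) in edges_a) == (frozenset((b, b2)) in edges_b)
                for a2, b2 in mapping.items()
            )
            if compatible:
                mapping[a] = b
                extend(i + 1, mapping)
                del mapping[a]
        extend(i + 1, mapping)  # also try leaving atom a unmatched

    extend(0, {})
    return best


# Heavy-atom graphs of ethanol (C-C-O) and dimethyl ether (C-O-C).
ethanol = ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)])
dimethyl_ether = ({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)])

mcs = max_common_subgraph(*ethanol, *dimethyl_ether)
print(len(mcs), mcs)
```

In this example the largest common substructure is a single C-O bond, so the mapping covers two atoms of each molecule.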
Coding(Molecule_i)
Molecular query methods
- Warmr
- AGM
- PolyFARM
- FSG
- MolFea
- MoFa/MoSS
- Gaston
- LAZAR
- ParMol (contains MoFa, FFSM, gSpan, and Gaston)
- optimized gSpan
- SMIREP
- DMax
- SAm/AIm/RHC
- AFGen
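The common idea behind frequent fragment mining can be sketched as follows: enumerate fragments in each molecule, count in how many molecules each fragment occurs (its support), and keep fragments whose support reaches a threshold. The sketch below is not the algorithm of any tool listed above. It is restricted to linear path fragments of at most three atoms, ignores bond orders and rings, and uses a hypothetical three-molecule data set and hypothetical function names.

```python
from collections import Counter


def path_fragments(labels, bonds, max_atoms=3):
    """Return the set of label sequences of simple paths with up to max_atoms atoms."""
    neighbours = {atom: [] for atom in labels}
    for u, v in bonds:
        neighbours[u].append(v)
        neighbours[v].append(u)

    found = set()

    def grow(path):
        seq = tuple(labels[a] for a in path)
        # A path and its reverse are the same fragment: keep one canonical form.
        found.add(min(seq, seq[::-1]))
        if len(path) < max_atoms:
            for n in neighbours[path[-1]]:
                if n not in path:
                    grow(path + [n])

    for atom in labels:
        grow([atom])
    return found


def frequent_fragments(molecules, min_support, max_atoms=3):
    """Map each fragment to its support, keeping those with support >= min_support."""
    support = Counter()
    for labels, bonds in molecules:
        # Each molecule contributes at most 1 to the support of a fragment.
        support.update(path_fragments(labels, bonds, max_atoms))
    return {frag: n for frag, n in support.items() if n >= min_support}


# Hypothetical data set of heavy-atom graphs.
molecules = [
    ({0: "C", 1: "C", 2: "O"}, [(0, 1), (1, 2)]),  # ethanol, C-C-O
    ({0: "C", 1: "O", 2: "C"}, [(0, 1), (1, 2)]),  # dimethyl ether, C-O-C
    ({0: "C", 1: "C", 2: "C"}, [(0, 1), (1, 2)]),  # propane, C-C-C
]

frequent = frequent_fragments(molecules, min_support=2)
print(frequent)
```

With a minimum support of 2, only the single atoms and the C-C and C-O pairs survive; each three-atom path occurs in just one molecule and is discarded.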
See also
References
- Schölkopf, B., Tsuda, K., Vert, J. P., Kernel Methods in Computational Biology, MIT Press, Cambridge, MA, 2004.
- Duda, R. O., Hart, P. E., Stork, D. G., Pattern Classification, John Wiley & Sons, 2001. ISBN 0-471-05669-3
- Gusfield, D., Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology, Cambridge University Press, 1997. ISBN 0-521-58519-8
- Todeschini, R., Consonni, V., Handbook of Molecular Descriptors, Wiley-VCH, 2000. ISBN 3527299130