
Short description: DMax Chemistry Assistant™ is a relational data mining tool for QSAR, compound screening data analysis, and virtual screening. It automatically finds, formulates and shows scientific hypotheses that best match measurements of activity (or any other observable property) of small molecules. It also makes a statistical estimate of the confidence you can have in each hypothesis.
DMax Chemistry Assistant has the unique ability to start from individual functional groups and rings and construct hypotheses that combine these building blocks with relational expressions, such as "A is linked to B via a conjugated system". Thus, a priori descriptors or fingerprints are not required. Still, your existing descriptors can be imported and used in the hypothesis-building.
The hypotheses are automatically validated on a separate test set, and can be collectively applied to unseen compounds for virtual screening.
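To make the idea of a relational hypothesis concrete, here is a minimal sketch in Python. It is not DMax Chemistry Assistant's code or API. The molecule representation, the function names, and the toy data are all assumptions made for illustration. It checks the relation "A is linked to B via a conjugated system" on a toy molecular graph, then scores the resulting hypothesis on a small held-out test set using plain coverage and precision, which stand in for whatever confidence statistic the tool actually computes.

```python
from collections import deque

def linked_via_conjugated_system(mol, group_a, group_b):
    """True if an atom of group_a reaches an atom of group_b over conjugated bonds only.

    mol is a hypothetical toy structure:
      mol["groups"]: dict atom index -> functional group name
      mol["bonds"]:  list of (atom_u, atom_v, is_conjugated)
    """
    starts = {i for i, g in mol["groups"].items() if g == group_a}
    targets = {i for i, g in mol["groups"].items() if g == group_b}
    # Build an adjacency map that keeps only the conjugated bonds.
    adjacency = {}
    for u, v, conjugated in mol["bonds"]:
        if conjugated:
            adjacency.setdefault(u, set()).add(v)
            adjacency.setdefault(v, set()).add(u)
    # Breadth-first search from every atom of group_a.
    seen = set(starts)
    queue = deque(starts)
    while queue:
        atom = queue.popleft()
        if atom in targets:
            return True
        for nxt in adjacency.get(atom, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

def evaluate(hypothesis, test_set):
    """Return (coverage, precision) of a boolean hypothesis on (molecule, is_active) pairs."""
    covered = [label for mol, label in test_set if hypothesis(mol)]
    coverage = len(covered) / len(test_set)
    precision = sum(covered) / len(covered) if covered else 0.0
    return coverage, precision

# Hypothetical molecules: same groups, but mol_2 has a non-conjugated bond in the path.
mol_1 = {"groups": {0: "nitro", 3: "amine"},
         "bonds": [(0, 1, True), (1, 2, True), (2, 3, True)]}
mol_2 = {"groups": {0: "nitro", 3: "amine"},
         "bonds": [(0, 1, True), (1, 2, False), (2, 3, True)]}

def hypothesis(mol):
    return linked_via_conjugated_system(mol, "nitro", "amine")

# Invented activity labels for the held-out test set.
test_set = [(mol_1, True), (mol_2, False)]
coverage, precision = evaluate(hypothesis, test_set)
```

The point of the sketch is that the hypothesis is a predicate over the structure itself, so no precomputed descriptor vector is needed to evaluate it.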
Related papers
Ando, H. Y.; Dehaspe, L.; Luyten, W.; Craenenbroeck, E. V.; Vandecasteele, H.; Meervelt, L. V. Discovering H-Bonding Rules in Crystals with Inductive Logic Programming. Mol. Pharm. 2006, 3, 665–674.
De Grave, K.; Costa, F. Molecular Graph Augmentation with Rings and Functional Groups. J. Chem. Inf. Model. 2010.
Free license: Request a free license.
DMax Chemistry Assistant used to be a commercial software product of PharmaDM. It is now maintained by the DTAI research group at K.U.Leuven and is licensed free of charge to both academic and commercial users. It runs on Linux and Windows (see the system requirements).
Tutorial: browse the tutorial.
Screenshots
The screenshots below illustrate only a subset of the functionality. Browse the tutorial for more. The illustrations are based on data taken from the NCI human tumor growth cell line (COLO 205 Colon).

Above: you can decide which types of 2D structural features, HTS observations, and other properties are eligible for inclusion in the model.
Below is an example of an automatically generated hypothesis that explains (with very high confidence) low values for "COLO 205 Colon: logGI50".

The color codes link the text to the molecule drawings.
Below are the statistics (obtained on a separate test set) for the hypothesis in isolation.

On the basis of all hypotheses, a model is constructed for the prediction of the property. The performance of that model (again, on a separate test set) is shown below.

You can use the generated model for virtual screening (both ranking and prediction). In the example below, a trivial single-hypothesis model is applied to the new compound library "nci_sample".

Notice in the left panel that new compound "524367" is predicted to have a value for "COLO 205 Colon: logGI50" of -5.96. You can find the hypotheses underlying that prediction in the bottom panel. In this case, there is just one hypothesis that applies.
The examples underlying this hypothesis are shown in the right panel. For instance, one of the examples from our NCI data set that support the hypothesis is "477482".
Notice that the color codes link the ranked molecules in the left panel to the hypothesis text in the bottom panel and to the reference molecules in the right panel.
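The screening step described above can be sketched as follows. This is only an illustration under stated assumptions, not the tool's implementation. Compounds are reduced to sets of feature names, a hypothesis is a set of required features plus a predicted value, and a compound's prediction is the mean over all hypotheses that apply to it. The compound identifiers, feature names, and the value -6.0 are invented for the example.

```python
from statistics import mean

def screen(library, hypotheses):
    """Predict a property for each compound and rank compounds by prediction.

    library:    dict compound id -> set of structural feature names (hypothetical)
    hypotheses: list of (name, required_features, predicted_value)
    Returns (compound_id, prediction, supporting_hypothesis_names) tuples,
    lowest predicted value first. Compounds matched by no hypothesis are omitted.
    """
    results = []
    for compound_id, features in library.items():
        # A hypothesis applies when all of its required features are present.
        matched = [(name, value) for name, required, value in hypotheses
                   if required <= features]
        if matched:
            # Assumption: average the values of all applicable hypotheses.
            prediction = mean(value for _, value in matched)
            results.append((compound_id, prediction, [name for name, _ in matched]))
    return sorted(results, key=lambda row: row[1])

# Invented library and a trivial single-hypothesis model.
library = {
    "cmpd_1": {"quinone", "tertiary_amine"},
    "cmpd_2": {"ester"},
    "cmpd_3": {"quinone", "tertiary_amine", "ester"},
}
hypotheses = [("H1", {"quinone", "tertiary_amine"}, -6.0)]

ranked = screen(library, hypotheses)
for compound_id, prediction, support in ranked:
    print(compound_id, prediction, support)
```

Returning the names of the supporting hypotheses alongside each prediction mirrors the way the bottom panel lists the hypotheses behind a predicted value.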
Contact: Kurt De Grave


