Short description: DMax Chemistry Assistant™ is a relational data mining tool for QSAR, compound screening data analysis, and virtual screening. It automatically finds, formulates and shows scientific hypotheses that best match measurements of activity (or any other observable property) of small molecules. It also makes a statistical estimate of the confidence you can have in each hypothesis.

DMax Chemistry Assistant has the unique ability to start from individual functional groups and rings and construct hypotheses that combine these building blocks with relational expressions, such as "A is linked to B via a conjugated system". Thus, a priori descriptors or fingerprints are not required. Still, your existing descriptors can be imported and used in the hypothesis-building.

The hypotheses are automatically validated on a separate test set, and can be collectively applied to unseen compounds for virtual screening.

Related papers

Ando, H. Y.; Dehaspe, L.; Luyten, W.; Craenenbroeck, E. V.; Vandecasteele, H.; Meervelt, L. V. Discovering H-Bonding Rules in Crystals with Inductive Logic Programming. Mol. Pharm. 2006, 3, 665–674.

De Grave, K.; Costa, F. Molecular Graph Augmentation with Rings and Functional Groups. J. Chem. Inf. Model. 2010.

Download: Download DMax Chemistry Assistant

DMax Chemistry Assistant used to be a commercial software product of PharmaDM, but is now maintained at the DTAI research group at KU Leuven, and is being licensed for free to both academic and commercial users. It works on Linux and Windows.

Tutorial: browse the tutorial.

The few screenshots below only illustrate a subset of the functionalities. Browse the tutorial for more. The illustrations are based on data taken from NCI human tumor growth cell line (COLO 205 Colon).

Selecting the background knowledge (electron flow, methoxy group, carboxylic ester, ...) and the target measurements.

Above: you can decide on the eligibility for inclusion in the model of arbitrary types of 2D structural features, HTS observations, and other properties.

Below is an example of an automatically generated hypothesis that explains (with very high confidence) low values for "COLO 205 Colon: logGI50".

 the compound contains a general ether A, and a non-hetero, non-aromatic ring B, and A is connected to B, and there are also ring C and 6-ring D, and ...

The color codes link the text to the molecule drawings.

Below are the statistics (obtained on a separate test set) for the hypothesis in isolation.

Histogram and p-value of the compound library

On the basis of all hypotheses, a model is constructed for the prediction of the property. The performance of that model (again, on a separate test set) is shown below.

Cumulative response, lift, ROC and predicted versus actual curves
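
For readers unfamiliar with cumulative response and lift, here is a minimal Python sketch of one common definition of each. It is not taken from DMax Chemistry Assistant: the function name, the toy scores and the convention that a higher score means "more likely a hit" are all assumptions made for illustration.

```python
def cumulative_response_and_lift(scores, is_hit, fraction):
    """Illustrative only: not DMax code.

    scores   -- model scores, higher meaning "more likely a hit" (assumption)
    is_hit   -- 1 if the compound is a hit on the test set, else 0
    fraction -- share of the ranked library to inspect, e.g. 0.2 for the top 20%
    """
    # Rank compounds from highest to lowest score.
    ranked = sorted(zip(scores, is_hit), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(fraction * len(ranked)))

    hits_in_top = sum(hit for _, hit in ranked[:n_top])
    total_hits = sum(is_hit)

    # Cumulative response: share of all hits recovered in the top fraction.
    cumulative_response = hits_in_top / total_hits
    # Lift: hit rate in the top fraction divided by the overall hit rate.
    lift = (hits_in_top / n_top) / (total_hits / len(ranked))
    return cumulative_response, lift


# Toy data: ten compounds, three of which are hits.
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.0]
toy_hits = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
response, lift = cumulative_response_and_lift(toy_scores, toy_hits, 0.2)
print(f"cumulative response: {response:.2f}, lift: {lift:.2f}")
```

A lift above 1 at a given fraction means the model ranks hits ahead of a random ordering of the same library. For a property such as logGI50, where low values are the interesting ones, the scores would have to be negated or the sort order reversed.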

You can use the generated model for virtual screening (both ranking and prediction). In the example below, a trivial single-hypothesis model is applied to the new compound library "nci_sample".

Virtual screening on an unmeasured compound library. The measured compounds are shown to the right for comparison.

Notice in the left panel that new compound "524367" is predicted to have a value for "COLO 205 Colon: logGI50" of -5.96. You can find the hypotheses underlying that prediction in the bottom panel. In this case, there is just one hypothesis that applies.

The examples underlying this hypothesis are shown in the right panel. For instance, one of the examples from our NCI data set that support the hypothesis is "477482".

Notice that color codes link the ranked molecules in the left panel to the hypothesis text in the bottom panel and to the reference molecules in the right panel.

Known issues:

    Please do not visit the PharmaDM website. The company has been liquidated and the domain name is now owned by an aggressive SEO company.

      Contact: Kurt De Grave

      Molecular graph augmentation

      To produce augmented molecular graphs as in the paper

      Kurt De Grave and Fabrizio Costa. Molecular Graph Augmentation with Rings and Functional Groups. Journal of Chemical Information and Modeling. 2010.

      do the following:

      1. Import an SDFile (MDL structure-data file V2000) into DMax Chemistry Assistant.
        After creating a session with the SDFile, you can find the knowledge base in ~/pharmaDM_user/ChemistryAssistant/sessions/session-name/data/molecules.kb
        Alternatively, just before finishing the session wizard, the knowledge base is in ~/pharmaDM_user/ChemistryAssistant/tmp/session_under_construction/data/molecules.kb
        The import wizard will report the number of molecules that could be imported successfully and any errors that occurred. Make sure to check this report if your SDF source is obscure.
      2. Run the little program KB2Graphs to convert a subset of the DMax knowledge base into gSpan format.
        Change to the dist directory and run java -jar KB2Graphs.jar --help to list the options.

      The order of the molecules in the SDF is preserved. KB2Graphs can optionally omit the augmentation, to produce graphs that are not augmented but identical with regard to bond encoding, aromaticity perception, etc.
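
To work with the converted graphs afterwards, a small reader for the widely used gSpan text layout may be handy. The Python sketch below assumes the usual conventions: "t # id" starts a graph, "v id label" declares a vertex, and "e source target label" declares an edge. The exact labels that KB2Graphs writes are not described here, so the sample data and the names Graph and parse_gspan are hypothetical; check them against your own output file.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    graph_id: str
    vertices: dict = field(default_factory=dict)  # vertex id -> label
    edges: list = field(default_factory=list)     # (source, target, label)


def parse_gspan(text):
    """Parse gSpan-style text into a list of Graph objects (illustrative)."""
    graphs = []
    current = None
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        kind = parts[0]
        if kind == "t":
            # "t # <id>" opens a new graph; graphs keep their file order.
            current = Graph(graph_id=parts[-1])
            graphs.append(current)
        elif kind == "v" and current is not None:
            current.vertices[int(parts[1])] = parts[2]
        elif kind == "e" and current is not None:
            current.edges.append((int(parts[1]), int(parts[2]), parts[3]))
    return graphs


# Hypothetical sample, not actual KB2Graphs output.
SAMPLE = """\
t # 0
v 0 C
v 1 O
e 0 1 1
t # 1
v 0 C
v 1 C
v 2 N
e 0 1 2
e 1 2 1
"""

graphs = parse_gspan(SAMPLE)
for g in graphs:
    print(g.graph_id, len(g.vertices), "vertices,", len(g.edges), "edges")
```

Because the order of the molecules in the SDF is preserved, the n-th graph in the parsed list can be matched with the n-th record of the input SDFile, provided every record was imported successfully.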