Our lab’s research agenda in this area is to develop artificial intelligence, machine learning, and data mining tools that can help clinicians and health care researchers analyze, interpret, and exploit the burgeoning collection of data and knowledge in order to significantly impact health care.
With the widespread adoption of electronic health/medical records (EHRs/EMRs) and continuing improvements in data collection (e.g., wearable sensors, the imminent arrival of the $1,000 genome), electronic health data is growing at a staggering rate. An emerging area of research uses this data, in combination with domain knowledge, to build models (e.g., predictive models) that help with tasks such as clinical decision support.
Learn predictive models from complex data and knowledge
We are interested in building models that combine existing knowledge with collected data. Specifically, our expertise lies in integrating multiple sources of data and knowledge, reasoning at the symbolic level, coping with uncertainty, and learning models from relational databases that capture uncertainty.
Reason and make inferences about data
The goal of learning is to construct a model that can be used to make predictions about the future or inferences about the data. Here, our expertise lies in how to do this efficiently.
Discover insights that are comprehensible to domain experts
Our research focuses on learning models, such as sets of if-then rules, that researchers without a technical background in computer science can readily interpret. This can lead to advances such as novel discoveries or the formation of new research hypotheses for a given application.
- Learning from Electronic Health Records
Analyzing electronic medical record (EMR) data poses significant technical challenges for learning and reasoning. EMRs are relational databases that store a wealth of information about a patient’s clinical history: disease diagnoses, procedures, prescriptions, lab results, etc. Using EMRs, it is possible to build models that address important medical problems, such as predicting which patients are most at risk of an adverse response to a certain drug. Successfully analyzing EMRs requires accounting for their relational schemas (i.e., the database contains separate relational tables for diagnoses, prescriptions, labs, etc.), their longitudinal nature (e.g., the time of a diagnosis may be important), and the fact that different patients may have dramatically different numbers of entries in any given table, such as diagnoses or vitals. Furthermore, it is important to model the uncertain, non-deterministic relationships between patients’ clinical histories and their current and future health status.
Our group employs techniques from statistical relational learning (SRL) to address these challenges. SRL offers three benefits for analyzing medical data. First, it can capture important relationships that occur in the data, such as the time between two events or the interactions between two individuals. Second, it models the inherent uncertainty in the underlying data. Third, it can naturally make use of existing domain knowledge during the learning and mining process.
Along with our SRL contributions, we develop new algorithmic ideas that make our approaches scalable and appropriate for large databases. We have developed a suite of algorithms for analyzing data that focus on automatically discovering statistical, structural regularities (e.g., rules or probabilistic models) from data, and we have successfully applied them to the following problems.
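To make the relational and longitudinal aspects concrete, the following minimal Python sketch computes one hand-written temporal feature over two toy tables. The table layouts, the `Diagnosis` and `Prescription` classes, the `diagnosed_before` helper, and all codes and drug names are hypothetical and chosen for illustration only. An SRL system would search for such relational features automatically and attach probabilities to them, rather than relying on a hand-written query like this one.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Diagnosis:
    """One row of a hypothetical diagnoses table."""
    patient_id: int
    code: str
    day: date


@dataclass(frozen=True)
class Prescription:
    """One row of a hypothetical prescriptions table."""
    patient_id: int
    drug: str
    day: date


def diagnosed_before(diagnoses, prescriptions, code, drug, max_days):
    """Return ids of patients who received a `code` diagnosis at most
    `max_days` days before (or on the day of) a `drug` prescription."""
    matches = set()
    for rx in prescriptions:
        if rx.drug != drug:
            continue
        for dx in diagnoses:
            # Join the two tables on the patient and filter on the code.
            if dx.patient_id != rx.patient_id or dx.code != code:
                continue
            gap = (rx.day - dx.day).days
            if 0 <= gap <= max_days:
                matches.add(rx.patient_id)
    return matches


# Patients have different numbers of rows per table (patient 3 has no diagnoses).
diagnoses = [
    Diagnosis(1, "hypertension", date(2010, 1, 5)),
    Diagnosis(1, "diabetes", date(2010, 3, 1)),
    Diagnosis(2, "hypertension", date(2009, 6, 1)),
]
prescriptions = [
    Prescription(1, "drug_a", date(2010, 1, 20)),
    Prescription(2, "drug_a", date(2010, 1, 20)),
    Prescription(3, "drug_a", date(2010, 2, 1)),
]

result = diagnosed_before(diagnoses, prescriptions, "hypertension", "drug_a", 30)
print(result)
```

Only patient 1 satisfies the feature here: patient 2 was diagnosed more than 30 days before the prescription, and patient 3 has no diagnosis rows at all.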
- Diagnosing Breast Cancer from Structured Mammography Reports
Labeling an abnormality as benign or malignant from a structured mammography report is a challenging task for both radiologists and machines. To tackle this problem, we have developed an algorithm that automatically constructs probabilistic first-order logical rules from the data, which leads to two important results. First, at present most of the women identified for a possible malignancy on a mammogram are called back unnecessarily, with concomitant stress, procedures (additional imaging and/or biopsy), and expense. Our research, which achieves superior performance compared to both previous machine learning approaches and radiologists, has demonstrated the potential to dramatically reduce this fraction without reducing the number of cancers correctly diagnosed. Second, the (probabilistic) first-order logical rules are easy for domain experts to understand. In our work on mammography, a radiologist collaborator reviewed several learned rules and was particularly intrigued by the following rule:
Abnormality A in mammogram M for Patient P may be malignant if:
A has BI-RADS category 5, and
A has a mass present, and
A has a mass with high density, and
P has a prior history of breast cancer, and
P has another abnormality on the same mammogram (B), and
B has no pleomorphic microcalcifications, and
B has no punctate calcifications.
This rule suggested a hitherto unknown relationship between malignancy and high density masses. In general, mass density was not previously thought to be a highly predictive feature.
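As a rough illustration of how such a first-order rule is read, the sketch below evaluates its conditions over a toy set of abnormality records in Python. The dictionary field names and the `may_be_malignant` function are assumptions made for this example and do not reflect the actual report schema. The sketch also treats the rule as a plain yes/no test, whereas the learned rules are probabilistic.

```python
def may_be_malignant(a, abnormalities, patient):
    """Check the rule's conditions for abnormality `a` of `patient`.

    `abnormalities` holds all abnormality records for the patient; each
    record is a dict with hypothetical field names.
    """
    # Conditions on abnormality A itself.
    if not (a["birads"] == 5 and a["mass_present"] and a["mass_density"] == "high"):
        return False
    # Condition on patient P.
    if not patient["prior_breast_cancer"]:
        return False
    # Existential condition: some other abnormality B on the same mammogram
    # with neither pleomorphic microcalcifications nor punctate calcifications.
    for b in abnormalities:
        if b["id"] == a["id"] or b["mammogram"] != a["mammogram"]:
            continue
        if not b["pleomorphic_microcalcifications"] and not b["punctate_calcifications"]:
            return True
    return False


patient = {"prior_breast_cancer": True}
abnormalities = [
    {"id": "A", "mammogram": "M1", "birads": 5, "mass_present": True,
     "mass_density": "high", "pleomorphic_microcalcifications": False,
     "punctate_calcifications": False},
    {"id": "B", "mammogram": "M1", "birads": 3, "mass_present": False,
     "mass_density": None, "pleomorphic_microcalcifications": False,
     "punctate_calcifications": False},
]

flag_a = may_be_malignant(abnormalities[0], abnormalities, patient)
flag_b = may_be_malignant(abnormalities[1], abnormalities, patient)
# Without a second abnormality on the same mammogram the rule cannot fire.
flag_alone = may_be_malignant(abnormalities[0], abnormalities[:1], patient)
print(flag_a, flag_b, flag_alone)
```

The condition on B is what makes the rule relational: whether it applies to A depends on a different record linked to A through the shared mammogram, which a single-row feature vector cannot express directly.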
- Predicting Adverse Drug Events from Electronic Medical Records
Consider the task of learning a model that, based on a patient’s clinical history, can predict at prescription time whether the patient may be susceptible to an adverse reaction (i.e., side effect) to a medication. A patient’s clinical history records information about specific prescribed medications (e.g., name, dosage, duration) or specific disease diagnoses. It does not explicitly mention important connections between different medications or diagnoses, such as which other medications could have been prescribed to treat an illness. This latent information may be necessary to build accurate models. We addressed this problem by developing an algorithm that, while learning a model, automatically discovers clusters of diseases or medicines that are informative for the specific prediction task. We evaluated our algorithm on three real-world tasks where the goal is to use electronic medical records to predict whether a patient will have an adverse reaction to a medication. We found that our approach is more accurate than performing no clustering, pre-clustering, and using expert-constructed medical heterarchies. Furthermore, our algorithm uncovered latent structure that a doctor with expertise in our tasks of interest deemed (a) to capture important known relationships and (b) to suggest possible connections that deserve further investigation.
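The idea of forming clusters during learning, guided by the prediction task itself, can be illustrated with a deliberately simplified Python sketch. This is not the published algorithm: it is a toy greedy procedure that grows a cluster of medications only when doing so improves the training accuracy of a single rule ("the patient took any drug in the cluster"). The function names, drug names, and data are all hypothetical, and scoring on the training data is used purely to keep the example short.

```python
def accuracy(cluster, histories, labels):
    """Accuracy of the rule: predict an adverse event iff the patient's
    history contains at least one drug from `cluster`."""
    correct = 0
    for drugs, label in zip(histories, labels):
        prediction = bool(cluster & drugs)
        correct += prediction == label
    return correct / len(labels)


def grow_cluster(seed, candidates, histories, labels):
    """Greedily add drugs to the cluster while the rule's accuracy improves."""
    cluster = {seed}
    best = accuracy(cluster, histories, labels)
    improved = True
    while improved:
        improved = False
        for drug in sorted(candidates - cluster):
            score = accuracy(cluster | {drug}, histories, labels)
            if score > best:
                # Keep the merge only because it helps the prediction task.
                cluster, best, improved = cluster | {drug}, score, True
    return cluster, best


# Toy data: each history is the set of drugs a patient took; the label says
# whether the patient had the adverse event.
histories = [
    {"drug_a"},
    {"drug_b"},
    {"drug_c"},
    {"drug_b", "drug_c"},
    {"drug_d"},
    {"drug_a", "drug_d"},
]
labels = [True, True, False, True, False, True]

cluster, best = grow_cluster("drug_a", {"drug_a", "drug_b", "drug_c", "drug_d"},
                             histories, labels)
print(sorted(cluster), best)
```

In this toy data the records never state that `drug_a` and `drug_b` are related, yet grouping them is what makes the rule accurate. That is the kind of latent connection a task-driven clustering can surface, in contrast to clusters fixed before learning.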
- E. Burnside, J. Davis, V. Santos Costa, I. Dutra, C. Kahn, J. Fine, and D. Page. Knowledge discovery from structured mammography reports using inductive logic programming. In Proceedings of the American Medical Informatics Association Fall Symposium, pages 96–100, 2005.
- J. Davis, B. Berg, D. Page, V. Santos Costa, P. Peissig, and M. Caldwell. Demand-driven clustering in relational domains for predicting adverse drug events. In Proceedings of the 29th International Conference on Machine Learning, 2012.
- K. De Grave, J. Ramon, and L. De Raedt. Active learning for high throughput screening. In Proceedings of the Eleventh International Conference on Discovery Science, volume 5255 of Lecture Notes in Computer Science, pages 185–196, 2008.
- K. Deforche. Modeling HIV resistance evolution under drug selective pressure. PhD thesis, Katholieke Universiteit Leuven, 2008.
- T. Fannes, E. Vandermarliere, L. Schietgat, S. Degroeve, L. Martens, and J. Ramon. Predicting tryptic cleavage from proteomics data using decision tree ensembles. Journal of Proteome Research, 12:2253–2259, 2013.
- G. Li, J. Verheyen, J. Ramon, J. Eusebio, K. Theys, and A-M. Vandamme. HIV-1 Gag and protease coevolution networks. In Proceedings of the 14th European AIDS Conference, Brussels, Belgium, October 2013.
- G. Meyfroidt, F. Guiza Grandas, D. Cottem, W. De Becker, K. Van Loon, J.-M. Aerts, D. Berckmans, J. Ramon, M. Bruynooghe, and G. Van den Berghe. Computerized prediction of intensive care unit discharge after cardiac surgery: development and validation of a Gaussian processes model. BMC Medical Informatics and Decision Making, 11:1–13, 2011.
- J. Ramon, D. Fierens, F. Guiza, G. Meyfroidt, H. Blockeel, M. Bruynooghe, and G. Van den Berghe. Mining data from intensive care patients. Advanced Engineering Informatics, 21(3):243–256, 2007.
- J. Van Eyck, J. Ramon, F. Guiza, G. Meyfroidt, M. Bruynooghe, and G. Van den Berghe. Guided Monte Carlo tree search for planning in learned environments. In Proceedings of the Asian Conference on Machine Learning (ACML), volume 29 of JMLR Workshop and Conference Proceedings, 2013.
- J. Van Haaren, J. Davis, M. Lappenschaar, and A. Hommersom. Exploring disease interactions using Markov networks. In AAAI-13 Workshop on Expanding the Boundaries of Health Informatics Using AI, 2013.