Research

Current research

My research interests are related to the use of machine learning and visualization in pharmaceutical research to aid decision making. Computational techniques such as these are becoming more prevalent in the drug discovery process. I am interested in different aspects of how these techniques can contribute to drug discovery research:

Machine learning model interpretation: machine learning models learn associations between the structural features of sets of molecules and biochemical properties that are expensive to measure. However, in many cases these associations cannot be inspected directly. I am interested in techniques that make these models more understandable.
Visualization of chemical space: visualization techniques are a useful way to condense a large amount of information into an understandable form. I am interested in developing novel visualizations that connect the structural information of sets of molecules to their physico-chemical and biochemical properties.

Previous research

The D3i4AD project

I worked as a Marie Curie fellow on the European project "Diagnostics and Drug Discovery Initiative for Alzheimer's Disease" (D3i4AD). As part of this project, I developed deep neural networks with a dual purpose: to predict whether a compound would be active in a phenotypic screen, and against which target in the pathway studied in the screen. These models were intended to be used for compound screening and to aid in the target deconvolution of hit compounds.

Machine learning

Machine learning generates mathematical models that can predict a range of properties given suitable training data. I first worked on using matched molecular pairs (see below) as input to predict the changes in activity caused by these small chemical modifications. I also analyzed how missing data affects the performance of multitask machine learning. This provided a first approximation of how much performance could be gained by testing new compound-assay combinations.
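A common way to train a multitask model on an incomplete compound-by-assay activity matrix is to mask the missing entries so that only measured compound-assay pairs contribute to the loss. The sketch below illustrates that idea under stated assumptions: missing measurements are encoded as `None`, the loss is a plain mean squared error, and the function names `masked_mse` and `fraction_missing` are hypothetical. It is not the exact setup used in the studies described above.

```python
from typing import List, Optional

# Hypothetical helper: mean squared error over measured entries only.
# Rows are compounds, columns are assays (tasks); None marks an untested pair.
def masked_mse(predictions: List[List[float]],
               labels: List[List[Optional[float]]]) -> float:
    total, count = 0.0, 0
    for pred_row, label_row in zip(predictions, labels):
        for pred, label in zip(pred_row, label_row):
            if label is None:  # missing compound-assay pair: excluded from the loss
                continue
            total += (pred - label) ** 2
            count += 1
    if count == 0:
        raise ValueError("no measured labels to compare against")
    return total / count

# Hypothetical helper: share of the activity matrix that is unmeasured.
def fraction_missing(labels: List[List[Optional[float]]]) -> float:
    cells = [value for row in labels for value in row]
    return sum(value is None for value in cells) / len(cells)

preds = [[0.5, 1.0], [2.0, 0.0]]    # model outputs: 2 compounds x 2 assays
labels = [[1.0, None], [2.0, 1.0]]  # None = compound-assay pair not tested
print(fraction_missing(labels))     # 0.25
print(masked_mse(preds, labels))    # averaged over the 3 measured entries
```

Because masked entries never enter the loss, filling in a previously untested pair adds a new training signal without changing how the existing ones are weighted, which is one way to reason about the value of testing new compound-assay combinations.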

Visualization of chemical space

Chemical space is the collection of all physically possible compounds. Although only a small fraction of these compounds have been synthesized and characterized, public and private compound collections have become very large, and their size makes systematic analysis of structure-activity relationships (SAR) complicated. Tailored visualizations can ease this analysis by providing either a global or a focused view of chemical space. A large part of my thesis was dedicated to the design and development of visualizations to explore complex, high-dimensional chemical spaces. I extended the chemical space network concept to generate coordinate-free representations of coordinate-based chemical spaces, and introduced a layout algorithm that allows these networks to also provide a global view of chemical space. To help in multi-objective compound design, I also generated chemical space views with clearly defined regions that separate favorable from unfavorable compounds. In addition, I collaborated with Pfizer scientists to assess the progression of a lead optimization series using SAR matrices and visualizations.
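In its basic form, a chemical space network represents compounds as nodes and connects pairs whose similarity meets a threshold, so no coordinate system is required. The following is a minimal sketch under stated assumptions: fingerprints are given as sets of on-bit indices (in practice they would be computed with a cheminformatics toolkit such as RDKit), Tanimoto similarity is the chosen measure, and the function names are hypothetical. It covers only edge construction, not the layout algorithm mentioned above.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

def tanimoto(a: FrozenSet[int], b: FrozenSet[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)

def chemical_space_network(fps: Dict[str, FrozenSet[int]],
                           threshold: float) -> List[Tuple[str, str, float]]:
    """Hypothetical helper: edges (id1, id2, similarity) for pairs at or above threshold."""
    edges = []
    # Sort by compound id so the edge list is deterministic.
    for (id1, fp1), (id2, fp2) in combinations(sorted(fps.items()), 2):
        similarity = tanimoto(fp1, fp2)
        if similarity >= threshold:
            edges.append((id1, id2, similarity))
    return edges

fps = {
    "A": frozenset({1, 2, 3, 4}),
    "B": frozenset({1, 2, 3, 5}),
    "C": frozenset({7, 8}),
}
print(chemical_space_network(fps, threshold=0.5))  # [('A', 'B', 0.6)]
```

The resulting edge list can be handed to any graph library for drawing; the threshold controls how densely connected the network becomes.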

Matched molecular pairs

Matched molecular pairs are compound pairs that share a large common substructure but differ at a specific site. They constitute an intuitive way to represent chemical similarity and have become increasingly important for molecular design. As part of my thesis, I performed several data mining analyses of matched molecular pairs in public databases such as PubChem and ChEMBL. These analyses focused on how the structural modifications encoded in these pairs changed properties such as activity or ionizability. Finally, I proposed an extension of the concept that takes a retrosynthetic approach during the generation of pairs.
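One common way to generate matched molecular pairs is fragmentation-based: each compound is cut at a single bond into a core and a substituent, and compounds that share a core but carry different substituents form a pair. The sketch below shows only the grouping step, under stated assumptions: the (core, substituent) fragments are precomputed, fragment strings are treated as opaque keys, `[*]` marks the attachment point, and the function name is hypothetical. Real implementations also constrain the relative sizes of core and substituent so that the shared substructure is genuinely large; that check is omitted here.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

def find_matched_pairs(fragments: Dict[str, List[Tuple[str, str]]]
                       ) -> List[Tuple[str, str, str, str, str]]:
    """Hypothetical helper: pair compounds that share a core.

    `fragments` maps compound id -> list of (core, substituent) tuples from
    single-bond cuts. Returns (compound1, compound2, core, sub1, sub2).
    """
    by_core = defaultdict(list)
    for compound_id, frags in fragments.items():
        for core, substituent in frags:
            by_core[core].append((compound_id, substituent))
    pairs = []
    for core, members in by_core.items():
        for (c1, s1), (c2, s2) in combinations(sorted(members), 2):
            # Same core, different compounds and different substituents: an MMP.
            if c1 != c2 and s1 != s2:
                pairs.append((c1, c2, core, s1, s2))
    return pairs

fragments = {
    "cpd1": [("c1ccc(cc1)[*]", "[*]Cl")],
    "cpd2": [("c1ccc(cc1)[*]", "[*]F")],
    "cpd3": [("C1CCC(CC1)[*]", "[*]Cl")],
}
print(find_matched_pairs(fragments))
# [('cpd1', 'cpd2', 'c1ccc(cc1)[*]', '[*]Cl', '[*]F')]
```

Each returned tuple encodes a transformation (here `[*]Cl` to `[*]F` on the same core), so property differences between the two compounds can be attributed to that single structural change.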

Molecular modelling

Docking compounds into protein binding pockets provides hypotheses about the binding conformations of molecules, which can be used for further optimization. I assisted medicinal chemistry groups by performing docking studies on compound series to better understand the differences in activity between the compounds. I also analyzed the conformations of macrocycles, complex molecules that possess a large ring; many interesting natural products are macrocycles. As part of this analysis, we compared traditional conformational sampling methods such as LowModeMD against short molecular dynamics (MD) simulations to determine which was better at finding active conformations.

Bioinformatics

During the last year of my undergraduate studies, I was part of the research group of Prof. Julian Perera, which had obtained partial sequencing results for Rhodococcus ruber strain Chol-4. I performed the first partial assembly of the contigs we received and characterized the regions where the genes responsible for cholesterol degradation were located.

Antonio de la Vega de León

AI/ML expert for Drug Discovery