Data science in early drug discovery – getting it right from the start

The application of data science in the early stages of drug development is not new – progress in algorithms and computing power has been ongoing for years. We have reached the point where we have to reflect on the road travelled and look forward to upcoming opportunities and challenges. To further pave the way and reach the top in health data science, stakeholders will have to find each other and work together. Once everyone is on board, data science knows no bounds!

A quicker path to better medicine

Data science essentially offers us a way to make sense of enormous quantities of data. Where the human brain is limited in the amount of complex data it is capable of analyzing, algorithms open a new path forward. In the earliest stages of drug discovery – where molecules are screened and optimized to impact a specific cellular, disease-causing target – data science makes a decisive difference. “Data science allows us to discover patterns and identify new hypotheses that a human wouldn’t necessarily make,” says Christine Durinx, managing director of the world-class research institute VIB.

Moreover, it can significantly improve the efficiency of the drug discovery process, enabling researchers to focus their time and effort on experiments likely to be worthwhile. Pieter Peeters, Senior Director Translation Biology at Janssen, confirms: “We use data science in early discovery to create translational insights, but also to inform us on the optimal way forward. As such, we can deliver precise therapies to patients faster.”

Janssen is setting the pace

In traditional drug discovery, we would screen millions of compounds for therapeutic properties in a very time- and labor-intensive process. The Janssen Biosignature project, led by Peeters, is an example of how we can implement data science to significantly speed up the selection procedure. “By correlating specific cellular responses, obtained through microscopic analyses, with therapeutic potential, our algorithms can identify patterns and use this knowledge to inform us of the potential of uncharacterized molecules,” says Peeters.
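As a rough illustration of the idea Peeters describes, the sketch below compares the image-derived profile of an uncharacterized molecule with profiles of reference compounds whose activity is already known, and borrows the label of the closest match. This is not Janssen's Biosignature method: the feature vectors, compound names, and the `predict_potential` helper are all hypothetical, and a simple nearest-neighbour lookup stands in for whatever models the project actually uses.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical image-derived feature vector for one compound.
Profile = List[float]


def cosine_similarity(a: Profile, b: Profile) -> float:
    """Similarity of two cellular-response profiles (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def predict_potential(
    profile: Profile, reference: Dict[str, Tuple[Profile, bool]]
) -> Tuple[str, bool]:
    """Return the most similar reference compound and its known activity label."""
    nearest = max(reference, key=lambda name: cosine_similarity(profile, reference[name][0]))
    return nearest, reference[nearest][1]


# Made-up toy values, for illustration only.
reference_compounds = {
    "ref_active": ([0.9, 0.1, 0.8], True),
    "ref_inactive": ([0.1, 0.9, 0.2], False),
}
uncharacterized = [0.8, 0.2, 0.7]
nearest_name, looks_promising = predict_potential(uncharacterized, reference_compounds)
```

In this toy setting the unknown profile resembles the active reference far more than the inactive one, so it would be prioritized for follow-up experiments.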

Other modelling efforts aim to leverage the world’s largest collection of molecules with known biochemical or cellular activity, to enable more accurate predictions in drug development. This mission is called MELLODDY (Machine Learning Ledger Orchestration for Drug Discovery). Hugo Ceulemans, Senior Scientific Director of Drug Discovery Data Sciences at Janssen, is the industrial project lead of this European, IMI-funded, public-private consortium, involving ten pharmaceutical companies, five technology partners, and two academic institutes. In June 2022, MELLODDY came to a successful close. “We trained AI models on billions of existing data points, covering the behavior of more than 20 million small molecules in over 40,000 biological assays,” says Ceulemans, “intending to identify the most effective compounds for drug development.” Using federated learning (where only insights are shared, and the data stays local), the consortium analyzed competitive data while preserving full intellectual property and respecting the data’s highly confidential nature.
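To make the federated learning principle more concrete, here is a minimal sketch of federated averaging, one common way to implement it. It is not the MELLODDY platform: the one-parameter model, the function names, and the toy data are assumptions made purely for illustration. Each partner improves the shared model on its own private data, and only the resulting model weight (never the data) is sent back to be averaged.

```python
from typing import List, Tuple

# One partner's private data: (feature, measured activity) pairs.
Dataset = List[Tuple[float, float]]


def local_update(weight: float, data: Dataset, lr: float = 0.01, steps: int = 20) -> float:
    """Run a few gradient-descent steps for the model y = weight * x on local data."""
    for _ in range(steps):
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_average(updates: List[Tuple[float, int]]) -> float:
    """Average the partners' weights, weighted by each partner's sample count."""
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train_federated(partners: List[Dataset], rounds: int = 50) -> float:
    """Alternate local training and central averaging; raw data never leaves a partner."""
    weight = 0.0
    for _ in range(rounds):
        updates = [(local_update(weight, data), len(data)) for data in partners]
        weight = federated_average(updates)
    return weight


# Made-up toy data: three partners whose measurements roughly follow y = 3x.
partners = [
    [(1.0, 3.1), (2.0, 5.9)],
    [(1.5, 4.4), (3.0, 9.2), (0.5, 1.6)],
    [(2.5, 7.4)],
]
shared_weight = train_federated(partners)
```

The central step only ever sees `(weight, sample_count)` pairs, which is the sense in which insights are shared while the data stays local. Real deployments add further safeguards, such as encryption of the exchanged updates, that this sketch leaves out.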

Tackling the hurdles ahead

As technology evolves, new challenges inevitably arise. Experts in data science agree that there are specific obstacles on the road to success, and that it will take collaborative efforts to overcome them.

  1. We need bilingual professionals to ensure translational communication

Moving forward, we will increasingly need people capable of catalyzing communication between biologists and data scientists. “We train these bilingual professionals on the job,” Peeters says. “At Janssen, we see it as an investment in the future.” This aim is also one of the core missions at VIB: “We invest in training of our employees, enabling them to make the most of their data,” Durinx affirms. It will also benefit us all to prepare future employees well in advance, by integrating data science into earlier educational programs.

  2. We need more data to maximize the impact of our results

Vast amounts of data are necessary to generate relevant output capable of impacting healthcare. Patient data is extremely valuable, yet subject to strict privacy regulations. Research data can be equally useful but is subject to IP rules. Advances such as federated learning have made it easier to work within both sets of restrictions. Sharing data is a trust-based exercise, whether it be patient or research data. In this context, transparency and education are key. “We need to increase awareness of the potential of data science, both within the ecosystem and among the general public,” Durinx says.

  3. We need higher-quality data to gain higher-quality results

“As the saying goes: garbage in, garbage out,” says Durinx, underlining the importance of data quality. “As long as data is not appropriately qualitative and standardized, researchers will not be able to explore its full potential.” The problem of data quality mainly manifests at the clinical level. Symptoms are often described differently from one medical center to the next, and parameters are measured in various ways. “It will therefore be essential to have a critical human mind with deep domain knowledge to guide the work.”

  4. We need to join forces to reach the top

We need the know-how to translate data into high-quality insights. “Scientific breakthroughs take place on the edge of different disciplines,” says Durinx. “Data science requires a collaborative approach.” We must work together, across domains, stakeholders, and institutional borders. “Technology is not something to own. Increased insights are beneficial for everyone,” confirms Peeters. Catalyzing such collaboration can be fairly easy: it is all about creating a stimulating environment in which to exchange knowledge and expertise. For Belgium, establishing the National Health Data Platform will be a significant step in the right direction, but further action is still urgently needed. “We have to act now and move forward in a more determined manner, together, if we are to realize the full potential of data science in healthcare,” Peeters concludes.