Preparing for a Data Science Career in the Pharmaceutical Industry with a Master’s Degree

The development of new drugs to counter the growing number of antibiotic-resistant bacteria, or "superbugs," is one of the most important areas of modern healthcare research. But while the work matters more than ever, the image of white-coated lab technicians as the workforce behind pharmaceutical development has become outdated.

Today, many of the big breakthroughs in pharmacology research and development are being produced by data scientists.

For data scientists with a master's degree, Big Pharma may be the best shot at a meal ticket. A 2011 study showed that the pharmaceutical industry was already the leading employer of data scientists in the United States. Most of those positions were held by scientists with a master's degree or Ph.D.

But the demand for skilled data scientists is nowhere close to being met.

Only Intensive Data Analysis Can Curb the Skyrocketing Costs of Pharmaceutical R&D

Breakthroughs in pharmaceutical R&D are expensive and are becoming harder to achieve. Pharmaceutical development faces challenges on every front today, from rising development costs to shrinking markets. It's entirely possible that the people best placed to rescue it will come from the next generation of data scientists.

In April 2013, McKinsey Global Institute, the business and research arm of consulting firm McKinsey & Company, identified several key advantages the pharmaceutical industry could gain from improved data analytics:

  • Predictive modeling of drug compounds and the biological processes they affect
  • Identification of trial patient candidates through disparate data sources beyond medical records alone, incorporating lifestyle factors into screening without laborious interviews with hundreds or thousands of possible participants
  • Real-time monitoring of trial results, enabling rapid intervention on safety or efficacy problems before complete result sets become available
  • Blending data from discovery and clinical development to break down the silos that previously caused researchers and clinicians to miss important findings

The pharmaceutical industry will need to keep making substantial investments in highly trained data scientists to accomplish these monumental tasks.

A Narrowing Research Focus Requires More Intensive Analysis

As many of the traditional, large-scale challenges of healthcare have been solved, at least conceptually, with improved drugs and treatment regimens, drug companies have been able to focus on less common but equally compelling diseases and disabilities. Expert data analysis has also helped identify conditions that may have gone unnoticed in previous eras. From rare cardiac disorders to unusual blood diseases, Big Pharma has been able to produce compounds that cure or ease suffering for people all over the world.

The flood of data and investment aimed at finding causes and cures for increasingly niche diseases brings its own challenges, however. Because rarer conditions afflict fewer people, the pool of potential buyers shrinks, and with it the base over which development costs can be amortized. In this scenario, the operational challenges of bringing a drug to market at an affordable price skyrocket.

Big money is on the line when it comes to investing in drug research in such a market. Deloitte reported in 2013 that it cost more than a billion dollars to bring a new compound to market, an increase of 18% over only three years. At the same time, peak sales of those assets declined by 43%, leaving margins slimmer than ever. Late-stage trial terminations caused losses of $243 billion over the four years that followed.

Such investments are made less risky by the sort of detailed analysis that only the most skilled data scientists can offer.

Speed Saves Lives: Data-Driven Drug Trials Floor the Gas

With intensive investment in data analysis elsewhere in healthcare, the cycle of rising development costs and shrinking demand is bound to accelerate. Increasingly effective diagnostics will drive demand for ever more narrowly targeted drugs, marketed to ever smaller groups of patients.

Data science may offer the only affordable way to approach this problem. The vast infrastructure and long timelines that conventional drug trials require are too expensive to sustain development today.

Moreover, the rate at which drug-resistant strains of bacteria are emerging demands much faster development of new types of antibiotics.

Yet even as the cost of developing new drugs has skyrocketed, the failure rate of drugs in late-stage clinical testing has risen as well. Those failures may not only bankrupt drug companies; they could also lead to a global crisis of untreatable bacterial infections.

Bringing new compounds to market quickly and economically will be the key challenge for the pharmaceutical industry over the next twenty years. And the industry seems very unlikely to meet that challenge without a serious revolution in the methods by which drugs are developed and tested.

Affordable Investigation: Using Big Data to Drive Avenues of Research

California-based NuMedii is pioneering some of the most promising efforts to update pharmaceutical compound development.

NuMedii uses intensive data mining techniques to predict compound efficacy before the compound is actually synthesized. The company runs specialized, network-based algorithms against billions of data points from disease, clinical, and pharmacological data sets to identify the formulations most likely to succeed in actual clinical trials. This approach seeks to remove much of the cost and risk of going down blind alleys during the development process.

Of course, it is neither easy nor inexpensive to build such a comprehensive database. Data scientists at MedChemica, a company in the U.K., have developed another innovative approach to the problem. By designing a carefully secured method of mining preclinical (and therefore proprietary) data from a number of competing companies, MedChemica can tap into larger data sets than any one of those companies could assemble on its own. The process yields more accurate predictions about the potential toxicity of compounds in development without exposing the specific source data to competitors. All of the participants in the consortium benefit.
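The general idea of pooling knowledge without handing over raw records can be illustrated with a deliberately simplified Python sketch. It is not MedChemica's actual method, which this article does not describe. It assumes, hypothetically, that each company reduces its private records to per-feature counts (toxic compounds and total compounds for a made-up structural feature label) and shares only those counts. The hypothetical helpers `local_summary` and `pool` then merge the counts into pooled toxicity rates, dropping any feature seen fewer than `min_total` times.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record format: (structural_feature_label, observed_toxic)
Record = Tuple[str, bool]
# Hypothetical per-feature aggregate: (toxic_count, total_count)
Summary = Dict[str, Tuple[int, int]]


def local_summary(records: List[Record]) -> Summary:
    """Run inside one company: reduce raw records to per-feature counts.

    Only the returned counts would be shared; the records stay private.
    """
    summary: Summary = {}
    for feature, toxic in records:
        toxic_count, total = summary.get(feature, (0, 0))
        summary[feature] = (toxic_count + int(toxic), total + 1)
    return summary


def pool(summaries: Iterable[Summary], min_total: int = 5) -> Dict[str, float]:
    """Run by the consortium: merge shared counts into pooled toxicity rates.

    Features observed fewer than `min_total` times across all partners are
    dropped, a simple (and by itself insufficient) guard against exposing
    small groups of compounds.
    """
    merged: Summary = {}
    for summary in summaries:
        for feature, (toxic_count, total) in summary.items():
            m_toxic, m_total = merged.get(feature, (0, 0))
            merged[feature] = (m_toxic + toxic_count, m_total + total)
    return {
        feature: toxic_count / total
        for feature, (toxic_count, total) in merged.items()
        if total >= min_total
    }


if __name__ == "__main__":
    # Toy data for two hypothetical companies.
    company_a = [("nitro", True), ("nitro", True), ("amide", False)]
    company_b = [("nitro", False), ("nitro", True), ("amide", False), ("amide", False)]
    shared = [local_summary(company_a), local_summary(company_b)]
    print(pool(shared, min_total=3))  # {'nitro': 0.75, 'amide': 0.0}
```

Neither company alone has enough observations of "amide" to clear the threshold, but the pooled counts do. With only two partners, each could still recover the other's counts by subtracting its own, so a real consortium would need much stronger safeguards than this sketch shows.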

The challenge is in scaling up the processes. As one of the MedChemica collaborators, Hans-Joachim Boehm, puts it, “There have been some interesting papers on how to analyze Big Data, but when you look closely, you realize it takes a huge amount of curation and isn’t necessarily scalable. What’s possible on a thousand records won’t necessarily work on millions. What’s needed is a way to build compatible and well-annotated databases and analyze the databases using processes that can be scaled up.”

That work, of course, falls squarely on the shoulders of data scientists. Graduates of master's programs in the field can expect their services to be in demand at pharmaceutical companies of all stripes for the foreseeable future. And the work they do there may be one of the most valuable contributions modern society makes to humanity's collective future.
