Advances in biotechnology have become pervasive and fully integrated into modern life. Breakthroughs emerging from biotech have appeared in so many diverse fields that they may not be immediately recognized as biotechnology at all:
- Genetically modified crops in agriculture
- Chemical processing in heavy industry
- Development of new drugs in the pharmaceutical industry
- Environmental monitoring and cleanup
- Genetic screening and DNA editing
Equally unrecognized may be the role that master’s-prepared data scientists play in each of those advancements as they contribute to almost every area of biotech research.
The Amalgamation of Data Science and Biotechnology
The application of data science in biotechnology is about far more than simply mining massive data sets. Data science is fundamentally about extracting knowledge from information, and it is the depth of that knowledge that matters, not the volume of information.
The Emergence of Bioinformatics
According to a June 2014 article in Science magazine, the field of bioinformatics has evolved from being simply another tool in the researcher’s toolbox into a discipline in its own right. Analysts who once existed simply to deliver the reports that helped answer the questions scientists and clinicians posed are now being called upon to help define the very questions being asked.
Data science is now its own focus of research within the sphere of biotechnology.
This new role demands a higher level of education for prospective data scientists working as bioinformaticians, as they progress from being mere technicians to being fully fledged research scientists.
Gaining New Knowledge from the Relationship Between Data Sets
Some of the most exciting prospects in biotech may belong almost exclusively to the realm of data science. With so many massive arrays of data in so many specialized fields, some industry observers believe that the next major advances in life sciences may come from the ability to analyze the subtle links between those data sets. Tying environmental data to disease patterns, disease patterns to drug research, or drug efficacy to dietary trends may ultimately prove even more beneficial in the long term than the most intensively focused analysis of DNA.
Rob Kitchin, a professor at the National University of Ireland, argues that the coupling of big data with new analytic paradigms will ultimately change science from being knowledge-driven to being data-driven. In the process, relationships that scientists might not have even considered looking for may end up revealing themselves.
Kitchin is careful to note, however, that such a world will not simply emerge wholly formed from the deep ocean of data. An underlying knowledge of the methods of collection, storage, and algorithmic analysis applied to that data will be crucial for exploring meaning. That kind of knowledge can only come from data scientists with analytical skills that have been honed by earning an advanced degree in the field.
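The idea of linking data sets from different fields, described above, can be illustrated with a minimal sketch. It joins two small tables on a shared key and measures how strongly the paired values move together. The region names and numbers are made-up placeholders rather than real measurements, and the Pearson coefficient is only one of many possible measures of association; a strong correlation on its own says nothing about cause.

```python
from math import sqrt

# Made-up placeholder values keyed by region; NOT real measurements.
air_quality_index = {"north": 42.0, "south": 87.0, "east": 65.0, "west": 51.0}
asthma_cases_per_10k = {"north": 11.0, "south": 24.0, "east": 18.0, "coast": 9.0}


def join_on_key(left, right):
    """Inner join: keep only the keys present in both data sets."""
    shared = sorted(left.keys() & right.keys())
    return shared, [left[k] for k in shared], [right[k] for k in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Only regions that appear in both tables survive the join.
regions, aqi, cases = join_on_key(air_quality_index, asthma_cases_per_10k)
r = pearson(aqi, cases)
print(regions, round(r, 3))
```

Note that the join silently drops "west" and "coast" because each appears in only one table. Deciding what to do with such unmatched records is exactly the kind of collection and storage question Kitchin's caveat points to.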
Mining the Future: From Flu Mapping to Biological Mediums for Data Storage
With this paradigm shift, it’s entirely possible that data scientists of the future won’t think of themselves as data scientists who happen to have a career in biotechnology. Instead, they may think of themselves as biotechnologists working in the field of data science.
The field is already finding new ways to explore healthcare trends from non-traditional data sets. Google’s Flu Trends was one early effort. While it was available to the public, the service used nothing more than geographically mapped search-phrase data to help track flu outbreaks around the world, and its early estimates tracked official surveillance figures surprisingly closely, though it overestimated some later flu seasons. Flu Trends itself has since been discontinued, although the basic functionality remains as part of a larger tool set.
There is no shortage of intriguing possibilities on the biotech horizon, many of which are likely to be pioneered by data scientists in the coming years. One fascinating possibility is the use of DNA itself as a storage medium for digital data. A December 2015 piece in the New York Times described experiments performed at the University of Washington and the University of Illinois in which digital data was encoded into DNA molecules. According to the article, once the technique is fully developed, a system capable of storing all digital information currently in existence would occupy only about nine liters of organic soup.
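To give a feel for what encoding digital data into DNA can mean, here is a toy sketch that maps every two bits to one of the four nucleotides. The mapping and the function names `encode` and `decode` are assumptions chosen for illustration. This is not the scheme used in the experiments described above; practical systems add error correction and avoid sequences that are hard to synthesize or read.

```python
# Hypothetical mapping chosen for illustration: two bits per nucleotide.
BITS_TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
BASE_TO_BITS = {base: bits for bits, base in BITS_TO_BASE.items()}


def encode(data: bytes) -> str:
    """Turn bytes into a nucleotide string, four bases per byte."""
    bits = "".join(f"{byte:08b}" for byte in data)
    return "".join(BITS_TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))


def decode(strand: str) -> bytes:
    """Reverse of encode: read bases back into the original bytes."""
    bits = "".join(BASE_TO_BITS[base] for base in strand)
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))


strand = encode(b"Hi")
print(strand)          # CAGACGGC
print(decode(strand))  # b'Hi'
```

At two bits per base, each byte becomes four nucleotides, which hints at why the storage density of DNA attracts so much interest.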
Whether or not your next hard drive is organic, such industry-shaking advances are going to become commonplace for data science graduates entering the biotechnology industry in the coming years. It is hard to imagine any development in biotechnology in the next few decades that will not in some way be driven by the work of these scientists.
Biotechnology’s Broad Playing Field for Data Science
Most people instantly associate the term data science with “Big Data”: vast, deep, and detailed sets of information generated by automated sensors, DNA sequencing equipment, and years of in-depth research.
Within the many sub-disciplines of the biotech industry, the role of data science has been to find meaning within these massive troves of biological data, whether for the purpose of sequencing genomes, creating new combinations of pharmaceutical compounds, performing predictive diagnoses, monitoring the environment, or creating genetically modified seeds:
Genomics is the field that most immediately comes to mind when data science is mentioned in the same breath as biotechnology. There’s no question that the modern study of genomics could hardly exist without big data. It took 13 years and almost 3 billion dollars to sequence the first complete human genome, a cost of about a dollar per DNA base pair. Those 3 billion base pairs make up a data set that will be studied for decades, if not centuries, representing perhaps the ultimate challenge in biotech data management and analysis.
The cost and time to sequence a genome have dropped dramatically. As of 2016, a complete genome can be processed for around $1,000, and technology is approaching the point where it could happen in a matter of hours.
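The scale of that drop is easier to appreciate as arithmetic. The short sketch below uses only the rounded figures quoted in this article (about 3 billion dollars, 3 billion base pairs, and roughly $1,000 in 2016), so the results are order-of-magnitude estimates rather than precise costs.

```python
# Rounded figures taken from the article; treat results as estimates.
BASE_PAIRS = 3_000_000_000             # base pairs in the human genome
FIRST_GENOME_COST_USD = 3_000_000_000  # "almost 3 billion dollars"
COST_2016_USD = 1_000                  # "around $1,000"


def cost_per_base_pair(total_cost_usd, base_pairs=BASE_PAIRS):
    """Average cost in dollars to sequence a single base pair."""
    return total_cost_usd / base_pairs


first = cost_per_base_pair(FIRST_GENOME_COST_USD)
later = cost_per_base_pair(COST_2016_USD)
fold_drop = FIRST_GENOME_COST_USD / COST_2016_USD

print(f"First genome: ${first:.2f} per base pair")
print(f"2016 genome:  ${later:.2e} per base pair")
print(f"Cost fell by a factor of about {fold_drop:,.0f}")
```

On these numbers, the per-base cost fell from about a dollar to well under a millionth of a dollar, a roughly three-million-fold reduction.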
Modern pharmaceutical research and development processes rely heavily on molecular data modeling based on massive libraries containing hundreds of millions of chemical compounds. This modeling is a form of virtual experimentation, designed by data scientists to quickly and cheaply screen most of the options before actual candidate drugs are synthesized for clinical trials.
The technique saves millions of dollars in development costs and helps get lifesaving drugs through trials, approved, and on the market faster.
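One very simple form of the virtual screening described above is a rule-based filter applied to a compound library before any laboratory work begins. The sketch below uses Lipinski's rule of five, a widely cited heuristic for oral drug-likeness, as a stand-in for the far more sophisticated molecular models used in practice. The compound names and property values are hypothetical, and the `max_violations` threshold is an assumption; a common reading of the rule tolerates at most one violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    molecular_weight: float  # daltons
    log_p: float             # octanol-water partition coefficient
    h_bond_donors: int
    h_bond_acceptors: int


def rule_of_five_violations(c: Compound) -> int:
    """Count how many of Lipinski's four thresholds the compound exceeds."""
    return sum([
        c.molecular_weight > 500,
        c.log_p > 5,
        c.h_bond_donors > 5,
        c.h_bond_acceptors > 10,
    ])


def screen(library, max_violations=1):
    """Return the names of compounds within the allowed violation count."""
    return [c.name for c in library
            if rule_of_five_violations(c) <= max_violations]


# Hypothetical compounds with made-up property values.
library = [
    Compound("cmpd_a", 320.4, 2.1, 2, 5),
    Compound("cmpd_b", 712.9, 6.3, 7, 12),
    Compound("cmpd_c", 540.0, 3.0, 1, 6),
]

print(screen(library))                    # ['cmpd_a', 'cmpd_c']
print(screen(library, max_violations=0))  # ['cmpd_a']
```

A cheap filter like this would typically run first, so that expensive simulations are reserved for the compounds that survive it.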
In healthcare, data scientists are being asked to help solve two problems: how to securely store the mountains of data associated with electronic medical records, and how to take advantage of that data to improve predictive diagnosis and evaluate courses of treatment.
In Germany, at the National Center for Tumor Diseases, data scientists have developed a tool to allow clinical staff to display and analyze patient information in real time from multiple patient data sources.
In environmental science, data scientists help design monitoring systems and then work to integrate and compare the data they generate against historical observations.
Testing and comparing information collected across decades, and sometimes centuries, using different methods with different degrees of sensitivity is a thorny problem that only the best-educated data scientists can hope to resolve.
In agriculture, data scientists are key to understanding the complex variables being manipulated in genetically modified crops. Continuing the fastidious work begun by Gregor Mendel in the mid-1800s, agricultural data scientists study the traits expressed by genes over many hundreds of generations. They work with farmers to store and analyze soil and leaf samples, looking for new ways to maximize crop production and minimize the impact farming has on the environment.