What is Data Science?

Our lives are measured, mapped, and recorded digitally—quite literally from birth to death. Whether we post something to social media, check the forecast on our phone, undergo a medical procedure, enter a search term in Google, or shop for weekly groceries, digital bits catalog our actions. It’s a constant, never-ending collection of data, generated by and about human behavior. And this just scratches the surface. Imagine the churning oceans of data that come from smart oilfields equipped with sensors that produce real-time data on everything from well head production to drill rig performance, the thousands of satellites feeding information back to their terrestrial handlers on everything from changing environmental conditions to the positions of enemy combatants in foreign theaters, the DNA sequences of every living thing we find relevant enough to explore …

The result is a vast universe of data growing, by one widely cited estimate, at a rate of roughly 2.5 quintillion bytes every day. And all that data is collected with the ultimate goal of being organized, analyzed, and assigned a purpose.

Big data is our new currency, and data science deals with how we tap into it. Data science is about digging into the furthest reaches of data, organizing it and making sense of it in the process, and then applying what we learn in groundbreaking ways.

As we collect, collate and curate an ever-growing library of data, the application of data science allows us to gain a better understanding of everything – from human behavior and market trends, to genomics and biological systems, to the heretofore unseen particles responsible for atomic mass.

Breaking it Down: Understanding the Data Science Process, from the Ground Up

Cost-effective data storage and scalable processing have given organizations the ability to acquire, store, and process large volumes of data. But then what? The collection of all that data is of little to no use without the capacity to extract insights from it—insights that can be put into motion to achieve commercial, social, scientific and environmental goals.

Data science, at its core, is the process of turning data into action. Data scientists use a number of tools, processes, and technologies to turn big data into unique insights, which are then used by decision makers to initiate change, whether in a business, social, environmental, or scientific context. The insights gained from data science are actionable because they rest on predictive models of what could be, rather than only on a look at what has already happened.

Whether in the public or private sector, organizations build their data science capabilities over time. The process begins with an organization overrun by data and searching for answers to vexing questions. Those questions may be extremely narrow in scope, or general enough to incorporate data from seemingly disparate sources.

How can we improve our manufacturing process to produce a better product? … How can we better predict future disease outbreaks using the real-time exchange of clinical health information? … Can we create a model to determine an accurate readmission probability for congestive heart failure patients?

Data science is always a multi-step process that leverages big data and the tools and processes by which it is cleaned, organized, and given accessible meaning through visual representation.

The data science process is broken down into a series of stages:

  1. Acquire: Obtain the data
  2. Prepare: Manipulate the data to fit analytic needs
  3. Analyze: Explore the data
  4. Act: Turn the data into actions
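
The four stages above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the sample data, the column names, the function names, and the decision threshold are all hypothetical, and a real project would acquire data from files, databases, or live feeds rather than an in-memory string.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a real source (file, API, sensor feed).
RAW_CSV = """patient_id,age,length_of_stay_days
1,67,4
2,71,
3,58,9
4,80,6
"""

def acquire(text):
    """Acquire: obtain the data (here, parse an in-memory CSV string)."""
    return list(csv.DictReader(io.StringIO(text)))

def prepare(rows):
    """Prepare: drop rows with missing values and convert fields to integers."""
    cleaned = []
    for row in rows:
        if all((value or "").strip() for value in row.values()):
            cleaned.append({key: int(value) for key, value in row.items()})
    return cleaned

def analyze(rows):
    """Analyze: explore the data with a simple summary statistic."""
    return statistics.mean(row["length_of_stay_days"] for row in rows)

def act(mean_stay, threshold=5.0):
    """Act: turn the finding into a decision (the threshold is an assumption)."""
    return "review discharge planning" if mean_stay > threshold else "no change"

rows = prepare(acquire(RAW_CSV))
mean_stay = analyze(rows)
decision = act(mean_stay)
print(len(rows), round(mean_stay, 2), decision)
```

Each function maps to one stage, so a stage can be swapped out (for example, a different data source in `acquire`) without touching the others.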

Within the data science process, a number of sub-steps exist, which include:

  • Defining the business outcome and ensuring the modeling output is practical and actionable from a business perspective
  • Assessing the currently available data and the volume of data required to develop the model (data mining)
  • Selecting the appropriate development tools or technologies depending on the volume, velocity, and variety of data
  • Acquiring data and identifying sources
  • Identifying and remediating data quality issues
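
As a rough illustration of the last sub-step, identifying and remediating data quality issues, the sketch below counts missing values, duplicate rows, and implausible values in a small set of records, then removes the duplicates. The records, field names, and the plausible age range are assumptions made for the example, not part of any particular toolkit.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "age": 34, "city": "Austin"},
    {"id": 2, "age": None, "city": "Denver"},
    {"id": 2, "age": None, "city": "Denver"},  # exact duplicate of the row above
    {"id": 3, "age": 212, "city": "Boston"},   # implausible age
]

def quality_report(rows, age_range=(0, 120)):
    """Count missing fields, duplicate rows, and out-of-range ages."""
    low, high = age_range
    missing = Counter(
        field for row in rows for field, value in row.items() if value is None
    )
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)
    out_of_range = sum(
        1 for row in rows
        if row["age"] is not None and not low <= row["age"] <= high
    )
    return {
        "missing": dict(missing),
        "duplicates": duplicates,
        "out_of_range": out_of_range,
    }

def drop_duplicates(rows):
    """Remediate one issue: keep only the first copy of each identical row."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

report = quality_report(records)
deduplicated = drop_duplicates(records)
print(report)
print(len(deduplicated))
```

How to remediate the remaining issues (fill in a missing age, discard the row, or go back to the source) is a judgment call that depends on the question being asked.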

Once the data science process has been completed, data scientists may choose to:

  • Publish or share the results with colleagues for peer review
  • Embed the model into a report or dashboard within the organization to make business decisions
  • Deploy the model into production

The Value of Data Science

Data scientists now have the ability to link seemingly unrelated datasets whose connection is not obvious upon initial investigation. This allows them to generate even more insights from their data assets. Data scientists have also adopted plenty of creative approaches to visualizing data in ways that make it useful for strategic decisions.

Data science profoundly influences everything from business decisions to national security to what consumer products we buy. It impacts retail markets, solves public health dilemmas, and even seeks solutions to the causal factors behind social unrest.

Its value in dollars cannot yet be fully quantified. In the healthcare sector alone, it is estimated that data science, as it continues to evolve, is not far from saving the United States some $300 billion annually (McKinsey Global Institute, 2015).

What makes data science so valuable is that its reach knows no bounds.

Just a few examples of data science in motion include:

  • A number of companies like Netflix and Amazon utilize “recommendation engines” to make watch-next suggestions based on the prior interests of their customers.
  • Retailers use algorithms based on big data to track the purchase habits of customers and then offer special discounts and coupons to those patrons.
  • Credit card companies use data mining to evaluate the risk of default among customers by examining their purchase habits.
  • Police departments turn to data science to predict where and when crimes are most likely to occur and then allocate their resources accordingly.
  • Public health agencies use data science to find associations between air quality and health, which allows them to recommend policy changes.
  • Data science enables researchers to identify genes that rose to prominence in the course of human evolution, which can lead to vaccines and other medical breakthroughs.
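
To make the first example above more concrete, here is a deliberately simple sketch of a recommendation engine. It is not how Netflix or Amazon actually build theirs; it only illustrates the idea of suggesting unseen titles based on what users with similar histories have watched. The user names, titles, and the `recommend` function are hypothetical.

```python
from collections import Counter

# Hypothetical viewing histories: user -> set of titles watched.
histories = {
    "ana": {"Space Drama", "Robot Wars", "Ocean Life"},
    "ben": {"Space Drama", "Robot Wars"},
    "cat": {"Robot Wars", "Ocean Life", "Cooking Duel"},
    "dev": {"Space Drama"},
}

def recommend(user, histories, top_n=2):
    """Suggest unseen titles, scored by overlap with other users' histories."""
    seen = histories[user]
    scores = Counter()
    for other, titles in histories.items():
        if other == user:
            continue
        overlap = len(seen & titles)  # shared titles act as a similarity weight
        for title in titles - seen:
            scores[title] += overlap
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [title for title, score in ranked[:top_n] if score > 0]

suggestions = recommend("dev", histories)
print(suggestions)
```

Production systems work with far larger datasets and more sophisticated models, but the underlying intuition is the same: prior behavior is used to rank what a customer might want next.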

By all estimates, we are in the midst of a data revolution. Sure, the quantity of data is certainly revolutionary, but what makes this time in history so amazing is that we can do something extraordinary with the data, thanks to improved statistical and computational methods.

In other words, data itself isn’t relevant, actionable, or interesting, but when we are able to dig into this ever-expanding treasure trove of information and make sense of it, the ability to improve lives is nearly immeasurable.
