To Master Data Science, You Must First Master Its Component Fields

The birth of each new field of science represents a benchmark for human progress. In some cases, whole new fields are developed as a result of the skillful melding of other areas of science. In fact, it was the combination of geometry and cosmology thousands of years ago that informed the emerging field of navigation, facilitating the first tentative steps towards global trade and commerce, the circumnavigation of the globe, and along with it a whole new understanding of the shape and dimensions of Earth itself.

Within the last 10 years we’ve witnessed the coming of age of a new field in its own right: data science. An amalgamation of statistics, math and computer science, the field is poised to have as much of an impact on the world as the various fields that comprise it.

We were fortunate enough to be able to sit down with Dr. Bhushan Kapoor, professor and chairman of the Department of Information Systems and Decision Sciences at Cal State University-Fullerton. Dr. Kapoor was instrumental in fostering the development of the data science programs now offered at Cal State, and brings years of experience to the conversation to help us gain a better understanding of how these different fields came together to create what we now know as data science.

It Takes a Targeted Education To Unlock the Full Potential of Data Science

Just as a thorough understanding of geometry and cosmology was required before they could be joined to form the new field of navigation, Dr. Kapoor explains that data science is the product of four distinct fields of science coming together. Mastering data science means mastering the four niche areas that comprise it:

  • Databases and data warehousing
  • Computer programming
  • Statistics
  • Operations research modeling

Data science can be seen as the poetic culmination of a lot of shared human history. It incorporates statistics, which grew out of the study of probability – there is evidence of games of chance dating as far back as the Paleolithic Era. And, of course, data science also takes full advantage of the latest developments in computing power and capabilities. New computer programs have grown from their predecessors to improve statistical modeling, and in the last 10 years we’ve revolutionized data generation, along with the methods and capabilities for collecting and storing massive troves of information.

To access the full potential of this field, it takes a deep understanding of how these component fields come together, something that can only come from studying data science at the graduate level. You only need to look as far as the curriculum that Dr. Kapoor designed for the data science program at Cal State for an example of this:

  • Databases and data warehousing– data integration, and data engineering
  • Computer programming– Oracle, Python, R, and A++
  • Statistics– Business statistics and applied statistics
  • Operations research modeling– Optimization modeling, projection modeling, predictive analytics, and classic modeling

Dr. Kapoor explains…

“For a student to be successful they really need to at least get an overview of all these four components and even have specialized in-depth knowledge in at least one or more of these four components.”

Graduate Curriculum is Built Around Industry Demands

Graduate programs are designed with the needs of industry in mind, so you can expect a quality master’s program to provide you with exactly the same skills employers are looking for. The following are qualification specs taken from real ads from the major companies that are recruiting talented data scientists just as fast as graduate schools can turn them out:

  • Google– Data Scientist/Quantitative Analyst: Minimum requirements include experience with software like R, S-Plus, SAS, Python, Julia, and MATLAB; a graduate degree in fields like computer science, statistics, or applied mathematics
  • Facebook– Data Scientist: Minimum requirements include experience with software like SQL, PHP, Python, Perl, R, and SAS; an understanding of statistical concepts like regressions and hypothesis testing; a bachelor’s degree in a field like statistics, computer science, engineering, physics, or math
  • Amazon– Data Scientist: Minimum requirements include experience with machine learning; software programs like Perl and Python; at least a master’s degree in a relevant field
  • Twitter– Data Scientist: Minimum requirements include experience building data-based models, a deep understanding of data platforms, a history of raw data analysis, and a good comprehension of Python or R

What should pop right out at you is how the major components are showing up as the basic individual qualifications that all top employers expect job candidates to have.

You need to know how to create and organize databases so you can use languages like Python and SQL to extract data sets… You need to know how to build statistical models from your organized data with modeling software like R and SAS… And you need a foundation in statistics so you can write the algorithms that capture your target data.
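To make that workflow a little more concrete, here is a minimal Python sketch of the three pieces working together: a small database, a SQL query that pulls out one slice of it, and a simple statistical model fitted to the result. Everything specific in it is an assumption made up for illustration (the `sales` table, its columns, the numbers, and the helper function names); none of it comes from Dr. Kapoor's curriculum or from any of the employers listed above.

```python
import sqlite3
from statistics import mean


def load_sales(conn):
    """Create a small hypothetical table and fill it with made-up rows."""
    conn.execute("CREATE TABLE sales (ad_spend REAL, revenue REAL, region TEXT)")
    rows = [
        (1.0, 3.0, "west"),
        (2.0, 5.0, "west"),
        (3.0, 7.0, "west"),
        (4.0, 9.0, "west"),
        (10.0, 1.0, "east"),  # belongs to another region; the query filters it out
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


def fetch_region(conn, region):
    """Use SQL to extract only the rows for one region."""
    cur = conn.execute(
        "SELECT ad_spend, revenue FROM sales WHERE region = ? ORDER BY ad_spend",
        (region,),
    )
    return cur.fetchall()


def fit_line(points):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
load_sales(conn)
west = fetch_region(conn, "west")
slope, intercept = fit_line(west)
print(f"revenue = {slope:.2f} * ad_spend + {intercept:.2f}")
```

In practice the modeling step would more likely be handed to R, SAS, or a Python statistics library; the hand-written least-squares fit here only keeps the sketch self-contained.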

Dr. Kapoor explains how coming from a related field makes it especially easy to become data-science-proficient…

“The people who come from a physics or math background have the advantage in the sense that they already have some skills in some components of data science. Most of them are pretty strong in statistics, and they may also know something about the optimization aspect of it too. And some students who come from computer science may be pretty good in databases. They may be good in programming, so they may already bring in a skills set with them.”

Getting the Skills You Need

As businesses in every industry scramble to put data to use in a way that is smarter and more profitable, it’s becoming increasingly common for employers to cover the cost of upgrading the capability of existing human capital. This means everyone in the organization, from supply chain and logistics managers to accountants and marketing staff, is getting post-bac certs, graduate certificates, or full-on master’s degrees in data science, often at no cost to them.

If you’re not in the situation of having an employer fund your education and you don’t want to enroll in a full-fledged data science degree program, you’ll still find plenty of certification programs out there that are pretty accessible and affordable.

Nowadays you can even find many free lectures and courses online that relate to specific components of data science. MOOCs – massive open online courses – are also very popular.

Dr. Kapoor emphasizes that to be successful in data science you must be well versed in statistics. He also recommends Microsoft SQL Server certification to introduce you to databases, and courses that teach programming languages like Python and R…

“Databases and programming are foundations for being a data scientist. So all data scientists will need to be exposed to certain areas, and one of them would be databases. The other one would be programming. If you don’t have a background in those and you want to learn a little bit more about it, it would be beneficial to take some courses in databases…. and certainly in programming as well.”