Dr. Bhushan Kapoor has been a professor with the Department of Information Systems and Decision Sciences (ISDS) as part of the Mihaylo College of Business and Economics at California State University-Fullerton since 1982, and has served as department chair since 2010.
Throughout his tenure he’s been the recipient of seven awards and three grants from his university and college. His credits include several chapters in books that have become standard reading in the field of data science, in addition to some 30 articles published in peer-reviewed journals. And when he isn’t contributing new literature and teaching at Cal State, Dr. Kapoor can be found presenting his research at conferences throughout the world.
As the ISDS Department Chair at Cal State Fullerton, Professor Kapoor has been responsible for developing new undergraduate and graduate courses, programs, concentrations, collaborations, and emphases that relate to data science.
He recently sat down with us for an interview to share his experiences in this field.
Q: In your student days you earned your bachelor’s and master’s degrees in mathematics, and went on to a PhD in economics. How did this lead to where you are today?
Professor Kapoor: While I was studying mathematics and then later on economics, I took a variety of courses and gained a variety of intersecting skills, including applied mathematics, programming, statistics, and econometrics. That prepared me to be at California State University, Fullerton, which offers several programs that require these kinds of skills.
Q: Can you talk a little about the data science programs and collaborations you’ve initiated and contributed to at your own university?
Professor Kapoor: At the undergrad level we offer several concentrations including information systems, decision science, and business analytics. Business analytics is like data science applied to business.
Besides these concentrations we also collaborate with other departments in the business college, such as accounting and marketing, and now economics. In collaborating with accounting we created a program – a concentration – called accounting analytics, which is really data science applied to accounting problems and accounting data.
Then we also collaborated with the marketing department and we created a program called marketing analytics, which is really data science applied to marketing problems and marketing data.
And now we’re also considering creating a joint concentration with the economics department, and that would be business and economics analytics, which is again like data science applied to economic data and economic problems.
So when we create these joint programs we’re not just taking a few courses from each department. We sometimes create brand new courses which would have overlapping material. For example, we created a new course in marketing analytics and we also created a new course in social media analytics for the marketing analytics concentration program.
So these cross-listed courses are unique in the sense that two professors – one from each department – jointly teach that particular course. So the students are able to get dual perspectives in that particular course. They get to know what the marketing department professor’s view is, and what the ISDS [Information Systems and Decision Sciences] professor’s view is, on that particular course. So I really love that arrangement.
Besides the undergrad programs we also offer several grad programs. We have two master’s degrees we offer in the department. One is an MS in Information Systems. Another is an MS in Information Technology.
The MS in Information Systems is a really large program. We have about 200 students in this particular program and we offer three concentrations: a concentration in information systems, a concentration in decision sciences, and the third concentration we offer is in business analytics, which as I said is really data science applied to business problems and business data.
The other master’s program we offer in the department is an MS in Information Technology. This program is in about its 14th year. It’s a cohort-based, secure online program, and just recently we created a new concentration in data science. The other concentration is IT management.
I must add that this is a highly ranked program. We were ranked number one in California by U.S. News & World Report, and number nine in the country. Last year we were ranked number two in California. The only school that was ranked above us was USC [University of Southern California]. And this year we beat them and we became number one – I feel proud to say that.
Besides these two master’s programs we also offer some concentrations in the MBA, and those are an MBA concentration in information systems and an MBA concentration in business analytics.
Q: How would you sum up the different components of the data science field, and how can students be successful in this field?
Professor Kapoor: Data science is a very broad field that consists of, I would say, four major components.
One major component is the data. We teach databases, data warehousing, data integration; the people who specialize in this are called data engineers. Another component of data science is programming. We teach Oracle programming, Python, systems analysis and design; that’s the second major component of data science. The third component of data science is more towards statistics – business statistics and applied statistics. And the fourth component is operations research, where we teach programming, we teach optimization modeling; things like that.
So for a student to be successful they really need to get at least an overview of all four of these components, and to have specialized, in-depth knowledge in at least one of them.
The other important thing which many people overlook is the domain knowledge in which these skills will be used. So a person interested in marketing – if a student comes to me and asks me what’s the best way to do it – I would say that marketing knowledge would be helpful to that student, and then the student can gain additional knowledge in data science and learn how to apply the data science techniques to marketing data. So that person would be extremely productive and useful for marketing-type jobs.
Q: In 2017 The Economist published an article detailing how the data industry is now worth more than the oil industry. How did this field go from non-existent to being worth more than the oil industry?
Professor Kapoor: I think it’s the technology that has played a big part in it. Data science starts with collecting data and storing them. Google and Amazon gave us the fantastic means to store data in a way that it can be accessed very fast. You can parse it. You can analyze it.
These techniques that these few companies gave us opened up all the possibilities for being able to take large volumes of data – even if it is scattered in different places – and be able to retrieve the information that you need from all different locations. You have the information – you parse that information that you need – and now you’re able to analyze that very quickly because the processing power is very fast now.
Then we can apply those techniques to self-driving cars, or for example, personalized medicine; in fact, everywhere. Human resources, transportation, marketing; in all different areas. Wherever there is data these techniques can be applied to the advantage of those companies.
Q: 10 years from now, how is data science going to have changed people’s lives?
Professor Kapoor: One of the things that really interests me is the self-driving car. I would say that maybe within the next 10 years you will see that completely self-automated cars are everywhere. And I say 10 years because of regulations. If there were no regulations it would probably be even sooner than 10 years that you would see them everywhere.
This will have a huge impact in the sense that people will not have to own their cars. We will not need to have parking lots. There is a humongous impact when you see that everybody is using self-driving cars that can talk to one another. This is an important and major change we’ll see in our lifetime.
The other thing I see which impacts everybody is personalized medicine; DNA sequencing being cheaper, and being able to do it much more easily. With data science they’ll be able to predict which diseases each individual is prone to. They can find a solution to those things and people can have a healthy life. So this is another thing I see.
Of course there will be robotics, artificial intelligence – we’re already there – and we’ll see more and more. They might disrupt our jobs but at the same time a lot of things may become easier. And we may be able to take advantage of that by increasing productivity and making our lives better.
Q: What does data science applied to healthcare look like?
Professor Kapoor: As the healthcare industry collects more and more data... so when you go to a hospital and they look at what your DNA sequence is, and they look at the disease that you have, then they will give you medicine and see what impact that medicine has on that disease. When you have all that information together and you have billions of records, you’re able to piece it together and see which particular gene may be associated with a certain disease, and then we see what kind of medicines have worked and not worked.
So that’s where the big data comes in. When you have granular-level data on patients – on their DNA, on the history of the disease they have, on the medicines they’ve taken, and on their whole lifestyle – you can instantly put all that together, and that’s where we can do wonders. Doctors may be able to tell you that you’re likely to get such-and-such a disease in the future with your particular DNA, and pre-select a cure for that.
Q: You have six Microsoft certifications. Would you recommend industry certifications as something that is still relevant for data science professionals today?
Professor Kapoor: I don’t know of many certifications that are directly related to data science. But I would say that certifications in databases and programming in general would be very useful. So to be specific, for example, a Microsoft SQL Server certification would be a very useful thing to have for students.
Databases and programming are foundations for being a data scientist. So all data scientists will need to be exposed to certain areas, and one of them would be databases. The other one would be programming.
If you don’t have a background in those and you want to learn a little bit more about it, it would be beneficial to you to take some courses in databases and then certifications. And certainly in programming as well.
Q: As a department head and professor, what are some of the biggest challenges you see data science students facing, and what advice would you give them?
Professor Kapoor: You know, creating models is easy. Taking data and using a projection model or predictive analytics or classic modeling is easy. What is challenging and what is difficult today is for the students to relate that model to the problems or data they’re working on. To be able to know the basic statistics behind it – that’s where the students struggle.
Many times their models have their own assumptions, and the students don’t know much about those assumptions, or don’t know enough details about them. So they just go by what the model says. They know how to apply the model – the model gives them the results – but it’s still not very clear to them whether or not a particular model is applicable to a problem.
So that’s a big challenge for them. So I suggest that they should definitely take as many classes as they can in statistics. If they have a strong background in statistics they will do well in data science. So it’s my firm suggestion to them to go ahead and take a lot of classes in statistics. Either [campus] classes, or online classes if they don’t have access to a university class. There are a lot of good professors there. They’re specialized, and students can take those classes. And once they are strong in statistics they will do well in data science.
Q: You’ve written articles and taught courses about privacy. Can data science and privacy co-exist? Are we just waiting for a Pearl Harbor-type of event to happen for the data science industry?
Professor Kapoor: In general there’s a trade-off between the two. If you want the benefits of data science you have to let go of some of your privacy. Having said that, there have to be strong regulations and oversight by government agencies over what companies that collect data can and can’t do with it.
Companies like Facebook and Google – there has to be oversight and there have to be regulations. It should be clear to them what is acceptable and what is not acceptable.
There could also be times when it’s not their intention to reveal private information; there could be threats from outside. And that’s where I do worry. These threats could be from sources that are very powerful – as powerful as a foreign government. So we have to be extra vigilant about this. It doesn’t mean that we have to stop data science, but we do have to be more vigilant and put more resources into making sure that data is secure and private, and that we have the trust of the people from whom we collect the data.
Q: Are concerns about privacy a significant threat to the data science field? Perhaps too much regulation is also a threat?
Professor Kapoor: We don’t know. The jury has not decided on one or the other – too much or too little – or if we have the right amount of regulations in this area.
We don’t want to make it like a utility company where we strangle them with too many regulations. And at the same time we can’t let them get off scot-free to do whatever they want to do. So it has to be somewhere in the middle, where there are regulations that protect the privacy of people, but at the same time companies and organizations are able to use data science for the benefit of the people.
So that, I would say, is the trade-off between the two.
Q: What would you recommend for undergraduates going into data science today? Any specific niches or subfields?
Professor Kapoor: Personally I like the healthcare field, the application of data science in healthcare analytics, because it helps everybody in their lives; it has a lot more impact.
But here’s the beauty of data science: it can be applied in any field. Wherever there is data, we can use data science to solve problems. So if a student comes to me and says, “I like marketing,” I would suggest they go into marketing – learn about the domain knowledge – then take some courses or a program in data science and stay in marketing.
And that applies to several other fields, like manufacturing. I would suggest students take courses in supply chain analytics, e-commerce, and retail analytics. So depending upon their interests and background, students can in fact stay in their own area and still do well in that particular field. That’s the beauty of data science.
Q: Would you also say the current debate about our own government collecting data on its own citizens should also be left up to the citizens to determine how much privacy we want to trade for the potential benefit?
Professor Kapoor: Exactly. For example, if we go to stores and we make transactions they are taking our pictures. So we are losing our privacy. But people accept it. For the sake of their safety, people are okay with letting go of some of their privacy. So it really depends on people: what kind of balance they want; which side of the spectrum they want.
If they want more privacy they will have to let go of some of the benefits. If they want more benefits they have to let go of some of their privacy. So it’s really up to the people where exactly they want it.