The average American spends two hours each day on social networking. That’s two hours’ worth of clicks, views, likes, shares and comments that all go into massive databases to be culled for further analysis: to better understand behaviors and improve the user experience, to create marketing profiles, to gather user census data, and a lot more.
Not only do people spend a lot of time on social media, they also seem comfortable revealing a lot of personal information. Drunken weekends and parties, ditching work, relationship status updates, bragging about indiscretions: some of the most intimate and embarrassing details of our lives end up on social media networks for everybody we went to high school with to see.
But perhaps just as personal, and even less-often considered, is the information that social media companies can glean from how users interact with the social media platform itself. Information like:
- What times and locations you access their system
- Who you interact with
- The type of devices you use
- What sort of media you view and how long you look at it
Data scientists are learning to tell surprisingly accurate stories about people from that accumulation of information. There’s even speculation that your “Like” pattern on Facebook could give an astute data scientist a fair approximation of your IQ.
A master’s-educated data scientist can not only weave all of that together for a personal portrait of an individual, but can combine many such profiles to reveal a gestalt of the modern world: a living, breathing perspective on the concerns, complaints, and obsessions of their audience.
That’s a powerful brew of data for everyone from online retailers to politicians. And social media companies control the tap.
Reading the Tea Leaves: Extrapolating Personal Information from Seemingly Unrelated Data
The modern phenomenon of social media is built almost entirely on a digital foundation. This makes it easy to get numbers based on the actions and interactions of individuals online. But it takes a master’s-educated data scientist to figure out how to tie these data points together in ways that are useful, interesting and profitable.
In a 2014 interview, Christian Rudder, co-founder of online dating site OkCupid, revealed that profile data allowed the company to establish – with some 60 percent certainty – whether or not the customer’s parents had been divorced before the customer was 21. Although the questions OkCupid uses to establish a profile don’t explicitly ask for this information, the pattern of responses to apparently unrelated questions was such that the inference could be drawn with that level of confidence.
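To make the idea of inferring an unasked trait from unrelated answers concrete, here is a minimal sketch using a naive Bayes classifier. It is not OkCupid's method, which the interview does not describe. The function names, the three yes/no questions, and the toy data are all invented for illustration.

```python
import math

def train_naive_bayes(rows, labels):
    """Fit a Bernoulli naive Bayes model with add-one smoothing.

    rows   -- list of tuples of 0/1 answers to unrelated yes/no questions
    labels -- list of 0/1 values for the hidden trait (known for training users)
    """
    n_features = len(rows[0])
    model = {}
    for cls in (0, 1):
        members = [r for r, y in zip(rows, labels) if y == cls]
        prior = (len(members) + 1) / (len(rows) + 2)
        # P(answer == 1 | class), smoothed so no probability is exactly 0 or 1
        likelihoods = [
            (sum(r[j] for r in members) + 1) / (len(members) + 2)
            for j in range(n_features)
        ]
        model[cls] = (prior, likelihoods)
    return model

def predict_proba(model, answers):
    """Return the estimated probability that the hidden trait is 1."""
    log_scores = {}
    for cls, (prior, likelihoods) in model.items():
        score = math.log(prior)
        for answer, p in zip(answers, likelihoods):
            score += math.log(p if answer == 1 else 1 - p)
        log_scores[cls] = score
    # Normalize the two class scores into a probability
    top = max(log_scores.values())
    weights = {cls: math.exp(s - top) for cls, s in log_scores.items()}
    return weights[1] / (weights[0] + weights[1])

# Invented toy data: each tuple is one user's answers to three unrelated questions
rows = [(1, 1, 0), (1, 0, 0), (1, 1, 1), (0, 0, 1), (0, 1, 1), (0, 0, 0)]
labels = [1, 1, 1, 0, 0, 0]
model = train_naive_bayes(rows, labels)
```

Calling `predict_proba(model, (1, 1, 0))` returns a probability rather than a yes-or-no answer, which matches the way Rudder describes the result: a level of certainty, not a fact.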
Rudder believes that many personal traits have similar reveals. He also points out that data scientists at social media companies have just begun to scratch the surface. Predictive algorithms are likely to improve over time. And, though the mountain of data available to social networks may seem massive today, he notes that the coming deluge will dwarf what is currently available as more and more of our interactions move online.
With all that information at their fingertips, what will data scientists be able to learn about users as the ability to digest endless troves of data advances?
Social Networks Aren’t Just Passively Gathering Information Anymore
Data scientists and researchers at social media companies benefit from having direct access to the code base and the ability to update it frequently for their own purposes. They are not limited to piggybacking on existing processes and hacking together instrumentation to extract meaning: if they want a piece of data collected regularly, they can create code to collect it.
A/B tests, in which different users are presented with slightly different versions of pages as a way to gauge reaction, are trivial to set up. And, increasingly, machine learning, which uses algorithms to improve data collection and investigation, features prominently in data analysis at social media companies.
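As a rough illustration of why A/B tests are so easy to set up, the sketch below splits users into two groups and compares their click rates with a two-proportion z-test. It is one common approach, not the procedure of any particular company. The function names and the experiment label are hypothetical.

```python
import hashlib
import math

def assign_variant(user_id, experiment="demo_experiment"):
    """Deterministically place a user in group A or B by hashing their ID.

    The same user always lands in the same group for a given experiment name.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return "A" if int(digest, 16) % 2 == 0 else "B"

def two_proportion_z(clicks_a, views_a, clicks_b, views_b):
    """Compare click rates of two groups; return (z statistic, two-sided p-value)."""
    rate_a = clicks_a / views_a
    rate_b = clicks_b / views_b
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    z = (rate_b - rate_a) / std_err
    # Two-sided p-value under the normal approximation
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With made-up counts such as `two_proportion_z(100, 1000, 130, 1000)`, a small p-value would suggest the difference between the two page versions is unlikely to be chance alone.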
So simple curiosity has expanded data science at social media companies beyond marketing and into peripheral fields like economics and identity.
This depth of insight and control makes some people uneasy.
Facebook Did What?! … Social Media and the Implications of Social Experimentation
In 2014 it was discovered that Facebook had been intentionally manipulating the news feeds it presented to different users expressly to study their emotional reactions.
Scientists from Cornell and the University of California-San Francisco worked in conjunction with Facebook to adjust which posts appeared in the news feeds of nearly 700,000 users. Some feeds were filtered to show fewer posts containing negative words, and so skewed positive; others were filtered to show fewer posts containing positive words. Subsequent posts from those user groups were then analyzed to determine whether what they read had shifted their emotions in the direction the news feed had been manipulated.
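The analysis step described above, measuring how emotional a group's later posts are, can be sketched as a simple word count. This is only an illustration: the word lists below are tiny, hypothetical stand-ins, and the function name is invented rather than taken from the study.

```python
import re

# Hypothetical word lists, for illustration only
POSITIVE = {"happy", "great", "love", "wonderful", "glad"}
NEGATIVE = {"sad", "awful", "hate", "terrible", "angry"}

def emotion_rates(posts):
    """Return (positive_rate, negative_rate) across all words in the posts.

    Each rate is the share of words that appear in the matching word list.
    """
    total = positive = negative = 0
    for post in posts:
        for word in re.findall(r"[a-z']+", post.lower()):
            total += 1
            positive += word in POSITIVE
            negative += word in NEGATIVE
    if total == 0:
        return 0.0, 0.0
    return positive / total, negative / total
```

Comparing the rates returned for a manipulated group against those of an untouched control group would show whether the change in the feed moved the tone of what people wrote.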
In itself, this is a fascinating use of data science in social experimentation, but it was also an ethical breach that left users infuriated. Technically, it was fully in compliance with the company’s own privacy and fair use policies, but that simply made matters worse in the minds of some critics: the organization appeared to be legitimizing its right to arbitrarily mess with the heads of its users.
The future is sure to reveal more such quandaries about what personal privacy means in a data-rich environment controlled by a private company. Data scientists and ethicists will have to work together carefully to develop new standards and practices for responsible research.
One thing is certain. The scope of the revelations to be found in mining social network data is too valuable to abandon. Data science has established itself as inseparable from social media processes.
Social Data Analytics Predates Even Dinosaurs Like Myspace
Social networks were on the minds of researchers long before the Internet came along. Sociologists were looking into the ways that people were tied together through their daily contacts and personal ties way back in the 20th century.
By the 1970s, practical applications for the research were being found as the terrorism threat began to rear its head. Military and criminal justice analysts used social network theory to track terror cells and identify high-value targets.
But it took the Internet to make this sort of analysis practical for private concerns since the military and police had access to records – and the resources and capabilities to analyze them – that the private sector did not. That is, until users started making their personal information available wholesale through social networks.
Online dating sites might have been the first widespread efforts at social networking, but the concept quickly blossomed into more generic sharing sites like Friendster and Myspace.
All those sites soon accumulated a wealth of data about their users, simply in the course of connecting them to one another. But it might have been Facebook that really understood the implications of cleaning, organizing and interpreting all that data.
Although the company was leery of diving immediately into selling advertising, founder Mark Zuckerberg realized from the beginning that his company would be sitting on a treasure trove of data that advertisers would be salivating over. But for the same reason, the information was vital to Facebook itself as a means of measuring its own service and performance and catering to users.
So Facebook, Twitter, Instagram, and other prominent social media sites dove into Big Data almost immediately for reasons identical to those of any retailer: the better they understand their audience, the more appealing they can make their product.