Preparing for a Data Science Career in the Video Game Industry with a Master’s Degree

In the summer of 2012, Candy Crush Saga took over mobile gaming. The blogosphere and industry publications alike called it the most addictive game since Angry Birds. By the end of 2013, the simple puzzle game would be installed on half a billion devices. Developer King Software wasted no time deploying the game to other platforms, and in 2016 it was acquired by industry behemoth Activision Blizzard, mostly on the strength of Candy Crush Saga’s success.

Globally, PC and console gaming represent a $100 billion industry. On average, U.S. residents spend around a half hour a day playing electronic games. And with so many games either available only online or distributed with electronic tethers back to their manufacturer, the industry has been able to collect an awful lot of data. Electronic Arts alone collects 50 terabytes a day from in-game telemetry.

Tracking how people play is one thing, but analyzing that mountain of data for actionable trends and design guidance is a job for only the most talented data scientists.

The Science of Compulsive Game Play

What makes Candy Crush Saga so addictive? In part, it’s data scientists combing through replay and retention data. According to an article in the March 2015 issue of Computing Magazine, data scientists at King Software are integral to level design tweaks that keep players coming back for more:

Changes are made in certain aspects of game design … monitoring algorithms (courtesy of King’s data scientists) record information on how long players take to complete the new level, and whether or not they opt to replay it … Through judicious A/B testing and data analysis, the game edges further and further into compulsive playing territory.
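As a rough illustration of the kind of A/B comparison described above, the sketch below tests whether a redesigned level changes the share of players who replay it. It uses a standard two-proportion z-test; the function name and all counts are hypothetical, and nothing here is taken from King’s actual tooling.

```python
import math

def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for the difference between two proportions."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    # Pooled proportion under the null hypothesis that both variants behave alike.
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution, via erfc.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts: players who replayed the level / players who saw it.
z, p = two_proportion_z_test(success_a=4200, total_a=10000,
                             success_b=4450, total_b=10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value suggests the difference in replay rate is unlikely to be noise, though a real test would also need a sample size fixed in advance and a check that players were assigned to variants at random.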

King isn’t alone; Zynga, Kabam, and Valve Software have all leveraged in-game instrumentation to help guide game design. As more and more games are played on mobile devices or on consoles and computers with constant Internet connections, it has become both possible and worthwhile for game developers to track every single move a player makes in a game.

The application of data science to gaming goes far beyond improving playability. Data scientists in the game industry are also expected to:

  • Analyze inter-player dynamics in co-op and head-to-head multiplayer titles to ensure a challenging but not futile game experience
  • Look for “hot spots” in games where players frequently give up
  • Detect fraud in multiplayer matches
  • Identify and market to high-spending consumers for in-game purchases

Fun is the Hardest Metric to Measure for Game Designers

One of the challenges data scientists in the game industry face is making the findings of their analyses accessible and applicable to the problems game developers are trying to solve. A May 2014 editorial by game design consultant Nils Pihl on the influential game industry site Gamasutra laments the degree of translation required between data scientists and developers. Ultimately, developers and data scientists are both charged with achieving that very hard-to-quantify goal of making a game “fun.”

Since fun is a metric that no instrumentation can measure directly, it’s up to data scientists to approximate this information from data that can be collected, such as playtime, replays, and abandoned game sessions.
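One way to picture this approximation is to roll raw session records up into per-level proxy metrics. The sketch below is only an illustration: the `Session` fields, the `summarize_level` helper, and the sample records are all hypothetical, and which proxies actually track “fun” is a judgment each studio has to make for itself.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Session:
    level: int
    seconds_played: float
    replayed: bool    # player chose to play the level again
    abandoned: bool   # player quit before finishing

def summarize_level(sessions, level):
    """Summarize the proxy metrics for one level, or None if it has no data."""
    rows = [s for s in sessions if s.level == level]
    if not rows:
        return None
    n = len(rows)
    return {
        "sessions": n,
        "median_seconds": median(s.seconds_played for s in rows),
        "replay_rate": sum(s.replayed for s in rows) / n,
        "abandon_rate": sum(s.abandoned for s in rows) / n,
    }

# Hypothetical telemetry records.
sessions = [
    Session(7, 95, True, False),
    Session(7, 120, False, False),
    Session(7, 40, False, True),
    Session(7, 150, True, False),
    Session(8, 300, False, True),
]
print(summarize_level(sessions, 7))
```

A level with a high abandon rate and a low replay rate would be a candidate for a closer look, not proof that it is no fun.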

Using Data to Engineer the Right Balance of Challenge and Reward

As with Candy Crush Saga, data scientists are increasingly important in game level design. Levels afford players a measure of progress as they move through challenges. Levels may encompass particular scenarios and even contribute to the game’s storyline in important ways.

Once, game level architecture and difficulty were the exclusive province of the designer. Often working alone, designers used instinct and experience to engineer an experience that they imagined to be both challenging and rewarding, yet easy enough to complete. Designers would use little more than their wits and their gut to scale this so that the game would become progressively harder and more rewarding as players progressed through the levels.

Over time, as the game industry grew and development teams became more specialized, small teams of play testers were included in the loop to offer feedback. However, what developers found was that even as they increased the number of play testers, the sample was still far too small to be representative of the average player experience over millions and millions of users from every corner of the globe.

With the telemetry engineered into games, data scientists can offer important insights based on the actual gameplay of real users. Heat maps and other visual representations of gameplay can give level designers fast, accurate feedback about where players tend to get stuck or what parts of maps may be too easy to complete.
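At its simplest, a heat map of this kind is just a count of events per map cell. The sketch below bins hypothetical failure positions into a coarse grid and prints the counts as text; the coordinates, the cell size, and the function names are assumptions made for illustration, and a production pipeline would typically render the grid as an image over the level map.

```python
from collections import Counter

def failure_heatmap(positions, cell_size):
    """Count events per grid cell; each cell is cell_size units on a side."""
    counts = Counter()
    for x, y in positions:
        counts[(int(x // cell_size), int(y // cell_size))] += 1
    return counts

def render(counts, width, height):
    """Render the grid as rows of counts, one text line per grid row."""
    lines = []
    for row in range(height):
        lines.append(" ".join(f"{counts[(col, row)]:2d}" for col in range(width)))
    return "\n".join(lines)

# Hypothetical (x, y) map coordinates where players failed the level.
positions = [(1.0, 1.5), (1.2, 1.9), (8.5, 3.0), (9.1, 3.3), (9.9, 3.9), (4.0, 7.2)]
heatmap = failure_heatmap(positions, cell_size=5)
print(render(heatmap, width=2, height=2))
print("Hottest cell:", heatmap.most_common(1)[0])
```

The cell with the highest count marks where players fail most often, which is the sort of spot a level designer would want to inspect first.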

Collating Data to Establish Player Archetypes

With many large developers rolling out unified platforms like Xbox Live and Steam, the industry has unprecedented opportunities to examine player behavior not only within a single game environment, but across multiple games, and even in social interactions outside of the games themselves. Fairly complex profiles can be built around individual gamers. These profiles can be aggregated to create plausible archetypes.

Establishing an understanding of these player types gives game developers a more specific basis for future game design and marketing strategies, while also allowing them to market existing games to customers who are likely to enjoy them.
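Clustering is one common way to turn individual profiles into archetypes, although the source does not say which method any particular studio uses. The sketch below runs a plain k-means (Lloyd’s algorithm) over two hypothetical features per player, sessions per week and purchases per month. The feature choice, the sample data, and the starting centroids are all assumptions, and real profiles would need many more features and proper scaling.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; `centroids` are the starting guesses."""
    centroids = [tuple(c) for c in centroids]
    assignment = []
    for _ in range(iterations):
        # Assign each point to its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break  # converged
        centroids = new_centroids
    return centroids, assignment

# Hypothetical profiles: (sessions per week, purchases per month).
players = [(2, 0), (3, 0), (1, 1),      # occasional players
           (20, 1), (22, 0), (19, 2),   # heavy players who rarely buy
           (6, 9), (5, 11), (7, 10)]    # moderate players who buy often
centroids, assignment = kmeans(players, [players[0], players[3], players[6]])
print(assignment)
print(centroids)
```

Each resulting centroid can be read as a candidate archetype, but naming and validating those groups remains an analyst’s judgment call.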

The Economics of the In-Game Marketplace

Data science also has an important role in free-to-play (F2P) games. These games allow anyone to install and play free of charge, making their money by offering in-game sales of upgrades or virtual gear. Essentially, an entire in-game market has to be constructed, and its economics analyzed and configured properly in order to support development expenses.

Runescape, an F2P MMORPG (Massively Multiplayer Online Role-Playing Game), developed an in-game quest recommendation engine based on player data analysis. The recommendation engine was designed to steer players to content that would be most engaging for their playing pattern, and most likely to generate sales.

Massively Multiplayer Also Means Massively Data-Dependent

MMORPGs like Runescape generate more than just player data; the huge, immersive worlds are, in fact, made of data, with millions of quests, character attributes, area parameters, and spell and weapon characteristics to store.

Storing that information would be a challenge in any environment, but for gaming, performance is key: a business user might be content to sit and wait a couple of seconds while a query is returned from the server with last year’s revenue numbers, but a gamer in mid-sword swing is going to be through the roof if the game stutters while a database is queried for his “chances to hit” percentage.

Data scientists are responsible for designing the architecture that enables millisecond response times from backend game databases, and then for keeping it all under wraps. In the highly competitive world of MMORPGs, such information is a closely guarded trade secret. Only minor bits and pieces have slipped out over the years about the tools and techniques used to build high-performance backend databases for games like World of Warcraft, but those tidbits hint at serious work being done by master’s-educated data scientists.
