Understanding the Role of Tableau in Data Science

The skies over Dallas opened up in a rare spate of summer rain and the umpires called a rain delay at Globe Life Park, home of the Texas Rangers, before the game even got underway. Out at the ticket windows, enthusiasm for attending what promised to be a late and sopping ballgame dwindled and lines emptied.

Upstairs, in the front office, staff using a real-time data visualization tool noticed that the bars on ticket sales had fallen as precipitously as the rain: a ticket hadn’t been sold in more than fifteen minutes. But the ticket windows were still fully staffed, with hourly employees twiddling their thumbs.

The front office quickly closed down three-quarters of the windows and sent the ticket-takers home that day. Then, once again, they gave thanks that they had found a powerful data analysis tool to allow them to make such decisions quickly and accurately.

The Rangers are perennial rivals of the Seattle Mariners in the AL West, having traded the division championship back and forth several times during fierce contests in the mid-’90s. But despite the intense competition, there is one thing about Seattle that the Rangers absolutely love: Tableau Software.

Nestled safely on the opposite end of downtown Seattle from Safeco Field, on the shores of urban Lake Union in the funky Fremont neighborhood, Tableau (that’s Tab, like the soft drink, and low, like you feel after drinking a Tab) makes the real-time data visualization tools that data scientists in all sectors have come to rely on.

Tableau’s tools are used by everyone from the United Way to the Army National Guard to visualize information quickly, easily, and interactively from a variety of previously impenetrable sources.

Tableau has changed the way people look at data and, in the process, boosted the field of data science almost single-handedly.

A Picture is Worth a Billion Numbers: Tableau Taps Into the Power of Vision

The fact that most people process data visually was well understood when three Stanford scientists got together to form Tableau more than a decade ago. Infographics were old hat by 2003. But most were produced by hand, after a data analyst scoured a variety of data sources for the numbers, and an artist put together a graphic representation. Excel and Access (a popular desktop database of the era) could produce basic types of charts, but the data sources were restricted and only relatively narrow comparisons could be made.

Folks who knew some R could produce some terrific graphics with tools built into that language, but considerable expertise was required. Python programmers were still grasping for the data visualization libraries that would later help that language surge in popularity among data scientists.

Tableau stepped in to build a tool that non-programmers could use, and which could pull data in from even the largest and most diverse sources. Tableau Data Extracts was designed to work out of the box with more than fifty different data sources, including:

  • Excel
  • Raw text files
  • Access
  • Hadoop
  • Amazon EMR
  • Microsoft SQL Server
  • Salesforce
  • Any ODBC-compliant database!

The real value of the product wasn’t the connections, but the interactive visualizations. All the numbers coming out of those data sources were morphed into representations any user could understand… and work with directly. For example, the Rangers’ ticket sales data could be mapped onto a picture of the stadium seating chart, and managers could point, click, drag, and slide directly on the graphic to change the timeline and other factors that would help them better understand sales trends in relation to other cost and profit centers. And all of it could be built by a novice with zero programming experience.

On the surface it might seem like Tableau had turned data science into something that required less in the way of specialized expertise. However, stripping useful information from ever more intricately interrelated sets of data would continue to challenge data scientists and Tableau would become another tool in their growing arsenal.

With Tableau, You Don’t Have to Do It The Hard Way: Easier Interaction Assists Data Scientists

There is a strain of individual found in every profession who seems to insist that you’re not a real professional unless you do it the hard way… for instance, generating all your bar charts by hand-crafting Python code with pandas, or hacking out an R program to put together a scatterplot.

But business executives don’t care how hard you work as long as the data is accurate and delivered quickly. And in many cases, Tableau solutions provide exactly that without all the sweat and late-night coding marathons. It’s easy to explore underlying data and get a feel for it, or whip together quick answers to basic questions in a format that will be readily understood by laypersons. The VizQL language that powers the dashboards makes exploring data on the fly easier than designing a Python or R program for a specific goal. A quirk surfacing in the data can be viewed and pursued immediately with Tableau.
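For a sense of what the hand-coded route involves, here is a minimal sketch in Python with pandas. It totals ticket sales by seating section and prints a plain-text bar chart. The data, the section names, and the helper functions `tickets_by_section` and `text_bar_chart` are all invented for illustration; a real script would more likely hand the totals to a plotting library.

```python
import pandas as pd

# Hypothetical ticket sales; section names and counts are invented for illustration.
sales = pd.DataFrame({
    "section": ["Lower", "Upper", "Lower", "Club", "Upper", "Lower"],
    "tickets": [120, 80, 95, 40, 60, 30],
})


def tickets_by_section(df: pd.DataFrame) -> pd.Series:
    """Return total tickets sold per section, largest total first."""
    return df.groupby("section")["tickets"].sum().sort_values(ascending=False)


def text_bar_chart(totals: pd.Series, width: int = 40) -> str:
    """Render totals as horizontal text bars, scaled so the largest is `width` wide."""
    peak = float(totals.max()) or 1.0  # avoid dividing by zero if all totals are 0
    lines = []
    for label, value in totals.items():
        bar_length = int(round(width * float(value) / peak))
        lines.append(f"{label:<6} {'#' * bar_length} {int(value)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_bar_chart(tickets_by_section(sales)))
```

Even this small example needs a grouping step, a sort, and some scaling arithmetic before anything appears on screen, and changing the question (sales by hour instead of by section, say) means editing and rerunning the code.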

Like any specialized tool, it has limitations. Tableau excels at visualization, but not necessarily at the hardcore data analysis and statistical work that are the bread and butter of data scientists. For amateurs, Tableau may be their only option for looking at information. For a true data scientist, it’s just one tool in a larger toolbox. But the key to being a real professional is knowing when to reach for the right tool for the job at hand: you don’t take a chainsaw to a twig.

There is a Version of Tableau That is Right for You

For the uninitiated, the most challenging aspect of Tableau may be the six different basic packages it comes in.

  • Tableau Desktop – Can be installed on Windows or Mac, single-user version
  • Tableau Server – A powerful backend server that can deliver data to numerous Desktop or Mobile editions so users can share work and data sources
  • Tableau Online – A cloud-hosted version of Tableau Server
  • Tableau Mobile – A mobile app for on-the-go access to Server data or offline snapshots
  • Tableau Reader – A read-only tool that allows users to view and interpret visualizations authored by Desktop or other versions, but not create them
  • Tableau Public – Allows web-publishing of visualizations generated by other versions, live and interactive for any web user

Most data scientists will appreciate the horsepower of Tableau Desktop paired with the Online or Server versions of the software.

They’ll also be happy to find that, under the hood, Tableau can easily be integrated with raw R code, which can be used to power up Tableau’s interactive dashboard displays in ways that are difficult or impossible with that tool by itself.

It also works seamlessly with Hadoop and other sources of large, unstructured data that otherwise present a significant challenge to data scientists. Progressive Insurance, for instance, uses Tableau’s Hadoop integration to quickly sample subsets of the larger data store to model data that may reveal new insights or lead to further analysis with other tools.
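The idea of sampling a subset from a store too large to load in full can be sketched in plain Python. The example below uses reservoir sampling, one common technique for drawing a fixed-size random sample from a stream of unknown length. It is an illustration only, not a description of how Tableau or Progressive actually sample data, and the function name `reservoir_sample` and its parameters are assumptions made for this sketch.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Draw up to k records uniformly at random from a stream in a single pass.

    Only k records are held in memory at any time, so the source can be
    far larger than available RAM.
    """
    rng = random.Random(seed)
    sample: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            # Fill the reservoir with the first k records.
            sample.append(record)
        else:
            # Replace a random slot with probability k / (i + 1).
            j = rng.randint(0, i)  # randint is inclusive on both ends
            if j < k:
                sample[j] = record
    return sample


if __name__ == "__main__":
    # Stand-in for a large data store: one million integer record IDs.
    subset = reservoir_sample(range(1_000_000), k=5, seed=42)
    print(subset)
```

A small sample like this is often enough to sketch a model or a chart before committing to a full pass over the data with heavier tools.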
