JCM: Let’s start with the basics: just what is data analytics?
DJ: Data analytics, at its most basic level, is simply about organizing data in order to glean insights from them.
JCM: Okay, but what kinds of insights are we talking about? Or, more generally, what kinds of problems do people use data analytics to solve?
DJ: So, the questions one can ask of data fall into three large buckets.
First, we have questions about the structure of existing data. For example, “Who exactly is buying my product?” But also, more generally, “What patterns or correlations can we find in existing data?”
Second, we have questions about cause and effect, or scientific questions. For example, “Does the use of my product increase business performance for my customers?”
And finally, we have questions about likely future outcomes based on the data, which fall into the category of predictive analytics. “Based on the data I have, which customers are likely to buy which of my products in the next six months?”
JCM: Is there a particular set of tools or methods you use in data analytics to answer these different types of questions?
DJ: Yes and no. Analytics draws upon a broad toolset for teasing the answers to these questions from the data. But not all tools are equally efficient, or desirable, for finding answers to particular questions. Think of an archeological dig: some parts of the job call for an earth-mover, to remove top layers of soil, debris, or even structures at scale and get closer to the desired strata. Other parts call for a shovel, some for a hand spade, and some final parts demand a toothbrush.
JCM: So let’s say you want to start developing your data-analytics skills from scratch. Would you need to learn programming? Or can you start with SQL queries and applications like Microsoft Excel and Tableau?
DJ: You could certainly start learning data analytics with SQL, Excel, and Tableau. But those tools would limit you to a fairly restricted set of data against which to ask the three kinds of questions I mentioned before. The data would have to be relatively clean and structured, without too many variables, which are also known as dimensions.
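As a rough illustration of the kind of clean, structured question SQL handles well, this sketch runs an ordinary `GROUP BY` query through Python's built-in `sqlite3` module against an in-memory database. The `sales` table, its columns, and its rows are all hypothetical.

```python
import sqlite3

# In-memory database holding a small, hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (customer TEXT, region TEXT, product TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("Acme", "West", "Widget", 250.0),
        ("Acme", "West", "Gadget", 100.0),
        ("Globex", "East", "Widget", 400.0),
        ("Initech", "East", "Gadget", 75.0),
    ],
)

# A typical structured-data question: revenue by region and product.
rows = conn.execute(
    """
    SELECT region, product, SUM(amount) AS revenue
    FROM sales
    GROUP BY region, product
    ORDER BY revenue DESC
    """
).fetchall()

for region, product, revenue in rows:
    print(f"{region:<5} {product:<7} {revenue:>8.2f}")

conn.close()
```

This works because every row has the same few, well-defined columns. Once the data is messy, unstructured, or has hundreds of variables, a single query like this stops being enough.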
JCM: Okay, let’s say you add the ability to analyze large datasets with a programming language like R or Python. What else will you be able to do with data then?
DJ: If you add some big-data programming skills, you can tackle a broader set of data. Through programming, you can help reduce the dimensionality of data, which might have hundreds or even thousands of variables, to find only the most relevant few you need to get desired insights.
Better programming chops also open the door to dealing with additional data types, such as unstructured data or streaming data.
JCM: You mentioned big-data programming skills. What’s another skill set that would open up more capabilities and possibilities?
DJ: Machine learning skills, for sure.
JCM: And what else can you do with machine learning skills?
DJ: Machine learning (ML) skills let you make more sophisticated extrapolations and classifications from the data. For example, with Excel you're largely limited to linear regression, but ML makes available more sophisticated tools like logistic regression, decision trees, and support-vector machines.
I should add, by the way, that ML doesn’t necessarily require coding; open-source tools, like Weka, have a pretty good GUI, as do commercial tools like TIBCO Spotfire.
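To show what logistic regression does under the hood, here is a minimal pure-Python sketch. It fits the model by batch gradient descent on log-loss, using a single made-up feature (site visits) to predict whether a customer buys. The data, learning rate, and epoch count are assumptions. In practice you would more likely use a GUI tool like the ones mentioned above or a library such as scikit-learn's `LogisticRegression`.

```python
import math

def sigmoid(z):
    """Logistic function: maps any real number to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            # For log-loss, the gradient with respect to the linear score is (p - y).
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

# Hypothetical data: site visits last month, and whether the customer bought (1) or not (0).
visits = [1, 2, 3, 4, 5, 6, 7, 8]
bought = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(visits, bought)

def predict_proba(x):
    """Estimated probability of a purchase for a customer with x visits."""
    return sigmoid(w * x + b)

for x in (1, 4, 8):
    label = "buy" if predict_proba(x) >= 0.5 else "no buy"
    print(f"{x} visits -> p(buy) = {predict_proba(x):.2f} ({label})")
```

Unlike a straight-line fit, the sigmoid keeps every output between 0 and 1. That makes the model a natural fit for yes/no classifications like "will this customer buy?"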
JCM: Okay, that’s interesting. So can you tell me one more skill set that can open additional possibilities in data analytics?
DJ: Deep learning would be the next step or skill set beyond machine learning.
JCM: Deep learning? How is that different from machine learning?
DJ: Deep learning (DL) goes a little beyond traditional ML in that it learns the important features itself, and it can do so even in unsupervised settings, where the data has no labels. To explain that a little more, traditional ML typically requires you to identify the important features in the data on which to train the ML algorithm to make extrapolations or classifications. DL, on the other hand, goes through countless training iterations to discover the pertinent features to use without human direction.
JCM: So to sum up, it sounds like you can do a lot of data analysis without programming, as long as that data is well structured. But to really dig into the massive amount of unstructured data out there in the wild, or to apply deep learning, you really need programming skills.
DJ: That’s accurate. If people without programming skills are interested in exploring the field, they could always start by learning how to analyze structured data through applications and SQL queries. Many of the concepts and skills they learn along the way will then serve as a foundation for richer analysis through programming, if they choose to pursue that later.
JCM: Very good. Thanks for your time, David.
DJ: You’re welcome!