“Dark data” is typically thought of as data that is collected but never used for anything beyond its original purpose. Gartner coined the term and defines it as “the information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”[i]
The bulk of that data is unstructured data. Unstructured, or raw, data includes raw text, text messages, emails (such as internal organizational emails), videos (such as surveillance footage), audio (such as call center recordings), image files, data from Internet of Things (IoT) sensors, and geographic (geolocation) data. Dark data can include structured data too, assuming the data is not being analyzed.
The definition of dark data doesn’t have to be restricted to unused data, however. It can be expanded to include any data that is unfound or flat-out hidden, such as data buried in the deep web, which most companies do not collect and which is hard to find. Deep-web data is data that isn’t indexed by typical search engines, and it can include data from academia, government agencies, user communities, and more. The expanded definition can also include data on the dark web, which consists of sites that are accessible only through special means and are largely untraceable.
Data that falls within this expanded definition of dark data used to be all but unusable due to its location, its sheer quantity, and the resources needed to find and analyze it. However, tools have evolved or have been created to allow organizations to analyze dark data along with big data. Deloitte coined a new term for this analysis, “dark analytics,” in its “Dark analytics: Illuminating opportunities hidden within unstructured data: Tech Trends 2017” report.[ii]
The Dark-Analytics Side of Big-Data Analytics
Deloitte describes dark analytics as the practice of bringing dark data into the light by analyzing it and using it, preferably alongside all the other data available to an organization. It can encompass analyzing data that an organization has, or has access to, but isn’t currently collecting or analyzing. On a broader level, it can encompass analyzing the data in the deep web. It’s this more comprehensive collection of data and its analysis that Deloitte dubs dark analytics in its report.[ii] According to the report, dark analytics efforts can include:
- Unstructured data a company already has, but isn’t analyzing
- More obscure forms of unstructured data, such as audio, video, and image files, that require specialized tools or techniques to analyze, and so aren’t being analyzed currently
- Data in the deep web, which requires specialized tools, vendors, or techniques to collect and analyze
Big Versus Dark, or Just Big, Dark Data
The first item in Deloitte’s list is really a part of big-data analytics. Big-data analytics is the process of analyzing large amounts of structured and unstructured data with the goal of uncovering trends that inform better business decisions. The only difference between big-data analytics and dark analytics is the awareness, use, and availability of the data. Big data is known, collected, and analyzed. Dark data is unknown, uncollected, or unanalyzed. Once dark data is found, collected, or tapped for use, it can be analyzed as part of the standard big-data analysis process. The only requirements are that the company have the storage (or financial) resources to store or filter the dark data and the tools or techniques to analyze it.
The Tools and Techniques Used to Collect and Analyze Dark Data
Techniques for finding or analyzing dark data that might not be readily available to an organization include video and sound analytics, computer vision, machine learning, and advanced pattern recognition. Some tools already in common use can analyze dark data with some or all of these techniques, including Apache Hadoop, IBM Watson, SAP HANA 2, and Microsoft Cognitive Services in Microsoft Azure. SAP HANA 2 has predictive analytics capabilities. Microsoft Cognitive Services can capture data, such as customer identity or sentiment, that can then be mined. Many companies, especially larger enterprises, have these tools already. Stanford University also created the open-source DeepDive solution, which can be used to extract value from dark data and place it in SQL tables that can be integrated with other databases for analysis.
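To make the idea of moving dark data into SQL tables concrete, here is a minimal Python sketch. It is not DeepDive’s API or method. It uses only Python’s standard `re` and `sqlite3` modules, and the call-center notes, the note layout, and the `call_notes` table are all assumptions invented for illustration.

```python
import re
import sqlite3

# Hypothetical call-center notes, invented for illustration.
NOTES = [
    "2017-03-02 customer 1042 called about a late delivery, sounded frustrated",
    "2017-03-02 customer 2210 asked about upgrade pricing, sounded happy",
    "no date here, a free-form note that the pattern cannot parse",
]

# Assumed note layout: date, customer id, free text, then a mood word.
PATTERN = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) customer (?P<customer_id>\d+) "
    r"(?P<text>.*) sounded (?P<mood>\w+)$"
)


def extract_rows(notes):
    """Return (date, customer_id, mood) tuples for notes that match PATTERN."""
    rows = []
    for note in notes:
        match = PATTERN.match(note)
        if match:  # Notes that do not fit the assumed layout are skipped.
            rows.append((match["date"], int(match["customer_id"]), match["mood"]))
    return rows


def load_into_sqlite(rows):
    """Store the extracted rows in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE call_notes (call_date TEXT, customer_id INTEGER, mood TEXT)"
    )
    conn.executemany("INSERT INTO call_notes VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn


rows = extract_rows(NOTES)
conn = load_into_sqlite(rows)

# Once the text is in a table, ordinary SQL analysis applies.
mood_counts = conn.execute(
    "SELECT mood, COUNT(*) FROM call_notes GROUP BY mood ORDER BY mood"
).fetchall()
print(mood_counts)
```

Real unstructured data rarely follows one tidy pattern, which is why the tools named above rely on machine learning rather than a single regular expression. The point of the sketch is only the shape of the pipeline: extract fields, store them in a table, then query them alongside other data.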
Collecting data on the deep web is less straightforward. Specific tools are needed. Stanford University built a prototype engine called Hidden Web Exposer (HiWE), which scrapes the deep web for information, including information on search and response forms and electronic databases. Multiple commercial companies offer tools for harvesting deep web data, including Deep Web Technologies, BrightPlanet, and AGT, among others.
Why Dark Data Matters
We hear a lot about digital transformation today. As Deloitte stated in its article on dark analytics, “data is competitive currency.”[ii] It might be more precise to say that data is the competitive currency of digital transformation.
Why? Because consumer behavior has shifted dramatically in the last few decades. We just don’t shop like we used to. We don’t go to the mall. We go online to shop, to share, and to review and rate the products we buy. That shift has fundamentally changed how marketing operates. Traditional marketing is being replaced by digital marketing. And when a retailer, bank, or other online or brick-and-mortar provider gets a customer through the door, even if the door is virtual, they must make the most of the visit. They must tie customers’ behavior in the “store” to their overall shopping behaviors and locations, social-media activity, and more in order to compete effectively in an increasingly crowded arena of online and offline options and easily available discounts. Retailers that are exclusively online need a way, beyond web analytics, to understand their consumers and see where they can compete.
Imagine a scenario where a customer comes into a store. The store uses dark-analytics techniques to analyze the customer’s sentiment from surveillance footage. The customer leaves 20 minutes later without making a purchase. Another customer comes in, his sentiment is analyzed, and he spends $200 before leaving the store 20 minutes later. Over time, the store can analyze this dark data and determine which customer sentiment drives the greatest purchases and which simply results in customers leaving empty-handed. The store can then use that data to determine strategies to focus on ideal customers, upsell the spenders to spend more, or shift the non-spenders to spend a little.
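The store scenario can be sketched in a few lines of Python. This is only an illustration: the sentiment labels, the visit records, and the `summarize_by_sentiment` helper are hypothetical, and in practice the labels would come from a video-analytics tool rather than being typed in by hand.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical visit records: (sentiment label, amount spent in dollars).
VISITS = [
    ("happy", 200.0),
    ("neutral", 0.0),
    ("happy", 50.0),
    ("frustrated", 0.0),
    ("neutral", 40.0),
]


def summarize_by_sentiment(visits):
    """Group visits by sentiment and compute simple purchase statistics."""
    grouped = defaultdict(list)
    for sentiment, spend in visits:
        grouped[sentiment].append(spend)

    summary = {}
    for sentiment, spends in grouped.items():
        buyers = [s for s in spends if s > 0]
        summary[sentiment] = {
            "visits": len(spends),
            # Share of visits that ended in a purchase.
            "conversion_rate": len(buyers) / len(spends),
            # Average spend across all visits, including non-buyers.
            "average_spend": mean(spends),
        }
    return summary


summary = summarize_by_sentiment(VISITS)
for sentiment, stats in sorted(summary.items()):
    print(sentiment, stats)
```

With enough visits, a table like this would show which sentiments go with purchases and which go with customers leaving empty-handed, which is the comparison the scenario describes.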
Going a bit deeper, the store could use facial-recognition technology to determine who the customer is and track any social-media comments that might be relevant to the shopping experience; the store could then market to the customer accordingly.
The possibilities are endless and not limited to retail or even strictly marketing or customer experiences—dark analytics has significant possibilities for internal company and HR purposes, such as determining factors resulting in attrition rates, employee satisfaction, and more.
Where to Begin with Dark Analytics?
Data is a science, hence the evolution of the data scientist as a career. Where to begin with dark data depends on your organization. If you’re a large enterprise, you probably have dark data nailed, are well underway already, or can get there with some minor internal adjustments. If you’re a smaller organization, you might feel overwhelmed by the idea. Regardless of where your organization is today, here are just a few ideas to help get you thinking about how to approach the dark-data dimension:
- Determine what information will help you make better business decisions—internal and external; what questions do you want answers to?
- Alternatively, start with your business pains and weaknesses, then identify how data insights can help ease those pains or build strength.
- Audit all of your data sources, including all of the data you’re collecting now, the data you’re not collecting, the data you’re analyzing, and the data you’re not analyzing.
- Identify the best data sources to use to make better business decisions to solve your pains and build your strengths.
- Assign or hire dedicated resources for your data-analysis needs if you haven’t already.
- Audit your data-collection tools. How many different tools are you using? Can they be integrated or consolidated to facilitate analysis of all your data with a single tool and set of insights?
- Find the right tools and techniques; tools don’t have to be fancy, difficult, or expensive.
- Think big and start small: don’t try to do it all at once, especially if you’re newer to the process. Pick one pain and one unanalyzed data source and experiment, or draw up a larger plan and tackle it over time.
- Test and use key performance indicators to assess results.
- Commit; success won’t come overnight. You’ll need to test, assess, and rethink, again and again.
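The audit step in the list above lends itself to a small sketch. The Python below is one possible way to record an inventory of data sources and flag the dark ones, using the article’s working definitions. The source names and the `DataSource` and `classify` helpers are hypothetical, not part of any tool mentioned here.

```python
from dataclasses import dataclass


@dataclass
class DataSource:
    name: str
    collected: bool  # Is the organization storing this data today?
    analyzed: bool   # Is anyone analyzing it today?


def classify(source):
    """Label a source as in use or as one of two kinds of dark data."""
    if source.collected and source.analyzed:
        return "in use"
    if source.collected:
        return "dark: collected but not analyzed"
    # Anything not collected is treated as dark, whatever the analyzed flag says.
    return "dark: not collected"


# Hypothetical inventory, invented for illustration.
SOURCES = [
    DataSource("web analytics", collected=True, analyzed=True),
    DataSource("call center recordings", collected=True, analyzed=False),
    DataSource("surveillance footage", collected=True, analyzed=False),
    DataSource("deep-web forum posts", collected=False, analyzed=False),
]

audit = {source.name: classify(source) for source in SOURCES}
for name, label in audit.items():
    print(f"{name}: {label}")
```

Even a list this simple can help pick the one unanalyzed source to experiment with first.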
[ii] Deloitte University Press. “Dark analytics: Illuminating opportunities hidden within unstructured data: Tech Trends 2017.” February 2017. https://dupress.deloitte.com/dup-us-en/focus/tech-trends/2017/dark-data-analyzing-unstructured-data.html.