What, data has a temperature? Yes it does. Data can be hot, warm, or cold; the temperature of data is a basic classification of how often data is accessed, with hot data being accessed most frequently and cold data accessed least frequently.
Knowing the temperature of data can make a huge difference to an organization as it plans or refines its data-management strategy. If that strategy includes making use of big-data insights in real time, the organization is likely looking at SAP HANA or has already moved to it. SAP HANA, as described in my post, “Can Companies Realize the Benefits of SAP HANA Without the Typical Migration Concerns?” from August 24, 2016, “… moves [data] items off the disk and into random access memory (RAM), which speeds up data queries by reducing latency. … The significantly increased processing speeds compared to a traditional disk-based platform enable real-time data analysis.” That speed and access to real-time data can allow an organization to accomplish amazing things, but getting there can take a lot of time and a significant investment, which might keep some companies from taking advantage of SAP HANA as soon as they’d like.
This is where data temperature and multi-temperature data management can save the day. An organization doesn’t need real-time access to all of its data. It only needs real-time access to hot data, which might be only a small portion of its overall data pool. That means a large portion of an organization’s data pool (its warm and cold data) doesn’t have to be stored in-memory in the SAP HANA database. Instead, the organization can use SAP HANA Dynamic Tiering, an add-on to SAP HANA, to move warm data from SAP HANA’s in-memory database to extended storage. This move can reduce the size of the SAP HANA database, which can lower hardware, maintenance, and licensing costs, while still allowing all of the data to be analyzed in SAP HANA.
You’re Getting Warmer
Customer invoices are a good example of data that might be hot one minute, lukewarm in a few months, and cold next year. That migration to the ice age might look something like this:
- Hot data: Customer invoices from the past six months
- Warm data: Customer invoice data from six months to two years old
- Cold data: Customer invoice data that is more than two years old
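As a rough illustration of that age-based split, here is a minimal Python sketch. The function names (`months_before`, `classify_invoice`) are made up for this example, and the handling of the exact six-month and two-year boundaries is an assumption; a real policy would set its own cutoffs.

```python
import calendar
from datetime import date


def months_before(d: date, months: int) -> date:
    """Return the calendar date `months` months before `d` (day clamped to month length)."""
    total = d.year * 12 + (d.month - 1) - months
    year, month_index = divmod(total, 12)
    month = month_index + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def classify_invoice(invoice_date: date, today: date) -> str:
    """Label an invoice hot, warm, or cold using the example thresholds from the list above."""
    if invoice_date > months_before(today, 6):
        return "hot"    # newer than six months
    if invoice_date >= months_before(today, 24):
        return "warm"   # six months to two years old
    return "cold"       # more than two years old


if __name__ == "__main__":
    today = date(2016, 11, 1)
    for d in (date(2016, 9, 15), date(2015, 1, 10), date(2014, 10, 31)):
        print(d.isoformat(), classify_invoice(d, today))
```

The thresholds are parameters of the business, not of the technology: an organization that queries invoices for a full year would simply widen the hot window.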
An organization might actively query newer customer invoices (those that are hot) for sales-trends data, drip-marketing campaigns, customer-experience research, rewards programs, and more. The organization is much less likely to regularly access warm or cold data. It might do so only for warranty claims, “we want you back” campaigns aimed at re-engaging inactive customers, and similar activities.
With SAP HANA, the organization can keep its hot data close at hand by storing it in the in-memory SAP HANA database. It can use SAP HANA Dynamic Tiering to move warm data to extended tables. And it can move cold data to what is known as “cold storage,” typically tape or hard-disk drives. The organization can also use a stored procedure that runs on a regular schedule, such as monthly, to automatically migrate invoices (or any other data) as the data becomes older and colder.
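To make the scheduled migration idea concrete, here is a small sketch of the move-by-age logic. It uses Python's built-in SQLite module purely as a stand-in: this is not SAP HANA or Dynamic Tiering syntax, and the table names (`invoices_hot`, `invoices_warm`), columns, and the `migrate_older_than` helper are all hypothetical.

```python
import sqlite3


def migrate_older_than(conn, source, target, cutoff_iso):
    """Move rows dated before `cutoff_iso` from `source` to `target` in one transaction.

    Table names are interpolated directly, so they must be trusted constants.
    """
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            f"INSERT INTO {target} SELECT * FROM {source} WHERE invoice_date < ?",
            (cutoff_iso,),
        )
        cur = conn.execute(
            f"DELETE FROM {source} WHERE invoice_date < ?",
            (cutoff_iso,),
        )
    return cur.rowcount  # number of rows removed from the source tier


# Stand-in tiers: in a real system these would be an in-memory table
# and an extended-storage table, not two SQLite tables.
conn = sqlite3.connect(":memory:")
for table in ("invoices_hot", "invoices_warm"):
    conn.execute(
        f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, invoice_date TEXT, amount REAL)"
    )
conn.executemany(
    "INSERT INTO invoices_hot VALUES (?, ?, ?)",
    [(1, "2016-09-15", 120.0), (2, "2016-03-02", 80.0), (3, "2015-12-20", 45.5)],
)

# A monthly job would compute the cutoff (for example, six months ago) and call this.
moved = migrate_older_than(conn, "invoices_hot", "invoices_warm", "2016-05-01")
hot_left = conn.execute("SELECT COUNT(*) FROM invoices_hot").fetchone()[0]
warm_now = conn.execute("SELECT COUNT(*) FROM invoices_warm").fetchone()[0]
print(moved, hot_left, warm_now)
```

Doing the insert and delete inside one transaction is the important design point: a row should never be missing from both tiers or present in both if the job fails partway through.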
Where to Store Warm Data?
Where an organization stores its warm data can make a difference when internal teams need to run queries on it or on both hot and warm data simultaneously. Even though warm data might not be needed often, no one likes to wait, so getting warm-data queries to run as quickly as possible is a key consideration.
One option is an all-flash infrastructure, which can be used for both SAP HANA and SAP HANA Dynamic Tiering. All-flash storage has actually come down significantly in cost—from up to $35 per GB in 2008 to as little as $0.40 per GB in 2015.[i],[ii]
One all-flash vendor, Pure Storage, recently did a study at the SAP Co-Innovation Lab (COIL) that reports that organizations can reduce costs with no significant loss in performance when using SAP HANA and SAP HANA Dynamic Tiering on the Pure Storage FlashStack solution.[iii],[iv]
Save Money Without Losing Performance—That’s Cool!
Pure Storage’s testing at COIL showed that when an organization runs SAP HANA with SAP HANA Dynamic Tiering on the Pure Storage all-flash infrastructure, it can potentially lower costs up to 75 percent while still obtaining performance virtually equal to running SAP HANA without SAP HANA Dynamic Tiering.[iii],[iv] The virtually equal performance was attributed to the read performance of the Pure Storage all-flash product, which enabled queries in SAP HANA Dynamic Tiering to perform comparably to queries run entirely against data in SAP HANA in-memory.
The Pure Storage testing ran a query against 500 million records (1 TB of data) and established a baseline query time of 0.69 seconds with all data in SAP HANA. For the legacy-environment baseline, the testing used SAP’s own figure: queries on SAP HANA run 13.9 times faster than the same data in a traditional Oracle Database environment.
Pure Storage tested offloading 25, 50, or 75 percent of its 1 TB test data from SAP HANA to SAP HANA Dynamic Tiering run on a Pure Storage FlashStack solution. What the testers found was that using SAP HANA Dynamic Tiering on the Pure Storage FlashStack product resulted in only imperceptible performance loss compared to having all the data in-memory in the SAP HANA database. The theory is that the more data an organization offloads to SAP HANA Dynamic Tiering, the less hardware, software, and licenses for SAP HANA the organization needs, which lowers costs but maintains comparable performance to having all data in SAP HANA.
- With 25 percent of data offloaded to SAP HANA Dynamic Tiering, the Pure Storage testing query took only 2.40 seconds—only 1.71 seconds longer than with all data in-memory in SAP HANA
- With 50 percent of data offloaded to SAP HANA Dynamic Tiering, the Pure Storage testing query took only 1.82 seconds—only 1.13 seconds longer
- With 75 percent of data offloaded to SAP HANA Dynamic Tiering, the Pure Storage testing query took only 1.36 seconds—only 0.67 seconds longer
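The "seconds longer" figures in the list above are simply each reported query time minus the 0.69-second all-in-memory baseline. A short Python check of that arithmetic, using only the numbers quoted in this post (the variable names are my own):

```python
BASELINE_SECONDS = 0.69  # all data in-memory in SAP HANA, as reported above

# Percent of data offloaded to Dynamic Tiering -> reported query time in seconds
reported_times = {25: 2.40, 50: 1.82, 75: 1.36}

# Extra time over the baseline, rounded to hundredths of a second
extra_seconds = {
    pct: round(seconds - BASELINE_SECONDS, 2)
    for pct, seconds in reported_times.items()
}

for pct, extra in extra_seconds.items():
    print(f"{pct}% offloaded: {extra:.2f} s longer than baseline")
```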
Pure Storage reported cost savings that ranged up to a 75-percent reduction in overall costs with warm data offloaded to SAP HANA Dynamic Tiering. These savings assumed a hardware, software, and licensing cost of more than $2,484,000 with 100 percent of data in SAP HANA, hardware cost savings based on additional memory and CPU requirements, and license costs of $159,000 per 64 GB for SAP HANA Enterprise Edition and maintenance. Reported savings increased with more data stored in and queried from SAP HANA Dynamic Tiering.
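To see why savings grow as more data is offloaded, here is a deliberately simplified, license-only sketch in Python. It is not Pure Storage's cost model. It assumes that licensing scales in 64 GB units with the amount of data kept in-memory, that 1 TB means 1,024 GB, and that hardware costs and the cost of Dynamic Tiering itself can be ignored. Only the $159,000-per-64-GB figure comes from the text; the function name is hypothetical.

```python
import math

LICENSE_COST_PER_UNIT = 159_000  # USD per 64 GB, the figure quoted in the text
UNIT_GB = 64
TOTAL_GB = 1024                  # assumption: 1 TB of test data taken as 1,024 GB


def in_memory_license_cost(total_gb: float, offloaded_fraction: float) -> int:
    """License cost for the data left in-memory, under the simplifying assumptions above."""
    in_memory_gb = total_gb * (1 - offloaded_fraction)
    units = math.ceil(in_memory_gb / UNIT_GB)  # licenses assumed to come in whole 64 GB units
    return units * LICENSE_COST_PER_UNIT


full_cost = in_memory_license_cost(TOTAL_GB, 0.0)
for fraction in (0.25, 0.50, 0.75):
    cost = in_memory_license_cost(TOTAL_GB, fraction)
    reduction = 1 - cost / full_cost
    print(f"{fraction:.0%} offloaded: ${cost:,} ({reduction:.0%} lower than all in-memory)")
```

Under these assumptions the license cost falls in step with the share of data moved out of memory, which is the intuition behind the reported savings; the actual figures in the Pure Storage papers also account for hardware.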
Read an overview of the Pure Storage testing done at COIL in Pure Storage’s paper, “Lower Costs and Comparable Performance Justify Migrating to SAP HANA Sooner.” Or you can see the complete testing overview in the longer paper, “FlashStack IoT Solution (@SAP COIL): IoT Design Guide Using Virtualized SAP HANA® with Dynamic Tiering on Pure Storage.”
[i] Leventhal, Adam. Communications of the ACM. “Practice: Flash Storage Memory.” 2008. http://cacm.acm.org/magazines/2008/7/5377-flash-storage-memory/fulltext.
[ii] CIO.com. “News Flash—Flash Storage Isn’t as Expensive as You Think.” March 2015. www.cio.com/article/2894591/flash-storage/news-flash-flash-storage-isn-t-as-expensive-as-you-think.html.
[iii] Pure Storage. “Lower Costs and Comparable Performance Justify Migrating to SAP HANA Sooner.” October 2016. www.purestorage.com//content/dam/purestorage/pdf/whitepapers/PureStorage_SAP_HANA_Dynamic_Tiering_WP.pdf.
[iv] Pure Storage. “FlashStack IoT Solution (@SAP COIL): IoT Design Guide Using Virtualized SAP HANA® with Dynamic Tiering on Pure Storage.” August 2016. www.purestorage.com/content/dam/purestorage/pdf/whitepapers/IoT_Design_Guide_using_Virtualized_SAP_HANA_with_Dynamic_Tiering_on_Pure_Storage.pdf.