Effective machine learning (ML) requires data, and lots of it. So why not run your ML where large volumes of good, clean data already live: in your database? Better still, why not run your ML algorithms where your data can be accessed fastest: in memory?
This is the reasoning behind SAP HANA machine learning. Running machine-learning algorithms where your data resides, in the database, can reduce latency and avoid the delays that arise when data must be copied to another server. This is particularly beneficial when trained models are used for inference, such as making predictions or classifying data.
Creating and Training Machine-Learning Models in SAP HANA
SAP HANA Predictive Analytics provides in-database machine-learning capabilities through two primary channels. The first is the SAP HANA Automated Predictive Library (APL), which is aimed principally at business users and data analysts. The APL gives analysts and developers automated, wizard-driven machine-learning capabilities that can be used to create predictive models without data-science experience.
The second channel for ML in SAP HANA is the SAP HANA Predictive Analysis Library (PAL), an application function library tailored for data scientists. The PAL offers a more hands-on experience that gives advanced users more granular control.
Whether ML models are built with the APL or the PAL, they run within the SAP HANA platform. Users therefore do not need to extract data and perform calculations on their desktops, or move data elsewhere to run predictive workflows. Only the results, the model's inferences, are sent back to the user's desktop client after processing.
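The difference between shipping raw rows to a client and scoring inside the database can be sketched with a small analogy. The example below uses Python's built-in `sqlite3` module as a stand-in database; it is not the APL or PAL API, and the table, columns, and linear-model coefficients are hypothetical. Both functions produce the same scores, but `score_in_database` evaluates the model inside the SQL engine and returns only `(id, score)` pairs, whereas `score_on_client` has to fetch every feature column first.

```python
import sqlite3

# An in-memory SQLite database stands in for the database server (analogy only).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, tenure REAL, spend REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 2.0, 120.0), (2, 10.0, 40.0), (3, 5.0, 300.0)],
)

# Hypothetical coefficients of a linear model that was trained earlier.
INTERCEPT, W_TENURE, W_SPEND = 0.5, -0.04, 0.001


def score_in_database(conn):
    """Evaluate the model inside the SQL engine; only ids and scores come back."""
    sql = """
        SELECT id, ? + ? * tenure + ? * spend AS score
        FROM customers
        ORDER BY id
    """
    return conn.execute(sql, (INTERCEPT, W_TENURE, W_SPEND)).fetchall()


def score_on_client(conn):
    """Pull every raw feature row to the client and score it there."""
    rows = conn.execute(
        "SELECT id, tenure, spend FROM customers ORDER BY id"
    ).fetchall()
    return [(i, INTERCEPT + W_TENURE * t + W_SPEND * s) for i, t, s in rows]


print(score_in_database(conn))
```

With three rows the savings are invisible, but with millions of rows the in-database variant moves only one number per row instead of the full feature set.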
SAP Predictive Analytics does not require users to supply complex predictive models as input. Users simply tell it which type of data-mining function to apply to a dataset and set the parameters for the analysis; the system then trains on the input dataset. Built-in ML algorithms include:
- Time series
- Key influencers
- Link analysis
Note that SAP Predictive Analytics composes its own models using sophisticated automated machine-learning techniques. It also creates, and selectively deletes, metadata as needed to build robust models.
SAP HANA also provides integration for models created in popular open-source machine-learning frameworks and programming languages. For example, a user can invoke models from SQL-based applications or from client-side code written in Python and other programming languages. Applications developed on the SAP HANA XS Advanced (XSA) application server can then consume these machine-learning models.
Users can also incorporate third-party machine-learning assets more deeply into SAP HANA. For example, data scientists can reuse the extensive collections of models and scripts they have already developed in R. Users can write R code in the SAP HANA platform and mix and match those programs with SAP HANA PAL algorithms, while the R code itself executes externally on a separate Rserve server.
SAP HANA can also incorporate deep-learning models. Users can develop deep neural networks in TensorFlow, a popular open-source framework, load the trained models into a TensorFlow Serving server, and then invoke those models from the SAP HANA platform and SAP HANA applications.
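To make the serving step concrete, the sketch below builds, but does not send, a request to TensorFlow Serving's REST predict endpoint (`/v1/models/<name>:predict`, with a JSON body containing an `"instances"` list and a response containing `"predictions"`). It shows the general shape of a call any client could make; the exact mechanism SAP HANA uses to reach the server is not shown here. The host name and model name are hypothetical, and port 8501 is the REST port conventionally exposed by the TensorFlow Serving Docker image.

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501):
    """Build (but do not send) a TensorFlow Serving REST predict request."""
    url = f"http://{host}:{port}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_predict_response(raw):
    """Extract the list of predictions from a JSON response body."""
    return json.loads(raw)["predictions"]


# Hypothetical host and model name, for illustration only.
req = build_predict_request(
    "tf-serving.example.internal", "defect_classifier", [[0.1, 0.2, 0.3]]
)

# Sending the request requires a running server:
# with urllib.request.urlopen(req) as resp:
#     predictions = parse_predict_response(resp.read())
```

Keeping request construction separate from the network call makes the payload easy to inspect and test without a live server.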
Machine-Learning Inference in SAP HANA
In-database ML in the SAP HANA platform provides machine-learning inference on data at rest or in motion.
Inference on large-scale data: Algorithms in the SAP HANA PAL are designed to execute in parallel on partitioned tables distributed across multiple servers, helping ensure that the SAP HANA platform makes full use of all available hardware. This optimization is not limited to PAL-based models; SAP HANA also provides load balancing for models built in TensorFlow or R, so that they too can execute in parallel on partitioned data.
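The idea of scoring each partition independently and merging the results can be sketched in a few lines. This is a generic illustration, not SAP HANA's scheduler: the partitioning rule (`key % n_parts`), the linear scoring function, and the weights are all assumptions chosen for brevity. Threads keep the example simple; in CPython, CPU-bound scoring would need `ProcessPoolExecutor` or native code to run truly in parallel.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n_parts):
    """Split (key, features) rows into partitions by key, like a partitioned table."""
    parts = [[] for _ in range(n_parts)]
    for key, features in rows:
        parts[key % n_parts].append((key, features))
    return parts


def score_partition(part, weights):
    """Score one partition with a simple (hypothetical) linear model."""
    return [(key, sum(w * x for w, x in zip(weights, feats))) for key, feats in part]


def parallel_score(rows, weights, n_parts=4):
    """Score all partitions concurrently, then merge and sort by key."""
    parts = partition(rows, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        results = list(pool.map(score_partition, parts, [weights] * n_parts))
    merged = [row for part_result in results for row in part_result]
    return sorted(merged)


rows = [(i, [float(i), 1.0]) for i in range(10)]
print(parallel_score(rows, weights=[2.0, 0.5]))
```

Because each partition is scored without reference to the others, adding servers (or workers) increases throughput without changing the model.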
Real-time inference: Not all ML workloads involve processing extremely large datasets. Some models must run inference repeatedly on small amounts of data with minimal latency, as in image-based quality control on an assembly line. To speed inference in these situations, SAP HANA parses an ML model once and keeps it in memory for subsequent executions, eliminating the overhead of reloading the model every time.
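The "parse once, keep in memory" pattern can be illustrated with ordinary memoization. This is an analogy, not SAP HANA's internal mechanism: the serialized model format, the `MODEL_STORE` dictionary, and the `qc_model` name are hypothetical. `functools.lru_cache` ensures the model is parsed on the first call only, and the `LOAD_COUNT` counter makes that visible.

```python
import functools
import json

LOAD_COUNT = 0  # counts how many times a model is actually parsed

# Hypothetical serialized model: linear weights, bias, and a decision threshold.
MODEL_STORE = {
    "qc_model": '{"weights": [0.8, -0.3], "bias": 0.1, "threshold": 0.5}'
}


@functools.lru_cache(maxsize=None)
def load_model(name):
    """Parse a model once; later calls return the cached, immutable copy."""
    global LOAD_COUNT
    LOAD_COUNT += 1
    spec = json.loads(MODEL_STORE[name])
    return tuple(spec["weights"]), spec["bias"], spec["threshold"]


def classify(name, features):
    """Classify one small input using the in-memory model."""
    weights, bias, threshold = load_model(name)
    score = bias + sum(w * x for w, x in zip(weights, features))
    return score >= threshold


for _ in range(1000):
    classify("qc_model", [0.5, 0.5])
print(LOAD_COUNT)
```

After a thousand calls the model has been parsed only once, so each subsequent inference pays just for the arithmetic, not for loading.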
Inference using streaming analytics: Some business processes rely on streaming data from sources such as clickstreams or sensors. SAP HANA can combine ML with streaming analytics and make inferences on incoming events, such as clicks, as soon as they arrive.

In an economy increasingly defined by data, the speed with which organizations can analyze relationships in that data and act on the resulting insights will separate successful companies from their competitors. Machine learning is a core component of intelligent business processes, and ML is faster when it runs where the data resides: in the database. The features of the SAP HANA platform provide a powerful foundation for training and scoring ML models in production environments.