The support of Python in SQL Server was big news at Microsoft Ignite 2017 where 16 breakout sessions touched on the topic. For example, check out session BRK3298: “Learn how to use R and Python integration in Microsoft SQL Server 2016/2017 to build machine learning applications. Learn how to operationalize machine learning models in SQL Server using stored procedures and leverage native scoring capabilities in SQL Server 2017. This session covers customer scenarios, how to change your R/Python script and implement predictive analytics using R/Python integration in SQL Server 2016/2017.”
Microsoft SQL Server 2017 software contains some big changes from previous versions, including support for running Linux (see my previous blog post for details). But one change that particularly got my attention was the integration of Python into SQL Server for analytics. This provides a great way to build complex machine learning into SQL Server applications without having to having to worry about the performance hit (or the regulatory or data-governance concerns) of moving data out of your database.
R and Python are two mainstays of data science, and the new Python integration in SQL Server 2017 builds on Microsoft’s integration of R into SQL Server 2016. Like running R Services in SQL Server 2016, Python in SQL Server 2017 basically runs like any other post-processing script in SQL with the system-stored procedure sp_execute_external_script accepting the Python code. Here’s a simple example of what the invocation of Python scripting in SQL looks like:
execute sp_execute_external_script @language = N'Python', @script = N' import sys print ("Hello, world.") print (sys.version) '
Actual execution of a Python script (or, for that matter, an R script) in SQL Server basically works like this:
You execute a query in SQL Server
The results of that query go to an external runtime (be it Python or R)
Your script runs in the external runtime and post-processes the query results
The processed query results return to your client
Because running Python scripts is much like running a post-processing script, it is natural to ask what Microsoft is really providing with in-database support for Python. As it turns out, quite a bit.
Huge Tracts of…Python Tools
If you run the example script above, in the output you will see that SQL Server 2017 runs Python 3.5.2 distributed through the Anaconda 4.2.0 package. Anaconda is a widely popular data-science ecosystem for Python, and a subset of its tools (such as scikit-learn, NumPy, and Pandas) are available for scripting analytics in SQL Server 2017.
However, you are not limited to the libraries that come with the built-in Anaconda distribution. You can import any Python library that you install on the server (using, for example, the pip Python package manager) into your SQL Server queries. For example, if you wanted to incorporated neural nets or GPU-based computation into your SQL Server application, you could install and use the TensorFlow Python library (or the Microsoft Cognitive Toolkit library).
Microsoft also includes its own Python machine learning library, RevoScalePy. Based on RevoScaleR, RevoScalePy provides a number of core machine learning functions, such as linear regression, logistic regression, and decision trees. While you can get all of these functions (and many more) from multiple Python libraries, RevoScalePy also includes some interesting tools to enable remote and distributed computation. Remote computation enables you to choose the compute context, so that you can, for example, develop and run your Python script on a local IDE but have all of the analytical computation happen on the server. The tools in RevoScalePy also allow you to distribute that computation over multiple cores, processors, or (in a cluster) nodes for parallel computation. Beyond speed, the parallel-computation functionality in RevoScalePy is also useful for analyzing data sets that are too big to fit into memory.
In contrast to the introduction of R support in SQL Server 2016, there is no SQL Server Python services that correspond to SQL Server R Services. Instead, Microsoft has rolled the functionality for both analytics languages into SQL Server Machine Learning Services. The name is evocative and hints at Microsoft’s future aspirations for SQL Server. Indeed, Simon Bisson at InfoWorld suggests that the inclusion of R and Python support in SQL Server could be the opening move in Microsoft eating into the customer base for expensive third-party statistical-analysis solutions like SAS.
Even if the inclusion of more advanced analytical tools in SQL Server doesn’t actually result in Microsoft eating the lunch of analytics vendors, it does presage a wider variety of analytics tools to use even on workaday solutions like SQL Server. All in all, it’s a great time to be a data scientist.
For more details about Python in SQL Server 2017, see this TechNet blog post. And for more coverage on tech developments like this, be sure to follow us on Twitter (@ProwessConsult) or check out some of our other blog posts.