Maybe you’re interested in exploring the data-science tools that data scientists use most often today, but you don’t know where to start. Or maybe you’re already a data scientist, but you don’t yet have a dedicated machine with the tools or the horsepower needed for your next project. What if you could easily spin up a virtual machine (VM) in the cloud, running either Linux or Windows, with the latest data-science tools pre-installed and pre-configured for you? Imagine, too, that you could assign that workstation all the resources you needed and pay for it only while you were using it. That would be pretty handy, right?
Well, that’s exactly what Microsoft offers with its Data Science Virtual Machine, or DSVM. The DSVM is a VM image hosted on the Microsoft Azure cloud that is built specifically for doing data science. Available on Windows Server and on Linux, it has many popular data-science and programming tools pre-installed and pre-configured, making it easy to get started building applications for analytics.
Prowess has been using the DSVM for training and education. We’ve developed data-science hands-on labs that guide students quickly to a DSVM desktop so that they can begin exploring tools such as Jupyter Notebooks. But the DSVM isn’t useful just for those who develop training modules; it’s also useful for anyone who wants to start exploring the tools that are used most often by data scientists today, as well as for seasoned data scientists.
Here’s how you can start using your own DSVM: first go to the Azure portal at https://portal.azure.com and click Create a resource.
Then type “data science” in the search box, and a few search hits will automatically appear. Next, select a DSVM in the OS of your choice: Ubuntu, CentOS, Windows Server 2016, or Windows Server 2012.
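If you would rather script the same step than click through the portal, the Azure command-line interface can also create VMs. The Python sketch below only builds an `az vm create` command for you to review; it does not run anything. The function name `build_vm_create_command`, the resource names, and the VM size are illustrative assumptions. The image identifier is deliberately left as a placeholder, because the exact DSVM image name is not given here and you would need to look it up in the Azure Marketplace yourself.

```python
import shlex


def build_vm_create_command(resource_group, vm_name, image_urn, size, admin_user):
    """Build the argument list for `az vm create`. Nothing is executed here."""
    return [
        "az", "vm", "create",
        "--resource-group", resource_group,
        "--name", vm_name,
        "--image", image_urn,
        "--size", size,
        "--admin-username", admin_user,
        # Assumes a Linux DSVM; this flag creates SSH keys if none exist.
        "--generate-ssh-keys",
    ]


if __name__ == "__main__":
    command = build_vm_create_command(
        resource_group="my-dsvm-rg",            # hypothetical resource group
        vm_name="my-dsvm",                      # hypothetical VM name
        image_urn="<publisher>:<offer>:<sku>:<version>",  # placeholder, look it up
        size="Standard_DS3_v2",                 # example size, pick what you need
        admin_user="azureuser",                 # hypothetical user name
    )
    # Print a shell-safe version of the command so it can be reviewed first.
    print(" ".join(shlex.quote(part) for part in command))
```

Printing the command rather than running it lets you check the size and image before anything is provisioned or billed.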
When you power up and connect to your newly provisioned VM, you’ll see many data-science tools available on the desktop. Here’s a screenshot of the Ubuntu version of the DSVM:
As you can see, it’s loaded with data science goodies. These pre-installed tools on the DSVM include the following:
- Microsoft R Server 9.2.1 with Microsoft R Open 3.4.1, MicrosoftML package with machine learning algorithms, RevoScaleR and revoscalepy for distributed and remote computing, and R and Python operationalization
- Anaconda Python 2.7 and 3.5
- JupyterHub with sample notebooks
- Apache Spark local 2.2.0 with PySpark and SparkR Jupyter kernels
- Single-node local Apache Hadoop
- Azure command-line interface
- Microsoft Visual Studio Code, IntelliJ IDEA, PyCharm, and Atom
- H2O, Deep Water, and Sparkling Water
- Vowpal Wabbit for online learning
- XGBoost for gradient boosting
- Microsoft SQL Server 2017
- Intel Math Kernel Library
Many frameworks and toolkits useful for analytics are also pre-installed, pre-configured, and ready to run on this Linux version of the DSVM, including the Microsoft Cognitive Toolkit, TensorFlow, MXNet, Caffe, Caffe2, Chainer, NVIDIA DIGITS, Keras, Theano, Torch, and PyTorch. (For a more complete list of all the tools pre-installed on both Linux and Windows versions of the DSVM, see Microsoft’s DSVM documentation.)
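Once you are connected, a quick way to see which of these Python frameworks are visible to a given Python environment is to ask Python whether it can locate each module. The sketch below is a minimal example that should work in a Python 3 environment. The import names in `CANDIDATE_MODULES` are assumptions based on each project's usual module name, not names confirmed for the DSVM image, and results may differ between the environments installed on the VM.

```python
import importlib.util

# Assumed import names: the usual Python module name for each framework,
# not a list taken from the DSVM documentation.
CANDIDATE_MODULES = {
    "Microsoft Cognitive Toolkit": "cntk",
    "TensorFlow": "tensorflow",
    "MXNet": "mxnet",
    "Caffe": "caffe",
    "Caffe2": "caffe2",
    "Chainer": "chainer",
    "Keras": "keras",
    "Theano": "theano",
    "PyTorch": "torch",
    "XGBoost": "xgboost",
    "H2O": "h2o",
    "PySpark": "pyspark",
}


def find_available(modules):
    """Return {display name: bool} saying whether each module can be located."""
    available = {}
    for display_name, module_name in modules.items():
        try:
            # find_spec locates the module without importing it.
            spec = importlib.util.find_spec(module_name)
        except (ImportError, ValueError):
            spec = None
        available[display_name] = spec is not None
    return available


if __name__ == "__main__":
    results = find_available(CANDIDATE_MODULES)
    for name in sorted(results):
        status = "found" if results[name] else "not found"
        print("{:30s} {}".format(name, status))
```

Because `find_spec` only locates a module and does not import it, the check stays fast even for large frameworks. A "found" result shows that the package is installed in that environment, not that it is configured correctly.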
Overall, the DSVM is useful if you want to have a pre-configured analytics desktop available in the cloud, from anywhere. It’s also great (as mentioned earlier) for data-science training and education, for experimentation and evaluation, and for projects that require elasticity in compute power or other resources.
Prowess integrated the DSVM into hands-on labs this spring as a way to support Microsoft in its Red Shirt University tour, whose goal is to spread the word about the great development features in Microsoft Azure. Our job is to stay on top of the latest technologies, and we are proud to offer services that require Azure and data-science expertise.