Wednesday, January 31, 2018

Auto starting Jupyter Notebook on AWS Deep Learning server

Cloud computing on demand is an increasingly powerful and cost-effective enabling technology for data scientists. Further, machine learning servers such as those based on the AWS Deep Learning AMIs can make a full suite of machine learning tools available in a matter of minutes.

Jupyter Notebook is a popular development interface for data analysis and model training. Currently, AWS has a published procedure for configuring, starting, and connecting to a notebook server:
https://docs.aws.amazon.com/dlami/latest/devguide/setup-jupyter.html

However, the setup can be challenging, and repeating the above steps each time an instance restarts is not ideal, especially when the server is offered to a broad data science community.

Here is an alternative and an enhancement: auto-starting the notebook server.

Adapt this for your specific environment. Here we assume the AWS Deep Learning Conda AMI (Ubuntu). Specifically, we install into the "python3" environment (source activate python3).

Configure Jupyter Notebook

Similar to the steps outlined in the AWS procedure above, configure Jupyter Notebook. This consists of the following:

Create a key and certificate. For example, in the ~/.jupyter/ directory:
openssl req -x509 -nodes -days 11499 -newkey rsa:1024 -keyout "jupytercert.key" -out "jupytercert.pem" -batch

Create a notebook password; the generated hash is written to a .json file (copy it for the config below):
jupyter notebook password
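The generated hash lands in ~/.jupyter/jupyter_notebook_config.json, in a structure roughly like this (the sha1 value is a placeholder):
{
  "NotebookApp": {
    "password": "sha1:xxx"
  }
}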

Update ~/.jupyter/jupyter_notebook_config.py (the password value is the sha1:... hash copied from the .json file, quoted as a Python string):
c.NotebookApp.open_browser = False
c.NotebookApp.ip = '*'
c.NotebookApp.port = 8888
c.NotebookApp.password = 'sha1:xxx'
c.NotebookApp.certfile = '/home/ubuntu/.jupyter/jupytercert.pem'
c.NotebookApp.keyfile = '/home/ubuntu/.jupyter/jupytercert.key'
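Before wiring up auto start, it is worth a quick sanity check that the configuration works. A minimal test, assuming you are logged in as the ubuntu user:
source /home/ubuntu/anaconda3/bin/activate python3
jupyter notebook
# from a second terminal on the server; -k accepts the self-signed certificate
curl -k https://localhost:8888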

Set up Auto Start Jupyter Notebook (virtualenv)

Setting up auto start is usually straightforward (for example, via /etc/rc.local). In this case, however, the target environment is a virtualenv: we don't want to auto start in the default Python environment, or as the root user, but we still want to use rc.local. Hence the following two-step process.

Create a script /home/ubuntu/.jupyter/start_notebook.sh (note the use of absolute paths to invoke the executables):
#!/bin/bash
# Activate the conda "python3" environment, then launch the notebook server in the background
source /home/ubuntu/anaconda3/bin/activate python3
/home/ubuntu/anaconda3/envs/python3/bin/jupyter notebook &
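One step that is easy to miss: make the script executable, otherwise the nohup invocation below will fail silently:
chmod +x /home/ubuntu/.jupyter/start_notebook.sh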


Edit /etc/rc.local and add the following; note that we switch to the ubuntu user and invoke the startup script:
cd /home/ubuntu
su ubuntu -c "nohup /home/ubuntu/.jupyter/start_notebook.sh >/dev/null 2>&1 &"
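For reference, the complete /etc/rc.local might then look like the following sketch (stock Ubuntu images ship an rc.local ending in exit 0; our lines must come before it):
#!/bin/sh -e
cd /home/ubuntu
su ubuntu -c "nohup /home/ubuntu/.jupyter/start_notebook.sh >/dev/null 2>&1 &"
exit 0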

The reason for this two-step process is to be able to execute multiple commands as a single unit (I didn't find an easy, effective way to do that directly in rc.local).

User Access to Jupyter Notebook

Jupyter Notebook will now always start automatically with the instance. Without any additional setup, users can conveniently access the Jupyter server at:
https://(server IP):8888

Sunday, January 14, 2018

Azure automation with Logic App - passing variable in workflow

Similar to AWS Lambda, Azure Logic Apps can be used for automated workflows. However, clear documentation is harder to come by, working examples are fewer, and effective technical support is often lacking.

In a workflow, passing the output of one step to another should be a common requirement. The motivation for posting this working solution is that there is no clear example illustrating how exactly that is done; it should take a few minutes to learn, rather than hours of trial and error.

Output from step 1

To illustrate with a simple two-step workflow: in step 1, we use an Azure Function App with a PowerShell script. We can obtain a user email dynamically from the Azure VM's user-defined tag field:
$user_email = (Get-AzureRmVM -ResourceGroupName $resourceGroupName -Name $resourceName -ErrorAction $ErrorActionPreference -WarningAction $WarningPreference).Tags["user_email"]

More importantly, the obtained result needs to be written out through this rather odd "Out-File" construct ($res is a file path supplied by the Functions runtime, and its contents become the HTTP response). This is how a variable can be passed along in the workflow:
$result = $user_email | ConvertTo-Json
Out-File -Encoding Ascii -FilePath $res -inputObject $result
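Putting the pieces together, the whole Function App script might look like the sketch below. This assumes the classic (v1) experimental PowerShell runtime, where $req and $res are file paths supplied by the runtime, and that the Logic App posts resourceGroupName and resourceName in the request body (those field names are my assumptions, not fixed by Azure):
# Read the incoming request; $req is a file path provided by the Functions runtime
$requestBody = Get-Content $req -Raw | ConvertFrom-Json
$resourceGroupName = $requestBody.resourceGroupName   # hypothetical payload field
$resourceName = $requestBody.resourceName             # hypothetical payload field

# Look up the user email from the VM's tags
$user_email = (Get-AzureRmVM -ResourceGroupName $resourceGroupName -Name $resourceName -ErrorAction $ErrorActionPreference -WarningAction $WarningPreference).Tags["user_email"]

# Write the result to $res; its contents become the HTTP response body,
# which the Logic App exposes to later steps as "Body"
$result = $user_email | ConvertTo-Json
Out-File -Encoding Ascii -FilePath $res -inputObject $result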


Input to step 2

In a subsequent step, we can use the output of the previous step, in this case to send an email to the VM's user per the tag. This is best illustrated in the graphical interface of the Logic App Designer.

Azure recognizes that a step generates an output and makes it available to subsequent steps. The particular handle is shown as the "Body" of the step 1 Function App, again a rather odd representation; it is simply the HTTP response body the function wrote to $res.

But it does work, and this simple mechanism is a much-needed building block for constructing complex features in a workflow.