Sunday, February 4, 2018

Auto starting R Studio on AWS Deep Learning server

As an enhancement to machine learning servers built on AWS or Azure, it is often necessary to set up an R development environment to meet the needs of the data science community.

Adapt for your specific environment. Here we assume the AWS Deep Learning Conda AMI (Ubuntu). Specifically, we use the "python3" virtual environment (source activate python3). One reason to use this environment is that it is already set up to run Jupyter Notebook (see auto start jupyter), so we can add an additional R kernel to it. The result is a consolidated image that can be offered to both Python and R users.

The easiest method to install R is using conda:
conda install r r-essentials
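With r-essentials installed, the R kernel can be registered with the existing Jupyter setup so the same notebook server offers both languages. A minimal sketch, assuming the conda environment and paths from this image (these are assumptions from this setup, not universal):

```shell
# Activate the same environment Jupyter runs in
source /home/ubuntu/anaconda3/bin/activate python3

# r-essentials ships IRkernel; register it with Jupyter for the current user
R -e 'IRkernel::installspec(user = TRUE)'

# The R kernel should now be listed alongside python3
jupyter kernelspec list
```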

RStudio is a popular development environment. Follow the instructions to install RStudio Server, for example:
sudo apt-get install gdebi-core
sudo gdebi rstudio-server-1.1.419-i386.deb

The above procedure also sets up auto start of RStudio Server by adding /etc/systemd/system/rstudio-server.service. However, because the available procedure installs RStudio with "sudo" into the default system environment, it cannot find R, which has been installed into a different (conda) environment. As a result, RStudio fails to start with an error indicating it cannot find R:
rstudio-server verify-installation
Unable to find an installation of R on the system (which R didn't return valid output); Unable to locate R binary by scanning standard locations

This can be easily fixed by specifying the exact path to R for RStudio; replace the path with your installation of R:
sudo sh -c 'echo "rsession-which-r=/home/ubuntu/anaconda3/envs/python3/bin/R" >> /etc/rstudio/rserver.conf'
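The change can also be picked up without waiting for a full reboot; these are standard rstudio-server subcommands (auto start on reboot still comes from the systemd unit above):

```shell
# Re-run the installation check; it should now find R
sudo rstudio-server verify-installation

# Restart the service to pick up the new rserver.conf
sudo rstudio-server restart
```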

Restart the instance, and RStudio Server now starts successfully. Log in with Linux credentials at:
http://&lt;server IP&gt;:8787

Wednesday, January 31, 2018

Auto starting Jupyter Notebook on AWS Deep Learning server

Cloud and computing on demand is an increasingly powerful and cost effective combination of enabling technologies for data scientists. Further, utilizing machine learning servers such as those based on AWS deep learning AMIs can make a full suite of machine learning tools available in a matter of minutes.

Jupyter Notebook is a popular development interface for data analysis and model training. AWS has a published procedure for configuring, starting, and connecting to a notebook server.

However, the setup can be challenging, and repeating the above steps each time an instance restarts is not ideal, especially when the server is offered to the broader data science community.

Here is an alternative and an enhancement: auto starting the notebook server.

Adapt for your specific environment. Here we assume the AWS Deep Learning Conda AMI (Ubuntu). Specifically, we install into the "python3" environment (source activate python3).

Configure Jupyter Notebook

Similar to the steps outlined here, configure Jupyter Notebook, which consists of:

Create a key and cert. For example, in the ~/.jupyter/ directory:
openssl req -x509 -nodes -days 11499 -newkey rsa:1024 -keyout "jupytercert.key" -out "jupytercert.pem" -batch

Create a notebook password; copy the generated hash string from the .json file it writes:
jupyter notebook password

Update ~/.jupyter/jupyter_notebook_config.py:
c.NotebookApp.open_browser = False
c.NotebookApp.ip = '*'
c.NotebookApp.port = 8888
c.NotebookApp.password = 'sha1:xxx'
c.NotebookApp.certfile = '/home/ubuntu/.jupyter/jupytercert.pem'
c.NotebookApp.keyfile = '/home/ubuntu/.jupyter/jupytercert.key'

Set up Auto Start Jupyter Notebook (virtualenv)

Setting up auto start is usually straightforward (for example, using /etc/rc.local). In this case, the target environment is a virtualenv: we do not want to auto start in the default Python environment, or as the root user, but we still want to use rc.local. Use the following two-step process.

Create a script in /home/ubuntu/.jupyter/ (note the use of absolute paths to invoke the executables):
source /home/ubuntu/anaconda3/bin/activate python3
/home/ubuntu/anaconda3/envs/python3/bin/jupyter notebook &
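Put together, the startup script might look like the following; the filename start-jupyter.sh is a placeholder (the original does not name the script), and the conda paths are those of this image:

```shell
#!/bin/bash
# /home/ubuntu/.jupyter/start-jupyter.sh  (hypothetical name)
# Activate the target conda environment, then launch the notebook
# server in the background using absolute paths.
source /home/ubuntu/anaconda3/bin/activate python3
/home/ubuntu/anaconda3/envs/python3/bin/jupyter notebook &
```

Remember to make the script executable (chmod +x) so rc.local can run it.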

Edit /etc/rc.local and add the following. Note that we switch to the ubuntu user and invoke the startup script:
cd /home/ubuntu
su ubuntu -c "nohup /home/ubuntu/.jupyter/ >/dev/null 2>&1 &"

The reason for the two-step process is to be able to execute multiple commands as the non-root user (I didn't find an effective way to do that directly in rc.local).
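The pattern generalizes: rc.local runs one detached command, and the script carries the multi-command logic. A self-contained sketch of the mechanics, with a throwaway script standing in for the Jupyter launcher (the /tmp paths and the plain sh in place of su are for demonstration only):

```shell
# Stand-in for the startup script in /home/ubuntu/.jupyter/
mkdir -p /tmp/jupyter-demo
cat > /tmp/jupyter-demo/start.sh <<'EOF'
#!/bin/sh
# multiple commands live here, e.g. activate the env, then launch the server
echo "notebook started" > /tmp/jupyter-demo/out.txt
EOF
chmod +x /tmp/jupyter-demo/start.sh

# rc.local-style one-liner: run detached, discard output
sh -c "nohup /tmp/jupyter-demo/start.sh >/dev/null 2>&1 &"

sleep 1
cat /tmp/jupyter-demo/out.txt
```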

User Access to Jupyter Notebook

Jupyter Notebook will now always start automatically with the instance. Without any additional setup, users can conveniently access the Jupyter server at:
https://&lt;server IP&gt;:8888

Sunday, January 14, 2018

Azure automation with Logic App - passing variable in workflow

Similar to AWS Lambda, Azure Logic App can be used for automated workflows. However, clear documentation is harder to come by, working examples are fewer, and effective technical support is often lacking.

In a workflow, passing the output of one step to another should be a common requirement. The motivation to post this working solution is that there is no clear example illustrating how exactly that is done. It should take a few minutes to learn, rather than hours of trial and error.

output from step 1

Using a simple two-step workflow to illustrate: in step 1, we use an Azure Function App with a PowerShell script. We can obtain a user email dynamically from the Azure VM's user-defined tag field:
$user_email = (Get-AzureRmVM -ResourceGroupName $resourceGroupName -Name $resourceName -ErrorAction $ErrorActionPreference -WarningAction $WarningPreference).Tags["user_email"]

More importantly, the obtained result needs to be sent to this rather odd "Out-File" structure. This is how a variable can be passed along the workflow:
$result = $user_email | ConvertTo-Json
Out-File -Encoding Ascii -FilePath $res -inputObject $result

input to step 2

In a subsequent step, we can use the output of the previous step; in this case, sending an email to the VM's user per the tag. This is best illustrated using the graphical interface of Logic App Designer.

Azure recognizes that a step generates an output and makes it available to subsequent steps. The particular handle is shown as the "Body" of the Step 1 Function App, again a rather odd representation.

But it does work. And this simple mechanism is a much needed building block to construct complex features in a workflow.

Saturday, March 18, 2017

Three Networking features AWS should support

AWS is continuously enhancing and adding new features. However, a number of fundamental networking features have been discussed for a while and, based on recent interactions with the AWS team, are still not on the roadmap.

Here are three of those features high on my list, and why.

1. Multi-Path Routing (ECMP)
Currently, an AWS route table does not allow multiple routes to the same destination. For example, I can only define the default route in a private route table with a single target (which can be a single point of failure).
If ECMP were supported, users would have many load-sharing and resiliency options. For example, I could define multiple default routes pointing to redundant, load-sharing gateways in multiple zones.

However, users still need to keep those routes up to date when the target instances change. This can be done by keeping the ENI persistent and reattaching it to new instances, or by triggering Lambda to update routes when an instance refreshes.
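The Lambda-driven route update reduces to a call like the following AWS CLI equivalent (the route table and ENI IDs below are hypothetical placeholders):

```shell
# Repoint the default route at the replacement instance's ENI
aws ec2 replace-route \
    --route-table-id rtb-0123456789abcdef0 \
    --destination-cidr-block 0.0.0.0/0 \
    --network-interface-id eni-0123456789abcdef0
```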

2. ELB as Route Table target
Supporting a load balancer as a routing target may not seem natural in a network solution; there would need to be an internal implementation that forwards traffic to the resolved load balancer and the instances behind it.
This capability would allow users to fully benefit from the scalability and resiliency of the load balancer, and to have "native" high availability without a self-maintained layer of Lambda checks and actions.

An example that this can be done is found in Azure: a User Defined Route (UDR) can point to an Azure Load Balancer (ALB). This enables the route table to send traffic to a cluster of gateway nodes behind the load balancer, which leads to simple and elegant resiliency.

3. Native Transit VPC
In large-scale enterprise use of AWS, as the number of VPCs goes up, a transit VPC can really help to scale by consolidating connectivity. Currently, there is a Cisco CSR based solution, but any third-party appliance would add maintenance overhead and introduce bottlenecks.

The ideal solution would be an AWS-enabled transit capability that users can define themselves, much like peering connections.

I hope these requirements are echoed by user communities.

Sunday, September 11, 2016

AWS VPC VGW Multipath Routing - difference between Direct Connect and VPN

VPC VGW multi-path scenario
To connect a VPC to enterprise networks or other VPCs, we use Direct Connect or VPN. It is common to have multiple connection paths from a VPC. Routing outbound from a VPC is controlled by the VGW. The question is: how does the VGW, an AWS-internal logical router, handle multi-path routing?

Multi-path is a requirement for high availability, and load sharing across paths is often desirable. How the VGW handles multi-path routing actually differs by connection type: Direct Connect supports ECMP; VPN does not (for VGWs created after Oct 2015).

Direct Connect
Direct Connect supports a redundant-path configuration with Active/Active (BGP multipath), and the VGW routes traffic over multiple equal-cost paths. As a result, we can leverage all the bandwidth provisioned for DX.

With VPN, the VGW currently does not support BGP multipath; it chooses one BGP path only.

What if we use static routes instead of BGP; can static routes be used to load share traffic across multiple paths?
In the scenario shown in the diagram, there are dual VPN connections going to two remote CGWs, each with redundant tunnels. If equal static routes are defined, does the VGW route ECMP across the multiple paths?
  • VGW created prior to Oct 28 2015 supports static multipath.
  • VGW created after Oct 28 2015 selects one active path out of the multiple paths defined.

The scenario was tested with a new VGW in one VPC and a pair of customer VPN appliances in another VPC. With 4 tunnels/paths, all traffic goes to one tunnel only. AWS support confirmed the behavior: the VGW selects only one path.

Why AWS should support VPN multipath
With VPN, it may be desirable to spread load across multiple customer gateways, because those gateways may be Cisco or Palo Alto appliances with licensed throughput capacity. It is more optimal to spread load across multiple destinations than to send all traffic to one path while the others sit idle.

Hopefully AWS will bring consistent multipath routing to VPN, with BGP multipath and static ECMP.

Saturday, July 9, 2016

AWS Auto Scaling Lifecycle Hook with Lambda and CloudFormation

There are many advantages to placing instances in AWS Auto Scaling Groups; scaling is the obvious one. Even for a single-instance appliance, Auto Scaling provides resiliency, health monitoring, and auto recovery. In many cases, the ASG high-availability model is superior to running active/standby appliances in terms of seamless automation and cost effectiveness.

However, Auto Scaling has limitations: not all instance actions and properties can be defined within an ASG. For example, an instance launched in an ASG can have only one interface; Auto Scaling currently does not support attaching multiple interfaces. AWS Lambda, on the other hand, is great for defining custom actions executed efficiently and on demand. Putting the two together, an AWS Auto Scaling lifecycle hook allows Lambda-defined custom actions to be inserted during ASG instance launch or termination, which is powerful and flexible.

See the reference links below for more details about Auto Scaling lifecycle hooks, as well as an excellent example with implementation steps using the AWS console, written by Vyom Nagrani.

To automate ASG and lifecycle hook actions, CloudFormation is used to define both the ASG and the lifecycle hook. In the following example, a lifecycle hook is defined to send a notification via SNS when an instance launches. A Lambda function is then triggered via subscription to the SNS topic:
"GatewayAutoscalingGroupHook" : {
    "Type" : "AWS::AutoScaling::LifecycleHook",
    "Properties" : {
        "AutoScalingGroupName" : { "Ref" : "GatewayAutoscalingGroup" },
        "HeartbeatTimeout" : 300,
        "LifecycleTransition" : "autoscaling:EC2_INSTANCE_LAUNCHING",
        "NotificationMetadata" : { "Fn::Join" : [ "", [
            { "Ref" : "GatewayInstanceENI1" },
            { "Ref" : "GatewayInstanceENI2" }
        ] ] },
        "NotificationTargetARN" : "arn:aws:sns:us-east-1:697686697680:gateway-asg-lifecycle-hook",
        "RoleARN" : "arn:aws:iam::697686697680:role/gateway-sns-hook-role"
    }
}
There is an odd behavior with CloudFormation when it is used to define an ASG lifecycle hook. According to AWS, the lifecycle hook is defined AFTER the first instance in the ASG is created. As a result, the first instance launches without the expected lifecycle hook action. Only when the first instance is deleted does the next instance kick off the lifecycle action and trigger the Lambda function as expected. AWS suggests several workarounds, including launching the ASG with 0 instances and increasing to 1 later, or using custom resources.

Use Lambda monitoring features to see if/when the function is triggered by lifecycle hooks. It is helpful to log the received message. AWS sends out a TEST notification when the lifecycle hook is initially created. The TEST notification won't have the complete notification content, but it will still trigger Lambda. Since it currently can't be turned off, the Lambda function needs some error handling for it.