Showing posts with label Lambda. Show all posts

Saturday, July 9, 2016

AWS Auto Scaling Lifecycle Hook with Lambda and CloudFormation



There are many advantages to placing instances in AWS Auto Scaling groups; scaling is the obvious one. Even for a single-instance appliance, Auto Scaling provides resiliency, health monitoring and automatic recovery. In many cases, the ASG high availability model is superior to running active/standby appliances in terms of seamless automation and cost effectiveness.

However, Auto Scaling has limitations: not all instance actions and properties can be defined with an ASG. For example, an instance launched in an ASG can have only one network interface; Auto Scaling currently does not support attaching multiple interfaces. AWS Lambda, on the other hand, is great for defining custom actions that are executed efficiently and on demand. Putting the two together, an Auto Scaling lifecycle hook allows Lambda-defined custom actions to be inserted during ASG instance launch or termination, which is powerful and flexible.

See the reference links below for more details about Auto Scaling lifecycle hooks, as well as an excellent example and implementation steps using the AWS console, written by Vyom Nagrani.

To automate the ASG and lifecycle hook actions, CloudFormation is used to define both the ASG and the lifecycle hook. In the following example, a lifecycle hook is defined to send a notification via SNS when an instance launches. A Lambda function is triggered through its subscription to the SNS topic.
"GatewayAutoscalingGroupHook" : {
                "Type" : "AWS::AutoScaling::LifecycleHook",
                "Properties" : {
                                "AutoScalingGroupName" : { "Ref": "GatewayAutoscalingGroup" },
                                "HeartbeatTimeout" : 300,
                                "LifecycleTransition" : "autoscaling:EC2_INSTANCE_LAUNCHING",
                                "NotificationMetadata" : { "Fn::Join" : ["", [
                                                "{",
                                                "\"ENI1\"",
                                                ":",
                                                "\"",
                                                { "Ref" : "GatewayInstanceENI1" },
                                                "\"",
                                                ",",
                                                "\"ENI2\"",
                                                ":",
                                                "\"",
                                                { "Ref" : "GatewayInstanceENI2" },
                                                "\"",
                                                "}"
                                ]]},
                                "NotificationTargetARN" : "arn:aws:sns:us-east-1:697686697680:gateway-asg-lifecycle-hook",
                                "RoleARN" : "arn:aws:iam::697686697680:role/gateway-sns-hook-role"
                }
},
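
As a rough sketch of what the subscribed Lambda function might then do with this metadata: attach the two ENIs to the new instance and tell Auto Scaling to continue the launch. The function name `attach_enis_and_continue`, the device indices 1 and 2, and passing the boto3 clients in as arguments are assumptions made for illustration, not the original implementation.

```python
import json


def attach_enis_and_continue(ec2, autoscaling, message):
    """Attach the ENIs named in NotificationMetadata, then release the hook.

    `ec2` and `autoscaling` are expected to be boto3 clients, e.g.
    boto3.client("ec2") and boto3.client("autoscaling").
    `message` is the decoded lifecycle notification (a dict).
    """
    metadata = json.loads(message["NotificationMetadata"])
    instance_id = message["EC2InstanceId"]

    # Assumption: ENI1 goes to device index 1 and ENI2 to index 2;
    # index 0 is the primary interface created by the launch itself.
    attached = []
    for device_index, key in enumerate(("ENI1", "ENI2"), start=1):
        ec2.attach_network_interface(
            NetworkInterfaceId=metadata[key],
            InstanceId=instance_id,
            DeviceIndex=device_index,
        )
        attached.append(metadata[key])

    # Without this call the instance stays in its wait state
    # until HeartbeatTimeout expires.
    autoscaling.complete_lifecycle_action(
        LifecycleHookName=message["LifecycleHookName"],
        AutoScalingGroupName=message["AutoScalingGroupName"],
        LifecycleActionToken=message["LifecycleActionToken"],
        LifecycleActionResult="CONTINUE",
    )
    return attached
```

Taking the clients as parameters keeps the function easy to exercise with stand-in objects before it is wired to real AWS calls.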

CloudFormation behaves oddly when it is used to define an ASG lifecycle hook. According to AWS, the lifecycle hook is created AFTER the first instance in the ASG is created. As a result, the first instance launches without the expected lifecycle hook action. Only when the first instance is terminated does the next instance kick off the lifecycle action and trigger the Lambda function as expected. AWS suggests several workarounds, including launching the ASG with 0 instances and increasing to 1 later, or using custom resources.
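
The "start at 0, then increase to 1" workaround could be scripted roughly as follows. This is a minimal sketch that assumes the stack was created with a desired capacity of 0 and that the hook's actual name is known; `scale_up_once_hook_exists` and its parameters are hypothetical names, and `autoscaling` is expected to be a boto3 Auto Scaling client.

```python
def scale_up_once_hook_exists(autoscaling, asg_name, hook_name, desired=1):
    """Raise the desired capacity only after the lifecycle hook is in place.

    Returns True if the capacity was raised, False if the hook was not
    found yet (the caller can retry later).
    """
    response = autoscaling.describe_lifecycle_hooks(
        AutoScalingGroupName=asg_name
    )
    names = [h["LifecycleHookName"] for h in response.get("LifecycleHooks", [])]
    if hook_name not in names:
        return False

    # The ASG's MaxSize must already allow this capacity.
    autoscaling.set_desired_capacity(
        AutoScalingGroupName=asg_name,
        DesiredCapacity=desired,
        HonorCooldown=False,
    )
    return True
```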

Use Lambda's monitoring features to see if and when the function is triggered by the lifecycle hook. It is helpful to log the received message. AWS sends out a TEST notification when the lifecycle hook is initially created. The TEST notification won’t have the complete notification content, but it will still trigger Lambda. Since it currently can’t be turned off, the Lambda function needs some error handling for it.
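
One way to handle the TEST notification is to check the message's `Event` field before touching any launch-specific keys. The handler below is a sketch under that approach; the helper name `parse_lifecycle_message` and the shape of the returned dict are assumptions for illustration.

```python
import json

TEST_EVENT = "autoscaling:TEST_NOTIFICATION"
LAUNCHING = "autoscaling:EC2_INSTANCE_LAUNCHING"


def parse_lifecycle_message(sns_message):
    """Return launch details from an SNS message body, or None to skip it."""
    msg = json.loads(sns_message)

    # The TEST notification carries an Event field but no instance details.
    if msg.get("Event") == TEST_EVENT:
        return None
    if msg.get("LifecycleTransition") != LAUNCHING:
        return None

    return {
        "instance_id": msg["EC2InstanceId"],
        "hook_name": msg["LifecycleHookName"],
        "asg_name": msg["AutoScalingGroupName"],
        "token": msg["LifecycleActionToken"],
        # NotificationMetadata is itself a JSON string, e.g. {"ENI1": ...}.
        "metadata": json.loads(msg.get("NotificationMetadata") or "{}"),
    }


def lambda_handler(event, context):
    """Log every received message and collect the ones worth acting on."""
    launches = []
    for record in event.get("Records", []):
        body = record["Sns"]["Message"]
        print("Received message: " + body)
        details = parse_lifecycle_message(body)
        if details is not None:
            launches.append(details)
    return launches
```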

Saturday, May 14, 2016

AWS High Availability Gateway – Part 2 – Enhanced HA Model



Basic HA model limitation
In part 1, the basic HA model uses an ASG to recover a Gateway instance when one fails. However, the gateway service is unavailable in the affected zone during instance recovery. The length of the downtime depends on how long it takes to build the gateway from the AMI and to install and configure all software and services. In the enhanced HA model, the availability gap during recovery is closed by dynamically updating the route table to use the Gateway in another zone.

Enhanced HA Model
When a Gateway instance fails, the ASG terminates the old instance and builds a new one. The basic concept of the enhanced HA model is to detect a recovery event in Zone A, change the route table to use the Gateway in Zone B during recovery, and switch back to the Gateway in its own zone once recovery completes.
The design mainly consists of two Lambda functions:
  1. GatewayHA_Failover_duringRecovery
    1. triggered by a CloudWatch event (Gateway instance terminated, indicating that the ASG is initiating recovery)
    2. identifies the VPC, the current and alternate zones, and the associated route tables and Gateway ENIs
    3. updates the current zone's route table to use the Gateway in the alternate zone
  2. GatewayHA_Restore_afterRecovery
    1. triggered by an API call (Gateway ENI reattached to the instance, indicating that recovery has completed)
    2. identifies the VPC, the current and alternate zones, and the associated route tables and Gateway ENIs
    3. updates the current zone's route table to use the Gateway in its own zone

Note the selection of triggering events. The failover function is triggered by instance termination, which indicates that the ASG is starting to rebuild the gateway. Restoring the route table to use the gateway in its own zone is necessary in order to load balance outbound traffic across gateways, so the restore function updates the route table as soon as recovery completes (as indicated by the attachment of the ENI).
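
The route table update at the heart of both functions can be sketched as below. This is not the sample code referenced in the post: the `zones` mapping (zone name to route table ID and Gateway ENI) stands in for the discovery step, the function names are hypothetical, and `ec2` is expected to be a boto3 EC2 client.

```python
def set_default_route(ec2, route_table_id, eni_id, cidr="0.0.0.0/0"):
    """Point the default route of a route table at a Gateway ENI."""
    ec2.replace_route(
        RouteTableId=route_table_id,
        DestinationCidrBlock=cidr,
        NetworkInterfaceId=eni_id,
    )


def failover(ec2, zones, failed_zone):
    """Route the failed zone's traffic through a Gateway in another zone.

    `zones` maps a zone name to {"route_table_id": ..., "gateway_eni": ...}.
    Returns the name of the zone now carrying the traffic.
    """
    alternates = sorted(z for z in zones if z != failed_zone)
    if not alternates:
        raise ValueError("no alternate zone available for " + failed_zone)
    alternate = alternates[0]
    set_default_route(
        ec2,
        zones[failed_zone]["route_table_id"],
        zones[alternate]["gateway_eni"],
    )
    return alternate


def restore(ec2, zones, recovered_zone):
    """Switch the recovered zone back to its own Gateway."""
    set_default_route(
        ec2,
        zones[recovered_zone]["route_table_id"],
        zones[recovered_zone]["gateway_eni"],
    )
```

Both directions go through the same `replace_route` call, so failover and restore differ only in which ENI is chosen as the target.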

Test Results
When the HA model was applied to build a Squid transparent-mode Gateway for internet access, with a test instance performing continuous HTTP access tests, the enhanced HA method showed dramatically decreased downtime (from 15 minutes to almost unnoticeable). As observed in the route table, when the gateway in Zone A fails, the default route in Zone A’s private route table is updated to use the Gateway in Zone B. As soon as the gateway is rebuilt in Zone A, its private route table is updated to use Zone A’s gateway again. In this model, load balancing and dynamic failover are achieved with event-driven, intelligent response and full automation.

Sample code for both the basic and enhanced HA models can be found on GitHub:

Saturday, April 2, 2016

Why is Lambda function not invoked by CloudWatch Events



AWS Lambda has proven to be an increasingly versatile tool for cloud automation. It is pretty easy to get started since there are few dependencies involved, and what you can do with it is limited only by imagination. However, my initial troubleshooting efforts took some detours, and I hope to share some of what I learned here.

A common occurrence is a Lambda function not getting invoked as expected. In this typical push model, a CloudWatch event occurs that should trigger a Lambda invocation, but reviewing CloudWatch provides no indication that Lambda left any logs. At this point, the most useful clue is found in the Lambda “Monitoring” tab. Shown below is a clear indication that invocation occurred but failed.

So why would a Lambda function fail to get invoked? The Lambda permission model is often overlooked. Specifically, a Lambda function is associated with an execution role, and the execution role must grant the permissions that the function needs. Keeping access control outside the function gives a clean separation between permissions and code.

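For CloudWatch Events to invoke Lambda, it must be authorized to do so. Perhaps not so intuitively, this authorization is granted through function-level permissions, that is, a policy attached to the function itself that allows the CloudWatch Events service (events.amazonaws.com) to call lambda:InvokeFunction. The execution role, in turn, needs whatever permissions the function uses once it runs, including any CloudWatch Events permissions.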
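
Function-level permissions can be granted with Lambda's AddPermission API. The sketch below shows one way to do that for a CloudWatch Events rule; the helper name, the default statement ID, and passing the boto3 Lambda client as an argument are assumptions for illustration.

```python
def allow_events_rule(lambda_client, function_name, rule_arn,
                      statement_id="AllowCloudWatchEventsInvoke"):
    """Let one CloudWatch Events rule invoke the given Lambda function.

    `lambda_client` is expected to be boto3.client("lambda").
    """
    return lambda_client.add_permission(
        FunctionName=function_name,
        StatementId=statement_id,          # must be unique within the policy
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",  # the CloudWatch Events service
        SourceArn=rule_arn,                # restrict the grant to this rule
    )
```

Scoping the grant with `SourceArn` keeps other rules from invoking the function through the same statement.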

One way to fix this is to add the “CloudWatchEventsFullAccess” managed policy to the execution role assigned to the Lambda function. Other permissions are obviously needed depending on your specific Lambda function; these are useful references: