Sunday, May 18, 2014

AWS automation – CloudFormation bootstrapping early lessons – Part 3

The sample template illustrates a simple bootstrapping scenario: it creates a Windows helper instance, which is set up to execute a PowerShell script stored in an external repository (S3). You may develop a number of scripts to build, manage, and monitor the VPC and associated resources; this method can be used to deploy those capabilities systematically and automatically.

Everything is built around “WindowInstance”, which has these main components associated with bootstrapping:
  •  “UserData” section defines the launch-time bootstrapping (on Windows this is handled by the EC2Config service, the counterpart of cloud-init on Linux), which runs cfn-init
  •  “Metadata” section is defined for cfn-init. It copies the PowerShell script from S3 to a local directory and defines the command used to run it
  •  cfn-signal returns the status of the command execution to CloudFormation through a WaitConditionHandle

Note that the instance is defined with an IAM instance profile, which gives it the privileges needed to access the external data store and perform VPC operations.

This simple method works nicely for initial deployment. How do we manage ongoing changes? In this simple model, we pick up new configuration by launching a new instance. We can put the instance in an Auto Scaling group; when the existing instance is terminated, a new one spins up automatically and executes the updated configuration.

There is an alternative method to trigger updates without launching new instances. AWS designed cfn-hup to assist with updates: it polls the CloudFormation metadata for changes and executes defined actions when a change is detected. Now, instead of recreating the stack and launching a new instance, an update of the CloudFormation stack will kick off the configuration change on the running instance. Please see Peter Hancock’s “Updating your AWS bootstrap” for a nice explanation of the technique.
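
A minimal sketch of how cfn-hup could be wired into the instance metadata, following the pattern AWS documents for Windows stacks. The file locations, the hook name cfn-auto-reloader-hook, and the choice to simply re-run cfn-init on every update are assumptions for illustration; they are not part of the sample template in this post. The fragment would go under "AWS::CloudFormation::Init" for “WindowInstance”, merged with the existing "files" and "commands" entries.

```json
{
  "config" : {
    "files" : {
      "c:\\cfn\\cfn-hup.conf" : {
        "content" : { "Fn::Join" : ["", [
          "[main]\n",
          "stack=", { "Ref" : "AWS::StackId" }, "\n",
          "region=", { "Ref" : "AWS::Region" }, "\n"
        ]]}
      },
      "c:\\cfn\\hooks.d\\cfn-auto-reloader.conf" : {
        "content" : { "Fn::Join" : ["", [
          "[cfn-auto-reloader-hook]\n",
          "triggers=post.update\n",
          "path=Resources.WindowInstance.Metadata.AWS::CloudFormation::Init\n",
          "action=cfn-init.exe -v -s ", { "Ref" : "AWS::StackId" },
          " -r WindowInstance",
          " --region ", { "Ref" : "AWS::Region" }, "\n"
        ]]}
      }
    },
    "services" : {
      "windows" : {
        "cfn-hup" : {
          "enabled" : "true",
          "ensureRunning" : "true",
          "files" : [ "c:\\cfn\\cfn-hup.conf", "c:\\cfn\\hooks.d\\cfn-auto-reloader.conf" ]
        }
      }
    }
  }
}
```

With this in place, a stack update that changes the "AWS::CloudFormation::Init" metadata should cause the running instance to re-run cfn-init. cfn-hup checks for changes every 15 minutes by default; an "interval" entry (in minutes) in the [main] section shortens that.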

See sample template below:

{
  "AWSTemplateFormatVersion" : "2010-09-09",

  "Description" : "CF bootstrapping template sample: Windows instance running a PowerShell script obtained from S3. Note how the cfn-init command uses option switches to run PowerShell",

  "Parameters" : {
    "KeyPairName" : {
      "Description" : "Name of an existing Amazon EC2 key pair",
      "Type" : "String",
      "Default" : "xxx"
    },
    "WindowInstanceSubnet" : {
      "Description" : "Subnet ID to launch instance",
      "Type" : "String",
      "Default" : "subnet-xxx"
    },

    "WindowInstanceSGs" : {
      "Description" : "Comma-delimited list of Security Group IDs for instance",
      "Type" : "CommaDelimitedList",
      "Default" : "sg-xxx, sg-xxx"
    },
    "InstanceType" : {
      "Description" : "Amazon EC2 instance type",
      "Type" : "String",
      "Default" : "t1.micro",
      "AllowedValues" : [ "t1.micro", "m1.small", "m1.medium", "m1.large", "m1.xlarge", "m2.xlarge", "m2.2xlarge", "m2.4xlarge", "c1.medium", "c1.xlarge"]
    }
  },

  "Mappings" : {
    "AWSInstanceType2Arch" : {
      "t1.micro"   : { "Arch" : "64" },
      "m1.small"   : { "Arch" : "64" },
      "m1.medium"  : { "Arch" : "64" },
      "m1.large"   : { "Arch" : "64" },
      "m1.xlarge"  : { "Arch" : "64" },
      "m2.xlarge"  : { "Arch" : "64" },
      "m2.2xlarge" : { "Arch" : "64" },
      "m2.4xlarge" : { "Arch" : "64" },
      "c1.medium"  : { "Arch" : "64" },
      "c1.xlarge"  : { "Arch" : "64" }
    },
    "AWSRegionArch2AMI" : {
      "us-east-1"      : {"64" : "ami-dfcdc4b6"},
      "us-west-1"      : {"64" : "ami-c2cef187"},
      "us-west-2"      : {"64" : "ami-16197726"},
      "eu-west-1"      : {"64" : "ami-fde21e8a"},
      "ap-southeast-1" : {"64" : "ami-08f5a45a"},
      "ap-southeast-2" : {"64" : "ami-7377ee49"},
      "ap-northeast-1" : {"64" : "ami-514e3e50"},
      "sa-east-1"      : {"64" : "ami-35319228"}
    }
  },

  "Resources" : {
    "InstanceRole":{
      "Type":"AWS::IAM::Role",
        "Properties" : {
          "AssumeRolePolicyDocument" : {
            "Statement": [{
              "Effect" : "Allow",
              "Principal" : {
                "Service" : [ "ec2.amazonaws.com" ]
              },
              "Action" : [ "sts:AssumeRole" ]
            }]
          },
          "Path" : "/"
        }
    },
      
    "RolePolicies" : {
      "Type" : "AWS::IAM::Policy",
      "Properties" : {
        "PolicyName" : "VPCupdate",
        "PolicyDocument" : {
          "Statement" : [
            {
              "Action" : [ "ec2:*" ],
              "Effect" : "Allow",
              "Resource" : "*"
            },
            {
              "Action" : [ "s3:*" ],
              "Effect" : "Allow",
              "Resource" : "*"
            }
          ]
        },
        "Roles" : [ { "Ref" : "InstanceRole" } ]
      }
    },
      
    "InstanceProfile" : {
      "Type":"AWS::IAM::InstanceProfile",
      "Properties" : {
        "Path" : "/",
        "Roles" : [ { "Ref":"InstanceRole" } ]
      }
    },

    "WindowInstance" : {
      "Type" : "AWS::EC2::Instance",
      "Metadata" : {
        "AWS::CloudFormation::Init" : {
          "config" : {
            "files" : {
              "C:\\cfn\\yourscript.ps1" : {
                "source" : "https://s3.amazonaws.com/your-cfn-repo/yourscript.ps1"
              }
            },
            "commands" : {
              "1-update" : {
                "command" : "powershell.exe -ExecutionPolicy Bypass -NoLogo -NonInteractive -NoProfile -File C:\\cfn\\yourscript.ps1"
              }
            }
            
          }
        }
      },
      "Properties": {
        "InstanceType" : { "Ref" : "InstanceType" },
        "ImageId" : { "Fn::FindInMap" : [ "AWSRegionArch2AMI", { "Ref" : "AWS::Region" },
                      { "Fn::FindInMap" : [ "AWSInstanceType2Arch", { "Ref" : "InstanceType" }, "Arch" ] } ] },
        "Tags" : [
          {
            "Key" : "Name",
            "Value" : "WindowInstance"
          }
        ],
        "IamInstanceProfile" : { "Ref" : "InstanceProfile" },
        "SubnetId" : { "Ref" : "WindowInstanceSubnet" },
        "SecurityGroupIds" : { "Ref" : "WindowInstanceSGs" },
        "KeyName" : { "Ref" : "KeyPairName" },
        "UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
                ""
          ]]}}
        }
    },

    "WindowInstanceWaitHandle" : {
      "Type" : "AWS::CloudFormation::WaitConditionHandle"
    },

    "WindowInstanceWaitCondition" : {
      "Type" : "AWS::CloudFormation::WaitCondition",
      "DependsOn" : "WindowInstance",
      "Properties" : {
        "Handle" : {"Ref" : "WindowInstanceWaitHandle"},
        "Timeout" : "500"
      }
    }
  },

  "Outputs" : {
...
  }
}
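
The "UserData" value in the template above is an empty string. A minimal sketch of what it could contain, following the pattern AWS documents for Windows stacks: a <script> block that runs cfn-init against the “WindowInstance” metadata and then passes the exit code to cfn-signal, together with the Base64-encoded “WindowInstanceWaitHandle” URL. Treat the exact switches as an assumption to verify against the helper scripts installed on your AMI.

```json
{
  "UserData" : { "Fn::Base64" : { "Fn::Join" : ["", [
    "<script>\n",
    "cfn-init.exe -v -s ", { "Ref" : "AWS::StackId" },
    " -r WindowInstance",
    " --region ", { "Ref" : "AWS::Region" }, "\n",
    "cfn-signal.exe -e %ERRORLEVEL% ", { "Fn::Base64" : { "Ref" : "WindowInstanceWaitHandle" } }, "\n",
    "</script>"
  ]]}}
}
```

Because cfn-signal receives %ERRORLEVEL% from the cfn-init run, a failed command in the metadata is reported to the wait condition as a failure rather than leaving the stack to wait for the timeout.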

Saturday, May 10, 2014

AWS automation – CloudFormation bootstrapping early lessons – Part 2

I shared some lessons from the initial learning curve towards more sophisticated CloudFormation capabilities in part 1 of this post. While it is easy to get started mimicking an existing design, it takes more in-depth understanding of bootstrapping in order to design to your specific target behavior and to troubleshoot more effectively.

Build Incrementally
It may be tempting to develop the full template and scripts all at once, and test the full feature set against the target design. If you are lucky, everything works the first time. However, because many components are involved, some troubleshooting is usually needed. At that point, running the whole system every time you change a snippet is more time-consuming, and often counterproductive to isolating the root cause.

In other words, break the solution down into logical components and build incrementally. Start testing and troubleshooting early, at the component level. Once the components have been tested, it will be a lot easier to assemble a complete system successfully.

Logically, an incremental build may flow like this:
  • Develop and validate a basic template that creates target resources
  • Verify that the template launches target instance(s) and/or auto-scaling groups, ELBs, etc.
  • Instance installs specified software and packages successfully
  • Instance can access external data store (such as S3) and create local file structure per design
  • Instance can run the specified command/script/code
  • The specified command/script/code performs the desired function
  • CloudFormation receives signal upon completion

Think Modular
An incremental approach also encourages the development of reusable code. For example, you may find it beneficial to capture a specific feature in a utility template that has been tested and proven. In the future, the template for a new app can call it as a nested template, passing in parameters.
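
As a sketch of the idea, a parent template can reference a tested utility template through an AWS::CloudFormation::Stack resource. The resource name HelperStack, the template URL, and the parameter names are hypothetical; the "Ref" values assume the parent template declares parameters with the same names.

```json
{
  "HelperStack" : {
    "Type" : "AWS::CloudFormation::Stack",
    "Properties" : {
      "TemplateURL" : "https://s3.amazonaws.com/your-cfn-repo/helper-instance.template",
      "Parameters" : {
        "KeyPairName" : { "Ref" : "KeyPairName" },
        "WindowInstanceSubnet" : { "Ref" : "WindowInstanceSubnet" },
        "InstanceType" : "t1.micro"
      }
    }
  }
}
```

This entry would sit under "Resources" in the parent template. The keys under "Parameters" must match parameter names declared in the nested template.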

Disable Rollback
By default, CloudFormation rolls back if an error occurs during stack creation. For troubleshooting, it is often not enough to look at the CloudFormation event log; you also need to preserve the failed instances in order to collect more detailed clues. Therefore, it is essential to set DisableRollback to true (or, if creating the stack from the console, expand “advanced option” and deselect the default rollback option).
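
DisableRollback is a parameter of the CloudFormation CreateStack call. A minimal sketch of a request body, assuming an AWS CLI version that accepts --cli-input-json; the stack name and template URL are placeholders. CAPABILITY_IAM is included because a template that creates IAM roles and instance profiles has to be acknowledged explicitly.

```json
{
  "StackName" : "bootstrap-test",
  "TemplateURL" : "https://s3.amazonaws.com/your-cfn-repo/bootstrap.template",
  "DisableRollback" : true,
  "Capabilities" : [ "CAPABILITY_IAM" ]
}
```

Saved as create-stack.json, this could be submitted with "aws cloudformation create-stack --cli-input-json file://create-stack.json". Passing --disable-rollback on the create-stack command line has the same effect.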

After you have examined failed instances, you can manually delete the stack which will clean up the unwanted instances. You can then modify code and repeat the stack creation process.

Troubleshoot on the instance
If things don’t work as expected, the most specific and definitive information is always on the instance itself. Log on to the instance with your credentials.

Check the instance logs, for example the cfn-init log: /var/log/cfn-init.log on Linux, C:\cfn\log\cfn-init.log on Windows.

Take out the guessing
While your final product should be concise and elegant, you should feel free to generate additional information and output to help pinpoint the issue during development and troubleshooting. Why not make it obvious and easy for yourself?

You can apply any development technique here. For example, insert lines into your script or code to print to a log file. I also find it more efficient to test the script directly on the instance, which often reveals issues without going through the lengthy steps of deleting and recreating stacks every time you make a change. Because the instance is already in the target VPC, you can use the command line directly to simulate the bootstrapping process.

Tune timeout
A WaitCondition is how CloudFormation receives the signal back. If you experience a long delay before the WaitCondition reports failure, check its Timeout value, which is specified in seconds. A typical bootstrapping operation takes no more than 5 minutes, so there is no point waiting much longer. By decreasing the timeout to less than 10 minutes (600 seconds), you will save a lot of time and frustration.

Watch external dependencies
Often, a script that runs well locally does not work during bootstrapping. Consider the external conditions the instance relies on, and treat them as prerequisites for bootstrapping to run successfully:
  • Internet access from VPC
  • Security groups and policies applied to the instance
  • Instance role and access privilege
  • DNS
  • External data store access protection 

The more sophisticated automation capabilities become, the more components are involved in a complete sequence of events. Later, one process may pass variables to another. There will be more error handling, more nested templates, parameters, code, conditions, and so on. But every journey starts somewhere, and the lessons learned from bootstrapping provide a good first step.

Sunday, May 4, 2014

AWS automation – CloudFormation bootstrapping early lessons – Part 1

Why bootstrapping
One of the empowering and disruptive characteristics of a public cloud service such as AWS is that it levels the playing field. A startup has access to the same set of features and potential capabilities, at the same cost level, as a Fortune 100 company. However, what each company does with the cloud service will largely depend on its business objectives, cloud architecture maturity, and ability to innovate. The approach and outcome will likely differ dramatically.

Developing a DevOps framework and integrating it seamlessly with AWS through automation is probably the biggest competitive differentiator among adopters. If you have been working with AWS, you have probably noticed the role CloudFormation plays in automation, as a deployment launch pad and often as the glue that integrates other tools such as scripts, code snippets, and external automation platforms like Chef and Puppet.

CloudFormation supports bootstrapping of instances, which is an essential building block for launching various services and applications. While there are several methods documented by AWS and a growing number of examples shared in the public domain, the initial learning curve still involves quite a bit of research and experimentation. Even in a relatively simple scenario, all the pieces need to fall into place for things to work. A thorough understanding of what happens at each step of the execution is fundamental to developing more sophisticated automation capabilities.

Logical Steps
The example used here is to have CloudFormation launch a Windows instance bootstrapped with a PowerShell script loaded from S3. Seems simple? The logical sequence actually involves quite a few steps behind the scenes:
  1. CloudFormation creates stack and launches instance
  2. CloudFormation passes metadata to the launched instance, which is instructed to get a script from S3 and place it in the defined local directory
  3. Using cfn-init, the instance invokes the command defined in the metadata, in this case executing a PowerShell script
  4. Upon completion of the script, the instance runs cfn-signal to return status back to CloudFormation
  5. Finally, CloudFormation determines whether the operation was successful (if not, it will roll back and terminate the instance)

During the development stage, things may not work all at once. In my experience, that turned out to be a good thing, since I learned a lot more from things not working. It is often necessary to follow the execution sequence step by step and pinpoint the deviation from the desired outcome at each step.

Invoking PowerShell – an example
With various components at play, some symptoms may not be obvious. It is critical to have an idea of where the failure occurs, so that you look for clues and investigate in the right places, pinpoint the root cause, apply a fix, and progress incrementally.

Below is an example of what happens when the instance launches. The script fails to generate the expected outcome. Logging on to the instance to examine the cfn-init log file shows:
2014-04-12 06:26:04,578 [DEBUG] Running command 3-update
2014-04-12 06:26:04,578 [DEBUG] No test for command 3-update

Note that the command attempts to run the script, but the log does not show “succeeded” following execution. Although it provides no further information, we can guess that something went wrong when the command was invoked. To collect further evidence, have the script write output to a log file. We saw nothing recorded, indicating that the script did not run successfully. With that knowledge, we test-ran the script directly from a command line on the instance and noticed the need to set the execution policy. Running a PowerShell script during bootstrapping requires the proper option settings, which can be defined in the cfn-init command:

"command" : "powershell.exe -ExecutionPolicy Bypass -NoLogo -NonInteractive -NoProfile -File C:\\cfn\\yourscript.ps1"

After recreating the CloudFormation stack with the updated template, a fresh instance starts and completes the script successfully. We can then go on to check the outcome of the script itself.

2014-04-12 15:36:25,568 [DEBUG] Running command 1-update
2014-04-12 15:36:25,568 [DEBUG] No test for command 1-update
2014-04-12 15:36:51,042 [INFO] Command 1-update succeeded
2014-04-12 15:36:51,042 [DEBUG] Command 1-update output:
2014-04-12 15:36:51,073 [INFO] Waiting 60 seconds for reboot
2014-04-12 15:37:51,102 [INFO] ConfigSets completed
2014-04-12 15:37:51,118 [DEBUG] Deleting Scheduled Task for cfn-init resume
2014-04-12 15:37:51,414 [DEBUG] Scheduled Task deleted
2014-04-12 15:37:51,664 [INFO] Starting new HTTP connection (1): 169.254.169.254
2014-04-12 15:37:51,680 [INFO] Starting new HTTPS connection (1): cloudformation-waitcondition-us-east-1.s3.amazonaws.com
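
Two lines in this log correspond to optional keys on a cfn-init command. “No test for command” means no "test" key was given (when present, the command runs only if the test passes), and “Waiting 60 seconds for reboot” reflects the default value of "waitAfterCompletion" on Windows. A sketch, assuming the script never reboots the instance so the wait can be skipped:

```json
{
  "commands" : {
    "1-update" : {
      "command" : "powershell.exe -ExecutionPolicy Bypass -NoLogo -NonInteractive -NoProfile -File C:\\cfn\\yourscript.ps1",
      "waitAfterCompletion" : "0"
    }
  }
}
```

If the script can trigger a reboot, keep the default, or use a value suited to the reboot, so that cfn-init does not move on to the next step mid-restart.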


I will summarize additional development and troubleshooting lessons and share a complete working example in a follow-up post.