[AWS] Systems Manager

Nikhil Narayanan
Nov 8, 2020

Introduction

The intention of this document is to introduce AWS Systems Manager, walk briefly through its different capabilities, and show how capable it is at maintaining AWS infrastructure.

AWS Systems Manager is an AWS service that we can use to view and control our infrastructure on AWS. AWS Systems Manager allows us to centralize operational data from multiple AWS services and automate tasks across our AWS resources. We can create logical groups of resources such as applications, different layers of an application stack, or production versus development environments. With Systems Manager, we can select a resource group and view its recent API activity, resource configuration changes, related notifications, operational alerts, software inventory, and patch compliance status. We can also take action on each resource group depending on our operational needs. Systems Manager provides a central place to view and manage our AWS resources, so we can have complete visibility and control over our operations.

The capabilities of Systems Manager are grouped into five categories:

  1. Operations Management
  2. Application Management
  3. Actions & Changes
  4. Instances & Nodes
  5. Shared Resources

Prerequisite

AWS Systems Manager has three prerequisites that we need to know about:

  1. A non-admin AWS (IAM) user for working with Systems Manager
  2. SSM Agent installed on EC2 instances and on on-premises machines
  3. An IAM instance profile for EC2 instances, so that SSM Agent has permission to communicate with Systems Manager

AWS Systems Manager Agent (SSM Agent) is Amazon software that can be installed and configured on an EC2 instance, an on-premises server, or a virtual machine (VM). SSM Agent makes it possible for Systems Manager to update, manage, and configure these resources. The agent processes requests from the Systems Manager service in the AWS Cloud and runs them as specified in the request. SSM Agent then sends status and execution information back to the Systems Manager service by using the Amazon Message Delivery Service (service prefix: ec2messages).

Once these prerequisites are met, the instance starts showing up in the Managed Instances section of Systems Manager. A managed instance is a machine that has been configured for use with Systems Manager, and Systems Manager helps us configure and maintain it. Supported machine types include EC2 instances, on-premises servers, and virtual machines (VMs), including VMs in other cloud environments. Supported operating system types include Windows Server, multiple distributions of Linux, and Raspbian.

Systems Manager Capabilities

Now let's look briefly at the capabilities of Systems Manager, one by one.

01. Run Command

We can use Systems Manager Run Command to remotely and securely manage the configuration of our managed instances at scale. Run Command can perform on-demand changes like updating applications or running Linux shell scripts and Windows PowerShell commands on a target set of dozens or hundreds of instances.

Run Command improves our security because we don't need to open inbound SSH ports to administer EC2 instances. In addition, we can control which actions can be performed on which instances, and we can keep an audit log of who performed what action and when. Run Command is accessible from the AWS CLI, the AWS Management Console, and the AWS SDKs.
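To make the SDK side concrete, here is a minimal sketch assuming Python and boto3 (neither is used elsewhere in this article). It targets instances by tag rather than by instance ID and adds rate controls so a command never hits the whole fleet at once. The tag `Env=dev`, the commands, and the helper name `build_run_command_request` are hypothetical; the function only builds the keyword arguments for the SSM client's `send_command()`, so it runs without credentials, and the real call is left commented out.

```python
"""Sketch: build a Run Command request that targets instances by tag.

Assumptions: the tag key/value, commands, and helper name are hypothetical.
The returned dict mirrors the keyword arguments of boto3's
ssm.send_command(); sending it for real requires AWS credentials.
"""


def build_run_command_request(tag_key, tag_value, commands,
                              max_concurrency="10%", max_errors="1"):
    """Return send_command() kwargs that run shell commands on tagged instances."""
    if not commands:
        raise ValueError("at least one command is required")
    return {
        # AWS-managed Command document that runs shell commands on Linux
        "DocumentName": "AWS-RunShellScript",
        # Select instances by tag instead of listing instance IDs
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        # Document parameters are passed as lists of strings
        "Parameters": {"commands": list(commands)},
        # Rate controls: how many instances run the command at once, and
        # how many errors are tolerated before no more instances are sent it
        "MaxConcurrency": max_concurrency,
        "MaxErrors": max_errors,
        "Comment": "Ad hoc health check",
    }


if __name__ == "__main__":
    request = build_run_command_request("Env", "dev", ["uptime", "df -h"])
    print(request)
    # With credentials configured, the request could be sent like this:
    # import boto3
    # response = boto3.client("ssm").send_command(**request)
    # print(response["Command"]["CommandId"])
```

Targeting by tag keeps the request stable as instances come and go, which is the same targeting model State Manager, Patch Manager, and Maintenance Windows use.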

Behind the scenes, Run Command uses a Systems Manager document (SSM document) to perform actions on our fleet of instances. A document defines the configuration to apply to our systems; essentially, it is a series of steps executed in sequence. We can also pass runtime parameters when running a particular document.
There are two types of documents: predefined (AWS-managed) documents and custom documents. Predefined documents let us run commands, configure CloudWatch, configure Docker, and run arbitrary scripts. Documents can be shared across accounts.
Custom documents are the ones we create and manage ourselves. A document can be written in JSON or YAML.
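As an illustration of a custom document, the sketch below builds a Command document (schema version 2.2) with one runtime parameter and a single `aws:runShellScript` step, then serializes it to JSON. The document name `Example-PrintDiskUsage`, the `Message` parameter, and the commands are hypothetical; registering it with boto3's `create_document()` is shown commented out because it needs AWS credentials.

```python
import json

# Hypothetical custom Command document (schema 2.2) with one runtime parameter.
document = {
    "schemaVersion": "2.2",
    "description": "Print a message and the disk usage on Linux instances.",
    "parameters": {
        "Message": {
            "type": "String",
            "description": "Text to print before the disk report.",
            "default": "Hello from Run Command",
        }
    },
    "mainSteps": [
        {
            "action": "aws:runShellScript",
            "name": "printDiskUsage",
            # Run this step only on Linux managed instances
            "precondition": {"StringEquals": ["platformType", "Linux"]},
            "inputs": {
                # {{ Message }} is substituted with the runtime parameter value
                "runCommand": ["echo '{{ Message }}'", "df -h"]
            },
        }
    ],
}

# Steps run in the order listed in mainSteps
content = json.dumps(document, indent=2)

if __name__ == "__main__":
    print(content)
    # Registering the document (requires AWS credentials):
    # import boto3
    # boto3.client("ssm").create_document(
    #     Content=content, Name="Example-PrintDiskUsage",
    #     DocumentType="Command", DocumentFormat="JSON")
```

The same structure could be written in YAML with `DocumentFormat="YAML"`; JSON is used here only because Python can emit it without extra libraries.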

General Use Cases:

  1. Monitoring our systems: system checks, gathering memory and disk usage
  2. Joining instances to a domain
  3. On-demand patching: for example, patching instances to remove a vulnerability
  4. Deploying code to instances
  5. Process management: starting or stopping a service on one instance or a set of instances
  6. Running bootstrap scripts on application instances
  7. User and account management
  8. Managing Spark and Hadoop jobs on Amazon EMR, and many more; the use cases are countless

02. State Manager

We can use Systems Manager State Manager to automate the process of keeping our managed instances in a defined state. We can use State Manager to ensure that our instances are bootstrapped with specific software at startup, joined to a Windows domain or patched with specific software updates. We can say that State Manager helps to define and maintain consistent configuration of operating systems and applications.

Based on a schedule that we define, State Manager automatically reviews our fleet and compares it against the specified configuration policy. If an instance's configuration changes and no longer matches the desired state, State Manager reapplies the policy to bring it back to that state, which minimizes configuration drift.

How does it work?
To get started with State Manager, determine the state that we want to apply to our managed instances. The state that we want to apply will determine which SSM document we can use to create a State Manager association. A State Manager association is a configuration that is assigned to our managed instances. This configuration defines the state that we want to maintain on our instances. State Manager uses SSM documents to create an association.

After an association is created in State Manager, its overall status is Success only if every instance targeted by the association successfully applied the configuration. If even a single instance fails to apply it, the overall status is Failed.
This is how State Manager helps us maintain a consistent state for our applications.
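A minimal sketch of both halves of this, assuming Python and boto3: a request for `create_association()` that keeps SSM Agent up to date on tagged instances every 30 minutes, and a small function that models the status rule described above. The tag values and both helper names are hypothetical, and `overall_status` is only a local model of the documented rule (its "Pending" branch for unfinished instances is our own assumption), not a Systems Manager API.

```python
def build_association_request(document_name, tag_key, tag_value,
                              schedule="rate(30 minutes)"):
    """Return kwargs for boto3's ssm.create_association()."""
    return {
        "Name": document_name,
        "AssociationName": f"{document_name}-{tag_value}",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        # State Manager re-applies the document to the targets on this schedule
        "ScheduleExpression": schedule,
    }


def overall_status(instance_statuses):
    """Local model of the rule: one failed instance makes the association Failed."""
    statuses = list(instance_statuses)
    if any(status == "Failed" for status in statuses):
        return "Failed"
    if statuses and all(status == "Success" for status in statuses):
        return "Success"
    # Assumption: anything else (e.g. instances still running) counts as pending
    return "Pending"


if __name__ == "__main__":
    print(build_association_request("AWS-UpdateSSMAgent", "Env", "prod"))
    print(overall_status(["Success", "Success", "Failed"]))
```

`AWS-UpdateSSMAgent` is an AWS-managed document; any other Command or Automation document could be associated the same way.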

03. Parameter Store

Parameter Store provides secure, hierarchical storage for configuration data and for managing secrets. We can store data such as passwords, database connection strings, EC2 instance IDs, Amazon Machine Image (AMI) IDs, and license codes as parameter values. We can store values as plain text or encrypted data, and then reference them by the unique name we specified when we created the parameter.

A parameter can be one of the following types:

  1. String
  2. StringList
  3. SecureString

Parameter Store can be accessed programmatically from other AWS services such as AWS CloudFormation and AWS Lambda, as well as through the AWS CLI and the AWS Management Console.
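The hierarchy pays off when an application loads all of its configuration in one call. The sketch below, assuming Python with boto3, reads every parameter under a path with `get_parameters_by_path` (following `NextToken` pages), decrypts `SecureString` values, and splits `StringList` values on commas. The path `/myapp/prod`, the parameter names, and `FakeSSM` are hypothetical; `FakeSSM` stands in for `boto3.client("ssm")` so the example runs offline.

```python
def load_parameters(ssm_client, path):
    """Read every parameter under `path` into a dict keyed by relative name."""
    prefix = path.rstrip("/") + "/"
    result = {}
    kwargs = {"Path": path, "Recursive": True, "WithDecryption": True}
    while True:
        page = ssm_client.get_parameters_by_path(**kwargs)
        for param in page["Parameters"]:
            key = param["Name"][len(prefix):]
            value = param["Value"]
            if param["Type"] == "StringList":
                value = value.split(",")  # StringList values are comma-separated
            result[key] = value
        token = page.get("NextToken")
        if not token:
            return result
        kwargs["NextToken"] = token  # fetch the next page of results


class FakeSSM:
    """Offline stand-in for boto3.client("ssm") that serves two result pages."""

    def get_parameters_by_path(self, **kwargs):
        if kwargs.get("NextToken") == "t1":
            return {"Parameters": [
                {"Name": "/myapp/prod/db/password", "Type": "SecureString",
                 "Value": "s3cret"},
                {"Name": "/myapp/prod/allowed_ips", "Type": "StringList",
                 "Value": "10.0.0.1,10.0.0.2"},
            ]}
        return {"Parameters": [
            {"Name": "/myapp/prod/db/host", "Type": "String",
             "Value": "db.internal"},
        ], "NextToken": "t1"}


if __name__ == "__main__":
    print(load_parameters(FakeSSM(), "/myapp/prod"))
    # Against AWS: load_parameters(boto3.client("ssm"), "/myapp/prod")
```

boto3 also offers a paginator for this operation; the explicit loop is used here only to show how `NextToken` works.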

We can use Amazon EventBridge and Amazon Simple Notification Service (Amazon SNS) to get notified about changes to Systems Manager parameters. For example, we can run an AWS Lambda function to recreate a parameter automatically when it expires or is deleted, or set up a notification that triggers a Lambda function when our database password is updated.
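Here is a sketch of the password-rotation example, assuming Python and the event shape AWS documents for Parameter Store change events (`source` `aws.ssm`, detail-type `Parameter Store Change`, and `name`/`operation` fields in `detail`). The parameter name and the `handler` logic are hypothetical; the pattern would be passed as a JSON string to the EventBridge `put_rule(EventPattern=...)` call, with the Lambda function added as the rule's target.

```python
import json

# EventBridge rule pattern: match updates to one (hypothetical) parameter.
EVENT_PATTERN = {
    "source": ["aws.ssm"],
    "detail-type": ["Parameter Store Change"],
    "detail": {
        "name": ["/myapp/prod/db/password"],
        "operation": ["Update"],
    },
}


def handler(event, context):
    """Hypothetical Lambda target that reacts when the watched parameter changes."""
    detail = event.get("detail", {})
    if detail.get("operation") != "Update":
        return {"action": "ignored"}
    # A real function might refresh a connection pool or restart a service here.
    return {"action": "refresh", "parameter": detail.get("name")}


if __name__ == "__main__":
    # Value for boto3.client("events").put_rule(Name=..., EventPattern=...)
    print(json.dumps(EVENT_PATTERN))
```

The handler double-checks the operation even though the rule already filters on it, so the same function stays safe if it is later attached to a broader rule.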

Parameter Tier
Systems Manager provides two parameter tiers, Standard and Advanced, plus an Intelligent-Tiering option that lets Parameter Store choose the tier for each parameter automatically. The tiers differ in how many parameters we can store per account per Region and in the maximum parameter size.
The Standard tier allows up to 10,000 parameters with a maximum value size of 4 KB, at no additional charge. The Advanced tier allows up to 100,000 parameters with a maximum value size of 8 KB, but advanced parameters incur additional cost. Also, parameter policies (such as expiration) are not available in the Standard tier.
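The size and policy limits above can be turned into a small decision rule. The sketch below, assuming Python and boto3's `put_parameter()` arguments, picks Standard unless the value exceeds 4 KB or a policy is attached, and builds an `Expiration` parameter policy (which needs the Advanced tier). `choose_tier` and `build_put_parameter` are hypothetical helpers, a hand-rolled version of what `Tier="Intelligent-Tiering"` would decide on the service side; the parameter name is illustrative.

```python
import json
from datetime import datetime, timezone

STANDARD_MAX_BYTES = 4 * 1024  # Standard tier value limit (4 KB)
ADVANCED_MAX_BYTES = 8 * 1024  # Advanced tier value limit (8 KB)


def choose_tier(value, needs_policies=False):
    """Pick the cheapest tier that can hold `value` (our rule, not AWS's code)."""
    size = len(value.encode("utf-8"))
    if size > ADVANCED_MAX_BYTES:
        raise ValueError(f"value is {size} bytes; the Advanced tier limit is 8 KB")
    if size > STANDARD_MAX_BYTES or needs_policies:
        return "Advanced"
    return "Standard"


def build_put_parameter(name, value, expires_at=None):
    """Return kwargs for ssm.put_parameter(); `expires_at` must be timezone-aware."""
    kwargs = {
        "Name": name,
        "Value": value,
        "Type": "SecureString",
        "Tier": choose_tier(value, needs_policies=expires_at is not None),
    }
    if expires_at is not None:
        stamp = expires_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
        policy = {"Type": "Expiration", "Version": "1.0",
                  "Attributes": {"Timestamp": stamp}}
        kwargs["Policies"] = json.dumps([policy])  # Policies is a JSON array string
    return kwargs


if __name__ == "__main__":
    expiry = datetime(2021, 1, 1, tzinfo=timezone.utc)
    print(build_put_parameter("/myapp/prod/api-key", "abc123", expiry))
```

Keeping small, policy-free values in the Standard tier avoids the Advanced tier charge while still allowing expiration where it matters.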

04. Inventory

Inventory automates the process of collecting software inventory from managed instances. We can use Inventory to gather metadata about applications, files, components, patches, and more on our managed instances. Systems Manager Inventory collects only metadata from our managed instances. Inventory does not access proprietary information or data.

We can choose to inventory all instances in our AWS account, individually select instances, or target groups of instances by using Amazon EC2 tags. Also we have the option to specify a collection interval in terms of minutes, hours, and days. The shortest collection interval is every 30 minutes.

Here is some of the metadata that Inventory collects:

  • Applications: Application names, publishers, versions, etc.
  • AWS components: EC2 driver, agents, versions, etc.
  • Files: Name, size, version, installed date, modification and last accessed times, etc.
  • Network configuration: IP address, MAC address, DNS, gateway, subnet mask, etc.
  • Windows updates: Hotfix ID, installed by, installed date, etc.
  • Instance details: System name, operating systems (OS) name, OS version, last boot, DNS, domain, work group, OS architecture, etc.
  • Services: Name, display name, status, dependent services, service type, start type, etc.
  • Tags: Tags assigned to your instances.
  • Windows Registry: Registry key path, value name, value type, and value.
  • Windows roles: Name, display name, path, feature type, installed state, etc.

We can use the Resource Data Sync option in Inventory to aggregate inventory data from our managed instances into a single Amazon S3 bucket, which gives us a view across our environment. Inventory also integrates with IAM for managing permissions and roles, and with AWS Config, which lets us apply compliance rules to our resources.

General Use Cases:

  1. Managing software licenses
  2. Enforcing compliance using AWS Config
  3. Aggregating inventory data with Resource Data Sync
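Inventory collection is itself a State Manager association with the AWS-managed `AWS-GatherSoftwareInventory` document. The sketch below, assuming Python and boto3 request shapes, builds that association for tagged instances (enforcing the 30-minute minimum interval mentioned above) plus a `create_resource_data_sync()` request that ships the data to S3. The parameter names are the ones we recall from that document and should be treated as assumptions to verify in the console; the tag, sync name, and bucket are hypothetical.

```python
def inventory_association(tag_key, tag_value, interval_minutes=30):
    """Return kwargs for ssm.create_association() that gather inventory."""
    if interval_minutes < 30:
        raise ValueError("Inventory cannot collect more often than every 30 minutes")
    return {
        "Name": "AWS-GatherSoftwareInventory",
        "AssociationName": f"inventory-{tag_value}",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "ScheduleExpression": f"rate({interval_minutes} minutes)",
        # Assumed parameter names; each category is switched on or off
        "Parameters": {
            "applications": ["Enabled"],
            "awsComponents": ["Enabled"],
            "networkConfig": ["Enabled"],
            "instanceDetailedInformation": ["Enabled"],
            "services": ["Enabled"],
            "windowsUpdates": ["Enabled"],
        },
    }


def resource_data_sync(bucket, region, sync_name="inventory-to-s3"):
    """Return kwargs for ssm.create_resource_data_sync() targeting S3."""
    return {
        "SyncName": sync_name,
        "S3Destination": {
            "BucketName": bucket,
            "Region": region,
            # Newline-delimited JSON that tools such as Amazon Athena can query
            "SyncFormat": "JsonSerDe",
        },
    }


if __name__ == "__main__":
    print(inventory_association("Env", "prod", 60))
    print(resource_data_sync("my-inventory-bucket", "us-east-1"))
```

Once the data lands in S3, it can be queried with standard analytics tools to answer license and compliance questions across the fleet.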

05. Patch Manager

AWS Systems Manager Patch Manager automates the process of patching managed instances with both security related and minor application patches. Patch Manager doesn’t support upgrading major versions of operating systems. This capability enables you to scan instances for missing patches and apply missing patches individually or to large groups of instances by using EC2 instance tags.

Patch Manager uses patch baselines, which include rules for auto-approving patches within days of their release, as well as a list of approved and rejected patches. We can install patches on a regular basis by scheduling patching to run as a Systems Manager maintenance window task.

For Linux operating systems, we can define the repositories that should be used for patching operations as part of our patch baseline. This allows us to ensure that updates are installed only from trusted repositories regardless of what repositories are configured on the instance. For Linux, we also have the ability to update any package on the instance, not just those that are classified as operating system security updates.

How does it work?
By default, Patch Manager doesn't install all available patches, but rather a smaller set of patches focused on security. When patching, Patch Manager uses the repositories already configured on the instance. Through Patch Manager, we can also choose a different source repository for the instance, typically to install non-security updates. In either case, the instance must be able to connect to the repositories for patching to succeed.

We can specify alternative patch source repositories when we create a custom patch baseline. In each custom patch baseline, we can specify patch source configurations for up to 20 versions of a supported Linux operating system.

Amazon Linux and Amazon Linux 2 instances use Yum as the package manager, and Yum uses update notices, stored in a file named updateinfo.xml. An update notice is simply a collection of packages that fix specific problems. Patch Manager considers all packages in an update notice to be Security updates. Individual packages are not assigned classifications or severity levels, so Patch Manager assigns the attributes of the update notice to the related packages.

While creating a custom baseline, we can choose how many days after a patch's release it should be installed on our fleet of servers. For example, if we have tagged our fleet by stage (production and gamma), we can configure the gamma baseline to install all patches as soon as they are available from the vendor. The steps in the console are:

  1. Create a custom patch baseline
  2. Select the custom patch baseline and modify its patch groups
  3. Select the custom patch baseline and choose Configure patching
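The per-stage idea can be sketched as two baselines that differ only in `ApproveAfterDays`. This assumes Python, boto3's `create_patch_baseline()` request shape, and Amazon Linux 2 instances; the baseline names, the stage-to-days mapping, and the helpers are hypothetical. `approval_date` simply shows the arithmetic: a matching patch becomes eligible for installation `ApproveAfterDays` days after its release.

```python
from datetime import date, timedelta


def build_patch_baseline(stage, approve_after_days):
    """Return kwargs for ssm.create_patch_baseline() (Amazon Linux 2)."""
    return {
        "Name": f"al2-{stage}-baseline",
        "OperatingSystem": "AMAZON_LINUX_2",
        "ApprovalRules": {
            "PatchRules": [{
                "PatchFilterGroup": {"PatchFilters": [
                    {"Key": "CLASSIFICATION", "Values": ["Security"]},
                    {"Key": "SEVERITY", "Values": ["Critical", "Important"]},
                ]},
                # Auto-approve matching patches this many days after release
                "ApproveAfterDays": approve_after_days,
            }]
        },
    }


def approval_date(release_date, approve_after_days):
    """Date from which an auto-approved patch is eligible for installation."""
    return release_date + timedelta(days=approve_after_days)


# Hypothetical policy: gamma patches immediately, production waits a week.
BASELINES = {
    "gamma": build_patch_baseline("gamma", 0),
    "production": build_patch_baseline("production", 7),
}

if __name__ == "__main__":
    print(approval_date(date(2020, 11, 1), 7))
    # After creating each baseline, attach it to the matching patch group:
    # ssm.register_patch_baseline_for_patch_group(
    #     BaselineId=baseline_id, PatchGroup="production")
```

Instances join a patch group through a `Patch Group` tag whose value matches the name registered with the baseline.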

Behind the scenes, Patch Manager uses Run Command (with the AWS-RunPatchBaseline document) to perform the patch installation. We can extend Patch Manager with other Systems Manager capabilities, such as Automation and Explorer, to help patch across multiple accounts.

Patch Manager integrates with AWS Identity and Access Management (IAM), AWS CloudTrail, and Amazon EventBridge to provide a secure patching experience that includes event notifications and the ability to audit usage.

06. Maintenance Window

We can use Maintenance Windows to set up recurring schedules for managed instances to run administrative tasks like installing patches and updates without interrupting business-critical operations. Maintenance Windows also lets you schedule actions on numerous other AWS resource types, such as S3 buckets, SQS queues, AWS KMS keys, and many more.

How does it work?
First we need to define a schedule: a recurring time window during which potentially disruptive actions are allowed to take place. Next, we define the duration, that is, how long the maintenance window should last. Then we register our targets, which determine how the window selects our fleet of resources, for example by tags or resource groups.
Once the targets are defined, we define the tasks to run during the window. A task can run a command, an Automation document, a Lambda function, or a Step Functions state machine.
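The three steps map onto three SSM API calls. The sketch below, assuming Python and boto3, creates a weekly window, registers tagged instances as its target, and registers an `AWS-RunPatchBaseline` Run Command task against that target. The names, schedule, and tag are hypothetical; `RecordingSSM` is an offline stand-in for `boto3.client("ssm")` that returns fake IDs so the call order can be checked without AWS.

```python
def create_patch_window(ssm, tag_key, tag_value):
    """Create a window, register a tag target, and add a patching task."""
    window_id = ssm.create_maintenance_window(
        Name=f"patch-{tag_value}",
        Schedule="cron(0 2 ? * SUN *)",  # 02:00 every Sunday (UTC by default)
        Duration=3,  # the window stays open for 3 hours
        Cutoff=1,  # stop starting new tasks 1 hour before it closes
        AllowUnassociatedTargets=False,
    )["WindowId"]
    target_id = ssm.register_target_with_maintenance_window(
        WindowId=window_id,
        ResourceType="INSTANCE",
        Targets=[{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
    )["WindowTargetId"]
    task_id = ssm.register_task_with_maintenance_window(
        WindowId=window_id,
        TaskArn="AWS-RunPatchBaseline",
        TaskType="RUN_COMMAND",
        Targets=[{"Key": "WindowTargetIds", "Values": [target_id]}],
        TaskInvocationParameters={
            "RunCommand": {"Parameters": {"Operation": ["Install"]}}},
        MaxConcurrency="20%",
        MaxErrors="1",
        Priority=1,
    )["WindowTaskId"]
    return window_id, target_id, task_id


class RecordingSSM:
    """Offline stand-in for boto3.client("ssm") that records call order."""

    def __init__(self):
        self.calls = []

    def create_maintenance_window(self, **kwargs):
        self.calls.append("create_maintenance_window")
        return {"WindowId": "mw-0123456789abcdef0"}

    def register_target_with_maintenance_window(self, **kwargs):
        self.calls.append("register_target_with_maintenance_window")
        return {"WindowTargetId": "target-1"}

    def register_task_with_maintenance_window(self, **kwargs):
        self.calls.append("register_task_with_maintenance_window")
        return {"WindowTaskId": "task-1"}


if __name__ == "__main__":
    print(create_patch_window(RecordingSSM(), "Env", "prod"))
```

The `Cutoff` gives running patch tasks an hour of headroom before the window closes, which is what keeps patching out of business hours.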

General Use Cases:

  • Install or update applications.
  • Apply patches.
  • Install or update SSM Agent.
  • Run PowerShell commands and Linux shell scripts by using a Systems Manager Run Command task.
  • Build AMIs, boot-strap software, and configure instances by using a Systems Manager Automation task.
  • Run AWS Lambda functions that trigger additional actions, such as scanning your instances for patch updates.
  • Run AWS Step Functions state machines to perform tasks such as removing an instance from an Elastic Load Balancing environment, patching the instance, and then adding the instance back to the Elastic Load Balancing environment.
  • Target instances that are offline by specifying an AWS resource group as the target.

07. Automation

Systems Manager Automation is a very powerful capability that allows us to orchestrate operational playbooks at scale. We can manage AWS resources across multiple accounts and Regions, and because the playbooks are written as Automation documents, they can be dynamic: they take parameters and can change their behavior at run time.

How does it work?
By default, AWS Systems Manager Automation runs in the context of the current user, but we can optionally specify a service role instead. We can use the playbooks that AWS provides in the form of Automation documents; AWS continuously publishes new ones based on customer demand and feedback. If the predefined documents don't meet our requirements, we can always create our own custom playbook (document). In a custom document:

  • We can define the actions to be performed
  • We can define dynamic parameters
  • We can add conditional branching based on the results of previous steps, and configure approval steps in the workflow. Custom Automation documents let us build very complex workflows.

Automation playbooks can be run across multiple accounts and Regions, and we can register Automation documents as maintenance window tasks. AWS Config rules can also integrate with Automation to remediate noncompliant resources.
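A multi-account, multi-Region run is a single `start_automation_execution()` request with `TargetLocations`. The sketch below assumes Python, boto3, and that the cross-account roles Automation requires already exist (the execution role name shown is the default one AWS documents). The account IDs, Regions, tag, and helper name are hypothetical; `AWS-RestartEC2Instance` is an AWS-managed document.

```python
def build_multi_account_automation(document_name, accounts, regions,
                                   tag_key, tag_value):
    """Return kwargs for ssm.start_automation_execution() across accounts/Regions."""
    return {
        "DocumentName": document_name,
        # Run once per instance that carries the tag, passing its ID as InstanceId
        "TargetParameterName": "InstanceId",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "TargetLocations": [{
            "Accounts": list(accounts),
            "Regions": list(regions),
            # Process at most two account/Region pairs at the same time
            "TargetLocationMaxConcurrency": "2",
            # Stop queueing new locations after one failure
            "TargetLocationMaxErrors": "1",
            # Role assumed in each target account (AWS's default name)
            "ExecutionRoleName": "AWS-SystemsManager-AutomationExecutionRole",
        }],
    }


if __name__ == "__main__":
    request = build_multi_account_automation(
        "AWS-RestartEC2Instance",
        ["111111111111", "222222222222"],  # hypothetical account IDs
        ["us-east-1", "eu-west-1"],
        "Env", "dev",
    )
    print(request)
    # boto3.client("ssm").start_automation_execution(**request)
```

Keeping concurrency and error limits low at the location level means a bad playbook stops after one account instead of spreading everywhere.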

What are the benefits?

  1. API calls made by an Automation document (the playbook) are recorded in AWS CloudTrail, giving us additional audit logs of who executed what and when
  2. It is natively integrated with AWS IAM, so we can leverage our existing access control configurations
  3. Automation documents enhance operational security. For example, suppose we need to let users restart EC2 instances, but we don't want to give them administrative permissions on the EC2 instances themselves. With Automation, we can create a custom Automation document that specifies a role with permission to restart the instances. Users then only need permission to run this Automation document. When the document runs, it uses that role to reboot the instances, so the users never receive the administrative permission to restart instances directly.
  4. We can build best practices into Automation documents and share them across the enterprise to make sure those practices are followed
  5. Automation integrates closely with other AWS services: it can call AWS API actions, such as creating CloudFormation stacks, and it can run Python scripts embedded within the Automation document
  6. Automation lets us perform these operations at scale
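The delegated-restart example from point 3 can be sketched as an Automation document (schema version 0.3) whose `assumeRole` points at a role parameter and whose single `aws:executeAwsApi` step calls `ec2:RebootInstances`. This assumes Python for building the JSON; the document name, the role ARN default (account `123456789012`), and the helper names are hypothetical. As we understand it, the user starting the automation still needs `iam:PassRole` on that role in addition to permission to start the execution.

```python
import json

# Hypothetical Automation document: reboot one instance with a delegated role.
REBOOT_DOCUMENT = {
    "schemaVersion": "0.3",
    "description": "Reboot one EC2 instance using a role the caller does not hold.",
    # Automation runs the steps with this role, not the caller's permissions
    "assumeRole": "{{ AutomationAssumeRole }}",
    "parameters": {
        "InstanceId": {"type": "String", "description": "Instance to reboot."},
        "AutomationAssumeRole": {
            "type": "String",
            "description": "Role allowed to call ec2:RebootInstances.",
            "default": "arn:aws:iam::123456789012:role/HypotheticalRebootRole",
        },
    },
    "mainSteps": [{
        "name": "rebootInstance",
        "action": "aws:executeAwsApi",
        "inputs": {
            "Service": "ec2",
            "Api": "RebootInstances",
            "InstanceIds": ["{{ InstanceId }}"],
        },
    }],
}


def build_start_request(instance_id, document_name="Example-RebootWithDelegatedRole"):
    """Return kwargs for ssm.start_automation_execution() (names hypothetical)."""
    return {"DocumentName": document_name,
            "Parameters": {"InstanceId": [instance_id]}}


if __name__ == "__main__":
    # Register once with create_document(..., DocumentType="Automation")
    print(json.dumps(REBOOT_DOCUMENT, indent=2))
    print(build_start_request("i-0123456789abcdef0"))
```

Because the role lives in the document, revoking or narrowing it changes what every user of the playbook can do, without touching individual user policies.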

General Use Cases:

  • Automating the creation of golden AMIs
  • Handling one-click automation tasks, such as configuring S3 buckets
  • Performing routine maintenance tasks, such as patching Auto Scaling groups
  • Automatically remediating resources through AWS Config
  • Backing up resources, such as DynamoDB tables or RDS databases

08. OpsCenter

OpsCenter is a central remediation hub where all operational work items (OpsItems) related to our AWS resources can be centralized. It enables faster issue resolution by providing a standardised view for investigating and resolving operational issues. OpsCenter also provides contextual data, for example from CloudWatch and CloudTrail, to help diagnose issues, and we can associate Automation documents with OpsItems for remediation.

OpsCenter is integrated with Amazon EventBridge. This means we can create EventBridge rules that automatically create OpsItems for any AWS service that publishes events to EventBridge. Note that OpsCenter has service quotas, and its usage is charged.

What are the benefits?

  1. We don't need to navigate multiple console pages to identify the issues that may be impacting a resource
  2. OpsItems are aggregated across services and stored in a central location
  3. OpsItems contain service-specific, contextually relevant data to help us address issues more quickly
  4. Related resources can be associated with an OpsItem. For example, if our application runs behind a load balancer with multiple EC2 instances, we can associate both the load balancer and the instances to see how the resources relate to each other
  5. Duplicate OpsItems are eliminated
  6. We can view resolution information for similar OpsItems
  7. We can run Systems Manager Automation documents to resolve issues

How does it work?
At their core, OpsItems are created from events. They can be created automatically from services that publish to CloudWatch Events, for example AWS Security Hub findings, an EC2 instance changing state (stopped or started), or throttling on EBS volumes or DynamoDB tables. We can also create OpsItems manually through the API or the console. An OpsItem contains the following information:

  • Title, ID, priority, source, and date/time
  • Overall status
  • Related resources or OpsStream
  • Searchable and Private operation data
  • Deduplication

We can also receive notifications about OpsItems through Amazon SNS.
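Several of the fields listed above appear directly in a `create_ops_item()` request. The sketch below, assuming Python and boto3, sets the title, source, and priority (1 to 5), attaches a related resource through the `/aws/resources` operational-data key, sets a dedup string through `/aws/dedup` so OpsCenter can avoid opening duplicate items, and optionally adds an SNS topic for notifications. The helper name, ARNs, and dedup string are hypothetical.

```python
import json


def build_ops_item(title, description, resource_arn, dedup_string,
                   sns_topic_arn=None, priority=2):
    """Return kwargs for ssm.create_ops_item()."""
    if not 1 <= priority <= 5:
        raise ValueError("OpsItem priority must be between 1 and 5")
    item = {
        "Title": title,
        "Description": description,
        "Source": "EC2",
        "Priority": priority,
        "OperationalData": {
            # Related resources shown on the OpsItem
            "/aws/resources": {
                "Value": json.dumps([{"arn": resource_arn}]),
                "Type": "SearchableString",
            },
            # Items sharing this string are treated as duplicates
            "/aws/dedup": {
                "Value": json.dumps({"dedupString": dedup_string}),
                "Type": "SearchableString",
            },
        },
    }
    if sns_topic_arn:
        item["Notifications"] = [{"Arn": sns_topic_arn}]
    return item


if __name__ == "__main__":
    print(build_ops_item(
        "High CPU on web server", "CPU above 90% for 15 minutes",
        "arn:aws:ec2:us-east-1:123456789012:instance/i-0abc",
        "high-cpu-i-0abc",
        sns_topic_arn="arn:aws:sns:us-east-1:123456789012:ops-alerts"))
```

Deriving the dedup string from the resource and the kind of problem lets repeated alarms collapse into one OpsItem instead of flooding the queue.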

09. Distributor

Distributor allows us to securely store and distribute software packages, such as the CloudWatch agent or any other software agents, drivers, and applications, which simplifies distributing these packages at scale. It provides a central repository with version control, so we can make enhancements to our packages and also share them with other AWS accounts. It also integrates with IAM, allowing us to control access to the packages we create.

We can install the packages either on demand or on schedule through Systems Manager State Manager. Also, we can automatically install packages on new instances based on how we target the distribution of the packages.

How does it work?
First we need to make sure all the prerequisites are completed so that Systems Manager can manage our instances. Then we specify where the package will be stored (the S3 bucket name), provide the software files, and specify the platform, version, and architecture of the package we are creating. Finally, we verify that the install, uninstall, and update scripts that Distributor generates meet our requirements.

Once the package is created, we can install it one time using Systems Manager Run Command, or on a recurring schedule using State Manager. If we create a State Manager association, new instances that come online and match our targets will also get the package installed.
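For packages created without the console wizard, Distributor expects a manifest describing which file serves which platform, plus a SHA-256 checksum per file. The sketch below, assuming Python and the manifest layout we recall from the Distributor documentation for advanced packages (treat the exact keys as assumptions), builds such a manifest and the Run Command request that installs the package through the AWS-managed `AWS-ConfigureAWSPackage` document. Package names, file names, and the tag are hypothetical.

```python
import hashlib
import json


def sha256_hex(data):
    """SHA-256 checksum of a package file's bytes, as a hex string."""
    return hashlib.sha256(data).hexdigest()


def build_manifest(version, zip_name, zip_bytes, platform="amazon",
                   platform_version="_any", arch="x86_64"):
    """Return a Distributor manifest dict (key layout is an assumption)."""
    return {
        "schemaVersion": "2.0",
        "version": version,
        # platform -> platform version -> architecture -> file to install
        "packages": {platform: {platform_version: {arch: {"file": zip_name}}}},
        "files": {zip_name: {"checksums": {"sha256": sha256_hex(zip_bytes)}}},
    }


def install_request(package_name, tag_key, tag_value):
    """Return ssm.send_command() kwargs that install a Distributor package."""
    return {
        "DocumentName": "AWS-ConfigureAWSPackage",
        "Targets": [{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        "Parameters": {"action": ["Install"], "name": [package_name]},
    }


if __name__ == "__main__":
    fake_zip = b"not really a zip file"  # stand-in for the package archive
    manifest = build_manifest("1.0.0", "myagent-amzn-x86_64.zip", fake_zip)
    print(json.dumps(manifest, indent=2))
    print(install_request("MyAgent", "Env", "prod"))
```

The same `Name` and `Parameters` can be used in a State Manager `create_association()` call, which is what makes new instances pick up the package automatically.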

What are the benefits?

  1. One package, many platforms
  2. Control package access across groups of managed instances — We can use Run Command or State Manager to control which of our managed instances get a package and which version of that package. Managed instances can be grouped by instance IDs, AWS account numbers, tags, or AWS Regions. We can use State Manager associations to deliver different versions of a package to different groups of instances.
  3. Many AWS agent packages included and ready to use — Distributor includes many AWS agent packages that are ready for us to deploy to managed instances. Look for packages in the Distributor Packages list page that are published by Amazon. Examples include AmazonCloudWatchAgent and AWSPVDriver.
  4. Automate deployment — To keep our environment current, use State Manager to schedule packages for automatic deployment on target instances when those instances are first launched.

10. Explorer

Explorer is an operations dashboard that provides an aggregated graphical view of our operations data (OpsData), such as EC2 instance summaries and their patch compliance. With Explorer we can view data across multiple AWS Regions to see where attention, investigation, and remediation are required. We can gain insight into our operational issues and sort them by category to focus on the ones that are really relevant to us. We can also see how these issues trend over time and how they are distributed across our organization, which helps us determine where action is required. Explorer integrates with AWS Organizations, so we can aggregate and view data across our AWS accounts and Regions. All Explorer data can be queried through the API, allowing us to create our own custom reports.

What is OpsData?
OpsData is any operations data that is displayed in the Systems Manager Explorer dashboard. Explorer retrieves OpsData from the following sources -

  • EC2
  • Systems Manager OpsCenter
  • Systems Manager Patch Manager
  • AWS Trusted Advisor
  • AWS Compute Optimizer
  • AWS Support Center cases

11. Change Calendar

Change Calendar is a fully managed Systems Manager capability that we can use to create calendars and define important business events. A calendar is either open by default, which allows actions unless an event blocks them, or closed by default, which denies actions unless an event allows them. We then add events to the calendar to prevent or allow potentially disruptive actions during critical periods, based on the state the calendar returns. For customers with multiple AWS accounts, Change Calendar provides the flexibility to create a single master change control calendar that can be enforced across multiple AWS accounts.

Anyone who creates or runs Systems Manager Automation documents, and administrators responsible for keeping the configurations of Systems Manager managed instances consistent, stable, and functional, should consider using Change Calendar.

What are the benefits?

  • Review changes before they’re applied
    A Change Calendar entry can help ensure that potentially destructive Automation changes to your environment are reviewed before they’re applied.
  • Apply changes only during appropriate times
    Change Calendar entries help keep your environment stable during event times. For example, you can create a Change Calendar entry to block changes when you expect high demand on your resources, such as during a conference or public marketing promotion. A calendar entry can also block changes when you expect limited administrator support, such as during vacations or holidays. You can use a calendar entry to allow changes except for certain times of the day or week when there is limited administrator support to troubleshoot failed actions or deployments.
  • Get the current or upcoming state of the calendar
    You can run the Systems Manager GetCalendarState API operation to show you the current state of the calendar, the state at a specified time, or the next time that the calendar state is scheduled to change.
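The open/closed semantics can be modeled in a few lines. The sketch below, in Python, is a hypothetical local model (not the service's implementation, and it ignores recurring events): an event flips the calendar's default, so an open-by-default calendar is CLOSED during events and a closed-by-default calendar is OPEN during them. In practice we would ask the service instead, for example with boto3's `get_calendar_state(CalendarNames=[...])`, whose response includes a `State` of `OPEN` or `CLOSED`; the calendar name in the comment is hypothetical.

```python
from datetime import datetime


def calendar_state(default_open, events, at):
    """Return "OPEN" or "CLOSED" at time `at`.

    events: list of (start, end) datetimes; `end` is exclusive.
    """
    in_event = any(start <= at < end for start, end in events)
    # An event inverts the calendar's default state
    is_open = default_open != in_event
    return "OPEN" if is_open else "CLOSED"


if __name__ == "__main__":
    # Open-by-default calendar with a change freeze over a holiday
    freeze = [(datetime(2020, 11, 26), datetime(2020, 11, 28))]
    print(calendar_state(True, freeze, datetime(2020, 11, 27, 12)))
    # Real check against the service (requires AWS credentials):
    # import boto3
    # state = boto3.client("ssm").get_calendar_state(
    #     CalendarNames=["ChangeFreeze"])["State"]
```

A deployment script or Automation step can check this state first and skip its disruptive work whenever the result is not OPEN.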

Conclusion

This document briefly describes the capabilities of AWS Systems Manager and how they help us manage infrastructure, both in the cloud and on premises, including EC2 instances, on-premises servers, and virtual machines. Many Systems Manager capabilities are free of cost, but some are charged (for example, advanced parameters and OpsCenter), and integrating Systems Manager with other AWS services, such as AWS Config, can incur additional cost. We need to understand this to build effective architectures. Most of the time, we combine different Systems Manager capabilities to achieve our desired goal, which makes it a suitable service for managing our infrastructure efficiently and effectively.

Thank you for reading…!!!


Nikhil Narayanan

DevOps Engineer | Python Developer | Machine Learning | Artificial intelligence Enthusiast