The Alarm Context Tool (ACT) enhances AWS CloudWatch Alarms by providing additional context to aid in troubleshooting and analysis. By leveraging AWS services such as Lambda, CloudWatch, X-Ray, and Amazon Bedrock, this solution aggregates and analyzes metrics, logs, and traces to generate meaningful insights. Using generative AI capabilities from Amazon Bedrock, it summarizes findings, identifies potential root causes, and offers relevant documentation links to help operators resolve issues more efficiently. The implementation is designed for easy deployment and integration into existing observability pipelines, with the goal of reducing response times and improving root cause analysis.
Clone the repository:
git clone https://github.com/aws-samples/alarm-context-tool
cd alarm-context-tool
Install dependencies if you plan to use your IDE to detect problems in the code:
pip install -r ./dependencies_layer/requirements.txt
pip install aws_lambda_powertools
For some regions, you may need to change the layer version for Lambda Insights after the colon in template.yaml. See https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/Lambda-Insights-extension-versionsx86-64.html.
- !Sub arn:aws:lambda:${AWS::Region}:580247275435:layer:LambdaInsightsExtension:49
Edit the template.yaml file with the recipient email address and sender address.
Resources:
  AlarmContextFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.alarm_handler
      Runtime: python3.12
      Environment:
        Variables:
          RECIPIENT: [email protected]
          SENDER: Name <[email protected]>
Update additional environment variables if required.
Update your SNS Topics that receive notifications from CloudWatch alarms:
Use a guided deployment to start with:
sam build
sam deploy --guided
Subsequently, you can build, deploy, and test with the following command. The test event must be a shareable test event. See Testing.
sam build; sam deploy --no-confirm-changeset; sam remote invoke --stack-name alarm-context-tool --region <aws-region> --test-event-name <test-event>
Once deployed, the Lambda function will be triggered by SNS topics subscribed to CloudWatch Alarms. The function will enhance the alarm message with additional context such as related metrics, logs, and traces. It uses Amazon Bedrock to analyze the gathered data and generate actionable insights.
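As a rough illustration of the first step in that flow, the sketch below pulls the alarm fields out of an SNS-delivered CloudWatch alarm notification. It is not the tool's actual code: the function name `parse_alarm_event` and the sample event are hypothetical, and it assumes a single-metric alarm, where the `Trigger` object carries `Namespace`, `MetricName`, and `Dimensions` directly.

```python
import json


def parse_alarm_event(event):
    """Pull the core alarm fields out of an SNS-delivered CloudWatch alarm."""
    # SNS wraps the alarm notification as a JSON string in Sns.Message.
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    trigger = message.get("Trigger", {})
    # Single-metric alarms list dimensions as {"name": ..., "value": ...} pairs.
    dimensions = {d["name"]: d["value"] for d in trigger.get("Dimensions", [])}
    return {
        "alarm_name": message.get("AlarmName"),
        "state": message.get("NewStateValue"),
        "reason": message.get("NewStateReason"),
        "change_time": message.get("StateChangeTime"),
        "namespace": trigger.get("Namespace"),
        "metric_name": trigger.get("MetricName"),
        "dimensions": dimensions,
    }


# Hypothetical sample event, trimmed to the fields used above.
sample_event = {
    "Records": [{
        "Sns": {
            "Message": json.dumps({
                "AlarmName": "example-alarm",
                "NewStateValue": "ALARM",
                "NewStateReason": "Testing",
                "StateChangeTime": "2024-01-01T00:00:00.000+0000",
                "Trigger": {
                    "MetricName": "Errors",
                    "Namespace": "AWS/Lambda",
                    "Dimensions": [{"name": "FunctionName", "value": "my-function"}],
                },
            })
        }
    }]
}

alarm = parse_alarm_event(sample_event)
print(alarm["alarm_name"], alarm["namespace"], alarm["dimensions"])
```

The namespace and dimensions extracted this way are the kind of values a service-specific handler would need in order to look up related metrics, logs, and traces.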
To create a new handler for a different AWS service, follow these steps:
Create a new handler file:
Create a new Python file in the handlers directory. For example, new_service_handler.py.
Define the handler function: Implement the handler function similar to existing handlers. Here's a template:
import boto3
import botocore
from aws_lambda_powertools import Logger, Tracer

logger = Logger()
tracer = Tracer()


@tracer.capture_method
def process_new_service(dimensions, region, account_id, namespace, change_time, annotation_time, start_time, end_time, start, end):
    # Your implementation here
    pass
Add the handler to the Lambda function:
Update lambda_function.py to import and call your new handler based on the trigger.
Update the template:
Modify template.yaml to include your new handler and update necessary permissions.
Resources:
  AlarmContextFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.alarm_handler
      Runtime: python3.12
      Policies:
        - Statement:
            - Effect: Allow
              Action:
                - new-service:Describe*
              Resource: "*"
Add necessary permissions:
Ensure that your new handler has the required permissions by updating the template.yaml file as shown above.
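One possible way to wire a new handler into lambda_function.py is a small lookup keyed on the alarm's namespace, sketched below. This is an assumption about structure, not a description of how the repository actually routes alarms: the `HANDLERS` registry, the `dispatch` function, the `AWS/NewService` namespace, and the handler's return value are all hypothetical.

```python
def process_new_service(dimensions, region, account_id, namespace, change_time,
                        annotation_time, start_time, end_time, start, end):
    # Placeholder body: a real handler would gather metrics, logs, and traces.
    return {"namespace": namespace, "dimensions": dimensions}


# Hypothetical registry mapping alarm namespaces to handler functions.
HANDLERS = {"AWS/NewService": process_new_service}


def dispatch(namespace, **context):
    """Call the handler registered for the namespace, or return None."""
    handler = HANDLERS.get(namespace)
    if handler is None:
        # No service-specific handler is registered for this namespace.
        return None
    return handler(namespace=namespace, **context)
```

With a registry like this, adding a service means adding one dictionary entry, whereas an if/elif chain would need a new branch for each service.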
Trigger an Alarm: Manually trigger an alarm using the following command, replacing <alarm_name> with the name of your alarm:
aws cloudwatch set-alarm-state --state-value ALARM --state-reason "Testing" --alarm-name "<alarm_name>"
Use the test cases generated in the logs:
The main Lambda function generates a test case that can be used in the Lambda console or with sam remote invoke. See Testing Lambda functions in the console.
Open the CloudWatch console
In the navigation pane, choose Logs, and then choose Logs Insights.
In the Select log group(s) drop down, choose /aws/lambda/alarm-context-tool-AlarmContextFunction-xxxxxxxxxxxx
Enter the following query, replacing <alarm_name> with the name of your alarm:
fields @timestamp, @message, @logStream, @log
| filter message = "test_case" AND Records.0.Sns.Message like /<alarm_name>/
Choose Run query
Expand a log entry and copy the entire @message field.
You can then use this to test your Lambda function on demand.
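The copied @message can be turned into an event file with a few lines of Python. The sketch below assumes, based only on the query above, that the log entry is a JSON object with a top-level `message` field equal to "test_case" and a top-level `Records` list. The function name `extract_test_event` and the sample log line are hypothetical.

```python
import json


def extract_test_event(log_message):
    """Build a Lambda test event from a copied test_case log entry."""
    entry = json.loads(log_message)
    if entry.get("message") != "test_case":
        raise ValueError("not a test_case log entry")
    # Keep only the SNS records; drop logger metadata such as level or timestamp.
    return {"Records": entry["Records"]}


# Hypothetical log entry, shortened for illustration.
sample_log_message = json.dumps({
    "level": "INFO",
    "message": "test_case",
    "Records": [{"Sns": {"Message": "{\"AlarmName\": \"example-alarm\"}"}}],
})

event = extract_test_event(sample_log_message)
print(json.dumps(event, indent=2))
```

The printed JSON can be saved to a file and pasted into the Lambda console as a test event, or passed to sam remote invoke with its --event-file option.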
The following environment variables can be configured for the Lambda function:
AWS_LAMBDA_LOG_LEVEL: Sets the log level for AWS Lambda logs (e.g., INFO, DEBUG). Default is INFO.
ANTHROPIC_VERSION: Specifies the version of the Anthropic model to be used. Default is bedrock-2023-05-31.
BEDROCK_MODEL_ID: The ID of the Amazon Bedrock model to use. Default is anthropic.claude-3-sonnet-20240229-v1:0.
BEDROCK_REGION: The AWS region where the Bedrock model is deployed. Default is us-east-1.
BEDROCK_MAX_TOKENS: The maximum number of tokens to be used by the Bedrock model. Default is 4000.
METRIC_ROUNDING_PRECISION_FOR_BEDROCK: The precision for rounding metrics before sending to Bedrock. Default is 3.
POWERTOOLS_LOG_LEVEL: Sets the log level for AWS Lambda Powertools logs (e.g., INFO, DEBUG). Default is INFO.
POWERTOOLS_LOGGER_LOG_EVENT: Enables logging of the full event in Lambda Powertools logs. Default is True.
POWERTOOLS_SERVICE_NAME: The name of the service to be used in Lambda Powertools. Default is Alarm.
POWERTOOLS_TRACER_CAPTURE_RESPONSE: Controls whether to capture the response in tracing. Default is False.
RECIPIENT: The email address to receive notifications.
SENDER: The sender's email address for notifications.
USE_BEDROCK: Enables or disables the use of Amazon Bedrock for generative AI. Default is True.
To configure these variables, update the template.yaml file:
Resources:
  AlarmContextFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: lambda_function.alarm_handler
      Runtime: python3.12
      Environment:
        Variables:
          AWS_LAMBDA_LOG_LEVEL: INFO
          ANTHROPIC_VERSION: bedrock-2023-05-31
          BEDROCK_MODEL_ID: anthropic.claude-3-sonnet-20240229-v1:0
          BEDROCK_REGION: us-east-1
          BEDROCK_MAX_TOKENS: 4000
          METRIC_ROUNDING_PRECISION_FOR_BEDROCK: 3
          POWERTOOLS_LOG_LEVEL: INFO
          POWERTOOLS_LOGGER_LOG_EVENT: "True"
          POWERTOOLS_SERVICE_NAME: Alarm
          POWERTOOLS_TRACER_CAPTURE_RESPONSE: "False"
          RECIPIENT: [email protected]
          SENDER: Name <[email protected]>
          USE_BEDROCK: "True"
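Inside the function, values like these arrive as strings in the process environment. The sketch below shows one way a handler might read a few of them with the defaults listed above. The helper names `load_config` and `_as_bool` are hypothetical, and the actual code may parse these variables differently.

```python
import os


def _as_bool(value):
    # Environment variables are strings, so "True"/"False" need explicit parsing.
    return str(value).strip().lower() in ("true", "1", "yes")


def load_config(env=None):
    """Read selected settings from the environment, falling back to defaults."""
    env = os.environ if env is None else env
    return {
        "bedrock_model_id": env.get(
            "BEDROCK_MODEL_ID", "anthropic.claude-3-sonnet-20240229-v1:0"
        ),
        "bedrock_region": env.get("BEDROCK_REGION", "us-east-1"),
        "bedrock_max_tokens": int(env.get("BEDROCK_MAX_TOKENS", "4000")),
        "metric_precision": int(env.get("METRIC_ROUNDING_PRECISION_FOR_BEDROCK", "3")),
        "use_bedrock": _as_bool(env.get("USE_BEDROCK", "True")),
    }


config = load_config()
# Example use of the precision setting: round a metric value before prompting.
rounded = round(12.3456789, config["metric_precision"])
```

Passing a plain dictionary as `env` makes the parsing easy to exercise locally without touching the real environment.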
functions_logs
  log_group_name (str): The name of the log group.
  start_time (str): The start time for the query.
  end_time (str): The end time for the query.
  query (str): The Logs Insights query.
functions_metrics
  dashboard_metrics (list): The list of metrics for the dashboard.
  annotation_time (str): The annotation time for the dashboard.
  start (str): The start time for the dashboard.
  end (str): The end time for the dashboard.
  region (str): The AWS region.
functions_xray
  trace_ids (list): The list of trace IDs to process.
  start_time (str): The start time for the trace processing.
  end_time (str): The end time for the trace processing.
  region (str): The AWS region.
See CONTRIBUTING for more information.
This library is licensed under the MIT-0 License. See the LICENSE file.