Monitoring

CiviForm supports server monitoring via Prometheus metrics visualized in Grafana. These metrics describe server status, such as latency and error rates, and do not contain sensitive data, such as who is applying to which programs.

Enabling metrics export

Exporting metrics from the server is optional and must be enabled by setting the CIVIFORM_SERVER_METRICS_ENABLED environment variable to true. When enabled, the server exports metrics from the /metrics HTTP route. Although this metrics data is not sensitive, it is good practice to block access to this route from the public internet (the CiviForm Terraform deployment system does this by default).
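
As a quick check that export is working, the sketch below sets the variable and counts the samples the route returns. The host and port in the commented curl line are placeholders rather than values from this guide, and the helper name count_metric_samples is hypothetical. It assumes the route returns the standard Prometheus text exposition format, in which lines beginning with # are HELP/TYPE comments.

```shell
# Enable metrics export (variable name from this guide). How the variable
# reaches the server depends on your deployment.
export CIVIFORM_SERVER_METRICS_ENABLED=true

# Hypothetical helper: count sample lines in Prometheus text format read
# from stdin, skipping "#" comment lines and blank lines.
count_metric_samples() {
  grep -v '^#' | grep -c -v '^[[:space:]]*$'
}

# Example usage; replace the placeholder URL with one that can reach your server:
# curl -s "http://localhost:9000/metrics" | count_metric_samples
```

A count of zero, or a connection error, suggests that export is not enabled or that the route is not reachable from where you ran the command.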

Viewing metrics

AWS

The CiviForm Terraform deployment system for AWS deploys the monitoring stack automatically. After deployment, you must manually grant users access to the Grafana dashboard and configure the dashboard.

Configure access

AWS Managed Grafana uses AWS IAM Identity Center for access management.

Note that this is a different service from AWS IAM. The accounts/user profiles in AWS IAM Identity Center are completely separate from accounts in AWS IAM.

  1. Log in to the AWS console and navigate to the IAM Identity Center service

  2. In the left nav, click "Users"

  3. For each user who should be able to view metrics, click the "Add user" button and follow the workflow

  4. Follow instructions here for adding users to your Grafana workspace

  5. In the AWS console for your Grafana workspace, grant permissions to the users you added to the workspace

Configure dashboard

Once you have an Identity Center user with permissions to administer your Grafana workspace, you can configure the workspace dashboard. From the Grafana workspace page in the AWS console, click the link under "Grafana workspace URL". After you sign in, this link takes you to your Grafana workspace.

To enable viewing metrics, add the Prometheus server as a data source for your Grafana workspace:

  1. In the AWS console for your Grafana workspace, click on the 'Data sources' tab

  2. Click on the 'Configure in Grafana' link on the 'Amazon Managed Service for Prometheus' row

  3. Select the region where CiviForm is deployed

  4. Select the '[deployment name]-CiviForm_metrics' row

  5. Select 'add 1 data source'

With Prometheus connected as a data source for your Grafana workspace, you can create panels in Grafana that display metrics from the CiviForm server. There are many metrics available, and many ways to display them. You can get started with some basic metrics by importing a pre-built CiviForm dashboard. This pre-built dashboard includes:

  • Requests per minute, split out by controller action

  • Requests per minute, split out by URL path pattern

  • Client errors (4XX status codes) per minute, split out by controller action and status code

  • Server errors (500 status code) per minute, split out by controller action

  • 50th percentile 5-minute trailing average request latency, split out by controller action

  • 90th percentile 5-minute trailing average request latency, split out by controller action

  • 99th percentile 5-minute trailing average request latency, split out by controller action

  • Database query counts and latency

  • Email send counts and latency

  • Address correction / lookup counts and latency (if address correction is enabled)

To import this pre-built dashboard:

  1. Click on the "Data Sources" tab and select the Prometheus data source

  2. Change the Name value to PROMETHEUS_DATA

  3. Hover over the "+" icon in the left nav

  4. Click the "Import" option

  5. Paste the JSON here into the "Import via panel JSON" field

  6. Click "Load"

  7. Fill in the details for the imported dashboard, selecting your CiviForm Prometheus instance as the data source

Additional AWS monitoring

In addition to the server metrics provided by Prometheus, there are some additional places within AWS you can go to see metrics.

CloudWatch

Dashboards

CloudWatch has some default dashboards that show graphs of metrics for different parts of the deployment. Not all of them are relevant, but they can help you see CPU utilization in RDS (CiviForm's PostgreSQL database) and ECS (server hosting), as well as requests to the ALB (load balancer), metrics about S3 (file storage), and so on.

Alarms

AWS alarms allow you to see when metrics for an AWS service reach a given threshold, and can trigger an action, such as autoscaling.

Some alarms are configured by default through the CiviForm deployment system, including the following:

  • ECS:

    • High CPU alarm

      • Related variables: ECS_MAX_CPU_THRESHOLD, ECS_MAX_CPU_EVALUATION_PERIOD, ECS_MAX_CPU_PERIOD, ECS_SCALE_TARGET_MAX_CAPACITY

    • Low CPU alarm

      • Related variables: ECS_MIN_CPU_THRESHOLD, ECS_MIN_CPU_EVALUATION_PERIOD, ECS_MIN_CPU_PERIOD, ECS_SCALE_TARGET_MIN_CAPACITY

  • RDS:

    • High CPU alarm

      • Related variables: RDS_CREATE_HIGH_CPU_ALARM, RDS_MAX_CPU_UTILIZATION_THRESHOLD

    • High disk queue depth alarm

      • Related variables: RDS_CREATE_HIGH_QUEUE_DEPTH_ALARM, RDS_DISK_QUEUE_DEPTH_HIGH_THRESHOLD

    • Low disk space alarm

      • Related variables: RDS_CREATE_LOW_DISK_SPACE_ALARM, RDS_DISK_FREE_STORAGE_LOW_THRESHOLD

    • Low freeable memory alarm

      • Related variables: RDS_CREATE_LOW_MEMORY_ALARM, RDS_LOW_MEMORY_THRESHOLD

When an ECS alarm is triggered, an autoscaling policy adds or removes a task.

For the RDS alarms, set the RDS_ALARM_EMAIL variable to have a notification sent to the specified email address when an alarm is triggered.
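
A minimal sketch of that setting, assuming civiform_config.sh sets its variables with shell export statements. The address is a placeholder.

```shell
# civiform_config.sh (sketch): address that receives RDS alarm notifications.
# The address below is a placeholder; use a mailbox your team monitors.
export RDS_ALARM_EMAIL="ops-team@example.com"
```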

There are also the following alarms that can be enabled for RDS, but aren't created by default:

  • Low CPU credit alarm

    • Related variables: RDS_CREATE_LOW_CPU_CREDIT_ALARM, RDS_LOW_CPU_CREDIT_BALANCE_THRESHOLD

  • Low disk burst alarm

    • Related variables: RDS_CREATE_LOW_DISK_BURST_ALARM, RDS_DISK_BURST_BALANCE_LOW_THRESHOLD

  • High memory swap usage alarm

    • Related variables: RDS_CREATE_SWAP_ALARM, RDS_HIGH_SWAP_USAGE_THRESHOLD

  • Anomalous connection count alarm

    • Related variables: RDS_CREATE_ANOMALY_ALARM, RDS_ANOMALY_BANDWIDTH

  • Maximum transaction IDs too high alarm

    • Related variables: RDS_CREATE_TRANSACTION_ID_WRAPAROUND_ALARM, RDS_MAX_USED_TRANSACTION_IDS_HIGH_THRESHOLD

These alarms can be enabled by setting the first related variable listed to true in the civiform_config.sh file of the forked civiform-deploy repository.
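
For example, opting in to the low CPU credit alarm might look like the sketch below. It assumes civiform_config.sh sets its variables with shell export statements. The threshold line is left commented out because a suitable value depends on your instance class and is not given in this guide.

```shell
# civiform_config.sh (sketch): opt in to an RDS alarm that is off by default.
export RDS_CREATE_LOW_CPU_CREDIT_ALARM=true

# Optional: override the alarm's threshold. No value is suggested here;
# uncomment and fill in a number suited to your database.
# export RDS_LOW_CPU_CREDIT_BALANCE_THRESHOLD=<your value>
```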

For each of the RDS alarms, the variables RDS_ALARM_EVALUATION_PERIOD and RDS_ALARM_STATISTIC_PERIOD also apply.

These alarms can be viewed through the AWS management console by clicking All alarms in the CloudWatch menu.
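
If you prefer the command line, the AWS CLI can list the same alarms. The sketch below is one way to show only alarms that are currently firing. The helper name list_firing_alarms and the region in the usage line are hypothetical, and it assumes the AWS CLI is installed with credentials for the account that hosts your deployment.

```shell
# Hypothetical helper: print the names of CloudWatch alarms currently in the
# ALARM state for the given region (first argument).
list_firing_alarms() {
  aws cloudwatch describe-alarms \
    --state-value ALARM \
    --query 'MetricAlarms[].AlarmName' \
    --output text \
    --region "$1"
}

# Example usage (the region is a placeholder):
# list_firing_alarms us-east-1
```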

The related variables can be added to the civiform_config.sh file of the forked civiform-deploy repository to customize the alarm settings or disable / enable certain alarms.

If there are other alarms that you wish to add, please let us know.

RDS database metrics

If you navigate to RDS in the AWS Console and click Databases in the navigation menu, you will see your database. Clicking on the database shows some basic metrics, like the current CPU percentage and activity. Additional metrics, which are the same as those in CloudWatch, can be seen in the Monitoring section, and you can click other tabs (Logs, Configuration, etc.) to understand more about the database configuration.

Database customization variables, including those for the instance class and storage amount, can be added to the civiform_config.sh file of the forked civiform-deploy repository to customize the configuration.

ECS metrics

If you navigate to Elastic Container Service in the AWS Console and click Clusters in the menu, you'll see the CiviForm cluster. When you click on the service, you can see health metrics (similar to those in CloudWatch).

In the Configuration section of the service, you can see current Auto Scaling policies.

By default, there are high and low CPU auto-scaling policies, which add or remove a task when CPU utilization rises above or falls below the alarm thresholds (mentioned above).

You can change the CPU and memory allocated to each task by updating the variables ECS_TASK_CPU and ECS_TASK_MEMORY in the civiform_config.sh file of the forked civiform-deploy repository.
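
A sketch of such an override follows. The numbers are illustrative only. The sketch assumes that the values are passed through in ECS's own units (CPU units, where 1024 is one vCPU, and memory in MiB) and that civiform_config.sh uses shell export statements, so check the deployment documentation for the accepted values before applying it.

```shell
# civiform_config.sh (sketch): illustrative task sizes, not recommendations.
# Assumed units: CPU units (1024 = 1 vCPU) and memory in MiB.
export ECS_TASK_CPU=1024
export ECS_TASK_MEMORY=2048
```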
