Troubleshooting Production

Outage, site is down

DNS

Symptoms

  • "This site can’t be reached" in Chrome

  • Requests timing out

Diagnose

On a unix command line, run dig <your CiviForm domain>. (Note: Remove the protocol -- http or https -- from the front of your domain before running the command.) There should be a CNAME entry that points to an AWS load balancer e.g. seattle-civiform-lb-2038295446.us-west-1.elb.amazonaws.com.

Confirm that the CNAME record matches the public domain for your AWS application load balancer by visiting the AWS console EC2 > Load Balancing > Load Balancers and finding the load balancer for your prod deployment.

Resolution

If the CNAME entry is missing or does not match the DNS name you find in AWS, add or update a CNAME entry in your domain registrar with the application load balancer's DNS name.

Server can't start

Symptoms

  • "This site can’t be reached" in Chrome

  • Requests timing out

Diagnose

View the ECS cluster for your prod deployment in AWS by going to ECS > Clusters and clicking the cluster for your production deployment. There should be at least one healthy task. If all tasks are unhealthy or unknown the server is unable to start.

If no tasks are healthy, view the server logs (see Server errors below). Look for stack traces and error messages.

Resolution

If you have just deployed, revert your CiviForm version number to the previous version you deployed and re-deploy.

Contact the CiviForm maintainers and include any errors you found in the server logs.

Server errors

Symptoms

Server returns 400 or 500 level errors or pages with short, plaintext messages stating an error message.

Resolution

If you have deployed recently, consider reverting your CiviForm version number to the previous version you deployed and re-deploying.

Investigate the server logs. Report any errors you find along with complete stack traces to the CiviForm maintainers. To view the server logs in the AWS console go to CloudWatch > Logs > Log groups, select the log group for your production deployment and view the combined log stream.

Authentication errors

Symptoms

Users are unable to log in.

Resolution

Check with maintainers of the admin and applicant auth systems if there have been recent changes. Check the release notes for your CiviForm version to see if it mentions auth changes. If errors are occuring after the user successfully authenticates with the identity backend and redirects back to CiviForm there is likely a server error involved. Check server logs for errors.

Contact CiviForm maintainers with details of the investigation.

If a user has created an account as an applicant and then is added as an admin with the same email, they may see an error when logging in. When checking the logs, it'll show Profile already contains an authority ID: Optional[iss:xxx] - which is different from the new authority ID: Optional[iss:yyy]. To fix this error, the account admin will need to access the database and update the particular account to use the new authority ID. First, you can find the existing account by runnning SELECT * FROM accounts where authority_id = 'iss:xxx' (with the original authority_id). Once you confirm there is one account listed and it is the correct account (see note below), you then can run UPDATE accounts SET authority_id = 'iss:yyy' WHERE authority_id = 'iss:xxx';

NOTE: It should be strongly verified that the user/account is correct. Changing this without care is a security issue as now the "new" account has access to the system.

Last updated