On Tuesday 14th September, at about 19:30 UTC, all Planet 4 sites were unavailable due to an unexpected side effect of a Cloudflare change applied through Terraform code.
Forensics
All P4 sites were unavailable for around 9 minutes, returning a 403 error from Cloudflare.
Incident description
As part of the IT Operations move to Infrastructure as Code, the team had recently deployed the Cloudflare DNS records as code using Terraform in GitLab. An approved change had been scheduled that included updating the CNAME record for www.greenpeace.org. The change was minor: the record's label and destination name were changed, but the destination was an identical endpoint. However, Terraform deleted the record entirely, which caused the Cloudflare edge SSL certificate to be deleted as well. Although a rollback was executed immediately, there was a short delay while Cloudflare generated a new certificate.
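One way to catch this kind of change before it is applied is to inspect the machine-readable plan. `terraform show -json` on a saved plan lists each resource change with its planned actions, and a record that Terraform intends to destroy or replace includes a `delete` action. The Python sketch below is a hypothetical pipeline guard, not the team's actual tooling: it assumes the DNS records are managed with the Cloudflare provider's `cloudflare_record` resource type, and the script name `check_plan.py` and function `destructive_dns_changes` are illustrative only.

```python
import json
import sys

# Assumption: DNS records are managed with the Cloudflare provider's
# "cloudflare_record" resource type. Extend this set if other types are used.
DNS_RECORD_TYPES = {"cloudflare_record"}


def destructive_dns_changes(plan: dict) -> list[str]:
    """Return the DNS records that the plan would delete or replace."""
    flagged = []
    for rc in plan.get("resource_changes", []):
        if rc.get("type") not in DNS_RECORD_TYPES:
            continue
        actions = rc.get("change", {}).get("actions", [])
        # "delete" appears for a plain destroy and for both replacement
        # orders: ["delete", "create"] and ["create", "delete"].
        if "delete" in actions:
            flagged.append(f"{rc['address']}: {'/'.join(actions)}")
    return flagged


def main(path: str) -> int:
    with open(path) as f:
        plan = json.load(f)
    flagged = destructive_dns_changes(plan)
    for entry in flagged:
        print(f"DESTRUCTIVE DNS CHANGE: {entry}")
    # A non-zero exit code lets the CI job stop before "terraform apply".
    return 1 if flagged else 0


if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv[1]))
```

In a pipeline this could run as `terraform plan -out=tfplan`, then `terraform show -json tfplan > plan.json`, then `python check_plan.py plan.json`, so that a delete or replace of a DNS record requires explicit human sign-off rather than being applied as a routine edit.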
Expected behavior when fixed
P4 sites are accessible over HTTPS using the SSL certificate provided by Cloudflare.
SLO
Not currently applicable to this incident.
Communication
Audience: #p4-general Slack channel
Last communication: 14th Sept 19:44 UTC