A botched attempt to block a phishing URL hosted on Cloudflare’s R2 object storage platform triggered a widespread outage, taking down multiple services for nearly an hour.
Cloudflare R2 is a scalable, cost-effective object storage service similar to Amazon S3. It offers S3 API compatibility, multi-location data replication, and integration with other Cloudflare services. The disruption occurred when an employee responded to an abuse report about a phishing link hosted on R2. Instead of restricting access to the specific endpoint, they mistakenly disabled the entire R2 Gateway service.
In its post-mortem report, Cloudflare explained, “During a routine abuse remediation, action was taken on a complaint that inadvertently disabled the R2 Gateway service instead of the specific endpoint/bucket associated with the report. This was a failure of multiple system-level controls (first and foremost) and operator training.”
The incident lasted 59 minutes, from 08:10 to 09:09 UTC, and affected not only R2 Object Storage but also services that depend on it. Video uploads and streaming on Cloudflare Stream failed entirely, as did image uploads on Cloudflare Images. Cache Reserve operations saw a 100% failure rate, which increased requests to customer origins, while Cloudflare’s Vectorize vector database saw 75% of queries fail and a total disruption of insert, upsert, and delete operations. Log Delivery suffered delays and data loss, with up to 13.6% of logs missing for R2-related jobs and a 4.5% loss for non-R2 jobs.
Several other Cloudflare services faced partial failures. Durable Objects saw a 0.09% increase in error rates due to reconnections after recovery. Cache Purge encountered a 1.8% rise in HTTP 5xx errors and a tenfold latency increase, while Workers & Pages deployments experienced a 0.002% failure rate, affecting projects that use R2 bindings.
Cloudflare acknowledged that both human error and inadequate system safeguards contributed to the incident. As an immediate fix, the company has removed the ability to shut down services from the abuse review interface and restricted the Admin API so that similar actions cannot disable services in internal accounts. Planned measures include improved account provisioning, stricter access control, and a mandatory two-party approval process for high-impact actions.
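To make the safeguards described above concrete, here is a minimal sketch of how a scope check and a two-party approval rule could fit together. It is purely illustrative and is not Cloudflare's implementation: the `DisableRequest` class, the `can_execute` function, the scope names, and the `"abuse_review"` and `"admin_api"` source labels are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical: scopes treated as high-impact because they affect many customers.
HIGH_IMPACT_SCOPES = {"service", "account"}


@dataclass
class DisableRequest:
    """A hypothetical request to disable something in response to a report."""
    requester: str
    scope: str    # e.g. "bucket", "endpoint", "service", "account"
    target: str
    approvals: set = field(default_factory=set)

    def approve(self, reviewer: str) -> None:
        # Two-party rule: the approver must be someone other than the requester.
        if reviewer == self.requester:
            raise PermissionError("requester cannot approve their own request")
        self.approvals.add(reviewer)


def can_execute(request: DisableRequest, source: str) -> bool:
    """Decide whether a disable request may run (illustrative policy only)."""
    high_impact = request.scope in HIGH_IMPACT_SCOPES
    # The abuse review interface may only act on narrow scopes.
    if source == "abuse_review" and high_impact:
        return False
    # Elsewhere, high-impact actions need at least one independent approval.
    if high_impact:
        return len(request.approvals) >= 1
    return True
```

Under this assumed policy, a request to disable a single bucket from the abuse review tool goes through, while a request to disable a whole service is rejected there outright and, from any other source, waits until a second person has approved it.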
Bijay Pokharel