Cloudflare's R2 object storage and dependent services experienced a 1-hour-and-7-minute outage, resulting in 100% write failures and 35% read failures globally.

The disruption, which lasted from 21:38 UTC to 22:45 UTC, was caused by a botched credential rotation that left the R2 Gateway service, the API frontend for R2, unable to authenticate with the backend storage infrastructure.

The issue stemmed from an operator error: the new credentials were mistakenly deployed to a development environment instead of production, so when the old credentials were deleted, the production service was left without valid authentication. The root cause was the omission of a single command-line flag (--env production) that would have directed the new credentials to the correct environment.
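
As a hypothetical illustration of the failure mode, the sketch below wraps a credential deployment command so that it refuses to run unless the target environment is stated explicitly, instead of silently falling back to a default. The wrangler secret put command and its --env flag are real; the wrapper itself and its names are assumptions, not Cloudflare's tooling.

```typescript
// deploy-credential.ts: a hypothetical wrapper around `wrangler secret put`.
// It refuses to deploy unless the operator names the target environment,
// rather than letting the underlying tool fall back to a default environment.
import { execFileSync } from "node:child_process";

const ALLOWED_ENVS = ["production", "staging", "development"] as const;
type Env = (typeof ALLOWED_ENVS)[number];

function deploySecret(name: string, env: string | undefined): void {
  // Fail fast when no (or an unknown) environment is given; this is the guard
  // that an omitted --env production flag would have tripped.
  if (!env || !ALLOWED_ENVS.includes(env as Env)) {
    throw new Error(`Explicit --env required: one of ${ALLOWED_ENVS.join(", ")}`);
  }
  execFileSync("wrangler", ["secret", "put", name, "--env", env], {
    stdio: "inherit",
  });
}

// Example invocation: ts-node deploy-credential.ts R2_ACCESS_KEY_ID production
const [, , secretName, targetEnv] = process.argv;
deploySecret(secretName ?? "", targetEnv);
```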

Cloudflare explained that the impact was not immediately apparent because credential deletions take time to propagate, which slowed both detection and remediation. The company acknowledged that better validation of the active authentication tokens could have sped up the resolution.
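
Cloudflare's write-up does not include code, but the idea of validating new credentials before revoking the old ones can be sketched against R2's S3-compatible API. The account ID, bucket name, and key pair below are placeholders, and the helper is an assumption rather than Cloudflare's internal procedure.

```typescript
// verify-credentials.ts: prove that a newly deployed key pair can actually
// authenticate against R2's S3-compatible endpoint *before* the old keys
// are deleted.
import { S3Client, HeadBucketCommand } from "@aws-sdk/client-s3";

async function newCredentialsWork(
  accountId: string,
  accessKeyId: string,
  secretAccessKey: string,
  bucket: string
): Promise<boolean> {
  const client = new S3Client({
    region: "auto",
    endpoint: `https://${accountId}.r2.cloudflarestorage.com`,
    credentials: { accessKeyId, secretAccessKey },
  });
  try {
    // HeadBucket is a cheap authenticated call: it succeeds only when the
    // credentials are valid and have access to the bucket.
    await client.send(new HeadBucketCommand({ Bucket: bucket }));
    return true;
  } catch {
    return false;
  }
}

// Rotation order: deploy new keys, confirm they authenticate, then revoke
// the old keys — never the reverse.
export { newCredentialsWork };
```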

While no customer data was lost or corrupted, the outage significantly affected Cloudflare services:

  • R2: Complete write failures and partial read failures (cached data was still accessible).
  • Cache Reserve: Increased origin traffic due to failed reads.
  • Images & Stream: Uploads failed, and delivery dropped to 25% for Images and 94% for Stream.
  • Other services impacted: Email Security, Vectorize, Log Delivery, Billing, and Key Transparency Auditor suffered varying degrees of degradation.

To prevent future incidents, Cloudflare is improving credential logging, verification, and deployment automation to reduce human errors. The company is also implementing stricter dual validation for high-risk actions, enhancing health checks, and refining standard operating procedures (SOPs) for credential management.
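
As a rough sketch of the dual-validation idea, a high-risk action such as revoking credentials could be blocked until two distinct operators approve it. The gate, action names, and operators below are hypothetical and only illustrate the concept, not Cloudflare's internal tooling.

```typescript
// two-party-approval.ts: a destructive action (e.g. deleting a credential)
// only executes once two distinct operators have approved it.
type HighRiskAction = { id: string; description: string; run: () => void };

class TwoPartyGate {
  private approvals = new Map<string, Set<string>>();

  approve(action: HighRiskAction, operator: string): void {
    const set = this.approvals.get(action.id) ?? new Set<string>();
    set.add(operator);
    this.approvals.set(action.id, set);
  }

  execute(action: HighRiskAction): void {
    const approvers = this.approvals.get(action.id) ?? new Set<string>();
    if (approvers.size < 2) {
      throw new Error(
        `"${action.description}" needs two approvers, has ${approvers.size}`
      );
    }
    action.run();
  }
}

// Usage: revoking the old R2 credentials requires sign-off from two operators.
const gate = new TwoPartyGate();
const revokeOldKeys: HighRiskAction = {
  id: "revoke-old-r2-keys",
  description: "Revoke previous R2 Gateway credentials",
  run: () => console.log("old credentials revoked"),
};
gate.approve(revokeOldKeys, "operator-a");
gate.approve(revokeOldKeys, "operator-b");
gate.execute(revokeOldKeys);
```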
