Token Bucket Algorithm – The Backbone of Modern Rate Limiting

March 1, 2026 | by Shobhit Pandey


In scalable backend systems, rate limiting is essential to protect APIs from abuse, prevent system overload, and ensure fair usage among users. One of the most widely used algorithms for this purpose is the Token Bucket Algorithm.

Major tech companies like Amazon, Google, and Stripe use rate-limiting mechanisms based on the token bucket algorithm or its variants to manage API traffic efficiently.

Let’s understand how it works in a simple way.


What is the Token Bucket Algorithm?

Imagine you have a bucket that holds tokens.

  • The bucket has a fixed capacity (say 10 tokens).
  • Tokens are added to the bucket at a fixed rate (e.g., 1 token per second).
  • Each incoming request must consume 1 token.
  • If the bucket has tokens → request is allowed.
  • If the bucket is empty → request is rejected (or delayed).

That’s it. Simple and powerful.


Simple Real-World Example

Let’s say:

  • Bucket capacity = 5 tokens
  • Refill rate = 1 token per second

Scenario 1: Normal Traffic

A user sends 1 request per second.

  • Every second, 1 token is added.
  • Every second, 1 token is consumed.
  • The system runs smoothly.
  • No request is blocked.

Scenario 2: Sudden Burst

The user suddenly sends 5 requests at once.

  • If the bucket is full (5 tokens), all 5 requests are allowed.
  • Bucket becomes empty.
  • Next request must wait until tokens refill.

This is why the token bucket algorithm allows burst traffic while still controlling long-term rate.
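The burst scenario above can be verified with a few lines of token arithmetic (a simple simulation using the numbers from this example; no real clock, just counts):

```python
capacity = 5          # bucket capacity
tokens = capacity     # bucket starts full
refill_rate = 1       # tokens added per second

# A burst of 6 requests arrives in the same instant.
allowed = rejected = 0
for _ in range(6):
    if tokens >= 1:
        tokens -= 1
        allowed += 1
    else:
        rejected += 1

print(allowed, rejected)   # 5 requests pass, the 6th is rejected

# One second later, one token has refilled, so the next request passes.
tokens = min(capacity, tokens + refill_rate * 1)
print(tokens >= 1)         # True
```

Note the `min(capacity, ...)` cap: tokens never pile up beyond the bucket size, which is exactly what bounds the maximum burst.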


Why Companies Prefer Token Bucket

Compared to fixed window or sliding window algorithms, the token bucket:

  1. Allows short bursts
  2. Smoothly controls the long-term rate
  3. Prevents system overload
  4. Is simple to implement
  5. Is efficient for distributed systems

For example:

  • Payment APIs like Stripe must allow short bursts (checkout spikes).
  • Cloud providers like Amazon Web Services need to fairly distribute API usage across millions of users.

Technical Intuition

The algorithm maintains:

  • capacity → Maximum tokens the bucket can hold
  • refillRate → Tokens added per second
  • currentTokens → Tokens currently available
  • lastRefillTimestamp → When tokens were last added

Whenever a request comes:

  1. Calculate how many tokens have accrued since the last refill.
  2. Update the bucket (capped at capacity).
  3. If a token is available → allow the request and decrement.
  4. Else → reject or queue.
Time complexity: O(1) per request.
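The state and steps above translate almost line for line into code. Here is a minimal single-process sketch in Python (the class name `TokenBucket` is illustrative; the attributes mirror the fields listed above):

```python
import time


class TokenBucket:
    """Minimal token bucket: refill lazily on each request, O(1) per call."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity               # maximum tokens the bucket holds
        self.refill_rate = refill_rate         # tokens added per second
        self.current_tokens = float(capacity)  # start full, so bursts are allowed
        self.last_refill_timestamp = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        # 1. Calculate how many tokens have accrued since the last refill.
        elapsed = now - self.last_refill_timestamp
        # 2. Update the bucket, never exceeding capacity.
        self.current_tokens = min(
            self.capacity, self.current_tokens + elapsed * self.refill_rate
        )
        self.last_refill_timestamp = now
        # 3. If a token is available, allow the request and decrement.
        if self.current_tokens >= 1:
            self.current_tokens -= 1
            return True
        # 4. Else reject (a production system might queue or delay instead).
        return False
```

Because tokens are refilled lazily from the elapsed time, no background timer thread is needed, and each request does a constant amount of work.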


Why It’s Perfect for Modern APIs

If you’re building scalable systems (a POS backend, a SaaS API), the token bucket is ideal because:

  • It handles traffic spikes gracefully.
  • It works well with Redis for distributed rate limiting.
  • It prevents brute-force attacks.
  • It ensures fair usage across users or organizations.

Final Thought

The Token Bucket Algorithm is a beautifully simple yet powerful concept:

“Control the flow, but allow flexibility.”

That balance between burst handling and rate control is why it remains one of the most widely adopted rate-limiting strategies in modern distributed systems.

If you’re building production-grade APIs, this algorithm is almost a must-know.
