Responsive exponential backoff in Go

Rate Limiting

Often when we use various cloud services we end up seeing throttling or rate limiting from the service
Various cloud services are often shared across multiple tenants and rate limiting mechanisms are used to prevent one tenant consuming much more resources than allocated/provisioned limits

Client side throttling

In order to handle the service throttling, client needs to add a delay between subsequent calls
This can achieved via a linear back off or an exponential back off

Linear back off

Simplest approach is to start with a base sleep time and add sleep time for each failure
Say 1 secs as base sleep time. For every request that is failed due to rate limiting, we add another 1 seconds
So, it will go like this 1s, 2s, 3s, 4s, 5s, ... and stops growing once request is successful

Code

// for each failure we add interval to current delay
func sleepTime(currentDelay, interval, maxInterval int64) int64 {
    currentDelay += interval
    if currentDelay > maxInterval {
        currentDelay = maxInterval
    }
    return currentDelay
}

Exponential back off

In this case, instead of increasing the sleep time by a constant time, it is multiplied by a factor
If this multiplication factor is 2 and initial sleep is 1 second
Then, it will grow exponentially as 1s, 2s, 4s, 8s, 16s, ...
This will be the better than linear back off as it can get successful delay with less iterations

Code

// for each failure we multiple the current delay by a multiplication factor
func sleepTime(currentDelay, initialDelay, maxInterval int64, multiplier float64) int64 {
    if currentDelay == 0 {
       return initialDelay
    }
    currentDelay = int64(float64(currentDelay) * multiplier)
    if currentDelay > maxInterval {
        return maxInterval
    }
    return currentDelay
}

Rate Limiting in DynamoDB

DynamoDB is a Fast, flexible NoSQL database service from AWS
DynamoDB provides two capacity modes for each table: on-demand and provisioned.
Irrespective of the capacity mode, DynamoDB will reject incoming write requests when it exceeds WCUs (Write Capacity Units)
I built this responsive exponential back off to handle this rate limiting
But, same can be used for handling rate limiting from any service
As part of a job, multiple workers are writing to the DynamoDB in parallel

Responsive Exponential back off

In addition to exponential back off, this one is responsive in nature
As we see more and more requests succeeding it will start decreasing the delay time
Down multiplier factor is used once down multiplier threshold is met. When we see N successful requests, down multiplier is applied
Additionally a RandomizationFactor is applied while scaling up and down.
Say RandomizationFactor=0.2, current delay is 1s and up multiplier 2, on next failure it can become (1 x 2) +/- 0.2 * (1 x 2) = 1.6 to 2.4
RandomizationFactor aids in getting slightly different new interval when applying multipliers

AutoSleeper

Following configurations are used for the sleeper.

type AutoSleeper struct {
  InitialInterval         time.Duration  // Used for sleeping first time
  MaxInterval             time.Duration  // Max interval for sleeping
  MaxRandomization        time.Duration  // Max randomization interval
  UpMultiplier            float64        // Multiplied for increasing the sleep time
  DownMultiplier          float64        // Multiplied for decreasing the sleep time
  RandomizationFactor     float64        // Randomize the new sleep value 
  DownMultiplierThreshold int            // Threshold for triggering sleep time reduction
}

Go Code

Source

https://github.com/sakthipriyan/go-utils/blob/main/sleeper.go

package main

import (
  "math/rand"
  "time"
)

const (
  DefaultMaxInterval             = 15 * time.Minute
  DefaultInitialInterval         = 500 * time.Millisecond
  DefaultRandomizationFactor     = 0.3
  DefaultMaxRandomization        = 2 * time.Minute
  DefaultUpMultiplier            = 1.5
  DefaultDownMultiplier          = 0.9
  DefaultDownMultiplierThreshold = 10
)

func NewAutoSleeper() *AutoSleeper {
  return &AutoSleeper{
    MaxInterval:             DefaultMaxInterval,
    InitialInterval:         DefaultInitialInterval,
    RandomizationFactor:     DefaultRandomizationFactor,
    MaxRandomization:        DefaultMaxRandomization,
    UpMultiplier:            DefaultUpMultiplier,
    DownMultiplier:          DefaultDownMultiplier,
    DownMultiplierThreshold: DefaultDownMultiplierThreshold,
  }
}

type AutoSleeperMetrics struct {
  TotalInvocation int
  TotalWentUp     int
  TotalWentDown   int
  TotalSlept      int
  TotalSleepTime  time.Duration
}

type AutoSleeper struct {
  InitialInterval         time.Duration
  MaxInterval             time.Duration
  MaxRandomization        time.Duration
  UpMultiplier            float64
  DownMultiplier          float64
  RandomizationFactor     float64
  DownMultiplierThreshold int

  metrics         AutoSleeperMetrics
  currentInterval time.Duration
  currentSuccess  int
}

func (s *AutoSleeper) GetMetrics() AutoSleeperMetrics {
  return s.metrics
}

func (s *AutoSleeper) SleepOnFailure() {
  s.metrics.TotalInvocation += 1
  s.goUp()
  s.sleep()
}

func (s *AutoSleeper) SleepOnSuccess() {
  s.metrics.TotalInvocation += 1
  if s.currentInterval == 0 {
    return
  }
  s.currentSuccess += 1
  if s.currentSuccess == s.DownMultiplierThreshold {
    s.goDown()
    s.currentSuccess = 0
  }
  s.sleep()
}

func (s *AutoSleeper) sleep() {
  s.metrics.TotalSleepTime += s.currentInterval
  s.metrics.TotalSlept += 1
  time.Sleep(s.currentInterval)
}

func (s *AutoSleeper) goDown() {
  s.metrics.TotalWentDown += 1
  interval := float64(s.currentInterval) * s.DownMultiplier
  random := getNextRandomInterval(interval, s.RandomizationFactor, float64(s.MaxRandomization))
  if random < float64(s.InitialInterval) {
    s.currentInterval = 0
    return
  }
  s.currentInterval = time.Duration(random)
}

func (s *AutoSleeper) goUp() {
  s.metrics.TotalWentUp += 1
  if s.currentInterval == 0 {
    s.currentInterval = s.InitialInterval
    return
  }
  interval := float64(s.currentInterval) * s.UpMultiplier
  random := getNextRandomInterval(interval, s.RandomizationFactor, float64(s.MaxRandomization))
  if random > float64(s.MaxInterval) {
    s.currentInterval = s.MaxInterval
    return
  }
  s.currentInterval = time.Duration(random)
}

func getNextRandomInterval(currentInterval, randomizationFactor, maxRandomization float64) float64 {
  if randomizationFactor == 0 {
    return currentInterval
  }
  delta := randomizationFactor * currentInterval
  if delta > maxRandomization {
    delta = maxRandomization
  }
  randomization := 2 * delta * rand.Float64()
  minInterval := currentInterval - delta
  return minInterval + randomization
}

Usage

s := NewAutoSleeper()
s.SleepOnFailure() // Uses up multiplier
s.SleepOnSuccess() // Uses down multiplier on N threshold

Graph of delay generated

For the following configuration,

InitialInterval = 1 millisecond
MaxInterval = 15 minutes
MaxRandomization = 2 minutes
UpMultiplier = 1.5
DownMultiplier = 0.6
RandomizationFactor = 0.0
DownMultiplierThreshold = 5

Y axis represents the delay time generated for each request
Responses succeeded are shown as blue and failed ones are shown in red
Initially, requests were failing till it reached 291.9 ms
Later point, delay of greater than 60 ms were succeeding
As shown in the graph, responsive exponential back off quickly generates suitable delay
It goes up and down as the requests were failing and suceeding respectively
Setting an appropriate value for DownMultiplierThreshold controls how frequently we want to reduce the delay
Setting an appropriate value for DownMultiplier controls how fast we want to reduce the delay
Setting an appropriate values for RandomizationFactor and MaxRandomization controls the noise introduced while scaling up/down

Usage of Responsive Exponential back off

In Auto scaling DynamoDB use case, DynamoDB will increase the capacity as it sees more load, but this will take time, not instantaneous
Each job is run by multiple workers. Workers count increase as the job started and decrease when individual tasks are completed
So this back off, initially increases the delay and tries to succeed
Once requests are succeeding, it will try to reduce the delay
This will be auto balancing as with the increase/decrease of workers as well as DynamoDB WCUs