I’ve been exhausting my Codex account’s usage limits, so I’m borrowing my friends’ ChatGPT subscriptions to use more accounts. Now, I’ve got 7. Even with more accounts, I’m still hitting my limits in the middle of my sessions. How can I be interrupted less?
Let’s start by considering some algorithms. When I start a session with codex, which account should I pick?
What’ll minimize interruption duration and frequency¹? Does it even matter? It’s not immediately obvious to me. The most popular Codex multi-accounting solution on GitHub, codex-lb, proposes three load-balancing strategies, including one that sorts accounts by the tuple (secondary_used_percent, primary_used_percent, last_selected_at, account_id) and one that weights accounts by remaining_secondary_credits.

We’ll measure these algorithms’ interruption characteristics, improve on codex-lb’s algorithms, and design our own multi-accounting Codex wrapper to use usage-limit optimizations codex-lb currently can’t.
First, we’ll need to understand how usage limits work.
An account has two usage quotas: the five-hour quota, $Q_5$, and the weekly quota, $Q_w$. Empirically, I’ve found that the five-hour quota is 12% of the weekly quota: $Q_5 = 0.12\,Q_w$.

Usage quota is replenished via two refresh timers: the time until the five-hour quota refreshes, $T_5$, and the time until the weekly quota refreshes, $T_w$. Note that refresh timers do not refresh at a fixed period! Instead, a timer first starts when an account starts a session without an active timer.
Usage quota is depleted during sessions. If the session’s account runs out of quota, the session must be paused until quota refreshes². This is how interruptions happen.

We’ll assume each algorithm must load-balance an infinite stream of sessions, arriving at rate $\lambda$ according to a Poisson process. Session $i$ consumes quota $c_i$, sampled from an exponential distribution with parameter $\mu$.
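To make the setup concrete, here’s a small Python sketch of the arrival model. The parameter values and helper name are illustrative assumptions of mine, not measurements or the widget’s source:

```python
import random

# Illustrative parameters, not measured values.
Q_WEEK = 100.0          # weekly quota, in abstract "credits"
Q_5H = 0.12 * Q_WEEK    # five-hour quota is ~12% of the weekly quota
LAM = 0.5               # session arrival rate (sessions/hour)
MU = 2.0                # session cost ~ Exp(MU), i.e. mean 1/MU credits

def sample_sessions(horizon_hours, rng):
    """Poisson arrivals with exponentially distributed quota costs."""
    t, sessions = 0.0, []
    while True:
        t += rng.expovariate(LAM)                  # inter-arrival gap
        if t > horizon_hours:
            return sessions
        sessions.append((t, rng.expovariate(MU)))  # (arrival time, cost)

rng = random.Random(0)
sessions = sample_sessions(24.0, rng)
print(len(sessions), "sessions in 24h, total cost",
      round(sum(c for _, c in sessions), 2))
```

Each trial in the widget boils down to feeding a stream like this to a load-balancing algorithm and tallying when accounts run dry.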
Try it yourself!
Given $n$ accounts, the arrival rate $\lambda$, the session-size parameter $\mu$, the quotas $Q_5$ and $Q_w$, and the refresh timers $T_5$ and $T_w$³, design an algorithm to minimize cumulative wait duration and frequency.
Your goal is to choose which account should handle an incoming session. See how your solution compares to a random baseline, codex-lb’s algorithms, and three new algorithms I’ve tossed into the ring: “greedy frequency”, “greedy duration”, and “phase targeting”.
If you’re interested, you can inspect the source for all algorithms in the widget.
The three algorithms do surprisingly well. How do they work?
“greedy frequency” picks the account with the most immediately usable quota, minimizing the chance the next session hits a limit.
“greedy duration” estimates each account’s average wait for the next session, then picks the smallest one.
“phase targeting” tries to stagger refresh timers and spends quota closer to refreshes.
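As a sketch of the “greedy frequency” idea — this is my own simplified reimplementation, not the widget’s source — picking the account with the most immediately usable quota is just an argmax:

```python
from dataclasses import dataclass

@dataclass
class Account:
    name: str
    five_hour_left: float   # remaining five-hour quota
    weekly_left: float      # remaining weekly quota

    @property
    def usable(self) -> float:
        # A session is cut off by whichever quota runs out first.
        return min(self.five_hour_left, self.weekly_left)

def greedy_frequency(accounts: list[Account]) -> Account:
    """Pick the account with the most immediately usable quota,
    minimizing the chance the next session hits a limit."""
    return max(accounts, key=lambda a: a.usable)

accounts = [
    Account("a", five_hour_left=2.0, weekly_left=40.0),
    Account("b", five_hour_left=9.0, weekly_left=3.0),
    Account("c", five_hour_left=7.0, weekly_left=30.0),
]
print(greedy_frequency(accounts).name)  # "c": min(7, 30) beats 2 and 3
```

“greedy duration” and “phase targeting” replace the `usable` score with an expected-wait estimate and a refresh-phase-aware score, respectively, but keep the same argmax shape.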
It seems like, of the codex-lb algorithms, “usage weighted” performs the best overall, despite “capacity weighted” being the default. And comparing usage weighting to the greedy algorithms, the greedy algorithms do much better. We’ll scale up from a single trial later.
The difference in performance is sometimes dramatic, sometimes not; it depends on the parameters. Specifically, on the relationship between the quota depletion rate, $d$, and the quota replenishment rate, $r$:
Intuitively, if you’re either rapidly maxing out all your usage limits or barely dipping into them, your load balancing algorithm is a weak differentiator for interruption duration and frequency.
When $d \approx r$, the relative performance of algorithms can still vary by workload. For example, usage weighting prefers many small tasks to few large tasks.
What else can we do to improve performance besides using these greedy algorithms?
At the moment, codex-lb doesn’t allow you to move sessions between accounts or proactively refresh account timers. My shell wrapper around Codex, cx, is designed to work around codex-lb’s limitations, providing smaller interruption duration and frequency.
All sessions are JSONL files under the CODEX_HOME/sessions directory. When an
account runs out of quota, we can simply copy its session file from the drained
account to a fresh one, and then resume it on the new account with
codex resume. This would be useful, since you’d only be interrupted when all
of your accounts are exhausted, instead of just one.
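Here’s a minimal Python sketch of that move, copying a session file between two accounts’ CODEX_HOME directories. The helper name and directory layout are illustrative assumptions, not codex-lb’s or cx’s actual code (cx, as described below, sidesteps the copy by sharing one sessions folder):

```python
import shutil
import tempfile
from pathlib import Path

def move_session(session: Path, src_home: Path, dst_home: Path) -> Path:
    """Copy a session's JSONL file from a drained account's CODEX_HOME
    into a fresh account's, so `codex resume` can pick it up there."""
    rel = session.relative_to(src_home / "sessions")
    dst = dst_home / "sessions" / rel
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(session, dst)
    return dst

# Demo with throwaway directories standing in for two CODEX_HOMEs.
root = Path(tempfile.mkdtemp())
drained, fresh = root / "drained", root / "fresh"
(drained / "sessions").mkdir(parents=True)
session = drained / "sessions" / "rollout.jsonl"
session.write_text('{"type":"message"}\n')
moved = move_session(session, drained, fresh)
print(moved.read_text() == session.read_text())  # True
```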
Moving a session to another account is not free, since moving sessions uncaches the context window’s prefix⁴. Broadly, it’s cheap to move and uncache fresh sessions, but expensive to move and uncache nearly finished ones.
In practice, I’ve found this cost negligible.
By sending a small session to an account, you can kickstart the account’s timer before a real session is scheduled. This is useful because it increases the algorithm’s quota replenishment rate, $r$, at effectively no cost.
Besides maximizing the frequency of each account’s refreshes, we can choose the phase of each account’s refresh. Empirically, I’ve found that staggering accounts’ refresh phases performs much better than synchronizing them, but feel free to compare.
Try it yourself!
Again, try writing your own algorithm! Also, try to predict the impact of the two above optimizations.
With session movement, phase targeting does exceptionally well. However, two runs in your browser isn’t very good evidence of what algorithm to use, so we’ll need to scale up.
I’ve averaged the performance of the algorithms over 2048 runs ahead of time. To avoid combinatorial explosion, instead of varying $\lambda$ and $\mu$ independently, we’ll fix the depletion rate $d = \lambda/\mu$, and add a granularity slider to control how chunky the tasks are.
With session movement enabled, the algorithm that performs the best, regardless of setting, is phase targeting.
By far, the best optimization for avoiding interruptions is session movement. Averaging over all options, cumulative interruption time drops from 54,120 hours to 289 hours, and interruptions from 11,252 to 114; 99.5% and 99.0% reductions, respectively!
This is interesting to me, because the largest interruption decrease did not come from abstracting codex-lb’s quota model as a theoretical computer science problem. Rather, by having a better grasp of Codex’s practical details, we could rewrite the quota model into something more amenable.
Because of these results, cx uses phase targeting, session movement, and
staggered refresh phases.
Each account’s CODEX_HOME points to a folder that contains symlinks to one
shared CODEX_HOME, except for the auth.json, which is unique per account:
# All accounts share one real CODEX_HOME; per-account homes hold symlinks.
typeset -g CX_HOME="$HOME/.codex"      # shared Codex state
typeset -g CX_AUTH="$CX_HOME/auth"     # per-account <name>.auth.json files
typeset -g CX_HOMES="$CX_HOME/homes"   # generated per-account homes

# _cx_home <account>: build a CODEX_HOME for <account> that symlinks
# everything in the shared home, then overrides auth.json.
_cx_home() {
  local home="$CX_HOMES/$1" file
  mkdir -p "$home" || return
  # zsh glob qualifiers: (N)ullglob, (D)otfiles included.
  for file in "$CX_HOME"/*(ND); do
    ln -sfn "$file" "$home/${file:t}" || return
  done
  # Point this home's auth.json at the account's own credentials.
  ln -sfn "$CX_AUTH/$1.auth.json" "$home/auth.json"
}
Now, since the sessions folder is shared, we can simply run cx resume on a
halted session to resume it on a new account. The UX is convenient since
configs, selected models, memories, skills, and whatever new per-account state
Codex adds will be shared between accounts too.
To proactively stagger refreshes across accounts, I use a straightforward cron
job that runs cx [n] exec on a dummy prompt every 43 minutes, incrementing
n.
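Why 43 minutes? Presumably because it spreads the seven accounts’ five-hour timers evenly across the window; a quick sanity check:

```python
# Spread 7 accounts' five-hour refresh kickstarts evenly across the window.
ACCOUNTS = 7          # number of accounts, per the intro
WINDOW_MIN = 5 * 60   # five-hour quota window, in minutes
offset = WINDOW_MIN / ACCOUNTS
print(round(offset))  # prints 43
```

After a full cycle of seven kickstarts (7 × 43 ≈ 300 minutes), each account’s timer restarts roughly as it expires, keeping the phases staggered.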
To summarize: session movement is by far the biggest win, the greedy and phase-targeting algorithms beat codex-lb’s strategies, and cx combines all of the above; the code is in the cx GitHub repository.

With more information about your usage, you can probably do much better than the algorithms shown in this post. For example, if you allow arbitrary time-varying distributions for session size and session arrival, you can probably trigger the refreshes more advantageously for your work schedule.
We could use other metrics, like time to first interruption, cumulative overdrawn quota, or some metric which weighs frequency and duration according to your preferred parameters. Each metric proxies a piece of the pain response I feel when I hit my quota limit, and no metric will proxy it perfectly. ↩
In practice, you might not always wait for quota to refresh. Maybe you wait if
the time to refresh is short, but otherwise you might reprompt Codex in a new
session, or copy the session over and codex resume from a new account
(shoutout albert for putting me on). We’ll model this
later. ↩
You might notice that, in this model, the algorithm doesn’t know how large the incoming session is. In the real world, this isn’t always true—you (or an automated system) could provide an estimate.
However, I don’t think the complexity incurred by incorporating size information is worth the benefit. Beyond the time and cognitive cost of a less digestible model, the benefit is low: estimating the effort it takes to complete a session is generally very hard, especially when working with AI agents.
This tradeoff may not make sense for your use case—maybe it’s easy to estimate how much quota your sessions will consume, and maybe you rarely engage in multi-turn conversations with your agents—and that’s fine. For my use case, I’d rather not. ↩
This model of moving cost is approximate, since a task’s quota cost doesn’t contain the information needed to compute its cost to move.
We could come up with a more formal model for uncaching costs by using the ratio of uncached input tokens to input tokens, but I think this would detract from the point of this post. ↩