There is this tiny statistics problem in IT security that almost nobody talks about, yet I have seen people get it wrong many times in the past: Calculation of the success probability of brute-force attacks against TOTP two-factor authenticators.
As a reminder: TOTP tokens are defined in RFC 6238 ("TOTP: Time-Based One-Time Password Algorithm"). They usually consist of six-digit strings that change every 30 seconds. The entire sequence of tokens is derived from a secret (and the current time) in a way that makes token prediction without the secret impossible. TOTP tokens are commonly used as a second factor, in addition to a user’s password, reducing the impact of a potential password compromise.
In various engagements I have seen TOTP validation implementations that do not rate-limit the process of entering and validating the TOTP token. Due to the limited search space (6 digits = “only” 1000000 possible tokens) this can render the second factor ineffective and allow attackers to bypass 2FA. When documenting that finding, it’s usually a good idea to provide the client with some numbers answering the question “How long does it take?”. However, over the years I have seen several colleagues calculate that number incorrectly, which highlights how easy it is to overlook some of the statistical nuances involved.
Naive Approach Link to heading
The naive (and wrong) approach is as follows: Assuming 100 guesses per second (rate R) and a search space of 1000000 possible tokens (6 digits, search space N) the attack will be successful after:
Unfortunately, this calculation underestimates the fact that in reality the window of opportunity (for an attacker) “jumps” every 30 seconds. That means, there is usually1 a possibility that the attacker might never guess the correct code, even after billions of guesses - if the window always “jumps” away from their next guess.
Better Calculation Link to heading
A much better approximation can be made by looking at each window individually. For each window of length T, the probability of brute-forcing (i.e. guessing without replacement) the correct code is:
In order to calculate the probability that the correct code is found after n windows, we need to take a short detour via the probability that the correct code is not found:
Finally, instead of looking at n windows, let’s plug in a time t and the window time T. This yields our final2 result:
The following plot shows this function with the parameters R = 100/s, T=30s, N=1000000. As expected, the probability approaches 100%, but never quite reaches it.
Inversion Link to heading
Often we would like to calculate at which time a desired probability p has been reached. We start by calculating the amount of windows that must pass:
Here we can find an approximation by applying the Taylor series of to the divisor:
In practice, we can almost always use this approximation. I verified that even at R = 1000/s, the curves only differ by around 10 seconds at p = 0.5.
For the parameters shown in the plot above (R = 100/s, T=30s, N=1000000), a 50% probability is reached after 115 minutes (1h 55min), a 90% probability after 384 minutes (6h 23min).
Accepting Multiple Codes Link to heading
The probability formula can be easily extended for cases where the last m-1 TOTP codes are also accepted (total accepted: m):
For the inversion, this results in:
One consequence is that by also accepting the last code, the brute-force time is halved.
Interactive Calculator Link to heading
For all fellow penetration testers who need to document this (or a similar) issue, I’ve vibe coded a TOTP Brute-Force Calculator!
As long as the attacker does not exhaust the entire search space (1000000 guesses) within one window (30 seconds). ↩︎
The exact result would be a combination of a step function with linear interpolation. The “final result” shown in this post is only an approximation. I originally wanted to include the full derivation, but this would have made the post even more boring. ↩︎