Tech Deep Dive
Dedaub Logo
YANNIS SMARAGDAKIS
14 February 2019

Bad Randomness is Even Dicier Than You Think

Co-written with Neville Grech

Bad Randomness

Trivial Exploits of Bad Randomness In Ethereum, and How To Do On-Chain Randomness (Reasonably) Well

Ethereum has been used as a platform for a variety of applications of financial interest. Several of these have a need for randomness — e.g., to implement a lottery, a competitive game, or crypto-collectibles. Unfortunately, writing a random number generator on a public blockchain is hard: computation needs to be deterministic, so that it can be replayed in a decentralized way, and all data that can serve as sources of randomness are also available to an attacker. Several exploits of bad randomness have been discussed exhaustively in the past. Next, we discuss near-trivial exploits of bad randomness, as well as ways to obtain true randomness in Ethereum.

We begin by showing how easy it often is to exploit bad randomness without complex machinery, such as being a miner or reproducing the attacked contract’s internal state. The key idea is to use information leaks inside a transaction to determine whether the outcome of a random trial favors the attacker: an intra-transaction information leak. This is, to our knowledge, a new flavor of attack. Even though it shares most elements of past attacks on randomness, it generalizes to more contracts and is more easily exploitable.

Before we discuss the interesting aspects of intra-transaction information leaks, a bit of background is useful.

Ethereum Randomness Practices and Threat Model

Much has been written on the topic of random number generation in Ethereum smart contracts. The Ethereum Yellow Paper itself suggests “[approximating randomness with] pseudo-random numbers by utilising data which is generally unknowable at the time of transacting. Such data might include the block’s hash, the blocks’s timestamp, and the block’s beneficiary address. In order to make it hard for malicious miners to control those values, one should use the BLOCKHASH operation in order to use hashes of the previous 256 blocks as pseudo-random numbers.

More recent excellent advice on anti-practices and hands-on demonstrations of good practices have helped raise the bar of random number generation in smart contracts, as have several high-profile contracts (e.g., CryptoKitties — more on that later), serving as prototypes. For instance, it is now well understood that the current block number (or contents, or gas price, or gas limit, or difficulty, or timestamp, or miner address) is not a source of randomness. These quantities can be read by any other transaction within the same mined block. Even worse, they can be manipulated if the attacker is also a miner.

Ethereum miners predict the future by inventing it. Furthermore, Ethereum, the distributed “world computer”, is much slower than a physical computer. Therefore, a miner can actively choose to invent a future (i.e., mine a block) whose “random” properties will yield a favorable outcome. In one extreme case, a miner can precompute several alternative “next blocks”, pick the one that favors him/her, and then invest in making this block the next one (e.g., by dedicating more compute power to mine more subsequent blocks).

Therefore, the current understanding of the threat model to pseudo-randomness focuses on the scenario where the attacker is a miner. Thorough, well-considered discussions often recommend avoiding randomness “[that uses] a blockhash, timestamp, or other miner-defined value.” A common guideline is that “BLOCKHASH can only be safely used for a random number if the total amount of value resting on the quality of that randomness is lower than what a miner earns by mining a single block.” (As we discuss at the end, this guideline can be both too conservative and too lax. The expected value of all bets in a single block should be used instead of the “total amount of value”.)

Even though the usual threat model considers the case of a miner, most of the block-related pseudo-random properties can be exploited a lot more easily. The interesting block-related properties of the EVM are (in Solidity syntax) block.coinbaseblock.difficultyblock.gaslimitblock.numberblock.timestamp, and blockhash. For all these, an attacker can get the same information as the victim contract by just having a transaction in the same block. (The blockhash value is only defined for the previous 256 blocks, the rest of the quantities are only defined for the current block. In both cases, all current-block transactions receive the same values for these quantities.) In this way, an attacker can replay the randomness computation of the attacked contract before deciding whether to take a random bet. Effectively the pattern becomes:

if (replicatedVictimConditionOutcome() == favorable)
   victim.tryMyLuck();

Possible? Yes. Easy? Not quite.

Although the attack just described seems trivial, in practice it requires sophistication. A typical generator of randomness in a contract is often not merely blockhash(block.number-1) or some other such block-relative quantity. Instead, a common pattern mixes a seed value with block-relative quantities — for instance:

function _getRandomNumber(uint _upper) private returns (uint) {
   _seed = uint(keccak256(_seed, 
                          block.blockhash(block.number — 1),
                          block.coinbase, 
                          block.difficulty));
   return _seed % _upper;
}

This does not make the contract less vulnerable, in principle. There is no secret in the blockchain, so even a private _seed variable can be read. But in practice this can make the attack significantly harder. A contract with several users and intense activity will see its private seed modified often enough to be much less predictable. The attacker either needs (again) to be a miner, or needs to somehow coordinate receiving non-stale external information before the attack transaction. A very interesting illustration of both kinds of attacks (both as a miner and as a transaction with external information) shows how they are possible but not before admitting: “So much for a simple solution.

It’s Easier to Ask For Forgiveness Than to Get Permission

Yet, there is a very simple, non-miner attack that has guaranteed success, even with fast-changing private seeds. The transactional model of Ethereum computation together with the public nature of all stored information make exploitation of bad random number generators near-trivial.

The general pattern is simple. All a contract needs to do to be vulnerable is to finalize in a single transaction (typically before the end of a public call) an outcome that possibly favors the attacker. (This outcome may be determined through any technique producing entropy, including hashing of past blocks, reading the current block number, etc.) The attacker simply executes code such as:

victim.tryMyLuck();
require(victim.conditionOutcome() == favorable);

In other words, the attacker can choose to commit a transaction only when the outcome of a “random” trial is favorable, and abort otherwise. The only cost in the latter case is minor: the gas spent to execute the transaction. The attack works even if there is value transfer in the tryMyLuck() trial: if the transaction aborts, its effects are reverted.

In this transaction-revert-and-retry approach, the attacker turns the code of the victim contract against itself! There is no need to emulate the victim’s randomness calculation, only to check if the result is favorable. This is information that’s typically publicly accessible, or easy for the attacker to leak out of the victim (e.g., via gas computations, as we will discuss later).

Practical Examples

There are several examples of (already with past techniques) vulnerable contracts that can be attacked more easily in the way we describe. For a vivid illustration, consider the (defunct?) CryptoPuppies Dapp. CryptoPuppies attempted to build on the CryptoKitties code base and add “rarity assessments for puppies determined by the average between initial CryptoPuppy attributes (Strength + Agility + Intelligence + Speed) / 4”. The code for the contract, however, adds (to the otherwise solid CryptoKitties contract code) a bad random number generator, combining a seed and block properties (including block.blockhash(block.number-1)block.coinbaseblock.difficulty). Furthermore, the result is readily queryable: anyone can read the attributes of a generated puppy. It is trivial for an attacker to try to breed a puppy with the desired attributes and to abort the transaction if the result is not favorable.

In other cases of vulnerable contracts, an attacker can determine a favorable outcome of a battle between dragons and knights, create pets only when they have desired features, set the damage inflicted by heros or monsters, win a coin toss, and more.

(All contract examples are collected via analysis queries on the bytecode of the entire contents of the blockchain and inspected in source or via our alpha-version decompiler at contract-library.com.)

Hiding State Does Little To Help

The benefit of the attack pattern that cancels the transaction based on outcome is that the outcome of an Ethereum computation is easy to ascertain. In most cases, the vulnerable contract exposes publicly the outcome of a “random” trial. Even when not (i.e., when the outcome of the trial is kept in private storage only) it is easy to have an intra-transaction information leak. Perhaps the most powerful technique for leaking information (regarding what a computation did) is by measuring the gas consumption of different execution paths. Given the widely different gas costs of distinct instructions, this technique is often a reliable way of determining randomness outcomes.

For illustration, consider a rudimentary vulnerable contract:

contract Victim {
   mapping (address => uint32) winners;
    … 
   function draw(uint256 betGuess) public payable {
     require (msg.value >= 1 ether);
     uint16 outcome = badRandom(betGuess);
     if (winning(outcome))
       winners[msg.sender] = outcome;
   }
 }

The contract performs an extra store in the case of a winning outcome. The attacker can trivially exploit this to leak information about the outcome, before the transaction even completes:

contract Attacker {
   function test() public payable {
     Victim v = Victim(address(<address of victim>));
     v.draw.value(msg.value)(block.number); // or any guess
     require (gasleft() < 253000); // or any number that will
                                   // distinguish an extra store
                                   // relative to the original gas
   }
 }

So, What Can One Do? The Blockhash Minus-256 Problem

We saw some of the pitfalls of bad randomness on Ethereum, but what can one do to produce truly random numbers? A standard recommendation is to go off-chain and employ external sources. These are typically either an outside “oracle” service (e.g., Oraclize), or hashed inputs by multiple users with competitive interests. Both solutions have their drawbacks: the former relies on external trust, while the latter is only applicable in specific usage scenarios and may require as much care as designing nearly any cryptographic protocol. Furthermore, the issue with randomness on Ethereum is not the entropy of the bits — after all, there are excellent sources of entropy on the blockchain, yet they are predictable. Therefore, in principle, even external solutions may be vulnerable to transaction-revert-and-retry attacks, if they have not been carefully coded.

Although off-chain solutions have great merit, an interesting question is what one can do to produce random numbers entirely on-chain. There are certainly limitations to such randomness, but it is also quite possible, under strict qualifications. The best recommendation is to use the blockhash of a “future” block, i.e., a block not-yet-known at the time a bet is placed. For instance, a good protocol (formulating a random trial as a “bet”) is the following:

  • accept a bet, with payment, register the block number of the bet transaction
  • in a later transaction, compute the blockhash of the earlier-registered block number, and use it to determine the success of the bet.

The key to the approach is that the hash used for randomness is not known at bet placement time, yet cannot change on future trials. The approach still has limitations in the randomness it can yield, because of miners, who can predict the future (at a cost). We analyze these limitations in the next section, where we collect all randomness qualifications in a single place. Before that, however, we need to consider another caveat of the approach. As mentioned earlier, the blockhash function is only defined for the previous 256 blocks. (In the non-immediate future, EIP-210 aims to change this.) Therefore, if the second step of the above protocol is performed too late (>256 blocks later) or too early (in the same transaction as the first step), the result (zero) of blockhash will be known to an attacker.

Therefore, any protocol using blockhash of “future” blocks needs to integrate extra assumptions. The most practical ones seem to be:

  • the bettor has to not only place the bet but also invoke the contract in a future transaction (within the next 256 blocks) to determine the outcome
  • if the bettor is too late (or too early) the outcome should favor the contract, not a potential attacker.

Some smart contracts have attempted to circumvent the need for the second step with solutions that may be acceptable in context. A good example is the randomizer in the CryptoKitties GeneScience contract. (This contract seems to have no publicly available source code, unlike the CryptoKitties front-end contract, so we examine its decompiled intermediate-language version.) In function mixGenes, one can see code of the form:

v22b_a = block.blockhash(varg2);
if (!v22b_a) {
  v22b_c = ((block.number & -0x100) + (varg2 & 0xff));
  if ((v22b_c >= block.number)) {
    v22b_c = v22b_c — 256;
  }
  v22b_a = block.blockhash(v22b_c);
}

That is, if the block number of the bet is older than 256 blocks back (i.e., blockhash returns zero) the current block number’s high bits are merged with the older block’s lower bits, possibly with 256 subtracted, so as to produce a block number within the 256 most recent, whose blockhash is taken.

Such code can be well exploited with the transaction-revert-and-retry approach. The benefit of hashing an unknown-at-betting-time block is lost, instead sampling a predictable quantity, whose outcome may vary upon a retry. However, retries will yield different values only every 256 blocks — once the high bits of the block number change. In the specific context of the application (where other players can breed the same crypto-kitty) this risk is probably acceptable.

Putting it All Together

Based on the above, let us consider an end-to-end recommendation for purely-on-chain randomness. Computing the blockhash of a “future” block is a pattern that can yield truly unknown bits to the current transaction, but is still vulnerable to miners: a miner can place a bet, then mine more than one version of the “future” block. Therefore, for safe use of blockhash, the expected value of the random trial for an attacker should be lower than the reward of mining a block: an attacker should never benefit from throwing away non-winning blocks. Note that this expected value may be much lower than the total stakes riding on the randomness. For instance, a bet awarding 1000 ETH with probability 1/1000 is still only worth 1ETH to an attacker. Such randomness could, therefore, be quite practical for many applications.

However, in computing the expected value of a random trial it is important to remember that bets are compounding. If a single block contains N bets (e.g., in N independent transactions, which could be by the same attacker), each for 1000ETH, and each with 1/1000 probability, the expected value of the block for the attacker is N ETH. This reasoning can be used to bound the maximum number of bets accepted in the same transaction. Unfortunately, a single contract cannot know what other bets are taken by other contracts’ transactions in a single block, and an attacker could well be targeting multiple contracts to compound bets. Therefore, the estimate will be either approximate, or too conservative, yielding very low expected values per bet. Even worse, a badly-coded contract can incentivize attackers to violate the randomness of an unrelated contract, at least temporarily. The attacker/miner has an incentive in exploiting the badly-coded, vulnerable contract, and an extra opportunity to also take bets against a contract that wouldn’t be profitable on its own. (The attacker may not be able to exploit the weaker contract more, e.g., because it has limits in the bets per block, but can fit in more transactions in the same block.) Still, such an attack is only valid until the badly-coded contract is depleted.

A back-of-the-envelope calculation of pessimistic values with the current block mining reward (3ETH) and block gas limit (8 million) suggests that an expected value of an individual bet at under 3.75E-7 ETH-per-unit-of-gas is safe for steady-state use, even if temporarily vulnerable (until depletion of other contracts). For instance, a transaction consuming 100,000 gas should result in bets with expected return at most 0.0375 ETH. (If the block was filled with such transactions, it would still be unprofitable for an attacker-miner to throw it away.) This is currently around 50x the gas cost of such a transaction, so the bet value is not unrealistically low for real applications. Again, this does not limit the payoff of the bet but the expected return. The successful bet could result in 1M ETH, but if this only happens with probability 1/27,000,000, the expected bet value is under 0.0375 ETH.

More generally, such reasoning motivates an interesting practice that we have not seen adopted so far: to make bets consume gas proportionately to their expected value. For instance, a bet with a high expected value, e.g., of 2 ETH, should be perfectly possible but should require gas nearly equal to the block gas limit (i.e., the caller should know to supply the gas and the bet contract should consume it via extra computation), so that virtually no other transactions can be part of the same block.

[Standard caveat: all analysis assumes an attacker is incentivized only to maximize his/her profit in ETH (or tokens) based on smart contract execution. There may be attack models not considered, although most conventional attacks (e.g., double spending through chain reorg) don’t seem to benefit from throwing away a block. However, notably, the assumption does not apply to an attacker willing to lose ETH to perpetrate an attack (e.g., in order to cause damages to the victim, or to disrupt the ecosystem in order to manipulate ETH exchange rates, or …). Such attack conditions are a topic for a different post, but much of Ethereum is vulnerable to such attacks.]

To summarize, our recommendation for on-chain random number generation is to follow a pattern such as:

  • Accept a bet, with payment, register the block number of the bet transaction.
  • The bettor has to not only place the bet but also invoke the contract in a future transaction (within the next 256 blocks). The contract will compute the blockhash of the earlier-registered block number, and use it to determine the success of the bet.
  • If the bettor is too late (or too early) the outcome should favor the contract, not a potential attacker.
  • The expected value of the random trial for all bets in a single block should be lower than the reward for mining a block. (You should convince yourself that this calculation works in your favor.)

This approach has the disadvantages of a delay until a bet outcome is revealed, of requiring a second transaction, and of placing severe limits on the expected value of the bet. It is, however, otherwise the only known quasi-acceptable technique for purely-on-chain randomness.