Madmax: Surviving Out-of-gas Conditions in Ethereum Smart Contracts
Ethereum is a distributed blockchain platform, serving as an ecosystem for smart contracts: full-fledged inter- communicating programs that capture the transaction logic of an account. Unlike programs in mainstream languages, a gas limit restricts the execution of an Ethereum smart contract: execution proceeds as long as gas is available. Thus, gas is a valuable resource that can be manipulated by an attacker to provoke unwanted behavior in a victim’s smart contract (e.g., wasting or blocking funds of said victim). Gas-focused vulnerabilities exploit undesired behavior when a contract (directly or through other interacting contracts) runs out of gas. Such vulnerabilities are among the hardest for programmers to protect against, as out-of-gas behavior may be uncommon in non-attack scenarios and reasoning about it is far from trivial.
In this paper, we classify and identify gas-focused vulnerabilities, and present MadMax: a static program analysis technique to automatically detect gas-focused vulnerabilities with very high confidence. Our approach combines a control-flow-analysis-based decompiler and declarative program-structure queries. The combined analysis captures high-level domain-specific concepts (such as łdynamic data structure storagež and łsafely resumable loopsž) and achieves high precision and scalability. MadMax analyzes the entirety of smart contracts in the Ethereum blockchain in just 10 hours (with decompilation timeouts in 8% of the cases) and flags contracts with a (highly volatile) monetary value of over $2.8B as vulnerable. Manual inspection of a sample of flagged contracts shows that 81% of the sampled warnings do indeed lead to vulnerabilities, which we report on in our experiment.
1 INTRODUCTION
Ethereum is a decentralized blockchain platform that can execute arbitrarily-expressive compu- tational smart contracts. Developers typically write smart contracts in a high-level language that a compiler translates into immutable low-level EVM bytecode for a persistent distributed virtual machine. Smart contracts handle transactions in Ether, a cryptocurrency with a current market
capitalization in the tens of billions of dollars. Smart contracts (as opposed to non-computational łwalletsž) hold a considerable portion of the total Ether available in circulation, which makes them ripe targets for attackers. Hence, developers and auditors have a strong incentive to make extensive use of various tools and programming techniques that minimize the risk of their contract being
attacked.
Analysis and verification of smart contracts is, therefore, a high-value task, possibly more so
than in any other programming setting. The combination of monetary value and public availability makes the early detection of vulnerabilities a task of paramount importance. (Detection may occur after contract deployment. Despite the code immutability, which prevents bug fixes, discovering a vulnerability before an attacker may exploit it could enable a trusted party to move vulnerable funds to safety.)
A broad family of contract vulnerabilities concerns out-of-gas behavior. Gas is the fuel of com- putation in Ethereum. Due to the massively replicated execution platform, wasting the resources of others is prevented by charging users for running a contract. Each executed instruction costs gas, which is traded with the Ether cryptocurrency. Since a user pays gas upfront, a transaction’s computation may exceed its allotted amount of gas. As a consequence, the Ethereum Virtual Machine (EVM) raises an out-of-gas exception and aborts the transaction. A contract that does not correctly handle the possible abortion of a transaction, is at risk for a gas-focused vulnerability. Typically, a vulnerable smart contract will be blocked forever due to the incorrect handling of out-of-gas conditions: re-executing the contract’s function will fail to make progress, re-yielding out-of-gas exceptions, indefinitely. Thus, a contract is susceptible to, effectively, denial-of-service attacks, locking its balance away.
In this work, we present MadMax1: a static program analysis framework for detecting gas-focused vulnerabilities in smart contracts. MadMax is a static analysis pipeline consisting of a decompiler (from low-level EVM bytecode to a structured intermediate language) and a logic-based analysis specification producing a high-level program model. MadMax is highly efficient and effective: it analyzes the whole Ethereum blockchain in 10 hours and reports numerous vulnerable contracts holding a total value exceeding $2.8B, with high precision, as determined from a random sample.
MadMax is unique in the landscape of smart contract analyzers and verifiers. (Section 7 contains a more detailed treatment of related work.) It is an approach employing cutting-edge static program analysis techniques (e.g., data-flow analysis together with context-sensitive flow analysis and memory layout modeling for data structures), whereas past analyzers have primarily focused on symbolic execution or full-fledged verification for functional correctness. As MadMax demonstrates, static program analysis offers a unique combination of advantages: very high scalability, universal applicability, and high coverage of potential vulnerabilities.
We speculate that past approaches have not employed static analysis techniques due to three main reasons: a) the belief that the thoroughness of static analysis is unnecessary for smart contracts since they are small in size; b) the possibility that static analysis, although thorough, can yield a high number of false positivesÐfull-fledged, less automated verification techniques may be necessary; and c) the difficulty of applying static analysis techniques uniformly, at a low level: decompiling the low-level EVM bytecode into a manageable representation is a non-trivial challenge.
MadMax addresses or disproves these objections. It provides an effective decompilation substrate for analyzing low-level EVM bytecode. MadMax exhibits high precision, due to the sophisticated modeling of the gas-focused concepts it examines. Finally, our study of the Ethereum blockchain (and the subsequent application of MadMax to it) reveals that smart contracts can significantly benefit from static analysis. Figure 1 gives an early indication, by plotting smart contract size against the Ether held. We can see that relatively complex contracts (measured in the number of basic blocks) contain most of the Ether. Hence, the potential risk compounds for sophisticated smart contracts because complex contracts are harder to get right. This observation strongly supports the use of static program analysis, which scales well to relatively complex programs.
The main contributions of our work are:
● Validation: We validate the approach for all 6.3 million contracts deployed on the entire blockchain. To our knowledge, no other work in the smart contract security literature has performed program analysis on such a number of contracts. Our analysis does not require source code to run, nor external input, and at the same time is highly scalable. The analysis reports vulnerabilities for contracts holding a total value of over $2.8B. Even though it is uncertain whether most vulnerabilities are real and how easily exploitable they might be, manual inspection of a small sample reveals over 80% precision and the existence of specific issues, which we detail.
● A decompiler from EVM bytecode to structured low-level IR: We propose the use of static program analysis directly on the EVM bytecode. Analyzing EVM bytecode is challenging due to the stack-based low-level nature of the EVM with minimal control-flow structures.
● The identification of gas-focused vulnerabilities: The semantics of limited, gas-based execution on top of smart contracts handling monetary transactions introduces a new class of vulnerabilities that does not occur in other programming language paradigms. We identify out-of-gas vulnerabilities thoroughly and explain their essence.
● Abstractions for high-level data-structures and program constructs: We construct high-level abstractions for EVM bytecode for bridging the gap between the low-level EVM and the high-level vulnerabilities. We express analysis concepts that include safely resumable loops, data structures whose size increases in repeat invocations of public functions, and recognition of nested dynamic structures in low-level memory.