Precise Static Modeling of Ethereum “Memory”
SIFIS LAGOUVARDOS, University of Athens, Greece
NEVILLE GRECH, University of Athens, Greece
ILIAS TSATIRIS, University of Athens, Greece
YANNIS SMARAGDAKIS, University of Athens, Greece
Static analysis of smart contracts as-deployed on the Ethereum blockchain has received much recent attention. However, high-precision analyses currently face significant challenges when dealing with the Ethereum VM (EVM) execution model. A major such challenge is the modeling of low-level, transient “memory” (as opposed to persistent, on-blockchain “storage”) that smart contracts employ. Statically understanding the usage patterns of memory is non-trivial, due to the dynamic allocation nature of in-memory buffers. We offer an analysis that models EVM memory, recovering high-level concepts (e.g., arrays, buffers, call arguments) via deep modeling of the flow of values. Our analysis opens the door to Ethereum static analyses with drastically increased precision. One such analysis detects the extraction of ERC20 tokens by unauthorized users. For another practical vulnerability (redundant calls, possibly used as an attack vector), our memory modeling yields analysis precision of 89%, compared to 16% for a state-of-the-art tool without precise memory modeling. Additionally, precise memory modeling enables the static computation of a contract’s gas cost.
This gas-cost analysis has recently been instrumental in the evaluation of the impact of the EIP-1884 repricing (in terms of gas costs) of EVM operations, leading to a reward and significant publicity from the Ethereum Foundation. CCS Concepts: • Software and its engineering → Formal software verification; • Theory of computation → Program analysis. Additional Key Words and Phrases: ethereum, EVM, static analysis ACM Reference Format:
Sifis Lagouvardos, Neville Grech, Ilias Tsatiris, and Yannis Smaragdakis. 2020. Precise Static Modeling of Ethereum “Memory”. Proc. ACM Program. Lang. 4, OOPSLA, Article 190 (November 2020), 27 pages. https: //doi.org/10.1145/3428258
1 INTRODUCTION
The Ethereum blockchain has enabled the management of digital assets via unsupervised autonomous agents called smart contracts. Smart contracts are among the computer programs with the most dire needs of high correctness assurance, due to their managing of high-value assets, as well as their public and immutable nature. Therefore, static analysis for Ethereum smart contracts has captured the attention of the research community in recent years [Albert et al. 2018; Feist et al. 2019; Grech et al. 2018; Mueller 2018; Tsankov et al. 2018].
The first generation of Ethereum Authors’ addresses: Sifis Lagouvardos, Department of Informatics and Telecommunications, University of Athens, Athens, Greece, sifis.lag@di.uoa.gr; Neville Grech, Department of Informatics and Telecommunications, University of Athens, Athens, Greece, me@nevillegrech.com; Ilias Tsatiris, Department of Informatics and Telecommunications, University of Athens, Athens, Greece, i.tsatiris@di.uoa.gr; Yannis Smaragdakis, Department of Informatics and Telecommunications, University of Athens, Athens, Greece, yannis@smaragd.org. Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s). © 2020 Copyright held by the owner/author(s). 2475- 421/2020/11-ART190 https://doi.org/10.1145/3428258
Static analyses have already exhibited significant successes.1 For instance, Securify [Tsankov et al. 2018] has been the basis for a leading company in the Ethereum security space, with many tens of commercial audits performed. Most static analysis tools for Ethereum operate at the binary level of contracts, as-deployed on the blockchain. This ensures that the analysis operates on all contracts in existence, regardless of whether source code is available. (Source code is not always present but not negligible either: it is available for under 25% of deployed contracts, yet for more than half of the high-value contracts.) Furthermore, operating on low-level binaries offers source-language and languageversion independence, completeness in the presence of inline assembly (which is common), and, perhaps most importantly, uniform treatment of complex language features: as analysis authors often argue, innocuous-looking source-level expressions can incur looping behavior or implicit overflow [Grech et al. 2018].
Operating at the binary level necessitates contract decompilation [eth [n. d.]; Brent et al. 2018; Grech et al. 2019; Kolinko 2018] in order to recover high-level information from the very-low-level Ethereum VM (EVM) bytecode format. This decompilation effort is non-trivial: the EVM is a stack machine with no structured information (no types or functions). Control- low (i.e., calls, returns, conditionals) is implemented as forward jumps over run-time values obtained from the stack, hence even producing a control-flow graph requires a deep static analysis [Grech et al. 2018]. Despite these challenges, sophisticated decompilers mostly succeed in recovering high-level control flow and have been the basis of the most successful static analyses. Despite the relative success of Ethereum decompilers, some low-level aspects of the deployed contract remain unaddressed. The most major such aspect is the precise modeling of transient EVM memory. “Memory” in the context of the EVM (and of this paper) refers to a data store that is transient and transaction-private, used by the runtime primarily to store values whose size is statically unknown—e.g., arrays, strings, or encoded buffers. (Memory is to be contrasted with storage: the on-blockchain persistent value store of an Ethereum smart contract.)
Memory is crucial in the execution of smart contracts. It is used for many cryptographic hash operations (SHA3 is a native instruction, operating over arbitrary-length buffers), for the arguments of external calls (i.e., calls to other contracts), for logging operations, and for returns from a public function. Crucially, any complex mapping data structure (i.e., the most common Ethereum data structures) stores and retrieves information (from storage) after hashing keys in memory buffers. Modeling EVM memory is unlike other memory management settings. There is no speed premium for locality, yet the compiler attempts to keep memory tightly packed because the contract’s execution cost (in terms of Ethereum gas consumed for memory-related operations) depends on the highest initialized memory address. For the vast majority of smart contracts, written in the Solidity language, the compiler uses a simple strategy: every allocated buffer is written after the end of the last one, updating a “free-memory” pointer. This memory management policy is a low-level aspect, introduced entirely during the compilation process. At the source-code level, memory is implicit: the values that end up stored in memory merely have dynamic-length array or string types. The goal of a precise modeling of EVM memory is to recover such source-level information (expressed in terms of arrays or strings) from the low-level address manipulation code that the compiler produces. Generally, our work makes the following contributions:
Access the full document