Dedaub
24 October 2024


Transient Storage in the wild: An impact study on EIP-1153

With the recent introduction of transient storage in Ethereum, the landscape of state management within the Ethereum Virtual Machine (EVM) has evolved once again. This latest development has prompted us at Dedaub to take a fresh look at how data is stored and accessed in the EVM ecosystem, as well as analyze how the new transient storage is used in real-world applications.

It’s important to note that even though transient storage has been fully integrated into the EVM, the transient modifier was not yet available in Solidity at the time of our analysis. Therefore, all the usage we observed goes directly through the TSTORE and TLOAD opcodes via inline assembly, meaning usage is not yet widespread and may carry a higher risk of vulnerabilities.

In this blog post, we explore the strengths and limitations of each storage type, discuss their appropriate use cases, and examine how transient storage fits into the broader ecosystem of EVM data management. If you do not need a refresher on how the EVM manages state, feel free to skip to the EIP-1153 impact analysis section.

Quick refresher of data storage and access

Storage

Storage in Ethereum refers to the persistent storage a contract holds. This storage is split into 32-byte slots, with each slot having its own index ranging from 0 to $2^{256}-1$. In total, that means a contract could potentially store up to $2^{256} \cdot 32 = 2^{261}$ bytes.

Of course, the EVM doesn’t track all of these bytes simultaneously. Instead, storage behaves like a map: a slot only exists once it is used, with its index as the key and the 32 bytes being stored or accessed as the value.

Starting with slot 0, Solidity will try to store statically sized values as compactly as possible, only moving to the next slot when a value cannot fit into the remaining space. Structs and fixed-size arrays always start on a new slot, and any items that follow them also start on a new slot, but their own members are still tightly packed.

Here are the rules as stated by the Solidity docs:

  • The first item in a storage slot is stored lower-order aligned, meaning it occupies the least significant bytes of the slot.
  • Value types use only as many bytes as are necessary to store them.
  • If a value type does not fit the remaining part of a storage slot, it will be stored in the next storage slot.
  • Structs and array data always start a new slot and their items are packed tightly according to these rules.
  • Items following struct or array data always start a new storage slot.

However, for mappings and dynamically sized arrays, there’s no guarantee on how much space they will take up, so they cannot be stored with the rest of the fixed-size values.

For dynamic arrays, the slot they are assigned stores the length of the array. The elements are then stored like a fixed-size array starting from the slot keccak256(s), where s is the slot assigned to the array. Dynamic arrays of arrays follow this pattern recursively, meaning arr[0][0] would be located at keccak256(keccak256(s)).

For mappings, the slot itself stays empty, and the value for each key is stored at keccak256(pad(key) . s), where s is the slot assigned to the mapping, . is concatenation, and the key is padded to 32 bytes if it is a value type but left unpadded if it is a string or byte array. The value at that location follows the same layout rules as any other storage type.
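As a rough illustration of the mapping rule, the sketch below builds the 64-byte preimage pad(key) . s for an address key. The helper name mapping_preimage and the sample address are hypothetical. Hashing is deliberately left out: keccak256 is not part of Python's standard library (hashlib.sha3_256 uses different padding and yields a different digest), so a real tool would pass these bytes to a Keccak-256 implementation.

```python
def mapping_preimage(key_address: str, slot: int) -> bytes:
    """Build pad(key) . s for a mapping with an address key (hypothetical helper)."""
    raw = bytes.fromhex(key_address.removeprefix("0x"))
    if len(raw) != 20:
        raise ValueError("an address is exactly 20 bytes")
    padded_key = raw.rjust(32, b"\x00")   # value types are left-padded to 32 bytes
    slot_word = slot.to_bytes(32, "big")  # the mapping's own slot as a 32-byte word
    return padded_key + slot_word         # '.' in the text is plain concatenation


# Hypothetical key, looked up in a mapping that was assigned slot 5.
preimage = mapping_preimage("0x" + "11" * 20, 5)
print(len(preimage), preimage.hex())
```

The storage slot of the value is the Keccak-256 hash of these 64 bytes.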

As an example, let’s look at a sample contract Storage.sol and view its storage:

contract Storage {
    struct SomeData {
        uint128 x;
        uint128 y;
        bytes z;
    }

    bool[8] flags;
    uint160 time;

    string title;
    SomeData data;
    mapping(address => uint256) balances;
    mapping(address => SomeData) userDatas;

    // ...
}

The command forge inspect Storage storage --pretty from Foundry can be used to view the internal layout:

| Name      | Type                                        | Slot | Offset | Bytes | Contract                |
|-----------|---------------------------------------------|------|--------|-------|-------------------------|
| flags     | bool[8]                                     | 0    | 0      | 32    | src/Storage.sol:Storage |
| time      | uint160                                     | 1    | 0      | 20    | src/Storage.sol:Storage |
| title     | string                                      | 2    | 0      | 32    | src/Storage.sol:Storage |
| data      | struct Storage.SomeData                     | 3    | 0      | 64    | src/Storage.sol:Storage |
| balances  | mapping(address => uint256)                 | 5    | 0      | 32    | src/Storage.sol:Storage |
| userDatas | mapping(address => struct Storage.SomeData) | 6    | 0      | 32    | src/Storage.sol:Storage |

All the defined values are stored starting from slot 0 in the order they are defined.

  1. First, the flags array takes up the entire first slot. Each bool only takes 1 byte to store, meaning the entire array takes 8 bytes total.
  2. The uint160 time is stored in the second slot. Even though it only takes 20 bytes to store, meaning it can fit in the remaining space of the first slot, it must start on the second slot since the first slot is storing an array.
  3. The string title takes up the entire third slot, since it is a dynamic data type. A string of up to 31 bytes is stored in the slot itself together with its length. For longer strings, the slot only stores the length, and the actual characters are stored starting at keccak256(2).
  4. Next, the entire data struct takes up 2 slots. The first slot of the struct packs both the x and y uint128 values, since they each only take 16 bytes. Then, the second slot of the struct stores the dynamic bytes value.
  5. Finally, there are two mappings, each taking up an empty slot that reserves its place in the layout. The actual mapping values are stored at keccak256(pad(key) . uint256(5)) or keccak256(pad(key) . uint256(6)) respectively.
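The walkthrough above can be reproduced with a small layout calculator. This is a simplified sketch, not the Solidity compiler's algorithm: the function name layout and the is_aggregate flag are assumptions made for illustration, and dynamic types and mappings are modeled as plain 32-byte items because each reserves exactly one slot.

```python
SLOT_BYTES = 32


def layout(variables):
    """Assign (slot, offset) to each variable, in declaration order.

    variables: list of (name, size_in_bytes, is_aggregate), where is_aggregate
    marks structs and fixed-size arrays, which always start a new slot and
    force the following item onto a new slot as well.
    """
    slot, offset = 0, 0
    placed = {}
    for name, size, is_aggregate in variables:
        # Move to a fresh slot if the item cannot share the current one.
        if offset > 0 and (is_aggregate or offset + size > SLOT_BYTES):
            slot, offset = slot + 1, 0
        placed[name] = (slot, offset)
        if is_aggregate:
            slot += -(-size // SLOT_BYTES)  # round up to whole slots
            offset = 0
        else:
            offset += size
            if offset == SLOT_BYTES:        # slot is full, continue on the next
                slot, offset = slot + 1, 0
    return placed


# Sizes follow the forge inspect table for the Storage contract.
storage_contract = [
    ("flags", 32, True),
    ("time", 20, False),
    ("title", 32, False),
    ("data", 64, True),
    ("balances", 32, False),
    ("userDatas", 32, False),
]
for name, (slot, offset) in layout(storage_contract).items():
    print(f"{name:10} slot {slot} offset {offset}")
```

Running it yields slots 0, 1, 2, 3, 5 and 6, matching the table. Feeding it two 16-byte values instead shows them sharing one slot at offsets 0 and 16, which is how x and y are packed inside the struct.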

Here’s a diagram visualizing the storage:

[Figure: storage layout diagram of the Storage contract]

If the title or z variable contains data that is longer than 31 bytes, the data is instead stored at keccak256(s), as shown by the arrows. The mapping values are stored following the rules defined above for hashing the key.

Finally, state variables can also be declared as immutable or constant. These variables cannot change after the contract is deployed, which saves gas because their values are embedded in the contract code instead of being read from storage. constant variables are fixed at compile time, and the Solidity compiler replaces each reference with the defined value during compilation. immutable variables, on the other hand, can still be assigned during construction of the contract. Once the constructor finishes, every reference in the deployed code is replaced with the assigned value.

Memory

Unlike storage, memory does not persist between transactions, and all memory values are discarded at the end of the call. Since memory is read and written in 32-byte words, Solidity aligns every element of an array or struct to its own word. So while uint8[16] nums might only occupy one 32-byte slot in storage, it takes up sixteen 32-byte words in memory. The same applies to struct members, regardless of how they are declared.

For reference types like bytes or string, a variable's data location must be specified with the memory or storage keyword, which determines whether it points into memory or storage.

Mappings and resizable arrays do not exist in memory, since constantly resizing memory would be very inefficient and expensive. Although you can allocate an array whose length is chosen at runtime using new <type>[](size), its size cannot be changed afterwards, unlike storage arrays, which support .push and .pop.

Finally, memory optimization is very important, since the gas cost for memory scales quadratically with size as memory expands, rather than linearly.
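To make the quadratic growth concrete, here is a small sketch of the EVM memory cost formula from the Yellow Paper, where a memory size of a words costs 3a + floor(a² / 512) gas in total. The function names are our own, chosen for illustration.

```python
def memory_words(size_bytes: int) -> int:
    """Number of 32-byte words needed to hold size_bytes."""
    return (size_bytes + 31) // 32


def memory_cost(words: int) -> int:
    """Total gas charged for a memory of the given size in words."""
    return 3 * words + words * words // 512


def expansion_cost(old_words: int, new_words: int) -> int:
    """Gas charged when memory grows from old_words to new_words."""
    return max(0, memory_cost(new_words) - memory_cost(old_words))


# A uint8[16] in memory occupies sixteen full words, i.e. 512 bytes.
print(memory_words(16 * 32), memory_cost(16))

# Growing from 1024 to 2048 words costs more than the first 1024 words did.
print(memory_cost(1024), expansion_cost(1024, 2048))
```

For small sizes the linear term dominates, but the second 1024 words cost noticeably more than the first 1024, which is why large memory allocations are worth avoiding.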

Stack

Like memory, stack data only exists for the current execution. The stack is very simple, being just a list of 32-byte elements that are stored sequentially one after another. It is modified using POP, PUSH, DUP, and SWAP instructions, much like stacks in standard executables. Currently, the stack only stores up to 1024 values.

Most actual calculations are done on the stack. For example, arithmetic opcodes such as ADD or MUL pop two values from the stack, then push the result of the binary operation back onto the stack.
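The toy interpreter below sketches this behavior for a handful of opcodes. It is a simplified model rather than a real EVM: the tuple-based program format and the run function are assumptions made for illustration, and gas, jumps and stack-underflow handling are left out.

```python
WORD_MASK = (1 << 256) - 1  # stack items are 256-bit words
STACK_LIMIT = 1024


def run(program):
    """Execute a list of (opcode, *operands) tuples and return the final stack."""
    stack = []

    def push(value):
        if len(stack) >= STACK_LIMIT:
            raise OverflowError("stack limit of 1024 items reached")
        stack.append(value & WORD_MASK)

    for op, *args in program:
        if op == "PUSH":
            push(args[0])
        elif op == "POP":
            stack.pop()
        elif op == "ADD":
            push(stack.pop() + stack.pop())  # wraps around modulo 2**256
        elif op == "MUL":
            push(stack.pop() * stack.pop())
        elif op == "DUP":                    # DUPn copies the n-th item from the top
            push(stack[-args[0]])
        elif op == "SWAP":                   # SWAPn swaps the top with the item n below it
            n = args[0]
            stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]
        else:
            raise ValueError(f"unsupported opcode {op}")
    return stack


# (2 + 3) * 4, evaluated the way the EVM would.
print(run([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)]))
```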

Calldata

Calldata is similar to memory and stack data in that it only exists within the context of one function call. Like memory, all values must also be padded to 32 bytes. However, unlike memory, which is allocated during contract interactions, calldata stores the read-only arguments that are passed in from external sources, like an EOA or another smart contract. It is important to note that if you want to edit the values passed in from calldata, you must copy them to memory first.

Calldata is passed in with the rest of the data during the transaction, so it must be packed properly according to the specified ABI of the function that is being called.
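As a minimal sketch of that packing, the snippet below encodes a call that takes only static arguments (addresses and unsigned integers): a 4-byte function selector followed by each argument padded to 32 bytes. The helper encode_call and the recipient address are hypothetical. The selector a9059cbb is the widely published one for transfer(address,uint256); it is hard-coded here because deriving it requires keccak256, which Python's standard library does not provide.

```python
TRANSFER_SELECTOR = bytes.fromhex("a9059cbb")  # transfer(address,uint256)


def encode_call(selector: bytes, args: list) -> bytes:
    """Encode a call with static integer-like arguments (hypothetical helper)."""
    if len(selector) != 4:
        raise ValueError("a function selector is exactly 4 bytes")
    words = [arg.to_bytes(32, "big") for arg in args]  # left-pad each value to 32 bytes
    return selector + b"".join(words)


recipient = int("22" * 20, 16)  # hypothetical 20-byte address as an integer
calldata = encode_call(TRANSFER_SELECTOR, [recipient, 1000])
print(len(calldata), calldata.hex())
```

Dynamic types such as bytes, string and dynamic arrays are encoded with offsets and lengths, which this sketch does not cover.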

Transient Storage

Transient Storage is a fairly new addition to the EVM. Solidity has only supported its opcodes since 2024, with a proper language-level implementation expected to arrive in the near future. It is meant to serve as an efficient key-value mapping that exists for the duration of an entire transaction. Its opcodes, TSTORE and TLOAD, always cost 100 gas each, making it much more gas efficient than regular storage.

The specialty of transient storage is that it persists through call contexts. This is perfect for scenarios like reentrancy guards that can set a flag in transient storage, then check if that flag has already been set throughout the context of an entire transaction. Then, at the end of the entire transaction, the guard will be wiped completely and can be used as normal in future transactions.

Despite its transient nature, it is important to note that this storage is still part of the Ethereum state. As such, it must adhere to rules and constraints similar to those of regular storage. For instance, in a STATICCALL context, which prohibits state modifications, transient storage cannot be altered, meaning only the TLOAD opcode is allowed and not TSTORE.
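The following toy model sketches these semantics: values survive nested calls, writes are rejected in a static context, and everything is wiped when the transaction ends. The class TransientStorage, the "vault" contract name and the withdraw function are all hypothetical, and the model ignores that a reverted call frame would also roll back its own TSTORE writes.

```python
class TransientStorage:
    """Per-transaction key-value store, scoped by contract address."""

    def __init__(self):
        self._slots = {}

    def tload(self, contract, key):
        return self._slots.get((contract, key), 0)  # unset keys read as zero

    def tstore(self, contract, key, value, static=False):
        if static:
            raise PermissionError("TSTORE is not allowed in a STATICCALL context")
        self._slots[(contract, key)] = value

    def end_transaction(self):
        self._slots.clear()  # nothing survives into the next transaction


def withdraw(ts, reenter):
    """Hypothetical guarded function; reenter simulates a malicious callback."""
    if ts.tload("vault", 0):
        raise RuntimeError("reentrancy detected")
    ts.tstore("vault", 0, 1)      # lock
    if reenter:
        withdraw(ts, False)       # nested call sees the flag and fails
    ts.tstore("vault", 0, 0)      # unlock, so the function is callable again


ts = TransientStorage()
withdraw(ts, reenter=False)       # a normal call succeeds
try:
    withdraw(ts, reenter=True)
except RuntimeError as err:
    print(err)
ts.end_transaction()
print(ts.tload("vault", 0))
```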

EIP-1153 impact analysis

Since transient storage is a relatively recent feature, we were able to comprehensively inspect every case of its use as of Ethereum block number 20129223. We found that from the ~250 deployed contracts containing, or having libraries containing, TSTORE or TLOAD opcodes, there were ~180 unique source files, meaning over 60 of these deployed contracts were duplicates deployed cross-chain.

Here is the recorded distribution of the usage of transient storage in these ~190 contracts:

[Figure: distribution of transient storage usage by category]

Out of the around 190 unique contracts on chain that use this feature, we were able to differentiate them into 6 general categories:

  1. First and foremost, over 50% of the usage of transient storage is on reentrancy guards. This makes sense, as reentrancy protection is the perfect use case for transient storage. It is also very easy to implement, with a simple one possibly looking like:
modifier ReentrancyGuard {
    assembly {
        // If the guard has been set, there is re-entrancy, so revert
        if tload(0) { revert(0, 0) }
        // Otherwise, set the guard
        tstore(0, 1)
    }
    _;
    // Unlocks the guard, making the pattern composable.
    // After the function exits, it can be called again, even in the same transaction.
    assembly {
        tstore(0, 0)
    }
}
  2. On the other hand, only 3.6% of the contracts used transient storage as an entrancy lock, which tracks state across calls within a single transaction to ensure that certain functions can only be called after other functions have been called. Here’s a short example:
// keccak256("entrancy.slot")
uint256 constant ENTRANCY_SLOT = 0x53/*...*/15;

function enter() external {
    uint256 entrancy = 0;
    assembly {
        entrancy := tload(ENTRANCY_SLOT)
    }
    // Entering twice in the same transaction is not allowed
    if (entrancy != 0) {
        revert("Already entered");
    }

    entrancy = 1;
    assembly {
        tstore(ENTRANCY_SLOT, entrancy)
    }
}

function withdraw() external {
    uint256 entrancy = 0;
    assembly {
        entrancy := tload(ENTRANCY_SLOT)
    }

    // Only callable after enter() has run earlier in this transaction
    if (entrancy == 0) {
        revert("Not entered yet");
    }

    // ...
}
  3. Next, around 6% of the contracts used transient storage to preserve contract context for callback functions or cross-chain transactions. This was mostly in bridge contracts, like this one here.
  4. 8.3% of the contracts used transient storage to keep a temporary copy of the contract state to verify that certain actions are authorized. For example, this contract by OpenSea temporarily stores an authorized operator, specific tokens, and amounts related to those tokens to validate that all transfers happen as they should.
  5. A bit less than 9% of the contracts used transient storage for their own specialized purposes. For example, an airdropping contract utilizes tstore as a hashmap to track and manage eligible recipients within the transaction context.
  6. 20% of contracts had no transient storage opcodes compiled into their bytecode, but referenced libraries that contained functions capable of using transient storage. Most of these libraries are OpenZeppelin internals such as their implementation of ERC1967 (see StorageSlot).

The introduction of transient storage marks a significant evolution in the EVM’s data management capabilities. Our analysis at Dedaub reveals that while it is still in its early stages of adoption, transient storage is already making a notable impact, particularly in smart contract security and efficiency.

Key takeaways from our analysis of transient storage usage include:

  • Reentrancy guards dominate the current use cases, accounting for over 50% of transient storage implementations. This highlights the immediate value developers see in using transient storage for cross-function state management within a transaction.
  • Beyond security, innovative developers are finding creative ways to leverage transient storage for storing contextual information and managing contexts throughout complex transactions.
  • The adoption of transient storage, while still limited, shows promise for improving gas efficiency and simplifying certain smart contract patterns.

Gas efficiency improvements

In our analysis of transient storage usage, we also evaluated its gas efficiency compared to regular storage. To do this, we collected the last 100 transactions for each of the contracts analyzed. For each transaction, we obtained its execution trace and used a Python script to simulate gas costs by replacing TSTORE operations with SSTORE under the same conditions (including cold load penalties and other storage rules).
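To give a feel for this kind of comparison, here is a simplified sketch. It is not the script published with this post. It prices each transient operation at 100 gas and prices the equivalent regular storage operation using the post-Berlin rules (2100 gas cold access surcharge, 20000 to set a zero slot, 2900 to change a non-zero slot, 100 for warm or already-dirty accesses). Gas refunds are ignored, and the trace format and function names are assumptions made for illustration.

```python
COLD_SLOAD = 2100    # first access to a slot in a transaction
WARM_READ = 100      # warm read, no-op write, or write to a dirty slot
SSTORE_SET = 20000   # zero -> non-zero on a clean slot
SSTORE_RESET = 2900  # non-zero -> other value on a clean slot
TRANSIENT_OP = 100   # TLOAD and TSTORE


def sstore_cost(original, current, new, cold):
    """Gas for one SSTORE, without refunds."""
    cost = COLD_SLOAD if cold else 0
    if current == new or original != current:
        return cost + WARM_READ
    return cost + (SSTORE_SET if original == 0 else SSTORE_RESET)


def compare(trace, initial=None):
    """Return (transient_gas, storage_gas) for ("load", key) / ("store", key, value) ops."""
    original = dict(initial or {})  # slot values at the start of the transaction
    current = dict(original)
    warm = set()
    transient_gas = storage_gas = 0
    for op, key, *value in trace:
        cold = key not in warm
        warm.add(key)
        transient_gas += TRANSIENT_OP
        if op == "load":
            storage_gas += COLD_SLOAD if cold else WARM_READ
        else:
            storage_gas += sstore_cost(original.get(key, 0), current.get(key, 0), value[0], cold)
            current[key] = value[0]
    return transient_gas, storage_gas


# Reentrancy guard pattern: check the flag, set it, clear it.
guard = [("load", 0), ("store", 0, 1), ("store", 0, 0)]
t_gas, s_gas = compare(guard)
print(t_gas, s_gas, f"{1 - t_gas / s_gas:.2%}")
```

Because refunds and the rest of each transaction's gas are left out, the figure this toy trace produces is only indicative and should not be compared directly with the measured results.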

The results were impressive: across all use cases, using transient storage led to an average gas savings of 91.59% compared to regular storage operations. Below, you can find a more detailed graph that shows gas savings per category. It is interesting to note that the Specialized Functionality category recorded gas savings of around 98.7%. This is because of the airdropping contract mentioned above, for which memory might have been a more appropriate baseline.

[Figure: gas savings of transient storage relative to regular storage, per category]

Conclusion

As the Ethereum ecosystem continues to evolve, we expect to see more diverse and sophisticated uses of transient storage emerge. Its unique properties – persisting across internal calls within a transaction while being more gas-efficient than regular storage – open up new possibilities for optimizing smart contract design and execution.

Below we are publishing the dataset and scripts that were used for the above post.

dump_transient_traces.zip