State Growth & State Bloat - Blockchain's Hidden Scaling Challenge

What is Blockchain State?

State refers to the information about all accounts on the blockchain: the accounts themselves, their balances, and contract code and storage. When a transaction executes, it changes some part of this state.

For example, if person A transfers tokens to person B, the balances of both A and B need to be updated. This is what it means for the state to change.

Components of State

Accounts: Addresses and their associated data
Balances: Token holdings for each account
Nonces: Transaction counters preventing replay attacks
Contract bytecode: The compiled smart contract code
Contract storage: Variables and data stored by contracts
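The components above and the A-to-B transfer described earlier can be sketched as a toy account model. This is a minimal illustration, not any client's implementation: the `Account` record and the `apply_transfer` helper are hypothetical, and signature checks, gas and fees are omitted. It shows the two ways a transaction touches state: updating existing entries, and adding a new entry when the recipient has never been seen before.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical account record holding the state components listed above."""
    nonce: int = 0
    balance: int = 0
    code: bytes = b""                              # contract bytecode (empty for user accounts)
    storage: dict = field(default_factory=dict)    # contract storage slots


def apply_transfer(state: dict, sender: str, recipient: str, amount: int, nonce: int) -> None:
    """Apply a plain value transfer, mutating `state` (address -> Account) in place."""
    acct = state.get(sender)
    if acct is None or acct.nonce != nonce:
        raise ValueError("bad nonce")              # the nonce check prevents replaying old transactions
    if acct.balance < amount:
        raise ValueError("insufficient balance")
    acct.balance -= amount
    acct.nonce += 1
    # A previously unseen recipient becomes a new state entry: this is state growth.
    state.setdefault(recipient, Account()).balance += amount
```

After `apply_transfer(state, "A", "B", 30, 0)` on a state holding only `A`, the state has two entries; a second transfer to `B` changes values but adds nothing new.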

Key Insight

Even transactions that merely modify existing state (rather than creating new accounts) leave a record in the blockchain's history. Because nodes keep this history too (sometimes called "historical state"), every on-chain transaction adds to the data the network must store.

A Helpful Analogy

Back in 2010, when Facebook was growing rapidly, it stored over 260 billion images, amounting to over 20 petabytes of data, and was adding 60 TB of data every week. To manage this explosive growth, Facebook built Haystack, a photo store that kept per-photo metadata small enough for metadata lookups to be served directly from main memory.

Blockchain state management methods follow a similar philosophy: minimize what needs to be stored, and optimize access patterns for frequently-used data.

The Problems of State Growth

As long as a blockchain is used - with accounts transacting and contracts being created - the state will continue to grow. This creates several problems:

1. Increased Node Operation Costs

Full nodes must store the entire blockchain state. As state grows, storage costs increase and hardware requirements escalate. This makes running nodes more expensive, which could lead to centralization as fewer people can afford to participate.

Ethereum state size on disk: ~245 GB
Fully synced Geth node: ~650 GB
Ethereum state growth rate: ~14 GB/week

2. Decreased Blockchain Performance

A larger state means more time for nodes to process and verify transactions. Whenever state-changing transactions occur, nodes need to read and update relevant values. As state grows, there's more data to access and more values to change, ultimately resulting in slower performance.

3. Node Synchronization Issues

New nodes must catch up with the full ledger before they can participate in the network. To make this practical, chains take "snapshots" of the state at specific points, which new nodes download and then sync forward from. If the state is too large:

Taking snapshots takes longer
During snapshot creation, new transactions keep adding data
This discrepancy makes synchronization difficult
Nodes that fall behind face significant time and cost to catch up
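The catch-up problem in the list above reduces to simple arithmetic. The sketch below is a hypothetical model that assumes constant rates: while a node downloads a snapshot, the chain keeps producing data, and the node only reaches the tip if it can replay new data faster than the chain produces it. The function name and parameters are illustrative, not taken from any client.

```python
import math


def catch_up_seconds(snapshot_bytes: float, download_bps: float,
                     chain_bps: float, replay_bps: float) -> float:
    """Seconds until a new node reaches the chain tip (hypothetical constant-rate model).

    snapshot_bytes: size of the snapshot to download
    download_bps:   bytes/s at which the node downloads the snapshot
    chain_bps:      bytes/s of new chain data produced by the network
    replay_bps:     bytes/s at which the node can process new blocks
    """
    download_time = snapshot_bytes / download_bps
    backlog = chain_bps * download_time        # data produced while downloading
    if replay_bps <= chain_bps:
        return math.inf                        # the node can never close the gap
    return download_time + backlog / (replay_bps - chain_bps)
```

With a 100-unit snapshot downloaded at 10 units/s, a chain producing 1 unit/s and a replay rate of 3 units/s, the node needs 10 s to download and 5 s more to clear the backlog. A larger snapshot or a faster-growing chain lengthens both terms.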

State Bloat

The problem of the state becoming too large is called state bloat. If transaction throughput increases without database improvements, state bloats further, preventing the benefits of higher throughput from being realized.

Heaviest Contributors to Ethereum State

Data from Paradigm shows that ERC-20 and ERC-721 tokens are the heaviest contributors to Ethereum's state. Each token contract stores balances for potentially millions of holders, creating enormous state footprints.
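A back-of-the-envelope estimate shows why token contracts dominate. The sketch assumes each balance occupies one EVM storage slot (a 32-byte key and a 32-byte value) and, for ERC-721, an OpenZeppelin-style layout with one owner slot per token plus one balance slot per holder. It ignores Merkle Patricia Trie nodes and database overhead, which add more on top. The function names are hypothetical.

```python
SLOT_BYTES = 32  # EVM storage keys and values are both 32-byte words


def slot_bytes(num_slots: int) -> int:
    """Raw key + value bytes for `num_slots` storage slots (trie overhead ignored)."""
    return num_slots * 2 * SLOT_BYTES


def erc20_balances_bytes(num_holders: int) -> int:
    # Assumption: one balances[holder] slot per holder with a non-zero balance.
    return slot_bytes(num_holders)


def erc721_bytes(num_tokens: int, num_holders: int) -> int:
    # Assumption: owners[tokenId] slots plus balances[holder] slots.
    return slot_bytes(num_tokens + num_holders)
```

Under these assumptions a single ERC-20 with one million holders accounts for about 64 MB of raw slot data, and that footprint persists for as long as the balances are non-zero.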

Fast Chains & Accelerated State Growth

Fast blockchains face a unique challenge: the faster you process transactions, the faster state grows. If a chain like Sei processes more transactions in a given time, its state grows much more rapidly than that of slower chains.

The Parallel Execution Paradox

Adding parallel execution makes this worse. If you execute transactions in parallel without database improvements, the state bloats even faster, causing the problems mentioned above. These issues ultimately prevent the benefits of parallel execution from being realized.

This is why high-performance chains like Monad, Sei, and Fuel have invested heavily in custom database solutions - they recognized this challenge from the beginning.

Ethereum: Verkle Trees & Statelessness

Ethereum uses Merkle Patricia Tries (MPT) to store data such as accounts, smart contracts, transactions, and receipts. The tree structure visually resembles an inverted tree with a single root at the top and branches leading down to the leaves.

The Witness Problem

MPTs can hold large amounts of data under a single root hash, and any value in the tree can be proven against that root with a proof called a "witness". However, as the tree grows, witnesses grow too. In worst-case scenarios, current witness sizes can range between 18-47 MB.

Why does this matter? A witness needs to be transferred between validators fast enough to be received and processed within the block time (12 seconds). Larger witnesses slow down transfers and increase verification times.
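To see why witnesses grow with the tree, here is a simplified binary Merkle tree. It uses SHA-256 and a binary layout for brevity, whereas Ethereum's MPT is hexary and uses Keccak-256, so treat it as an illustration of the principle only. A witness for one value is the list of sibling hashes on the path from leaf to root, so its size grows with tree depth, and a block witness must cover every key the block touches. All function names are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_levels(leaves: list) -> list:
    """Return every level of the tree, leaf hashes first and [root] last."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]        # pair the last node with itself on odd levels
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def prove(levels: list, index: int) -> list:
    """Witness for leaf `index`: one (sibling_hash, node_is_left_child) pair per level."""
    proof = []
    for level in levels[:-1]:
        sib = index ^ 1
        sibling = level[sib] if sib < len(level) else level[index]
        proof.append((sibling, index % 2 == 0))
        index //= 2
    return proof


def verify(root: bytes, leaf: bytes, proof: list) -> bool:
    node = h(leaf)
    for sibling, is_left in proof:
        node = h(node + sibling) if is_left else h(sibling + node)
    return node == root
```

For 1,000 leaves the witness holds 10 hashes (320 bytes); every doubling of the tree adds one more hash per proven value.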

Verkle Trees (EIP-6800)

Ethereum is working on Verkle trees as an alternative data structure. Verkle tree witnesses are significantly smaller because:

Smaller hierarchy of intermediate nodes
Reduced distance between leaf nodes and root node
Vector commitments (Pedersen commitments with inner-product-argument proofs) produce more compact proofs than lists of sibling hashes
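The effect of a wider tree on witness size can be estimated roughly. The sketch below compares, per proven key, a hexary MPT (counting only the up-to-15 sibling hashes of 32 bytes at each level; real RLP-encoded branch nodes are slightly larger) with a width-256 Verkle tree (one 32-byte commitment per level). It deliberately ignores the Verkle multiproof, whose size stays roughly constant per block, so these are order-of-magnitude figures under stated assumptions, not measured values. Function names are hypothetical.

```python
def tree_depth(num_keys: int, width: int) -> int:
    """Smallest depth d such that width**d >= num_keys (integer arithmetic, no float logs)."""
    depth, capacity = 0, 1
    while capacity < num_keys:
        capacity *= width
        depth += 1
    return depth


def mpt_branch_bytes(num_keys: int) -> int:
    # Hexary trie: up to 15 sibling hashes of 32 bytes per level on the path.
    return tree_depth(num_keys, 16) * 15 * 32


def verkle_branch_bytes(num_keys: int) -> int:
    # Width-256 tree: roughly one 32-byte commitment per level on the path.
    return tree_depth(num_keys, 256) * 32
```

For about 4.3 billion keys (2**32), this gives 3,840 bytes per key for the MPT path versus 128 bytes for the Verkle path: both fewer levels and far less data per level.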

These upgrades help manage state growth and enable faster, more cost-efficient state access.

Node Pruning

To avoid constantly running out of disk space, Ethereum clients like Geth support state pruning. Geth uses its snapshot of the current state to determine which stored trie nodes are stale (no longer part of the recent state) and deletes them to make the database more compact.
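Conceptually, the pruning described above is a mark-and-sweep: mark every trie node reachable from the state roots you want to keep, then delete everything else. The sketch below is a simplified illustration of that idea, not Geth's code; it models the database as a hypothetical dict from node hash to child hashes.

```python
def prune(db: dict, keep_roots: list) -> int:
    """Delete trie nodes unreachable from `keep_roots`; return how many were removed.

    db: hypothetical node store mapping node_hash -> list of child node hashes.
    """
    reachable = set()
    stack = list(keep_roots)
    while stack:                                   # mark phase: walk from the kept roots
        node = stack.pop()
        if node in reachable or node not in db:
            continue
        reachable.add(node)
        stack.extend(db[node])
    stale = [key for key in db if key not in reachable]
    for key in stale:                              # sweep phase: drop everything unmarked
        del db[key]
    return len(stale)
```

Nodes shared between an old and a new state root (like `a` in the test below) survive, because they are still reachable from the kept root.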

Statelessness Vision

The ultimate goal is "stateless clients" - nodes that don't need to store the entire state to validate blocks. Instead, each block would include the witnesses needed to verify it. This dramatically reduces node requirements.

Monad: MonadDB

Given that Monad executes transactions in parallel, it requires a database that supports many simultaneous read and write operations. The databases commonly used by Ethereum clients, such as LevelDB and RocksDB, don't natively support asynchronous I/O.

The Problem with Traditional DBs

If Ethereum optimistically executed transactions in parallel, its synchronous database operations would be a bottleneck. Every read/write operation would block, negating the benefits of parallel execution.

MonadDB Solution

MonadDB is purpose-built for parallelized execution:

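Patricia Trie on disk and in memory: The trie is implemented natively rather than on top of a generic key-value store, allowing more efficient updates and verification than Ethereum's MPT
Async I/O via io_uring: io_uring, a Linux kernel interface for asynchronous I/O, lets reads and writes be submitted without blocking the executing thread
Reduced kernel contention: With synchronous I/O, every read or write is a blocking system call, so more parallelism means more threads waiting in the kernel and more scheduling and synchronization overhead. io_uring avoids much of this.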

Why io_uring Matters

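When RocksDB performs a read or write, the calling thread blocks until the operation completes. To run many operations at once, a synchronous database needs many threads, and executing transactions in parallel multiplies them, causing CPU contention from context switches and synchronization. io_uring instead lets a few threads queue many read/write operations through shared submission and completion queues, so the operations proceed simultaneously without this overhead.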
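The benefit of keeping many I/O operations in flight can be illustrated with Python's `asyncio`. This is not io_uring and does not touch a real disk: `asyncio.sleep` and `time.sleep` stand in for read latency, and `IO_LATENCY`, `read_key` and the other names are hypothetical. Blocking reads take roughly the sum of their latencies, while overlapped reads take roughly the longest single latency.

```python
import asyncio
import time

IO_LATENCY = 0.05  # hypothetical per-read latency in seconds


def blocking_read(key: str) -> str:
    time.sleep(IO_LATENCY)              # the thread is stuck until the "disk" answers
    return f"value-of-{key}"


def read_many_blocking(keys: list) -> list:
    return [blocking_read(k) for k in keys]


async def read_key(key: str) -> str:
    await asyncio.sleep(IO_LATENCY)     # yields while the "disk" works, so other reads proceed
    return f"value-of-{key}"


async def read_many_async(keys: list) -> list:
    # Submit every read at once and wait for all completions.
    return await asyncio.gather(*(read_key(k) for k in keys))
```

With eight keys, the blocking version takes about 0.4 s and the overlapped version about 0.05 s on a single thread, which is the effect MonadDB relies on io_uring to achieve for real disk I/O.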

Sei: SeiDB & Modular Storage

SeiDB takes a modular approach to state storage, dividing it into two layers optimized for different access patterns.

Dual-Layer Architecture

State Commitment (SC) Layer

Manages active or "warm" state that is frequently accessed. Stored in Memory-Mapped IAVL Tree (MemIAVL) to optimize access, enabling faster reads and writes.

State Storage (SS) Layer

Stores historic or "cold" state data in DBs such as PebbleDB, RocksDB, or SQLite. Validators can choose based on their requirements.

Asynchronous Pruning

Sei asynchronously prunes state data, removing stale information without blocking transaction processing. This keeps the active state lean while maintaining historical data availability.
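The SC/SS split and background pruning described above can be sketched as a toy store. This is a minimal illustration under heavy assumptions: plain dicts stand in for MemIAVL and PebbleDB, a thread stands in for Sei's asynchronous pruning, and the class and method names are hypothetical. Current-state reads go to the small "latest" layer, historical reads go to the versioned layer, and pruning trims history outside a retention window.

```python
import threading


class TwoLayerStore:
    """Toy SC/SS split: `latest` plays the warm layer, `history` the cold layer."""

    def __init__(self, keep_versions: int):
        self.latest = {}    # "SC" layer: current value of every key
        self.history = {}   # "SS" layer: key -> [(version, value), ...] in ascending order
        self.version = 0
        self.keep_versions = keep_versions
        self._lock = threading.Lock()

    def commit(self, writes: dict) -> int:
        with self._lock:
            self.version += 1
            for key, value in writes.items():
                self.latest[key] = value
                self.history.setdefault(key, []).append((self.version, value))
            return self.version

    def get(self, key):
        # Hot path: reads current state only and never takes the pruning lock.
        return self.latest.get(key)

    def get_at(self, key, version):
        # Historical query: newest write at or before `version`.
        with self._lock:
            for ver, value in reversed(self.history.get(key, [])):
                if ver <= version:
                    return value
        return None

    def prune_async(self) -> threading.Thread:
        # Pruning runs on a background thread instead of inside commit().
        worker = threading.Thread(target=self._prune)
        worker.start()
        return worker

    def _prune(self):
        with self._lock:
            cutoff = self.version - self.keep_versions
            for entries in self.history.values():
                # Keep the newest entry at or below the cutoff so reads inside the
                # retention window still resolve; drop anything older than that.
                older = [i for i, (ver, _) in enumerate(entries) if ver <= cutoff]
                if len(older) > 1:
                    del entries[: older[-1]]
```

A production system would avoid a single global lock; it is used here only to keep the sketch correct.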

Fuel: State Rehydration & Predicates

Fuel has arguably the most innovative state growth management methods. Unlike Ethereum, Monad, and Solana, which use an account-based model, Fuel uses the UTXO model (like Bitcoin).

UTXO Advantage

Unlike accounts, which hold balances and internal contract logic, UTXOs are independently trackable units of state that are created once and spent once. This simplifies the data structure and keeps the stored data lean, which limits state growth.
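The UTXO idea can be shown with a minimal sketch: the state is just a set of unspent outputs, and a transaction deletes the outputs it spends and adds the ones it creates. Signature and predicate checks are omitted, and `Output` and `apply_tx` are hypothetical names, not Fuel's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Output:
    owner: str
    amount: int


def apply_tx(utxos: dict, tx_id: str, inputs: list, outputs: list) -> None:
    """Spend `inputs` and create `outputs`. utxos maps (tx_id, index) -> Output.

    Ownership checks (signatures or predicates) are deliberately omitted.
    """
    if len(set(inputs)) != len(inputs):
        raise ValueError("duplicate input")
    spent = []
    for ref in inputs:
        if ref not in utxos:
            raise ValueError(f"unknown or already spent: {ref}")
        spent.append(utxos[ref])
    if sum(o.amount for o in outputs) > sum(o.amount for o in spent):
        raise ValueError("outputs exceed inputs")
    for ref in inputs:
        del utxos[ref]                      # spent outputs leave the state entirely
    for i, out in enumerate(outputs):
        utxos[(tx_id, i)] = out             # each new output is an independent entry
```

Because spent outputs are removed rather than updated in place, the live state only contains value that can still be spent.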

Three Primary Methods

1. Native Token Standards

Ethereum uses token standards like ERC-20 implemented as layered smart contracts. Fuel integrates assets directly into the core protocol as native elements.

Eliminates additional state footprint from external contracts
Transferring an asset affects only one database key-value pair
Eliminates state changes from approval/transferFrom functions

2. State Rehydration

Instead of storing the entire state on-chain, developers can decompose smart contract states into smaller segments and store minimal records or root hashes.

Each smart contract relies on localized state trees
State elements are "rehydrated" from external sources when needed
More efficient than storing everything on-chain
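State rehydration as described above can be sketched as a contract that stores only a root hash on-chain while callers supply the relevant state segment from off-chain storage. This is a hypothetical illustration, not Fuel's implementation: segments are small dicts, the root is a hash over the segment hashes, and `RehydratedContract`, `seg_hash` and `root_of` are invented names. With many segments, a Merkle tree would let callers send only a logarithmic number of hashes.

```python
import hashlib
import json


def seg_hash(segment: dict) -> bytes:
    return hashlib.sha256(json.dumps(segment, sort_keys=True).encode()).digest()


def root_of(seg_hashes: list) -> bytes:
    return hashlib.sha256(b"".join(seg_hashes)).digest()


class RehydratedContract:
    def __init__(self, segments: list):
        # Only this 32-byte root would live on-chain; the segments live elsewhere.
        self.root = root_of([seg_hash(s) for s in segments])

    def update(self, index: int, segment: dict, seg_hashes: list, key, value):
        """Rehydrate one segment from caller-supplied data, modify it, store the new root."""
        if seg_hashes[index] != seg_hash(segment) or root_of(seg_hashes) != self.root:
            raise ValueError("supplied state does not match on-chain root")
        new_segment = {**segment, key: value}
        new_hashes = list(seg_hashes)
        new_hashes[index] = seg_hash(new_segment)
        self.root = root_of(new_hashes)
        return new_segment, new_hashes      # caller keeps these off-chain for next time
```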

3. Predicates & Scripts

Transaction authorization and execution use stateless mechanisms:

Predicates: Authorize on-chain actions without accessing global state
Scripts: On-chain logic embedded within transactions, discarded after execution
Instead of storing predicate code on-chain, the predicate's address is derived from a hash of its bytecode
The full bytecode is included in the transaction that uses it, rehydrating the code only when needed
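The hash-derived address described above can be sketched as follows. This uses plain SHA-256 of the bytecode as a simplifying assumption; Fuel's exact address derivation is not reproduced here. `predicate_address`, `check_predicate_input` and the `run` callable (which stands in for executing the predicate) are hypothetical. The point is that nothing is stored on-chain: the spender supplies the bytecode, and the node checks it against the address before running it.

```python
import hashlib
from typing import Callable


def predicate_address(bytecode: bytes) -> str:
    # Assumption: plain SHA-256 of the bytecode stands in for Fuel's real derivation.
    return hashlib.sha256(bytecode).hexdigest()


def check_predicate_input(address: str, supplied_bytecode: bytes,
                          run: Callable[[bytes], bool]) -> bool:
    """Accept a spend only if the supplied code hashes to the address and evaluates true."""
    if predicate_address(supplied_bytecode) != address:
        return False                        # wrong code: it cannot unlock this address
    return run(supplied_bytecode)           # hypothetical stateless evaluation
```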

Fuel's Philosophy

Fuel's approach is fundamentally different: minimize what must be stored permanently, and provide mechanisms to reconstruct state when needed. This trades some complexity for dramatically reduced state growth.

Economic Approaches to State Management

Beyond technical solutions, some blockchains use economic mechanisms to incentivize optimal state management. The core idea: charge users for storage rather than placing the cost on validators and future users.

State Rent Concept

State rent charges users on an ongoing basis for the storage their data occupies. This:

Discourages unnecessary state creation
Pushes developers to use state efficiently
Encourages cleanup of unused accounts/data
Shifts storage costs to those who benefit from storage
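The core arithmetic of state rent is simple, and it is what produces the incentives listed above. In the sketch below, the per-byte-per-epoch rate and both function names are hypothetical. Rent scales with both size and time, so a larger account pays more, and a fixed balance funds fewer epochs.

```python
def rent_owed(size_bytes: int, epochs: int, rate: int) -> int:
    """Rent in the smallest currency unit; `rate` is per byte per epoch (hypothetical)."""
    return size_bytes * epochs * rate


def epochs_funded(balance: int, size_bytes: int, rate: int) -> int:
    """Whole epochs an account's balance can pay for before it falls short."""
    return balance // (size_bytes * rate)
```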

UX Challenges

Pure state rent models have UX issues. Solana initially charged state rent and evicted an account's data once its balance dropped to zero. This led to complexity and user confusion, so Solana moved away from this model; new accounts must now hold a rent-exempt minimum balance.

Solana's State Compression

After moving away from pure state rent, Solana has reintroduced state management through several innovative approaches.

Lightweight Simple Rent (LSR)

LSR implements a bonding curve for rent rates where the rent price increases as state size approaches hardware limits.

Discourages state bloat through economics
Pushes developers to use state efficiently
Encourages discarding unused accounts
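The exact LSR curve is not specified here, so the sketch below uses a hypothetical shape purely to show the behavior described above. The rate starts at a base value and rises steeply as total state approaches the hardware limit: base_rate / (1 - utilization).

```python
def rent_rate(state_bytes: int, capacity_bytes: int, base_rate: float) -> float:
    """Hypothetical bonding curve: rent per byte rises sharply as state nears capacity."""
    utilization = state_bytes / capacity_bytes
    if utilization >= 1:
        raise ValueError("state exceeds hardware capacity")
    return base_rate / (1 - utilization)
```

Under this curve, half-full state costs twice the base rate and three-quarters-full state costs four times as much, so creating new state gets expensive well before the limit is reached.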

Hot Account Management

Frequently accessed "hot accounts" must burn a portion of their rent balance to remain in the cache. This ensures that lamports (the smallest unit of SOL) are spent on accounts that are actively used.

Chilly: Runtime Cache Management

Chilly implements Least Recently Used (LRU) cache for account data:

Frequently accessed accounts remain in memory
Less frequently used accounts move to disk
Determines when accounts are "cold" and should leave memory
Maintains optimal balance between RAM and disk usage
Uses "load_limit" to prioritize transactions by memory usage
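The LRU behavior listed above can be sketched with Python's `OrderedDict`. This is a generic LRU cache, not Chilly's code. `AccountCache` is a hypothetical name, a plain dict stands in for disk, and the `load_limit` prioritization is not modeled. Each access moves an account to the "recent" end, and when memory is full, the least recently used account is written back and evicted.

```python
from collections import OrderedDict


class AccountCache:
    """Generic LRU cache for account data; `disk` is a dict standing in for storage."""

    def __init__(self, capacity: int, disk: dict):
        self.capacity = capacity
        self.disk = disk
        self.mem = OrderedDict()            # ordered from least to most recently used

    def get(self, addr):
        if addr in self.mem:
            self.mem.move_to_end(addr)      # hot: mark as most recently used
            return self.mem[addr]
        value = self.disk[addr]             # cold: load from disk into memory
        self.put(addr, value)
        return value

    def put(self, addr, value):
        self.mem[addr] = value
        self.mem.move_to_end(addr)
        if len(self.mem) > self.capacity:
            cold_addr, cold_value = self.mem.popitem(last=False)
            self.disk[cold_addr] = cold_value   # write back before evicting from memory
```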

State Compression (Avocado)

Solana's compression plan has two parts:

State Compression

Account data is compressed by replacing it with a hash. Anatoly Yakovenko (Solana co-founder) noted that over 75% of accounts haven't been accessed in six months, and that compressing them could reduce snapshot size by 50%.

State can be decompressed when required, similar to loading a program. Decompression costs the same as setting up a new account.
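Compression and decompression as described above can be sketched as a hash commitment. In this hypothetical illustration (`CompressedAccounts` is an invented name, and SHA-256 is an assumption), validators keep only a 32-byte hash for a compressed account. Whoever wants the account back must supply data that matches the hash. The decompression fee is not modeled.

```python
import hashlib


class CompressedAccounts:
    def __init__(self):
        self.live = {}          # addr -> full account data held by validators
        self.compressed = {}    # addr -> 32-byte hash kept in place of the data

    def compress(self, addr):
        data = self.live.pop(addr)
        self.compressed[addr] = hashlib.sha256(data).digest()

    def decompress(self, addr, data: bytes):
        # The requester supplies the original bytes from off-chain storage.
        if hashlib.sha256(data).digest() != self.compressed.get(addr):
            raise ValueError("data does not match stored hash")
        del self.compressed[addr]
        self.live[addr] = data
```

A 1,000-byte account shrinks to a 32-byte hash while compressed, which is where the snapshot savings come from.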

Index Compression

Uses a binary tree structure to store accounts, with incentives for validators to participate in state compression.

Sui & Aptos: Storage Funds

Sui and Aptos take similar approaches with upfront storage fees and rebate mechanisms.

Sui's Storage Fund

On Sui, users pay upfront fees for both computation and storage:

Storage fees go into a storage fund
SUI in the fund is used as stake, earning rewards
Rewards split between current validators and reinvestment
Future validators are rewarded for carrying past state weight
Principal SUI remains intact
Deleting data provides partial refund
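The storage fund flow above can be sketched as a small ledger. The storage price and the rebate share (in basis points) are hypothetical parameters, not Sui's protocol values. Staking rewards on the fund are not modeled. The fund collects the fee upfront, pays back most of it on deletion, and keeps the remainder.

```python
class StorageFund:
    """Toy ledger for upfront storage fees and partial refunds on deletion."""

    def __init__(self):
        self.balance = 0

    def store(self, size_bytes: int, price_per_byte: int) -> int:
        fee = size_bytes * price_per_byte
        self.balance += fee                 # the fee stays in the fund while the data exists
        return fee

    def delete(self, fee_paid: int, rebate_bps: int) -> int:
        # rebate_bps: hypothetical refunded share in basis points (10_000 = 100%).
        rebate = fee_paid * rebate_bps // 10_000
        self.balance -= rebate
        return rebate
```

Storing 1,000 bytes at a price of 100 costs 100,000 units; deleting it with a 99% rebate returns 99,000 and leaves 1,000 in the fund.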

What Can Be Deleted?

Objects holding metadata or event data (such as auctions or tickets) can be deleted. Transaction history data remains intact.

Sui Pruning Policies

Aggressive Pruning: Remove old data ASAP, minimal disk usage
Epoch-based Pruning: Retain data for specified epochs before pruning

For pruned data, Sui provides fallback retrieval from a remote key-value store managed by Mysten Labs.

Aptos Storage Deposit

Similar to Sui, Aptos separately charges for storage along with execution:

Data can be deleted with full refund (declining rates likely coming)
Exploring ephemeral storage with time-to-live (TTL)
Resources automatically deleted after expiration
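The deposit and TTL ideas above can be sketched together. This is a hypothetical illustration only; Aptos's ephemeral storage is still being explored, and `EphemeralStore` and its methods are invented names. Explicit deletion refunds the full deposit, as described above, and entries past their expiry become unreadable and are removed by `expire`. What would happen to the deposit on expiry is left unspecified here.

```python
class EphemeralStore:
    def __init__(self):
        self.items = {}   # key -> (value, expires_at, deposit)

    def put(self, key, value, now: int, ttl: int, deposit: int):
        self.items[key] = (value, now + ttl, deposit)

    def get(self, key, now: int):
        entry = self.items.get(key)
        if entry is None or now >= entry[1]:
            return None                     # missing or past its time-to-live
        return entry[0]

    def delete(self, key) -> int:
        _, _, deposit = self.items.pop(key)
        return deposit                      # explicit deletion: full refund

    def expire(self, now: int) -> list:
        expired = [k for k, (_, exp, _) in self.items.items() if now >= exp]
        for k in expired:
            del self.items[k]               # automatic removal after expiry
        return expired
```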

Jellyfish Merkle Tree (JMT)

Aptos uses JMT, a version of Sparse Merkle Tree optimized for parallel execution:

Modified leaf node structures for lower I/O overhead
Better data structure for computational efficiency
Layered storage: warm state in performant memory, cold state in archive

Economic + Technical

The most effective approaches combine economic incentives (storage fees, deletion rebates) with technical optimizations (efficient trees, tiered storage). Neither alone fully solves state growth.