Optimized Sparse Merkle Tree.

A binary STM with leaf shortcuts for a faster batch update time.

6 min readFeb 18, 2019

In the previous article we introduced a standard Sparse Merkle tree implementation and mentioned that the update time could be greatly reduced with a simple optimization : create leaf nodes to reduce the number of hash operations necessary to update the tree.

This optimized SMT is the tree used in the Aergo StateTrie for state authentication of the Aergo blockchain which was already released a few months ago. But for the sake of completeness I will review it here before writing in detail about Merkle proofs for the Optimised SMT and Ethereum’s Modified Patricia tree in the next articles.

Features :

This optimized SMT keeps the same features as the Standard SMT implementation while reducing hash computation for updates.

Efficient Merkle proof verification (binary tree structure)
Efficient database reads and storage through node batching
Simultaneous update of multiple keys with goroutines
Reduced data storage (leaf nodes for subtrees containing 1 key)
Reduced hash computation (leaf nodes for subtrees containing 1 key)

Modification of the Sparse Merkle Tree

To reduce the number of hashing operations necessary to update a key in a tree, we created leaf nodes. A leaf node is stored at the highest subtree that contains only 1 non-default key. So, the hashing of the tree starts from the height of leaf nodes instead of height 0. If the tree contains N random keys, then on average, leaf nodes will be created around height = log(N).

Figure 1. Modified SMT containing 3 keys. The left subtree contains only 1 key so it is stored as a leaf node (red).

We also add the following property to the hash function : H(0,0) = 0. Looking at the example above, D1 = Hash(D0,D0) = Hash(byte(0),byte(0)) = byte(0) = D2 =…= D256.

Merkle Proofs

Using a binary tree gives us very simple and easy-to-use Merkle proofs.
On the diagram above, the Merkle proof of the red key is composed of the node with a red circle: [h3]
In case of the standard SMT that proof would have been [D0, D1, D2, h3]

Compressed Merkle proofs

Like in the standard sparse Merkle tree, Merkle proofs can also be compressed. We can use a bitmap and set a bit for every index that is not default in the proof. The proof that the blue LeafNode1 is included in the tree is: [LeafNode2, D1, D2, LeafNode]. This proof can be compressed to 1001[LeafNode2, LeafNode]. The verifier of the compressed Merkle proof should know to use D1 to compute h2 because the second bit of the bitmap is 0, and D2 for the third proof element, etc.

Proofs of non-inclusion

There are 2 ways to prove that a key is not included in the tree :
- prove that the Leaf node of another key is included in the tree and is on the path of the non-included key.
- prove that a default node (byte(0)) is included in the tree and is on the path of the non-included key.

For example, a proof that key=0000 is not included in the tree is a proof that LeafNode is on the path of key and is included in the tree. A proof that key=1111 is not included in the tree is a proof that D2 is on the path of the key and is included in the tree.

Deleting from the tree

When a leaf is removed from the tree, special care is taken by the Update() function to keep leaf nodes at the highest subtree containing only 1 key. Otherwise, if a node has a different position in the tree, the resulting trie root would be different even though keys and values are the same.

So, when a key is deleted, Update() checks if its sibling is also a leaf node and moves it up until the highest subtree root containing only that non-default key.

Figure 2. The same tree after deleting LeafNode2 : LeafNode1 moves up to the highest subtree containing one key

Node batching

When storing each node as a root with 2 children, the quantity of nodes to store grows very quickly and a bottleneck happens due to multiple threads loading nodes from memory. A hex Merkle tree would solve this problem as each key has 16 children and a smaller height of 64 (256/4), though as we said earlier, we need the tree to be binary. We can achieve the same features of a hex tree by using node batching.

Instead of storing 2 children for one node, we store the subtree of height 4 for that node. A tree of height 4 has 16 leaves at height 0 (like hex). So, the value of a node is an array containing all the nodes of the 4-bit tree. The children of a node at index i in the tree can be found at index 2*i+1 and 2*i+2.

Figure 3. A visual representation of node batching. The first batch is blue, and all 16 leaves of a batch are roots to other batches (green). A batch contains 30 nodes.

A node is encoded as follows:

{ Root : [ [ byte(0/1) to flag a leaf node ], 3–0, 3–1, 2–0, 2–1, 2–2, 2–3, 1–0, 1–1, 1–2, 1–3, 1–4, 1–5, 1–6, 1–7, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, a, b, c, d, e, f ] }

For example, to get the children of node 3–0 at index id=1 in the array, we can access the left child 2–0 at index (2 * id + 1) = index 3 and the right child 2–1 at index (2 * id + 2) = index 4.

To each node, we append a byte flag to recognize the leaf nodes. Since the nature of Root is not know ahead of time, the byte flag is stored at the first index of the nodes array.

The example from figure 2 will be encoded as follows :

{Root : [ [byte(0)], LeafNodeHash, h3, LeafNodeKey, LeafNodeValue, h2, D2=nil, nil, nil, nil, nil, h1, D1=nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, nil, LeafNode1Hash, LeafNode2Hash, nil, nil, nil, nil, nil, nil ]}

Where LeafNodeHash = Hash(key, value, height)

To store the batch in the database, it is serialized with a bitmap which allows us to store only the non default nodes.
The bitmap is 4 bytes = 32 bits. The first 30 bits are for the batch nodes, the 31st bit is the flag to make a shortcut batch (a batch that only contains a key and a value at 3–0 and 3–1), the 32nd bit is not used.

Byte flags are appended to differentiate items in the batch. Byte(0) is for interior hashes, byte(1) is for leaf nodes, and byte(2) is for keys and values. Those flags are not used compute the root hash of the tree, but tell us the type of node that we are parsing. Byte(2) is simply for clarity as we already know that a key/value pair is being stored as children of a byte(1) leaf node.

The figure 2 example after serialization:

11111000001000000000001100000000 [LeafNodeHash]byte(1)[h3]byte(0)[LeafNodeKey]byte(2)[LeafNodeValue]byte(2)[h2]byte(0)[h1]byte(0)[LeafNode1Hash]byte(1)[LeafNode2Hash]byte(1)

Node batching has two benefits : reduced number of database reads and concurrent update of the height 4 subtree without the need for a lock.

Conclusion:

This optimized SMT implementation improves update performance and enables efficient Merkle proof verification of blockchain state. In the next article we will see how Merkle proofs are built and verified.

Other related research on modified Sparse Merkle trees : https://ethresear.ch/t/optimizing-sparse-merkle-trees/3751