Distributed hash chain - Extending blockchain to hashchain for scalability
Distributed hash chain or DHC is an extension of blockchain structure first implemented in Bitcoin. Bitcoin and many other blockchains suffer from scalability problem. Bitcoin on average can process only between 3.3 and 7 transactions per second which makes it unsuitable as a currency as people cannot do many transactions and transaction fees get very high. In comparison VISA processes thousands transactions per second. The main reason is that every node on the Bitcoin network needs to receive and keep a copy of every transaction. The goal of DHC is to improve scalability of a traditional blockchain.
One approach to improve scalability is to use sharding where instead of a single blockchain there are multiple blockchains that store different sets of transactions and they are synchronized between each other in some way.
DHC has a single blockchain, but each node only stores a subset of transactions in a given block so it can process them as quickly as possible without any bottleneck inherent in traditional blockchains. When more nodes join the DHC network the partitions of transactions split further so each node processes similar number of transactions as before the split.
Imagine there are two nodes A and B. Address of node A in binary form starts with 0 and address of node B in binary form starts with 1. Then node A will start collecting all pending transactions from senders which address start with 0 and node B will start collecting all pending transactions from senders which address start with 1. Node A will not even receive transactions from senders which address start with 1 thus reducing network traffic. Similarly for node B. How A and B can work together and create a distributed block? Here are the steps.
Node A creates hash of transactions it collected and sends that hash to node B
Node B creates hash of transactions it collected and sends that hash to node A
Both nodes produce combined hash in the same order. First hash from A then hash from B, since address of A starts with 0 and address of B starts with 1.
Both nodes start mining using the same combined hash
(https://i.ibb.co/nPN6q10/image1.jpg)
When there are more nodes the same approach applies. Any node with address starting with 0 needs to combine hash of transaction it collected with a hash it receives from a node with address starting with 1. It needs to broadcast that information to other nodes on the same partition so they would be able to recover that information once they receive mined block even if that node goes down. To reduce the number of these broadcasts and consequently the number of combinations with other partitions, nodes are required to include small proof of work.
How would we deal with a case when there are 4 partitions? Now we have 4 nodes that have addresses in binary form starting with 00, 01, 10, 11. It is happening as following.
Nodes on 00 partition collect transactions and exchange hashes with nodes on 01 partition, creating hash 0
Nodes on 10 partition collect transactions and exchange hashes with nodes on 11 partition, creating hash 1
Nodes on 00, 01 partitions (combined partition 0) send hash 0 to nodes 10, 11
Nodes on 10, 11 partitions (combined partition 1) send hash 1 to nodes 00, 01
All nodes combine hashes 0 and 1 and start mining
(https://i.ibb.co/PDjwnG5/image2.jpg)
In steps 3 and 4 hashes 0, 1 include small proof of work to reduce the number of combinations. Transactions are never sent to nodes on other partitions, only hashes. Nodes would only need to be connected to some other nodes on its own partition and on opposite partitions. For example Node 00 needs to be connected to some nodes on partition 00 and 01 and some nodes on partition 1 (combination of partitions 10 and 11).
We would call index length of a partition power. Partition 00 has power 2 and partition 110 has power 3.
This approach is easily extended when there are 2^n partitions. The number of steps would increase but not by much. There will be 10 steps to create combined hash for about a thousand partitions, 20 steps for about a million partitions, 30 steps for a billion partitions. If each step takes 1 second on the network then nodes on a billion partition network can create combined hash in 30 seconds.
In DHC one block is created on average every minute.
All nodes connect to each other in a Kademlia like network. The difference from Kademlia is that DHC uses TCP instead of UDP and it keeps k-bucket containing its own node (partition) at the same power (index length) without splitting it even if there are many nearby nodes. This is done to mitigate eclipse attacks. In our current implementation k = 8 as in BitTorrent Mainline DHT. Please notice that DHC is not a distributed hash table and only uses Kademlia like network to connect to other nodes.
Below is an example of Node 001, whose address starts with 001 in binary form, connecting to nodes in required partitions. Since k = 8, the node maintains connections to 8 nodes in each partition.
(https://i.ibb.co/KFphfpS/image3.jpg)
DHC also implements dynamic partitioning. As more nodes join, partitions split and the number of them increases as the network grows. If enough nodes leave the network then leaf partitions merge automatically. This way nodes on average process and store similar number of transaction in a given time period.
There are 2^(63) - 1 = 9223372036854775807
smallest units created. This number represent maximum long in Java. We called them also as in Bitcoin satoshis. Each coin contains one billion satoshis. Therefore there are 9,223,372,036 coins in circulation.
Non publishing attack
We assume that in each partition there is at least one honest node. Partitions over address space are uniform so it would impossible to replace all honest node in some partition by concentrating fake nodes in it. An attacker might create a hash without publishing transaction subset corresponding to it and send to neighboring partition. Honest nodes would create an alert for that and include proof of work relative to the size of partition. Why would honest node do that? Because for them would be risky to continue using that hash and contribute more resources to mining on top of it. If some other honest node produces alert of non publishing and it gets rolled back their effort of mining on top of that hash would be wasted. It is similar to people rejecting counterfeit currency bills.
Reclaiming Disk Space
Bitcoin white paper mentions possible removing of old spent transaction outputs. We implemented it with reliance on miners. Miners find all unspent transaction outputs older than one year and assign them to their owners in new transactions included in new blocks. This way blocks older than one year contain only spent transaction outputs, transactions in them are pruned and blocks are compacted to contain only headers. This reduces further amount of disc space required to store blockchain in addition to storing only partition transactions.
For more information check http://www.dhcne.org/ (http://www.dhcne.org/)