How is data in a blockchain stored? Structure of block and blockchain

At its core, a blockchain is a linked list with a twist.

In a normal linked list, each item points to the previous item (these are the “links” that let us traverse back through the list). In an in-memory linked list, these links are references to memory locations; in a blockchain, they are the hashes of the preceding blocks.

Most importantly, the data that is hashed to produce a block’s hash (its blockhash) includes the blockhash of the previous block. You can’t change the previous blockhash stored inside the current block without changing the hash of the current block, and this means you can’t replace a block in the chain without visibly breaking the chain of blockhashes that come after it.
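A minimal sketch of that idea, assuming SHA-256 as the hash function and representing each block as a hypothetical `(prev_hash, data, hash)` tuple (an illustrative layout, not Bitcoin’s real block format). Because `prev_hash` is part of the bytes being hashed, editing an earlier block breaks the link stored in the next one.

```python
import hashlib

ZERO_HASH = b"\x00" * 32  # hypothetical "previous hash" for the first block

def block_hash(prev_hash: bytes, data: bytes) -> bytes:
    # The previous block's hash is included in the bytes being hashed,
    # so it is sealed into this block's own hash.
    return hashlib.sha256(prev_hash + data).digest()

def build_chain(payloads):
    """Chain payloads together; each entry is (prev_hash, data, hash)."""
    chain = []
    prev = ZERO_HASH
    for data in payloads:
        h = block_hash(prev, data)
        chain.append((prev, data, h))
        prev = h
    return chain

def links_intact(chain) -> bool:
    """Check every stored hash and every back-link."""
    for i, (prev, data, h) in enumerate(chain):
        if block_hash(prev, data) != h:
            return False  # block contents do not match its recorded hash
        if i > 0 and prev != chain[i - 1][2]:
            return False  # back-link does not match the previous block's hash
    return True

chain = build_chain([b"block 0", b"block 1", b"block 2"])
print(links_intact(chain))  # True

# Edit block 1 and honestly recompute its hash: block 2 still holds the
# OLD hash of block 1, so the link is visibly broken.
prev, _, _ = chain[1]
chain[1] = (prev, b"block 1 (edited)", block_hash(prev, b"block 1 (edited)"))
print(links_intact(chain))  # False
```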

This is the critical difference between an ordinary linked list and a blockchain:

  • in a linked list, it’s easy to replace any element by modifying just
    two elements (the element itself and the one after it), and the change
    is not visible in later parts of the list
  • in a blockchain, to replace a single element you must also replace
    every element after it, and because all of their hashes change, the
    whole chain is altered in an obvious, visible way

Now, in most blockchain implementations, including Bitcoin, what I said above applies only to block headers, not to whole data blocks. For efficiency, the block headers form the cryptographically chained list, and the data is stored in separate chunks. So how are they related?

Each block header also includes a hash of one data block (in Bitcoin, this is the Merkle root of the block’s transactions). This way, if you’re looking at data block 2045 and want to authenticate it, you merely have to hash it and check whether that hash matches the one in block header 2045. Nobody can give you a false block of data without the hash failing to match and, as noted above, they can’t change the hash in the header without corrupting the whole chain of headers.
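A sketch of how a header commits to its data block, under simplifying assumptions: the header here has just two fields, and the data block is hashed as one opaque byte string with single SHA-256. Real Bitcoin headers carry more fields, use double SHA-256, and commit to transactions through a Merkle root rather than one flat hash. The `Header` class and `data_block_matches` function are hypothetical names for illustration.

```python
import hashlib
from dataclasses import dataclass

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

@dataclass(frozen=True)
class Header:
    prev_header_hash: bytes  # hash of the preceding header (the chain link)
    data_hash: bytes         # hash of the data block at this height

    def header_hash(self) -> bytes:
        # The header's own hash covers both fields, so the data commitment
        # cannot be swapped without changing this header's hash.
        return sha256(self.prev_header_hash + self.data_hash)

def data_block_matches(header: Header, data_block: bytes) -> bool:
    """Authenticate a data block received from an untrusted peer."""
    return sha256(data_block) == header.data_hash

data = b"transactions for block 2045"
header = Header(prev_header_hash=b"\x00" * 32, data_hash=sha256(data))
print(data_block_matches(header, data))               # True
print(data_block_matches(header, data + b" (fake)"))  # False
```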

So the typical blockchain structure is:

  1. A chain of block headers, each of which includes the hash of the
    preceding block and also includes the hash of one data block.
  2. A matching collection of data blocks, each of which has a hash that is
    included in one of the block headers.
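The two-part layout above can be sketched as two separate collections: an ordered list of headers, and a store of data blocks keyed by their hash. This is a hypothetical in-memory layout chosen for illustration (single SHA-256, two-field headers, an all-zero previous hash for the first header); it is not how any particular client organizes its data.

```python
import hashlib
from typing import Dict, List, NamedTuple

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

class Header(NamedTuple):
    prev_header_hash: bytes  # link to the preceding header
    data_hash: bytes         # commitment to one data block

    def header_hash(self) -> bytes:
        return sha256(self.prev_header_hash + self.data_hash)

def build(payloads: List[bytes]):
    """Return (headers, data_store) for the given data blocks."""
    headers: List[Header] = []
    data_store: Dict[bytes, bytes] = {}  # data_hash -> data block
    prev = b"\x00" * 32  # hypothetical previous-hash value for the first header
    for block in payloads:
        h = Header(prev, sha256(block))
        headers.append(h)
        data_store[h.data_hash] = block  # data lives apart from the headers
        prev = h.header_hash()
    return headers, data_store

headers, data_store = build([b"data 0", b"data 1", b"data 2"])
# The header chain alone is small; any data block can be looked up by hash.
print(data_store[headers[2].data_hash])  # b'data 2'
```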

(You could dispense with the block header/data block separation, but it can make lots of things more efficient to separate them.)

If you have the first block (sometimes called a “genesis block”) and the validation rules (including which hashes are used and how they are made), you can validate everything else yourself, without having to trust some other system to give you honest data.
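A sketch of that self-validation, assuming you already trust the genesis header’s hash and that the only validation rules are the two hash checks described above. Real rules (proof of work, transaction validity, and so on) are much richer and are omitted; `validate_chain` and the `(prev_header_hash, data_hash)` tuple layout are hypothetical.

```python
import hashlib
from typing import List, Tuple

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

# A header is (prev_header_hash, data_hash); data_blocks[i] belongs to headers[i].
HeaderT = Tuple[bytes, bytes]

def header_hash(h: HeaderT) -> bytes:
    return sha256(h[0] + h[1])

def validate_chain(trusted_genesis_hash: bytes,
                   headers: List[HeaderT],
                   data_blocks: List[bytes]) -> bool:
    """Check an untrusted chain against nothing but the genesis hash."""
    if not headers or len(headers) != len(data_blocks):
        return False
    if header_hash(headers[0]) != trusted_genesis_hash:
        return False  # wrong starting point
    for i, (header, block) in enumerate(zip(headers, data_blocks)):
        if sha256(block) != header[1]:
            return False  # data block does not match its header
        if i > 0 and header[0] != header_hash(headers[i - 1]):
            return False  # broken link in the header chain
    return True

# Build a small chain, then validate it knowing only the genesis hash.
blocks = [b"genesis data", b"data 1", b"data 2"]
headers: List[HeaderT] = []
prev = b"\x00" * 32  # hypothetical previous-hash value for the genesis header
for b in blocks:
    headers.append((prev, sha256(b)))
    prev = header_hash(headers[-1])

genesis = header_hash(headers[0])
print(validate_chain(genesis, headers, blocks))                      # True
print(validate_chain(genesis, headers, blocks[:2] + [b"tampered"]))  # False
```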

If you are designing your own blockchain, you have a lot of leeway in what goes into the data blocks, how they are stored and distributed, etc. The choice of networking and storage protocols is not what makes it a blockchain.

So, back to your original question: what is a block? It’s just a chunk of data.

How is it stored? However you like. In Bitcoin, blocks are just chunks of data disseminated via a P2P network of nodes, and many nodes store those chunks on disk (Bitcoin Core, for example, writes raw blocks to flat files and keeps an index of them in a LevelDB database). In fact, several Bitcoin implementations store them in different ways! But they can all agree on what’s valid, because of the blockchain.
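Because a block is identified and checked by its hash, the storage backend really is interchangeable. As one hypothetical example (not how any Bitcoin client does it), this sketch keeps data blocks in an in-memory SQLite table keyed by their SHA-256 hash and re-checks the hash on every read, so a corrupted or substituted row is rejected regardless of where it came from. The table layout and the `put_block`/`get_block` helpers are illustrative names.

```python
import hashlib
import sqlite3

def sha256(b: bytes) -> bytes:
    return hashlib.sha256(b).digest()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blocks (hash BLOB PRIMARY KEY, data BLOB NOT NULL)")

def put_block(data: bytes) -> bytes:
    """Store a data block and return its hash (the lookup key)."""
    key = sha256(data)
    conn.execute("INSERT OR IGNORE INTO blocks VALUES (?, ?)", (key, data))
    return key

def get_block(key: bytes) -> bytes:
    """Fetch a block by hash, refusing to return anything that doesn't match."""
    row = conn.execute("SELECT data FROM blocks WHERE hash = ?", (key,)).fetchone()
    if row is None:
        raise KeyError("block not found")
    if sha256(row[0]) != key:
        raise ValueError("stored block does not match its hash")
    return row[0]

key = put_block(b"some block data")
print(get_block(key))  # b'some block data'
```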