Shorts: Tiers of Data

While the core DSNP specification tries to be implementation agnostic, both it and its implementations assume that different classes of data should be handled differently. That idea is not novel on its own, but the different classes of data are part of what makes DSNP both possible and flexible.

DSNP uses a three-tier data storage architecture. Implementations may differ slightly on how or why data is stored at a given tier, but the general structure is the same. From one tier to the next, the security, availability, and cost all change.

Tier 0: The Chain

At the chain level we have the highest security and availability for data. The data is stored on each node of the network, and losing one piece of data can invalidate the rest of the chain. The trade-off is that the cost is highest as well. At this tier, we place items that have the greatest consequences if lost: identifiers, permissions, and references to the next tier of data.

Tier 1: Inside Batches

Batch files form the next tier of data. While the data is off-chain, the integrity of a Batch file is ensured by on-chain metadata that connects the contents of the file back to the chain. A hash of the file lets users know whether the Batch file they retrieve is authentic, no matter the source. The chain also records who announced the new Batch. While the announcer could be unknown, delegation allows some level of user-enforced reputation. Users are fairly good at avoiding unreliable services.
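To make the hash check concrete, here is a minimal sketch. The BatchAnnouncement record, its field names, and the use of SHA-256 are all assumptions made for illustration; they are not taken from the DSNP specification, which may use a different hash function and schema.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BatchAnnouncement:
    """Hypothetical on-chain record for one Batch file (not the DSNP schema)."""
    announcer_id: int  # who announced the Batch
    file_hash: str     # hex digest of the Batch file contents
    file_url: str      # where the Batch file can be fetched


def is_authentic(announcement: BatchAnnouncement, file_bytes: bytes) -> bool:
    """Return True if the retrieved bytes match the hash recorded on-chain."""
    # SHA-256 is an assumption of this sketch, chosen because it is in the
    # Python standard library.
    return hashlib.sha256(file_bytes).hexdigest() == announcement.file_hash


original = b"batch file contents"
record = BatchAnnouncement(
    announcer_id=42,
    file_hash=hashlib.sha256(original).hexdigest(),
    file_url="https://example.com/batches/1",
)

print(is_authentic(record, original))              # True
print(is_authentic(record, b"tampered contents"))  # False
```

Because the check depends only on the bytes and the on-chain hash, it gives the same answer whether the file came from the announcer, a mirror, or a stranger.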

Reputation alone may be enough to maintain Batch file availability, but the files are expected to usually contain content from a mixture of users (at least for higher volume services). As these users share the desire that the content remains available, the service has a secondary incentive to keep the file available even if some portion of those users leave the service. A final availability safeguard is the use of IPFS for Batch files. IPFS is a distributed file store that allows anyone to persist data. I expect the next implementation of DSNP to require that IPFS or another distributed file store be used for Batch files.

Even with all of these safeguards, a Batch could still be lost. If that happened, the network would continue functioning. The network would know the data is missing (and who failed in their responsibility to maintain availability), and the loss would have only a limited impact going forward. We have investigated ways to incentivise long-term storage or archiving of Batches, but for now it remains to be determined whether that will be necessary.
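As a rough illustration of how a missing Batch can be detected and attributed, the sketch below walks a list of announcements and tries to fetch each file. The Announcement record, the audit function, and the fetch callback are hypothetical names, and SHA-256 is again an assumption rather than anything DSNP prescribes.

```python
import hashlib
from typing import Callable, NamedTuple, Optional


class Announcement(NamedTuple):
    """Hypothetical on-chain record; field names are illustrative only."""
    announcer_id: int
    file_hash: str  # SHA-256 hex digest (assumed for this sketch)
    location: str


def audit(
    announcements: list[Announcement],
    fetch: Callable[[str], Optional[bytes]],
) -> list[tuple[int, str, str]]:
    """Classify each announced Batch as 'ok', 'missing', or 'mismatch'."""
    report = []
    for a in announcements:
        data = fetch(a.location)
        if data is None:
            status = "missing"   # the chain still proves the Batch once existed
        elif hashlib.sha256(data).hexdigest() != a.file_hash:
            status = "mismatch"  # something was retrieved, but it is not authentic
        else:
            status = "ok"
        report.append((a.announcer_id, a.location, status))
    return report


# A dict stands in for off-chain storage; "loc/2" was never persisted.
store = {"loc/1": b"first batch"}
chain = [
    Announcement(1, hashlib.sha256(b"first batch").hexdigest(), "loc/1"),
    Announcement(2, hashlib.sha256(b"second batch").hexdigest(), "loc/2"),
]

for row in audit(chain, store.get):
    print(row)
```

The on-chain record survives even when the file does not, which is what lets the network name the announcer responsible for the gap.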

Tier 2: Off-chain References

While Batches are stored off-chain, they still carry network information that is needed for the network to function fully. The final tier of data is just a simple URL or URI (Uniform Resource Identifier, of which URLs are a subset). While we often have a chain of hashes that secures the authenticity of the retrieved information, the availability of the data is the responsibility of the user who wants others to access it.
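The chain of hashes can be sketched as two links: the chain vouches for the Batch file, and the Batch file vouches for the content behind a URL. The JSON Batch layout, the "url" and "hash" keys, and SHA-256 are assumptions made for this example and do not reflect the actual DSNP Batch format.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def verify_content(
    on_chain_batch_hash: str,
    batch_bytes: bytes,
    url: str,
    content_bytes: bytes,
) -> bool:
    """Check both links: chain -> Batch file, then Batch entry -> content."""
    # Link 1: the Batch file must match the hash recorded on-chain.
    if sha256_hex(batch_bytes) != on_chain_batch_hash:
        return False
    # Link 2: the Batch must list this URL with a hash matching the content.
    entries = json.loads(batch_bytes)
    content_hash = sha256_hex(content_bytes)
    return any(e["url"] == url and e["hash"] == content_hash for e in entries)


content = b"hello, world"
url = "https://example.com/post/1"
batch_bytes = json.dumps([{"url": url, "hash": sha256_hex(content)}]).encode()
on_chain_hash = sha256_hex(batch_bytes)

print(verify_content(on_chain_hash, batch_bytes, url, content))         # True
print(verify_content(on_chain_hash, batch_bytes, url, b"edited later"))  # False
```

Note that this only establishes authenticity. If the URL stops resolving, there is nothing to verify, which is why availability at this tier stays with the user and their services.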

DSNP can help communicate the intent to delete content, or give notice that the storage location or the content has been updated, but responsibility for the storage ultimately lies with the announcing user and their services.

Mix and Match

Several times above, the metadata is placed at a different tier than the data it describes. This is an intentional pattern. Metadata falls into three categories: who is trying to send the message, how to know it is authentic, and where to find it. By storing this information at a more secure tier, data stored less securely can still have a higher level of authenticity, and there is proof of its (former) existence even if the data itself is no longer available.

In the end, the goal is transparency and authenticity with options for costs and storage. Different types of data have different needs, and DSNP is designed to meet those needs flexibly. As technologies grow and change, DSNP should be able to grow and change with them.
