04: Batching Source-Dependent Messages with Delegation

As previously discussed, for DSNP to work as intended, everyone participating should be able to discover all messages on the platform. We also saw that trusting the authorship and content of individual messages requires validation. Finally, individual DSNP messages involve no exchange of value.

So if a DSNP message is independent of the blockchain ledger, that is, if an individual message is simply informational and is not required for blockchain consensus, then:

  • Message validation may be deferred until after the message is posted.
  • Validation can be done off-chain.
  • Validation can even be done when the message is first read.
  • Messages do not actually need to be posted individually on the blockchain.

If we’re free to wait to validate a message until the first time we access it, then there’s no need to spend compute time and transaction fees posting each individual message. Instead, messages can be collated into Batches and stored off-chain. The location of the Batch, plus the necessary metadata, is then announced on-chain for everyone to discover.

Since it is the Batch that is announced rather than each message, the Announcer is free to put as many individual messages into a given Batch as needed, which saves them money.

Spam

Any communications network needs to severely limit spam. Although the share of spam in email traffic has been declining, unevenly, for over a decade, it is still estimated to make up 45% of all email traffic. The reason is that email is nearly free to send. A blockchain network cannot afford to have nearly half of all of its traffic be unwanted messages, many of them possibly sent for criminal purposes. This not only worsens congestion, but also puts unacceptably heavy burdens on indexers and validators. Reducing spam also reduces the human toll of potentially successful scams.

If it costs the same amount to announce a small Batch as a large one, spam becomes cheap enough that Announcers lose a significant incentive to limit it. Batch size must therefore affect the fee, and the discount for announcing larger Batches must be limited: enough to be an incentive to batch, but not enough to make it worth ignoring spam.
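To make the idea of a "limited discount" concrete, here is a minimal sketch of one possible fee curve. Everything in it is an assumption for illustration: the function name, the constants, and the shape of the curve are hypothetical and are not part of DSNP. The fee is a flat base plus a size charge whose per-KiB rate falls as the Batch grows, but never by more than a capped fraction.

```python
def announcement_fee(batch_bytes: int,
                     base_fee: float = 100.0,
                     rate_per_kib: float = 1.0,
                     max_discount: float = 0.5,
                     discount_scale_kib: float = 1024.0) -> float:
    """Hypothetical fee: flat base plus a size charge with a capped discount."""
    if batch_bytes < 0:
        raise ValueError("batch size cannot be negative")
    kib = batch_bytes / 1024
    # The discount grows with size but saturates below max_discount,
    # so every extra KiB always costs at least (1 - max_discount) * rate.
    discount = max_discount * kib / (kib + discount_scale_kib)
    return base_fee + kib * rate_per_kib * (1.0 - discount)


small = announcement_fee(10 * 1024)          # a 10 KiB Batch
large = announcement_fee(10 * 1024 * 1024)   # a 10 MiB Batch
```

With a curve like this, the large Batch costs more in total but less per KiB than the small one, which rewards batching, while the cap on the discount keeps padding a Batch with spam from ever being close to free.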

Latency & Cost

There are tradeoffs for how long to wait to compile a Batch of messages. An Announcer with fewer messages may want to post small batches to reduce latency. An Announcer with lots of messages can save transaction fees by announcing larger Batches, although it takes longer to index, search and validate larger Batches. These factors suggest a few other limits that will need to be considered.

First, we are saying that announcing has a cost. We assume the End User of the DSNP, the person using a dApp, doesn’t want to have to pay to be on it. So how does this person send their messages for free?

Second, we mentioned that Batches of messages will be indexed, searched, and validated, as well as downloaded when dApps need the content, which means there is a practical limit on Batch size.

Also, Scaling!

As we hinted in the previous sections, batching messages has another important benefit: it allows for built-in scaling of the network. We provide an incentive to batch messages as much as possible and this allows a much higher effective message throughput than otherwise, as well as saving all node operators on-chain storage costs.

The Solution

Our solution is to have the End User explicitly authorize an on-chain Announcer as their Delegate, to Batch and store messages for them, and then announce the existence of the Batch on the blockchain. To make that process useful and trustworthy, we need to know several things:

  • Where to get the batch file
  • What’s in the batch
  • How much to charge the Announcer
  • How to quickly tell if the batch is “real”
  • Who announced the batch
  • That the Announcer themselves announced this batch

Let’s look at each one of these pieces and how that translates into data fields of a Batch Announcement.

Where to get the batch file

Naturally, Announcers will want the network to know where the Batch file is. A URL lets Announcers store Batches where and how they want, whether that is IPFS, Amazon S3, Google Cloud, or Joe’s Basement RAID Array. So we want a batch URL in our announcement.

What’s in the batch

We simply need a message type: the one type shared by all messages in this batch. We debated whether to allow multiple message types in each Batch. However, putting only one type of message in each Batch reduces Batch size and makes searching, indexing, and putting message threads together easier.

A network that supports multiple message types will likely be dominated by a few types, while others may be relatively rare, so it makes sense to split batches up by message type. Such a split allows message consumers to get only the types they care about. Processing the batch is also simpler when all the messages in it share one format.
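As a rough illustration of the split, the sketch below groups pending messages into one list per type before they are written out as Batches. The message shape (a dict with a "type" key) and the type names used here are hypothetical placeholders, not DSNP definitions.

```python
from collections import defaultdict


def split_by_type(messages):
    """Group messages into one list per message type, preserving order."""
    batches = defaultdict(list)
    for msg in messages:
        batches[msg["type"]].append(msg)
    return dict(batches)


pending = [
    {"type": "post", "id": 1},
    {"type": "reaction", "id": 2},
    {"type": "post", "id": 3},
]
by_type = split_by_type(pending)  # one future Batch per key
```

A consumer that only cares about one type can then skip every announcement whose message type does not match, without downloading the file.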

What to charge the Announcer

We need to know the size of the batch at the posted URL, because as we said, we want to charge more for larger batches. It’s not practical to fetch every batch, get its size, then figure out how much to charge the announcer at every announcement transaction. Having the Announcer state the file size is the easiest option, and the next most trustless one after fetching the file, for calculating the size-based part of the fee at the time of the announcement. It also tells batch consumers how much they should expect to download.

How to Quickly Tell If the Batch Is “real”

Many people would like to know if a Batch Announcer, or perhaps just their Batch storage service, is reliable, not least our End User, who trusted this Announcer to store their messages, and store them intact. Since we are charging extra for larger batches, we would also like to know if an Announcer is announcing a small Batch and serving a large Batch from the posted URL, for example.

A reliable way to detect whether a file isn’t the same as what was originally reported is the cryptographic hash of the file. A cryptographic hash is computed from the file’s contents, so it depends on how much and what data is in it, though not on the file’s name or location. Even a single character difference drastically changes the resulting hash.
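A minimal sketch of that check, from the point of view of someone who has just downloaded a Batch. It assumes SHA-256 purely for illustration (the text above does not name a hash function), and the function and parameter names are hypothetical. The announced size is compared first because it is the cheaper test.

```python
import hashlib


def matches_announcement(data: bytes, announced_size: int,
                         announced_hash: str) -> bool:
    """Check downloaded Batch bytes against the announced size and hash."""
    if len(data) != announced_size:
        return False  # wrong size: no need to hash at all
    return hashlib.sha256(data).hexdigest() == announced_hash


original = b"hello batch"
announced = hashlib.sha256(original).hexdigest()

ok = matches_announcement(original, len(original), announced)
tampered = matches_announcement(b"hello batcH", len(original), announced)
```

Here `ok` is true and `tampered` is false: changing one character leaves the size the same but produces a completely different hash.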

Who Announced the Batch

We need to know who is claiming to post this batch so they can be properly charged and verified by the network and also so our End User can check, too. For this purpose and others, accounts on-chain will be assigned an identifier. The identifier is also stored on-chain and used as a key for a map of Delegates to Delegators. This key map, when stored on chain, and taken together with the cryptographic signature, below, provides a fast and reliable way to check that a message in a given batch was posted with a user’s permission.
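The lookup described above might be sketched like this. The in-memory dict stands in for on-chain state, and the identifiers, the "author_id" field, and the function names are all hypothetical; the real storage layout is not specified here.

```python
# Hypothetical stand-in for on-chain state:
# Delegate (Announcer) ID -> set of Delegator (End User) IDs.
delegations = {
    1001: {1, 2, 3},
    1002: {4},
}


def is_authorized(announcer_id: int, author_id: int) -> bool:
    """True if author_id has delegated announcing to announcer_id."""
    return author_id in delegations.get(announcer_id, set())


def unauthorized_messages(announcer_id: int, batch: list) -> list:
    """Return the messages in a Batch whose authors never delegated."""
    return [m for m in batch
            if not is_authorized(announcer_id, m["author_id"])]


batch = [{"author_id": 1}, {"author_id": 4}]
rejected = unauthorized_messages(1001, batch)  # author 4 delegated to 1002
```

Because the check is a keyed lookup, a reader can run it for each message at the moment the Batch is first read, in line with deferring validation.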

Make Sure Announcer Really Announced This Batch

We have to be sure that whoever posted this announcement to the blockchain had the right to do so and is posting it as themselves. A cryptographic signature, made with a signing key owned by the Announcer, is similar to a cryptographic hash in that it allows a reliable check of whether the Announcer announced this specific batch themselves. If the signature is captured and applied to another announcement, regardless of announcer ID, the signature will not verify. And since the signature is computed using the file hash, a file altered at the URL no longer matches the hash that was signed, so that check fails, too.
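Pulling the pieces together, a Batch Announcement could be pictured as the record below. This is only a sketch: the field names, the JSON encoding, and the example values are assumptions. Actual signing and verification are left out, since they need a public-key signature library that the Python standard library does not provide. What the sketch does show is the byte string a signature would cover, which is what ties the signature to one announcer, one URL, and one file hash.

```python
from dataclasses import dataclass, replace
import json


@dataclass(frozen=True)
class BatchAnnouncement:
    announcer_id: int      # who announced the batch
    message_type: str      # what's in the batch
    url: str               # where to get the batch file
    file_size: int         # basis for the size-dependent fee
    file_hash: str         # lets readers tell if the batch is "real"
    signature: bytes = b""  # would be produced by the Announcer's key

    def signing_payload(self) -> bytes:
        """Canonical bytes covering every field except the signature."""
        fields = {
            "announcer_id": self.announcer_id,
            "message_type": self.message_type,
            "url": self.url,
            "file_size": self.file_size,
            "file_hash": self.file_hash,
        }
        return json.dumps(fields, sort_keys=True,
                          separators=(",", ":")).encode("utf-8")


mine = BatchAnnouncement(1001, "post", "https://example.com/batch-1",
                         2048, "ab" * 32)
copied = replace(mine, announcer_id=1002)  # same fields, different announcer
```

The payloads of `mine` and `copied` differ, so a signature lifted from one announcement would not verify on the other.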

Batching of messages is a key part of scaling the network required for DSNP or other high message volume communication networks. It reduces the problem space. In an upcoming post in the series, we will look at how batching alters the interactions at the blockchain level.
