@stevepatterson is on PowPing!

On-chain Data Storage with Sub-nodes

I've been playing with the idea of data storage on the blockchain. The business model remains unclear, but here's a potential scenario in which it could work. I'm not sure if the technical details work out, but if they do, this model seems plausible.

Let's start with a problem.

Imagine that Johnny uses the BSV blockchain to host his podcast. Let's say each episode is a 15 MB audio file. For ease of calculation, let's say he pays $15 in transaction fees to upload this content to the Bitcoin miners. No problem so far.

Next, he creates an RSS feed. In it, he points to a BSV node as the host for the podcast file. The node could either be a miner or a non-mining node. The trouble begins when listeners start downloading his podcast. Who pays the bandwidth costs?

If the audio file is 15 MB, then 1,000 downloads add up to 15 GB of bandwidth for each episode. A million downloads add up to 15 TB. Not a trivial cost.
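To check the arithmetic, here is a minimal sketch. It assumes decimal units (1 MB = 10^6 bytes) and that every download transfers the whole file exactly once; the function names are made up for illustration.

```python
MB = 10**6  # decimal megabyte, in bytes (an assumption; the post does not say)


def total_bandwidth_bytes(file_size_bytes: int, downloads: int) -> int:
    """Bytes served if every download transfers the whole file once."""
    return file_size_bytes * downloads


def human(n_bytes: int) -> str:
    """Format a byte count using the largest decimal unit that fits."""
    for unit, size in (("TB", 10**12), ("GB", 10**9), ("MB", 10**6)):
        if n_bytes >= size:
            return f"{n_bytes / size:g} {unit}"
    return f"{n_bytes} B"


episode = 15 * MB
for downloads in (1_000, 1_000_000):
    print(downloads, "downloads:", human(total_bandwidth_bytes(episode, downloads)))
```

Partial downloads, retries, and protocol overhead would shift these numbers, but not the order of magnitude.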

Right now, services like bico.media are letting people access this blockchain data for free. But that won't scale. A single successful podcast on the blockchain would break their services.

Miners can probably handle on-chain storage costs without much issue. Bandwidth costs, however, are a much bigger problem with wild variance depending on the demand for the data.

Storing all data and serving all data is extremely expensive. Is there a better way?

I think so, if it works technically. We just need more specialization and more accurate prices.


Imagine that Bitcoin miners focused on storage, but didn't have to worry as much about bandwidth. In other words, they store the data but don't serve it to regular users.

Instead, they serve the data to sub-nodes - BSV nodes that don't archive all the data, but only the data they are interested in. Then, once they have this data, the sub-nodes distribute it to end users. They incur the bandwidth costs, not the miners.

In this model, the sub-nodes archive only a small part of the blockchain to distribute to their users. They are specialists, which means they can much more easily monetize the data, either serving it directly to consumers for a fee, or offering some API service to other sub-sub-nodes in the same field.

Here's an image:

The "Full Node" archives all data. If it had to serve this data for free to everybody on demand, its costs would be extremely high. So instead, the sub-nodes request the data that they want (for a fee), and they serve it to end users. That way, the bandwidth costs are passed on to sub-nodes that are trying to profit from the data.

What Are the Benefits?

Imagine that you're trying to build a business using on-chain data. Say, a company that archives and serves podcasts on-chain. If you had to run a full node to download everything, your costs would be astronomical at scale. You'd have to download all data just to access the small part of the blockchain that you're interested in. That's silly.

So instead, you could run a sub-node that only gets the data you ask for from the miners. And naturally, you would pay them for it. The miners' costs would be covered, and you'd have access to valuable blockchain data. 

End users who upload their content to the blockchain would benefit from:

-Permanent immutable storage 

-No maintenance costs

-Their data distributed to multiple data centers (among both miners and sub-nodes)

Miners would benefit from:

-Transaction fees

-Serving data to sub-nodes

Sub-nodes would benefit from:

-Access to blockchain data without having to download the whole chain

What is Needed?

From what I can tell, two things are needed in order to make this happen. I don't know how feasible they are.

The first thing is the ability to tag information correctly inside of transactions. If you want your data to be archived and accessible, then it needs to be identifiable. The miners and sub-nodes need to share the same language for tagging and retrieving content.

For users to maintain stronger ownership of their data, the content can even be encrypted with keys that only they hold. But the metadata needs to stay unencrypted, so that sub-nodes can access and use it.
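As a rough sketch of what "tagged, with readable metadata" could look like: the layout below (a 4-byte length prefix, plaintext JSON metadata, then an opaque body) and the names `TaggedRecord`, `read_metadata`, and `read_body` are all made up for illustration. This is not an existing BSV protocol, and the encryption step itself is left out; the body is just treated as bytes that may or may not be ciphertext.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedRecord:
    app: str           # hypothetical namespace tag, e.g. "podcast"
    content_type: str  # e.g. "audio/mpeg"
    title: str
    body: bytes        # opaque payload; may already be encrypted by the owner

    def encode(self) -> bytes:
        meta = json.dumps(
            {"app": self.app, "content_type": self.content_type, "title": self.title},
            sort_keys=True,
        ).encode("utf-8")
        # 4-byte big-endian length prefix, then plaintext metadata, then the body.
        return len(meta).to_bytes(4, "big") + meta + self.body


def read_metadata(raw: bytes) -> dict:
    """Read only the plaintext metadata; the body is never touched."""
    meta_len = int.from_bytes(raw[:4], "big")
    return json.loads(raw[4:4 + meta_len].decode("utf-8"))


def read_body(raw: bytes) -> bytes:
    """Return the payload bytes that follow the metadata."""
    meta_len = int.from_bytes(raw[:4], "big")
    return raw[4 + meta_len:]


record = TaggedRecord("podcast", "audio/mpeg", "Episode 1", b"\x00ciphertext-or-audio")
raw = record.encode()
print(read_metadata(raw))
```

The point of the split is that a miner or sub-node can index and filter on the metadata without ever being able to read the content.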

The second thing that's needed is, of course, the sub-nodes themselves. We need some software that allows people to connect to miners and constantly receive only a part of the blockchain.
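Here is a toy sketch of the filtering side of such software, under heavy assumptions: transactions arrive as plain dictionaries with `txid`, `app`, and `data` fields (a made-up shape), and there is no networking, payment, or validation. The `SubNode` class is hypothetical; it only shows the idea of archiving the slice you care about and dropping the rest.

```python
from typing import Dict, Iterable, Optional


class SubNode:
    """Keeps only transactions whose tag matches the one this node specializes in."""

    def __init__(self, wanted_app: str) -> None:
        self.wanted_app = wanted_app
        self.archive: Dict[str, bytes] = {}

    def ingest(self, transactions: Iterable[dict]) -> int:
        """Store matching transactions and return how many were kept."""
        kept = 0
        for tx in transactions:
            if tx.get("app") == self.wanted_app:
                self.archive[tx["txid"]] = tx["data"]
                kept += 1
        return kept

    def serve(self, txid: str) -> Optional[bytes]:
        """Return archived data for an end user, or None if it was never kept."""
        return self.archive.get(txid)


# Hypothetical stream of transactions from a miner.
stream = [
    {"txid": "a1", "app": "podcast", "data": b"episode-1"},
    {"txid": "b2", "app": "weather", "data": b"sensor-reading"},
    {"txid": "c3", "app": "podcast", "data": b"episode-2"},
]

node = SubNode("podcast")
kept = node.ingest(stream)
print(kept, "of", len(stream), "transactions archived")
```

In a real system the filtering would ideally happen on the miner's side, so the sub-node never pays to receive data it is going to throw away.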

If this works out, there are some pretty cool business models that are created by on-chain data storage. From my reading of Satoshi and Hearn, I don't think this was part of "Satoshi's Vision", but I do think it's a good enough idea to try. What do you think? 

OnChainBitcoin.com 75 BSV
I have thought about this issue a fair bit and I think that you are missing a little bit of industry knowledge that might help contextualize further development of these thoughts.

1. Transit costs are calculated on both the ingress and egress of the transit (typically the 'greater of', not both), so it doesn't make sense to have a specialized one-way service because you are throwing away potential free transit. This is why miners make the most sense as the data provider: they will naturally have much more ingress from other block providers than egress from winning blocks. As a result, they will naturally have 'free' transit that they can sell at a low cost to clients (i.e., be more competitive) compared to a dedicated subnode that has to bear the full cost of its majority egress.

2. The primary cost with transit isn't volume, it is speed. If you want 'fast fast fast' then you are using a tier 1 transit provider. A tier 1 transit provider will not charge you for volume; they will charge you based on the maximum number of megabits per second you use during a month (based on majority utilization in that month). If a service wants to reduce costs, it can reduce the speed at which it serves data. This is approaching free, fwiw, for internet speeds seen only 20 years ago, and I expect that trend to continue in the future.

3. Peering with other providers is free (usually), unless you are a megaservice provider like Netflix or YouTube and people can demand extra payments or threaten to degrade your service quality. So greater connectivity (which is encouraged by Bitcoin) would reduce costs and reduce latency by cutting out intermediaries. This is why I expect that in the future we will see dedicated Bitcoin peering exchanges, similar to what we see now with LINX in London.

Hope this helps!
stevepatterson replied:
Thanks for the response. On point #1: You say, "As a result, [miners] will naturally have 'free' transit that they can sell at a low cost to clients (i.e. Be more competitive) compared to a dedicated subnode that has to bear the full cost of their majority egress." A couple of thoughts on this. First, selling "free" transit to clients sounds like how I'm imagining the sub-nodes gain access to their data. I don't have the industry knowledge, so I'm unclear on the difference between "selling transit to clients" and "selling data to sub-nodes." Second, isn't there going to be a natural asymmetry when it comes to ingress vs egress? So, even if miners have "free" transit, it's only up to a certain level. If some data has extremely high demand, then their egress could greatly outweigh their ingress. So when you say "miners make the most sense as the data provider as they will naturally have much more ingress from other block providers than egress from winning block", this seems mistaken to me. While it's true that miners are pushing out far less data, inside of blocks, than they're taking in, if that data is also being generally hosted for the entire internet, their egress will become far larger, right? So it would seem like a natural thing to reduce their outgoing costs by only supplying that data to a smaller group of sub-nodes to distribute. But help me out, because I'm sure I'm missing something here!
911 replied:
As a miner (or a sub-node), you have to buy transit (bandwidth at a certain speed for a certain period) in order to operate. I would expect that as a miner, you have more inbound traffic consuming transit than egress because of how the block propagation works. Given that you only pay for the 'greater of' the inbound or outbound, this 'lopsidedness' can give you a pricing advantage against a competitor who does not have it. Let me give an example and hopefully this helps. Miner A: I pay $100 a month per Mbps (notice speed, not volume). My ingress ends up costing me $1,000,000. My egress would have cost me $600,000 but I don't pay for it because my ingress is so much more. I then find enough blocks to cover the $1,000,000 and my operating expenses, and then I have $10,000 left over for profit. Miner B: Exactly the same, however spots the opportunity to serve data to clients as well. Now they can sell blockchain data using the excess egress for an extra $300,000. As a result, they are now 30x more profitable than the competing miner without paying anything extra. What happens if the egress exceeds ingress? It is 'greater of', so they will pay more for the egress but nothing for the ingress. Subnode c: I want to sell blockchain data, and I get the exact same deal as Miner B, so my revenue is $300,000. However, I still need to pay for transit. The transit ends up costing me $600,000. So whereas Miner B ends up much more profitable, competing on a head-to-head basis as a subnode, I actually end up losing money. Does that make sense?
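The figures in this example can be worked through in a few lines. This is only a sketch of the 'greater of' billing rule as described in the comment; it assumes Miner B's total egress, including the data it sells, stays within the $600,000 figure, and that the subnode has no ingress worth billing. The function and constant names are made up for illustration.

```python
def transit_bill(ingress_cost: int, egress_cost: int) -> int:
    """'Greater of' billing: pay only for the busier direction."""
    return max(ingress_cost, egress_cost)


INGRESS = 1_000_000    # miner's ingress cost, from the example
EGRESS = 600_000       # egress cost, from the example
BASE_PROFIT = 10_000   # Miner A's profit after transit and operating expenses
DATA_REVENUE = 300_000 # revenue from selling blockchain data

# Miner A: pays for ingress only, sells no data.
miner_a_profit = BASE_PROFIT

# Miner B: same bill as Miner A (assumption: egress stays below ingress),
# plus the data revenue.
extra_transit_b = transit_bill(INGRESS, EGRESS) - transit_bill(INGRESS, 0)
miner_b_profit = BASE_PROFIT + DATA_REVENUE - extra_transit_b

# Subnode c: same revenue, but the egress is the larger direction, so it is billed.
subnode_c_profit = DATA_REVENUE - transit_bill(0, EGRESS)

print("Miner A profit:  ", miner_a_profit)
print("Miner B profit:  ", miner_b_profit)
print("Subnode c profit:", subnode_c_profit)
```

If Miner B's egress grew past its ingress, `transit_bill` would start returning the egress figure instead, and the "free" transit would run out.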
stevepatterson replied:
I think I see what you're saying, but I still spot gaps. First, the "subnode scenario" is not right. You aren't paying for transit that's costing more than you're making in revenue. The whole point is to get a smaller subsection of the blockchain data and only serve it to customers *when they pay for it*. That way, whatever my egress costs are as a sub-node, they are covered by usage. Second, there are three practical problems with the idea of miners serving the data.

- Miners have to distinguish between free data and valuable data. If they mistake valuable data for free data, they'll eat huge costs (and if they end up charging for the data after-the-fact, that can cause problems for whoever was using it). It seems to me that specialists (sub-nodes) would be best at determining which data will be highly demanded.

- Miners need the software infrastructure for charging for access to the data. I don't see this anywhere. Everything I see looks free to access, which is inevitably going to cause problems. With the sub-node model, miners can just focus on transaction processing and leave the data-servicing to others.

- If there are businesses that depend on this data, they could run into serious problems if the miners are not allocating sufficient time and resources to serving all data. In the sub-node model, there would be specialists dedicated to serving that data, and the practical reliability would be far higher.

In the near future, I just see too many things to juggle for miners trying to serve as universal data hosting. Perhaps in the very long run, but not now.
911 replied:
I am saying that Miner B has a pricing advantage due to the transit and can sell the same data for less, allowing them to achieve much higher profitability at a price where Subnode c would actually be losing money. Miners don't really care that much about free versus non-free data. If they want to give away free data, they can just make it so you get that data slowly, so the cost is minimal. This is similar to how you can query the Nasdaq servers for free pricing information but it is on a 15-minute delay, so traders pay significant sums to get the same pricing information without delay. You have a point on specialization. If there are data service specialists, then it will be because they are peering with all the miners in a Bitcoin LINX-like system.
stevepatterson replied:
I see what you're saying. This point I think is crucial: "Miners don't really care that much about free versus non-free data. If they want to give away free data, they can just make it so you get that data slowly so the cost is minimal." This is why I don't think miners are the best data providers. People will absolutely care about free versus non-free data. Some of that data needs to be quickly accessed. For example, with the podcast situation, if the miners are hosting and serving the data themselves, then it *must* be fast for a good user experience. But if it's fast, it will be extremely expensive if the file is popular. So, one solution would be to have a sub-node specializing in that set of data and having it on hand for customers.
911 replied:
Yeah, I don't think that we disagree at all. I agree wholeheartedly, although my counterpoint is that block propagation must be fast as well. They are already paying for fast so adding another service line that can take advantage of the infrastructure and services they are already putting in place makes sense and they (should) be able to do it cheaper than a dedicated service provider in the near term who doesn't get the same economies of scale.