Trustless data validation

sziller.eu - Szilard Gabor LADANYI - 2023

Disclosure

No information in this document is owned by the author, thus no ownership of this information can be sold. The author of this document is rewarded - if at all - solely for his effort to collect, present and explain the data included herein; data which is freely accessible on the Internet anyway. This document therefore should not be sold, but is freely usable by anyone.

Proof: A version of this document is hashed. Said hash is stored on the bitcoin blockchain.

The necessary background

History

The need for a decentralized currency did not come out of the blue: there were many attempts at creating such a currency long before 2009. They just never took off.

Examples

Here are some currencies that were created to be decentralized:

Digicash: BlindedCash - David Chaum (1982) (1983 - 1998)

e-Gold: Barry Downey, Douglas Jackson (1996)

PayPal - turned into fully centralized payment system

B-Money: Wei Dai (1998)

(HashCash: Adam Back - first PoW (1997 - 2002) - algorithm)

Bitgold: Nick Szabo (1998…?)

No surprise that these developers' names have - without exception - all surfaced as possible real identities of Satoshi Nakamoto.

The reason none of these currencies were (or could possibly be) successful was also already understood. Nick Szabo - inventor of Bitgold - came to the conclusion that his own system’s biggest (only?) challenge was:
decentralized, anonymous consensus - or the lack thereof:
- How can many actors of the same hierarchical level - without a trusted arbiter - come to consensus on anything? This problem is known as the Byzantine Generals Problem (BGP), and is known and proven to be unsolvable. (These cypherpunk guys aren’t exactly idiots… to say the least.) This problem is, by the way, so fundamental that our entire society is built around concocting workarounds for it: we outsource consensus and reward the agents providing it, who in turn usually abuse the responsibility and the power that comes with it. Satoshi - who eventually solved the unsolvable problem - himself wrote an email answering a mailing list member’s question, explaining how his invention solves the BGP.

Let’s step back again to previous 'solutions'!
Ingredients like blockchain, cryptography, hashes and nodes were already suggested - some of them decades ago. All these parts were already gigantic compromises, as the problem to be solved is - provably - unsolvable.

An important example: the blockchain - a chain of data, linked in sequence. An add-only (…) database. A terribly redundant, slow and - for what it stores - huge collection of information. One that no sane programmer would ever pick as a database for any problem. Nobody, really. The blockchain (hashchain) however was first proposed and actually implemented by Haber and Stornetta in 1991. It has in fact been used as a chained proof of immutable data, and has been published EVER SINCE in The New York Times. It started in 1995! Just to help drive the point home: the very first blockchain (hashchain) ever implemented was and IS used to achieve decentralized consensus. That is how important solving decentralized consensus was for the authors.
(plus it is a Proof-of-Work chain to the bone).
And this is why nobody used it for such a long time afterwards. Because a blockchain is a terrible database. Any commercial or custom-made database is faster, more efficient, smaller and easier to process for practically any problem. Unless, of course, you MUST solve anonymous decentralized consensus. And even for that, it still is not enough! The blockchain is - as we know - one of the compromising ingredients. One of them, but not the last one, and not the most creative one.

The fact that blocks of a blockchain are linked together by including the previous block’s hash does NOT make it immutable at all! You can simply recalculate the entire chain in seconds, and have a new chain. The crux of the matter is how to then know which one is the 'valid' one.
'Valid' is a bit misleading though! In a decentralized system 'validity' is - by definition - NOT defined centrally. 'Valid' means that there IS (always) a version on which each actor - of the same hierarchy level - individually, independently and on his own comes to the same conclusion as all the others do.
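To make this concrete, here is a minimal Python sketch of a hash-linked chain. The function names (`block_hash`, `make_chain`, `is_consistent`) and the block layout are illustrative assumptions, not taken from Bitcoin or any other real implementation. It shows that linking blocks by hash only catches edits made without re-hashing; a chain rewritten from scratch is internally just as consistent as the original.

```python
import hashlib
import json

GENESIS_PREV = "0" * 64  # assumed placeholder for "no previous block"


def block_hash(prev_hash: str, payload: str) -> str:
    """Hash of a block: commits to the previous block's hash and the payload."""
    data = json.dumps({"prev": prev_hash, "payload": payload}, sort_keys=True)
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def make_chain(payloads):
    """Build a list of blocks, each linked to its predecessor by hash."""
    chain = []
    prev = GENESIS_PREV
    for payload in payloads:
        current = block_hash(prev, payload)
        chain.append({"prev": prev, "payload": payload, "hash": current})
        prev = current
    return chain


def is_consistent(chain) -> bool:
    """Check the internal linking only; this says nothing about which chain is 'valid'."""
    prev = GENESIS_PREV
    for block in chain:
        if block["prev"] != prev:
            return False
        if block["hash"] != block_hash(prev, block["payload"]):
            return False
        prev = block["hash"]
    return True


original = make_chain(["alice pays bob 1", "bob pays carol 1"])

# Editing a payload without re-hashing is detected ...
broken = [dict(block) for block in original]
broken[0]["payload"] = "alice pays mallory 1"

# ... but recomputing the whole chain yields a perfectly consistent alternative.
forged = make_chain(["alice pays mallory 1", "bob pays carol 1"])
```

Both `original` and `forged` pass `is_consistent`; nothing in the data structure itself tells an observer which of the two came first.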

The actual last step - proven to be the breakthrough - was the Proof-of-Work mechanism. A method already in use for spam filtering, based on the idea that we only read (value) data which was - at least to a predefined level - provably important to the sender himself. Put differently: the sender of the information must have skin (stake) in the game. This intrinsically ensures that, no matter what, if you broadcast something, you are risking something by doing so. There is an upper limit on how much spam a sender can afford, regardless of the informational value or otherwise any receiver of said information assigns to it. What is important: the source is objectively forced to spend real-life resources in order to be heard.
- Adam Back’s HashCash actually used PoW, but had scaling problems preventing it from eventually taking off.
Yet another way to look at it is the difference between free speech (having the right to speak) and the right to an audience. Imagine you’d only read mails whose sender solved a sudoku you sent her/him in advance. You easily check whether the sudoku is solved (whether he/she invested the DEDICATED energy you required), and then you give the message credit. Why? Because you set up the puzzle with two specific features in mind:

you can be certain whoever solves it MUST invest a certain amount of energy and

the invested energy can ONLY serve the sole purpose to solve that specific riddle.
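The two features can be illustrated with a hashcash-style puzzle. The following is a minimal Python sketch under assumed conventions: the puzzle format (`message:nonce` hashed with SHA-256), the function names and the difficulty value are all hypothetical choices for illustration, not HashCash's or Bitcoin's actual format. Finding a nonce takes many hash attempts on average, checking it takes one, and the nonce is searched for one specific message.

```python
import hashlib
from itertools import count


def pow_hash(message: str, nonce: int) -> int:
    """Interpret SHA-256(message:nonce) as a 256-bit integer."""
    digest = hashlib.sha256(f"{message}:{nonce}".encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def solve(message: str, difficulty_bits: int) -> int:
    """Brute-force the smallest nonce whose hash has `difficulty_bits` leading zero bits."""
    target = 1 << (256 - difficulty_bits)
    for nonce in count():
        if pow_hash(message, nonce) < target:
            return nonce


def verify(message: str, nonce: int, difficulty_bits: int) -> bool:
    """A single hash evaluation is enough to check the claimed work."""
    return pow_hash(message, nonce) < (1 << (256 - difficulty_bits))


MESSAGE = "hello, please read my mail"
DIFFICULTY_BITS = 16  # roughly 65,000 attempts on average; kept low so the demo is quick

nonce = solve(MESSAGE, DIFFICULTY_BITS)
accepted = verify(MESSAGE, nonce, DIFFICULTY_BITS)
```

Because the message is part of the hashed input, a nonce found for one message is, with overwhelming probability, useless for any other: the sender has to repeat the search for every message he wants to be read.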

Once the protocol itself forces you to spend the ultimate real-life resource, the decentralization stems directly from the distribution of said resource. Your system can be as decentralized as the resource is distributed. Should you, on the other hand, try to use internal value to reward / punish peer behaviour, there is no limit to how much centralization will occur (PoW vs. PoS).

Long story short: bitcoin - as a solution to the unsolvable BGP - works because it makes HUGE compromises on many levels (redundancy, insane energy use, extremely clunky database, huge system). If all these building blocks are applied together, one of which - the most recent, and most important one on system level - is the extremely energy intensive Proof-of-Work:
then, and only then is the BGP solved.
This complete set of building parts enables a more-or-less scalable solution. All these together.

— How good of a solution? — you ask. Well the only metric we have here is market cap.

— That good of a solution. That’s how much humanity TRUSTS the solution.

As the BGP cannot be solved, the upper end of the scale - the cap of what we call best in terms of anonymous decentralized consensus - is the only point we can define.
This is at this point-in-time Bitcoin.
There is no system even in the same ballpark. Bitcoin is - as of now - THE decentralized consensus system on the planet. The more energy there is spent on this system (as in the more energy humanity is ready to spend to maintain an immutable, openly verifiable, freely accessible database) the more secure it will be, the higher the max level of decentralized anonymous consensus achievable gets. Which on the other hand means, any other system does NOT provide anonymous decentralized consensus (if regarded binary).

Punchline

There is Bitcoin, and there is other shit lacking the only value proposition Bitcoin and its blockchain was ever created for.

Please rethink

Why "blockchain technology" is a meaningless buzzword.
Bitcoin is a package of 4 groundbreaking and complicated techs put together. One of which is the Blockchain. When used out-of-the-scope a terrible database. If someone tells you, he uses a Blockchain, just ask him why he picked such a crap? Blockchain was created as part of a solution to an extreme problem. There still needs to be at least one usecase outside a Proof-of-Work based, massive network using high level cryptography where using a blockchain is of any advantage to basically any other database. "Blockchain technology" does not exist. It simply doesn’t. For people interested in databases: you can write a Blockchain in 3 lines.
There is no 'there' there…

Too big of an energy usage - wrong approach: How valuable to humanity is an immutable database? Not measured in counterfeited, freely printable fiat money, but valued in the currency of the universe: namely Energy. That valuable. The energy usage is the feature. It is what enables the solution of the BGP. We, humanity, tried it. Long and hard, and couldn’t solve it any other way. Along comes Satoshi, and solves the BGP using PoW. Now - after we, humanity, finally have solved it - let’s try it again without the final key ingredient. Because it is too energy intensive.
--Well, really??? - Helloooo, PoW is the point!
Please understand, the space in this database is capped. There are as many units as predefined. This openly calculable and verifiable feature enables it to be of value. The space in the planet’s most decentralized, provably immutable database is limited, and can therefore be of value for somebody. It might even have - so called - infinite marginal utility, as additional space quanta are just as valuable as the first one, so users will hoard it.
As such it can motivate someone to invest real currency - ENERGY - to acquire some. It can and already does have a marketplace, namely the whole Planet, and as it is decentralized, actual use isn’t controlled, or gate-kept. We do not have a larger marketplace on this planet for anything, as more accessible goods - commodities are much more abundant. How so? Three reasons, all three related to scarcity:

in most cases scarcity is artificial

in many cases goods are not really fungible

in many cases scarcity although natural is unevenly distributed by nature.

Energy

Energy is scarce, the need for it arises by nature, it cannot be controlled to the extremes - less and less so -, and it is one of the most fungible goods we have (we even have a physical dimension referring to it). And please understand the exclusivity of Proof-of-Work system(s?). Energy is fungible. The energy you spend on one PoW system could just as well have been spent on another one. Even if there are different hardware techs, the gateway between different ones is still energy, so at the end of the day, Proof-of-Work systems are each other’s competitors. I firmly believe there can and ultimately will only be one. This - combined with the fact that by market cap the only proven BGP solution is PoW based - tells me:
Being a Bitcoin maximalist is the only rational stance.
When it comes to the ultimate BGP solution, thus the best digital store of value, it is and will IMO remain Bitcoin.

On meta level

Just how logical it is. Intuitively: 10 people tell you something on a topic, you do not know anything about. Who will you believe? I bet, you will believe the one who provably spent most time studying it, burnt more energy to acquire said knowledge. Proof-of-Work is everywhere. In reality, it is our best way to determine the credibility of sources.

Gold

A gold coin is as much a proof-of-work as a satoshi can ever be. All refined gold coins in your hand are proofs of work. It is a proof that a well-defined amount of energy was spent to just create that very coin. This is why it works as creditless currency. On the other hand, the stamp of the US mint is a centralized verification method, because you aren’t really sure, you hold gold in your hand. (you can’t verify instantly)
Verification (auditing) of gold is a huge issue, and the more valuable it gets, the bigger it will be. That challenge does not apply to bitcoin for instance. You can verify your bitcoin free of charge in milliseconds.
Transporting gold is also an issue. The minimal amount of energy can even be well-estimated to transport an ounce of it in a certain time from A to B. Which depending on the value of the compared bitcoin can be orders of magnitude bigger than the energy a bitcoin TX consumes.
Storing it is probably even more of a problem: it takes space, and arguably a lot more effort to store safely than to store the same value of bitcoin.

There is not one statement in any of the 200 videos about gold (belangp’s ones that is) describing one of gold’s features - those making it a great store of value - that couldn’t be applied without any stretch to Bitcoin. Not one. Other than maybe being beautiful… (which I never agreed with anyway - it is nothing more than putting the cart before the horse). And - being a huge Gold believer myself - I spent a lot of time convincing my fellow Bitcoiners how incredibly valuable Gold is, and what terrible misconceptions about gold there are.

You want to judge, if telling you this info was important to me: just think of how long it took me to write it. Think of the Energy I invested into writing it. Consider it a Proof-of-Work

Trustless data validation

Talking about transparent data validation, where data is still kept hidden from those not having the right to access it is a tricky proposition and must be discussed in detail in order to be understood. Main characteristics the described and discussed model features:

The Prover can prove any information to a Verifier beyond reasonable doubt.
The Prover does not leak any data not meant for the Verifier.
The Proof does not tell anything about the proven data itself.
The Proof is unchangeable - tamper-proof.
That very fact - it not having changed - is publicly and freely verifiable.
Proof sizes are compact, Proof data easy to handle.
Proving procedure is quick, easy, transparent and trustless.

Zero knowledge proof

An extremely important tool in achieving trustless data validation is the Zero Knowledge proof. Such proofs grant anonymity, conciseness and scalable trust while still leaving the control where it belongs, namely close to the prover’s chest.

An important concept to understand is the one of:
Zero-knowledge-proof
It represents the possibility to prove - and verify information, without actually leaking - receiving information other than the fact that it was proven. I assume the reader of this document to be familiar with Zero-Knowledge-Proofs, so I’ll not go into details on the concept.

Example: To demonstrate the very basic concept of a Zero-knowledge-proof, read this quick exercise:
--Prove to me that you’ve found the book with Queen Elizabeth on it, in the picture below, without actually letting me know where it is.

Solution: You pick a huge sheet of paper, cut out a book-sized hole in it, and lay the sheet over the screen, with the Queen’s book in the hole. Now you have proved to me that you’ve found it, but nothing else. I am still not able to tell where the book is, unless I find it myself.

Tiers of trustless data validation

Other than the concept of Zero knowledge proofs, we need to dig deep into the challenge of trustless validation. In order to understand it, I dissected the issues into 5 tiers:

0. The Oracle problem (measurement data malleability)
1. Hashing
2. Signatures
3. Tamper-evidence
4. Tamper-proofness

The Oracle problem - why important

A fundamental issue, when documenting real life data. This will have an impact further down the line, so we discuss it in detail.

Why does this problem occupy a whole chapter in this document?
--In order to prove: neither a Turing-complete language, nor a Proof-of-Stake solution has any advantage over Bitcoin when it comes to the fundamental problem underlying smart contracting.

In our case this means: I am the Prover, my client is the Validator. If "validation" is supposed to mean checking whether a real life event happened (bool) or checking an event’s details (complex data), then my suggestions for consideration are as follows:

0. The Oracle problem

Result of an outside measurement cannot credibly be passed to a closed system without a trusted party. The essence of the Oracle problem is, there is no way to perfectly internalize data from an external source. It is totally irrelevant if we talk about databases, programming languages, smart-contracts or functions. As soon as a system needs data from external scope, it inherently trusts the validity (content, not syntax) of the data.

Once inside the system, we now - thanks to bitcoin - have ways to prove beyond reasonable doubt that the data was not meddled with. This is the topic of our exercise.

An opinion: I do not see the Oracle problem being solvable the same way the Byzantine Generals Problem was solved. I do not believe there can be a one-size-fits-all solution. It will rather be analogous to the cure for cancer: there will have to be different Oracle solutions for different usecases, based on measurement types, data types and contract types. Even based on the number of, and relations between, the parties involved in the contract using said Oracles.

The only exception from the Oracle problem is the Proof-of-Work algorithm itself, which - due to Energy-Information equivalency, and limited exclusively and strictly to its usecase - is able to deduce and internalize digital information from the outside (real) world w/o a second party. The digital system can, as much as it needs, come to a - probability based - conclusion that (and how much) real energy was used, and does not need an Oracle to mediate. This does not help in the case of Oracles necessary for Smart contracts, as it is not in the same scope.
HOWEVER: let me speculate, and make a bold prediction: a general(ish) solution to the Oracle problem will have to include a Proof-of-Work ingredient, because of the same reason.

Accepting there cannot be a solution to the Oracle problem, we seem to have two options:

Using external, trust- and influence minimized Oracle(s): there are exceptionally well researched proposals using DLC-s on-chain, combined with custom-made game-theoretical setups, minimizing the possibility and motivation of Oracle corruption: by isolating the Oracle from outside influence.
Bitcoin already uses such Oracle schemes for prediction markets and sports-betting.
Using a trusted Oracle
The Verifier side (your client) accepts the fact, that it is not a real life measurement being documented, but the Prover’s (in our case: ours) digital version of it. (leaving the unavoidable possibility for the Oracle-controlling party to meddle with the data, at (!) digitalization: which is - again - in essence The Oracle problem)
Using a controlled Oracle

Whether a function (SmartContract) uses Turing complete or incomplete language is totally and absolutely IRRELEVANT at this stage, as the Oracle problem occurs when entering the system.
Also, the consensus mechanism of a blockchain is totally and absolutely IRRELEVANT as long as we talk about how a function (SmartContract) trusts its incoming arguments.

HOWEVER: if you use a Turing-complete system, where the Smart-contract can access on-chain data, it is orders of magnitude more complicated to isolate the Oracle, thus nearly impossible to minimize its bias.

A simple example to the Oracle problem

Let’s suppose you and I make a bet on the HUF-EUR exchange rate. If it is above 410 tomorrow at 12:00, you get 10 bitcoins; if below, it’s me winning. Regardless of which blockchain we use, we make an immutable SmartContract with said conditions. We HAVE to - even if only on a meta level - agree in advance how the exchange rate will be entered and what the source of the data will be.

And now, just start asking questions

Will either of us enter the data? (If so, why even bother with a SmartContract: it has 'Trust' written all over it.)
Will we ask the MNB website?
- What if the website goes down at 11:50? - our money might get stuck.
- What if there is a bug, and it returns a wrong answer?
- What if it receives wrong data, does what it was meant to do, still the answer would be wrong.
- Well, lemme see: I’ll tell the operator to whoever asks at 12:00 answer 409, and receive 2 coins from me in exchange.
- If the contract is public and readable, I’ll probably not even have to ask them: my phone will ring, and they’ll come to me with an offer.
Will we ask multiple sources?
- What if they contradict each other?
- What if they collude? This would just introduce a clusterfuck: more eyes watching the contracts, more room for corruption, partial co-opting…
- What if one of us controls some?
Do we introduce time tolerances? How to reconcile these - in advance I might add?

Using a trusted Oracle

If I use an Oracle, I need to know its costs in advance. An on-turing-chain decentralized smart contract has a huge overhead.

Using an Oracle run on a Turing-complete blockchain only solves The Problem if the object of the measurement is already on the same blockchain (which itself questions the very function itself). Such an Oracle only kicks the can further down the road, as Trust at analogue-digital conversion is still necessary.

For this reason I understand an Oracle to be the entity active at analog-digital conversion. And this is why using a turing-complete chain is not an advantage, while on the resource side it is a clear disadvantage vs. a Turing-incomplete one.

The off-chain, centrally run software responsible for the Oracle can quite well be proven and verified inexpensively. You for instance distribute it using git, and prove its version. You can run your Oracle code on a centralized server, as long as you prove the code version. The Oracle problem is still not solved; the software side however has become quite immutable.

I can channel the motivations of the Oracle if it:

does not know who is requesting the data
does not know the motivations behind the data it provides
itself has something at stake and motivation to behave according to the protocol.

Using a controlled Oracle

- Ignoring the OP as verifier -

Here’s the shortcut, the hand-wavy solution to the unsolvable: let’s redefine our goal. I do not document the real life event, but rather my 'opinion' about it. This is a qualitative difference, as we introduce a new layer of trust. Both I, the Prover, and my Client, the Verifier, make the following assumption: until documented, I, the Prover, initially have the possibility to massage the data.

Tiers of trustless data validation explained

The following concepts and technologies are important, as these bring us step-by-step closer to trustless validation. The further down the list, the closer we are to our goal.

1. Hashing

I assume the reader of this document to be familiar with Hashing, so I’ll not go into details on the concept of it. A Hash function simply creates a fixed sized number as an output in a deterministic manner. Changing the input results in a seemingly random, unpredictable change in the output. Hash functions are one-way, meaning, it is easy to calculate forward, but infeasible to reverse.

In Bitcoin, and for that matter in our document, we use and refer to sha256() as the default Hash function. Bitcoin for most of the time uses double sha256: H(H(data)).

Use of words: Applying a hash function to a preimage results in the hash:

hashfunction ( pre-image ) = hashed data or hash

A hash is a fingerprint of a piece of data. Hashing does NOT make data immutable, by any stretch of imagination. As long as the Hash itself isn’t set in stone, you can simply rehash the altered pre-image, and sell the new hash as the original. A hash does not tell you anything about the origin of the data nor the time (sequence) of its emergence.
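As a small illustration, here is a Python sketch using the standard library’s `hashlib`. The sample pre-images and the helper names `sha256` and `double_sha256` are assumptions made for this example. It shows the fixed output size, the determinism, and how a one-character change in the pre-image yields an unrelated hash; it also shows that nothing stops anyone from hashing an altered pre-image.

```python
import hashlib


def sha256(data: bytes) -> bytes:
    """Single SHA-256: 32 bytes of output for any input size."""
    return hashlib.sha256(data).digest()


def double_sha256(data: bytes) -> bytes:
    """H(H(data)), the double hash referred to above."""
    return sha256(sha256(data))


preimage = b"measurement: 21.5 C"
altered = b"measurement: 21.6 C"

original_hash = sha256(preimage).hex()
altered_hash = sha256(altered).hex()   # just as easy to compute as the original
double_hash = double_sha256(preimage).hex()

print(original_hash)
print(altered_hash)
print(double_hash)
```

Both hashes look equally legitimate on their own; only a hash that was fixed somewhere beforehand lets a verifier tell which pre-image is the original.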

2. Signatures

I assume the reader of this document to be familiar with signatures, so I’ll not go into details on the concept. A Signature consists of a Public key, a random number, the signature data and some meta-data representing what was signed in the first place. Signatures, just like hashes, are the result of one-way functions, meaning you cannot reverse them. Neither the original data nor the signer’s Private key are recoverable from a signature (given the signer knows what he is doing). A common misconception is that signatures are inherently deterministic; with ECDSA and Schnorr they are not. A given piece of data can have a godzillion different valid signatures, even using the same private key to sign, because each signature depends on a nonce. (Many implementations derive that nonce deterministically from the key and the message, e.g. per RFC 6979, to avoid the catastrophic reuse of a nonce.) A validator can still see whether the data was tampered with.

The security model of the Signature schemes we use in Bitcoin is based on the assumption that the discrete logarithm problem is hard: a clock arithmetic problem underlying many cryptographic solutions. Bitcoin and Ethereum for that matter use ECDSA (Elliptic Curve) algorithms, which aren’t linear, and thus cannot inherently handle some multisig schemes. Schnorr signatures on the other hand, being linear, enable us to concoct multisig coding on signature level. Schnorr signatures also come with a formal security proof (in the random oracle model, assuming the discrete logarithm problem is hard), whereas ECDSA is only assumed to be secure. Bitcoin has supported Schnorr signatures since November 2021.

You usually do not sign the data itself, rather a hash of it.

A signature is really just a signature. Using the signer’s Private key, a signer can prove himself to be the one having created the signature, thus the document. A Signature does NOT make data immutable, by any stretch of imagination. As long as the Signature itself isn’t set in stone, you can simply re-sign the altered hash, and sell the new signature plus data as the original. A signature tells you that an entity was in control of a given private key, commanding the public key included in the signature, and whether it matches the original data (hash) itself.
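The following Python sketch illustrates these properties with a toy Schnorr-style scheme. It is an assumption-laden teaching model only: it works in the integers modulo the Mersenne prime 2^127 - 1 instead of Bitcoin’s secp256k1 curve, it is not BIP340, it is not secure, and all names (`keygen`, `sign`, `verify`, `challenge`) are made up for this example. It shows that the same key can produce many different valid signatures for the same hash, that a verifier detects altered data, and that anyone holding the key can just as easily sign the altered data.

```python
import hashlib
import secrets

# Toy parameters, assumed for illustration only: NOT secure, NOT Bitcoin's curve.
P = 2**127 - 1   # a Mersenne prime used as modulus
N = P - 1        # exponents are reduced modulo P - 1 (Fermat's little theorem)
G = 3            # base element


def challenge(commitment: int, message: bytes) -> int:
    """Hash the nonce commitment together with the message."""
    digest = hashlib.sha256(commitment.to_bytes(16, "big") + message).digest()
    return int.from_bytes(digest, "big") % N


def keygen():
    private_key = secrets.randbelow(N - 1) + 1
    public_key = pow(G, private_key, P)
    return private_key, public_key


def sign(private_key: int, message: bytes):
    k = secrets.randbelow(N - 1) + 1          # fresh random nonce for every signature
    commitment = pow(G, k, P)
    e = challenge(commitment, message)
    s = (k + e * private_key) % N
    return e, s


def verify(public_key: int, message: bytes, signature) -> bool:
    e, s = signature
    # G^s * public_key^(-e) recovers the commitment if the signature is genuine
    commitment = (pow(G, s, P) * pow(public_key, N - e, P)) % P
    return challenge(commitment, message) == e


private_key, public_key = keygen()
data_hash = hashlib.sha256(b"measurement: 21.5 C").digest()      # we sign the hash
altered_hash = hashlib.sha256(b"measurement: 21.6 C").digest()

signature_a = sign(private_key, data_hash)
signature_b = sign(private_key, data_hash)        # same key, same data, different nonce
resigned = sign(private_key, altered_hash)        # re-signing altered data costs nothing
```

`signature_a` and `signature_b` both verify against `data_hash` although they differ, `signature_a` does not verify against `altered_hash`, and `resigned` verifies against `altered_hash` without any trace that it was created later.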

The fact that creating a signature does not incur cost is the fundamental problem with blockchain based Proof-of-Stake systems. See: Proof of Stake. Creating a signature is basically free. Thus, in a Proof-of-Stake system NOTHING prevents a Node from calculating and keeping alternative versions of reality. This is the well known 'nothing at stake' problem, which isn’t solved for PoS. And looking at the problem from the Oracle point of view, you’ll not be able to introduce an external cost other than energy without an Oracle (an additional 3rd party). Signature based 'mining' does not solve the BGP!

3. Tamper-evidence

We call something tamper-evident if a verifier can tell with certainty whether data was tampered with. Example: your average concert wrist-tape or the lid of your beverage. Tamper-evident storing of data is nothing new; in fact the problem arises when owners of the data do not even give a f.ck if it is evident they meddled with the data. It is useful up to a certain cost.

On the other hand, Tamper-evident might in many cases be just enough. For instance: if Validator’s inquiry ends at checking the data, well then Tamper evident is the actual goal. That goal of the Validator is by definition finding evidence. And for tamper evidence, you may not even need a blockchain.

You really need to pause here, and distinguish between these two concepts. Tamper-proof systems are necessary, if the data you check is further used in important calculations, functions, determining real life events, etc.
It is also important to understand: a tamper-evident system - once tampered with - shows you a new state, out of which you cannot necessarily recalculate the original state. Also, once the data is tampered with, sequence and timing info is gone as well, even if the Verifier knows it happened. So in tamper-evident systems, if you need old data, you have to keep it!

4. Tamper-proofness

Tamper-proof are data storing methods, where it is disproportionately costly or practically impossible to change the stored data. Now that is hard to achieve, in fact until Bitcoin, it was digitally not achievable. Not in a decentralized open manner anyway. In order to create a system of minimal trust, we need to achieve the lowest possible trust, when moving up between these points, while keeping scalability, usefulness and a practical architecture in mind.

In Bitcoin Tamper-proofness is probabilistic. The more time passes, the more certain you can be, data will not be changed. Neither data, nor discrete data’s sequence. So if your life depends on data, you only need to keep the youngest collection, as older data cannot possibly be changed.
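How quickly that certainty grows can be estimated with the calculation given in the 'Calculations' section of the Bitcoin whitepaper. The Python sketch below is a straightforward port of that formula; the function name and the sample values are chosen here for illustration. Given an attacker controlling a share `q` of the total hash power, it returns the probability that he ever catches up after `z` blocks have been built on top of the data.

```python
from math import exp, factorial


def attacker_success_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q catches up from z blocks behind."""
    p = 1.0 - q                  # share of the honest network
    lam = z * (q / p)            # expected attacker progress while z honest blocks are found
    probability = 1.0
    for k in range(z + 1):
        poisson = exp(-lam) * lam**k / factorial(k)
        probability -= poisson * (1.0 - (q / p) ** (z - k))
    return probability


# An attacker with 10% of the hash power, for a growing number of confirmations.
for confirmations in (0, 1, 2, 5, 10):
    print(confirmations, attacker_success_probability(0.10, confirmations))
```

With a 10% attacker the probability is 1 at zero confirmations, about 0.2 after one block and below 0.001 after five, and it keeps dropping exponentially from there.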

It is hard to overstate how important Tamper-proofness is, and how little trustlessness on the lower levels is worth as long as Tamper-proofness isn’t achieved.

Tamper-proof is not automatically decentralised. Disregarding external impacts system level tamper-proofness might technically be achievable in centralized way. In fact, it could even be easier. Once however outside motivation or errors are considered, Tamper-proofness CANNOT be achieved in a centralized system. Having a single point of failure is inherently prone to failure and/or corruption.

For a fully trustless validation protocol,
one has to climb to tier-4
while leveraging all other tiers below.

Moving between Hashing and Signature: Once we have picked an economic and privacy enabling Hashing process, we need to make sure the Validator knows it was me (the prover) providing the data and/or proof. Signing data cuts out a large amount of trust, given the Verifier knows our Public key(s).
Moving between Signature and Tamper-evident: Though signed by me, data can still be re-signed and re-sold as the original, absent a method making it evident that I massaged the data in hindsight. Making the data tamper-evident on top increases trust by a huge amount.
Moving between Tamper-evident and Tamper-proof: Having evidence of someone having changed the data doesn’t do much if we cannot enforce a penalty and/or still have the unaltered data. We have to make tampering costly for anyone, up to the point where it does not pay off for the attacker, but is still rationally cheap for us. Good news is, bitcoin is quite cheap, so moving up a step is almost a no-brainer. By moving to a Tamper-proof storage system, we cut the last substantial amount of trust out of our protocol. Now, and only now, is the chain complete. Having tamper-proof validation actually enables us to write automated processes, as incorruptibility of the in-system data is granted.

Once stored using the proposed protocol data will be immutable. Storage will have to be designed based on Business model and Architecture. The last chapter - documentation of a real-time demo - introduces an example: how to store, request, prove, validate data in such a system.

Merkle Tree

It is a binary tree that consists of hashes. Data to be stored in a MerkleTree are stored in the 'leaves'; every branch is the hash of the branches (or the leaf) below it.

            Mroot               <-- Merkle Root
          /       \
        0           1           <-- branches of Combined Hashes
      /   \       /   \
    00     01   10     11       <-- branches of Data Hashes
    |       |   |       |
    D1      D2  D3      D4      <-- data stored
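The tree in the diagram can be computed in a few lines. This is a minimal Python sketch under stated assumptions: it uses single SHA-256 and plain concatenation of the two child hashes, and duplicates the last hash on levels with an odd number of entries; the function names and the sample data are illustrative, and Bitcoin’s own construction (double SHA-256 over transaction ids) differs in its details.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves):
    """Root hash of a binary hash tree over the given data items (bytes)."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]          # branches of data hashes: 00, 01, 10, 11
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])               # pad odd levels by repeating the last hash
        # branches of combined hashes: each parent hashes its two children
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


data = [b"D1", b"D2", b"D3", b"D4"]
root = merkle_root(data)

# Changing any single leaf changes the root.
tampered_root = merkle_root([b"D1", b"D2", b"D3 (edited)", b"D4"])
```

However many leaves go in, the root stays 32 bytes, which is what makes it cheap to publish.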

Usecase: Oracle

Using Merkle trees, an Oracle itself can prove the validity of data to adversarial Validators, while storing only a couple of kilobytes and using its signature to validate data. The SmartContract can be off-chain, using DLC-s, only touching the Blockchain in case of a dispute.

We listed two Oracle scenarios above. The two usecases implement the same basic logic: in the first one I ask an external Oracle, in the second one I am the Oracle myself. If the Oracle returns bundled data verified in a Merkle-Tree, we witness how the Oracle is put under pressure to behave according to the 'rules'. As more and more actors request data from the Oracle - being the trusted external information source - its stakes are raised to:

tell the 'truth', thus not ruining possible later revenues
to keep my cost low, putting more info into the same merkle root: which on the other hand makes it even harder for me to massage the data later.

This applies to both an external Oracle and to me as well, if I take the role of one.

Usecase: trustless data validation

The same applies to my - the prover’s - data, if and when I prove something to a validator (client). My software can prove the validity of any data in milliseconds on a p2p basis, once I have checked that the request comes from a user with appropriate rights.

In a state I want to document, I make x different measurements. The content, timing and frequency are arbitrary, as is the identity of the later verifier. These data, however big, are ordered in a MerkleTree. Results, details, parameters can be stored by me, off-line, off-chain. Furthermore, let’s be honest: whether logic or data, they do not even belong anywhere else, as storing them on-chain unnecessarily bloats the system, increasing its cost of use for me as well.

Role in scaling

Logic, data, details: all stored / run offchain: close to one’s chest, centralized.
The Merkle root of the arbitrarily sized data, however, is stored on the only really immutable Blockchain. Or, depending on scaling requirements, is as-good-as stored on the blockchain:

--Does everybody really need to see it all?
--Is it really necessary to save even the one root on the chain, if I guarantee the Verifier can make my fraud as costly as possible?
--Even if necessary, does it have to happen all the time?

The answer is probably: NO. It may all happen on a 2nd layer. In Bitcoin’s case, on the Lightning Network. Once we realize this, and combine it with the bundled Merkle-Proof, we have a fully scalable, absolutely bulletproof system. See also: Scaling

Let’s go one step further: if for some (subjective) reason we’re not keen on Bitcoin, we can build our own blockchain. It’s approximately 10 lines. I can give nodes to my clients. A short code to run a demo-chain is shown in a real-time presentation.

Scaling

Let us briefly discuss an important issue, that of scaling. As long as we have a system in which measurement validation is the most important (only?) task to be decentralised (made trustless), we would keep all other building blocks of the system under our control: centralized. In the hypothetical case that anything else had to be trustlessly verified by external clients, those factors (data, code, processes, etc.) would also have to be scaled as highlighted below. (I do not analyze centralized system modules here. Scaling of those is the task of the system architect.)

Aspects

…to consider, when scaling trustless building blocks of your development:

number of users
measurement data size, frequency, timing
layer architecture
task delegation - putting centralized tasks onto blockchains

Without diving too deep into each of these factors, let us touch on some basic concepts, and discuss which abstraction level to tackle these on: The system is by and large a centralized one, under our full control up to the point when data-to-be-validated is published. As introduced in detail, data can and should be bundled into larger chunks. The more data we can fit into one M-tree, the better. We can include data from multiple sources, for multiple users, with users of different rights. We can even include external provers’ data as a service (serve as an Oracle). And the Merkle-root’s size is still the same. So the question of what to store in one given unit is not an issue when talking scalability.

Another question is the frequency with which we want to set our measurements in stone. Well, frequency can also be bundled. You can use a milestone approach, sending data of different 'immutability' levels over different channels, to different users. Let’s say you publish your trees once a day over your preferred blockchain. In the meantime, you can send your trees to your client via … whatever you like: email, smoke signals, doves, or simply put them on your homepage.

Then - as mentioned above - there is the "if a tree falls in the forest, and no one hears it, does it make a sound" approach: Use Layer-2!

Publish measured data every second by the thousands on your homepage - hash and sign, if needed
Send each client a signed email once every hour, including their branch of the tree
Create a Bitcoin Lightning TX - with your preferred multisig or timelock - every day
Settle your Bundled Trees once a week on-chain for 2€-s

Your scaling opportunities are endless, really.
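
The tiers listed above rest on bundling: the roots of seven daily trees can become the leaves of one weekly tree, so only a single hash has to be settled on-chain. A minimal sketch of that idea, assuming SHA-256, pairwise concatenation and zero-byte padding (all hypothetical choices, as are the function names):

```python
import hashlib


def h(data: bytes) -> bytes:
    """Assumed hash function: SHA-256."""
    return hashlib.sha256(data).digest()


def merkle_root(hashes: list[bytes]) -> bytes:
    """Combine already-hashed nodes pairwise until one root remains."""
    if not hashes:
        raise ValueError("at least one hash is required")
    level = list(hashes)
    while len(level) & (len(level) - 1):      # pad up to a power of two
        level.append(h(b"\x00" * 8))
    while len(level) > 1:
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0]


# seven daily roots, each standing for 1000 measurements published that day
daily_roots = [
    merkle_root([h(f"day {day} measurement {i}".encode()) for i in range(1000)])
    for day in range(7)
]

# the weekly settlement bundles the daily roots into one more tree
weekly_root = merkle_root(daily_roots)
print(len(weekly_root))   # bytes that have to go on-chain
```

However many measurements a week contains, the value settled on-chain stays one hash long; the daily roots and the measurements themselves travel over the cheaper channels.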

But most importantly: why would we use or need ANY of our logic to RUN on a Turing-complete or incomplete Blockchain? All we need - for whichever usecase - is data to be Tamper-proof. Other than that, scaling is best achieved if your code runs on a small number of dedicated machines rather than on a world computer. Running complex logic on a myriad of machines - that do not even grant perfect immutability - is a recipe for un-scalability. Instead: limit the tamper-proof storage to only the necessary data in a truly immutable DB, while delegating every other task to dedicated servers under your control.

Using blockchains: the bottleneck of ALL scalability in decentralized tasks will ALWAYS be onchain data. You can optimize the shit out of your system: if the one, tiny task is being performed ten-thousand times at the same time, on god-knows-what-crap-of computers, you can scale a factor of 5 by simply moving that task offchain.
Moving data and/or tasks off the chain is THE PRIORITY.
How could you possibly be less effective than by doing ANY task on hundreds of computers at the same time, while it even costs you on-chain resources (such as gas) as well? A good rule of thumb: the thinner the better. Blockchains using Turing-complete languages are therefore by definition overkill. The less data you store and process on-chain, the less you need the DB to talk Turing-complete in the first place.

In bitcoin this is already the tendency: we at this stage even strive to replace code with proofs.
Meaning we try to only put the most necessary proof of an action on the chain, rather than the action itself:
MAST, TAPROOT.

Lightning Network

I assume the reader of this document to be familiar with the LN, so I’ll not go into details on the concept. We refer to the Layer-2 solution of Bitcoin as the Lightning Network, as we presuppose an existing network. In Bitcoin - being peer-to-peer - no one can forbid you to implement Layer-2 without even touching the 2nd Layer network. In this document we use Layer-2 and LN interchangeably, but most of the time you can think of Layer-2 as a basic messaging system between two peers once they’ve anchored a SmartContract on the Bitcoin Blockchain. This way, you can spare yourself having your keys in a LightningNode hot wallet and running an LN Node.

Layer-2 means you trade TX’s with your peer, using ANY protocol you pick. As the validation model works with a one-way channel, you are free to develop custom-made, quick and easy Layer-2 channels and solutions.
Off-chain scaling is trivial.

Consensus mechanisms

Thanks to Bitcoin’s stellar success, there have been countless attempts at copying it. As Bitcoin is basically defined by the following 4 fundamental factors, if you want to create a distinguishable system you’ll have to change one of these 'ingredients':

1.a: proof of work algorithm
1.b: blockchain
2. : incentive and validation system (enforced by the code)
3. : network: size and distribution

As I detailed in separate essays, once you separate blockchain and proof of work, the fundamental (and only) value prop of Bitcoin - anonymous decentralized consensus - is no longer achieved.
No wonder Satoshi never referred to these as separate building blocks. He always talked about a Proof-of-Work-Blockchain and would probably only have talked about 3 components altogether.

Some newer systems changed the incentives, some altered the validation language and some tried to invent a different consensus mechanism. This document introduces the Proof of Stake consensus mechanism, while highlighting the main differences vs. Proof-of-Work, and will show you why - while having some great features - it fundamentally misses the whole point of decentralization, as it does not grant a solution to anonymous decentralized consensus, a.k.a. the Byzantine Generals Problem.

Consensus in a decentralized system

We have a database of which we keep different, sequential, timestamped states on many adversarial Nodes (computers). Adversarial means they have incentives to retroactively alter said database to include data more advantageous to them. The problem is how all these Nodes come to consider the same state "valid".

"Valid" is - by definition - never defined centrally, as all Nodes are 'created equal'. There is no authority who would, or even could, tell that version X is the valid state, or that given n different states a specific X is the valid one.
The conclusion must be arrived at on an "uninhabited island". Every Node, on its own, by itself, without knowing the identity of other actors or even the source of the information, must be able to conclude which of the versions it knows should be considered valid (assuming the other Nodes make the same decision, by assuming I make the same decision, by assuming…).

Do you know, if I know, if you know, if I know…

Proof of Stake

Proof of Stake disregards these principles on many levels. The main motivation behind any alternative to PoW is…

… getting rid of energy cost and scaling - giving it the benefit of the doubt, or…
… thinly veiled, strong centralization - if it is dishonest.

Depending on the implementation, there are specific actors who validate new incoming information and write the db. These actors are picked (not necessarily actively) based on how much internal(!) value they are ready to bet, to risk, to stake, to lock. If the Validator of a new entry - Block - does not behave in accordance with the protocol, an internal rule (depending on the implementation) punishes them by cutting said stake.

In other words, cheating on such a system is NEVER punished with real life consequences.

The stake at risk CAN only come from inside the system, as an outside stake would worsen the problem by introducing an old friend to the scope: the even more complicated Oracle Problem (see: 0. The Oracle problem). Unless of course you happen to use the Proof of Work consensus mechanism, which btw. solves the Oracle problem for max. energy usage as well.

Consensus

The biggest issue is the Nothing at stake situation. In order for a Node to decide which version out of 2 possible states is the valid one (the one every other unknown actor considers valid), it checks signatures. The production of these signatures, however, is free. Validators can in fact sign and keep any number of valid chain versions, as signing two different versions costs them nothing. I, as a Node, cannot tell which version is valid, as the creator could have signed thousands of chainstates, absent any metadata on actors. In fact, he is even motivated to do so. The only penalty he may suffer comes from the other Nodes, based on software behaviour. So not only am I unable to decide which version is 'valid', cheating is also not punished in the absence of peers. Whereas, if a cheater using P-o-W gives me a 'wrong' version, I can make the quick decision without knowing anything about the system itself and be certain that, no matter what, the cheater has already been punished, as I see his PoW.
With PoW the decision is a comparison of numbers. Consensus can be checked using pen and paper in a second.
Same problem from a different point-of-view:
With a reasonable Desktop computer, I can regenerate an entire PoS Blockchain of 5-years-of-age in an hour. An entire chain, and no Validator could 'anonymously' tell the difference between my version and the 'valid' one. Good luck generating a PoW chain of an afternoon on your own.
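
A toy illustration of why checking Proof-of-Work comes down to comparing numbers: producing a valid nonce takes thousands of hash attempts, while verifying it takes one hash and one integer comparison. This is a simplified sketch, not Bitcoin’s actual header format or difficulty encoding; the target value, the payload layout and all function names are assumptions made for this example.

```python
import hashlib


def block_hash(header: bytes) -> int:
    """Double SHA-256 of a header, read as an integer (byte order simplified)."""
    digest = hashlib.sha256(hashlib.sha256(header).digest()).digest()
    return int.from_bytes(digest, "big")


def mine(payload: bytes, target: int) -> int:
    """Costly part: try nonces until the hash falls below the target."""
    nonce = 0
    while block_hash(payload + nonce.to_bytes(8, "big")) >= target:
        nonce += 1
    return nonce


def is_valid(payload: bytes, nonce: int, target: int) -> bool:
    """Cheap part: one hash, one comparison of two numbers."""
    return block_hash(payload + nonce.to_bytes(8, "big")) < target


TARGET = 1 << 244            # toy difficulty: about 4096 attempts on average
payload = b"toy block"
nonce = mine(payload, TARGET)

print(nonce)
print(is_valid(payload, nonce, TARGET))
```

The verifier needs no knowledge of who produced the block: the work is visible in the number itself. A signature, by contrast, costs the same to produce whether it is the first or the thousandth.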

Centralization

Another huge problem is the inherent centralization of the PoS system. It has three correlated pillars.

In order to write the DB - as mentioned above - you already need to have internal value. This means rewards by definition go to actors who already hold a stake above a certain limit.
The huge issue here is the unchangeable imbalance once a certain centralization is achieved. Unlike Bitcoin’s PoW, which automatically self-corrects every time 51% of the hashpower is controlled by one actor, PoS is not just unable to self-correct, it is in fact impossible for it to swing back at all. Period. Once the fact of having 50% is rewarded by adding more to its pool, it is game over. The 51% in PoW - contrary to public opinion - is not even a soft limit. You can write the DB with 1/x of the complete hashpower every 1/x of the time. So keeping up a successful 51% attack needs waaaaay more than 50% of the HP, with a strong natural tendency to swing back inherently.
Even worse, said 50% of users do not even need to know of each other: the situation described above kicks in uncontrollably, the effects occur automatically.

Bootstrapping

Unlike in Proof of Work, a newcomer to a Proof of Stake system cannot take part in writing the DB, as he needs internal funds to stake, while you can start mining using PoW wherever you are, given a Joule of energy. No system actor - other than you - is needed to do so. Meaning, you obviously cannot initiate a 'fair' PoS chain.

Proof-of-Stake is not able to grant anonymous decentralized consensus. Not on its own at least. Implemented well, it can scale a base layer effectively. In other words, once the BGP is solved, it is a great scaling tool:
For that however it does not even need a blockchain(!).
If the immutable consensus is granted by a base layer Proof-of-Work blockchain (Bitcoin), you can anchor your PoS system to it, and run a second layer above the Byzantine fault resistant Layer 1.
This is exactly what Bitcoin’s Lightning Network is and does. It is a Proof-of-Stake system. Nodes on the Lightning Network prove their stake in order to take part in the Layer-2 network built over and anchored to the Layer-1 Proof-of-Work blockchain. They prove their stake immutably on layer one. Without it, it is impossible to take part in the PoS layer: not because someone forbids it, or some additional software rule prohibits it. Inherently, Lightning Network’s PoS is dependent on the decentralized anonymous consensus one Layer below, granted by the Proof-of-Work Blockchain.

Presentation

Merkle-Tree based Zero-knowledge data verification. The following slides describe and demonstrate the behaviour of a Proof-of-Concept Python project which the author of this document - Szilard Gabor Ladanyi - created, and which will probably be published on github.com/sziller. It is open-source code, freely accessible to anyone. It is not exclusive to the receiver of this document.

Description

The exercise demonstrates the use of the framework the author created. The Prover and (an arbitrary set of) Verifiers agree in advance on:

what Hash function will be used
the structure of the data storage (proof of inclusion vs. exclusion)
where and how often the Merkle-Roots are published.

The Prover makes his measurements, then customizes and creates a Merkle-Tree to store any of his data in a predefined arrangement with arbitrary timing (as long as the Verifiers agree). The Prover then stores the Merkle-Root on-chain or as-good-as on-chain (Layer 1 or 2). Both the Prover and the Verifier have their own class instances and use these to provide and verify a proof.

The possibilities of the data content, its format, its structure; the timing, the frequency of the recording; the way these are stored, the Merkle-tree buildup, the management of Layer-1 and -2; the software and hardware architecture derived… all these considerations provide a pretty much endless set of possibilities, and should - depending on the current usecase - be carefully engineered.

Oracle: measure & pass

slide 1
An Oracle - under either Our or ThirdParty control - creates a collection of measurements (possibly for multiple clients, at different times, with arbitrary timing… etc.), which we store, and from which we can - on request - select and provide proofs for the ones our given Client - the Verifier - asks for. Stored data - measurement documentation - can be in any format as long as it is standardized.

Each measurement on its own may even be signed.

Here’s an example of data, we collect and bundle up: <data_list> - made up of strings.

    (A)-T:23.5  (A)-p:100   (A)-V:233   (A)-use:True    (A)-h:87
    (B)-T:24.3  (B)-p:101   (B)-V:210   (B)-use:True    (B)-h:84
    (C)-T:21.3  (C)-p:100   (C)-V:180   (C)-use:False   (C)-h:52
    (D)-T:24.3  (D)-p:101   (D)-V:220   (D)-use:True    (D)-h:85
    (E)-T:23.1  (E)-p:99    (E)-V:220   (E)-use:True    (E)-h:84

Prover: data processing

slide 2
Data is then simply turned into bytes, and the list is extended in order to have the necessary length for us to be able to derive a Merkle root from it. The number of entries has to be an integer power of 2: here the 25 measurements are padded with zero-byte entries up to 32. We call these the leaves of the tree.

Here it is represented as byteranges: <leaves>

    000000: b'(A)-T:23.5'
    000001: b'(A)-p:100'
    000010: b'(A)-V:233'
    000011: b'(A)-use:True'
    000100: b'(A)-h:87'
    000101: b'(B)-T:24.3'
    000110: b'(B)-p:101'
    000111: b'(B)-V:210'
    001000: b'(B)-use:True'
    001001: b'(B)-h:84'
    001010: b'(C)-T:21.3'
    001011: b'(C)-p:100'
    001100: b'(C)-V:180'
    001101: b'(C)-use:False'
    001110: b'(C)-h:52'
    001111: b'(D)-T:24.3'
    010000: b'(D)-p:101'
    010001: b'(D)-V:220'
    010010: b'(D)-use:True'
    010011: b'(D)-h:85'
    010100: b'(E)-T:23.1'
    010101: b'(E)-p:99'
    010110: b'(E)-V:220'
    010111: b'(E)-use:True'
    011000: b'(E)-h:84'
    011001: b'\x00\x00\x00\x00\x00\x00\x00\x00'
    011010: b'\x00\x00\x00\x00\x00\x00\x00\x00'
    011011: b'\x00\x00\x00\x00\x00\x00\x00\x00'
    011100: b'\x00\x00\x00\x00\x00\x00\x00\x00'
    011101: b'\x00\x00\x00\x00\x00\x00\x00\x00'
    011110: b'\x00\x00\x00\x00\x00\x00\x00\x00'
    011111: b'\x00\x00\x00\x00\x00\x00\x00\x00'
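
The padding step can be sketched as follows. The function name pad_leaves and the way the strings are generated are illustrative assumptions; the measurement values and the 8-byte zero filler are taken from the listing above.

```python
def pad_leaves(data_list: list[str], filler: bytes = b"\x00" * 8) -> list[bytes]:
    """Encode the strings and pad the list up to the next power of two."""
    leaves = [item.encode("utf-8") for item in data_list]
    size = 1
    while size < len(leaves):
        size *= 2
    return leaves + [filler] * (size - len(leaves))


FIELDS = ("T", "p", "V", "use", "h")
ROWS = {
    "A": ("23.5", "100", "233", "True", "87"),
    "B": ("24.3", "101", "210", "True", "84"),
    "C": ("21.3", "100", "180", "False", "52"),
    "D": ("24.3", "101", "220", "True", "85"),
    "E": ("23.1", "99", "220", "True", "84"),
}

# flatten the table into labelled strings such as "(A)-T:23.5"
data_list = [
    f"({unit})-{field}:{value}"
    for unit, values in ROWS.items()
    for field, value in zip(FIELDS, values)
]

leaves = pad_leaves(data_list)
for index, leaf in enumerate(leaves):
    print(f"{index:06b}: {leaf!r}")
```

The binary index printed in front of each leaf is the same number that later serves as the address of the leaf inside the tree.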

Prover: data processing

slide 3
All 'leaves' are now hashed using the function we pick to set up the entire tree. This creates the first row of branches, the one closest to the data.

Here’s the last branch made up of hashes: <last_branch>

  0: b'\x97\t\xe5\xea\x0eH\xa3\x8e;\xb8v\x13\x93u\xdeh\xb5\xcf;\xfdtGg\x85\x13G\xba\\`mw\x91'
  1: b'\x1a\xa1]\xf7\x97\x18\xe8\x8f\xbb\x17~\x00\xc6\xf9\x08\xbd\xf1\x87<4\xfd\xb1\x16\xc6\xc4z\x99\x9e\xa8t\xd0%'
  2: b'\xca\xdep\xbc\xf2\x96\xbf\xa5\xceT\xee3\xa0yP\x0e\xd5\xd1\x9a\x03\xe9Q\xa8\xf1\x0c\xb8\xa9\xfak\x1a\x18L'
  3: b'zQ\x94\x04\xc1?\xd0\x82\xcf\xba\x81\xa7\x83F\xc2\xc3\xd52\xa0\xc4\x91/_\xe8\xae\xb5\xe6\xcb\xde\x16\x1b\x87'
  4: b'd\x7f\x1b^7O9\x88M\x82\xaezk0\xac/or/\xd2J\tG\x0f\x1f\xa4\x1d\xa4Tw\x94\xaa'
  5: b'\xc5v\xab\xf5+nm\xc6\x97m\xb2y\xbd\x99\xd3|\xf7\xcd\xfe\x1b\xe0\xaf[EY\x84\xda\xa1\x06\xbf .'
  6: b'(\x9c\xc5\xe0\xc9u\xec{\xbb\x19T\xb7|\x90\xbb\x8a\x92Z\xe0\x91 \x90$\xb2\xe2j\xd4Q\xb2\x06{\xe1'
  7: b'W\xd9\x10\xb67,T\xc4\x1f\x16\x1f\xd4_\x8ahI\xdd\xf3\x920\x08\xe9\xc6\xec\x8a0\x94M\x08\tt+'
  8: b't\x1c79)\xf8\x8e\xda\n\r\x1d\x8b\x99\xd5l\x8c|Bm\r\xff`\x19i\x0c\xca\xd8*\xaa\xe5f\xa7'
  9: b'\xbd\x10\xdc\x118<\xfd\x1f/\xfe\xdf\x90H1\xf3\x02\x92\xee\x8ag|XY\x0c\x8d\x96s\xab\xce\xdc^\xfe'
 10: b'w\x9b\xf6Ox&\xff\xf5\xeb\x12\xf5O\xd1\xe4[\xac/\x95\xf3W\xe8b\xfd[\x1eo\xb1Wzn{\x03'
 11: b"%\x01\x0b\xc1\x9e;\xfb\xde\r\xa7\x0b]y\x19\x86<\xe1+'NL\x85\x89\xf3[\x9f\xef\xf2\xf0\xaa|\xf8"
 12: b'R0\xec\xf5!\xa1\x11t\x981o\xe9\xf8-\x8d\xa3\x98\xd8\xa9\x1f\xed\xfa\x9b\x9as\xa5C2\xb2\t\xf7\x93'
 13: b'"\xf6\xd3|\x16\xa6\xfe\xa0\xc9F\x04\xceJ\x8aJRAQ\x01R\xb5\'\x1b\xb1b\xccS\x1c\xb8sr\n'
 14: b"\xbeK\x98\xfd\xa6\xfb\xaf@\xda\xde\x1b\xf1gP:\x04\x16\x81\xfd\xde\x19'\xa5\xa2\xd7\x8a\xeej\xa4\xa0\x83\xae"
 15: b"q\xf3\xd0p\x83f$AG\x1a\x92u\x16\x08\xa3\x12!_\x1c*'\x81\xf9\xb3U9\x99\xd1Y\xd7\x83Q"
 16: b'\x06\xc4W\x8e\x9a\xf0\x87o\x8e\x9ek\xe9\x90<\x11Sv\tq!\xd3\x8e\x0b/\x97\x0e\xeb\xf3\xb4B\xe0`'
 17: b'/\x00\xd1\xae\xe9\xb4a\x19%\xd3\xf6M#P\xde\xf0\xfd\xf2 \xbd\xaf\xea}\xd4\x12\x17\t\xdav\x0f\xb37'
 18: b'\x00\xe0\xa1k\xb1I\tr\x89:\x03\xf8\x0cv\xc1\xa0\x05A\xd2\xe8\xd4\x001\x03\xae\xc0\xca\x19\x9e\xea\x85\x9c'
 19: b'\xb2G\xfc\xd7\xa0b\xaa}2\xde\x01\xf7\x02)*\xe1q\tfF\xf5\x92\xf5\xddS\x9fy\x86\nv\xceN'
 20: b'$\xe3\x8c\xf2\xcd\xdfZ\xffg\x0b\xe3.u\x9c\t\xd0\xa0\xac\x8b\x04j))2\x90\xe6\x96\xd4@\xa9\xf6#'
 21: b'm\xc1\x9eZ\xd5\xdc\xda\xe9\xa3\x07\xba\x85mV\xdb\xa6\xa92\xb2\xb5\x00o4\xf7A\x8bHE\x83\x9e=]'
 22: b'\x8b\x9d\x1e\x8c\xfae\x01\xd9E\x9b.c\xbd\xa1\xa2\xfb\xf6\x98\xdc/\x190\xb4\x04\xebX\xa5\x83S\xaa\xb2\x9a'
 23: b"'\xe8$]\x94\x9bCI\xbfq\xb4\x13\xaf\xc2\x9dQ\xee\xd7\x98\xcdJ\x07\xcfZ\x88\xdd\xfc1\x16\xc3\x0f\x07"
 24: b'\x0f7\x18\xee\xa7l|\x1d\xc7\xd8\x89g\xc7\xaf)7\xf1\x15\xe7\xe8\x82U\x9b\x89!\x0c2\xb5\xd1\xd05\xc5'
 25: b'\xafUp\xf5\xa1\x81\x0bz\xf7\x8c\xafK\xc7\nf\x0f\r\xf5\x1eB\xba\xf9\x1dM\xe5\xb22\x8d\xe0\xe8=\xfc'
 26: b'\xafUp\xf5\xa1\x81\x0bz\xf7\x8c\xafK\xc7\nf\x0f\r\xf5\x1eB\xba\xf9\x1dM\xe5\xb22\x8d\xe0\xe8=\xfc'
 27: b'\xafUp\xf5\xa1\x81\x0bz\xf7\x8c\xafK\xc7\nf\x0f\r\xf5\x1eB\xba\xf9\x1dM\xe5\xb22\x8d\xe0\xe8=\xfc'
 28: b'\xafUp\xf5\xa1\x81\x0bz\xf7\x8c\xafK\xc7\nf\x0f\r\xf5\x1eB\xba\xf9\x1dM\xe5\xb22\x8d\xe0\xe8=\xfc'
 29: b'\xafUp\xf5\xa1\x81\x0bz\xf7\x8c\xafK\xc7\nf\x0f\r\xf5\x1eB\xba\xf9\x1dM\xe5\xb22\x8d\xe0\xe8=\xfc'
 30: b'\xafUp\xf5\xa1\x81\x0bz\xf7\x8c\xafK\xc7\nf\x0f\r\xf5\x1eB\xba\xf9\x1dM\xe5\xb22\x8d\xe0\xe8=\xfc'
 31: b'\xafUp\xf5\xa1\x81\x0bz\xf7\x8c\xafK\xc7\nf\x0f\r\xf5\x1eB\xba\xf9\x1dM\xe5\xb22\x8d\xe0\xe8=\xfc'

Prover: data processing

slide 4

all hashes are of the same length!

…and here the last branch again in hexstring format: <last_branch>

  0: 9709e5ea0e48a38e3bb876139375de68b5cf3bfd744767851347ba5c606d7791
  1: 1aa15df79718e88fbb177e00c6f908bdf1873c34fdb116c6c47a999ea874d025
  2: cade70bcf296bfa5ce54ee33a079500ed5d19a03e951a8f10cb8a9fa6b1a184c
  3: 7a519404c13fd082cfba81a78346c2c3d532a0c4912f5fe8aeb5e6cbde161b87
  4: 647f1b5e374f39884d82ae7a6b30ac2f6f722fd24a09470f1fa41da4547794aa
  5: c576abf52b6e6dc6976db279bd99d37cf7cdfe1be0af5b455984daa106bf202e
  6: 289cc5e0c975ec7bbb1954b77c90bb8a925ae091209024b2e26ad451b2067be1
  7: 57d910b6372c54c41f161fd45f8a6849ddf3923008e9c6ec8a30944d0809742b
  8: 741c373929f88eda0a0d1d8b99d56c8c7c426d0dff6019690ccad82aaae566a7
  9: bd10dc11383cfd1f2ffedf904831f30292ee8a677c58590c8d9673abcedc5efe
 10: 779bf64f7826fff5eb12f54fd1e45bac2f95f357e862fd5b1e6fb1577a6e7b03
 11: 25010bc19e3bfbde0da70b5d7919863ce12b274e4c8589f35b9feff2f0aa7cf8
 12: 5230ecf521a1117498316fe9f82d8da398d8a91fedfa9b9a73a54332b209f793
 13: 22f6d37c16a6fea0c94604ce4a8a4a5241510152b5271bb162cc531cb873720a
 14: be4b98fda6fbaf40dade1bf167503a041681fdde1927a5a2d78aee6aa4a083ae
 15: 71f3d07083662441471a92751608a312215f1c2a2781f9b3553999d159d78351
 16: 06c4578e9af0876f8e9e6be9903c115376097121d38e0b2f970eebf3b442e060
 17: 2f00d1aee9b4611925d3f64d2350def0fdf220bdafea7dd4121709da760fb337
 18: 00e0a16bb1490972893a03f80c76c1a00541d2e8d4003103aec0ca199eea859c
 19: b247fcd7a062aa7d32de01f702292ae171096646f592f5dd539f79860a76ce4e
 20: 24e38cf2cddf5aff670be32e759c09d0a0ac8b046a29293290e696d440a9f623
 21: 6dc19e5ad5dcdae9a307ba856d56dba6a932b2b5006f34f7418b4845839e3d5d
 22: 8b9d1e8cfa6501d9459b2e63bda1a2fbf698dc2f1930b404eb58a58353aab29a
 23: 27e8245d949b4349bf71b413afc29d51eed798cd4a07cf5a88ddfc3116c30f07
 24: 0f3718eea76c7c1dc7d88967c7af2937f115e7e882559b89210c32b5d1d035c5
 25: af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
 26: af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
 27: af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
 28: af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
 29: af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
 30: af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
 31: af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
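
Hashing the leaves is a one-liner per leaf. The sketch below assumes SHA-256 from Python’s hashlib as the agreed hash function; the helper name hash_leaves is hypothetical and only a handful of leaves are used to keep the output short.

```python
import hashlib


def hash_leaves(leaves: list[bytes], hashfunction=hashlib.sha256) -> list[bytes]:
    """Hash every leaf once; the result is the last branch of the tree."""
    return [hashfunction(leaf).digest() for leaf in leaves]


leaves = [b"(A)-T:23.5", b"(A)-p:100", b"\x00" * 8, b"\x00" * 8]
last_branch = hash_leaves(leaves)

for index, digest in enumerate(last_branch):
    # bytes.hex() gives the hexstring representation used in the listing
    print(f"{index:3d}: {digest.hex()}")
```

Identical leaves always produce identical hashes, which is why the padding entries at the end of the listing all share one value, and every hash has the same length regardless of the size of the data behind it.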

Prover: Merkle-Tree setup

slide 5
Preparation is over. Prover initiates a MerkleTree manager, using the <last_branch> and the agreed upon <hashfunction()>, to create the entire tree, and the Merkle Root.

Here’s the entire Merkle tree with all branches labelled with a binary address: <prover_tree>

(0, 0, 0, 0, 0)     : 9709e5ea0e48a38e3bb876139375de68b5cf3bfd744767851347ba5c606d7791
(0, 0, 0, 0, 1)     : 1aa15df79718e88fbb177e00c6f908bdf1873c34fdb116c6c47a999ea874d025
(0, 0, 0, 1, 0)     : cade70bcf296bfa5ce54ee33a079500ed5d19a03e951a8f10cb8a9fa6b1a184c
(0, 0, 0, 1, 1)     : 7a519404c13fd082cfba81a78346c2c3d532a0c4912f5fe8aeb5e6cbde161b87
(0, 0, 1, 0, 0)     : 647f1b5e374f39884d82ae7a6b30ac2f6f722fd24a09470f1fa41da4547794aa
(0, 0, 1, 0, 1)     : c576abf52b6e6dc6976db279bd99d37cf7cdfe1be0af5b455984daa106bf202e
(0, 0, 1, 1, 0)     : 289cc5e0c975ec7bbb1954b77c90bb8a925ae091209024b2e26ad451b2067be1
(0, 0, 1, 1, 1)     : 57d910b6372c54c41f161fd45f8a6849ddf3923008e9c6ec8a30944d0809742b
(0, 1, 0, 0, 0)     : 741c373929f88eda0a0d1d8b99d56c8c7c426d0dff6019690ccad82aaae566a7
(0, 1, 0, 0, 1)     : bd10dc11383cfd1f2ffedf904831f30292ee8a677c58590c8d9673abcedc5efe
(0, 1, 0, 1, 0)     : 779bf64f7826fff5eb12f54fd1e45bac2f95f357e862fd5b1e6fb1577a6e7b03
(0, 1, 0, 1, 1)     : 25010bc19e3bfbde0da70b5d7919863ce12b274e4c8589f35b9feff2f0aa7cf8
(0, 1, 1, 0, 0)     : 5230ecf521a1117498316fe9f82d8da398d8a91fedfa9b9a73a54332b209f793
(0, 1, 1, 0, 1)     : 22f6d37c16a6fea0c94604ce4a8a4a5241510152b5271bb162cc531cb873720a
(0, 1, 1, 1, 0)     : be4b98fda6fbaf40dade1bf167503a041681fdde1927a5a2d78aee6aa4a083ae
(0, 1, 1, 1, 1)     : 71f3d07083662441471a92751608a312215f1c2a2781f9b3553999d159d78351
(1, 0, 0, 0, 0)     : 06c4578e9af0876f8e9e6be9903c115376097121d38e0b2f970eebf3b442e060
(1, 0, 0, 0, 1)     : 2f00d1aee9b4611925d3f64d2350def0fdf220bdafea7dd4121709da760fb337
(1, 0, 0, 1, 0)     : 00e0a16bb1490972893a03f80c76c1a00541d2e8d4003103aec0ca199eea859c
(1, 0, 0, 1, 1)     : b247fcd7a062aa7d32de01f702292ae171096646f592f5dd539f79860a76ce4e
(1, 0, 1, 0, 0)     : 24e38cf2cddf5aff670be32e759c09d0a0ac8b046a29293290e696d440a9f623
(1, 0, 1, 0, 1)     : 6dc19e5ad5dcdae9a307ba856d56dba6a932b2b5006f34f7418b4845839e3d5d
(1, 0, 1, 1, 0)     : 8b9d1e8cfa6501d9459b2e63bda1a2fbf698dc2f1930b404eb58a58353aab29a
(1, 0, 1, 1, 1)     : 27e8245d949b4349bf71b413afc29d51eed798cd4a07cf5a88ddfc3116c30f07
(1, 1, 0, 0, 0)     : 0f3718eea76c7c1dc7d88967c7af2937f115e7e882559b89210c32b5d1d035c5
(1, 1, 0, 0, 1)     : af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
(1, 1, 0, 1, 0)     : af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
(1, 1, 0, 1, 1)     : af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
(1, 1, 1, 0, 0)     : af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
(1, 1, 1, 0, 1)     : af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
(1, 1, 1, 1, 0)     : af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
(1, 1, 1, 1, 1)     : af5570f5a1810b7af78caf4bc70a660f0df51e42baf91d4de5b2328de0e83dfc
(0, 0, 0, 0)        : d181e733aa3d5fce4a0bdc18ebd72e1abd5854e89ad2ff42e25b3390f831f428
(0, 0, 0, 1)        : 4a3437650497e5ccb22240e7c447f1e3dc5d24ef6ac19e9c96b84fb55e053608
(0, 0, 1, 0)        : f79b0c2dd40cce0b6d78ac46374db17f2f71645d6d1d330ecb8882dbb4d31dbd
(0, 0, 1, 1)        : 789518bf5cada61fc0a794d03b9423282c5a1a55adc4b160a2d21d3141000bf0
(0, 1, 0, 0)        : ad739c42e124adadd217b1b081cf314d68dcc6f2edd90b9864987a1683ed4bf7
(0, 1, 0, 1)        : 1a0ab823b55a015a89111c75b1d1906ce18a941aeaec606f3eaf915e66980464
(0, 1, 1, 0)        : 8bd3475ef6fb97b37da12ac393d907772d4e0b60e39290b238241e9fc3e3ad2b
(0, 1, 1, 1)        : 8dbcb94385f5ea6c37b8a5acf31bbb3fb68f31f2f66c061e3f7395ba3f81af67
(1, 0, 0, 0)        : 50ddfe0da7dae9b97ba2b32cc2f3e24cb819f633834aa8e149b440119bbb80b2
(1, 0, 0, 1)        : 7aed63c66d6e5e227ca9db6f0461a0efde2187206c3fabd1af39a44442625ffe
(1, 0, 1, 0)        : 705b559a333a329d1de218e1d3e90034194fda49ecf2c8ae1f7388f194051205
(1, 0, 1, 1)        : dc28a4e8730eef148792d3d051501e3a456e74d70a7300fc9c44cc3d35dadaae
(1, 1, 0, 0)        : 6b9330e8552be625f6ba44b462a174ec061c25b12b688b9b9e00fd5b9df34745
(1, 1, 0, 1)        : 5672695e79d5c2898c61dffa926bd315e5000a77cf38303c0744fcc5a94f5c02
(1, 1, 1, 0)        : 5672695e79d5c2898c61dffa926bd315e5000a77cf38303c0744fcc5a94f5c02
(1, 1, 1, 1)        : 5672695e79d5c2898c61dffa926bd315e5000a77cf38303c0744fcc5a94f5c02
(0, 0, 0)           : ba5e67467e5aed09f17fcc677b46ecc05ea020c5bda4d7a5d1586283a9af426f
(0, 0, 1)           : dbc8f09a0c022e262f5fadfbc728ffd00fb0c05fd378e36125b51b0959860ec0
(0, 1, 0)           : 821f38e9521ae1d3f3f6472575fce6b52e60f239db318ea29b9b3a752e83933f
(0, 1, 1)           : 9f2ada38a00ac48fe0da84305530cae417f6a6fd5850167b0e0a36ec1e6a6fed
(1, 0, 0)           : 0746db3f3e1bff88f88edd345c706ee01ac261ce7ef506b644ec11eec61c6452
(1, 0, 1)           : a6064c7b04f17940e06b262505312faa26a29dffb6b8f757417fbb899963006d
(1, 1, 0)           : 771a7311f29db4833b0d21925adf8c33e73bfa6e41bc9023f07a570d8880fdae
(1, 1, 1)           : fee79cde9d08aa336fc604dda152d80852d9a39277ce15aa423ffc743ffd9b5f
(0, 0)              : 683a728015090bc0a8f69f1dbf95b579bcbdcb0fbb749a7ac26d1ca7346172b9
(0, 1)              : 32f8383ac62561c022e86785f2cd44a71d26ddc397cb389ed1756cc6759bcf18
(1, 0)              : 2e1b85c43d85bc4229b32eb13c3f8112ef6ca2229bc5999f9a3cb1199fb46b33
(1, 1)              : 826fe411d73d54bfac63de9833017530c95f40767cb0461d91aa5b74611fcb4b
(0,)                : 039eb075b43e399ae57b67396535eabe766365aec86028d469f552380a6655f8
(1,)                : 67e8086c212690461e61ac9280e5272d30c4013cb0de4f8b4b5bc19d60d8d108

root                : fe0a6d117712418826ba39324c926b7ae104ce255e74aa20af332a75cfccf605
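
A tree with this kind of addressing can be built bottom-up. The following is a hedged sketch, not the author’s MerkleTree manager: build_tree is a made-up name, and both SHA-256 and the combination rule hash(left + right) over raw bytes are assumptions, so it is not claimed to reproduce the exact values listed above.

```python
import hashlib


def build_tree(last_branch: list[bytes], hashfunction=hashlib.sha256) -> dict[tuple, bytes]:
    """Map every address (tuple of bits, () for the root) to its hash."""
    depth = (len(last_branch) - 1).bit_length()
    if len(last_branch) != 1 << depth:
        raise ValueError("number of leaf hashes must be a power of two")

    def address_of(index: int, level: int) -> tuple:
        # binary representation of index, written with exactly `level` bits
        return tuple((index >> shift) & 1 for shift in range(level - 1, -1, -1))

    tree = {address_of(i, depth): digest for i, digest in enumerate(last_branch)}
    for level in range(depth - 1, -1, -1):        # work upwards, row by row
        for index in range(1 << level):
            address = address_of(index, level)
            children = tree[address + (0,)] + tree[address + (1,)]
            tree[address] = hashfunction(children).digest()
    return tree


last_branch = [hashlib.sha256(leaf).digest() for leaf in (b"D1", b"D2", b"D3", b"D4")]
prover_tree = build_tree(last_branch)

for address, digest in prover_tree.items():
    label = str(address) if address else "root"
    print(f"{label:<20}: {digest.hex()}")
```

Appending 0 or 1 to an address gives the left or right child, and dropping the last bit gives the parent, so the address alone tells where a hash sits in the tree.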

Prover: Merkle-root

slide 6

The Merkle root MUST then credibly be shown to the Verifier BEFORE he requests any Proofs!

Below you’ll find the Merkle Root: (shown as a hexstring)

root: fe0a6d117712418826ba39324c926b7ae104ce255e74aa20af332a75cfccf605

Verifier: data request and store

slide 7
Setup of basic VERIFIER data: he creates a Verifier instance to use for several proofs. The Verifier might be interested in a couple of data in the current tree. He already has the data he wants to double-check. For reasons outlined before, his data MUST be labeled (fraud, tree-building).

So he stores his version

<verifyers_check_dict>

(1, 0, 0, 0, 0): b'(D)-p:101'
(0, 1, 1, 1, 1): b'(D)-T:24.4'
(1, 0, 0, 1, 1): b'(D)-h:85'

…and using this data, updates his Validator engine.

Note: the temperature stored by the Verifier at (0, 1, 1, 1, 1), (D)-T:24.4, does not match the value recorded by the Prover, (D)-T:24.3.

Verifier: Proof request

slide 8
Setup of the VERIFIER’s request: the Validator now creates a request containing the addresses of the data he wants to check, and sends said list to the Prover.

Here’s a request example

    (1, 0, 0, 0, 0)
    (0, 1, 1, 1, 1)
    (1, 0, 0, 1, 1)

Prover: providing Proof

slide 9
The Prover takes each requested data address, and coughs up a Merkle-proof for each.

This is what a proof looks like

for data: (1, 0, 0, 0, 0):

(1, 0, 0, 0, 1)     : 2f00d1aee9b4611925d3f64d2350def0fdf220bdafea7dd4121709da760fb337
(1, 0, 0, 1)        : 7aed63c66d6e5e227ca9db6f0461a0efde2187206c3fabd1af39a44442625ffe
(1, 0, 1)           : a6064c7b04f17940e06b262505312faa26a29dffb6b8f757417fbb899963006d
(1, 1)              : 826fe411d73d54bfac63de9833017530c95f40767cb0461d91aa5b74611fcb4b
(0,)                : 039eb075b43e399ae57b67396535eabe766365aec86028d469f552380a6655f8

for data: (0, 1, 1, 1, 1):

(0, 1, 1, 1, 0)     : be4b98fda6fbaf40dade1bf167503a041681fdde1927a5a2d78aee6aa4a083ae
(0, 1, 1, 0)        : 8bd3475ef6fb97b37da12ac393d907772d4e0b60e39290b238241e9fc3e3ad2b
(0, 1, 0)           : 821f38e9521ae1d3f3f6472575fce6b52e60f239db318ea29b9b3a752e83933f
(0, 0)              : 683a728015090bc0a8f69f1dbf95b579bcbdcb0fbb749a7ac26d1ca7346172b9
(1,)                : 67e8086c212690461e61ac9280e5272d30c4013cb0de4f8b4b5bc19d60d8d108

for data: (1, 0, 0, 1, 1):

(1, 0, 0, 1, 0)     : 00e0a16bb1490972893a03f80c76c1a00541d2e8d4003103aec0ca199eea859c
(1, 0, 0, 0)        : 50ddfe0da7dae9b97ba2b32cc2f3e24cb819f633834aa8e149b440119bbb80b2
(1, 0, 1)           : a6064c7b04f17940e06b262505312faa26a29dffb6b8f757417fbb899963006d
(1, 1)              : 826fe411d73d54bfac63de9833017530c95f40767cb0461d91aa5b74611fcb4b
(0,)                : 039eb075b43e399ae57b67396535eabe766365aec86028d469f552380a6655f8
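
Each proof above consists of the sibling of the requested leaf plus the sibling of every ancestor on the way to the root. A minimal sketch of how such a proof could be collected from a stored tree; SHA-256, the combination rule and the function names are assumptions, and the leaf contents are placeholders.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Assumed hash function: SHA-256."""
    return hashlib.sha256(data).digest()


def build_tree(leaf_hashes: list[bytes], address: tuple = ()) -> dict:
    """Return {address: hash} for the subtree rooted at `address`."""
    if len(leaf_hashes) == 1:
        return {address: leaf_hashes[0]}
    half = len(leaf_hashes) // 2
    tree = build_tree(leaf_hashes[:half], address + (0,))
    tree.update(build_tree(leaf_hashes[half:], address + (1,)))
    tree[address] = h(tree[address + (0,)] + tree[address + (1,)])
    return tree


def merkle_proof(tree: dict, address: tuple) -> dict:
    """Collect the sibling hash at every level, from the leaf up to the root."""
    proof = {}
    while address:
        sibling = address[:-1] + (1 - address[-1],)   # flip the last bit
        proof[sibling] = tree[sibling]
        address = address[:-1]                        # move up to the parent
    return proof


# 32 placeholder leaves, matching the depth of the tree in the presentation
tree = build_tree([h(f"leaf {i}".encode()) for i in range(32)])
proof = merkle_proof(tree, (1, 0, 0, 0, 0))

for sibling, digest in proof.items():
    print(f"{str(sibling):<20}: {digest.hex()}")
```

For the address (1, 0, 0, 0, 0) this visits the same five sibling addresses as the first proof shown above; none of the other leaves' data has to be revealed.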

Verifier: building alt. trees and verify

slide 10
The Verifier now builds his own Root version for each of the data he wants to validate, using the Merkle Proof provided by the Prover - us.

This is the verification for each datum he needs to check

Each result is compared to the original - previously published Root.

ROOT provided by Prover:   fe0a6d117712418826ba39324c926b7ae104ce255e74aa20af332a75cfccf605

(1, 0, 0, 0, 0)        : {'fe0a6d117712418826ba39324c926b7ae104ce255e74aa20af332a75cfccf605': True}
(0, 1, 1, 1, 1)        : {'73f36d649bfd5959ffa08984c1a032b33f84f04daf9db4993f379be8b0cb2cb6': False}
(1, 0, 0, 1, 1)        : {'fe0a6d117712418826ba39324c926b7ae104ce255e74aa20af332a75cfccf605': True}

As indicated on SLIDE - 7:
Temperature stored does not match the provided value.
Proof failed for (0, 1, 1, 1, 1).
Data must have been tampered with!
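
The whole round trip can be sketched end to end on a reduced tree of four leaves. As before, this is an illustration under assumptions (SHA-256, hash(left + right), hypothetical function names) rather than the author’s classes; the addresses are therefore two bits long instead of five.

```python
import hashlib


def h(data: bytes) -> bytes:
    """Assumed hash function: SHA-256."""
    return hashlib.sha256(data).digest()


def build_tree(leaf_hashes, address=()):
    """Return {address: hash} for the subtree rooted at `address`."""
    if len(leaf_hashes) == 1:
        return {address: leaf_hashes[0]}
    half = len(leaf_hashes) // 2
    tree = build_tree(leaf_hashes[:half], address + (0,))
    tree.update(build_tree(leaf_hashes[half:], address + (1,)))
    tree[address] = h(tree[address + (0,)] + tree[address + (1,)])
    return tree


def merkle_proof(tree, address):
    """Prover side: sibling hashes from the leaf at `address` up to the root."""
    proof = {}
    while address:
        sibling = address[:-1] + (1 - address[-1],)
        proof[sibling] = tree[sibling]
        address = address[:-1]
    return proof


def root_from_proof(leaf, address, proof):
    """Verifier side: recompute the root from own data plus the proof."""
    node = h(leaf)
    while address:
        sibling = proof[address[:-1] + (1 - address[-1],)]
        # last bit 0: this node is the left child, last bit 1: the right child
        node = h(node + sibling) if address[-1] == 0 else h(sibling + node)
        address = address[:-1]
    return node


# Prover: builds the tree and publishes the root in advance
prover_leaves = [b"(D)-T:24.3", b"(D)-p:101", b"(D)-V:220", b"(D)-h:85"]
tree = build_tree([h(leaf) for leaf in prover_leaves])
published_root = tree[()]

# Verifier: his own labelled copy of the data, one value differs (24.4 vs 24.3)
verifyers_check_dict = {(0, 1): b"(D)-p:101", (0, 0): b"(D)-T:24.4", (1, 1): b"(D)-h:85"}

results = {}
for address, leaf in verifyers_check_dict.items():
    proof = merkle_proof(tree, address)            # sent by the Prover on request
    results[address] = root_from_proof(leaf, address, proof) == published_root

print(results)   # {(0, 1): True, (0, 0): False, (1, 1): True}
```

The Verifier never sees the Prover’s version of the temperature; the mismatch shows up only because his own value does not hash up to the published root.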

Conclusion

slide 11
We provided an optimal solution for both sides:

Prover

Did not leak any data that was not requested.
At the time of request, he can check if the Verifier has the necessary rights to the proof.
He can put any number of datapoints into his tree, as the proof is log(n) sized, scaling quite well in fact.
Depending on his bottleneck, he can choose to:
- store the Tree - optimizing for processor
- store last-branch - optimizing for storage
Optimizing Blockchain use Nr.1:
- the same method can in fact be used to bundle up Proofs!
- and only publish collections on a regular basis

Verifier

Can request proof for any data he owns.
Receives a small (datasize) proof for each of his requests.
Verification is cheap, as only a minuscule amount of calculation is necessary: log(n)
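
To put numbers on the log(n) claim: a proof holds one hash per tree level. The short calculation below assumes 32-byte hashes (as with SHA-256) and a tree padded to the next power of two; the function name is illustrative.

```python
HASH_SIZE = 32  # bytes per hash, assuming SHA-256


def proof_length(leaf_count: int) -> int:
    """Number of sibling hashes in a proof = depth of the padded tree."""
    return (leaf_count - 1).bit_length()


for leaf_count in (32, 1_024, 1_000_000):
    hashes = proof_length(leaf_count)
    print(f"{leaf_count:>9} leaves -> {hashes:>2} hashes, {hashes * HASH_SIZE} bytes")
```

For the 32 leaves of the presentation this gives the 5 hashes seen in each proof; a million datapoints would need 20 hashes, i.e. 640 bytes per proof.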
