BIPs 455–457: SwiftSync Specification #2152
murchandamus left a comment
Just a quick first glance, but could you please break your text into shorter lines? That makes it easier to leave review and track what changed between commits. Either 100 or 120 characters per line seems to work well enough.
FWIW, I don't mind the unbroken lines and even prefer them. Avoids rejigging line lengths to keep them consistent when updating or having lines with very different lengths.
danielabrozzoni left a comment
I did an initial pass and left some comments. I read the BIPs in the commit order (block undo -> histfile -> swiftsync) and it was pretty easy to follow.
jurraca left a comment
some writing nits but overall the concept is clear enough.
92093e1 to f4cd99a
Thanks for the review, @danielabrozzoni and @jurraca, as well as the quick turnaround @rustaceanrob. I notice that this pull request is still marked as a Draft PR. Are you still planning significant changes? If your submission is ready for another BIP Editor review, please mark the PR as "ready for review".
I will keep these as a draft as the hintsfile format is subject to change.
| | Amount | 64 bit unsigned integer | Defined above | Satoshi denominated value | | ||
| ### Messages | ||
|
|
||
| #### MSG_GET_SPENT_COINS |
Is the idea that a peer would issue this request for every block in the chain? If we assume mainnet at height, and a 150 ms round trip time, then a peer would spend nearly 80 hours just downloading this undo data.
You may want to consider a batched variant, similar to the way messages like getheaders work.
We've found that bandwidth throughput is the limiting factor when downloading blocks in parallel. Not all spent coins have to be downloaded if a client keeps a cache, as this document describes. In the batched variant, the cache is not possible and the bandwidth requirement increases significantly.
|
|
||
| | Field | Value | | ||
| | :----------------- | :---------- | | ||
| | `NODE_BLOCK_UNDO` | `1 << ???` | |
Rationale should be added for the choice of a new node version over the exchange of a sendX message during the version handshake, which has been more commonplace over the past few years.
IMO a version makes sense here, as it can be used to filter out peers upfront that support sending this undo data over the network.
I opted for a BIP-434 feature message, which has a mechanism similar to the sendX messages.
|
|
||
| #### MSG_SPENT_COINS | ||
|
|
||
| `MSG_SPENT_COINS` defines the data structure for inputs of a block. |
Probably not really y'all's intended use case, but if you optionally make it possible to include merkle proofs for the set of coins, then this message can be used to obtain a proof that an output was spent in a given block.
It would actually also be useful for BIP 157+158 peers, as the final version that shipped includes the script spent (instead of the outpoint), which means that if you're using the filters to find a block where a given script has been spent, you need to make some assumptions about what the prev script is for a given transaction.
The most recent response on this mailing list post mentions commitment to the UTXO set as part of the block header. There are additional ways to do this outside of a soft fork as well, i.e. utreexo proofs. For now I think it best to leave this unspecified in this version of the message while the community shares ideas, but I do think this is interesting.
f4cd99a to e4f8172
Wrapped text and standardized formatting with
Great, thanks! I’ll give it a read when you’re done with that.
hintsfile.js implements the SwiftSync hintsfile (bitcoin/bips#2152 'Hints for unspent coins'): per-block unspent output indices encoded with Elias-Fano (CompactSize(n) || CompactSize(m) || L || H; low bits LSB-first, unary gap high bits), plus the 'UTXO' magic/version/height/vector container. This is the ONE cross-compatibility artifact (per Somsen), so it's validated byte-for-byte against the BIP's own elias_fano.json vectors — all three match, plus round-trips, edge cases, and a container round-trip. Exported from index + package exports (./hintsfile).
…matches BIP vectors undo.js implements the full-validation spent-coin data (bitcoin/bips#2152 'Peer sharing of block spent coins'): Core's CompressAmount/DecompressAmount, the reconstructable-script prefix table (P2PKH/P2SH/P2PK/P2WPKH/P2WSH/P2TR/raw), the height code (height<<1|coinbase), and a spent-coin record. Amount + script compression validated byte-for-byte against the BIP's compressed_amount.json and reconstructable_script.json vectors (+ round-trips). Also factored CompactSize/concat into varint.js, shared by hintsfile + undo. Full suite: 21 pass / 0 fail.
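For readers unfamiliar with the amount compression referenced in the commit above, here is a minimal Python sketch ported from my reading of Bitcoin Core's `CompressAmount`/`DecompressAmount`. The Python function names are my own, and the sketch has not been run against the BIP's `compressed_amount.json` vectors, so treat it as illustrative only.

```python
def compress_amount(n: int) -> int:
    """Compress a satoshi amount so that round values map to small integers."""
    if n == 0:
        return 0
    e = 0
    # Strip up to nine trailing decimal zeros and remember how many were removed.
    while n % 10 == 0 and e < 9:
        n //= 10
        e += 1
    if e < 9:
        d = n % 10  # last remaining digit, always in 1..9 at this point
        n //= 10
        return 1 + (n * 9 + d - 1) * 10 + e
    return 1 + (n - 1) * 10 + 9


def decompress_amount(x: int) -> int:
    """Inverse of compress_amount."""
    if x == 0:
        return 0
    x -= 1
    e = x % 10
    x //= 10
    if e < 9:
        d = x % 9 + 1
        x //= 9
        n = x * 10 + d
    else:
        n = x + 1
    return n * 10**e
```

Under this sketch, 1 BTC (`100_000_000` satoshis) compresses to 9 and 50 BTC compresses to 50, which is why the scheme pays off for the round amounts that dominate early blocks.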
e4f8172 to e5ec578
Given there are a few clients that have started implementations of SwiftSync, and new hintsfile encodings may simply increment the file version, I am moving these out of draft. Some outstanding comments addressed, others require some additional thought.
24ae4e7 to ecbda2a
edilmedeiros left a comment
Thanks for documenting the protocol in this draft.
Did a deep dive together with the guys from @vinteumorg and left many comments concerning conceptual aspects of the BIPs. I have many editing suggestions, but left them for a second round after the higher-level aspects are more mature.
| in Bitcoin Core, and reasonable for most clients to hold directly in memory. This encoding represents elements in $2n + | ||
| n \\lceil \\log_2(m/n) \\rceil$ bits, which is within a reasonable bound of the theoretical optimum. | ||
|
|
||
| Partitioning the hints by block is an intuitive choice, and allows for efficient random access of hints. Groupings of |
"Partitioning the hints by block is an intuitive choice, and allows for efficient random access of hints."
I don't see how this can be true: the bitstream has a header (magic, version, height) followed by a sequence of EliasFano items, each of which is composed of N, M (fixed size info) and L, H (variable size info).
So, imagine I have a hintsfile. To find data for block k, one needs to (minimally) process data for block 1 to discover the size of the first EliasFano item (because of the variable size parts), then block 2, and so forth, until the target block is reached. Random access would be possible if the EliasFano items had the same fixed size for all blocks, allowing offset arithmetic, but the amount of padding would be prohibitively high.
Thus, the hints payload works more like a List<EliasFano> (requires sequential access) than a vector<EliasFano> (allows random access). Of course, the decoder could create an index of offsets, but this is not only an implementation detail, it also adds to the resources required to process the hintsfile.
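To illustrate the sequential-access point, here is a toy Python sketch. The record format (a one-byte payload length followed by the payload) is purely hypothetical and is not the hintsfile encoding; it only stands in for any stream of variable-size items whose size is known only after parsing.

```python
def build_offset_index(stream: bytes) -> list[int]:
    """Return the byte offset of every record in a stream of variable-size records.

    Hypothetical toy format: each record is a 1-byte payload length followed
    by the payload. The size of record k is only known after reading its
    header, so building the index takes one sequential pass.
    """
    offsets = []
    pos = 0
    while pos < len(stream):
        offsets.append(pos)
        payload_len = stream[pos]
        pos += 1 + payload_len
    return offsets


def read_record(stream: bytes, offsets: list[int], k: int) -> bytes:
    """Random access to record k, possible only once the index exists."""
    start = offsets[k]
    return stream[start + 1 : start + 1 + stream[start]]
```

The index costs one integer per block in memory, which is the extra resource usage mentioned above.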
Good catch. The original version had a header section, but it was removed as it could be reconstructed as you described. I will remove that note.
I wonder if taking more blocks together would noticeably improve compression (I'll experiment with it). We can add a counter in the bitstream to allow the encoder to choose the group size freely (at the cost of having more bytes in the bitstream), but we potentially gain:
- Fewer `H`, `L` pairs in the bitstream.
- More data to feed to the Elias-Fano process (tends to push it closer to the theoretical entropy).
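For anyone experimenting with groupings, the two quantities quoted in the draft can be compared directly. This is a minimal sketch that only evaluates the formulas as written in the draft text ($2n + n \lceil \log_2(m/n) \rceil$ and $\log_2 \binom{m}{n}$); it ignores the `CompactSize` prefixes and any byte padding, and the function names are my own.

```python
import math


def elias_fano_bits(m: int, n: int) -> int:
    """Size from the draft text: 2n + n * ceil(log2(m / n)) bits (requires 0 < n <= m)."""
    return 2 * n + n * math.ceil(math.log2(m / n))


def optimum_bits(m: int, n: int) -> float:
    """Theoretical optimum quoted in the draft: log2(C(m, n)) bits."""
    return math.log2(math.comb(m, n))
```

For instance, `elias_fano_bits(10_000, 100)` evaluates to 900 bits, which can then be set against `optimum_bits(10_000, 100)` for the same parameters.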
27c42b6 to 2606bd8
murchandamus left a comment
I read the first document "Peer sharing of block spent coins". Given my prior knowledge it’s pretty clear what’s going on, but I think people reading about the topic for the first time could use more context in some passages. I noticed a couple sections with potential for improvement.
- The Abstract and first sentences of the Motivation are a bit repetitive.
- For the Definitions and Data Structures sections, I could have used a little more context. What will I be shown? Why? How is the table to be read? What do the columns mean?
|
|
||
| ## Motivation | ||
|
|
||
| A current limitation of IBD is that it must be done sequentially. This is a result of the height, coinbase flag, input |
"A current limitation of IBD is that it must be done sequentially."
Given the postulation or existence of alternative syncing models, that feels a bit loaded. Maybe mention that this specifically refers to Bitcoin Core or alternatively consider something along the lines of: "The common approach to IBD is to process blocks sequentially as that ensures the existence of TXO details when input validation requires them to be available."
"This is a result of the height, coinbase flag, input script, and amount of the block inputs being omitted from the data committed to by proof of work in the current block"
This is jumping several steps from the prior statement at once. Maybe you could segue that a bit more, e.g., by mentioning that fields you introduce are TXO details, before going into them being only implicitly or not at all committed to by transaction inputs, before explaining how that makes it impossible to verify what is provided by a peer.
| window, and request only coins that are older than this height via the `cutoff` filter. This results in a significant | ||
| bandwidth reduction at the cost of a cache that can be set dynamically by the client depending on available memory. |
Ah cool. I was missing this context above when cutoff was introduced.
I added a short note when introducing the request message that the cutoff field is motivated in the rationale section.
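A rough sketch of how the `cutoff` filter and a client cache could interact, for readers following along. Everything here is hypothetical: the `SpentCoin` type, the function names, and the reading of "older than this height" as `height < cutoff` are my assumptions, not the message format in the draft.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpentCoin:
    outpoint: str  # hypothetical "txid:vout" label
    height: int    # height of the block that created the coin


def filter_by_cutoff(spent: list[SpentCoin], cutoff: int) -> list[SpentCoin]:
    """Serving side: keep only coins created below the requested cutoff height.

    Assumption: "older than this height" is read as height < cutoff.
    """
    return [coin for coin in spent if coin.height < cutoff]


def resolve_inputs(spent_outpoints: list[str],
                   cache: dict[str, SpentCoin],
                   from_peer: list[SpentCoin]) -> list[SpentCoin]:
    """Client side: use the local cache first, then fall back to the peer's reply."""
    peer = {coin.outpoint: coin for coin in from_peer}
    return [cache[op] if op in cache else peer[op] for op in spent_outpoints]
```

The larger the cached window, the fewer coins fall below the cutoff and need to be transferred, which is the bandwidth versus memory trade-off described in the rationale.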
| 11gb reduction in bandwidth is achieved. The application of `VARINT` as opposed to `CompactSize` offers a further | ||
| reduction of 4gb, however the `VARINT` primitive is currently a Bitcoin Core implementation detail. Reusing existing |
Given the confusing terminology in regard to CompactSize and VARINT and Bitcoin Core, you probably want to define these terms more concretely.
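As a starting point for such definitions, here is a minimal Python sketch of the two encodings as I understand them from Bitcoin Core's serialization code. The function names are my own, and this is an illustration rather than normative text.

```python
def compact_size(n: int) -> bytes:
    """CompactSize: a single byte below 0xFD, otherwise a marker byte
    followed by a 2-, 4-, or 8-byte little-endian integer."""
    if n < 0xFD:
        return n.to_bytes(1, "little")
    if n <= 0xFFFF:
        return b"\xfd" + n.to_bytes(2, "little")
    if n <= 0xFFFFFFFF:
        return b"\xfe" + n.to_bytes(4, "little")
    return b"\xff" + n.to_bytes(8, "little")


def core_varint(n: int) -> bytes:
    """Bitcoin Core's VARINT: base-128 groups, most significant group first,
    continuation bit set on every byte except the last, and one subtracted
    from each higher group so every value has exactly one encoding."""
    out = [n & 0x7F]
    while n > 0x7F:
        n = (n >> 7) - 1
        out.append((n & 0x7F) | 0x80)
    return bytes(reversed(out))
```

For example, 300 takes three bytes as `CompactSize` (`fd 2c 01`) and two bytes as `VARINT` (`81 2c`).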
5bd069f to 5dd720a
f0b1426 to a5579f3
Thank you @murchandamus @Roasbeef @danielabrozzoni @jurraca @johnnyasantoss @yancyribbens @edilmedeiros for the review! The drafts should now be up to date.
I discussed attribution with Ruben and we decided it makes sense for him to be author on each BIP. @edilmedeiros has agreed to take on deputy as well.
murchandamus left a comment
Did a first read of "Hints for unspent coins"
| ## Abstract | ||
|
|
||
| The SwiftSync protocol requires a client to have foresight, or "hints", into the UTXO set at a state at a particular | ||
| height, which is verified at the end of the protocol. This document describes a concise representation of the UTXO set. | ||
| Clients performing SwiftSync may use this file of hints to perform IBD and verify the UTXO set they arrive at is | ||
| correct. | ||
|
|
||
| ## Motivation | ||
|
|
||
| SwiftSync can improve the user experience by accelerating IBD, however the protocol requires the client verify a UTXO | ||
| set corresponds to the blockchain history they received. Rather than simply encoding and distributing the UTXO set, a | ||
| far smaller representation may be computed and shared. Intuitively, just as how it is cheap for programs to share | ||
| pointers to objects in memory, we define a "hintsfile" that encodes pointers to unspent outputs. |
The Abstract and Motivation feel a bit mixed up here. Since you’ll have to fill out three Abstracts and three motivations, it might be the easiest to decide which BIP is the first one to be read, and put the motivation of the overarching proposal there.
Then this one could, e.g., simply read: "This document specifies the hintsfile for Swiftsync. The hintsfile stores a compact representation of the UTXO set at a particular height by encoding which TXOs from each block remain unspent."
On the other hand, the motivation should be more focused on what problem this proposal addresses and why this should be adopted, i.e., that the hints file enables skipping all intermediate UTXO set states per foreseeing whether a UTXO is to be retained, that this speeds up IBD, etc. should be in the motivation.
| - Let ${ i\_{0}, ..., i\_{n-1} }$ be a list of $n$ elements where $i\_{n-1} = m$ | ||
| - Let $\\ell(m, n)$ be the function that determines the number of low bits to use: $\\ell(m, n) = \\left\\lfloor \\log_2 | ||
| \\ \\left(\\frac{m+1}{n}\\right) \\right\\rfloor$ | ||
| - Let $\\text{unary}(q)$ be the following function: $\\text{unary}(q) = \\underbrace{0\\cdots0}\_{q}1$ |
This section seems to have more than the necessary escaping. I think this should work fine:
| - Let ${ i\_{0}, ..., i\_{n-1} }$ be a list of $n$ elements where $i\_{n-1} = m$ | |
| - Let $\\ell(m, n)$ be the function that determines the number of low bits to use: $\\ell(m, n) = \\left\\lfloor \\log_2 | |
| \\ \\left(\\frac{m+1}{n}\\right) \\right\\rfloor$ | |
| - Let $\\text{unary}(q)$ be the following function: $\\text{unary}(q) = \\underbrace{0\\cdots0}\_{q}1$ | |
| - Let ${ i_{0}, ..., i_{n-1} }$ be a list of $n$ elements where $i_{n-1} = m$ | |
| - Let $\ell(m, n)$ be the function that determines the number of low bits to use: $\ell(m, n) = \left\lfloor \log_2 \left(\frac{m+1}{n}\right) \right\rfloor$ | |
| - Let $\text{unary}(q)$ be the following function: $\text{unary}(q) = \underbrace{0\cdots0}_{q}1$ |
| This formulation has a known theoretical optimum of $\\log_2 \\binom{m}{n}$ bits. _Elias-Fano_ developed a | ||
| representation that is reasonably close to this optimum in a concise and maintainable format. | ||
|
|
||
| ### Elias-Fano Encoding |
I found this description of the encoding too dense to follow. I had to work the example by hand to understand the instructions. Also, some of the instructions are missing and are only mentioned in the example. More detailed suggestions inline.
| There are many ways to do so, however an intuitive representation is a | ||
| [bitset](https://en.cppreference.com/w/cpp/utility/bitset.html). Each output in a block is assigned a bit, with a `0` | ||
| bit denoting the output will be spent, and a `1` to denote the output is unspent. For a block of 8 outputs, we may | ||
| arrive at a bitset of `1000 0010`, which conveys the 0th and 6th index are UTXOs. |
I looked up the outputs per day for the last ~1000 days from mainnet.observer and divided them by 144 for an educated guess. Some of those days might have more or fewer blocks, so this is probably fairly inaccurate, but by that method I found a minimum of 7200 and a maximum of 10,700 TXOs created per block. I think giving this number would help readers’ intuition why encoding it this way would be very inefficient. I calculated that we could create slightly over 32,000 TXOs in a single block in the past. Maybe consider adding more context along the lines of:
| arrive at a bitset of `1000 0010`, which conveys the 0th and 6th index are UTXOs. | |
| arrive at a bitset of `1000 0010`, which conveys the 0th and 6th index are UTXOs. Actual blocks recently have about 8000–10,000 outputs, but the theoretical maximum is over 32k. Especially for older blocks, very few of these TXOs would remain unspent, so a bitset encoding would take several kilobits of mostly 0s. |
You could also explain this alternative approach in the Rationale, or take it to the Motivation to explain why Elias-Fano was used. In the specification, it seems a bit out of place.
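A small sketch of the bitset reading in the quoted passage, which may help make the size argument concrete. The helper names are my own; the only convention taken from the draft is that `1` marks an unspent output and that indices count from 0 on the left.

```python
def unspent_indices(bitset: str) -> list[int]:
    """Indices of the `1` bits, reading the bitset left to right from index 0."""
    bits = bitset.replace(" ", "")
    return [i for i, b in enumerate(bits) if b == "1"]


def to_bitset(num_outputs: int, unspent: set[int]) -> str:
    """One bit per output in the block, whether it ends up spent or not."""
    return "".join("1" if i in unspent else "0" for i in range(num_outputs))
```

`unspent_indices("1000 0010")` gives `[0, 6]`, matching the example. A block with 10,000 outputs costs 10,000 bits in this form even if only a handful remain unspent.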
| append them, most significant bit ordering, to a bitset $L$. Next, iterate over each element and take the remaining most | ||
| significant bits, then record the difference between the last element's high bits and the current element's high bits. |
| append them, most significant bit ordering, to a bitset $L$. Next, iterate over each element and take the remaining most | |
| significant bits, then record the difference between the last element's high bits and the current element's high bits. | |
| append them, most significant bit ordering, to a bitset $L$. Next, iterate over each element and interpret the remaining most | |
| significant bits as a number, then record the difference between the last element's high bits and the current element's high bits. The first element records the difference to 0 as there is no preceding element. |
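Working the steps through in code may help other readers as well. The sketch below follows the quoted description: $\ell = \lfloor \log_2((m+1)/n) \rfloor$ low bits per element, then the gaps between successive high parts written as `unary(gap)`, with the first gap measured from 0. Bits are returned as `'0'`/`'1'` strings, low bits written most significant bit first as in the quoted text; byte packing and the final on-disk bit order are deliberately left out, and the function names and sample input are my own.

```python
def elias_fano_encode(elements: list[int]) -> tuple[int, str, str]:
    """Return (l, L, H) for a non-empty, strictly increasing list of non-negative integers."""
    n, m = len(elements), elements[-1]
    # l = floor(log2((m + 1) / n)), computed with integers only.
    l = ((m + 1) // n).bit_length() - 1
    low, high = [], []
    prev_high = 0  # the first element records its difference to 0
    for x in elements:
        if l:
            low.append(format(x & ((1 << l) - 1), f"0{l}b"))
        q = x >> l  # remaining most significant bits, read as a number
        high.append("0" * (q - prev_high) + "1")  # unary(gap)
        prev_high = q
    return l, "".join(low), "".join(high)


def elias_fano_decode(l: int, low: str, high: str) -> list[int]:
    """Rebuild the elements from the low-bit string L and the unary string H."""
    out, q, k = [], 0, 0
    for bit in high:
        if bit == "0":
            q += 1  # each zero advances the running high part by one
        else:
            lo = int(low[k * l:(k + 1) * l], 2) if l else 0
            out.append((q << l) | lo)
            k += 1
    return out
```

With the hypothetical input `[2, 5, 12]` (so $n = 3$, $m = 12$, $\ell = 2$), the encoder returns `(2, '100100', '101001')` and the decoder recovers the original list.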
|
|
||
| $\\ell = \\left\\lfloor \\log_2\\left(\\frac{m+1}{n}\\right) \\right\\rfloor = \\left\\lfloor | ||
| \\log_2\\left(\\frac{13}{3}\\right) \\right\\rfloor = \\lfloor \\log_2(4.33) \\rfloor = \\lfloor 2.11 \\rfloor = 2$ | ||
|
|
| We write out the binary representation for all elements in the sequence $S$ and split them into upper and lower bits: | |
|
Will read the third document some time in the next few days and then circle back. |
a5579f3 to 8be4df2
8be4df2 to 2abf06a
murchandamus left a comment
This is my first review of the SwiftSync Initial Block Download document.
| ## Abstract | ||
|
|
||
| _SwiftSync_ is a protocol to accelerate initial block download (IBD) using existing cryptographic primitives and minimal | ||
| state. The protocol is comprised of hash aggregate for a set of elements and a "hintsfile" to indicate the spent-ness of |
Missing article?
| state. The protocol is comprised of hash aggregate for a set of elements and a "hintsfile" to indicate the spent-ness of | |
| state. The protocol is comprised of a hash aggregate for a set of elements and a "hintsfile" to indicate the spent-ness of |
| Initial block download is the first user experience when using Bitcoin software, and, moreover, is a bootstrapping cost | ||
| for second layer protocols. Improvements to this process benefit end-users and scaling protocols alike. IBD faces two | ||
| limitations. First, although the lifetime of coins demonstrates an empirical distribution, cache misses occur for coins | ||
| that are deleted. This creates unnecessary disk I/O and database compaction. Secondly, given the structure of a block, | ||
| coins that are spent are indexed by their outpoint. This creates a requirement for clients to maintain a cache to fetch | ||
| coin metadata associated with an outpoint. _SwiftSync_ alleviates both of these limitations, allowing for IBD in as fast | ||
| as a client can download blocks and verify signatures. |
Some of these might be implementation specific, and that should perhaps be noted in those cases. ;)
|
|
||
| - $H$: A hashing function | ||
| - $Hintsfile\_{n}$: Defined in BIP ??? | ||
| - $UTXO\_{n}$: Unspent outputs at block height $n$ |
Here and in several instances below, the rendered "UTXO" has weird spacing. I didn’t test, but would the following maybe work?
| - $UTXO\_{n}$: Unspent outputs at block height $n$ | |
| - $\text{UTXO}_{n}$: Unspent outputs at block height $n$ |
Also note, that the underscore doesn’t need to be escaped as far as I’m aware, and that seems to cause oddities in some Mathjax interpreters.
| _SwiftSync_ builds on a common observation in cryptography, that _verification_ is often orders of magnitude more | ||
| performant than _computation_. What a client seeks to verify when performing _SwiftSync_ is that a unspent transaction |
This observation is broader than cryptography, it applies to computing in general, but also to math (e.g., it is much easier to verify the square root of a number than to calculate it).
While the verification vs computation thing is true, I would describe SwiftSync’s speedup as being achieved by foregoing manifestation of all the intermediate UTXO sets and skipping directly to the UTXO set at the target height.
| As a final note, some coins are guaranteed to be impossible to spend. We define an _unspendable output_ is defined as: | ||
|
|
||
| - An output with script length over 10,000 OR | ||
| - An output beginning with `OP_RETURN` OR | ||
| - A [BIP-30](https://github.com/bitcoin/bips/blob/master/bip-0030.mediawiki) unspendable coinbase output |
The capitalized "OR" threw me off for a moment, maybe:
| As a final note, some coins are guaranteed to be impossible to spend. We define an _unspendable output_ is defined as: | |
| - An output with script length over 10,000 OR | |
| - An output beginning with `OP_RETURN` OR | |
| - A [BIP-30](https://github.com/bitcoin/bips/blob/master/bip-0030.mediawiki) unspendable coinbase output | |
| Some coins are impossible to spend. We define any of the following as _unspendable outputs_: | |
| - An output with script length over 10,000 | |
| - An output beginning with `OP_RETURN` | |
| - A [BIP-30](https://github.com/bitcoin/bips/blob/master/bip-0030.mediawiki) unspendable coinbase output |
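The first two conditions can be checked from the output script alone, as in this minimal sketch. The function and constant names are my own, and the BIP-30 coinbase case is intentionally not covered since it depends on block context rather than on the script.

```python
OP_RETURN = 0x6A
MAX_SCRIPT_SIZE = 10_000


def is_unspendable_script(script_pubkey: bytes) -> bool:
    """True if the script is longer than 10,000 bytes or begins with OP_RETURN.

    Does not cover the BIP-30 unspendable coinbase outputs.
    """
    return len(script_pubkey) > MAX_SCRIPT_SIZE or script_pubkey[:1] == bytes([OP_RETURN])
```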
|
|
||
| ### Aggregation | ||
|
|
||
| A client must compare $Outputs - UTXO\_{n}$ with $Inputs$ in a succinct way. Rather than say, comparing the lists, a |
Maybe:
| A client must compare $Outputs - UTXO\_{n}$ with $Inputs$ in a succinct way. Rather than say, comparing the lists, a | |
| A client must compare $\text{Outputs} - \text{UTXO}_{n}$ with $\text{Inputs}$ in a succinct way. Rather than say, comparing the lists, a |
|
|
||
| 1. Download the required spent coins data defined in BIP ???. | ||
| 1. Using the undo data, validate the block. If the block is invalid, fail. | ||
| 1. For all inputs, except the coinbase, add the hashes to $Agg\_{inputs}$ using the spent coin data. |
The "coinbase" is the transaction field that replaces the input script in a coinbase transaction input. I think you mean "For all inputs, excluding coinbase inputs,…"?
| $Agg\_{outputs}$. | ||
|
|
||
| Notice here that a client does not have to download blocks in any particular order, and may download blocks from | ||
| multiple peers at a time. A client then verifies $Agg\_{outputs} = Agg\_{inputs}$ once they have arrived at height |
In a previous description of SwiftSync, my understanding was that there was only one aggregate, and outputs were added, inputs were deducted. A valid hintsfile would thereby cause the aggregate to result in 0. While this clearly is functionally equivalent, I was curious what the motivation was to split them into two aggregates.
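To make the equivalence concrete, here is a minimal sketch of both formulations. The choice of a salted SHA-256 and of addition modulo $2^{256}$ is an assumption for illustration; the draft only names a hashing function $H$, and all identifiers below are hypothetical.

```python
import hashlib

MOD = 1 << 256


def coin_hash(salt: bytes, element: bytes) -> int:
    """Salted SHA-256 of an element, read as a 256-bit integer (assumed construction)."""
    return int.from_bytes(hashlib.sha256(salt + element).digest(), "big")


def two_aggregates_match(salt: bytes, spent_outputs: list[bytes], inputs: list[bytes]) -> bool:
    """Two running sums, compared once at the end."""
    agg_outputs = sum(coin_hash(salt, o) for o in spent_outputs) % MOD
    agg_inputs = sum(coin_hash(salt, i) for i in inputs) % MOD
    return agg_outputs == agg_inputs


def single_aggregate_is_zero(salt: bytes, spent_outputs: list[bytes], inputs: list[bytes]) -> bool:
    """One running value: outputs are added, inputs are subtracted, result must be 0."""
    agg = 0
    for o in spent_outputs:
        agg = (agg + coin_hash(salt, o)) % MOD
    for i in inputs:
        agg = (agg - coin_hash(salt, i)) % MOD
    return agg == 0
```

Both checks agree because $A \equiv B \pmod{2^{256}}$ exactly when $A - B \equiv 0$, and both are independent of the order in which blocks are processed.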
| During the period between the genesis block and BIP-34 activation, a _SwiftSync_ client must check for duplicate | ||
| coinbase outputs. A cache of these outputs is modest in memory footprint, and may be easily added and queried for the | ||
| fixed block range. More information on this caveat is detailed in | ||
| [this article](https://gist.github.com/RubenSomsen/a02b9071bf81b922dcc9edea7d810b7c). |
Given that we can confidently rule out a reorg of more than a decade, wouldn’t it be more practical to hardcode the two exceptions?
| ## References | ||
|
|
||
| - [Original proposal](https://gist.github.com/RubenSomsen/a61a37d14182ccd78760e477c78133cd) | ||
|
|
Please add the required Backwards Compatibility section.
Let’s refer to the SwiftSync BIPs as 455, 456, and 457. Please feel free to pick which number should go on each of the three documents.
Please update the file names of the documents and auxiliary files, update the BIP and Assigned headers in the preambles, and add entries in the README.mediawiki table.

SwiftSync is a protocol for clients to parallelize initial block download, based on the original writeup.