# Sybil Attacking Byzantine Faults

This post describes an idea I had a few months ago for how to get around a pretty fundamental impossibility result in BFT (Byzantine fault tolerant) systems: that any asynchronous agreement algorithm must have at least $n=3f+1$ nodes, where $f$ is the number of tolerated Byzantine faults. I will show how one can use an anonymous communication network like Tor to solve agreement with fewer than $3f+1$ nodes in a fully asynchronous network. To my knowledge no one else has considered this idea before.1 I was at one point planning to submit this idea to a conference, but the approach has some significant drawbacks, discussed later, that make it less technically impressive than I initially thought. Nonetheless, I still think the idea is pretty neat, so I want to share it.

Since this post is directed at a moderately wide audience, I’ll clarify a few terms. If you already know a little about BFT, you can skip the next three paragraphs.

A node just means a computer that is participating in a group computation. Nodes can pick an input (for example, a human operator might vote “yes” or “no” on some poll), send and receive messages with each other, run some computations, and then produce an output. All nodes follow a protocol, which is a program that tells them exactly what to do in each circumstance depending on the messages they receive from other nodes. The basic problem we want these nodes to solve is to have all nodes output the same value within a finite period of time. We will also require some sort of “validity” rule to prevent solutions like every node automatically outputs “monkey” no matter what.2

Byzantine nodes are nodes that deviate from their protocol arbitrarily. We usually imagine this as a malicious actor controlling the node, although in practice Byzantine behavior could also be benign (for example, software bugs, hardware damage, or even random bit-flips due to background radiation can all make computers behave non-deterministically). Byzantine nodes can be non-responsive or delayed, lie about things, and even send contradictory messages to different nodes. For example, a Byzantine node $F$ might send “Attack!” to node $A$ and “Flee!” to node $B$. It turns out this ability to send contradictory messages (also known as “equivocation”) is the most damaging power Byzantine nodes have. We will refer to non-Byzantine nodes as “honest nodes”.

Asynchrony means messages between nodes can be delayed arbitrarily long. We do require that all messages are delivered eventually within a finite length of time, but delays can be unbounded (for instance, if I send out a sequence of messages $1,2,3,4,...$ then these messages might be delayed by $1$ second, $2$ seconds, $4$ seconds, $8$ seconds,…. Even though each delay is finite, the sequence of delays is unbounded). The difficulty that asynchrony introduces comes from the fact that a node $A$ cannot distinguish between the situation where $B$ is Byzantine and refusing to send out messages vs. the situation where $B$ is honest but messages from $B$ are being delayed for a very long time. This also leads to difficulty when combined with the equivocation power of Byzantine nodes; without asynchrony you could simply wait to hear an “echo” from every node to figure out if some node is equivocating, but with asynchrony you can only wait for $n-f$ echoes, since $f$ nodes might be Byzantine and never respond.

Now back to the impossibility result.

This result is pretty robust in the sense that it comes up in a lot of different contexts, even when the underlying definitions are changed somewhat. For example, it still holds when nodes can use digital signatures, or under eventual synchrony (e.g., message delays are bounded but the bound is not known in advance; or after some unknown point in time $t$ all messages are guaranteed to be delivered synchronously, but $t$ is unknown).

It also holds for a very weak variant of agreement called “consistent broadcast”. Consistent broadcast involves a designated node $D$ that wants to send a message to the group, but $D$ might be Byzantine so the group wants to prevent $D$ from possibly equivocating. So $D$ sends its message to everyone and then the nodes interact with each other to run some verifications, and then each node can choose whether or not to accept the message. We require that if $D$ is honest then all the nodes eventually accept the message. If $D$ is not honest, then the nodes do not ever need to produce an output, but if they do then the messages they accept must be the same.

Here’s a simple proof of impossibility of consistent broadcast with $n=3f$. For simplicity, we’ll separate the nodes into $3$ groups, $A,B,$ and $F$, each of size $f$. These nodes are not literally separated, we’re just giving them labels to make it easier to refer to them. We’ll make the nodes in $F$ all be Byzantine.

Now suppose $D$ (who will be one of the nodes in $F$) sends “Attack!” to all nodes in $A$ while sending “Flee!” to all nodes in $B$. The Byzantine nodes in $F$ can pretend to the nodes in $A$ that they received “Attack!” while simultaneously pretending to the nodes in $B$ that they received “Flee!”. If messages between $A$ and $B$ are delayed for a long time, then to the nodes in $A$ it looks just the same as a situation in which $D$ is honest and sent “Attack!” to everyone, and the nodes in $B$ are just Byzantine and non-responsive. Thus the nodes in $A$ must eventually accept “Attack!”. Similarly, to the nodes in $B$ it looks like $D$ is honest and sent “Flee!” to everyone, and the nodes in $A$ are just Byzantine and non-responsive. Hence the two groups will accept different messages.

So if this result is so robust, how can I claim to be able to solve agreement with $n<3f+1$ nodes? The answer is to increase the computational power of the honest nodes by allowing them to communicate anonymously with each other. Note that the above argument relies critically on the assumption that the Byzantine nodes can carefully choose how to equivocate, sending only “Attack!” related messages to nodes in one group while sending only “Flee!” related messages to nodes in the other group. If we assume that nodes can construct anonymous identities, then we can remove this foundation, effectively forcing the Byzantine nodes to equivocate in a random way that the honest nodes can detect.

Let’s make this more concrete. We have a group of nodes $P_1,P_2,...,P_n$. We assume each node $P_i$ also has $m$ “secret pseudonyms” $S_{i1},S_{i2},...,S_{im}$. Every node has access to the whole set of $n\cdot m$ pseudonyms. But importantly, no one knows which pseudonyms correspond to which node (except of course for their own pseudonyms). Indeed, they don’t even know whether or not two pseudonyms correspond to the same node. So for example $S_{35}$ and $S_{39}$ (which both correspond to $P_3$) look no more similar than $S_{12}$ and $S_{35}$ (which correspond to different nodes). This explains the choice of title for this post: each node creates $m$ Sybil identities, and we will use those identities to prevent the Byzantine nodes from harming us!

In practice, one could use a network like Tor to implement these mechanics, but I won’t go into detail on how Tor works. We’ll just treat it as a black box that allows us to send messages to the node corresponding to these pseudonyms without directly learning anything about which node that pseudonym refers to.

Now the key point in this construction is that if $m$ is large then it will be nearly impossible to send “Attack!” to all the pseudonyms corresponding to nodes in $A$ without also sending “Attack!” at least once to the nodes in $B$. In fact the probability of succeeding at this is exponentially small in $m$, meaning that it suffices to set $m$ to a modest number like $m=20$ (note that this is statistical security rather than cryptographic security, so there’s no need to make $m$ be on the order of $128$).
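To put rough numbers to this, here is a toy calculation under a simplified model of my own (the function name and the model are illustrative, not from any protocol spec): $D$ makes a single guess at which half of the $n\cdot m$ uniformly shuffled pseudonym labels to send “Attack!” to, and succeeds only if that guess exactly isolates the pseudonyms of half the nodes.

```python
from math import comb

def split_success_prob(n, m):
    """Toy model: n nodes (n even), m pseudonyms each, pseudonym labels
    uniformly shuffled. D targets a fixed set of n*m/2 labels and wins
    only if they are exactly the pseudonyms of the first n/2 nodes.

    There are comb(n*m, n*m/2) equally likely ways the shuffle can place
    pseudonyms into D's targeted half, and only one of them is the
    perfect split, so the success probability is 1 / comb(n*m, n*m/2).
    """
    total = n * m
    return 1 / comb(total, total // 2)
```

With $n=4$ and $m=20$ this already gives roughly $10^{-23}$, illustrating why a modest $m$ suffices; a real adversary has more strategies than a single fixed guess, so treat this only as an order-of-magnitude sketch.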

In full detail, the protocol works as follows:

1. $D$ sends out its message to all pseudonyms.
2. If we’re an honest node $P_i$, then we wait to receive messages from each of our pseudonyms $S_{i1},...,S_{im}$.
3. If the messages we receive are all identical, then we accept that message; otherwise we reject the message and have proof that $D$ misbehaved.
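The three steps above can be sketched as follows; the `Node` class and the `deliver`/`decide` names are hypothetical scaffolding of mine, not part of any real implementation:

```python
class Node:
    """Honest node P_i in the basic consistent-broadcast sketch.

    Hypothetical interface: deliver(pseudonym, message) is called once for
    each message addressed to one of the node's own m pseudonyms.
    """

    def __init__(self, own_pseudonyms):
        self.pending = set(own_pseudonyms)  # pseudonyms still awaiting a message
        self.messages = set()               # distinct message contents seen

    def deliver(self, pseudonym, message):
        """Record a message addressed to one of our pseudonyms."""
        self.pending.discard(pseudonym)
        self.messages.add(message)

    def decide(self):
        """Rule 3: return the accepted message, None while still waiting,
        or raise if D provably equivocated."""
        if len(self.messages) > 1:
            raise ValueError("D equivocated: conflicting messages received")
        if self.pending:
            return None                     # rule 2: still waiting
        (msg,) = self.messages
        return msg                          # all m copies were identical
```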

There is one catch that makes this construction a bit hard to make work properly: After accepting a message from $D$, a node will generally do something with that information, for example either attacking or fleeing. But if $D$ is malicious it can pay attention to when you react to accepting your message and use that information to learn which pseudonyms correspond to you. For example, if $D$ sends “Attack!” to every pseudonym except one, waits to see which nodes attack, and then sends the last message a bit late, then it just learned which node that last pseudonym corresponds to. By changing the order each time to make a new pseudonym be last, after $n\cdot m$ repetitions $D$ can learn the correspondence between every node and their pseudonyms.

A trick to solve this problem is to designate a deterministic order in which messages must be received. Let $\pi(S_{ij})$ be a unique index for each pseudonym, where all nodes know and agree upon the value of $\pi(S_{ij})$ for all pseudonyms. Now we can modify rule $3$ such that we also reject the message if we receive our messages out of the order that they should have been sent according to $\pi$. This assumes that messages that are sent in order are also received in order; this can be enforced by having each node (anonymously) return an acknowledgement to $D$ upon receipt of each message, and then having $D$ send out its messages serially. The functionality of allowing anonymous replies is possible with standard anonymity networks like Tor.
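A minimal sketch of this modified rule, assuming a hypothetical `pi` dictionary that maps each of our own pseudonyms to its agreed-upon index (a node can only check the relative order of deliveries to its own pseudonyms):

```python
class OrderedNode:
    """Variant of the basic rule: also reject if messages to our own
    pseudonyms arrive out of the globally agreed order pi."""

    def __init__(self, own_pseudonyms, pi):
        self.pi = pi                        # pseudonym -> agreed unique index
        self.pending = set(own_pseudonyms)
        self.messages = set()
        self.last_index = -1                # pi-index of the latest delivery

    def deliver(self, pseudonym, message):
        # Each new delivery to us must carry a strictly larger pi-index
        # than the previous one, otherwise D sent out of order.
        if self.pi[pseudonym] <= self.last_index:
            raise ValueError("out-of-order delivery: D misbehaved")
        self.last_index = self.pi[pseudonym]
        self.pending.discard(pseudonym)
        self.messages.add(message)
```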

There are two drawbacks of this approach. The biggest drawback is efficiency: since $D$ needs to wait for each acknowledgement before sending out the next message, the latency of the protocol is $2n\cdot m+1$ times the standard network latency, and messages over Tor already have a high network latency. This kind of timescale can easily get impractical. The second drawback is more subtle and I’m not sure how noticeable it would be in practice, but at least theoretically the owner of a pseudonym could be identified by the time it takes them to respond (these delays would likely be more strongly correlated for pairs of pseudonyms with the same owner compared with pseudonyms with different owners, allowing the attacker to build up a statistical advantage over time by just passively observing delays).

Both of these problems can be solved if we make a feasible but non-standard assumption about the functionality exposed by the anonymity network (non-standard in the sense that I’m not sure any popular public anonymity network has this feature in practice). We assume that if a set of messages $M_1,...,M_k$ are all sent to the anonymity network together in a batch, then for each receiver $P_i$ of some subset of those messages (say, $M_3,M_6,$ and $M_{11}$ were sent to pseudonyms of $P_i$), those messages are delivered to $P_i$ together in a single batch (so in this case $P_i$ receives one message containing the combined information of $M_3,M_6,$ and $M_{11}$). With all the common anonymity network architectures I know of, such as onion routing and mixnets, I think it should be possible to set things up so that this property holds; but again, I don’t think it holds for any common real networks used in practice (although if it does, let me know!).

Now we can change rule $3$ so that rather than making sure the messages are delivered in order, we instead ensure that they’re all delivered in batch. $D$ simply sends all its messages in batch to the anonymity network. Aside from being fast (the latency just matches the message latency through the anonymity network), $D$ doesn’t receive acknowledgements that it can use to deanonymize pseudonyms through timing information. In fact, one other neat bonus is that $D$ actually cannot do anything other than follow its correct functionality without at least one node learning that $D$ is Byzantine. Indeed, if $D$ doesn’t send the entire set of messages in one batch, then at least one node will not receive its full set of messages in that batch.
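A sketch of the batched rule, assuming the anonymity network hands each node its whole bundle as a single delivery, represented here as a hypothetical `{pseudonym: message}` dict:

```python
def check_batch(own_pseudonyms, batch):
    """Batched variant of rule 3: accept only if the one delivered batch
    covers all of our pseudonyms with a single identical message.

    own_pseudonyms: the set of this node's m pseudonyms.
    batch: dict mapping pseudonym -> message, delivered as one bundle.
    """
    if set(batch) != set(own_pseudonyms):
        # D did not send the whole batch at once, so our bundle is short.
        raise ValueError("batch missing some of our pseudonyms: D misbehaved")
    messages = set(batch.values())
    if len(messages) > 1:
        raise ValueError("D equivocated within the batch")
    return messages.pop()
```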

So the last version of this protocol works, and within a fairly reasonable model of anonymous communication we can prove security against an arbitrary number of failures. Now before we get ahead of ourselves, I’ll clarify that this does not mean we can achieve consensus with an arbitrary number of faults. Consistent broadcast is in a formal sense much easier to achieve than consensus. In fact it’s known that non-equivocation plus digital signatures is equivalent to the crash fault model, which still requires $n\geqslant 2f+1$ to achieve consensus in the asynchronous model (with some weakening like randomization or partial synchrony to get around FLP). Nonetheless, $2f+1$ is a significant improvement over $3f+1$!

There are still a few catches, though. The first is that to my knowledge there do not exist any anonymity networks that are secure in the asynchronous model. It seems to me as though such a system would be impossible, but I also don’t know of any impossibility proofs here.3 So it could be argued that we’ve just pushed the issue back without solving it. But there are some reasons why I think this design is still interesting.

First of all, it forces a potential attacker to attack the (supposedly massive, well-tested, globally-distributed, etc.) anonymity network to break the protocol, rather than attacking the (small and potentially not as well vetted) agreement system. This is likely a much more difficult task, so we still gain a benefit from “pushing back the issue”.

Second, the security of our protocol is based explicitly on the knowledge of the Byzantine nodes; if the Byzantine nodes cannot break the anonymity assumptions, then there is a negligible chance of failure regardless of how messages are delivered. I would argue that this is much preferable to a situation in which a Byzantine node can mess things up simply due to an “unlucky” ordering of message delivery. In a standard system with $10$ nodes, if $4$ of those nodes are Byzantine then they can equivocate and possibly break the system without needing to do any extra work. They can also take advantage of natural (and surprisingly frequent) transient partitions and possibly break the system with high probability! In our system, those $4$ nodes could not break the system without first explicitly deanonymizing the honest nodes’ pseudonyms, regardless of transient partitions. We can also quantitatively measure the probability of failure ($\sim 2^{-m}$), whereas it is impossible to quantitatively measure the probability that equivocation might succeed against a standard system.

Another catch involves the setup. How do the nodes share their pseudonyms with each other while preserving anonymity? This might at first glance seem like a trivial issue (just anonymously send messages over Tor containing information about your pseudonymous identities), but timing information could theoretically be exploited to learn something about the pseudonyms during setup. I don’t see this as a real issue in practice, but the theory is made a bit less appealing by it. I think the best way to solve this issue theoretically is by assuming a synchronous initialization such that all nodes get their pseudonym messages included in a single batch of the anonymity network, preventing any timing information from being spilled. This is not such a strange assumption (it’s used ubiquitously in the asynchronous multi-party computation literature), and I think it’s fairly reasonable.

1. There is one other line of research I know of that explores how to achieve asynchronous BFT with $n<3f+1$. This research uses trusted hardware, which is a mechanism by which the hardware manufacturer adds mechanisms to a computer that force it to run specific computations and then sign an attestation to the fact that those computations were executed correctly, in such a way that even the computer operator cannot get a valid signature over the output of an incorrect operation. Similar to the anonymity-based construction we develop, using trusted hardware allows one to restrict Byzantine nodes so that they cannot equivocate without being detected.
Two problems with the trusted hardware approach are that the security is only as strong as the adversary’s inability to hack the trusted hardware (no trusted hardware is unhackable, but the current generation is supposed to be designed to protect against all known attacks), and that it requires every node to have a trusted hardware unit. In contrast, our technique simply involves connecting to a public anonymity network over the Internet, and further, the security reduces to the security of the anonymity network, which is better understood.
A notable benefit of the trusted hardware approach over ours is that trusted hardware attestations are publicly verifiable; in other words, the message can be broadcast in a peer-to-peer fashion and everyone who receives it can directly verify that the message they receive is consistent with everyone else’s messages. With our approach, the sender needs to actively send the message to a pre-specified group of nodes, and anyone outside of that group cannot get a guarantee that a given message is consistent with everyone in that group.
2. No, this is not a secure password.
3. Note that the fact that our protocol circumvents a known impossibility result by leveraging anonymity does not immediately imply the impossibility of asynchronous anonymity, since the anonymity network might rely on extra assumptions (e.g., trusting mixnodes) such that the joint system does not violate the impossibility of $n<3f+1$ asynchronous agreement. It does rule out certain kinds of anonymity networks though, like mixnets that guarantee security even if all but one of the mixnodes are corrupted.