This article builds on the introductory article about baseband modulation, where Manchester code was explained and, hopefully, understood and accepted by the reader as a viable code.
Manchester code and particularly its differential variant are nice, but as mentioned in the introductory article, they are wasteful in terms of bandwidth. Manchester was employed in 10Mbit Ethernet, but not in 100Mbit, because the Cat-5 cables did not have enough bandwidth for the doubled symbol rate.
Well, it is possible to transmit the bit stream as it is (this is known as NRZ code), and indeed NRZ is widely employed. It takes half the bandwidth of the Manchester-encoded version. But then, how to recover the clock in the face of long sequences of 0's or 1's? How to avoid DC bias?
There is a simple solution for that: guarantee that there won't be long sequences of the same bit, so the RX clock has an opportunity to be recovered.
There are many ways to do it. The simplest is bit stuffing. The example below inserts an inverted bit after five consecutive identical bits (stuffing bits shown in parentheses):
Original: 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1
Stuffed:  0 | 0 | 0 | 0 | 0 | (1) | 0 | 0 | 1 | 1 | 1 | 1 | 1 | (0) | 0 | 1
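In code, the TX side could look like the sketch below. This is a minimal illustration, not taken from any particular implementation; bits are represented as a plain list of 0/1 integers and the names are made up:

    def stuff(bits, max_run=5):
        """Insert the inverted bit after max_run consecutive identical bits.
        bits is a list of 0/1 integers."""
        out = []
        prev = None
        run = 0
        for b in bits:
            if b == prev:
                run += 1
            else:
                prev = b
                run = 1
            out.append(b)
            if run == max_run:
                out.append(1 - b)   # stuffing bit: the inverse of the run
                prev = 1 - b        # it is part of the raw stream,
                run = 1             # so it starts a new run
        return out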
The RX side knows that TX does bit stuffing, so it simply removes the next raw bit from the stream whenever it sees five consecutive identical bits. Note that the value of the actual sixth data bit is not taken into consideration: TX adds the stuffing bit unconditionally, and this is precisely what allows RX to remove it blindly.
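The RX side mirrors the same run counter. Continuing the sketch above, with the same assumptions and made-up names:

    def unstuff(bits, max_run=5):
        """Drop the bit that follows max_run consecutive identical bits:
        it is a stuffing bit, inserted by TX unconditionally."""
        out = []
        prev = None
        run = 0
        skip = False
        for b in bits:
            if skip:
                skip = False    # discard the stuffing bit, but it still
                prev = b        # counts toward the next run in the
                run = 1         # raw stream
                continue
            if b == prev:
                run += 1
            else:
                prev = b
                run = 1
            out.append(b)
            if run == max_run:
                skip = True
        return out

    # Round trip over the example streams shown earlier
    data = [0,0,0,0,0,0,0,1,1,1,1,1,0,1]
    stuffed = stuff(data)
    assert stuffed == [0,0,0,0,0,1,0,0,1,1,1,1,1,0,0,1]
    assert unstuff(stuffed) == data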
The RX clock recovery (some implementation of a phase-locked loop) also knows how many consecutive identical bits are allowed by the protocol, so it won't try to reset the clock too hastily; it needs to "listen" to a fairly long sequence of raw bits before reaching a conclusion.
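To make the idea concrete, here is a toy bit sampler, assuming an oversampled NRZ input with a nominal 8 samples per bit. Unlike a real PLL, which adjusts its phase gradually, this version snaps its sampling point to every transition; the guaranteed maximum run length is what bounds how long it must free-run between edges:

    def recover_bits(samples, spb=8):
        """Toy clock recovery. samples is an oversampled 0/1 NRZ waveform,
        spb the nominal number of samples per bit."""
        bits = []
        countdown = spb // 2        # distance to the first bit center
        prev = samples[0]
        for s in samples[1:]:
            if s != prev:
                # transition: re-anchor; the next bit center is
                # half a bit time away from the edge
                countdown = spb // 2
            else:
                countdown -= 1
                if countdown == 0:
                    bits.append(s)
                    countdown = spb     # free-run until the next edge
            prev = s
        return bits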
A disadvantage of bit stuffing is that it wastes a bit of bandwidth. Problematic data streams, with long stretches of 0's or 1's, may need up to 20% more bandwidth (in the worst case, one stuffing bit is inserted for every five data bits), so the physical medium must have this spare capacity to handle the situation.
There are more advanced techniques than bit stuffing that don't take extra bandwidth, but we will not consider them on this page. For now, we will mention a technique called data whitening.
Data whitening is a transformation that makes data look more random. Typical messages, which are prone to long same-bit sequences, become much more balanced after whitening. The intermittent need for 20% more bandwidth is traded for a continuous need for, say, 5% more, because the occurrence of stuffing bits now has a probability that no longer depends on the carried data.
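A common way to implement whitening (though not the only one) is an additive scrambler: XOR the data against the keystream of a free-running LFSR that the RX can reproduce. The sketch below uses a well-known maximal-length polynomial, x^16 + x^14 + x^13 + x^11 + 1, with an arbitrary seed; real protocols each specify their own polynomial and seed:

    def lfsr_byte(state):
        """Produce one keystream byte from a 16-bit Galois LFSR
        (taps x^16 + x^14 + x^13 + x^11 + 1, maximal-length)."""
        out = 0
        for i in range(8):
            bit = state & 1
            out |= bit << i
            state >>= 1
            if bit:
                state ^= 0xB400     # tap mask for this polynomial
        return out, state

    def whiten(data, seed=0xACE1):
        """XOR data with the LFSR keystream. Applying it twice with the
        same seed recovers the original, so de-whitening is the same call."""
        state = seed
        out = bytearray()
        for b in data:
            key, state = lfsr_byte(state)
            out.append(b ^ key)
        return bytes(out)

    msg = bytes(16)                     # worst case: all zeros
    print(whiten(msg).hex())            # now looks random
    assert whiten(whiten(msg)) == msg   # XOR whitening is its own inverse

Because the transformation is a plain XOR with a keystream that depends only on the seed, RX undoes it by running the very same function.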
Data whitening is all about probabilities. It is perfectly possible that a given message looks less random after whitening. But it is highly unlikely.