CRCs

A Better Error Checking Mechanism

The questions below are due on Thursday October 24, 2024; 11:59:00 PM.
 

Much of this progression was taken somewhat directly from this excellent, in-depth introduction to CRC, and more or less translated one-to-one into pset form. If you're interested in a little more of the nitty-gritty details, feel free to give this a read.

In this class, we've studied a handful of protocols used by hardware and software components to communicate with each other. For example, SPI was prominently featured in lab 2, we did UART in week 3, we played around with DVI/HDMI in week 4, and then more in weeks 5 and 6. Like it or not, these protocols are extremely important to our daily lives, and find themselves at the heart of digital design once things start to scale up.

So far, we've generally taken for granted that communication between hardware components Just Works. But here and there we've also mentioned that things can go wrong, and that we need ways to detect and handle errors when digital data gets fudged on the wire, or when we're sending through a noisy channel. That's going to be our job for this pset.

Warm Up: Breaking Parity

An excellent example of error detection and recovery showed up in week 3, where you looked at the addition of a parity bit with messages you sent over the wire. We didn't actually make you do it in practice in the interest of time, but it is a natural extension of UART. By counting whether the number of ones in a message was even or odd, we were able to catch interference in the form of bit flips and drop certain bad samples. In fact, at least on the RX side, if we had done parity checking, things would have probably gotten a little easier at the very end in terms of debugging.

Parity bits are imperfect though. Suppose we have the message 8'b10000100 coming off the wire. Also, suppose we enforce odd parity, i.e. the number of ones in the message should always be odd (enforced by the concatenation of the appropriate parity bit).
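To make the failure mode concrete, here's a tiny sketch in Python - using a different 4-bit message, 4'b1010, so as not to give away the questions below (parity_ok is just an illustrative name):

```python
def parity_ok(message_with_parity):
    # odd parity: the total number of ones, parity bit included, must be odd
    return sum(message_with_parity) % 2 == 1

# 4'b1010 has two ones, so its odd-parity bit is 1: send [1,0,1,0] + [1]
assert parity_ok([1, 0, 1, 0, 1])
# flip two bits of the message: the check still passes, but the data changed
assert parity_ok([0, 1, 1, 0, 1])
```

Any even number of bit flips sails right past the check - which is exactly the weakness the next question asks you to exploit.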

Provided we're using odd parity, what should the parity bit for this message be?

Can you 'corrupt' your message in such a way that a parity check on your message would still pass, but that the original message content is no longer 8'b10000100? Enter your answer as a Verilog bit vector, e.g. 8'b00000000 or 8'b0000_0000. Make sure your submitted answer is the correct length, else the checker will crash and you'll lose a submission.

Okay, now what?

That was easy! It took a constant number of bit flips to let an error get through our parity calculation. Clearly we're going to need something more robust if we intend to send data through a noisier channel.

One common method for error checking you'll run into in a lot of software applications is called a checksum (for example, the checksum on data encapsulated via the ubiquitous Internet Protocol Version 4). This is about as straightforward as it sounds - to ensure our data arrives at the receiver intact, we run the following algorithm on our data:

  • Sum the bits we send as we send them. For instance, if we send 8'b10000100 onto the wire, we arrive at a sum of 2
  • Decide on how wide we want our checksum to be - let's say 2 bits
  • Take this sum modulo 2^{2} to 'pack' the checksum into 2 bits
  • Stick the result onto the end of our message and send it off
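The four steps above can be sketched in a couple of lines of Python (the function name is just for illustration):

```python
def checksum(bits, width):
    # sum the ones in the message, then 'pack' into `width` bits via mod 2**width
    return sum(bits) % (2 ** width)

# 8'b10000100 sums to 2; packed into 2 bits that's 2'b10
assert checksum([1, 0, 0, 0, 0, 1, 0, 0], 2) == 0b10
```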

In this example, we'd arrive at a checksum of 2'b10. Astute readers will notice:

  • A checksum of length one is kind of like using even parity
  • The longer the checksum, the lesser the probability of error

As an example...

Suppose we increased the checksum length to 3. What's the checksum of our input data now? Enter your answer as a Verilog literal in binary (for example, 4'b0101 or 1'b0).

This is longer, but there's a lesser likelihood that we'll find a different message that produces the same checksum. Of course, our channel is still noisy, so the probability of a corrupted message is the same (or might even increase since our message is longer - more opportunities for something to go astray). But the probability of a 'dangerous-side' failure - where a message corrupts into another message which produces the same checksum - is somewhat lower.

There's just one problem: put bluntly, addition is a really bad choice for protecting our data. It's pretty easy to flip a very small¹ number of bits in a message, no matter how long the checksum is, and wind up with the same result...

Say we're using a four bit checksum attached to a 32 bit message. What's the lowest number of bits you can flip in the message 32'b1010_1010_1010_... to get a different message but the same checksum via the algorithm above?

What about modulo division?

As we've said before...

  • Addition is easy to do in your head
  • Addition is also easy to conceptualize in hardware, with a little 6.205 under your belt...

By contrast,

  • Long division is much harder to do in your head, depending on the numbers at play
  • Similarly, have fun writing a remainder calculator in hardware. Ignoring the trivial cases of power-of-two numbers, this is as hard a task to do as division.

More importantly, suppose for our 'checksum' we try to use input % n, where n is some conveniently big and chaotic number (more on this later) our sender and receiver have agreed upon. For the rest of this explanation, let's call n the shared divisor. If you flip a bit or two in the input, the remainder from dividing by our shared divisor will jump around a lot - the bit flips you'd need to get an identical remainder are a lot less trivial to figure out than those for an identical sum mod some number, as before. This gives us the foundation for a much more robust error checking scheme, and is quite similar to the widely used Cyclic Redundancy Check (CRC) algorithm.
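To see the contrast concretely - here 19 is just an arbitrary stand-in for the shared divisor:

```python
# two bit flips that preserve the sum of ones (turn bit 2 off, bit 3 on)...
msg, forged = 0b1000_0100, 0b1000_1000
assert bin(msg).count("1") == bin(forged).count("1")  # same additive checksum
# ...but the remainders mod an arbitrary divisor (19 here) scatter apart
assert msg % 19 != forged % 19
```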

Of course, division is hard. Regular integer division takes a lot of resources (and needs cycles to compute even with the best algorithms), so we're going to try something a little bit different to exploit this shared divisor integrity checking technique.

Polynomial Arithmetic

To make division by our shared divisor easier, we're gonna step into the weird and wacky world of polynomial arithmetic. Don't worry, this is less complicated than it sounds. We also won't spend a ton of time here, because this is an engineering class - we just want you to be familiar with the 'why' and the 'how' of CRC before we go and implement some crazy thing.

Keeping with what's easy and what's not, let's start with polynomial addition. Say we have the numbers 4'b0010 and 4'b0001. We would like to add these numbers. Here's a very wacky way of going about it:

  • Transform these numbers into polynomials by rewriting the bit at index n (counting from zero at the least significant bit) as the polynomial term x^n. For example, 4'b0010 becomes x^1 and 4'b0001 becomes x^0. If we were to set x to 2, we'd wind up with the original binary numbers we started with.
  • Add the two polynomials together, ignoring the fact that we know x = 2
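A hypothetical helper that performs the bit-to-term transformation mechanically:

```python
def to_poly(n):
    # rewrite each set bit at index i (from the LSB) as the term x^i,
    # listed in descending order
    terms = [f"x^{i}" for i in range(n.bit_length() - 1, -1, -1) if (n >> i) & 1]
    return " + ".join(terms)

assert to_poly(0b0010) == "x^1"
assert to_poly(0b0001) == "x^0"
```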

As a little throwback to middle school algebra, try it real quick:

What's 4'b0010 + 4'b0001 in polynomial arithmetic? Specify your answer as an x-based polynomial. Fully simplify your answer in terms of x, in descending order (e.g. x^2 before x^1, not x^1 before x^2), and don't rewrite x^1 as x or x^0 as 1.

As we're pretending not to know what x is, we can quickly make our number system even simpler by playing with the coefficients on each polynomial term left over from our above expression. One adaptation we can make might look like this:

  • For every coefficient in an expression resulting from a polynomial arithmetic operation, calculate that coefficient % 2 and use it in our final expression rather than the original result.
  • For example, x^3 - x^2 + 2x^1 becomes x^3 + x^2.

This is called polynomial arithmetic mod 2, because we're taking the coefficients modulo two across the resulting polynomial. It turns out polynomial arithmetic mod 2 has some really interesting properties we can exploit in hardware to make super simple dividers - reaping the benefits that division by a shared divisor has for data integrity, without blowing our propagation delay out of whack.

What's 4'b0010 - 4'b0001 in polynomial arithmetic mod 2? Again specify your answer as an x-polynomial like x^2+x or something. Fully simplify your answer in terms of x, and don't rewrite x^1 as x or x^0 as 1.

It should become apparent that subtraction (and addition -- do it out!) in polynomial arithmetic mod 2 feels a lot like taking an xor. If x^n appears in both polynomials, or neither of them, x^n doesn't appear in the final answer. If it appears in only one of them, x^n does show up in the final answer.
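You can convince yourself of the XOR equivalence with a couple of quick checks (different numbers than the question above, to keep things spoiler-free):

```python
# addition and subtraction mod 2 are both just XOR on the bit vectors
a, b = 0b1100, 0b0101
assert a ^ b == 0b1001   # x^3 + x^0: terms in exactly one operand survive
assert (a ^ b) == (b ^ a)  # and a - b = b - a, as noted below
```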

As an additional complication - 4'b1000 - 4'b1111 = 4'b0111, which would imply that 4'b1111 < 4'b1000. The opposite also holds: 4'b1111 - 4'b1000 = 4'b0111 as well, which would imply that 4'b1111 > 4'b1000. Clearly numeric magnitude is pretty shot in this system, because a - b = b - a but b \neq a.

As a result the only real indication we have of whether a number is greater than another is the position of its most significant one. In other words, we can tell clearly that 4'b0001 < 4'b1000 - that makes sense just by looking at it, but we have no clue once we compare 4'b1001 and 4'b1000 because subtraction and addition are effectively the same operation now - xor.

So we'll roll with this definition for now - a number is greater than or equal to another number if its most significant one is farther to the left than that of the other number.
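In code, that ordering rule amounts to comparing the positions of the most significant ones (poly_ge is an illustrative name, not anything standard):

```python
def poly_ge(a, b):
    # a >= b iff a's most significant one sits at least as far left as b's
    return a.bit_length() >= b.bit_length()

assert poly_ge(0b1000, 0b0001)
# 4'b1001 and 4'b1000 each 'exceed' the other - we can't order them
assert poly_ge(0b1001, 0b1000) and poly_ge(0b1000, 0b1001)
```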

Division in poly-mod-2 land

To illustrate why division is super easy in this polynomial-mod-2 regime, let's try out an example division.

Suppose we've agreed with a message recipient that we'd like to ensure the integrity of our messages using the shared divisor x^4 + x^1 + x^0. That's like using the shared divisor 5'b10011 in binary world. Unbeknownst to the recipient, we're going to send them the 8 bit message 8'b1010_1010. What this means is that we're going to divide our message by the shared divisor/polynomial, append the remainder to the end of our message, and then toss it over the wire. Pretty straightforward.

We're going to use long division so that we can exploit subtraction (i.e. xor) as much as possible. A few tips:

  • I can't believe I have to put this here, but if you're not familiar with long division anymore, you're not alone... here's a decent explanation (this was one of the better ones I could find)
  • As we said before, subtraction is xor'ing in this number system!
  • Enter your answer as a Verilog literal, as before
  • When you're doing each step of the long division, you'll progressively check whether the shared divisor fits into some of the numbers under the tableau (the |- thing). But what does "fit into" mean? In our number system, a number a goes into another number b once if b \geq a, and zero times otherwise.

What's 8'b1010_1010 % x^4 + x^1 + x^0 (i.e. 5'b10011) in polynomial mod 2 arithmetic? Enter your answer as a 4-bit Verilog literal.

Writing this out and stepping through the same process I did will make this whole thing way easier to understand! You should arrive at the same answer as I do below - if you don't, post on Piazza and we'll figure it out.

Here's how one could attack this problem:

long division

Remember - a number a is only greater than or equal to b if its most significant one appears at an equal or higher index than b's most significant one!

Steamrolling through this we find a remainder of 4'b0111. This is our new "checksum" that gets attached to the end of the message -- and since we derived it through division it has super nice properties that make it 'better' than an additive or parity-based checksum.
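If you want to double-check your by-hand work programmatically, here's a minimal sketch of the same long division (the function name is just for illustration):

```python
def polymod2_remainder(msg, msg_len, divisor):
    # GF(2) long division: shift message bits in msb-first, and XOR the
    # divisor out whenever the running remainder's top one lines up with it
    r = 0
    top = 1 << (divisor.bit_length() - 1)
    for i in reversed(range(msg_len)):
        r = (r << 1) | ((msg >> i) & 1)
        if r & top:
            r ^= divisor
    return r

assert polymod2_remainder(0b1010_1010, 8, 0b10011) == 0b0111
```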

Astute readers might notice something about each step of this division. Remember this circuit from the LFSR pset?

A Four-Register Linear Feedback Shift Register

Well....what if we did THIS???? Look at the differences:

The simple LFSR From Week 06, but now also taking an input!

This is a regular old LFSR with a small catch - we no longer feed the output from the most significant position (in this case, lfsr[3]) directly back into the XOR gate at each tap (for example, lfsr[1] = lfsr[0] ^ lfsr[3]). Since we want to accept new input, what we actually wind up doing is feeding lfsr[3] ^ input into the first tap and then the regular lfsr[3]^lfsr[n-1] for any tap at n, so that the LFSR can incorporate new things 'shifted in' to it as time progresses.

Why the XOR, you ask? It turns out adding that xor with the input allows this LFSR to compute the exact same remainder that we're computing using polynomial-mod-2 division! If you want to see why, here's the step-by-step explainer behind it:

Jay's (2022 TA's) explainer video
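You don't have to take the video's word for it - here's a quick software model of the input-fed 4-bit LFSR above (taps chosen for our x^4 + x^1 + x^0 divisor), which lands on the same remainder as the long division:

```python
def lfsr_with_input(bits, width=4, taps=(1,)):
    # s[0] is the lowest-order register; feedback comes from s[width-1]
    s = [0] * width
    for b in bits:
        fb = s[width - 1]
        nxt = [0] * width
        nxt[0] = fb ^ b                        # input XORed in at the first tap
        for n in range(1, width):
            nxt[n] = s[n - 1] ^ (fb if n in taps else 0)  # lfsr[n-1] ^ lfsr[3]
        s = nxt
    return sum(bit << i for i, bit in enumerate(s))

assert lfsr_with_input([1, 0, 1, 0, 1, 0, 1, 0]) == 0b0111  # same remainder!
```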

Key takeaways:

  • LFSRs compute the remainder of their inputs divided by the polynomial shared divisor they encode (i.e. some pre-shared value we all agree on) in our unique arithmetic regime
  • Unlike a conventional divider, the LFSR is super simple and fits onto a schematic you can read!
  • When used with certain polynomials, this procedure is called a Cyclic Redundancy Check (CRC)

Using this technique for ensuring the integrity of our messages rather than a plain old checksum provides us some serious robustness in the face of noise - and motivates the LFSRs we built last week.

Now this isn't exactly how the LFSR is used in generating a Cyclic Redundancy Check. In practice, you'll start your divider with a state of all 1s, and you need to shift in a number of trailing 0s after your dividend equal to one less than the length of your divisor polynomial. This means you'll need to spend extra clock cycles shifting in those trailing zeroes, which takes extra time. The circuit can be refactored into the following form (often called an "optimized" CRC circuit), which does all of this in a number of clock cycles equal only to the dividend (message) length - and this is the general topology that gets used in most actual CRC implementations.

The only big change is that the XOR of the input and the value at the highest register lfsr[3] is now fed back at every spot where only lfsr[3] would have been fed before (no longer just the lowest order tap like the first improvement above). Keeping with our old example, we now have lfsr[1] = lfsr[0] ^ lfsr[3] ^ input!

The optimized LFSR that takes an input and also scales the input by an order of four!
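To see that the refactoring really buys us the trailing zeroes for free, here's a sketch comparing the two topologies on our 4-bit example. A zero seed is used on both sides purely to keep them directly comparable (with a nonzero seed like MPEG2's all-ones, the two forms need matching seed transformations):

```python
def crc_optimized(bits, poly=0b10011, width=4):
    # "optimized" form: fb = msb XOR input is folded into every tap each cycle
    crc, mask = 0, (1 << width) - 1   # zero seed, just for this comparison
    for b in bits:
        fb = ((crc >> (width - 1)) & 1) ^ b
        crc = ((crc << 1) ^ (poly if fb else 0)) & mask
    return crc

def plain_remainder(bits, poly=0b10011, width=4):
    # the long-division view, fed the message plus `width` trailing zeroes
    r = 0
    for b in bits + [0] * width:
        r = (r << 1) | b
        if r & (1 << width):
            r ^= poly
    return r

msg = [1, 0, 1, 0, 1, 0, 1, 0]
assert crc_optimized(msg) == plain_remainder(msg)  # same result, fewer cycles
```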

Putting it all together: verifying message integrity using CRC

Using this background, your job is to implement CRC32 - a specific instance of the shared polynomial divisor technique we theorized about above. Specifically, you will be implementing CRC32-MPEG2, which mandates an initial 'seed value' of 32'hFFFF_FFFF and utilizes the following polynomial:

x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^8 + x^7 + x^5 + x^4 + x^2 + x + 1

Create a module with the following interface (a skeleton is provided in the code submission box below) that implements CRC32-MPEG2:

module crc32_mpeg2(
            input wire clk_in,
            input wire rst_in,
            input wire data_valid_in,
            input wire data_in,
            output logic [31:0] data_out);

  //your code here
endmodule
  • clk_in: the input clock
  • rst_in: an active-high reset signal.
  • data_valid_in: Indicates when a new bit is present on the line.
  • data_in: This is a one-bit wide data bus (i.e. a regular single-bit logic) that will contain bits of your client's message. The message is fed in msb-first.
  • data_out: This will be a 32-bit wide data bus shipping out the contents of your CRC32-MPEG2 module, so that the testbench (and users of your code!) can inspect your work.

Your module will do the following:

  • As mentioned before, our LFSR now accepts inputs! As in the schematic above, at every tap, instead of just XOR'ing by the most significant bit of your LFSR, XOR by MSB ^ input.
  • Whenever a user asserts data_valid_in, you should shift data_in ^ data_out[31] into your LFSR, per the schematic above.
  • Your LFSR should implement the above CRC32-MPEG2 polynomial.
  • Whenever data_valid_in is low, you'll do nothing and keep the LFSR's contents the same.
  • Whenever rst_in is asserted, you'll reset your LFSR to contain all 1s (seed value of 32'hFFFF_FFFF).

For debugging, you should write a testbench that feeds in messages you care about.

Don't know what to expect out? Thankfully this is pretty standardized, and there are tools - like this one - as well as various Python libraries.
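If you'd rather sanity-check without a library, a bit-serial software model of CRC32-MPEG2 (the same "optimized" update as above, one message bit per loop iteration) only takes a few lines:

```python
def crc32_mpeg2_ref(data: bytes) -> int:
    # seed all ones, feed each byte msb-first, polynomial 0x04C11DB7
    poly, crc = 0x04C11DB7, 0xFFFF_FFFF
    for byte in data:
        for i in range(7, -1, -1):
            fb = ((crc >> 31) & 1) ^ ((byte >> i) & 1)   # msb XOR input
            crc = ((crc << 1) ^ (poly if fb else 0)) & 0xFFFF_FFFF
    return crc

# standard CRC-32/MPEG-2 check value for the ASCII string "123456789"
assert crc32_mpeg2_ref(b"123456789") == 0x0376E6E7
```

Feeding your Verilog module the same bits, one per data_valid_in pulse, should produce the same data_out.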

Some Python code that might prove helpful for testbenching: this function will take an integer of size bits and reverse the bits (useful for feeding in data msb-first, I found).

def reverse_bits(n,size):
    reversed_n = 0
    for i in range(size):
        reversed_n = (reversed_n << 1) | (n & 1)
        n >>= 1
    return reversed_n

Also, if you're using the libscrc library for testbenching: if you have some random integer called randint you want to test with this module, then a line like:

randcrc32 = libscrc.mpeg2(struct.pack('>L',randint))

will generate the expected CRC for you. You have limited submissions on this checker, so make sure you are testbenching.

Footnotes

¹ but not constant! Think about the message 8'b0000_0000 as an example...