Curriculum Cyclic Redundancy Check

From Real Software Documentation

Jump to: navigation, search

Aim

This lesson explores the strange world of binary arithmetic, through the implementation of a very important algorithm, the Fast Cyclic Redundancy Check.

The theory involved is not actually very complicated, but it does involve learning a new kind of math. It’s considerably easier than, say, the basic arithmetic you learned in Primary School. But it is a little strange, so if you don’t feel it’s your cup of tea, that’s fine. Let’s call this lesson optional.

We’re doing this lesson because Cyclic Redundancy Checks (CRCs) are very important, but also because we it’s a good lesson involving binary arithmetic. There are many, many thousands of web sites and books available that can teach you basic binary arithmetic, so there seemed little point covering the same ground. Instead, we’re doing a lesson designed to get you to stretch your ideas about what all those 1s and 0s inside your computer are good for.

Also, a very broadly useful technique is used in the project to speed up the CRC calculation.


Contents

Learn Basic Binary Math

Binary math is math base 2. It works just like the base 10 math you do all the time, but it only uses 1 and 0, and rather than the digits in a number representing 1s, 10s, 100s and so on, they represent 1s, 2s, 4s, 8s and so on. If at any time in this lesson you feel you need more information about how to do binary math, try doing a google search for “binary math”. One page that looked useful was: http://www.math.grin.edu/~rebelsky/Courses/152/97F/Readings/student-binary.html.

What is a Cyclic Redundancy Check?

It is common to want to compare two pieces of data that are separated in time or space, in a situation where you only have one or even neither of the two pieces available. You might have transmitted something over a network, or you’ve stored it for some period of time in some way, and you now want to determine if what you have is the same as the original.

A CRC is a way to do this: it is a short “signature” generated from the data, in such a way that if the data has changed, the signature will almost certainly be different. Since the CRC is generally much shorter than the data it is derived from, it is inevitable that many possible blocks of data will generate the same CRC. What we want, though, is to guarantee that the CRC will not be the same given expected patterns of errors (such as single bits being switched, bursts of sequential bits being switched, and so on).

The math is a bit beyond the scope of this lesson, but it is not too hard to prove that the CRC we&squo;re going to use here meets these kinds of criteria. See, for example, http://www.cs.williams.edu/~tom/courses/336/outlines/lect7_2.html for a proof of this.

Math with Polynomials Modulo2

In this lesson, we’re going to combine the computer’s native 1s and 0s into things that are almost numbers. Technically, they’re polynomials modulo 2[note 1]. They’re almost numbers because we can perform operations with them that are equivalent to the operations you are used to performing with numbers.

A polynomial modulo 2 is a string of 1s and 0s, just like a regular binary number. For simplicity, we’ll refer to them from here on as CRC numbers.

You can perform the usual addition, subtraction, multiplication, and division with CRC numbers, the same way you perform those operations with regular binary numbers, with one exception: you ignore carries[note 2].

The only operation we need for the CRC is division. In fact, calculating a CRC is just the remainder after dividing the data by a particular, fixed polynomial (usually referred to as “the poly”).

Now, division amounts to repeated subtraction:

7192 
   ______ 
72 )517843 
   504 
   ---- 
    138 
     72 
    ---- 
     664 
     648 
     ---- 
      163 
      144 
       -- 
       19 

In this calculation, 72 is the divisor, 7192 is the result and 517843 is the dividend. There doesn’t seem to be a proper term for each of the numbers we subtract as we carry out the algorithm, so we’ll call these numbers (such as 504 from the first step here) the subtractor. And we’ll call the number we get after subtracting the subtractor and then bringing down the next digit the goal.

We will use the same algorithm for our modulo 2 division; we just need to know how to do modulo 2 subtraction, and also how to work out how many “times” the divisor to subtract to get the subtractor at each step.

We’ll start by observing that we’re working in binary, so the second question amounts to whether to subtract 0 or 1 times the divisor — in other words, whether to subtract the divisor or not. And this just amounts to asking whether the dividend is less than the goal or not.

Subtraction of polynomials modulo 2 turns out to be the same as addition without carries, and is actually just the same as the exclusive or operation. We can see this by noting that since we are ignoring carries, each position in the addition or subtraction is entirely independent of the positions around it. So let’s look at adding or subtracting 1s and 0s when we ignore carries:

0 + 0 = 0 
0 - 0 = 0 
1 + 1 = 0  (because 1+1=10 in binary, but we  drop the 1 carried over) 
1 - 1 = 0 
0 - 1 = 1  (because 1+1=0) 
1 - 0 = 1 
0 + 1 = 1 
1 + 0 = 1 

So whether we are adding or subtracting, two 1s or two 0s give us a 0, while a 1 and a 0 give us 1.

This turns out to be the same thing as the exclusive or operation (see Appendix 2: Boolean Operators, for more about operations like exclusive or).

Back to determining when one number is less than or greater than another. Since addition and subtraction are the same, all we can do for numbers of the same length is to compare their left most digits; if both have the same left most digit, you can regard either as the bigger one. Again, this makes the CRC calculation easy: if the first digit of both the goal and the dividend is the same, we subtract (exclusive or) the dividend; if they are different, we don’t. Actually, it’s even easier than this: the poly for a CRC always starts with a 1, so we just have to look at the next digit of the goal: if it’s 1, we exclusive or the poly; if it isn’t, we don’t.

An Example

To calculate a CRC, find the number of digits in the poly, and append one less zeroes to the data. So in the example we’re about to see, since our poly is 10011, which has five digits, we will append four 0s (0000). Now just calculate the remainder when you divide by the poly, ignoring carries:

1100001010 
      ________________
10011 ) 11010110110000 
       10011,,.,,.... 
       -----,,.,,.... 
        10011,.,,.... 
        10011,.,,.... 
        -----,.,,.... 
         00001.,,.... 
         00000.,,.... 
         -----.,,.... 
          00010,,.... 
          00000,,.... 
          -----,,.... 
           00101,.... 
           00000,.... 
           -----,.... 
            01011.... 
            00000.... 
            -----.... 
             10110... 
             10011... 
             -----... 
              01010.. 
              00000.. 
              -----.. 
               10100. 
               10011. 
               -----. 
                01110 
                00000 
                ----- 
                 1110 

Let’s look at the first few operations:

  • Our first step is to divide 10011 into 11010 (notice that we only need to look at the same number of digits as our poly has at one time). Since 11010 starts with a 1, and our poly always starts with a 1, we can regard the Poly as smaller, so we subtract it. The result is 01001. We remove the 0 from the left end, and shuffle in the next digit of our data (this is just the regular long division algorithm), to get 10011, our next goal.
  • Now we’re dividing the poly into 10011. Again, the left digit is a 1, so we subtract the poly again. This gives us 00000, so after we shuffle the next digit of the data in, we have 00001.
  • Now, the left digit is a 0, so we don’t subtract the poly. We still have 00001, so now we shuffle the next digit of the data in, to get 00010.
  • Same thing again, so we get 00101. And so on. Since our poly always starts with a 1, the process winds up being very simple: if the current left over starts with a 1, we exclusive or the poly, if not we don’t. Then we move the next digit of our data into the leftover from the right, wash, rinse and repeat.

Speeding it Up

Take a look at the first few lines of our worked example:

1100001010 
      _______________ 
10011 ) 11010110110000 
       10011,,.,,.... 
       -----,,.,,.... 
        10011,.,,.... 
        10011,.,,.... 
        -----,.,,.... 
         00001.,,.... 

Note that XORing a series of numbers together can be done in any order. To calculate a XOr b XOr c, we can calculate a XOr b and XOr the result with c, or we can calculate b XOr c, and then XOr the result with a. This is similar to addition and multiplication, which we can also perform in any order.

Now, consider what happens as we perform the XOrs for each pair of digits in the dividend: for any possible pair of digits in the dividend, we will always be XOring with the same pair of values. So, if we look at the first two steps of the example, we will be XOring with:

10011, 
 10011 
----- 
110101 

So we can perform the first two steps of the calculation above by XOring with 110101.

If we perform the equivalent computation for all four possible values that a pair of digits from the dividend can have, we can then step through the dividend two digits at a time, by XOring with a value looked up in the table. Here is what that table looks like:

Data XOr with
00 00000

000000
_______
0000000

01 00000

010011
_______
010011

10 10011

000000
_______
100110

11 10011

010011
_______
110101

The most common speedup is to extend this idea to construct a 256-element table, and to feed through the data one byte at a time.

One final comment: you need to use very particular polys to get good results. Calculating what would be a good poly requires some slightly hairier math than we want to get into here. The Institute of Electrical and Electronic Engineers (IEEE, http://www.ieee.org) is a good source for such information. A search on google for “crc polynomials ieee” will quickly find you a list of good polys to use for different lengths of CRCs. The one used in the example project is a good poly for 32-bit CRCs, and 32 bits is a good length for most purposes.

The Project

The included project implements a table-driven CRC in Real Studio.

The code is fairly short and simple.

  1. Open the CRC project and look at the code in the CRC Class.
    Begin with the BuildTable method. Building the table is straightforward: you just perform the calculations of the CRC for all the binary numbers from 0 to 255, one bit at a time.
  2. Now look at the constructor.
    Notice that we build the table, and set the starting value of the CRC to all 1s.
  3. Next, look at the Add method, to which you pass a string.
    It just loops over the bytes in the string, XORing the current value of the register with the corresponding table entry.
  4. Finally, note that getting the result out is a separate operation from the calculation of the CRC (via the Value method).

By doing it this way, we can feed the data through in a series of stages. We can just keep calling Add with some chunk of our data, until we’ve read through all our data. This is particularly useful for, say, communication software, which might be receiving some data in chunks, rather than all at once.

Using CRCs

Remember what CRCs do: they give you a signature for some data. The uses for this capability are limited only by your imagination. A couple of examples:

  • You might use a CRC on a file to ensure it wasn’t modified (perhaps your program’s preferences file);
  • You could calculate a CRC for a document and then encrypt it. If you use a public key encryption method, someone would be able to verify that a document came from you (because they could decrypt the encrypted key and get a valid CRC for the document). If you don’t know what public key encryption is, read about it online. It’s an amazing and very useful form of encryption.

Further Exercises

There are a number of interesting exercises from this point:

  • Extend the example to use a 65536-element table, so we can double the speed of the calculation.
  • Modify the example so we only need to compute the table once — particularly useful for the 65536-element case. This will probably involve creating a separate table object or Module property, and passing it to the CRC calculator.
  • Overload the Add operator to accept other data types: an array of strings; a number; an array of numbers.

Appendix 1: Polynomials Modulo 2

Why are these polynomials, and why are they modulo 2?

Math Modulo n

Performing math modulo something (where something here is a number) is like doing math on a clock face (actually, that would be math modulo 12). Adding and subtracting is then just moving clockwise or anti-clockwise around the clock face.

Another way of looking at this is that to do math modulo n, you just find the remainder after you divide by n after any operation.

So in math modulo 12, if you add 8 to 9, you get 5 (try looking at a clock face, and move clockwise 8 steps from 9; or calculate 8 + 9 (17), and then find the remainder after you divide by 12 (5).

Polynomial Math

A polynomial is an expression like 3x5+8x3+x2+5x-1. You can’t simplify this expression until you know what x is, but there is a whole branch of math concerned with math with polynomials just left in this form. You can add two polynomials together, by just adding the coefficients (the numbers in front of the xs) in each place, so

3x5+8x3+x2+5x-1 + 4x5+2x3-6x2+2x+4 = 7x5+10x3-5x2+7x+3

Polynomials Modulo 2

When the coefficients are modulo 2, the math becomes very simple. The polynomial means the coefficients are added separately and as we saw, the modulo 2 means that the add and subtract operation amounts to exclusive or.

Appendix 2: Boolean Operators

An operation on binary numbers is just like any other kind of operator: it represents a calculation that takes one or more values and generates a new value.

Operations on binary values are special because we can also consider them as Boolean operators, and in fact this is how they’re usually named. And just as we can use a multiplication table to represent the results of a multiplication operation, we can use a truth table to represent the results of a Boolean operation. It is customary to consider 1 as True and 0 as False. Here is a truth table for the Or operator:

OR 0 1
0 0 1
1 1 1

So:

  • 0 OR 0 = 0
  • 0 OR 1 = 1

and so on.

We can see these as operations with truth values if we consider how we use or with real world truths or falsehoods: Is it true that:

  • This document is about Real Studio or the moon is blue (True or False = True, 1 Or 0 = 1)?
  • The moon is blue or the world is flat (False or False = False, 0 Or 0 = 0)?
  • This document is about Real Studio or the world is round (True or True = True, 1 Or 1 = 1)?

Here is the truth table for the And operator:

AND 0 1
0 0 0
1 0 1


Bitwise Boolean Operators

Real Studio’s regular Boolean operators are used in almost every program. We’ll assume you’ve used these before now.

Real Studio also provides a set of Boolean operators that work on groups of binary values at once. These groups of binary values you’re used to calling “integers”, but we can treat an integer as an ordered set of 32 Boolean values. Real Studio’s Bitwise operators then perform a Boolean operation using the corresponding bits of each of the values.

The bitwise equivalents of And and Or are BitwiseAnd and BitwiseOr. For instance, the following code:

MsgBox Str(BitwiseOr(17, 12))

displays “29”. We can see that by performing the calculation manually:

10001 Or
01100
_____
11101

The third Bitwise operator Real Studio provides is BitwiseXor. Xor is short for “exclusive or”, and it has the following truth table:

XOR 0 1
0 0 1
1 1 0

a Xor b is the same thing as a Or b And Not (a And b).

Note that there is no BitwiseNot operator, but you can achieve the same effect by XOring with all (the easiest way to write a number consisting of 32 1s is to use hexadecimal).

Hexadecimal, Octal, and Binary Constants

When you write programs that manipulate binary information directly, you will usually need to include binary values in your code.

You can type binary values directly by preceding them with &b, like this:

&b11101

But typing 32-bit values this way gets tricky and long-winded. Alternatively, you can group 3 bits at a time together, and get a base-8 (octal) number. So rather than the columns in the number being 1s, 10s, 100s (10x10), 1000s (10x10x10) and so on, the columns are 1s, 8s, 64s (8x8), 512s (8x8x8) and so on. You do this by preceding the value with &o, like this:

&o35

In an octal number, 9s and 8s would be meaningless.

You can also use a base 16 number (hexadecimal), grouping 4 digits at once, and using the letters A through F as the extra values after 9. Hexadecimal is the most compact, and it’s also convenient that there are two hexadecimal digits in a byte, and four in an integer. You write a hexadecimal number by preceding it with &h:

&h1D

So the quickest way to write 32 binary 1s is:

&hFFFF


1. Remember: if complexity or maintainability is important in your program, you should use a named constant defined in a module rather than typing a number directly into your code.

References

Notes


The technical way of saying this is: ‘the XOR operation is commutative’.

Personal tools