Insertion Sorting and Efficiency

From Real Software Documentation

Jump to: navigation, search

Aim
In this lesson, we will look at another sorting algorithm. We will also build a program that lets us easily compare different sorting algorithms.

The Insertion Sort

  1. Begin by considering how you would sort a set of filing cards with names on them.
  2. You would probably think of using an Insertion Sort: Essentially, start with the first card, and start building up a new stack of cards with each card sorted into its correct spot. As you pick up each unsorted card, you will find its place in the sorted pile and insert it. This is exactly what we want to do on the computer.

  3. Consider how you would implement this on the computer.
  4. You will probably want to move the elements of one array into another array, mimicking what you would do with real cards.

    Now ask yourself if it is possible to implement the same algorithm with a single array. Notice that the total number of elements involved in the unsorted and the sorted list together doesn’t change: as one list grows, the other shrinks by an equal amount. From this, we can see that you can sort the array within itself, by building the sorted array from one end of the unsorted array. This will save memory and time (we save time because the computer doesn’t need to allocate a new array), without making the algorithm significantly more complex.

    You will see as we implement this that moving element n into its correct place will actually involve rotating the list of elements between its new place and old place down by one, with the element itself rotating to the top of the rotated section of the array.

    So part of the algorithm will be a rotate sub-array operation.

  5. Open the InsertionSort project.
  6. Take a quick tour through the larger project.
  7. It lets us run more than one sorting algorithm over the same data, and to compare the time those multiple algorithms take.
  8. Open the InsertionSortList class. Examine the Rotate method.
  9. Pretty straightforward.
  10. Now examine the SortEvent event handler.
  11. Make sure you also check out the general structure of the project, and look for how we use object-oriented programming features to simplify the design.
  12. Run the project, and try an interesting range of values: 100 to 1000 in steps of 100 is probably good.
  13. Examine the Results

    This next step is optional, but if you are at all comfortable with a spreadsheet, it’s a nice way to compare the results.

    After the program runs, it will show a TextArea with the times from the two sort algorithms.

  14. Copy the values from the TextArea and paste it into a spreadsheet.
  15. Sort the entire spreadsheet by the first column, then cut the values from the right column in the second half of the rows up alongside the right column in the first half of the rows, like this (circles show where data is pasted from/to):

    Bubble Sort Array 100 24 11
    Bubble Sort Array 200 89 54
    Bubble Sort Array 300 223 104
    Bubble Sort Array 400 469 208
    Bubble Sort Array 500 617 323
    Bubble Sort Array 600 880 413
    Bubble Sort Array 700 1060 641
    Bubble Sort Array 800 1398 771
    Bubble Sort Array 900 1766 981
    Bubble Sort Array 1000 2732 1248
    Insertion Sort Array 100 11
    Insertion Sort Array 200 54
    Insertion Sort Array 300 104
    Insertion Sort Array 400 208
    Insertion Sort Array 500 323
    Insertion Sort Array 600 413
    Insertion Sort Array 700 641
    Insertion Sort Array 800 771
    Insertion Sort Array 900 981
    Insertion Sort Array 1000 1248

Now you should be able to graph the right three columns in the top half of the rows, using the first column as labels, and see how the two algorithms compare.

You should be able to produce a spreadsheet something like this:

A spradsheet and graph of the results.

The new algorithm is clearly much faster. But note that the two curves are similarly-shaped (parabolic). This means that the new algorithm reaches similar times for a data set that is not very much bigger. Both algorithms will take a long time with very large data sets. Ask yourself where the time is being spent in the insertion sort. Answer: rotating the elements in the list.

Further Exercises

It would be interesting at this point to ask yourself how this algorithm might be improved. There are quite a few possible answers to this question…

Here is just one to get you started, which you should try to implement: It is inefficient to search for the insertion point by just starting at the top of the list and looking one by one through the list, because the list is already sorted. Instead, implement a binary search, in which you start in the middle of the already-sorted list, then jump to the middle of either the first or second half, then to the middle of the first or second half of that, and so on, until you have narrowed down the location you’re looking for to within maybe five elements, then you switch to the one-by-one linear search. This will improve your sorting time quite a bit.

Personal tools