Insertion Sort 2 More on Efficiency
From Real Software Documentation
Aim
In this lesson, we look at a variation on the insertion sort. We will build a framework that allows us to compare different sorting algorithms’ speed, and use it to see if the new variation is faster.
Contents |
About Testing
It is often useful to conduct tests to compare the speed of algorithms. In many cases, it is not a straightforward matter to prove if or when one algorithm will be faster than another.
A high-level language such as Real basic introduces another complication: Real Studio does many things for us “behind the scenes,” and we don’t know exactly how it does these things, or when. For instance, as we create and dispose of variables, or resize arrays or strings, Real Studio is taking care of some fairly complex memory management for us (tracking which variable is kept where, how big each variable is, and managing the reclamation of space occupied by variables that are disposed of or resized). Sometimes, some activity Real Studio is doing behind the scenes will introduce hiccups or unpredictability into the time an algorithm or some part of it will take to complete.
As you will see in this lesson, just because we feel certain something should be quicker doesn’t necessarily mean that it will be…
Many Ways of Improving the Insertion Sort
At the end of the last lesson, we asked you to come up with ways to improve the algorithm for the insertion sort. There are many possible answers to this. Here are a few:
- Assuming the values to be sorted are randomly distributed, start looking for an insertion point at a index value estimated from the value to be inserted (so if it’s a 100-element list, with values up to 100, start looking for where to insert the value 50 in the middle of the so-far sorted values, and start looking for where to insert 25 at a quarter of the way into the list);
- Perform a binary search for the insertion point (look at the middle value, then the middle of the list above or below that, then at the middle of the values left, and so on).
- Sort into N separate lists, each covering a particular range of values, then merge these lists at the end; and
- Rather than rotating the elements in the array to insert a value, maintain a list of next index values in a parallel array, and work out the correct order using that array. This makes insertion take a fixed number of operations, rather than taking longer as the list gets longer.
Some of these ideas can be combined, of course. We will test the last idea. It seems as though it should be faster, since assigning to a couple of array elements should be considerably quicker than the large number of assignment operations involved in the rotation operation in a long list.
- Open the project 15_InsertionSort2. Run it for the range 100 to 1000 in steps of 100.
You may wish to copy the results out to a spreadsheet and graph them. You may find that this new algorithm isn’t faster. It seems reasonable to expect that the new algorithm will be faster, so this result is interesting (and it shows why we should test our assumptions!).
After the sort operation, a particular element of the link array contains the index value of the element in the Numbers array which comes after the element of the values array at the same index. This is easier to see than to explain, so here is an example:
| Index | Numbers | Link Array |
|---|---|---|
| 1 | 5 | 2 |
| 2 | 5 | -1 |
| 3 | 1 | 6 |
| 4 | 3 | 5 |
| 5 | 4 | 7 |
| 6 | 2 | 4 |
| 7 | 4 | 1 |
The link array tells us which row is next in order. We keep separate track of the lowest element in the array to be sorted, so we know that row 3 is first (it contains the lowest value, 1). We look across at the Link Array in row 3, and this tells us that row 6 is next (because it has the next-lowest value, 2). It is followed by row 4 and so on. The last row, number 2, has a Link Array value of –1, to tell us that there is no next element.
To insert a row into the list, it is only a matter of updating the link values of the elements preceding and following it in order. It seems reasonable to expect that for large arrays, it should be quicker to just update a link array element, rather than to rotate a large part of the list.
Even if this sort method is faster, just building the link array isn’t quite the same as sorting the original array. Searching through the array in sorted order will be slower if it is necessary to follow the link array to find the next element. And finding the nth element in sorted order will be very slow. This is a common situation: which algorithm is better will often depend on how you want to use the results.
- Open the LinkInsertionSortList class, and examine the SortEvent event handler.
As an example, we’re doing everything in one big method here because doing something through a method call rather than in place is a little slower. This can matter when you multiply that little difference by a lot of times through the loop.
Remember our discussions about efficiency. We would want to do this only if doing it makes the code noticeably faster; the delay is making someone wait; and we don’t have anything more important to spend our programming time on.
Commenting and Uncommenting Code
There are three ways to indicate that something typed into Real Studio is a comment. Two of them are to begin a line with // and to begin a line with '. You should always use // for actual comments, because there is a different use for the ' comment: Real Studio provides a Comment/Uncomment button in the Code Editor toolbar. This will add or remove the ' comment mark from a block of lines. You can use this to temporarily disable or enable a chunk of code, which can be very useful during debugging. If you always use the double slashes for your comments, the built-in Comment/Uncomment commands won’t disturb your actual comments.
The LinkInsertionSortList Sort event handler has some code commented out like this at the end. It uses the link array to actually sort the array (so immediate access to a particular element of the sorted array is not slowed down). If you got faster results for the new algorithm, try again with the commenting removed. The new algorithm should now be faster, but the threshold at which it becomes faster will be at a higher length of list.
Why is the New Algorithm Slower?
It’s difficult to be sure why the new algorithm may not be faster, but we can make some educated guesses.
First, the new algorithm has to do more to start. The new algorithm has to resize two arrays, rather than one.
Notice also that for each comparison of two elements of the array we’re sorting, the new algorithm has to execute two comparisons more than the first algorithm does. So finding the location in which to insert the new element is going to be slower than it is in the first algorithm. It also has to keep track of two extra variables as it searches the list.
Further Exercises
It would be interesting to implement some of the other ideas for how to speed up the insertion sort. Build each as a separate class, and wire them all into the test framework so you can work out which is quickest for large lists. It would be surprising if one of the other ideas for improvements wasn’t actually faster.
Any such experiments will only be an academic exercise (although a good one). In the next lesson, we’ll go to quite a different algorithm called a Quick Sort that is much faster for a random list…
