Bubble Sorting
From Real Software Documentation
Aim
We take a break from object-oriented programming to return to algorithms. Over the next few lessons, we will examine a number of different ways of sorting a list into order. In examining this range of code and techniques, you will achieve a few things:
- You will begin learning about the rich and complex notion of efficiency in computer programming;
- You will see an example of how many ways there usually are to solve a given problem (even an apparently simple one) with a computer, and
- You will look at more code. It is important to read a broad range of code.
Contents |
About Efficiency
There are a few very specific notions of efficiency that computer science is usually concerned with, and we will look at these in a moment. But before we do, let’s consider efficiency more broadly.
Programmer’s Time vs. User’s Time
We will begin by observing that there are two different times being used by a computer program:
- The programmer’s time, in writing the program; and
- The user’s time, in waiting for the program to finish.
Programmer Time is Scarce
Next, we’ll consider that there are a number of tasks that can take up a programmer’s time:
- Making a program work correctly;
- Making a program work faster (or, say, in less memory);
- Making a program easy to change;
- Doing something else (writing a different program; taking the dog for a walk; learning Spanish…)
(The first three aren’t clearly separated. For instance, making the program easier to change and making it correct may go hand-in-hand). Our second observation, then, is that there are many demands on a programmer’s time. We’ll also notice at this point that most programs will only be used for a certain period of time, after which a new version is used, or the program is not used at all.
User’s Time vs. Computer Time
Computer time is getting cheaper very quickly. Meanwhile, people work at the same speed they always have.
More and more often, then, when a user runs a program, almost the entire time involved is taken up by the user setting up the data, choosing which program to run, reading the results, and so on.
Why you should probably write a lot of quick and dirty programs
All of this leads to the conclusion that a lot of programs should be written using the quickest, simplest code possible.
Your time is probably much scarcer than your user’s. Even if it isn’t (perhaps your program will be used by a million people), a “slow” program that finishes in one second, compared to a “fast” program that finishes in 1/100 of a second, is probably fine. There are a lot of simple tasks you can save people from doing, by writing simple programs. If you can do more of this, by writing quick and dirty programs, then this is the sensible thing to do.
When you should spend more time writing a program
You should spend more time on a program in the following cases:
- The program won’t run “in an instant” if you don’t pay attention to the program’s speed, and will be used often, or by many people;
- The program must handle a very large problem (say, spending an extra hour writing more efficient code means the program finishes in a few minutes rather than a week — as we will see, this kind of difference is surprisingly common);
- The program will be used for a very long time. It will probably be repeatedly modified, so you should at least spend time to make its design easy to change and well-documented;
- When the speed of the code manifests itself as another quality that people will notice, you may want to optimize the speed of your code. The commonest example of this is in games: the faster your code runs, the more images per second it can generate, meaning the animation is smoother. You can doubtless conceive of other conditions along these lines.
Don’t Waste Time on Wasteful “Efficiency”
We will spend quite a bit of time looking at the notion of efficiency in programming, as programmers usually define it. As we do this, you will see that very often, code that is easier to write will run more slowly (sometimes much more slowly). Don’t make the mistake that surprisingly many programmers do, of thinking that the slower, simpler code is bad code. Very often, it is better code, because you can write it more quickly and it is easier to understand.
Different Parts of a Program Often Require Different Attention
One final observation: even in a program that you decide to spend a lot of time optimizing (meaning you are making it run faster or in less memory), there will be parts of the program where you shouldn’t bother. You should only optimize the parts of the program that the computer will spend a lot of time in.
Four Lessons on Sorting
Now on to the narrower notion of efficiency.
One of the most famous books on programming, The Art of Computer Programming, describes 29 quite different ways to sort a list of things into order (along with many minor variations). And the list in that book is by no means exhaustive.
In this and the next three lessons, we will look at four of these ways of sorting. We will examine how they work; we will test how fast they are; we will briefly consider other issues, such as what will be their best- and worst-case performance; and we will consider minor variations on them that might improve their performance.
The Bubble Sort
We begin with the Bubble Sort, probably the simplest possible sorting algorithm, and an amazingly inefficient one.
- Open the project BubbleSort and examine the code of the SortEvent Event handler of the BubbleSortList class.
- Put a breakpoint on the Do command.
We haven’t seen the Do… Loop Until before. This is just like the While… Wend loop, except that it executes the body of the loop once before testing the condition, and it stops executing the loop when the condition evaluates to True, not False.
You should try to work out how the sorting is done by examining the code, before reading the explanation in the next paragraph.
How the Bubble Sort Works
The Bubble Sort works by looking through the list from one end to the other, examining each pair of adjacent elements in the list, and exchanging them if they are out of order. This is repeated until you can look through the list from one end to the other, without having to exchange any of the elements.
It has probably never occurred to you to sort a list in this way, and it is obvious how inefficient it is. But the code to implement a Bubble Sort is very simple. Don’t be afraid to implement a Bubble Sort if you need to sort a small amount of information in a quick and dirty program. On the other hand, in the next lesson, we will look at a much more efficient algorithm that is very nearly as easy to implement (the Insertion Sort).
Make sure you look through the code and see how we’ve used polymorphism to separate the implementation of the sort from the code that’s requesting the sort. The code requesting the sort only needs to know that the thing to be sorted is a NumList. This will be important in later lessons, where we run multiple different sorting algorithms and compare them. |
The Narrow Definition of Efficiency
All of our discussions of the importance of your time aside, it is very important in computer science to understand how we look at efficiency. There are many situations where it will be important to write efficient code.
There are many possible types of efficiency, but the most important two are: how much time does an algorithm take to finish? And how much memory does an algorithm need?
The Bubble Sort is very good in terms of its memory efficiency: other than the list of elements itself, we need a counter, a variable to use in the exchange, and a Boolean. Even the counter is not actually necessary (think about how to write the method without it).
But the code is very inefficient in terms of time. Try increasing the list to 1000 elements, and see how long it takes to finish. When the time to complete an algorithm increases very rapidly with the size of the problem (in this case, the ‘size of the problem’ would be the number of elements in the list), we say the algorithm is time inefficient.
Best and Worst Cases
The best case for this algorithm is an already-sorted list. In this case, it will complete as quickly as a simple test that the list is already in order.
And a list that is close to sorted will sort fairly quickly as well. The worst case for the Bubble Sort is when the list is sorted in reverse order. In that case, each time through the list, one element will be sorted into place, and the rest of the list will be unchanged. For a list of n elements, a straight Bubble Sort in the worst case will have to make n2 comparisons (because it searches a list of n elements n times).
Further Exercises
As you watch the algorithm in action, and if you think about what it’s doing, you might have noticed that the first time around the Do loop, the largest element in the list winds up at the end; the next time through the list, the second-largest winds up at the second from the end, and so on.
- Consider how the algorithm might be improved as a result.
The improved algorithm will require only n2-n comparisons, for a list of n elements.
A nice and reasonably simple improvement of the algorithm is to make alternate passes in opposite directions through the list: this slightly improves the average time, and massively improves the worst case. It is difficult to arrive at a simple mathematical expression for how many comparisons are required in the worst case, but it is considerably better than either of the above.
References
- Notes
1. Knuth, Donald E, The Art of Computer Programming 2nd Ed, Vol 2, ISBN 0-201-89658-
