Algorithms and Flowcharts

Aim

In this lesson, we will begin developing and writing algorithms.

What is an algorithm?

The algorithm is one of the most important inventions in computer science. Here is a formal definition of an algorithm:

An Algorithm is an unambiguous procedure for performing some task in a finite number of steps.

This definition is adapted from Donald Knuth’s The Art of Computer Programming[note 1].

This means an algorithm must be:

• Finite: An algorithm must always terminate after a finite number of steps (although this number can be arbitrarily large, or even unknown);
• Definite: Each step of an algorithm must be unambiguous; and
• Effective: Each step of the algorithm must be possible within a finite period of time. An algorithm is an unambiguous procedure for performing some task in a finite number of steps.

Algorithms and Computer Science

Computers were originally developed to perform calculations to produce such things as navigation tables. Later, they were employed to store and produce calculations from business data. In both cases, developing and coding algorithms was really all that computer programming was about.

This is no longer the case. Many of the tasks given to computers these days are about decision-making, controlling some continuous process, or running a simulation, rather than completing a finite calculation. There are probably quite a few computers in your car, for example; and think of the number of computers being used to play games. While developing and employing good algorithms is important in developing such programs, there is now quite a bit more to computer programming than just developing algorithms.

Nevertheless, it is traditional in computer science to study algorithms early and in some detail, and we will do the same. The finite nature of algorithms (either they work, or they don’t; you run them and they finish in a certain amount of time) means that they make good, self-contained programming exercises. Their finite nature also makes them very good for examining issues of efficiency (which we will look at in more detail in later lessons).

Also, it works pretty well much of the time to think of control or simulation programs as being repeated application of one or more algorithms.

An example of an algorithm: Counting words

1. Start a new, empty project, open Window1, and lay it out like this. The controls are a TextArea, a PushButton, and a ListBox.
The Word Counter window.

2. Name the objects Source, CountButton and WordList from top to bottom.
3. Change the PushButton’s Caption property to “Count”.
If you like to pretty things up and make the window resizable, a good size setting for the window is 550 x 300.
The bottom object is a ListBox. You should read about how to use a ListBox in the Real Studio documentation, although we will discuss the particular features we use in this program as we go.

Set the Title property of Window1 to “Word Counter”. Suitable settings for the ListBox are:

Property Setting
Name WordList
ColumnCount 2
InitialValue Count Word

The InitialValue property has a tab, not a space, in between the column titles. The best way to enter the column titles is to open the Edit Value dialog by clicking on the button with the ellipsis (three dots) in the value area.

The Edit Value Dialog box.

Leave all the other properties for the ListBox unchanged.

Here is what our program will do: the user types or pastes text into the top box, clicks Count, and the bottom list displays the words in the text, along with the number of times each word occurs.

Now we need to develop an algorithm to do the word counting. We have already learned about all the programming techniques we will need in order to implement this algorithm; what we need to do now is to consider how to put them together.

Starting with a Flow Chart

Drawing a flow chart can be an effective way to begin developing an algorithm. A flow chart is a pictorial representation of the steps involved in carrying out some process. The parts you use to draw a flowchart are shown below.
Symbols used in flowcharts.

A flowchart is made by joining the boxes together with arrows. It should have a single starting Terminal, and one or more ending Terminals. You write concise English text in the boxes to describe what to do at each stage, as shown below.
A Troubleshooting Flowchart.

The flowchart describes a set of decision-making and computational steps, starting at the beginning Terminal, and following the arrows and boxes, performing the actions and making the decisions in the boxes until an ending Terminal is reached.

Pseudocode

While Real Studio’s programming language (Real basic) is very high-level and efficient for writing code, all computer languages are still fussier about every little detail than we need to be when we’re thinking about an algorithm. What is important when you start developing an algorithm is to get the important decisions in place. You can fill in the specifics, like variable declarations and particular commands later.

It is usually effective to describe an algorithm in a kind of fairly formal, structured English. This should be detailed enough for a person to understand the details of how the algorithm works, and to feel sure that they could code the algorithm in an actual programming language, but no more detailed than that. There are no fixed rules about how to do this, but here is a simple example of pseudo-code.

Troubleshooting Pseudo-code.

You read this by starting at the top and doing the next step each time, unless directed to go somewhere else.

A Word-Counting Algorithm

Now let’s look at what we need to do to count the number of instances of each word in a string:

• We need to find each word in the string. This requires us to decide what defines a word (is punctuation part of a word, for example?);
• We need to store each word we find in a data structure, along with a count of the number of times we have encountered that word; and
• Each time we encounter a word, we need to see if it is already in the data structure. If it is already in the data structure, we need to increment (add one to) its count; if it isn’t already in the data structure, we need to add it to the data structure with a count of 1. In developing an algorithm to implement this, we will have four major parts:
• IsWordCharacter: A function that says whether a particular character is a word character or not (if it’s not, we’ll consider it a separator character, and skip over it);
• WordPosition: A function that tells us whether a word has already been stored in our word list and if so, where in the list it is stored;
• AddWord: A method to add a word to the list if it isn’t there, and to increment it if it is; and
• CountWords: A method to search through the string, handing each separate word to the word storage method, and determining where a word begins and ends in part by calling the function that defines what a word character is.

This isn’t the only way to break this problem down; it would be quite possible to write the whole thing as a single method, for example. But this is a fairly natural separation of the parts of this task.

We will keep track of the count in a pair of matching arrays: the word at a particular index in the first array will have the count stored at the same index in the second array. Managing this data structure is an extension of the kind of code we›ve written in previous lessons.

The algorithm to separate the words from the string is more interesting. If you try to write a method to do this, you may find it a little more challenging than you expect.

You will have to write a loop that looks at the letters one at a time, but in doing this, it must decide correctly where a word starts and ends, and it must pass only the characters that are part of the word to the method that stores it.

The strategy we will use is to start the loop each time through in one of two states: InWord or not. (So InWord will be a Boolean variable). Each time through the loop, we look to see whether the next character is a boundary character for the current state (so if we are InWord, look for a non-word character, and if we are not InWord, look for a word character).

When we flip from the not InWord state to an InWord state, we have found the beginning of a word, and the opposite means we have found the end of a word. The change of state also marks the end of a word if we are initially InWord. We begin the whole process by looking at the first character to decide which state we start in.

How to Write an Algorithm

You should always think about the following things when writing an algorithm:

• The preconditions for the algorithm, meaning what has to be set up before the algorithm can run. In our case, we need a string, and we need methods that can store the words as we find them.
• The postconditions for the algorithm, meaning what changes to the state of the program outside of the method itself have occurred after the algorithm has finished. You should always be able to specify this very clearly. In this case, the only change made after the program runs is that the words have been passed to the storage method. But the method might have made other changes to the overall state of the program, so we need to note such changes if there are any.

Note that the algorithm changes the value of the Counter variable, but since this is a local variable, it disappears after the method ends, so we don’t need to worry about it.

• The conditions we have to keep making true as the algorithm executes; there may be several stages where you repeatedly carry out different actions. In this case, we are incrementing counter, and passing a word to AddWord every time the counter points to the end of a word.

Algorithms can also use sub-algorithms, which solve a smaller part of the overall program. We have a sub-algorithm in our case, contained in the For loop in CountWords. It, too, can be considered in the terms described above:

Preconditions for the loop:

• The string must not be empty;
• Counter = 1;
• InWord is True if the first character is a word character, False otherwise.

Postconditions for the loop:

• Counter is 1 greater than the length of the string;
• We have not emitted the last word of the string, if the string ends with a word character.

The loop invariant, which is the thing we keep making true at the end of the loop, is similar to the one for the entire algorithm, except that it doesn’t send the last word to the AddWord method if the string ends at a non-word character.

Here is our algorithm rewritten as a flowchart.

Our Algorithm as a Flowchart.

Exercise: Translate the flowchart into pseudo-code

Try to translate the above figure into pseudocode, to see if you’ve followed what we’ve discussed so far.

The Actual Code

Here is the actual CountWords method, which you should add to your window:

//Pre: None
//Post: Word array contains all of the words in Source;
//Count array contains corresponding count of those words in Source
Dim InWord As Boolean
Dim counter, WordStart As Integer
if Len(Source.text)>0 then //If source is not empty
InWord = IsWordCharacter (1)//Inword invariant (below)
If InWord then//Wordstart invariant (below)
WordStart = 1
end if
for counter = 1 to Len (Source.text)
//Invariants:
//Inword: InWord is true if the counter character of Source.text was IsWordCharacter
//WordStart: WordStart is the last counter character such that the character
// before it was not IsWordCharacter
if InWord then
if not IsWordCharacter (counter) then
InWord = false //InWord invariant
end if
//WordStart invariant: InWord implies we don't need to change WordStart
else //not InWord
If IsWordCharacter (counter) then
InWord = true //InWord invariant
WordStart = counter //Wordstart invariant
end if
end if
next
end if
//The loop doesn't detect the end of the string as a boundary
If InWord Then
End If

This code is written with a very formal description of the preconditions, the postconditions, and the loop invariants in the comments. We’ve actually named the loop invariants, and tried to prove where in the loop we made them true.

Doing this is overkill for this fairly simple exercise, but you should employ something like this (or at the very least, try to think like this) for any more complex algorithm you try to produce. The closer you can come to proving your code is correct before you run it, the fewer bugs you will produce in your code.

1. Add the following properties to the window.
Count() As Integer

Word() As String

2. Add the following function to the window.
Function IsWordCharacter (n As Integer) As Boolean
//Pre: None
//Post: If the character at location n in Source is a letter or number, return True
//else return false
Dim c As String
c = Mid (Source.text, n, 1)
Return (c>="0" and c<="9") or (c>="A" and c<="Z") or (c>="a" and c<="z")

The Code Editor for the IsWordCharacter function should look like this.
The IsWordCharacter function.

Separating something like this into its own method is a good idea. We can easily change what defines a word by changing this single method (perhaps making it depend on some preference settings). It also makes the word counting method shorter and simpler, since it doesn’t need to include this code. In a more complex project, you also might be able to re-use this method in other string processing code.

3. Add the following function to the window.
Function WordPosition (w As String) As Integer
Dim counter As Integer
//Pre: None
//Post: Returns the array index within the Word array if it is there, or -1 if it is not.
For counter = 0 to UBound(Word)
If Word(counter) = w then
return counter
End if
Next
Return –1

Make sure you can follow what this method does.[note 1]

4. Add the following method to the window.
//Pre: Count and Word arrays have same number of elements
//Post: if w is in Word array, the value of Count at the same index is incremented
//otherwise, w is appended to Word array, and 1 is appended to Count array
//ie we are maintaining a count of the words submitted to this method,
//in the two arrays
Dim position As Integer
position = WordPosition(w)
If position = -1 then
Word.append w
Count.append 1
Else
Count(position) = Count(position) + 1
End if

You should note that breaking WordPosition out into a separate function allows the rest of the code to be independent of how the data is represented. It’s a very good idea to keep assumptions like ‘these characters are all of the word characters’ quite separate from the code that uses them.

5. Add the following to the Action event handler for CountButton.
// Reset results and counts
WordList.DeleteAllRows
Redim Count(-1)
Redim Word(-1)

//Execute CountWords, then copy the results into WordList
Dim counter As integer
CountWords
For counter = 0 to UBound(Word)
WordList.Cell(counter,1) = Word(counter)
Next
As you will have read if you looked up the ListBox control in the built-in documentation, the ListBox is for displaying information in a grid or list. Among its many features are the AddRow method, which adds a new row and sets the contents of the first column; and the Cell property, which lets you set the contents of individual cells in the grid. The two lines in the above For loop, then, add a new row to WordList.

6. Run It

We’re done. As usual, you should run the program and try it out. Also try inserting BreakPoints and stepping through the code to watch it in action.

Experiment

There are many ways of extending the project from here. A few, in rough order of difficulty, are:

• Change the code to ignore numbers (remember how we detected whether a string was a number in an earlier lesson?);
• The code provided is case-insensitive (that is, capital and lower case letters are considered the same when comparing strings), because that’s how the regular equals operator works in REAL Studio. You can do case-sensitive comparisons with the StrComp function. Try adding a checkbox to the window that lets the user choose whether to make the word count case-sensitive or not;
• Add a separate display of the most common word and its count. There are two obvious ways of doing this (after the array is constructed, and as the array is constructed); and
• You might try making the list sort properly when the column headers are clicked on. This is as actually fairly easy to do, but you’ll need to read the documentation on the ListBox to figure it out. Reading documentation is an important skill you should start learning now, anyway.

Conclusion

Developing algorithms is much of what computer science is about, and developing an algorithm is more art than science. Mostly, it just takes practice, and reading about algorithms, to learn to do it. In this lesson, we’ve tried to give you a few guidelines, and an example of how to think about developing an algorithm.

This is good, formal computer science, and there is a lot of discussion to be found online about these and other design virtues. A nice one to be found fairly quickly through a google search is:
http://cs.colgate.edu/faculty/nevison.pub/web102/web102S00/Notes8.htm.

Notes