Software Tools, Getting Started
This post is the first in a series revisiting the programs described in the 1981 book by Brian W. Kernighan and P. J. Plauger’s called Software Tools in Pascal. The book is available from the Open Library and physical copies are still (2020) commonly available from used book sellers. The book was an early text on creating portable command line programs.
In this series I present the K & P (i.e. Software Tools in Pascal) programs re-implemented in Oberon-7. I am testing my implementations using Karl Landström’s OBNC compiler and his implementation of the Oakwood Guide’s modules for portable Oberon programs. Karl also provides a few additional modules for working in a POSIX environment (e.g. BSD, macOS, Linux, Windows 10 with Linux subsystem).
NOTE: OBNC compiler is the work of Karl Langström, it is portable across many systems where the C tool chain is available.
NOTE: POSIX defines a standard of compatibility inspired by UNIX, see https://en.wikipedia.org/wiki/POSIX
Chapter one in K & P is the first chapter that presents code. It introduces some challenges and constraints creating portable Pascal suitable for use across hardware architectures and operating systems. In 1981 this included mainframes, minicomputers as well as the recent evolution of the microcomputer. The programs presented build up from simple to increasingly complex as you move through the book. They provide example documentation and discuss their implementation choices. It is well worth reading the book for those discussions, while specific to the era, mirror the problems program authors face today in spite of the wide spread success of the POXIS model, the consolidation of CPU types and improvements made in development tools in the intervening decades.
Through out K & P you’ll see the bones of many POSIX commands we have today.
Programs from this chapter include:
- copyprog, this is like “cat” in a POSIX system
- charcount, this is like the “wc” POSIX command using the “-c” option
- linecount, this is like the “wc” POSIX command using the “-l” option
- wordcount, this is like the “wc” POSIX command using the “-w” option
- detab, converts tabs to spaces using tab stops every four characters in a line
All programs in this chapter rely solely on standard input and output. Today’s reader will notice an absence to common concepts in today’s command line programs. First is the lack of interaction with command line parameters, the second is no example take advantage of environment variables. These operating system features were not always available across operating systems of the early 1980s. Finally I’d like to point out a really nice feature included in the book. It is often left out as a topic in programming books. K & P provide example documentation. It’s structure like an early UNIX man page. It very clear and short. This is something I wish all programming texts at least mentioned. Documentation is important to the program author because it clarifies scope of the problem being tackled and to the program user so they understand what they are using.
1.1. File Copying
Here’s how K & P describe “copyprog.pas” (refered to as “copy” in the documentation).
PROGRAM copy copy input to output USAGE copy FUNCTION copy copies its input to its output unchanged. It is useful for copying from a terminal to a file, from file to file, or even from terminal to terminal. It may be used for displaying the contents of a file, without interpretation or formatting, by copying from the file to terminal. EXAMPLE To echo lines type at your terminal. copy hello there, are you listening? **hello there, are you listening?** yes, I am. **yes, I am.** <ENDFILE>
The source code for “copyprog.pas” is shown on page 9 of K & P. First the authors introduce the copy procedure then a complete the section introducing it in context of the complete Pascal program. After this first example K & P leave implementation of the full program up to the reader.
The body of the Pascal program invokes a procedure called copy which reads from standard input character by character and writes to standard output character by character without modification. Two supporting procedures are introduced, “getc” and “putc”. These are shown in the complete program listing on page 9. They are repeatedly used through out the book. One of the really good aspects of this simple program is relying on the idea of standard input and output. This makes “copyprog.pas” a simple filter and template for writing many of the programs that follow. K & P provide a good explanation for this simple approach. Also note K & P’s rational for working character by character versus line by line.
My Oberon-7 version takes a similar approach. The module looks remarkably similar to the Pascal but is shorter because reading and writing characters are provided for by Oberon’s standard modules “In” and “Out”. I have chosen to use a “REPEAT/UNTIL” loop over the “WHILE” loop used by K & P is the attempt to read from standard input needs to happen at least once. Note in my “REPEAT/UNTIL” loop’s terminating condition. The value of
In.Done is true on successful read and false otherwise (e.g. you hit an end of the file). That means our loop must terminate on
In.Done # TRUE rather than
In.Done = TRUE. This appears counter intuitive unless you keep in mind our loop stops when we having nothing more to read, rather than when we can continue to read. It
In.Done means the read was successful and does not mean “I’m done and can exit now”. Likewise before writing out the character we read, it is good practice to check the
In.Done value. If
In.Done is TRUE, I know can safely display the character using
MODULE CopyProg; IMPORT In, Out; PROCEDURE copy; VAR c : CHAR; BEGIN REPEAT In.Char(c); IF In.Done THEN Out.Char(c); END; UNTIL In.Done # TRUE; END copy; BEGIN copy(); END CopyProg.
This program only works with standard input and output. A more generalized version would work with named files.
1.2 Counting Characters
PROGRAM charcount count characters in input USAGE charcount FUNCTION charcount counts the characters in its input and writes the total as a single line of text to the output. Since each line of text is internally delimited by a NEWLINE character, the total count is the number of lines plus the number of characters within each line. EXAMPLE charcount A single line of input. <ENDFILE> 24
On page 13 K & P introduces their second program, charcount. It is based on a single procedure that reads from standard input and counts up the number of characters encountered then writes the total number found to standard out followed by a newline. In the text only the procedure is shown, it is assumed you’ll write the outer wrapper of the program yourself as was done with the copyprog program. My Oberon-7 version is very similar to the Pascal. Like in the our first “CopyProg” we will make use of the “In” and “Out” modules. Since we will need to write an INTEGER value we’ll also use “Out.Int()” procedure which is very similar to K & P’s “putdec()”. Aside from the counting this is very simple like our first program.
MODULE CharCount; IMPORT In, Out; PROCEDURE CharCount; VAR nc : INTEGER; c : CHAR; BEGIN nc := 0; REPEAT In.Char(c); IF In.Done THEN nc := nc + 1; END; UNTIL In.Done # TRUE; Out.Int(nc, 1); Out.Ln(); END CharCount; BEGIN CharCount(); END CharCount.
The primary limitation in counting characters is most readers are interested in visible character count. In our implementation even non-printed characters are counted. Like our first program this only works on standard input and output. Ideally this should be written so it works on any file including standard input and output. If the reader implements that it could become part of a package on statistical analysis of plain text files.
1.3 Counting Lines
PROGRAM linecount count lines in input USAGE linecount FUNCTION linecount counts the lines in its input and write the total as a line of text to the output. EXAMPLE linecount A single line of input. <ENDFILE> 1
linecount, from page 15 is very similar to charcount except adding a conditional count in the loop for processing the file. In our Oberon-7 implementation we’ll check if the
In.Char(c) call was successful but we’ll add a second condition to see if the character read was a NEWLINE. If it was I increment our counter variable.
MODULE LineCount; IMPORT In, Out; PROCEDURE LineCount; CONST NEWLINE = 10; VAR nl : INTEGER; c : CHAR; BEGIN nl := 0; REPEAT In.Char(c); IF In.Done & (ORD(c) = NEWLINE) THEN nl := nl + 1; END; UNTIL In.Done # TRUE; Out.Int(nl, 1); Out.Ln(); END LineCount; BEGIN LineCount(); END LineCount.
This program assumes that NEWLINE is ASCII value 10. Line delimiters vary between operating systems. If your OS used carriage returns without a NEWLINE then this program would not count lines correctly. The reader could extend the checking to support carriage returns, new lines, and carriage return with new lines and cover most versions of line endings.
1.4 Counting Words
PROGRAM wordcount count words in input USAGE wordcount FUNCTION wordcount counts the words in its input and write the total as a line of text to the output. A "word" is a maximal sequence of characters not containing a blank or tab or newline. EXAMPLE wordcount A single line of input. <ENDFILE> 5 BUGS The definition of "word" is simplistic.
Page 17 brings us to the wordcount program. Counting words can be very nuanced but here K & P have chosen a simple definition which most of the time is “good enough” for languages like English. A word is defined simply as an run of characters separated by a space, tab or newline characters. In practice most documents will work with this minimal definition. It also makes the code straight forward. This is a good example of taking the simple road if you can. It keeps this program short and sweet.
If you follow along in the K & P book note their rational and choices in arriving at there solutions. There solutions will often balance readability and clarity over machine efficiency. While the code has progressed from “if then” to “if then else if” logical sequence, the solution’s modeled remains clear. This means the person reading the source code can easily verify if the approach chosen was too simple to meet their needs or it was “good enough”.
My Oberon-7 implementation is again very simple. Like in previous programs I still have an outer check to see if the read worked (i.e. “In.Done = TRUE”), otherwise the conditional logic is the same as the Pascal implementation.
MODULE WordCount; IMPORT In, Out; PROCEDURE WordCount; CONST NEWLINE = 10; BLANK = 32; TAB = 9; VAR nw : INTEGER; c : CHAR; inword : BOOLEAN; BEGIN nw := 0; inword := FALSE; REPEAT In.Char(c); IF In.Done THEN IF ((ORD(c) = BLANK) OR (ORD(c) = NEWLINE) OR (ORD(c) = TAB)) THEN inword := FALSE; ELSIF (inword = FALSE) THEN inword := TRUE; nw := nw + 1; END; END; UNTIL In.Done # TRUE; Out.Int(nw, 1); Out.Ln(); END WordCount; BEGIN WordCount(); END WordCount.
1.5 Removing Tabs
PROGRAM detab convert tabs into blanks USAGE detab FUNCTION detab copies its input to its output, expanding the horizontal tabs to blanks along the way, so that the output is visually the same as the input, but contains no tab characters. Tab stops are assumed to be set every four columns (i.e. 1, 5, 9, ...), so each tab character is replaced by from one to four blanks. EXAMPLE Usaing "->" as a visible tab: detab ->col 1->2->34->rest col 1 2 34 rest BUGS detab is naive about backspaces, vertical motions, and non-printing characters.
The source code for “detab” can be found on page 24 in the last section of chapter 1. detab removes tabs and replaces them with spaces. Rather than a simple “tab” replaced with four spaces detab preserves a concept found on typewriters called “tab stops”. In 1981 typewrites were still widely used though word processing software would become common. Supporting the “tab stop” model means the program works with what office workers would expect from older tools like the typewriter or even the computer’s teletype machine. I think this shows an important aspect of writing programs. Write the program for people, support existing common concepts they will likely know.
K & P implementation includes separate source files for setting tab stops and checking a tab stop. The Pascal K & P wrote for didn’t support separate source files or program modules. Recent Pascal versions did support the concept of modularization (e.g. UCSD Pascal). Since and significant goal of K & P was portability they needed to come up with a solution that worked on the “standard” Pascal compilers available on minicomputers and mainframes and not write their solution to a specific Pascal system like UCSD Pascal (see Appendix, "IMPLEMENTATION PRIMITIVES page 315). Modularization facilitates code reuse and like information hiding is an import software technique. Unfortunately the preprocessor approach doesn’t support information hiding.
To facilitate code reuse the K & P book includes a preprocessor as part of the Pascal development tools (see page 71 for implementation). The preprocessor written in Pascal was based on the early versions of the “C” preprocessor they had available in the early UNIX systems. Not terribly Pascal like but it worked and allowed the two files to be shared between this program and one in the next chapter.
Oberon-7 of course benefits from all of Wirth’s language improvements that came after Pascal. Oberon-7 supports modules and as such there is no need for a preprocessor. Because of Oberon-7’s module support I’ve implemented the Oberon version using two files rather than three. My main program file is “Detab.Mod”, the supporting library module is “Tabs.Mod”. “Tabs” is where I define our tab stop data structure as well as the procedures that operating on that data structure.
Let’s look at the first part, “Detab.Mod”. This is the module that forms the program and it features an module level “BEGIN/END” block. In that block I call “Detab();” which implements the program’s functionality. I import “In”, “Out” as before but I also import “Tabs” which I will show next. Like my previous examples I validate the read was successful before proceeding with the logic presented in the original Pascal and deciding what to write to standard output.
MODULE Detab; IMPORT In, Out, Tabs; CONST NEWLINE = 10; TAB = 9; BLANK = 32; PROCEDURE Detab; VAR c : CHAR; col : INTEGER; tabstops : Tabs.TabType; BEGIN Tabs.SetTabs(tabstops); (* set initial tab stops *) col := 1; REPEAT In.Char(c); IF In.Done THEN IF (ORD(c) = TAB) THEN REPEAT Out.Char(CHR(BLANK)); col := col + 1; UNTIL Tabs.TabPos(col, tabstops); ELSIF (ORD(c) = NEWLINE) THEN Out.Char(c); col := 1; ELSE Out.Char(c); col := col + 1; END; END; UNTIL In.Done # TRUE; END Detab; BEGIN Detab(); END Detab.
Our second module is “Tabs.Mod”. It provides the supporting procedures and definition of the our “TabType” data structure. For us this is the first time we write a module which “exports” procedures and type definitions. If you are new to Oberon, expected constants, variables and procedures names have a trailing "*". Otherwise the Oberon compiler will assume a local use only. This is a very powerful information hiding capability and what allows you to evolve a modules’ internal implementation independently of the programs that rely on it.
MODULE Tabs; CONST MAXLINE = 1000; (* or whatever *) TYPE TabType* = ARRAY MAXLINE OF BOOLEAN; (* TabPos -- return TRUE if col is a tab stop *) PROCEDURE TabPos*(col : INTEGER; VAR tabstops : TabType) : BOOLEAN; VAR res : BOOLEAN; BEGIN res := FALSE; (* Initialize our internal default return value *) IF (col >= MAXLINE) THEN res := TRUE; ELSE res := tabstops[col]; END; RETURN res END TabPos; (* SetTabs -- set initial tab stops *) PROCEDURE SetTabs*(VAR tabstops: TabType); CONST TABSPACE = 4; (* 4 spaces per tab *) VAR i : INTEGER; BEGIN (* NOTE: Arrays in Oberon start at zero, we want to stop at the last cell *) FOR i := 0 TO (MAXLINE - 1) DO tabstops[i] := ((i MOD TABSPACE) = 0); END; END SetTabs; END Tabs.
NOTE: This module is used by “Detab.Mod” and “Entab.Mod” and provides for common type definitions and code reuse. We exported
SetTabs. Everything else is private to this module.
This post briefly highlighted ports of the programs presented in Chapter 1 of “Software Tools in Pascal”. Below are links to my source files of the my implementations inspired by the K & P book. Included in each Oberon module source after the module definition is transcribed text of the program documentation as well as transcribed text of the K & P Pascal implementations. Each file should compiler without modification using the OBNC compiler. By default the OBNC compiler will use the module’s name as the name of the executable version. I I have used mixed case module names, if you prefer lower case executable names use the “-o” option with the OBNC compiler.
obnc -o copy CopyProg.Mod obnc -o charcount CharCount.Mod obnc -o linecount LineCount.Mod obnc -o wordcount WordCount.Mod obnc -o detab Detab.Mod