Software Tools, Filters

Overview

This post is the second in a series revisiting the programs described in Brian W. Kernighan and P. J. Plauger’s 1981 book, Software Tools in Pascal. The book is available from the Open Library, and physical copies are still (2020) commonly available from used book sellers. It is a late 20th century text on creating portable command line programs using the ISO standard Pascal of the era.

In this chapter K & P focus on developing the idea of filters. Filters are programs which typically read standard input, perform some sort of transformation or calculation, and write to standard output. They are intended to work either standalone or in a pipeline to solve more complex problems. I like to think of filters as software LEGO. Filter programs can be “snapped” together to create simple data shapes or combined to form complex compositions.
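A minimal sketch of the filter shape described above, assuming OBNC’s In and Out modules. The module name Upper and the helper ToUpper are illustrative only and are not part of K & P’s chapter.

```oberon
MODULE Upper;
  (* A minimal filter: copy standard input to standard output,
     folding ASCII lower case letters to upper case. *)
  IMPORT In, Out;

  (* ToUpper -- hypothetical helper, returns the upper case form
     of an ASCII lower case letter, any other character unchanged *)
  PROCEDURE ToUpper(c : CHAR) : CHAR;
    VAR res : CHAR;
  BEGIN
    res := c;
    IF (c >= "a") & (c <= "z") THEN
      res := CHR(ORD(c) - ORD("a") + ORD("A"))
    END
    RETURN res
  END ToUpper;

  PROCEDURE Copy();
    VAR c : CHAR;
  BEGIN
    REPEAT
      In.Char(c);              (* read *)
      IF In.Done THEN          (* only act on a successful read *)
        Out.Char(ToUpper(c))   (* transform and write *)
      END
    UNTIL ~In.Done
  END Copy;

BEGIN
  Copy()
END Upper.
```

Because it only reads standard input and writes standard output, a program like this can sit anywhere in a pipeline next to the other filters in this chapter.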

The programs from this chapter include entab, overstrike, compress, expand, echo and translit.

Implementing in Oberon-07

With the exception of echo (used to introduce command line parameter processing) each program increases in complexity. The last program, translit, is the most complex in this chapter. It introduces what we call a “domain specific language” or “DSL”. A DSL is a notation allowing us to describe something implicitly rather than explicitly. All the programs except translit closely follow the original Pascal, translated to Oberon-07. The book’s implementation of translit is very much a result of the constraints of early 1980s Pascal as well as the minimal assumptions that could be made about the host operating system. I will focus on revising that program in particular, bringing the code up to current practice as well as offering insights I’ve learned.

The program translit introduces what is called a “Domain Specific Language”. Domain specific languages, or DSLs for short, are often simple notations for describing how to solve very narrow problems. If you’ve used any of the popular spreadsheet programs and entered a formula to compute something, you’ve used a domain specific language. If you’ve ever searched for text in a document using a regular expression, you’ve used a domain specific language. By focusing a notation on a small problem space you can often come up with simple ways of expressing or composing programmatic solutions to get a job done.

In translit the notation lets us describe what we want to translate. At the simplest level the translit program takes a character and replaces it with another character. What increases translit’s utility is that it can take a set of characters and replace it with another set. Suppose you want to replace all lower case letters with upper case letters. The “from set” and “to set” are easy to describe as two ranges, “a” to “z” and “A” to “Z”. Our domain notation allows us to express this as “a-z” and “A-Z”. K & P include several features in their notation, including a way to exclude characters from a translation as well as an “escape notation” for describing characters like new lines, tabs, or the characters that describe a range and an exclusion (i.e. dash and caret).

2.1 Putting Tabs Back

Page 31

Implementing entab in Oberon-07 is straightforward. As in my Detab implementation I am using a second module called Tabs. This removes the need for the #include macros used in the K & P version. I have used the same loop structure as K & P this time. There is a difference in my WHILE loop: I separate the character read from the WHILE conditional test. Combining the two is common in “C” and is consistent with the programming style of other books by Kernighan. In Oberon-07 it doesn’t make sense at all. Oberon’s In.Char() is not a function returning a value, unlike the Pascal primitives implemented for the K & P book or the equivalent in the “C” language. In Oberon’s “In” module the status of a read operation is exposed by In.Done. I’ve chosen to put the next call to In.Char() at the bottom of my WHILE loop because that makes it clear it is the last thing done before either iterating again or exiting the loop. Other than that the Oberon version looks much like K & P’s Pascal.
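The loop shape described above, reduced to its essentials. This is only a sketch assuming OBNC’s In and Out modules; the module name CountBlanks and what it computes are made up for illustration.

```oberon
MODULE CountBlanks;
  (* Count the blanks on standard input. The read is kept apart
     from the loop test: one priming read before the WHILE and
     one read as the last statement of the loop body. *)
  IMPORT In, Out;

  CONST
    BLANK = 32;

  VAR
    c : CHAR;
    n : INTEGER;

BEGIN
  n := 0;
  In.Char(c);            (* priming read *)
  WHILE In.Done DO       (* test the status of the read, not c *)
    IF ORD(c) = BLANK THEN
      INC(n)
    END;
    In.Char(c)           (* last action before the next test *)
  END;
  Out.Int(n, 1); Out.Ln
END CountBlanks.
```

In “C” the read and the test would usually be fused into the loop condition. Here In.Char() is a proper procedure, so the status has to be checked through In.Done after the call.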

Program Documentation

Page 32

    PROGRAM
        entab   convert runs of blanks into tabs
    USAGE
        entab
    FUNCTION
        entab copies its input to its output, replacing strings of
        blanks by tabs so the output is visually the same as the
        input, but contains fewer characters. Tab stops are assumed
        to be set every four columns (i.e. 1, 5, 9, ...), so that
        each sequence of one to four blanks ending on a tab stop
        is replaced by a tab character.
    EXAMPLE
        Using -> as visible tab:
            entab
            col 1 2 34 rest
            ->col->1->2->34->rest
    BUGS
        entab is naive about backspaces, vertical motions, and
        non-printing characters. entab will convert a single blank
        to a tab if it occurs at a tab stop. Thus entab is not an
        exact inverse of detab.

Source code for Entab.Mod

    MODULE Entab;
      IMPORT In, Out, Tabs;

    CONST
      NEWLINE = 10;
      TAB = 9;
      BLANK = 32;

    PROCEDURE Entab();
    VAR
      c : CHAR;
      col, newcol : INTEGER;
      tabstops : Tabs.TabType;
    BEGIN
      Tabs.SetTabs(tabstops);
      col := 1;
      REPEAT
        newcol := col;
        In.Char(c);
        IF In.Done THEN (* NOTE: We check that the read was successful! *)
          (* NOTE: In.Done is part of the test so that a failed read
             ends the loop even if c still holds a blank *)
          WHILE In.Done & (ORD(c) = BLANK) DO
            newcol := newcol + 1;
            IF (Tabs.TabPos(newcol, tabstops)) THEN
              Out.Char(CHR(TAB));
              col := newcol;
            END;
            (* NOTE: Get the next char, check the loop condition
               and either iterate or exit the loop *)
            In.Char(c);
          END;
          WHILE (col < newcol) DO
            Out.Char(CHR(BLANK)); (* output left over blanks *)
            col := col + 1;
          END;
          (* NOTE: Since we may have gotten a new char in the first WHILE
             we need to check again if the read was successful *)
          IF In.Done THEN
            Out.Char(c);
            IF (ORD(c) = NEWLINE) THEN
              col := 1;
            ELSE
              col := col + 1;
            END;
          END;
        END;
      UNTIL In.Done # TRUE;
    END Entab;

    BEGIN
      Entab();
    END Entab.
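Entab.Mod imports a Tabs module whose source is not listed in this post. What follows is only a sketch of what such a module might look like, assuming it mirrors the settabs and tabpos routines from K & P. TabType, SetTabs and TabPos are the names Entab.Mod uses; the constant MAXLINE, its value and the procedure bodies are assumptions.

```oberon
MODULE Tabs;
  (* Sketch of the helper module imported by Entab. The bodies are
     an assumption modelled on K & P's settabs and tabpos. *)

  CONST
    MAXLINE* = 1000; (* assumed maximum line length *)
    TABSPACE = 4;    (* tab stops every four columns: 1, 5, 9, ... *)

  TYPE
    TabType* = ARRAY MAXLINE OF BOOLEAN;

  (* SetTabs -- mark every column that is a tab stop. Column n is
     stored at index n - 1 because Oberon arrays start at zero. *)
  PROCEDURE SetTabs*(VAR tabstops : TabType);
    VAR i : INTEGER;
  BEGIN
    FOR i := 0 TO MAXLINE - 1 DO
      tabstops[i] := (i MOD TABSPACE) = 0
    END
  END SetTabs;

  (* TabPos -- TRUE if col (counted from 1) is a tab stop. Columns
     past MAXLINE are treated as tab stops. *)
  PROCEDURE TabPos*(col : INTEGER; tabstops : TabType) : BOOLEAN;
    VAR res : BOOLEAN;
  BEGIN
    IF col > MAXLINE THEN
      res := TRUE
    ELSE
      res := tabstops[col - 1]
    END
    RETURN res
  END TabPos;

END Tabs.
```

With this layout columns 1, 5, 9 and so on report TRUE, which matches the tab stops given in the entab documentation.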

2.2 Overstrikes

Page 34

Overstrike isn’t a tool that is useful today but I’ve included it simply to follow along with the flow of the K & P book. It very much reflects an era where teletype-like devices were still common and printers printed much like typewriters did. On a 20th century manual typewriter you could underline a word or letter by backing up the carriage and then typing the underscore character. Striking out a word was accomplished by a similar technique. Mid to late 20th century computer devices retained this mechanism, though by the 1980s it was beginning to disappear along with manual typewriters. This program relies on the nature of the ASCII character set and reflects some of the non-printing characters’ functionality. I found it did not work reliably on today’s terminal emulators. Your mileage may vary, and I do not have a vintage printer to test it on.

Our module follows K & P’s design almost verbatim. The differences are those suggested by differences between Pascal and Oberon-07. As in previous examples we don’t need to use an ENDFILE constant, as we can simply check the value of In.Done to determine if the last read was successful. This simplifies some of the IF/ELSE logic and the termination of the REPEAT/UNTIL loop. It makes the WHILE/DO loop a little more verbose.

One thing I would like to point out in the original Pascal of the book is a problem often referred to as the “dangling else” problem. While this is usually discussed in the context of compiler implementation, I feel it is a bigger issue for the person reading the source code. It is particularly problematic when you have complex “IF/ELSE” sequences that are nested. This is not limited to 1980s era Pascal. You see it in other languages like C. It is a convenience for the person typing the source code but a problem for those who maintain it. We see this ambiguity in the Pascal procedure overstrike inside the repeat loop on page 35. It is made worse by the fact that K & P have taken advantage of omitting the semi-colons where optional. If you type in this procedure and remove the indentation it quickly becomes ambiguous where one “IF/ELSE” begins and the next ends. In Oberon-07 it is clear when you have a dangling “IF” statement. In this vintage Pascal, not so much.
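A small illustration of the difference, written for this post rather than taken from K & P. The module Nested and the procedure Pick are hypothetical; the Pascal fragment appears only inside the comment.

```oberon
MODULE Nested;
  (* Every IF in Oberon-07 is closed by its own END, so an ELSE
     can only belong to one IF. In the Pascal fragment

       if a then if b then x := 1 else x := 2

     the else pairs with the inner if, whatever the indentation
     of the source suggests. *)
  IMPORT Out;

  PROCEDURE Pick(a, b : BOOLEAN) : INTEGER;
    VAR x : INTEGER;
  BEGIN
    x := 0;
    IF a THEN
      IF b THEN
        x := 1
      END      (* closes the inner IF *)
    ELSE       (* can only pair with the outer IF *)
      x := 2
    END
    RETURN x
  END Pick;

BEGIN
  Out.Int(Pick(TRUE, FALSE), 1); Out.Ln;  (* inner test fails, x stays 0 *)
  Out.Int(Pick(FALSE, TRUE), 1); Out.Ln   (* outer ELSE is taken, x is 2 *)
END Nested.
```

Removing the indentation from Pick changes nothing about how it reads, because each END still marks where an IF stops.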

K & P do mention the dangling “ELSE” problem later in the text. Their recommended practice was to include the explicit final “ELSE” as a comment to avoid confusion. But you can see how easy omitting the comment is in the overstrike program.

Limitations

The documented “BUGS” section describes the limitations well: “overstrike is naive about vertical motions and non-printing characters. It produces one over struck line for each sequence of backspaces”. In addition, most printing devices these days either have their own drivers or expect to work with a standard like PostScript. This limits the usefulness of this program today, though controlling character movement in a “vt100” emulation using old fashioned ASCII control codes is still interesting, if only for historical reasons.

Program Documentation

Page 36

    PROGRAM
        overstrike  replace overstrikes by multiple lines
    USAGE
        overstrike
    FUNCTION
        overstrike copies its input to its output, replacing lines
        containing backspaces by multiple lines that overstrike
        to print the same as input, but containing no backspaces.
        It is assumed that the output is to be printed on a device
        that takes the first character of each line as a carriage
        control; a blank carriage control causes normal space before
        print, while a plus sign '+' suppresses space before print
        and hence causes the remainder of the line to overstrike
        the previous line.
    EXAMPLE
        Using <- as a visible backspace:
            overstrike
            abc<-<-<-___
             abc
            +___
    BUGS
        overstrike is naive about vertical motions and non-printing
        characters. It produces one over struck line for each sequence
        of backspaces.

Source code for Overstrike.Mod

    MODULE Overstrike;
      IMPORT In, Out;

    CONST
      NEWLINE = 10;
      BLANK = 32;
      PLUS = 43;
      BACKSPACE = 8;

    PROCEDURE Max(x, y : INTEGER) : INTEGER;
    VAR max : INTEGER;
    BEGIN
      IF (x > y) THEN
        max := x
      ELSE
        max := y
      END;
      RETURN max
    END Max;

    PROCEDURE Overstrike;
    CONST
      SKIP = BLANK;
      NOSKIP = PLUS;
    VAR
      c : CHAR;
      col, newcol, i : INTEGER;
    BEGIN
      col := 1;
      REPEAT
        newcol := col;
        In.Char(c);
        (* NOTE: We check In.Done on each loop evaluation *)
        WHILE (In.Done = TRUE) & (ORD(c) = BACKSPACE) DO (* eat the backspaces *)
          (* each backspace moves one column left, never past column 1 *)
          newcol := Max(newcol - 1, 1);
          In.Char(c);
        END;
        (* NOTE: We check In.Done again, since we may have
           additional reads when eating the backspaces. If
           the previous while loop has taken us to the end of file
           this will also mean In.Done = FALSE. *)
        IF In.Done THEN
          IF (newcol < col) THEN
            Out.Char(CHR(NEWLINE)); (* start overstrike line *)
            Out.Char(CHR(NOSKIP));
            (* pad with blanks up to, but not including, newcol *)
            FOR i := 1 TO newcol - 1 DO
              Out.Char(CHR(BLANK));
            END;
            col := newcol;
          ELSIF (col = 1) THEN (* NOTE: In.Done already checked for end of file *)
            Out.Char(CHR(SKIP)); (* normal line *)
          END;
          (* NOTE: In.Done already was checked so we are in mid line *)
          Out.Char(c); (* normal character *)
          IF (ORD(c) = NEWLINE) THEN
            col := 1
          ELSE
            col := col + 1
          END;
        END;
      UNTIL In.Done # TRUE;
    END Overstrike;

    BEGIN
      Overstrike();
    END Overstrike.

2.3 Text Compression

Page 37

In 20th century computing everything was expensive: memory, persistent storage and the computational ability of the CPU. Even if you were primarily working with text you still worried about running out of space on your storage medium. You see it in the units of measurement used in that era, such as bytes, kilobytes, hertz and kilohertz. Today we talk about megabytes, gigabytes, terabytes and petabytes. Plain text files are tiny compared to most digital objects today, but in the late 20th century their size in storage was still a concern. One way to solve this problem was to encode your plain text to use less storage space. Early attempts at file compression took advantage of repetition to save space. Many text documents have repeated characters, whether spaces, punctuation or other formatting. This is what inspired the K & P implementation of compress and expand. Today we’d use other approaches to save space, whether we were storing text or a digital photograph.
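The encoding that compress and expand share can be shown in isolation. This is a sketch only: the module name RunCode and the procedures EncodeCount and DecodeCount are illustrative and do not appear in K & P or in the modules below, though the arithmetic is the same as in PutRep and Expand.

```oberon
MODULE RunCode;
  (* A run of n identical characters (1 <= n <= 26) is written
     as the warning character, a letter standing for n, then the
     repeated character itself. *)
  IMPORT Out;

  CONST
    WARNING = "~";
    MAXREP = 26; (* assuming 'A' .. 'Z' *)

  (* EncodeCount -- map a run length of 1..26 to 'A'..'Z' *)
  PROCEDURE EncodeCount(n : INTEGER) : CHAR;
  BEGIN
    ASSERT((n >= 1) & (n <= MAXREP))
    RETURN CHR((n - 1) + ORD("A"))
  END EncodeCount;

  (* DecodeCount -- map 'A'..'Z' back to a run length of 1..26 *)
  PROCEDURE DecodeCount(c : CHAR) : INTEGER;
  BEGIN
    ASSERT((c >= "A") & (c <= "Z"))
    RETURN (ORD(c) - ORD("A")) + 1
  END DecodeCount;

BEGIN
  ASSERT(EncodeCount(4) = "D");
  ASSERT(DecodeCount("D") = 4);
  ASSERT(DecodeCount(EncodeCount(MAXREP)) = MAXREP);
  (* a run of seven blanks: warning character, G, one blank *)
  Out.Char(WARNING); Out.Char(EncodeCount(7)); Out.Char(" "); Out.Ln
END RunCode.
```

A run of three or fewer ordinary characters is cheaper to copy as is, since the coded form always takes three characters. That is why compress only encodes runs of four or more, with the warning character itself as the exception.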

Program Documentation

Page

    PROGRAM
        compress    compress input by encoding repeated characters
    USAGE
        compress
    FUNCTION
        compress copies its input to its output, replacing strings
        of four or more identical characters by a code sequence so
        that the output generally contains fewer characters than the
        input. A run of x's is encoded as ~nx, where the count n is
        a character: 'A' calls for a repetition of one x, 'B' a
        repetition of two x's, and so on. Runs longer than 26 are
        broken into several shorter ones. Runs of ~'s of any length
        are encoded.
    EXAMPLE
        compress
        Item    Name         Value
        Item~D Name~I Value
        1       car          ~$7,000.00
        1~G car~J ~A~$7,000.00
        <ENDFILE>
    BUGS
        The implementation assumes 26 legal characters beginning with A.
Source code for Compress.Mod

    MODULE Compress;
      IMPORT In, Out;

    CONST
      TILDE = "~";
      WARNING = TILDE; (* ~ *)

    (* Min -- compute minimum of two integers *)
    PROCEDURE Min(x, y : INTEGER) : INTEGER;
    VAR min : INTEGER;
    BEGIN
      IF (x < y) THEN
        min := x
      ELSE
        min := y
      END;
      RETURN min
    END Min;

    (* PutRep -- put out representation of run of n 'c's *)
    PROCEDURE PutRep (n : INTEGER; c : CHAR);
    CONST
      MAXREP = 26; (* assuming 'A' .. 'Z' *)
      THRESH = 4;
    VAR i : INTEGER;
    BEGIN
      WHILE (n >= THRESH) OR ((c = WARNING) & (n > 0)) DO
        Out.Char(WARNING);
        Out.Char(CHR((Min(n, MAXREP) - 1) + ORD("A")));
        Out.Char(c);
        n := n - MAXREP;
      END;
      FOR i := n TO 1 BY (-1) DO
        Out.Char(c);
      END;
    END PutRep;

    (* Compress -- compress standard input *)
    PROCEDURE Compress();
    VAR
      c, lastc : CHAR;
      n : INTEGER;
    BEGIN
      n := 1;
      In.Char(lastc);
      WHILE (In.Done = TRUE) DO
        In.Char(c);
        IF (In.Done = FALSE) THEN
          IF (n > 1) OR (lastc = WARNING) THEN
            PutRep(n, lastc)
          ELSE
            Out.Char(lastc);
          END;
        ELSIF (c = lastc) THEN
          n := n + 1
        ELSIF (n > 1) OR (lastc = WARNING) THEN
          PutRep(n, lastc);
          n := 1
        ELSE
          Out.Char(lastc);
        END;
        lastc := c;
      END;
    END Compress;

    BEGIN
      Compress();
    END Compress.

2.4 Text Expansion

Page 41

Our procedures map closely to the original Pascal with a few significant differences. As before, I’ve chosen a REPEAT ... UNTIL loop structure because we are always attempting to read at least once. The IF THEN ELSIF ELSE logic is a little different. In the K & P version they combine retrieving a character and testing its value. This is a style common in languages like C. As previously mentioned, I split the read of the character from the test. Aside from the choices imposed by the “In” module, I also feel that retrieving the value and then testing it is a simpler statement to read. There is little need to worry about a side effect when you separate the action from the test. It does change the structure of the inner and outer IF statements.

Program Documentation

Page 43

    PROGRAM
        expand  expand compressed input
    USAGE
        expand
    FUNCTION
        expand copies its input, which has presumably been encoded by
        compress, to its output, replacing code sequences ~nc by the
        repeated characters they stand for so that the text output
        exactly matches that which was originally encoded. The
        occurrence of the warning character ~ in the input means that
        the next character is a repetition count; 'A' calls for one
        instance of the following character, 'B' calls for two, and
        so on up to 'Z'.
    EXAMPLE
        expand
        Item~D Name~I Value
        Item    Name         Value
        1~G car~J ~A~$7,000.00
        1       car          ~$7,000.00
        <ENDFILE>

Source code for Expand.Mod

    MODULE Expand;
      IMPORT In, Out;

    CONST
      TILDE = "~";
      WARNING = TILDE; (* ~ *)
      LetterA = ORD("A");
      LetterZ = ORD("Z");

    (* IsUpper -- true if c is upper case letter *)
    PROCEDURE IsUpper (c : CHAR) : BOOLEAN;
    VAR res : BOOLEAN;
    BEGIN
      IF (ORD(c) >= LetterA) & (ORD(c) <= LetterZ) THEN
        res := TRUE;
      ELSE
        res := FALSE;
      END
      RETURN res
    END IsUpper;

    (* Expand -- uncompress standard input *)
    PROCEDURE Expand();
    VAR
      c : CHAR;
      n, i : INTEGER;
    BEGIN
      REPEAT
        In.Char(c);
        (* NOTE: Only act on c if the read was successful, otherwise
           we would write a stray character at the end of file *)
        IF In.Done THEN
          IF (c # WARNING) THEN
            Out.Char(c);
          ELSE
            In.Char(c);
            IF In.Done & IsUpper(c) THEN
              n := (ORD(c) - ORD("A")) + 1;
              In.Char(c);
              IF (In.Done) THEN
                FOR i := n TO 1 BY -1 DO
                  Out.Char(c);
                END;
              ELSE
                Out.Char(WARNING);
                Out.Char(CHR((n - 1) + ORD("A")));
              END;
            ELSE
              Out.Char(WARNING);
              IF In.Done THEN
                Out.Char(c);
              END;
            END;
          END;
        END;
      UNTIL In.Done # TRUE;
    END Expand;

    BEGIN
      Expand();
    END Expand.

2.5 Command Arguments

Page 44

Program Documentation

Page 45

    PROGRAM
        echo    echo arguments to standard output
    USAGE
        echo [ argument ... ]
    FUNCTION
        echo copies its command line arguments to its output as a line
        of text with one space between each argument. If there are no
        arguments, no output is produced.
    EXAMPLE
        To see if your system is alive:
            echo hello world!
            hello world!

Source code for Echo.Mod

    MODULE Echo;
      IMPORT Out, Args := extArgs;

    CONST
      MAXSTR = 1024; (* or whatever *)
      BLANK = " ";

    (* Echo -- echo command line arguments to output *)
    PROCEDURE Echo();
    VAR
      i, res : INTEGER;
      argstr : ARRAY MAXSTR OF CHAR;
    BEGIN
      i := 0;
      FOR i := 0 TO (Args.count - 1) DO
        Args.Get(i, argstr, res);
        IF (i > 0) THEN
          Out.Char(BLANK);
        END;
        Out.String(argstr);
      END;
      IF Args.count > 0 THEN
        Out.Ln();
      END;
    END Echo;

    BEGIN
      Echo();
    END Echo.

2.6 Character Transliteration

Page 47

translit is the most complicated program so far in the book. Most of the translation process from Pascal to Oberon-07 has remained similar to the previous examples.

My implementation of translit diverges from the K & P implementation at several points. Much of this is a result of Oberon’s evolution beyond Pascal. First, Oberon counts arrays from zero instead of one, so I have opted to use -1 as the value indicating that a character was not found in a string. Equally, I have simplified the logic in xindex() to make it clear how I am handling the index lookup described in index() of the Pascal implementation. K & P implemented makeset() and dodash(). dodash() looked particularly troublesome. If you came across the function name dodash() without seeing the code comments, “doing a dash” seems a little obscure. I have chosen to name that process “Expand Sequence” for clarity. I have simplified the task of making sets of characters for translation into three cases by splitting the test conditions from the actions. First, check to see if we have an escape sequence and if so handle it. Second, check to see if we have an expansion sequence and if so handle it. Otherwise, append the char found to the end of the set being assembled. This resulted in dodash() being replaced by IsSequence() and ExpandSequence(). Likewise esc() was replaced with IsEscape() and ExpandEscape(). I renamed addchar() to AppendChar() in the “Chars” module as that seemed more specific and clearer.

I chose to advance the index used when expanding a set description in the loop inside of my MakeSet(). I limited the side effects of the expand functions to the target destination. It is clearer, while in the MakeSet() loop, to see the relationship between the test and the transformation and how to advance through the string. This also allowed me to use fewer parameters to procedures, which tends to make things more readable as well as simpler.

I have included an additional procedure not included in the K & P Pascal of this program. Error() displays a string and halts. K & P provide this as part of their Pascal environment. I have chosen to embed it here because it is short and trivial.

Translit suggested the “Chars” module because of the repetition in previous programs. In K & P the approach to code reuse is to create a separate source file and include it via a pre-processor. In Oberon we have the module concept.

My Chars module provides a useful set of test procedures like IsAlpha(c), IsUpper(c) and IsLower(c) in addition to CharInRange() and IsAlphaNum(). It also includes AppendChar(), which can be used to append a single character value to the end of an array of char.
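The source of the Chars module is not listed in this post. As a rough sketch of what the parts used by Translit.Mod could look like: the exported names below are the ones mentioned in the text or called from Translit.Mod, while the constant values, parameter lists and procedure bodies are assumptions.

```oberon
MODULE Chars;
  (* Sketch only: names follow the calls made in Translit.Mod,
     the bodies are assumptions. *)

  CONST
    ENDSTR* = 0X;
    TAB* = 9X;
    DASH* = "-";
    CARET* = "^";

  PROCEDURE CharInRange*(c, lower, upper : CHAR) : BOOLEAN;
  BEGIN
    RETURN (c >= lower) & (c <= upper)
  END CharInRange;

  PROCEDURE IsUpper*(c : CHAR) : BOOLEAN;
  BEGIN
    RETURN CharInRange(c, "A", "Z")
  END IsUpper;

  PROCEDURE IsLower*(c : CHAR) : BOOLEAN;
  BEGIN
    RETURN CharInRange(c, "a", "z")
  END IsLower;

  PROCEDURE IsAlpha*(c : CHAR) : BOOLEAN;
  BEGIN
    RETURN IsUpper(c) OR IsLower(c)
  END IsAlpha;

  PROCEDURE IsAlphaNum*(c : CHAR) : BOOLEAN;
  BEGIN
    RETURN IsAlpha(c) OR CharInRange(c, "0", "9")
  END IsAlphaNum;

  (* Clear -- set every element of a to 0X *)
  PROCEDURE Clear*(VAR a : ARRAY OF CHAR);
    VAR i : INTEGER;
  BEGIN
    FOR i := 0 TO LEN(a) - 1 DO
      a[i] := ENDSTR
    END
  END Clear;

  (* AppendChar -- append c to the string held in dest, returns
     FALSE when there is no room left for c and the closing 0X *)
  PROCEDURE AppendChar*(c : CHAR; VAR dest : ARRAY OF CHAR) : BOOLEAN;
    VAR i : INTEGER; res : BOOLEAN;
  BEGIN
    i := 0;
    WHILE (i < LEN(dest)) & (dest[i] # ENDSTR) DO
      INC(i)
    END;
    res := i < LEN(dest) - 1;
    IF res THEN
      dest[i] := c;
      dest[i + 1] := ENDSTR
    END
    RETURN res
  END AppendChar;

END Chars.
```

Returning a BOOLEAN from AppendChar() lets a caller such as MakeSet() stop as soon as a set no longer fits, instead of checking lengths up front.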

Program Documentation

Page 56

    PROGRAM
        translit    transliterate characters
    USAGE
        translit [^]src [dest]
    FUNCTION
        translit maps its input, on a character by character basis, and
        writes the translated version to its output. In the simplest case,
        each character in the argument src is translated to the
        corresponding character in the argument dest; all other characters
        are copied as is. Both src and dest may contain substrings of
        the form c1-c2 as shorthand for all the characters in the range
        c1..c2. c1 and c2 must both be digits, or both be letters of the
        same case. If dest is absent, all characters represented by src are
        deleted. Otherwise, if dest is shorter than src, all characters
        in src that would map to or beyond the last character in
        dest are mapped to the last character in dest; moreover adjacent
        instances of such characters in the input are represented in the
        output by a single instance of the last character in dest. Thus
            translit 0-9 9
        converts each string of digits to the single digit 9.
        Finally, if src is preceded by ^, then all but the characters
        represented by src are taken as the source string; i.e., they are
        all deleted if dest is absent, or they are all collapsed into the
        last character of dest if dest is present.
    EXAMPLE
        To convert upper case to lower:
            translit A-Z a-z
        To discard punctuation and isolate words by spaces on each line:
            translit ^a-zA-Z@n " "
            This is a simple-minded test, i.e., a test of translit.
            This is a simple minded test i e a test of translit
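The collapsing rule in the documentation can be seen in isolation with a fixed pair of sets. The sketch below hard codes the documented case `translit 0-9 9`, where every run of digits becomes a single 9. The module name Squash, the procedure Collapse and the inRun flag are illustrative only and are not taken from K & P or from Translit.Mod.

```oberon
MODULE Squash;
  (* Replace each run of digits on standard input by a single "9",
     which is what `translit 0-9 9` is documented to do. *)
  IMPORT In, Out;

  PROCEDURE Collapse();
    VAR
      c : CHAR;
      inRun : BOOLEAN; (* TRUE while we are inside a run of digits *)
  BEGIN
    inRun := FALSE;
    REPEAT
      In.Char(c);
      IF In.Done THEN
        IF (c >= "0") & (c <= "9") THEN
          IF ~inRun THEN
            Out.Char("9")  (* first digit of a run: write once *)
          END;
          inRun := TRUE    (* later digits of the run are dropped *)
        ELSE
          inRun := FALSE;
          Out.Char(c)      (* everything else is copied as is *)
        END
      END
    UNTIL ~In.Done
  END Collapse;

BEGIN
  Collapse()
END Squash.
```

Given the input line “room 101, floor 12” this writes “room 9, floor 9”. A single boolean carried between reads is enough state to drop the repeats.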

Pascal Source

translit.p, Page 48

makeset.p, Page 52

addstr.p, Page 53

dodash.p, Page 53

isalphanum.p, Page 54

esc.p, Page 55

length.p, Page 46

The impacts of having a richer language than 1980s ISO Pascal and evolution in practice suggest a revision in the K & P approach. I have attempted to keep the spirit of their example program while reflecting changes in practice that have occurred in the last four decades.

Source code for Translit.Mod

    MODULE Translit;
      IMPORT In, Out, Args := extArgs, Strings, Chars;

    CONST
      MAXSTR = 1024; (* or whatever *)
      DASH = Chars.DASH;
      ENDSTR = Chars.ENDSTR;
      ESCAPE = "@";
      TAB* = Chars.TAB;
      NEWLINE = 0AX; (* line feed, written @n in a set description *)

    (* Error -- write an error string to standard out and
       halt program *)
    PROCEDURE Error(s : ARRAY OF CHAR);
    BEGIN
      Out.String(s);Out.Ln();
      ASSERT(FALSE);
    END Error;

    (* IsEscape - this procedure looks to see if we have an
       escape sequence at position in variable i *)
    PROCEDURE IsEscape*(src : ARRAY OF CHAR; i : INTEGER) : BOOLEAN;
    VAR res : BOOLEAN; last : INTEGER;
    BEGIN
      res := FALSE;
      last := Strings.Length(src) - 1;
      IF (i < last) & (src[i] = ESCAPE) THEN
        res := TRUE;
      END;
      RETURN res
    END IsEscape;

    (* ExpandEscape - this procedure takes a source array, a
       position and appends the escaped value to the destination
       array. It returns TRUE on success, FALSE otherwise. *)
    PROCEDURE ExpandEscape*(src : ARRAY OF CHAR; i : INTEGER; VAR dest : ARRAY OF CHAR) : BOOLEAN;
    VAR res : BOOLEAN; j : INTEGER;
    BEGIN
      res := FALSE;
      j := i + 1;
      IF j < Strings.Length(src) THEN
        (* NOTE: @n and @t stand for newline and tab as in K & P's
           esc(), any other escaped character stands for itself *)
        IF src[j] = "n" THEN
          res := Chars.AppendChar(NEWLINE, dest)
        ELSIF src[j] = "t" THEN
          res := Chars.AppendChar(TAB, dest)
        ELSE
          res := Chars.AppendChar(src[j], dest)
        END
      END
      RETURN res
    END ExpandEscape;

    (* IsSequence - this procedure looks at position i and checks
       to see if we have a sequence to expand *)
    PROCEDURE IsSequence*(src : ARRAY OF CHAR; i : INTEGER) : BOOLEAN;
    VAR res : BOOLEAN;
    BEGIN
      (* We need at least three characters left in src *)
      res := Strings.Length(src) - i >= 3;
      (* Do we have a sequence of alphanumeric character
         DASH alphanumeric character? *)
      IF res THEN
        res := Chars.IsAlphaNum(src[i]) & (src[i+1] = DASH) &
               Chars.IsAlphaNum(src[i+2]);
      END;
      RETURN res
    END IsSequence;

    (* ExpandSequence - this procedure expands a sequence x
       starting at i and appends the sequence to the destination
       string. It returns TRUE on success, FALSE otherwise *)
    PROCEDURE ExpandSequence*(src : ARRAY OF CHAR; i : INTEGER; VAR dest : ARRAY OF CHAR) : BOOLEAN;
    VAR res : BOOLEAN; cur, start, end : INTEGER;
    BEGIN
      (* Make sure sequence is ascending *)
      res := TRUE;
      start := ORD(src[i]);
      end := ORD(src[i+2]);
      IF start < end THEN
        FOR cur := start TO end DO
          IF res THEN
            res := Chars.AppendChar(CHR(cur), dest);
          END;
        END;
      ELSE
        res := FALSE;
      END;
      RETURN res
    END ExpandSequence;

    (* MakeSet -- make sets based on src expanded into destination *)
    PROCEDURE MakeSet* (src : ARRAY OF CHAR; start : INTEGER; VAR dest : ARRAY OF CHAR) : BOOLEAN;
    VAR i : INTEGER; makeset : BOOLEAN;
    BEGIN
      i := start;
      makeset := TRUE;
      WHILE (makeset = TRUE) & (i < Strings.Length(src)) DO
        IF IsEscape(src, i) THEN
          makeset := ExpandEscape(src, i, dest);
          i := i + 2;
        ELSIF IsSequence(src, i) THEN
          makeset := ExpandSequence(src, i, dest);
          i := i + 3;
        ELSE
          makeset := Chars.AppendChar(src[i], dest);
          i := i + 1;
        END;
      END;
      RETURN makeset
    END MakeSet;

    (* Index -- find position of character c in string s *)
    PROCEDURE Index* (VAR s : ARRAY OF CHAR; c : CHAR) : INTEGER;
    VAR
      i, index : INTEGER;
    BEGIN
      i := 0;
      WHILE (s[i] # c) & (s[i] # ENDSTR) DO
        i := i + 1;
      END;
      IF (s[i] = ENDSTR) THEN
        index := -1; (* Value not found *)
      ELSE
        index := i; (* Value found *)
      END;
      RETURN index
    END Index;

    (* XIndex -- conditionally invert value found in index *)
    PROCEDURE XIndex* (VAR inset : ARRAY OF CHAR; c : CHAR;
                       allbut : BOOLEAN; lastto : INTEGER) : INTEGER;
    VAR
      xindex : INTEGER;
    BEGIN
      (* Uninverted index value *)
      xindex := Index(inset, c);
      (* Handle inverted index value *)
      IF (allbut = TRUE) THEN
        IF (xindex = -1) THEN
          (* Translate as an inverted response *)
          xindex := 0; (* lastto - 1; *)
        ELSE
          (* Indicate no translate *)
          xindex := -1;
        END;
      END;
      RETURN xindex
    END XIndex;

    (* Translit -- map characters *)
    PROCEDURE Translit* ();
    CONST
      NEGATE = Chars.CARET; (* ^ *)
    VAR
      arg, fromset, toset : ARRAY MAXSTR OF CHAR;
      c : CHAR;
      i, lastto : INTEGER;
      allbut, squash : BOOLEAN;
      res : INTEGER;
    BEGIN
      i := 0;
      lastto := MAXSTR - 1;
      (* NOTE: We are doing low level string manipulation. Oberon
         strings are terminated by 0X, but Oberon compilers do not
         automatically initialize memory to a specific state. In the
         OBNC implementation of Oberon-07 an assignment like
         `s := "";` only writes a 0X to position zero of the
         array of char. Since we are doing position based character
         assignment we can easily overwrite a single 0X. To be safe
         we want to assign all the positions in the array to 0X so the
         memory is in a known state. *)
      Chars.Clear(arg);
      Chars.Clear(fromset);
      Chars.Clear(toset);
      IF (Args.count = 0) THEN
        Error("usage: translit from to");
      END;
      (* NOTE: I have not used an IF ELSE here because we have
         additional conditions that lead to complex logic. The
         procedure Error() calls ASSERT(FALSE); which in Oberon-07
         halts the program from further execution *)
      IF (Args.count > 0) THEN
        Args.Get(0, arg, res);
        allbut := (arg[0] = NEGATE);
        IF (allbut) THEN
          i := 1;
        ELSE
          i := 0;
        END;
        IF MakeSet(arg, i, fromset) = FALSE THEN
          Error("from set too long");
        END;
      END;
      (* NOTE: We have initialized our array of char earlier so we only
         need to know if we need to update toset to a new value *)
      Chars.Clear(arg);
      IF (Args.count = 2) THEN
        Args.Get(1, arg, res);
        IF MakeSet(arg, 0, toset) = FALSE THEN
          Error("to set too long");
        END;
      END;
      lastto := Strings.Length(toset);
      squash := (Strings.Length(fromset) > lastto) OR (allbut);
      REPEAT
        In.Char(c);
        IF In.Done THEN
          i := XIndex(fromset, c, allbut, lastto);
          IF (squash) & (i>=lastto) & (lastto>0) THEN (* translate *)
            (* NOTE: lastto is a length, the last character of
               toset sits at index lastto - 1 *)
            Out.Char(toset[lastto - 1]);
          ELSIF (i >= 0) & (lastto > 0) THEN (* translate *)
            Out.Char(toset[i]);
          ELSIF i = -1 THEN (* copy *)
            (* Do not translate the character *)
            Out.Char(c);
            (* NOTE: No else clause needed as not writing out
               a cut value is deleting *)
          END;
        END;
      UNTIL (In.Done # TRUE);
    END Translit;

    BEGIN
      Translit();
    END Translit.

In closing

In this chapter we interact with some of the most common features of command line programs available on POSIX systems. K & P have given us a solid foundation on which to build more complex and ambitious programs. In the following chapters the reader will find an accelerated level of complexity but also programs that are significantly more powerful.

The Oberon language evolved with the Oberon System, which had a very different, rich text user interface when compared with POSIX. Fortunately Karl’s OBNC comes with a set of modules that make Oberon-07 friendly for building programs for POSIX operating systems. I’ve taken advantage of his extArgs module much in the way that K & P relied on a set of primitive tools to provide a common programming environment. K & P’s implementations of their primitives are listed in their appendix. Karl’s OBNC extension modules are described on the OBNC website. Other Oberon compilers provide similar modules, though they are implementation specific. A good example is Spivey’s Oxford Oberon-2 Compiler. K & P chose to target multiple Pascal implementations; I have the luxury of targeting one Oberon-07 implementation. That said, if you added a pre-processor like K & P did you could also take their approach to allow your Oberon-07 code to work across many Oberon compiler implementations. I leave that as an exercise for the reader.

I’ve chosen to revise some of the code presented in K & P’s book. I believe the K & P implementations still contain wisdom. They had different constraints and thus made different choices in implementation. Understanding the trade offs and challenges of writing portable code capable of running on a very divergent set of early 1980s operating systems remains useful today.

Compiling with OBNC:

    obnc -o entab Entab.Mod
    obnc -o overstrike Overstrike.Mod
    obnc -o compress Compress.Mod
    obnc -o expand Expand.Mod
    obnc -o echo Echo.Mod
    obnc -o translit Translit.Mod
