Dynamic types

By R. S. Doiel, 2020-05-25

This is the eighth post in the Mostly Oberon series. Mostly Oberon documents my exploration of the Oberon Language, Oberon System and the various rabbit holes I will inevitably fall into.

Dynamic Types in Oberon

Oberon-07 is a succinct systems language. It provides a minimal but useful set of basic static types. Relying on them addresses many common programming needs. The Oberon compiler ensures static types are efficiently allocated in memory. One of the strengths of Oberon is this ability to extend the type system. This means when the basic types fall short you can take advantage of Oberon's type extension features. This includes creating dynamically allocated data structures. In Oberon-07 combining Oberon's POINTER TO and RECORD types allows us to create complex and dynamic data structures.

An example, dynamic strings

Strings in Oberon-07 are typical declared as an ARRAY OF CHAR with a specific length. If the length of a string is not known a head of time this presents a challenge. One approach is to declare a long array but that would allocate allot of memory which may not get used. Another approach is to create a dynamic data structure. An example is using a linked list of shorter ARRAY OF CHAR. The small fixed strings can combine to represent much larger strings. When one fills up we add another.

Pointers and records, an Oberon idiom

Our data model is a pointer to a record where the record contains an ARRAY OF CHAR and a pointer to the next record. A common idiom in Oberon for dynamic types is to declare a POINTER TO type and declare a RECORD type which contains the POINTER TO type as an attribute. If you see this idiom you are looking at some sort of dynamic data structure. The pointer type is usually named for the dynamic type you want work with and the record type is declared using the same name with a "Desc" suffix. In our case DynamicString will be the name of our POINTER TO type and our record type will be called DynamicStringDesc following the convention. In our record structure we include a "value" to holding a short fixed length ARRAY OF CHAR and a "next" to holding the pointer to our next record.

In our record the value is declared as a static type. We need to know how long our "short" string should be? I.e. What length is our ARRAY OF CHAR? It's a question of tuning. If it is too short we spend more time allocating new records, too long and we are wasting memory in each record. A way to make tuning a little simpler is to use a constant value to describe our array length. Then if we decide our array is too big or too small we can adjust the constant knowing that our record structure and the procedures that use that the length information will continue to work correctly.

Let's take a look at actual code (NOTE: vSize is our constant value).


          CONST
            vSize = 128; 
          
          TYPE
            DynamicString* = POINTER TO DynamicStringDesc;
            DynamicStringDesc* = RECORD 
              value : ARRAY vSize OF CHAR; 
              next : DymamicString; 
            END;

NOTE: Both DynamicString and DynamicStringDesc are defined using an *. These are public and will be available to other modules. Inside our record DynamicStringDesc we have two private to our module attributes, .value and .next. They are private so that we can change our implementation in the future without requiring changes in modules that use our dynamic strings. Likewise our constant vSize is private as that is an internal implementation detail. This practice is called information hiding.

NOTE: The asterisk in Oberon decorates procedures, types, variables and constants that are "public" to other modules.

NOTE: Variables are always exported read only.

NOTE: With information hiding some details of implementation allow us to keep a clean division between implementation inside the module and how that implementation might be used. With out information hiding we often have "leaky" abstractions that become brittle and hard to maintain and rely on.

Working with DynamicString

Our type definitions describe to the compiler how to layout our data in memory. The type system in Oberon-07 also ensures that access to that memory is restricted to assignments, operations and procedures compatible with that type. To be useful from other modules we need a few procedures to help work with this new data type. What follows is a minimal set needed to be useful.

`New*(VAR str : DynamicString)`

New will initialize a DynamicString object setting .value to an empty string.


        PROCEDURE New*(VAR str : DynamicString);
        BEGIN NEW(str);
          str.value := ""; 
          str.next := NIL;
        END New;

`Set*(VAR str : DynamicString; source : ARRAY OF CHAR)`

Set copies an ARRAY OF CHAR into an existing DynamicString. This requires that we add and link additional records if the source is longer than our current dynamic string. Set is a bridge procedure between an existing datatype, ARRAY OF CHAR and our new data type, DynamicString.


        PROCEDURE Set*(VAR str : DynamicString; source : ARRAY OF CHAR); 
          VAR cur, next : DynamicString; tmp : ARRAY vSize OF CHAR; 
              i, l : INTEGER;
        BEGIN cur := str; cur.value := "";
          l := Strings.Length(source);
          i := 0; 
          WHILE i < l DO
            Strings.Extract(source, i, i + vSize, tmp);
            Strings.Append(tmp, cur.value);
            i := i + Strings.Length(tmp);
            IF (i < l) THEN
              IF cur.next = NIL THEN
                New(next); cur.next := next;
              END;
              cur := cur.next;
            END; 
          END;
        END Set;

`ToCharArray*(str : DynamicString; VAR dest : ARRAY OF CHAR; VAR ok : BOOLEAN)`

ToCharArray copies the contents of our dynamic string into an array of char setting ok to TRUE on success or FALSE if truncated. Like Set* it is a bridge procedure to let us move data output our new dynamic string type.


        PROCEDURE ToCharArray*(str : DynamicString; 
                               VAR dest : ARRAY OF CHAR; 
                               VAR ok : BOOLEAN);
          VAR cur : DynamicString; i : INTEGER;
        BEGIN 
          ok := FALSE;
          cur := str; i := 0;
          WHILE cur # NIL DO
            i := i + Strings.Length(cur.value);
            Strings.Append(cur.value, dest);
            cur := cur.next;
          END;
          ok := (i = Strings.Length(dest));
        END ToCharArray;

Two additional procedures will likely be needed-- Append and AppendCharArray. This first one is trivial, if we want to add one dynamic string onto another all we need to do is link the last record of the first and point it to a copy of the second string we're appending.

`Append*(extra : DynamicString; VAR dest : DynamicString);`

Append adds the extra dynamic string to dest dynamic string. Our "input" is extra and our output is a modified dynamic string named dest. This parameter order mimics the standard Strings module's Append.

NOTE: Oberon idiom is often input values, modified value and result values. Modified and result values are declared in the parameter definition using VAR.

Algorithm:

Move to the end of dest dynamic string
Create a new record at cur.next.
Copy extra.value info.value cur.next.value
Advance extra and cur, repeating steps 2 to 4 as needed.

Implemented procedure.


        PROCEDURE Append*(extra: DynamicString; VAR dest : DynamicString);
          VAR cur : DynamicString;  
        BEGIN
          (* Move to the end of the dest DynamicString *)
          cur := dest;
          WHILE cur.next # NIL DO cur := cur.next; END;
          (* Starting initial pointer of `extra` copy its records
             input new records created in `cur`. *)
          WHILE extra # NIL DO
            (* Create a new record *)
            NEW(cur.next);
            cur.next.value := "";
            cur.next.next := NIL;
            (* Copy extra.value into new record *)
            Strings.Extract(extra.value, 0, vSize, cur.next.value);
            (* Advance to next record for both cur and extra *)
            extra := extra.next;
            cur := cur.next;
          END;
        END Append;

A second procedure for appending an ARRAY OF CHAR also becomes trivial. First convert the ARRAY OF CHAR to a dynamic string then append it with the previous procedure.

`AppendCharArray*(src : ARRAY OF CHAR; VAR str : DynamicString);`

This procedure appends an ARRAY OF CHAR to an existing dynamic string.


        PROCEDURE AppendCharArray*(extra: ARRAY OF CHAR; VAR dest : DynamicString);
          VAR extraStr : DynamicString;    
        BEGIN
          (* Convert our extra ARRAY OF CHAR into a DynamicString *)
          New(extraStr); Set(extraStr, extra);
          (* Now we can append. *)
          Append(extraStr, dest);
        END AppendCharArray;

At some point we will want to know the length of our dynamic string.

`Length(str : DynamicString) : INTEGER`

Our Length needs to go through our linked list and total up the length of each value. We will use a variable called cur to point at the current record and add up our total length as we walk through the list.


        PROCEDURE Length*( source : DynamicString) : INTEGER;
          VAR cur : DynamicString; total : INTEGER;
        BEGIN
          total := 0;
          cur := source;
          WHILE cur # NIL DO
            total := total + Strings.Length(cur.value);
            cur := cur.next;
          END; 
          RETURN total
        END Length;

Extending DynamicStrings module

With these few procedures we have a basic means of working with dynamic strings. Moving beyond this we can look at the standard Oberon Strings module for inspiration. If we use similar procedure signatures we can create a drop in replacement for Strings with DynamicStrings.

NOTE: Procedure signatures refer to procedures type along with the order and types of parameters. A quick review of the procedure signatures for the standard module Strings is provided by the OBNC compiler docs.

Let's look at recreating Insert as a potential guide to a more fully featured "DynamicStrings.Mod". In our Insert we modify the procedure signature so the source and destinations are dynamic strings.

`Insert(source : DynamicString; pos : INTEGER; VAR dest : DynamicString)`

The Insert procedure inserts a source dynamic string at the position provided into our dest dynamic string. We are implementing the same signature for our DynamicStrings.Insert() as Strings.Insert(). Only the parameters for source and destination are changed to DynamicString.

Internally our procedure for Insert is a more complicated than the ones we've written so far. It needs to do all the housing keeping for making sure we add the right content in the correct spot. The general idea is to find the record holding the split point. Split that record into two records. The first retains the characters before the insert position. The second holds the characters after the insert position and points to next record in the dynamic string. Once the split is accomplished it then is a matter of linking everything up. The record before the insert position is set to point at the dynamic string to be inserted, the inserted dynamic string is set to point at the record that contained the rest of the characters after the split.

It is easy to extract a sub-string from an ARRAY OF CHAR using the standard Strings module. We can store the characters in the .value of the record after the split in a temporary ARRAY OF CHAR. The temporary ARRAY OF CHAR can be used to create a new dynamic string record which will be linked to the rest of our destination dynamic string. The record which held the characters before the insert position needs to be truncated and it needs to be linked to the dynamic string we want to insert. NOTE: This will leave a small amount of unused memory.

NOTE: If conserving memory is critical then re-packing the dynamic string could be implemented as another procedure. The cost would be complexity and time to shift characters between later records and earlier ones replacing excess NULL values.

We need to find the record where the split will occur. In the record to be split we need to calculate a relative split point. We then can copy the excess characters in that split record to a new record and truncate the .value's ARRAY OF CHAR to create our split point. Truncating is easy in that we replace the CHAR in the .values that are not needed with a NULL character. We can do that with a simple loop. Likewise calculating the relative insertion position can be done by taking the modulo of the vSize of .value.

NOTE: In Oberon stings are terminated with a NULL character. A NULL character holds the ASCII value 0X.

Our algorithm:

Set cur to point to the start of our destination dynamic string
Move cur to the record in the link list where the insertion will take place
Calculate the relative split point in cur.value
Copy the characters in cur.value from relative split point to end of .value into a temporary ARRAY OF CHAR
Make a new record, rest, using the temporary ARRAY OF CHAR and update the value of .next to match that of cur.next
Truncate the record (cur) at the relative split point
Set cur.next to point to our extra dynamic string.
Move to the end of extra with cur
Set the cur.next to point at rest

Our procedure:


        PROCEDURE Insert*(extra : DynamicString; 
                          pos : INTEGER; 
                          VAR dest : DynamicString);
          VAR cur, rest : DynamicString;
              tmp : ARRAY vSize OF CHAR;
              i, splitPos : INTEGER; continue : BOOLEAN;
        BEGIN
          (* 1. Set `cur` to the start of our `dest` dynamic string *)
          cur := dest;
      
          (* 2. Move to the record which holds the split point *)
          i := 0;
          continue := (i < pos);
          WHILE continue DO
            i := i + Strings.Length(cur.value);
            continue := (i < pos);
            IF continue & (cur.next # NIL) THEN
              cur := cur.next;
            ELSE
              continue := FALSE;
            END;
          END;
      
          (* 3. Copy the characters in `cur.value` from relative
                split point to end of `.value` into a 
                temporary `ARRAY OF CHAR` *)
          splitPos := pos MOD vSize;
          Strings.Extract(cur.value, splitPos,
                          Strings.Length(cur.value), tmp);
      
          (* 4. Make a new record, `rest`, using the temporary 
                `ARRAY OF CHAR` and update the value of `.next` to
                match that of `cur.next` *)
          New(rest); Set(rest, tmp);
          rest.next := cur.next;
      
          (* 5. Truncate `cur.value` at the relative split point *)
          i := splitPos;
          WHILE i < LEN(cur.value) DO
            cur.value[i] := 0X;
            INC(i);
          END;
      
          (* 6. Set `cur.next` to point to our `extra`
                dynamic string. *)
          cur.next := extra;
      
          (* 7. Move to the end of extra with `cur` *)
          WHILE cur.next # NIL DO cur := cur.next; END;
      
          (* 8. Set the `cur.next` to point at `rest` *)
          cur.next := rest;
        END Insert;

While our Insert is the longest procedure so far the steps are mostly simple. Additionally we can easily extend this to support inserting a more traditional ARRAY OF CHAR using our previously established design pattern of converting a basic type into our dynamic type before calling the dynamic version of the function.


        PROCEDURE InsertCharArray*(source : ARRAY OF CHAR; 
                                   pos : INTEGER; 
                                   VAR dest : DynamicString);
          VAR extra : DynamicString;
        BEGIN
          New(extra); Set(extra, source);
          Insert(extra, pos, dest);
        END InsertCharArray;

Where to go next

It is possible to extend our "DynamicStrings.Mod" into a drop in replacement for the standard Strings. I've included a skeleton of that module as links at the end of this article with stubs for the missing implementations such as Extract, Replace, Pos, and Cap. I've also included a "DynamicStringsTest.Mod" for demonstrating how it works.

The procedure I suggest is to mirror Strings replacing the parameters that are ARRAY OF CHAR with DynamicString. It will be helpful to include some bridging procedures that accept ARRAY OF CHAR as inputs too. These will use similar names with a suffix of CharArray.

Parameter conventions and order

Oberon is highly composable. The trick to creating a drop in replacement module is use the same parameter signatures so you only need to make minor changes like updating the IMPORT statement and using a module alias to map the old module to the new one. The parameter signatures in Strings follow a convention you'll see in other Oberon modules. The parameter order is based on the "inputs", "modify parameters", and "output parameters". Inputs are non-VAR parameters. The remaining are VAR parameters. I think of "modify parameters" as those objects who reflect side effects. I think of "output" as values that in other languages would be returned by functions. This is only a convention. A variation I've read in other Oberon modules is "object", "inputs", "outputs". "object" and "outputs" are VAR parameters and "inputs" are not. This ordering makes sense when we think of records as holding an object. In both cases ordering is a convention and not enforced by the language. Convention and consistency is helpful but readability is the most important. Oberon is a readable language. It does not reward obfuscation. Readability is a great virtue in a programming language. When creating your own modules choose readability based on the concepts you want to emphasize in the module (e.g. procedural, object oriented).

The modules so far

You can read the full source for the module discussed along with a test module in the links that follow.

Next and Previous

Next Procedures as parameters
Previous Oberon-07 and the file system