Dynamic types
By R. S. Doiel, 2020-05-25
This is the eighth post in the Mostly Oberon series. Mostly Oberon documents my exploration of the Oberon Language, Oberon System and the various rabbit holes I will inevitably fall into.
Dynamic Types in Oberon
Oberon-07 is a succinct systems language. It provides a minimal
but useful set of basic static types. Relying on them addresses
many common programming needs. The Oberon compiler ensures
static types are efficiently allocated in memory. One of the
strengths of Oberon is this ability to extend the type system.
This means when the basic types fall short you can take
advantage of Oberon's type extension features. This includes
creating dynamically allocated data structures. In Oberon-07
combining Oberon's POINTER TO and RECORD types allows us to
create complex and dynamic data structures.
An example, dynamic strings
Strings in Oberon-07 are typical declared as an ARRAY OF CHAR
with a specific length. If the length of a string is not
known a head of time this presents a challenge. One approach is
to declare a long array but that would allocate allot of memory
which may not get used. Another approach is to create a dynamic
data structure. An example is using a linked list of shorter
ARRAY OF CHAR. The small fixed strings can combine to
represent much larger strings. When one fills up we add
another.
Pointers and records, an Oberon idiom
Our data model is a pointer to a record where the record
contains an ARRAY OF CHAR and a pointer to the next record.
A common idiom in Oberon for dynamic types is to declare a
POINTER TO type and declare a RECORD type which contains
the POINTER TO type as an attribute. If you see this idiom
you are looking at some sort of dynamic data structure. The
pointer type is usually named for the dynamic type you want
work with and the record type is declared using the same name
with a "Desc" suffix. In our case DynamicString will be the
name of our POINTER TO type and our record type will be
called DynamicStringDesc following the convention. In our
record structure we include a "value" to holding a short
fixed length ARRAY OF CHAR and a "next" to holding the
pointer to our next record.
In our record the value is declared as a static type. We need
to know how long our "short" string should be? I.e. What length
is our ARRAY OF CHAR? It's a question of tuning. If it is too
short we spend more time allocating new records, too long and
we are wasting memory in each record. A way to make tuning a
little simpler is to use a constant value to describe our array
length. Then if we decide our array is too big
or too small we can adjust the constant knowing that our record
structure and the procedures that use that the length
information will continue to work correctly.
Let's take a look at actual code (NOTE: vSize is our constant value).
CONST
vSize = 128;
TYPE
DynamicString* = POINTER TO DynamicStringDesc;
DynamicStringDesc* = RECORD
value : ARRAY vSize OF CHAR;
next : DymamicString;
END;
NOTE: Both DynamicString and DynamicStringDesc are defined
using an *. These are public and will be available
to other modules. Inside our record DynamicStringDesc we
have two private to our module attributes, .value and
.next. They are private so that we can change our
implementation in the future without requiring changes in
modules that use our dynamic strings. Likewise our constant vSize
is private as that is an internal implementation detail. This
practice is called information hiding.
NOTE: The asterisk in Oberon decorates procedures, types, variables and constants that are "public" to other modules.
NOTE: Variables are always exported read only.
NOTE: With information hiding some details of implementation allow us to keep a clean division between implementation inside the module and how that implementation might be used. With out information hiding we often have "leaky" abstractions that become brittle and hard to maintain and rely on.
Working with DynamicString
Our type definitions describe to the compiler how to layout our data in memory. The type system in Oberon-07 also ensures that access to that memory is restricted to assignments, operations and procedures compatible with that type. To be useful from other modules we need a few procedures to help work with this new data type. What follows is a minimal set needed to be useful.
New*(VAR str : DynamicString)
New will initialize a DynamicString object setting .value to
an empty string.
PROCEDURE New*(VAR str : DynamicString);
BEGIN NEW(str);
str.value := "";
str.next := NIL;
END New;
Set*(VAR str : DynamicString; source : ARRAY OF CHAR)
Set copies an ARRAY OF CHAR into an existing DynamicString.
This requires that we add and link additional records if the
source is longer than our current dynamic string. Set is a
bridge procedure between an existing datatype, ARRAY OF CHAR
and our new data type, DynamicString.
PROCEDURE Set*(VAR str : DynamicString; source : ARRAY OF CHAR);
VAR cur, next : DynamicString; tmp : ARRAY vSize OF CHAR;
i, l : INTEGER;
BEGIN cur := str; cur.value := "";
l := Strings.Length(source);
i := 0;
WHILE i < l DO
Strings.Extract(source, i, i + vSize, tmp);
Strings.Append(tmp, cur.value);
i := i + Strings.Length(tmp);
IF (i < l) THEN
IF cur.next = NIL THEN
New(next); cur.next := next;
END;
cur := cur.next;
END;
END;
END Set;
ToCharArray*(str : DynamicString; VAR dest : ARRAY OF CHAR; VAR ok : BOOLEAN)
ToCharArray copies the contents of our dynamic string into an array
of char setting ok to TRUE on success or FALSE if truncated.
Like Set* it is a bridge procedure to let us move data output
our new dynamic string type.
PROCEDURE ToCharArray*(str : DynamicString;
VAR dest : ARRAY OF CHAR;
VAR ok : BOOLEAN);
VAR cur : DynamicString; i : INTEGER;
BEGIN
ok := FALSE;
cur := str; i := 0;
WHILE cur # NIL DO
i := i + Strings.Length(cur.value);
Strings.Append(cur.value, dest);
cur := cur.next;
END;
ok := (i = Strings.Length(dest));
END ToCharArray;
Two additional procedures will likely be needed-- Append and
AppendCharArray. This first one is trivial, if we want to add
one dynamic string onto another all we need to do is link the
last record of the first and point it to a copy of the second string we're appending.
Append*(extra : DynamicString; VAR dest : DynamicString);
Append adds the extra dynamic string to dest dynamic string. Our
"input" is extra and our output is a modified dynamic string
named dest. This parameter order mimics the standard
Strings module's Append.
NOTE: Oberon idiom is often input values, modified value and
result values. Modified and result values are declared in the parameter
definition using VAR.
Algorithm:
- Move to the end of
destdynamic string - Create a new record at
cur.next. - Copy
extra.valueinfo.valuecur.next.value - Advance
extraandcur, repeating steps 2 to 4 as needed.
Implemented procedure.
PROCEDURE Append*(extra: DynamicString; VAR dest : DynamicString);
VAR cur : DynamicString;
BEGIN
(* Move to the end of the dest DynamicString *)
cur := dest;
WHILE cur.next # NIL DO cur := cur.next; END;
(* Starting initial pointer of `extra` copy its records
input new records created in `cur`. *)
WHILE extra # NIL DO
(* Create a new record *)
NEW(cur.next);
cur.next.value := "";
cur.next.next := NIL;
(* Copy extra.value into new record *)
Strings.Extract(extra.value, 0, vSize, cur.next.value);
(* Advance to next record for both cur and extra *)
extra := extra.next;
cur := cur.next;
END;
END Append;
A second procedure for appending an ARRAY OF CHAR also
becomes trivial. First convert the ARRAY OF CHAR to a dynamic
string then append it with the previous procedure.
AppendCharArray*(src : ARRAY OF CHAR; VAR str : DynamicString);
This procedure appends an ARRAY OF CHAR to an existing dynamic string.
PROCEDURE AppendCharArray*(extra: ARRAY OF CHAR; VAR dest : DynamicString);
VAR extraStr : DynamicString;
BEGIN
(* Convert our extra ARRAY OF CHAR into a DynamicString *)
New(extraStr); Set(extraStr, extra);
(* Now we can append. *)
Append(extraStr, dest);
END AppendCharArray;
At some point we will want to know the length of our dynamic string.
Length(str : DynamicString) : INTEGER
Our Length needs to go through our linked list and total up
the length of each value. We will use a variable called cur
to point at the current record and add up our total length as
we walk through the list.
PROCEDURE Length*( source : DynamicString) : INTEGER;
VAR cur : DynamicString; total : INTEGER;
BEGIN
total := 0;
cur := source;
WHILE cur # NIL DO
total := total + Strings.Length(cur.value);
cur := cur.next;
END;
RETURN total
END Length;
Extending DynamicStrings module
With these few procedures we have a basic means of working with
dynamic strings. Moving beyond this we can look at the standard
Oberon Strings module for inspiration. If we use similar
procedure signatures we can create a drop in replacement
for Strings with DynamicStrings.
NOTE: Procedure signatures refer to procedures type along with the order and types of parameters. A quick review of the procedure signatures for the standard module Strings is provided by the OBNC compiler docs.
Let's look at recreating Insert as a potential guide to
a more fully featured "DynamicStrings.Mod".
In our Insert we modify the procedure signature so the
source and destinations are dynamic strings.
Insert(source : DynamicString; pos : INTEGER; VAR dest : DynamicString)
The Insert procedure inserts a source dynamic string at the
position provided into our dest dynamic string. We are implementing
the same signature for our DynamicStrings.Insert() as
Strings.Insert(). Only the parameters for source and destination
are changed to DynamicString.
Internally our procedure for Insert is a more complicated than
the ones we've written so far. It needs to do all the housing
keeping for making sure we add the right content in the correct
spot. The general idea is to find the record holding the split
point. Split that record into two records. The first retains
the characters before the insert position. The second holds the
characters after the insert position and points to next record
in the dynamic string. Once the split is accomplished it then
is a matter of linking everything up. The record before the
insert position is set to point at the dynamic string to be
inserted, the inserted dynamic string is set to point at the
record that contained the rest of the characters after the
split.
It is easy to extract a sub-string from an ARRAY OF CHAR
using the standard Strings module. We can store the characters
in the .value of the record after the split in a temporary
ARRAY OF CHAR. The temporary ARRAY OF CHAR can be used to
create a new dynamic string record which will be linked to the
rest of our destination dynamic string. The record which held
the characters before the insert position needs to be truncated
and it needs to be linked to the dynamic string we want to
insert. NOTE: This will leave a small amount of unused
memory.
NOTE: If conserving memory is critical then re-packing the dynamic string could be implemented as another procedure. The cost would be complexity and time to shift characters between later records and earlier ones replacing excess NULL values.
We need to find the record where the split will occur. In the
record to be split we need to calculate a relative
split point. We then can copy the excess characters in that
split record to a new record and truncate the .value's
ARRAY OF CHAR to create our split point. Truncating is easy
in that we replace the CHAR in the .values that are not
needed with a NULL character. We can do that with a
simple loop. Likewise calculating the relative insertion
position can be done by taking the modulo of the vSize of
.value.
NOTE: In Oberon stings are terminated with a NULL
character. A NULL character holds the ASCII value 0X.
Our algorithm:
- Set
curto point to the start of our destination dynamic string - Move
curto the record in the link list where the insertion will take place - Calculate the relative split point in
cur.value - Copy the characters in
cur.valuefrom relative split point to end of.valueinto a temporaryARRAY OF CHAR - Make a new record,
rest, using the temporaryARRAY OF CHARand update the value of.nextto match that ofcur.next - Truncate the record (cur) at the relative split point
- Set
cur.nextto point to ourextradynamic string. - Move to the end of extra with
cur - Set the
cur.nextto point atrest
Our procedure:
PROCEDURE Insert*(extra : DynamicString;
pos : INTEGER;
VAR dest : DynamicString);
VAR cur, rest : DynamicString;
tmp : ARRAY vSize OF CHAR;
i, splitPos : INTEGER; continue : BOOLEAN;
BEGIN
(* 1. Set `cur` to the start of our `dest` dynamic string *)
cur := dest;
(* 2. Move to the record which holds the split point *)
i := 0;
continue := (i < pos);
WHILE continue DO
i := i + Strings.Length(cur.value);
continue := (i < pos);
IF continue & (cur.next # NIL) THEN
cur := cur.next;
ELSE
continue := FALSE;
END;
END;
(* 3. Copy the characters in `cur.value` from relative
split point to end of `.value` into a
temporary `ARRAY OF CHAR` *)
splitPos := pos MOD vSize;
Strings.Extract(cur.value, splitPos,
Strings.Length(cur.value), tmp);
(* 4. Make a new record, `rest`, using the temporary
`ARRAY OF CHAR` and update the value of `.next` to
match that of `cur.next` *)
New(rest); Set(rest, tmp);
rest.next := cur.next;
(* 5. Truncate `cur.value` at the relative split point *)
i := splitPos;
WHILE i < LEN(cur.value) DO
cur.value[i] := 0X;
INC(i);
END;
(* 6. Set `cur.next` to point to our `extra`
dynamic string. *)
cur.next := extra;
(* 7. Move to the end of extra with `cur` *)
WHILE cur.next # NIL DO cur := cur.next; END;
(* 8. Set the `cur.next` to point at `rest` *)
cur.next := rest;
END Insert;
While our Insert is the longest procedure so far the steps
are mostly simple. Additionally we can easily extend this to
support inserting a more traditional ARRAY OF CHAR using our
previously established design pattern of converting a basic type
into our dynamic type before calling the dynamic version of the
function.
PROCEDURE InsertCharArray*(source : ARRAY OF CHAR;
pos : INTEGER;
VAR dest : DynamicString);
VAR extra : DynamicString;
BEGIN
New(extra); Set(extra, source);
Insert(extra, pos, dest);
END InsertCharArray;
Where to go next
It is possible to extend our "DynamicStrings.Mod" into a drop
in replacement for the standard Strings. I've included a
skeleton of that module as links at the end of this article
with stubs for the missing implementations such as Extract,
Replace, Pos, and Cap. I've also included a
"DynamicStringsTest.Mod" for demonstrating how it works.
The procedure I suggest is to mirror Strings replacing the
parameters that are ARRAY OF CHAR with DynamicString. It
will be helpful to include some bridging procedures that accept
ARRAY OF CHAR as inputs too. These will use similar names
with a suffix of CharArray.
Parameter conventions and order
Oberon is highly composable. The trick to creating a drop in
replacement module is use the same parameter signatures so
you only need to make minor changes like updating the IMPORT
statement and using a module alias to map the old module to the
new one. The parameter signatures in Strings follow a
convention you'll see in other Oberon modules. The parameter
order is based on the "inputs", "modify parameters", and
"output parameters". Inputs are non-VAR parameters. The
remaining are VAR parameters. I think of "modify parameters"
as those objects who reflect side effects. I think of "output"
as values that in other languages would be returned by
functions. This is only a convention. A variation I've
read in other Oberon modules is "object", "inputs", "outputs".
"object" and "outputs" are VAR parameters and "inputs" are
not. This ordering makes sense when we think of records as
holding an object. In both cases ordering is a convention
and not enforced by the language. Convention and consistency is
helpful but readability is the most important. Oberon is a
readable language. It does not reward obfuscation. Readability is
a great virtue in a programming language. When creating your own
modules choose readability based on the concepts you want to
emphasize in the module (e.g. procedural, object oriented).
The modules so far
You can read the full source for the module discussed along with a test module in the links that follow.
Next and Previous
- Next Procedures as parameters
- Previous Oberon-07 and the file system