A good place to start describing the design of Welly is its data model. It can be understood without knowing much else about the design. Its decisions plainly show the design priorities and motivations of Welly. Finally, data structures are something that programmers spend a lot of time choosing, so the data model may go a long way towards determining what programming in Welly feels like.
The decisions we made are described in the reference section. This section explains why we made them.
We have allowed Welly's data model to become moderately complicated. Complexity is always painful, but we tolerate it here because of the benefits. There is nothing in our data model that can be omitted without compromising one or more of Welly's design goals: especially safety, efficiency, and rapid development.
In the authors' opinions, a lot of their favourite programming languages made basic mistakes in their data models. A poor decision in the data model can harm the whole language, and can raise frequent grumbles from programmers. Here are some such mistakes, which we are keen to avoid.
In Java's data model, every value fits in a machine word. Values of the so called "primitive types" all fit in a machine word (with the pedantic exception of doubles and longs on a 32-bit machine), and are all passed around by value. All other values are references to "objects" stored in the heap. Large values, such as tuples or arrays, can only be represented as objects, and therefore can only be stored in the heap, and can only be passed around by reference.
Usually, this is not a problem. Allocating memory in the heap is a cheap operation in Java, so there is no reasonable objection on grounds of memory allocation costs. For many large and immutable values, passing by reference is what most programmers would want most of the time anyway. However, there are a few situations where Java's restriction on large values bites quite hard:
clone()
implementation, for example. The best work-around is to
allocate three separate arrays of bytes, but that breaks all sorts of
principles of good engineering, and also spoils locality of reference.Welly will not make this mistake. It will support struct
s, as
in C, that are first class values that can be passed around by value, returned
from functions, stored by value in arrays, and included by value in still
larger structs. It will be up to the programmer when to store things in the
heap and pass a reference instead.
Some languages, including OCaml and most dynamic languages including Python, do not support values smaller than a machine word. This is normally fine; you just waste a few bytes. However, every so often you want to define a very large array, or allocate a very large number of small tuples, or define a compact virtual machine code. In these cases, small primitive values can save a huge amount of memory, and consequently substantially improve performance too.
Welly will support one-, two- and four-byte integers in addition to its default eight-byte integers. Welly will also support four-byte floating-point numbers in addition to its default eight-byte floating-point numbers. All these small values will be densely packed when incorporated into structs, which can thereby also be small.
It is widely acknowledged that C made a mistake in using integers to represent booleans.
Welly will provide a primitive boolean type, which will occupy one byte.
It is common to want to define a variable that either holds a value of one type or a value of another. The simplest and commonest case is an option type: an optional t is either a value of type t or it is null. All references in Java and C are option types, for example. Another common case is an enumeration type, which can have many alternative cases, provided they are all constant. Again, C and (recent versions of) Java support enumerations. However, what if you want a type with more than two cases, at least two of which are not constant?
This is possible, and indeed encouraged, in ML-like languages. For example,
the type of a binary tree with ring
s on its branches and
bell
s on its leaves (ring
and bell
being previously defined types) can be defined in OCaml like this:
type tree = Branch of tree*ring*tree | Leaf of bell;;
After defining the type like this, OCaml will verify that all case switches
on tree
s are exhaustive and type-correct.
In C, you can use union types, if you are careful. See [TODO] "C-algebraic.txt" for details. In Java, your best option is to define a superclass for the tree type, and subclasses for the two cases. See [TODO] "Java-algebraic.txt" for details. In a dynamic language such as Python, you need to write much less, but you get much less. You cannot define the type in any useful way, and anyway, you cannot constrain variables to hold values of the type. There is no way to prevent further cases being added to the union. There is no hope of an exhaustivity check or a type check.
In summary, languages that do not support algebraic types (aka. tagged unions) leave programmers with no attractive options.
Welly supports tagged unions. It is possible to use union values as
the argument of a switch
statement, which performs a static
exhaustivity check. In each case
of the switch, Welly understands
the relationship between the tag and the actual type of the value, and
provides an opportunity to bind a new variable to the value with a purely
static type check.
Reference types in Java and pointer types in C always admit
null
as a possible value. Therefore, it is not possible to ensure
at compile time that a null pointer will never be dereferenced.
Welly will distinguish reference types from optional reference types. Most reference types will not be optional, so that it is never possible to dereference a null pointer.
It will be possible to use an optional reference as the argument of a
switch
statement, as if it were a union value. However, options
will be implemented like references, which are more compact than unions. There
will be syntactic sugar for dereferencing optional references if you are
content to throw an exception if the reference is null. This allows you to
write as in Java, C or Python, if you prefer.
In many languages, including C and Python, and parts of Java, a function that receives a large value from a caller must choose between two unattractive options:
In many applications, the former option is unacceptable. For example, an
early version of Java suffered a security vulnerability because a
cryptographic KeyStore
was passed by reference. For another
example, Welly allows types to refer to each other by reference. The effort
spent type-checking a program is worthless if the type can later be
modified.
Welly will support const
types: reference types that promise
that the referee will not be modified. It will allow constness information to
be attached to all reference-like types, including arrays.
Proper languages like Python and Java allow new code to be generated on the fly, without invoking an external program. However, this feature can be difficult to reconcile with static type checking. The new code may define types that have never previously existed. At the very least, the caller will not know in advance what types the new code will use. This becomes a problem when the caller needs to manipulate values defined in the new code, for example in order to link it to some other new code.
To solve this problem, Welly will support dynamic values. When asked to
compile new code, the compiler to return a value and its type, grouped
together inside a box that says "I hereby guarantee that the contained value
belongs to the contained type". The box is a dynamic value, which can be
passed around safely in variables of type dynamic
, as long as the
box remains sealed. Eventually, the box may reach a piece of code that knows
the type of the value in it. Correctly guessing the type, i.e. passing a
run-time check, allows the code to extract the value in a type-safe manner.
And if the communicated value is a reference, it can in turn grant access to
many other values without any further run-time checks.