This section lists all Welly's data types. For each one, it specifies:
There is a design document explaining why we've chosen these particular types.
This documentation is partly aspirational. Most features work as described, but there are some differences.
Before listing the available types, we can say a few things that apply to all values.
A "value" is something that can be stored in a variable. Welly values are (conceptually) always stored in memory as a contiguous string of bytes. To be more specific, values will always be stored somewhere inside a block of memory allocated in the heap (there is no stack allocation in Welly). There is a wide variety of different kinds of value, but they all satisfy this universal property.
In practice, values will sometimes be held in virtual machine registers, which may in turn be held in x86 registers. However, that is an optimisation which is invisible to programmers. Similarly, where the specification says that Welly will copy a value, the optimiser might decide not to bother, and if it can prove that no harm will come of sharing the original. For simplicity, this specification describes the unoptimised version.
Some values require other data to be stored elsewhere in memory. For example, an array value is just a short header, including a pointer to another heap block containing the array elements. The other heap block is not considered to be part of the value, because it is not part of the contiguous string of bytes. This distinction matters when a value is copied.
Every value supports a few universal operations, which are defined in terms of the string of bytes that represents the value:
a = b
, it always
means the same thing: find the string of bytes that represents
b
, and the range of memory addresses that represents
a
, and copy the former into the latter. No exception is made
for any type of value.new a
it always means the same
thing: find the string of bytes representing a
, allocate a new
mutable heap block that is just big enough, copy the bytes into it, and
return a reference to it. No exception is made for any type of value.&a
it always means the
same thing: find the range of memory addresses that represents
a
, and return a reference to them. No exception is made for any
type of value.(a, b)
it always means
the same thing: make a string of bytes containing the bytes that represent
a
and those that represent b
. The layout is chosen
in a defined way, depending on the sizes and alignments of a
and
b
.Every value has a size, being the number of bytes it occupies. The size is not stored in memory (in general) but is instead calculated at compile time.
Every value also has an alignment, which can be 0, 1, 2, 4 or 8. The address at which a value is stored in memory must be a multiple of its alignment. Like the size, the alignment of a value is calculated at compile time.
Values with alignment zero must all be stored at address zero. This
alignment is only used for values of zero length, such as the empty struct, of
type void
. It is used for all such values. It is not possible to
read or write bytes from/to such values, so it does not really matter what
their address is; giving them an alignment of zero is just a way of saying
"don't care".
On most processors, words of size n can (efficiently) only be read from or written to addresses that are multiples of n. For example, if you want to copy a large value eight bytes at a time, it has to be eight-byte-aligned. To allow itself to copy values in a reasonable number of operations, Welly therefore imposes a minimum alignment on values, equal to a quarter of their size, or eight, whichever is smaller. For example, a string of nineteen bytes will have an alignment of eight, and can therefore be copied as two words of eight bytes, one word of two bytes, and one more byte.
In summary, the allowed combinations of size and alignment are as follows:
Alignment | Size |
---|---|
0 | 0. |
1 | More than zero and less than 4. |
2 | More than zero and less than 8. |
4 | More than zero and less than 16. |
8 | More than zero. |
Heap blocks in Welly always have addresses that are a multiple of eight. Therefore, any value can be stored at the start of a heap block without any further checks. It is to ensure this property that Welly does not support alignments larger than eight.
Every value has the ability to enumerate the pointers inside it. It is important that the garbage collector is aware of all pointers inside values, as this is needed to define which heap blocks are reachable and which are garbage. It is also needed to allow the garbage collector to update pointers when it moves heap blocks around in memory.
Like the size and alignment of a value, the locations of the pointers within a value are not stored in memory as part of the value. Instead, they are calculated at compile time, depending only on the value's type. At run time, the program can manipulate the value efficiently, without having to manipulate type information as well.
Having said that, some metadata is stored in a separate area of memory whenever a heap block is allocated. The metadata is not accessible from within Welly, but is only accessed during garbage collection. The metadata is designed to be general enough to support other languages besides Welly, and uses a format that is simpler than Welly's types.
Pointers are eight bytes long. Any value that contains a pointer must have an alignment of eight.
Welly supposes that every non-zero 64-bit value can in principle be a pointer. Pointers always point inside allocated heap blocks, but not always to the start of those blocks, so pointers are not always multiples of 8. Welly does not "steal" any of the bits of a pointer for purposes other than storing the address of the pointee.
Every value is either const
or var
. Values that
are const
cannot be modified by anyone, and can be relied upon
not to change over time. Values that are var
can be modified by
anyone, and the changes will be seen by everyone. Both contnesses are useful.
The constness of a value depends on where a value is stored. Constness is orthogonal to type; i.e. either constness can be combined with any type. The constness is determined when the value is first created, and cannot be changed without making a copy of the value.
In addition to const
and var
values, one
sometimes encounters values of unknown constness. There is no third constness
really; every such value is either const
or var
.
Saying that the constness of a value is unknown simply means that your code
must work in both cases. Values of unknown constness cannot safely be
modified, but nor can they be trusted not to change their value.
Every storage location and every value in Welly has a type. A value can be stored in a location if and only if their types are compatible.
The type of a value tells you how it is represented in memory. The representation of a value is chosen to be compact, while still being able to distinguish it from other values of the same type.
We say that a type t
is a subtype of a type u
if all values that can be stored in a location of type t
can also
be stored in a location of type u
. Subtyping arises in Welly in
several ways. Read more about subtypes.
Unlike in many programming languages, types are themselves first-class
values, which can be stored in variables and passed to functions. Therefore,
types have a type. The type of types is called type
, and it has the
following definition:
union constness {
BASE
CONST
VAR
}
union type {
PRIMITIVE(uint8, uint8)
REF(constness, type)
ARRAY0
ARRAY(constness, type)
STRUCT(array const type)
UNION(array const struct {name: str, type: array const type})
FUNCTION(array const type, type)
DYNAMIC
MACRO
WITH(type, module)
// TODO...
} with { ... }
This means that every type is one of the following:
ref
or array
type.array0
in source code).fn
type.struct
type.union
type.dynamic
and macro
.with
type.[Aspirational - the current implementation is simpler.]
The primitive types defined in Welly's standard library are:
Type name | Size | Alignment | Description |
---|---|---|---|
bool | 1 | 1 | Only values are `true` or `false`. |
char | 4 | 4 | Unicode code point. |
byte | 1 | 1 | Synonym for `uint8`. |
uint8 | 1 | 1 | Unsigned 8-bit integer. |
int8 | 1 | 1 | Signed 8-bit integer. |
uint16 | 2 | 2 | Unsigned 16-bit integer. |
int16 | 2 | 2 | Signed 16-bit integer. |
uint32 | 4 | 4 | Unsigned 32-bit integer. |
int32 | 4 | 4 | Signed 32-bit integer. |
int | 8 | 8 | 64-bit integer. |
float32 | 4 | 4 | 32-bit floating-point number. |
float | 8 | 8 | 64-bit floating-point number. |
There is no separate 64-bit uint
type, because 64-bit integers
are never implicitly sign-extended. Read more about
signedness.
These types will all be represented as instances of
PRIMITIVE(size, alignment)
. The differences between them will be
represented by wrapping the type in a with
type. This arrangement
allows libraries written by third parties to define new primitive types.
It is possible to take the address of any existing Welly value, without
copying it into a heap block all of its own. Such an address is called a
"reference". The Welly syntax for taking the address of a value x
is &x
. The Welly syntax for the type of a reference to type
t
is ref t
, ref const t
or
ref var t
.
The difference between the three kinds of reference is as follows:
ref
neither guarantees that the value won't change, nor
allows the value to be modified.ref const
guarantees that the value pointed to won't
change.ref var
allows the value to be modified.ref t
offers the weakest guarantee. It is useful because it
is a supertype of the other two; a function that accepts ref t
can accept any reference to type t
.
References occupy a single word: a pointer to the referee. They have a size of 8 and an alignment of 8. References can refer to locations in the middle of a heap block.
Note the distinction between references and pointers. References are first-class values that are represented as pointers. However, pointers are a low-level concept that is also used for other purposes, and pointers are not always first-class values.
An array block is a heap block that contains many values of the same type, packed together as closely as possible. The values are called the "elements" of the array. The size of an array block is the size of each value plus padding, times the number of elements. The number of elements is not stored in the array block; anyone with a reference into the array block is expected to know how many elements they can access there. Array blocks are not first-class values in Welly; you can't stored them in variables, or pass them to functions.
An array value is a pointer into an array block (not necessarily to the first element) along with the number of elements that may be accessed via the pointer. An array value is a first-class value.
The Welly syntax for an array literal is [a, b, c]
, where
a
, b
and c
are values of the same type
(there can be any number of values, from 0 upwards). The "array" module in
the standard library provides other ways of constructing array values.
The Welly syntax for the type of an array with element type t
is array t
, array const t
or
array var t
(see the section "References" above for an
explanation of what these mean).
Arrays have a size of 16 and an alignment of 8.
[Aspirational - the current version treats strings as a special case.]
Strings in Welly are represented as immutable arrays of bytes, i.e. strings
are assignment-compatible with array const byte
. The characters
of the string are packed into the array using the UTF-8 encoding. The type of
strings is called str
.
Values can be concatenated into a struct
. The Welly syntax for
writing the concatenation of values a
, b
and
c
is (a, b, c)
. The Welly syntax for writing the
type of the struct is struct (t, u, v)
, in which t
,
u
and v
are the types of a
,
b
and c
respectively.
The constituent values of a struct are called its "fields". The size and alignment of a struct is calculated from the size and alignment of its fields. The offset of each field within a struct (by which I mean the difference between the address of the field and the address of the struct) is similarly calculated.
Structs are built from left to right. Each value is placed at the smallest offset that is consistent with its alignment and that does not cause it to overlap with any previous value. Note that it is possible for later fields to have a smaller offset than earlier fields, if they have a smaller alignment. When the struct is complete, the gaps are filled with padding bytes. The size of the struct is the offset of the first unused byte, and the alignment is the largest alignment of any member.
struct (a: byte, b: int, c: byte)
a
has offset 0; b
has offset 8; c
has offset 1. There are six bytes of padding at offset 2. The size of the
struct is 16 and its alignment is 8.struct (struct (a: byte, b: int), c: byte)
a
has offset 0; b
has offset 8; c
has offset 16. There are seven bytes of padding at offset 1. The size of the
struct is 17 and its alignment is 8. c
cannot be placed at offset
1 because it would overlap with the padding of the inner struct.struct (struct (a: int, b: byte), c: byte)
a
has offset 0; b
has offset 8; c
has offset 9. There is no padding. The size of the struct is 10 and its
alignment is 8.struct (a: byte, struct (b: int, c: byte))
a
has offset 0; b
has offset 8; c
has offset 16. There are seven bytes of padding at offset 1. The size of the
struct is 17 and its alignment is 8. The inner struct cannot be placed at
offset 1 because it has an alignment of 8.struct (a: byte, struct (b: byte, c: int))
a
has offset 0; b
has offset 8; c
has offset 16. There are seven bytes of padding at offset 1, and another seven
bytes of padding at offset 9, i.e. at offset 1 of the inner struct. The size
of the struct is 24 and its alignment is 8. The inner struct cannot be placed
at offset 1 because it has an alignment of 8.[Implemented, but likely to change.]
The type void
is a synonym for struct ()
. It has
only one value: ()
. Therefore, a function that returns
void
does not return any information.
Unions are Welly's algebraic types. Unions are useful when you want a storage location that is able to hold more than one type of value. Each type of value that the location can hold is called a "case". The cases are named.
Case names are written in capital letters; they are lexically distinguished from identifiers, which must always contain a lower-case letter.
A good example of an algebraic type is the type of a binary tree with
ring
s on its branches and bell
s on its leaves
(ring
and bell
being previously defined types).
This type has two cases:
LEAF
contains a bell
.BRANCH
contains a tree
, a ring
and another tree
.The Welly syntax for this type is:
union tree {
LEAF(bell)
BRANCH(tree, ring, tree)
}
The Welly syntax for constructing a LEAF
is
LEAF(x)
where x
is a bell
.
x
is called the "payload" of the union value. (In the case where
the payload is empty, the brackets can be omitted, i.e. NONE()
can be written NONE
).
A union type with more cases than this one is a supertype of it (assuming the payloads are compatible). A union type with fewer cases is a subtype. Read more about subtypes.
A union value always occupies 16 bytes, and always has an alignment of 8. The case name is stored at offset 0, represented as a 64-bit hash. The payload value is stored in a separate heap block. A pointer to it is stored at offset 8.
Note that the word at offset 0 is never a pointer, and the word at offset 8 is always a pointer. That's all the garbage collector needs to know, because the payload is a separate heap block with its own metadata.
Note another restriction on the representation of union values: when a value is cast from a subtype to a supertype, its representation must not change. Thus, it is not possible for any union value to occupy less than 16 bytes.
[Implemented, but likely to change.]
The type never
is a synonym for union {}
. It does
not have any values. Therefore, a function that returns never
cannot return (except by throwing an exception).
Functions are first-class values in Welly. The Welly syntax for a function
literal is fn name(a:t, b:u, c:v): w { ... }
where
a
, b
and c
are the parameter names,
t
, u
and v
are their types, and
w
is the return type (which can often be omitted). The syntax for
the type of that function is fn(t, u, v): w
.
Function values have a size of 16 and an alignment of 8. A pointer to the code for the function is stored at offset 0. The non-static values captured when the function was defined are packed into a struct and stored in a separate heap block. A pointer to the capture struct is stored at offset 8.
[TODO: Cross-reference a page about the calling convention, and check that it is up to date. Make sure it defines how to throw an exception.]
Dynamic values are useful when you want to store a value whose type is not
a compile-time-known constant. The Welly syntax for wrapping up a value
x
as a dynamic value is box x
. If
y
is a dynamic value, then y.type
is the type of
the value it contains. If in addition y
is known at compile time,
then y.value
is the value it contains.
Dynamic values have a size of 40 and an alignment of 8. The constness of the value is stored at offset 0. The type of the value is stored at offset 16. The value itself is stored in a separate heap block, and a pointer to it is stored at offset 32.
It is quite wasteful to use 16 bytes for the constness, which only has three possible values. We are likely to optimise the representation at some point.
[Aspirational and unfinished.]
Macros are first-class values. The representation of macros is
implementation-defined. The type of macros is called macro
.
[Implemented, but likely to change.]
Modules are first-class values. The representation of modules is
implementation-defined, but they typically use a representation that can be
defined and manipulated by Welly source code, i.e. something described
elsewhere in this document. The type of modules is called module
,
and it is an opaque type.
with
types[Principle agreed, mechanism not. See [TODO] "doc/extensions/compact.txt" for an alternative design.]
It is possible to define a type that is based on another type, and
assignment compatible with it, but with different meanings for its operators.
The Welly syntax for defining a type u
based on t
is:
const u = t with(wrap) { ... }
The braces have the same syntax as the body of a module, and typically
contain macros defining the various operators, and any other API that values
of type u
support. The way in which this is done is described in
"reference/operators.html".
Values of type u
have the same representation as values of
type t
, and are inter-assignable with them. In particular, you
can use the cast operator :
to convert a value between the two
types, in either direction.
Values of type u
have exactly the same size and alignment as
values of type t
.