
Fire Script

Language description, v0.11

Fire Script is a functional, strongly and statically typed language, with Haskell-like generics (type classes and type variables).

The language is designed to allow a small and easily embeddable implementation, with the possibility of compilation to native code.

This document presents the syntax and semantics by example.

An interpreter is being developed in C++ together with this description. The interpreter consists of a C++ static library (libxci-script) and a REPL interpreter (fire). The implementation is a little behind this document, and it’s also possible that not everything described here will actually be implemented. This is a design document containing miscellaneous ideas. The language may change during implementation.

The plan is that the implementation will completely match this description when the document reaches v1.0. (Note that the implementation is versioned independently of this document.)

See xci::script in xcikit codebase.

1. Basic syntax

1.1. Literals

Literal syntax is derived from C++ and Python.

123        // integer (64bit)
123u       // unsigned integer (64bit)
0b101101   // binary integer
0o765      // octal integer
0xBEEFED   // hexadecimal integer
42d        // integer (32bit)
42h        // integer (16bit)
42ud       // unsigned integer (32bit)
12.3       // floating point (64bit)
12.3f      // floating point (32bit)
'@'        // unicode character
b'@'       // byte / ASCII character
42b        // byte (8bit integer, unsigned)
"lava"     // string (UTF-8)
b"lava"    // bytes (ASCII string)
"""abc"""  // raw string (here doc)
b"""abc""" // raw bytes
("abc",42) // tuple
[1, 2, 3]  // list
[]:[Byte]  // empty list (type can be specified or inferred from context)
{}         // empty block, has type () -> () (no parameters, no return value)
()         // empty tuple (value of zero size)
void       // alias of ()

The raw string can be multi-line. It has a few special rules:

  • Any whitespace followed by a newline after opening quotes, and a newline followed by whitespace immediately before closing quotes, are removed, when they appear together. If it’s only one or the other end, no whitespace or newlines are removed.

  • If the previous rule is satisfied, uniform indentation up to the level of the closing quotes is removed.

  • A sequence of three or more quotes inside the raw string can be escaped with a backslash: \"""""". Other backslashes are kept verbatim: \\"""\""". A backslash right before ending quotes is not possible, it would be interpreted as escaping the quotes. Instead, use the first rule, i.e. add a newline after the backslash and another one after opening quotes (they’ll both be removed).
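
The three rules above can be modeled with a short sketch. It is written in Python rather than Fire Script, purely to make the rules concrete; it is not taken from the interpreter. The names `strip_raw` and `remove_indent` are hypothetical, and the handling of lines indented less than the closing quotes (only the whitespace that is present gets removed) is an assumption.

```python
import re

def remove_indent(line, width):
    # Drop at most `width` leading spaces/tabs from one line.
    i = 0
    while i < width and i < len(line) and line[i] in " \t":
        i += 1
    return line[i:]

def strip_raw(body):
    # `body` is the text between the opening and the closing triple quotes.
    head = re.match(r"[ \t]*\n", body)        # whitespace + newline after the opening quotes
    tail = re.search(r"\n([ \t]*)\Z", body)   # newline + whitespace before the closing quotes
    # Rule 1: strip both ends only when both are present (and do not overlap).
    if head and tail and head.end() <= tail.start():
        width = len(tail.group(1))            # indentation level of the closing quotes
        inner = body[head.end():tail.start()]
        # Rule 2: remove indentation up to the level of the closing quotes.
        body = "\n".join(remove_indent(line, width) for line in inner.split("\n"))
    # Rule 3: a backslash directly before three or more quotes is dropped,
    # all other backslashes are kept verbatim.
    return re.sub(r'\\("{3,})', r"\1", body)

print(repr(strip_raw("\n    stripped\n    ")))   # 'stripped'
print(repr(strip_raw("not stripped\n")))         # 'not stripped\n'
```

The sketch applies the escape rule last, so the indentation logic never has to look at backslashes.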

a = """Hello"""     // `Hello`
b = """
    stripped
    """             // `stripped`
c = """
    \n\0\ <- not special
    """             // `\n\0\ <- not special`
d = """
    no indent removal
"""                 // `    no indent removal`
e = """
    partial indent removal
  """               // `  partial indent removal`
f = """not stripped
"""                 // `not stripped<nl>`
g = """
    not stripped""" // `<nl>    not stripped`
h = """

first/last nl stripped

"""                 // `<nl>first/last nl stripped<nl>`
i = """
    \""""escaped quotes\""""
    """             // `""""escaped quotes""""`
j = """
    \""""
    \\""""
    \\\""""
    """             // `""""<nl>\""""<nl>\\""""`
k = """\""" <- escaped! """
l = """
\
"""                 // `\` (trailing backslash requires multi-line)

1.2. Scoped block

// define some names in a scope:
{ a = 1; b = 2 }    // the whole expression evaluates to ()
a                   // ERROR - `a` is not defined in outer scope

// block returns `a`, `c` evaluates to `1`
c = { a = 1; a }

// the outer scope is visible inside the block
x = 1; y = { x + 2 }

  • Semicolons are separators, not required after the last expression and before EOL/EOF.

  • The block has a return value which is the result of the last expression.

  • Definitions don’t return a value - explicit expression is required instead.

1.3. Function call

neg 42           // => -42
add (1, 2)       // => 3
sub (1 + 2, 3)   // => 0
(1, 2) .add      // dot-call
1 .add 2         // infix style dot-call

  • Function call syntax is minimalistic - function name, space, argument.

  • Parentheses are optional in case of a single argument.

  • Multiple arguments are passed as a tuple.

1.4. Infix function call

Any function can be used as an “infix operator” or, in comparison with object-oriented languages, as a method call, where the first argument is the “object” on which it operates:

foo .push bar
"string".len

The evaluation rule is simple: The left-hand side expression is combined with the right-hand side in a tuple, which is passed as the function argument. The right-hand side is optional.

Spaces around the dot are optional, but numbers might need parenthesizing if the dot is not preceded by a space:

one = 1; one.add 2    // ok, but bad style
1.add 2               // parse error, "add" might be float suffix
(1).add 2             // ok, but better add a space before the dot
one. add 2            // ok, but bad style
1 . add 2             // ok, but bad style

Putting the first argument on left-hand side improves readability in some cases:

"{} {}" .format ("hello", 91)
"string".len

Unlike infix operators, functions have no precedence - they are always evaluated from left to right:

1 .add 2 .mul 3  // => 9
(1 .add 2).mul 3  // => 9
1 .add (2 .mul 3)  // => 7

The dot operator inverts the calling order. Calls can be chained:

// all these lines are equivalent
uniq (sort (a_list))        // forced right-to-left evaluation
a_list .sort .uniq          // implicit left-to-right evaluation
((a_list) .sort) .uniq      // the same, explicit

// also equivalent, the general rule still applies
list_1 .cat (list_2, list_3) .sort .uniq
cat (list_1, list_2, list_3) .sort .uniq

The dot operator has same precedence as a function call:

neg 1 .add 2  // => 1
(neg 1) .add 2  // equivalent
f 1 .combine 2 3  // evaluated left-to-right
((f 1).combine 2) 3  // equivalent
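
The evaluation rule of the dot-call can be illustrated with a small model. This is a Python sketch, not Fire Script and not the interpreter's implementation. The helper `dot_call` is hypothetical, functions are modeled as taking a single argument (a value or a tuple), and prepending the left-hand side to a tuple on the right-hand side is an assumption derived from the `cat` example above.

```python
def dot_call(lhs, func, rhs=()):
    # Model of `lhs .func rhs`; the right-hand side is optional.
    rhs_items = rhs if isinstance(rhs, tuple) else (rhs,)
    if not rhs_items:
        return func(lhs)              # `(1, 2) .add`  behaves like  add (1, 2)
    return func((lhs, *rhs_items))    # `1 .add 2`     behaves like  add (1, 2)

# Two-argument functions receive their arguments as one tuple.
add = lambda args: args[0] + args[1]
mul = lambda args: args[0] * args[1]

# 1 .add 2 .mul 3   is evaluated left to right, i.e. (1 .add 2) .mul 3
left_to_right = dot_call(dot_call(1, add, 2), mul, 3)

# 1 .add (2 .mul 3)   the parentheses force the multiplication first
grouped = dot_call(1, add, dot_call(2, mul, 3))

print(left_to_right, grouped)   # 9 7
```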

1.5. Operators

Infix and prefix operators, operator precedence:

1 + 2 / 3 == 1 + (2 / 3)
-(1 + 2)

1.6. Variables

There are no real variables. Let’s discuss what looks like variables and how it works.

All "variables" (symbolic names) are scoped and unique. It’s not possible to assign the same name again in the same scope. It’s not possible to change what the name points to; it’s always immutable. Instead, it’s possible to introduce a new name or override the name in an inner scope.

// type is inferred
i = 1

// right-hand side can be any expression
j = 1 + 2
k = add (1, 2)

// error, redefinition of a name
k = 1; k = 2

// ok, inner `l` has its own scope
l = 1; { l = 2 }

// error: type of 'm' cannot be inferred
// (the third 'm' refers to the second one, not the first, outer one)
m = 1; { m = m + 1 }

// variable type can be explicitly declared
l:Int32 = k
s:String = "XCI"
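
The scoping rules can be pictured as a chain of write-once symbol tables. The following Python sketch is only an illustration under that assumption; the class `Scope` and its methods are hypothetical names, not part of libxci-script.

```python
class Scope:
    """A write-once symbol table with an optional parent scope."""

    def __init__(self, parent=None):
        self.parent = parent
        self.names = {}

    def define(self, name, value):
        # A name can be bound only once per scope.
        if name in self.names:
            raise NameError(f"redefinition of name '{name}'")
        self.names[name] = value

    def lookup(self, name):
        # Search the innermost scope first, then the enclosing scopes.
        scope = self
        while scope is not None:
            if name in scope.names:
                return scope.names[name]
            scope = scope.parent
        raise NameError(f"name '{name}' is not defined")

outer = Scope()
outer.define("l", 1)
inner = Scope(parent=outer)
inner.define("l", 2)      # ok: the inner scope overrides the name
```

Calling `outer.define("l", 3)` at this point raises `NameError`, which corresponds to the "redefinition of a name" error above.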

There are three basic ways of naming values:

a = 1                // [1.] literal value
b = add (1, 2)       // [2.] result of expression
data c = add (1, 2)  // [3.] constant value initialized with a result of a (constant) expression

The first two cases create a function which takes no arguments and returns the expected value as the result. The compiler is free to optimize them and just point the symbolic names to precomputed values. In the third case, this is enforced: the data keyword makes sure the value is computed at compile time and no run-time code is generated. It’s similar to consteval in C++20. The compiler emits an error if the expression cannot be evaluated at compile time.

A function (object) can’t be assigned to data value, because that’s precisely what the keyword does — it prevents creating a function and enforces creating a data value in the compiled module.

The picture gets a little more complicated when we start to consider side effects. Without side effects, it’s not really important when the evaluation happens — everything can be lazy. But when the right side of = has side effects, the compiler switches to eager evaluation.

a = write "hello\n"      // eager: prints "hello" immediately
a = { write "hello\n" }  // lazy: `a` becomes a function that prints "hello" when called

Note: Currently not implemented. Both are lazy. To be reconsidered.

On module-level, all statements are evaluated eagerly. Code like this works as expected:

write "Hello "
flush
write "World!\n"

1.7. Function definition

Define a function with parameters:

add2 = fun (a, b) {a + b}   // generic function - works with any type supported by op+
add2 (1, 2)
add2 (1.0, 2.0)

add2b = fun<T> (a:T, b:T) -> T {a + b}      // same as above, but with explicit type variable
add2c = fun (a:Int, b:Int) -> Int {a + b}   // specific, with type declarations
add2d : (Int, Int) -> Int = fun (a, b) {a + b}  // type declaration on left side (i.e. disable type inference)

// function definition can span multiple lines
add2e = fun (a:Int, b:Int) -> Int {
    a + b
}

// possible program main function
main = fun args:[String] -> Void {
    write "Hello World!\n"
}

Function call can explicitly name the arguments:

type MyBook = (name: String, author: String, isbn: Int)
make_book = fun (name:String, author:String, isbn:Int) -> MyBook { (name, author, isbn):MyBook }
make_book (name="Title", author="Karel IV", isbn=12345)

This allows rearranging the arguments, but it doesn’t allow skipping arguments in the middle (the last arguments might be left out to make a partial call).

It also requires that the argument names are available together with function prototype.

Pass a function as an argument:

eval2 = fun (f, a, b) { f (a, b) }
eval2 (add2, 1, 2)                  // calls `add2 (1, 2)`
eval2 (fun (a, b) {a + b}, 1, 2)    // calls anonymous function

Return a function from a function:

sub2 = fun (a, b) { a - b }
choose = fun x { if (x == "add") then add2 else sub2 }
choose "add" (1, 2)
choose "sub" (1, 2)

Block is a function with zero arguments:

block1 = { c = add2 (1, 2); c }    // returns closure c
block2 = { c = add2 (1, 2) }       // returns ()
block1  // evaluate the block (actually, it might have been evaluated above - that's up to compiler)

a = {f = fun x {5}}; f    // ERROR - block creates new scope - f is undefined outside
a = (f = fun x {5}); f    // ok - f is declared in outer scope

Infix operators:

// C++ style operators, with similar precedence rules
// (exception is comparison operators)
1 + 2 * 3 ** 4 == 1 + (2 * (3 ** 4))
// Bitwise operators
1 | 2 & 3 >> 1 == 1 | (2 & (3 >> 1))

Record field lookup:

type MyRecord = (name: String, age: Int)
rec = ("A name", 42):MyRecord
rec.name    // dot operator

1.8. Overloading functions

Plain functions may be overloaded, but the mechanism is somewhat limited. See Type classes for a more flexible construct for function overloading.

The limitations are:

  • All overloads must be defined in the same scope (module, function).

  • A forward declaration is possible only for the immediately following overload.

The overloads have to differ in a type:

f : Int -> Int = fun a { a }
f : Float -> Float = fun a { a }
f : String -> String = fun a { a }

The type has to be declared on the left-hand side, it cannot be inferred from the function type on right-hand side:

f = fun a:Int -> Int { a }
f = fun a:Float -> Float { a }  // error: redefined name f

Plain variables are also functions and may be overloaded:

a : Int = 2
a : String = "two"
a           // error - cannot be uniquely resolved
a:Int       // -> 2
a:String    // -> "two"
a:Int64     // error - cannot be uniquely resolved

1.9. Other syntax

C++ style comments:

// comment line

print "hello " /* inline comment */ "world"

/* multiline
   comment */

2. Control flow

2.1. If-expression

if x == "add" then add2 else sub2

The if-branch can occur multiple times to handle different conditions. This is equivalent to nested if-expressions but simplifies the syntax.

The if keyword is non-ambiguous in this case, so there is no need for special elif keyword and using the same keyword helps with vertical alignment.

a = if x > 0 then x
    if y > 0 then y
    if z > 0 then z
    else 0

Else branch is always mandatory - the parser needs it to find end of the expression.

  • Spec: [if <cond> then <expr>]… else <expr>

  • The parentheses around the condition are optional.

  • The if-expression evaluates to a value → all branches must have the same type.
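
The equivalence between the multi-branch form and nested if-expressions can be shown with conditional expressions in another language. This Python sketch mirrors the `a = if x > 0 then x ...` example above; the function name `first_positive` is hypothetical.

```python
def first_positive(x, y, z):
    # if x > 0 then x
    # if y > 0 then y
    # if z > 0 then z
    # else 0
    return x if x > 0 else (y if y > 0 else (z if z > 0 else 0))

print(first_positive(0, 5, 7))    # 5
print(first_positive(-1, -2, -3)) # 0
```

Every branch yields an integer here, matching the requirement that all branches have the same type.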

Nested if-expressions are possible, still without any braces:

if a > 1 then
    if a > 10 then 10
    else 1
else 0

The parsing is well-defined, because both expressions are ended by else-branch.

A possible multiline style for complex conditions and expressions:

if (
   x == "add" ||
   x == "add2"
)
then {
    fun (a, b) { a + b }
}
else {
    fun (a, b) { a - b }
}

See also Pattern matching.

3. Language constructs

3.1. Type classes

A type class contains a set of functions for a type.

class MyEq T {
    my_eq : (T, T) -> Bool
    my_ne : (T, T) -> Bool
}

A type class can be specialized to create another, more specific, type class:

class MyOrd T (MyEq T) {
    my_lt : (T, T) -> Bool
    my_gt : (T, T) -> Bool
    my_le : (T, T) -> Bool
    my_ge : (T, T) -> Bool
}

Instantiating a type class means to define all functions it contains for a specific type:

instance MyEq Int32 {
    my_eq = fun (a, b) { a == b }
    my_ne = fun (a, b) { a != b }
}

The contained function can now be called directly on Int32:

my_eq (3, 4)

Similar classes are part of std module, but the actual implementation is different, because the equality operator translates to a call to eq function. Using the actual operator in the implementation would lead to a recursion.

The function names that are declared by a class and implemented by the instances are in global name space. That means that no other function with the same name and no other class declaring the same function name can be visible in the same module.
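
One way to think about classes and instances is as a table of functions selected by type. The Python sketch below models that idea only. Fire Script is statically typed and would resolve the instance at compile time, whereas this model looks it up at run time. The registry `MY_EQ` and the helper `instance_my_eq` are hypothetical names.

```python
MY_EQ = {}   # maps a type to its MyEq instance (a dict of functions)

def instance_my_eq(ty, my_eq, my_ne):
    # Corresponds to `instance MyEq <ty> { ... }`: all functions must be given.
    if ty in MY_EQ:
        raise TypeError(f"duplicate MyEq instance for {ty.__name__}")
    MY_EQ[ty] = {"my_eq": my_eq, "my_ne": my_ne}

# The class functions live in one shared namespace and dispatch on the type.
def my_eq(a, b):
    return MY_EQ[type(a)]["my_eq"](a, b)

def my_ne(a, b):
    return MY_EQ[type(a)]["my_ne"](a, b)

# Python's int stands in for Int32 here.
instance_my_eq(int, lambda a, b: a == b, lambda a, b: a != b)

print(my_eq(3, 4))   # False
```

Calling `my_eq` on a type without an instance fails with `KeyError` in this model. In a statically typed language the same mistake would be reported by the compiler.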

3.2. Pattern matching

Match expression can simplify nested ifs.

Used as simple C-style switch:

match an_int {
    1         => "one"
    2         => "two"
    3 | 4 | 5 => "three to five"
    _         => "other"
}

Use semicolon to separate multiple cases on single line:

match an_int { 1 => "one"; 2 => "two"; _ => "other" }

Enums / tagged unions:

type MyVariant = int Int | string String | none
a : MyVariant = int 42
match a {
    int x     => x.to_string
    string x  => x
    none      => "<none>"
}

Or in combination with destructuring:

match a_list {
    []     => 0
    [x]    => x
    [x, y] => x + y
    [*z]   => sum(z)
}

Standalone destructuring:

let [first, *rest] = a_list

3.3. I/O streams

Builtin functions like open, read, write, flush, error work with a set of streams that is silently passed around. Default set of streams is (stdin, stdout, stderr). To change them for a scope of an expression, use the with expression:

with (out=(open "/tmp/file.txt" "w"), err=stderr, in=stdin) {
    // output stream is now redirected to a file
    write "this goes to file.txt"
    flush
    // ...
}

This changes the set of current streams and saves the original streams on stack. When the block finishes, the original streams are restored, and the streams from the with context are released. This means that the opened file is open only inside the scope.

Internally, there are two functions: enter and leave. Before entering the inner block (second argument of with), enter function is called. It gets the first argument of the with expression as the sole argument. The value returned by enter is stored on stack. When leaving the inner scope, this value is read back from stack and passed to the leave function.

For example, in the above fragment, the following functions are called:

type Streams = (in: Stream, out: Stream, err: Stream)
enter : Streams -> Streams
leave : Streams -> Void

The functions are overloaded. Other overloads accept tuples: (out), (in, out), (in, out, err). This allows a condensed syntax:

with (open "/tmp/file.txt" "w")
    write "this goes to file.txt"

Apart from the special parsing, the with expression behaves like a normal function taking two arguments: with <context> <expr>. The parsing is relaxed in two ways:

  • Unlike normal function call, newlines are allowed between with keyword and first argument, and also between first and second argument.

  • The second argument can be any expression, including unparenthesized if-then-else, or a function call. This is not possible in arguments of a normal function call.

The return value of the whole expression is what the inner expression returns.
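
The enter/leave protocol described in this section can be sketched as follows. This is a Python model, not the actual implementation. The names `with_expr`, `current` and `saved_stack` are hypothetical, streams are replaced by in-memory buffers, releasing the streams of the context is omitted, and calling leave even when the body raises is an assumption.

```python
import io

# The set of current streams; in-memory buffers stand in for stdin/stdout/stderr.
current = {"in": io.StringIO(), "out": io.StringIO(), "err": io.StringIO()}
saved_stack = []

def enter(context):
    # Install the new streams and return the original ones.
    global current
    original = current
    current = {**current, **context}
    return original

def leave(original):
    # Restore the streams that were saved by `enter`.
    global current
    current = original

def write(text):
    current["out"].write(text)

def with_expr(context, body):
    saved_stack.append(enter(context))   # value returned by enter goes on the stack
    try:
        return body()                    # the result of the inner expression
    finally:
        leave(saved_stack.pop())         # read back from the stack, pass to leave

default_out = current["out"]
file_like = io.StringIO()
with_expr({"out": file_like}, lambda: write("this goes to file.txt"))
print(file_like.getvalue())              # this goes to file.txt
```

After the call, `current["out"]` is the default buffer again and nothing was written to it.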

3.4. Exceptions

See Side Effects below for information on how this works.

try {
    throw (Exception "Catch me!")
} catch ex:Exception {
    log "Exception caught!"
}

Braces can be omitted in case of single statement:

try this_may_throw
catch ex:Exception
    log "Exception caught!"

The parser looks for a single expression after try, which may be a braced block. Then it expects catch keyword followed by a variable and again a single expression.

Catch all possible exceptions - use generic type T:

try
    this_may_throw
catch ex:T
    log "Exception caught!"

4. Types

4.1. Built-in types

Primitive types:

12d, 12:Int32       // Int32
12, 12:Int64        // Int64 (alias Int)
1.2f, 1.2:Float32   // Float32
1.2, 1.2:Float64    // Float64 (alias Float)
true, false         // Bool
b'a', 'a':Byte      // Byte=UInt8     -- ASCII
27b, 27:UInt8       // UInt8          -- binary 0..255
'a', 97:Char        // Char          -- Unicode (32bit code point)

Table 1. Numeric types

Type            Literal              Description
UInt8, Byte     42u8, 42b, b'@'      ASCII or binary byte (8bit)
UInt16          42u16, 42uh          unsigned short int (16bit)
UInt32          42u32, 42ud          unsigned int (32bit)
UInt64, UInt    42u64, 42ul, 42u     unsigned long int (64bit)
UInt128         42u128, 42uq         unsigned wide int (128bit)
Int8            -42i8, -42c          signed byte (8bit)
Int16           -42i16, -42h         signed short int (16bit)
Int32           -42i32, -42d         signed int (32bit)
Int64, Int      -42i64, -42l, -42    signed long int (64bit)
Int128          -42i128, -42q        signed wide int (128bit)
(Float16)       (4.2f16, 4.2h)       (reserved) half-precision float (16bit)
Float32         4.2f32, 4.2f         single-precision float (32bit)
Float64, Float  4.2f64, 4.2l, 4.2    double-precision float (64bit)
Float128        4.2f128, 4.2q        quadruple-precision float (128bit)

Note: Some of the short suffixes cannot be used in specific contexts, for example in 0xFFb the b is not interpreted as type modifier, but as a hex digit. Use long suffix instead: 0xFFu8. Same applies to c and i8. Similarly, use 0xFFi32 instead of 0xFFd. Note that 0xFFud is fine.

The f suffix may convert integer literal to float: 12f is same as 12.f or 12.0f. This does not work with l or q, which require the decimal dot to distinguish between Int and Float.

Table 2. Type aliases for interfacing with C

Type alias    C type               Description
(CChar)       char                 just use Int8 or UInt8
(CShort)      short                just use Int16 or UInt16
CInt, CUInt   int, unsigned int    might be 16bit, but is always 32bit on modern platforms
CSize         size_t, uintptr_t    arch-dependent size (pointer-sized)
COffset       ssize_t, ptrdiff_t   arch-dependent size (pointer-sized)

These types don’t have their own literals. The types in parentheses are not available, because they are easily replaceable by basic types (see Description).

Composite types:

b"abc"              // [Byte]
[10b, 11b, 13b]     // [Byte]         -- equivalent to the "bytes" literal
"Hello."            // String         -- UTF-8 string
['a', 'b', 'c']     // [Char]         -- compatible with String, but not the same
[1, 2, 3]           // [Int]          -- a list
("Hello", 33)       // (String, Int)  -- a tuple
()                  // () aka Void    -- empty tuple

The type of a value is inferred from the literal. Assigning a literal of a type with a smaller range is fine. Assigning a literal of a type with a bigger range is fine only if the value fits; otherwise it’s a compile-time error.

ok = true           // inferred type Bool
c = 'a'             // inferred type Char
byte = 27b          // inferred type Byte
b1:Byte = 12        // ok
b2:Byte = 300       // error
b3:Byte = c         // error, not a literal, must be cast explicitly
b4:Byte = c:Byte    // cast ok, value clipped

Strings and lists have the same interface and can be handled universally in generic functions. List of chars has different underlying implementation than String: it stores 32bit characters, allowing constant-time indexing, but taking more space. String is UTF-8 encoded, random access is slower (linear-time), but it takes less space.

4.1.1. Void type

_0 ? v:() = ()
_1 ? .d v
Function v: () -> ()

A special case of the tuple type is an empty tuple (), also known as Void. A value of this type, which is also written as () (alias void), doesn’t carry any data. A function taking () as input won’t pull anything from data stack. A function returning () doesn’t push any data on the stack. The size of the value is effectively zero, so the value doesn’t exist at all. It’s only known to the compiler for the purpose of type checking.

4.2. User-defined types

User-defined types are made by giving a name to a type, or to a composition of types. All type names must begin with uppercase letter (this is enforced by the compiler):

type MyVoid = ()     // empty tuple => Void
type MyType = Int    // make new type by giving other type a new name
type MyTuple = (String, Int)
type MyStruct = (name:String, age:Int)    // struct (a tuple with named fields)
type MyBool = false | true   // enum
type MyUnion = Int | String | Void   // tagged union
type MyVariant = int Int | string String | none   // tagged union with explicit names
type MyOptional T = some T | none   // generic type (a kind?)
type MyOptionalInt = MyOptional Int   // instance of the generic type
type MyFunction = ([Int], Int) -> Int

The type definition creates a new type known to the compiler. The original type can be cast to the new type (and vice versa), but it does not coerce. On the other hand, literals always coerce when the underlying type is the same.

For example:

type Number = Int
f = fun (a:Number, b:Number) -> Number { a+b }  // `add` must be implemented for Number
f (11, 22)   // OK, literals coerce
a = 11; b = 22:Number
f (a, a)   // Error: `f` expects Number, not Int
f (b, b)   // OK
f (a:Number, b)   // OK, returns 33:Number

Tuples and structs have the same rules. In addition, a struct can be initialized with a tuple of same types.

type MyStruct = (name:String, age:Int)
// All of these initializations are valid:
a: MyStruct = ("Luke", 10)
a: MyStruct = (name="Luke", age=10)
a = ("Luke", 10):MyStruct
a = (name="Luke", age=10):MyStruct
// Variables and function return values need explicit cast
a = ("Luke", 10)
b: MyStruct = a   // Error: cannot assign (String, Int) to MyStruct
b: MyStruct = a: MyStruct  // OK
b = a: MyStruct  // same, type of `b` is inferred

get_age = fun st:MyStruct -> Int { st.age }
get_age a   // Error, variable does not coerce
get_age { ("Luke", 10) }  // Error, the block's return value doesn't coerce
get_age a:MyStruct  // OK

// Anonymous struct
a: (name: String, age: Int) = ("Luke", 10)
// Alias - the type of `a` is still an anonymous struct
Record = (name: String, age: Int)
a: Record = ("Luke", 10)

It’s also possible to make an alias of a type:

MyInt = Int
MyFun = String -> String

The alias can be used in place of the actual type. It’s replaced by the actual type wherever it’s used.

Function types:

(a:Int, b:Int) -> Int               // with parameter names
(Int, Int) -> Int                   // without parameter names
((Int, Int), Int) -> Int            // two arguments, first is tuple
(Int, Int) -> Int -> Int            // function returning another function
Int -> Int -> Int -> Int            // curried function (this is different type than the previous one)

The currying is explicit. Partial function calls are not automatic, they need to be done explicitly via a lambda. The syntax for calling the curried function is different from normal call:

f1 = fun (a:Int, b:Int) -> Int { a + b }
f2 = fun a:Int -> Int -> Int { fun b:Int -> Int { a + b } }
f1 (1, 2)
f2 1 2

The space is a "function application operator". It can be chained (as in the last line). Multiple arguments of a single function need to be passed as a tuple.

4.3. Forward declarations

The type of a function or variable can be declared using the decl statement.

The declared name can be used inside blocks even before giving it a value:

decl x:Int  // declare x
y={x}       // reference x inside a block
x=7         // define x
y           // -> 7

Similarly, postpone a function definition:

decl f: Int->Int     // declare function f
w = fun x {2 * f x} // reference f inside another function
f = fun x {x + 1}   // define function f
w 7                 // -> 16

4.4. Casting

Any expression can be cast to another type. The syntax is similar to a variable definition with an explicit type.

42:Int64
a = 42; a:Byte
(1 + 2):Int64
['a', 'b', 'c']:String   // -> "abc"

Effectively, this calls a cast function:

a = 42:Int64
// is equivalent to
a = (cast 42):Int64
// also equivalent to
a:Int64 = cast 42
// this won't work - the target type has to be specified somehow
a = cast 42

The cast function can be implemented for custom types like this:

instance Cast MyType Int {
    cast = fun x:MyType { /* convert MyType to Int */ }
}

instance Cast Int MyType {
    cast = fun x:Int { /* convert Int to MyType */ }
}

4.5. Initializers

Type names can be called. This syntax is translated to a call of the init function of the Init class. The default init function calls the cast function, so it supports the same operations as Casting:

Int64(42)
a = 42; Byte(a)
Int64(1 + 2)
String ['a', 'b', 'c']   // -> "abc"

As with cast, the init function can be called explicitly:

a:Int64 = init 42
(init 42):Int64

User-defined types can have custom initializers:

type MyType = (Int, String)
instance Init Int MyType {
    init = fun a { (a, "Foo"):MyType }
}
MyType(42)  // -> (42, "Foo")
42:MyType  // Error, cast doesn't fall back to init

The initializer can also be called with the dot syntax:

(42).Int64
42 .Int64
42.Int64  // Syntax error: could be float literal suffix
a = 42; a.Byte
['a', 'b', 'c'].String   // -> "abc"

4.6. Coercion

The values of the same kind can coerce to a bigger type. For example, Int32 or Byte can be used in a function accepting only Int64. When resolving overloads, the most specific one and the closest one is used. For a Byte value, an Int32 overload is used if it exists, otherwise Int64 etc.
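
The "closest overload" rule can be sketched as a search along a widening order. The Python model below is an illustration only. The list `WIDENING_ORDER` covers just the signed integer types, and both it and the function `resolve_overload` are assumptions, not the actual resolution algorithm.

```python
# Assumed widening order for signed integers, narrowest first.
WIDENING_ORDER = ["Int8", "Int16", "Int32", "Int64", "Int128"]

def resolve_overload(arg_type, overloads):
    # Pick the first overload whose parameter type is the argument type
    # itself or the nearest bigger type; smaller types never match.
    start = WIDENING_ORDER.index(arg_type)
    for candidate in WIDENING_ORDER[start:]:
        if candidate in overloads:
            return candidate
    raise TypeError(f"no overload accepts {arg_type}")

print(resolve_overload("Int16", {"Int32", "Int64"}))   # Int32
print(resolve_overload("Int16", {"Int64"}))            # Int64
```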

4.7. Lists

Lists are homogeneous data types:

nums = [1, 2, 3, 4, 5]
chars = ['a', 'b', 'c', 'd', 'e']

A list of chars has the same interface as a string, but it is a distinct type with a different representation (see Built-in types).

Basic operations:

len nums == 5
empty nums == false

head nums == 1
tail nums == [2, 3, 4, 5]
last nums == 5
init nums == [1, 2, 3, 4]

take 3 nums == [1, 2, 3]
take 10 nums == [1, 2, 3, 4, 5]
drop 3 nums == [4, 5]
drop 10 nums == []

reverse nums == [5, 4, 3, 2, 1]
min nums == 1
max nums == 5
sum nums == 15

Subscript (index) operator:

// zero-based index
nums ! 3 == 4
// note that this calls `nums` with list arg `[3]`
nums [3]   // not subscription!

Concatenation:

cat (nums, [6, 7])              // =>  [1, 2, 3, 4, 5, 6, 7]
cat ("hello", [' '], "world")   // =>  "hello world"
cons (0, nums)                  // =>  [0, 1, 2, 3, 4, 5]

Ranges:

[1..10] == [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
['a'..'z']

Comprehensions:

[2*x for x in [1..10] if x > 3]
[2*x | x <= [1..10], x > 3]
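
For comparison, the first comprehension above maps almost directly to Python. The main difference is that the Fire Script range `[1..10]` includes both ends, while Python's `range` excludes the upper bound. The helper `inclusive_range` is a hypothetical name introduced only for this sketch.

```python
def inclusive_range(first, last):
    # Model of the range literal [first..last], which includes `last`.
    return list(range(first, last + 1))

# [2*x for x in [1..10] if x > 3]
doubled = [2 * x for x in inclusive_range(1, 10) if x > 3]

print(doubled)   # [8, 10, 12, 14, 16, 18, 20]
```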

5. Type introspection

There is a set of builtin functions that can be called on types to obtain various meta information.

The functions are usually generic, instantiated with type arguments, and taking no real arguments.

type_name = fun<T> Void -> String { /* compiler intrinsics */ }
// Used as:
type_name<String>   // -> "String"
type_name<[Int]>   // -> "[Int64]"
String.type_name    // equivalent to type_name<String>
[Int].type_name     // ParseError - this notation works only on named types

The dot notation is syntactic sugar: a dot-call on a type is transformed into applying that type as a type argument of the function. The function can still take normal arguments:

String.myfun "hello"  // => myfun<String> "hello"

A special builtin TypeOf can be used to obtain the type of expression. It takes a single argument and returns its type. It can only be used in context where a type is expected:

String   // syntax error
TypeOf "abc"   // syntax error
(TypeOf "abc").type_name     // -> "String"
type_name<TypeOf "abc">     // -> "String"

6. Side Effects

Each function may have side effects. Writing to a disk or throwing an exception are examples of such side effects. The effects are gathered from any called functions, and the parent function is flagged. The effects are visible in the function prototype, and they can be declared also explicitly (this is needed only for native functions). The effects may be used for optimizations - a pure function can be automatically memoized, for example.

Side effects supported at the moment:

  • in, out, err - I/O streams

  • exc - Exceptions

Other side effects:

  • random - random function, the return value is not deterministically linked to parameters

  • noreturn - may not return, e.g. exec, exit

At all times, each function has three streams at disposal: in, out, err. If it touches one of these streams, it’s flagged accordingly (the effects have the same names).

The streams are always pointed somewhere. It may be the default character stream (stdout etc.), a file, a socket or even a special null stream. When a function sets the out stream to a null stream, and then calls some other function which is flagged with the out effect, the calling function is not flagged and can still be considered pure and optimized accordingly.

You can think about streams as three hidden parameters and return values. They might be returned untouched or processed in the function body and returned modified.

Another effect is exc, which allows throwing exceptions. This is basically a hidden return value. It’s implicitly handled (imagine an if condition and early return with the same hidden value), but it can also be handled explicitly by a try-catch construct.

By catching all exceptions, the exc effect is no longer propagated. Note that it’s not possible to track a set of actually thrown exceptions, so the only way to prevent automatically adding the exc effect to the calling function is to catch all possible exceptions thrown by any called function with the exc effect.

Declaring the side effects explicitly:

f = fun msg:String | out exc
{
    write msg   // this may throw
}

// type of f: String -> Void | out exc

Undeclaring the side effects (if compiler adds them, but you want to override it):

f = fun msg:String | !out !exc { write msg }
// type of f: String -> Void
f  // this call can be removed by the compiler, because it has no effect
   // according to the type of `f`

The write function will still use the out stream and possibly throw an exception. But the compiler is now free to ignore the side effects and optimize-out the f function completely, because it returns Void and does not have any (declared) side effects.
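
The way effects are gathered from called functions can be sketched as a walk over the call graph. This Python model is an illustration under stated assumptions: `declared` lists the effects of native functions, `calls` is a hypothetical call graph, and `masked` lists effects a function stops from propagating (by catching all exceptions, redirecting a stream to a null stream, or undeclaring them explicitly). None of these names come from the implementation.

```python
def infer_effects(name, declared, calls, masked=None, _path=()):
    masked = masked or {}
    if name in _path:                 # recursive call: contributes nothing new
        return set()
    effects = set(declared.get(name, ()))
    for callee in calls.get(name, ()):
        # The caller is flagged with every effect of the functions it calls.
        effects |= infer_effects(callee, declared, calls, masked, _path + (name,))
    # Effects handled or undeclared inside `name` are not propagated further.
    return effects - set(masked.get(name, ()))

declared = {"write": {"out", "exc"}}                  # native function
calls = {"log": ["write"], "quiet": ["log"], "main": ["quiet", "log"]}
masked = {"quiet": {"out", "exc"}}                    # e.g. null stream + catch-all

print(sorted(infer_effects("log", declared, calls, masked)))    # ['exc', 'out']
print(sorted(infer_effects("quiet", declared, calls, masked)))  # []
```

In this model `quiet` has no effects and could be optimized like a pure function, while `main` is still flagged because it also calls `log` directly.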

7. Modules

A top-level translation unit is called a Module. Module-level statements are either Declarations or Invocations. Declarations can be written in any order, but each name can be used only once in a scope. Named functions or expressions, type classes, and instances are all Declarations.

7.1. Invocations

Invocations are order-dependent. When the Module is executed, each Invocation is evaluated and its result is passed to the Executor, a special function (possibly hardcoded in C++). The Executor processes the result and passes another value to the next Invocation. Inside an Invocation, this previous value can be accessed under the special name _.

Given this source file:

1 + 2
3 * _

Imagine that it’s executed like this:

_0 = void
_1 = executor (fun _ { 1 + 2 } _0)
_2 = executor (fun _ { 3 * _ } _1)

The Executor can do anything with the results, for example:

  • print them to the console (i.e. just printing the program output)

  • interpret them as drawing commands (i.e. implementing something similar to PostScript)

  • test them for a condition (i.e. unit testing)

  • concatenate them as an HTTP response (i.e. Web application)

  • implement anything else that needs a sequence of records
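The chain of Invocations can be modeled in Python as follows. This is a sketch of the semantics described above, not the actual C++ interface: run_module and make_collecting_executor are hypothetical names, and Python's None stands in for void.

```python
# Sketch of the Invocation chain (illustration only). Each Invocation is
# modeled as a function of the previous value `_`; the Executor receives
# each result and returns the value passed to the next Invocation.

def run_module(invocations, executor, initial=None):
    prev = initial                        # _0 = void
    for invocation in invocations:
        prev = executor(invocation(prev))
    return prev

# The source file `1 + 2` / `3 * _` as two Invocations.
invocations = [
    lambda _: 1 + 2,
    lambda _: 3 * _,
]

def make_collecting_executor(log):
    # Hypothetical Executor: records every result and passes it on
    # unchanged, e.g. as the basis for a unit-testing Executor.
    def executor(result):
        log.append(result)
        return result
    return executor

log = []
final = run_module(invocations, make_collecting_executor(log))
```

Swapping in a different Executor (one that prints, draws, or concatenates the results) changes what the program does without changing the module source.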

7.2. Importing modules

my_mod = import "my_mod"    // import only Declarations
my_mod::func                // run function imported from module `my_mod`
my_mod                      // run all associated Invocations

  • In the last line, the whole module is executed.

  • The first Invocation from the module gets the current _ value.

  • The statement returns the result of the last Invocation in the module.

Module names must be valid function names, i.e. start with a lowercase letter.

Module import paths are configurable: by passing the -I option to the compiler, by setting them in a config file, or via the C++ interface.

All configured paths are searched in order (the order is yet to be defined), checking for the existence of a source file or a bytecode file:

  • Source file pattern: <import_path>/<requested_name>.fire

  • Bytecode file pattern: <import_path>/<requested_name>.firm

  • <import_path> is one of the paths specified by -I etc.

  • <requested_name> is the string from import statement, without quotes (it may contain slashes, e.g. "lib/mod")

  • The file extension might be configurable too, especially in the embedding scenario.

If only the source file is found, it will be compiled on-the-fly, in memory.

Bytecode cache: a directory used to store and retrieve the bytecode of the on-the-fly compiled modules.
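The lookup described above might be sketched like this in Python. The function name find_module, its parameters, and the choice to check for the bytecode file before the source file within each path are assumptions of this sketch; the document leaves the search order undefined.

```python
from pathlib import Path

def find_module(requested_name, import_paths,
                source_ext=".fire", bytecode_ext=".firm"):
    """Return (kind, path) for the first matching file, or None.

    Assumptions of this sketch: paths are searched in the given order,
    and within one path the bytecode file is checked before the source.
    """
    for base in import_paths:
        # requested_name may contain slashes, e.g. "lib/mod"
        bytecode = Path(base) / (requested_name + bytecode_ext)
        if bytecode.is_file():
            return "bytecode", bytecode
        source = Path(base) / (requested_name + source_ext)
        if source.is_file():
            return "source", source   # would be compiled on the fly
    return None
```

The extensions are parameters because the file extension might be configurable, especially when the interpreter is embedded.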

7.3. Compilation

The complete program is composed of the main source plus all imported modules, each of which is compiled into bytecode. The interpreter gathers all the modules (resolving transitive dependencies and possibly compiling some modules on-the-fly) and builds a module tree. Then it starts executing the main module:

  • Embedded interpreter: Calls a provided callback (Executor) for each Invocation and then returns the final result to the caller.

  • CLI interpreter: Prints the value from each Invocation and then prints the final result. The "print" action can be configured (e.g. null-terminate, call a program etc.)

A possible "LTO" optimization: Put all modules together and compile-in the Executor. For example, Null Executor would throw away all intermediate results from Invocations, so the related code can be thrown away, too.

8. Syntax decisions

8.1. Semicolons separate statements on one line

Decision:

  • Semicolons are used to separate statements (not terminate them).

  • A line break also separates statements, in most cases. With one statement per line, semicolons are optional.

Reasoning:

  • Mandatory semicolons would allow a slightly simpler grammar for parsing the language, but semicolon-free code is a little easier to write, and it looks cleaner: semicolons before line breaks are mostly just noise.

  • The main drawback is that when a statement spans multiple lines, it needs either a special guide (e.g. escaping newlines), or the grammar needs special rules (parenthesized expressions, continuation of an expression when a line begins or ends with an operator).

  • Example of a function call spanning multiple lines:

    // with mandatory semicolons
    some_fun 1 2 3
        b 4;
    
    // with optional semicolon, using a guide
    some_fun 1 2 3 \
        b 4
    
    // with optional semicolon, using parentheses
    (some_fun 1 2 3
        b 4)

  • Depending on how you look at the example, you may find some of the snippets more readable than others, but it is mostly a matter of taste. Note that you can always add the semicolon, even when it is optional.

  • Some languages, like Python or Haskell, use code layout (indent) to recognize continuation. This doesn’t help to make the language easier to parse either.

9. Appendix

9.1. List of keywords

catch
class
else
fun
if
import
instance
in
match
module
then
try
type
with

9.2. Operator precedence table

Table 3. Operator precedence

Precedence  Operation          Syntax

(-2)        definition         =
(-1)        condition          if …
1           comma              ,
2           logical or         ||
3           logical and        &&
4           comparison         == != <= >= < >
5           bitwise or, xor    | ^
6           bitwise and        &
7           bitwise shift      << >>
8           add, subtract      + -
9           multiply, divide   * / %
10          power              **
11          subscript          x ! y
(12)        function call      <callable> <arg>
12          dot function call  <arg> . <callable>
(14)        unary ops          - + ! ~
(15)        cast               <val> : Type

Higher precedence means tighter binding.

Infix operators have numbered precedences, which can easily be changed in the compiler implementation. The other precedences are hard-coded in the parser grammar. The function call is a hybrid: it is partially hard-coded, but it also uses the precedence parser with the same priority as the dot call, so in a combined expression, evaluation goes from left to right.
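To see how the numbered precedences group an expression, here is a small precedence-climbing sketch in Python. It is not the actual parser: it covers only the numbered infix operators from the table, omits function calls, unary operators, and casts, and treats every operator as left-associative, which is an assumption (the table does not state associativity, notably for **). The names PRECEDENCE and parenthesize are made up for this example.

```python
# Numbered infix precedences from the table (higher binds tighter).
PRECEDENCE = {
    ",": 1, "||": 2, "&&": 3,
    "==": 4, "!=": 4, "<=": 4, ">=": 4, "<": 4, ">": 4,
    "|": 5, "^": 5, "&": 6, "<<": 7, ">>": 7,
    "+": 8, "-": 8, "*": 9, "/": 9, "%": 9,
    "**": 10, "!": 11, ".": 12,
}

def parenthesize(tokens):
    """Take a flat list alternating operands and infix operators and
    return the expression as a fully parenthesized string."""
    pos = 0

    def parse(min_prec):
        nonlocal pos
        left = tokens[pos]                      # operand
        pos += 1
        while pos < len(tokens) and PRECEDENCE[tokens[pos]] >= min_prec:
            op = tokens[pos]
            pos += 1
            # Left-associative (assumption): the right-hand side may
            # only contain operators that bind tighter than `op`.
            right = parse(PRECEDENCE[op] + 1)
            left = f"({left} {op} {right})"
        return left

    return parse(1)
```

For example, parenthesize("x == y | z".split()) yields "(x == (y | z))": unlike in C, the bitwise operators here bind tighter than comparison.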

9.3. Terminology

9.3.1. Parentheses, braces, brackets

In this document and in the code, the various brackets are named as in C++: parentheses or parens (round), braces (curly), brackets (square), and angle brackets (not chevrons, because those are actually different glyphs).

Table 4. Brackets
Type  Name               Usage

{}    braces             blocks of code
()    parentheses        parenthesizing of expressions, tuples
[]    (square) brackets  lists
<>    angle brackets     type parameters