The Hello World Overview of the Rust Programming Language

With the advent of virtual machines and containerization, the neglected art of systems programming - that is, using a language that compiles to native code and runs without any sort of runtime support (as in, a managed code runtime a la the CLR or JVM) - finds itself the subject of study by a growing number of programmers. After all, when building a Docker image designed to be run in parallel in hundreds of instances, one of the goals is to minimize the actual execution footprint. Go is one such systems language that's drawing interest; another is the subject of this article: Rust.

On its home page (http://doc.rust-lang.org/book/README.html), Rust describes itself as “a systems programming language focused on three goals: safety, speed, and concurrency”; in other words, it wants to supplant C++ as the “down-to-the-metal” language of choice. As the documentation goes on to say, Rust “maintains these goals without having a garbage collector, making it a useful language for a number of use cases other languages aren't good at: embedding in other languages, programs with specific space and time requirements, and writing low-level code, like device drivers and operating systems.”

However, absent any strong desire to write your own operating system, Rust still is an interesting language to explore, largely by virtue of its approach to raw pointer semantics - because Rust doesn't run with a garbage collector, memory ownership semantics take a front seat in the language, just as they do with C or C++. However, Rust doesn't require explicit release calls (such as free or delete from C or C++), and instead uses a clever combination of syntax and semantics to allow the compiler to enforce safety without compromising speed (or relying upon a runtime, which would also potentially hurt its speed).

Enough chit-chat; let's get Rusty.

Installing

Getting Rust onto your system is pretty trivial; being a system language, Rust consists principally of a compiler (along with a build system, which you'll see shortly) and a standard library, with no other runtime footprint. This means that Rust applications, once built, are specific to the platform on which they were compiled, similar to C/C++ or other natively-compiled programs. Obtaining the compiler is a simple matter of wandering on over to the Rust website (https://www.rust-lang.org) and clicking the Install button - it uses browser magic to decide what system you're on, and sends the appropriate download your way. Barring that, go to https://www.rust-lang.org/downloads.html and choose the appropriate download for the platform on which the Rust toolchain will run.

For Windows, Rust offers two installers; one designed to work with the GNU GCC toolchain on Windows (MSYS, to be specific), and one designed to work with Visual Studio. Given that many CODE readers already have Visual Studio installed, that's probably the better choice most of the time.

Once installed, check to see that the Rust toolchain is on the PATH by running the Rust compiler, rustc, with its version flag, like this:

rustc --version

At this point, the compiler will respond (very quickly!) with its current version number; as of May 26, 2016, that version is 1.9.0. Now that you have Rust on your system, you've run rustc for the last time, and you're ready to go with the world's simplest Rust program.

Hello, Rust

That last sentence may have come across a little confusing - a compiled language like Rust should be calling the compiler all the time, so… what gives? Idiomatically, the designers of Rust decided that the build and package system would be a part of the standard toolchain, rather than a separate entity, so right from the beginning, Rust programmers are taught to make use of cargo, the Rust packaging and build system. It's what's used to create a Rust project, build it, and even run it during development.

Thus, since every new language requires programmers to write the usual homage to Kernighan and Ritchie's “Hello World” program, use Cargo to create the project directory structure and configuration files for a new Rust application called “hello_world”. This is done with:

cargo new hello_world --bin

This creates a new subdirectory called “hello_world”, and inside of it, generates a Rust project designed to compile into an executable file. (In the Cargo version described here, leaving off the “--bin” flag causes Cargo to create a library project, which by default builds a Rust library, an .rlib file, for other Rust code to link against rather than an executable.)

Inside of “hello_world” is a file, “Cargo.toml”, which acts as the project manifest file (much as a .sln or .csproj file does for Visual Studio), and a subdirectory, src, which is pretty self-explanatory. Inside of the source directory, Cargo has already scaffolded out a simple program, which looks pretty familiar to anyone comfortable with the C# syntax:

fn main() {
    println!("Hello, world!");
}

It's pretty straightforward: a main() function, which in turn makes use of what looks like a function to print to the command-line. (As it turns out, println! isn't actually a function, but a macro, which is why it has the exclamation point suffix; for the moment, however, the distinction isn't relevant.) There are semicolons, parentheses (which are for function parameters), and those good-ol' trusty curly-brackets. Rust clearly seeks to be lexically familiar to the C-family language programmer crowd.

To compile and run this, Cargo offers the run option; invoke it from the directory in which Cargo.toml is defined, like this:

cargo run

In a few seconds (literally), Rust has compiled the source and executed the resulting application. More importantly, it tells you that it's compiled the source and is running it out of target/debug, which is a new directory where the compiled artifacts end up. (Poking your head into target reveals that it contains a few intermediate files, too, much as the bin/Debug or bin/Release directories do in a .NET project.)

So far, so good. However, there's just not enough here, so let's take a pass at a slightly more complicated program, a simple guess-the-number program (which happens to be the focus of the Rust documentation's tutorial).

Guess the Number

A new program demands a new project space, so get Cargo to flesh one out:

cargo new guessing_game --bin

Pop into the generated guessing_game subdirectory, and crack open the main.rs file in an editor. Change the “Hello, world” string to “Welcome to Guess the Number” and do a quick cargo run to make sure that everything's copacetic.

For those who were unlucky enough to never play games as a child, a “guess-the-number” game works like this: one player (the computer, in this case) thinks of a random number between 1 and 100. The other player offers up guesses, to which the first player responds with “too high” if the guess was above the secret number, “too low” if the guess was too low, or “You win” if it's correct. (For most of us, this is how we learned how a binary search algorithm works.)

First of all, the program needs to be able to generate random numbers, which isn't part of the default standard library. To use the random number generator, you need to tell Rust to make use of a crate - an external library that Cargo automatically pulls down and installs for you. (I'm not sure what cargo and crates have to do with rust, but these are the terms that the language uses, so…) This requires adding it to the [dependencies] section in Cargo.toml, like so:

[package]
name = "guessing_game"
version = "0.1.0"
authors = ["Ted Neward <ted@tedneward.com>"]

[dependencies]
rand="0.3.0"

Everything but the last line will already be there from the scaffolding. The [dependencies] section, as its name implies, lists the crates on which this program depends; here, that's version 0.3.0 of the rand crate. This is much like a NuGet dependency, and just as a NuGet dependency is downloaded during a .NET build, this crate is downloaded and cached for current and future builds. The Cargo documentation talks about how to manage the semantic versioning that Rust uses for crates, for those who want to discover more, but for the most part, it behaves much like NuGet's own versioning scheme.

Next, the crate needs to be referenced from within the code (so that there's a way to refer to the contents of the crate). This is done via the extern directive, like so:

extern crate rand;

In addition, Rust offers a lexical-reference mechanism similar to C#'s using, which Rust shortens to use:

use std::cmp::Ordering;
use std::io;
use rand::Rng;

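Just as using does, use takes types or symbols from a namespace and makes them referenceable without qualification. So, for example, the trait Rng can now be used without having to fully qualify it as rand::Rng; bringing it into scope is also what makes its methods, such as gen_range, callable on the random number generator.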

As “hello_world” demonstrated, Rust begins execution in a function called main, so the next step is to use main to generate a random number (and for this tiny little program, display it, just so that there's a quick way to exit if desired). Then it asks for a guess from the user, compares it and displays the results, and loops around if it's not been guessed correctly. That looks like Listing 1.

Listing 1: Guessing random numbers

fn main() {
    println!("Guess the number");

    let secret = rand::thread_rng().gen_range(1,101);
    println!("Shhhh: secret number is {}", secret);

    loop {
        let guess = get_guess();
        if guess == -1 {
            continue;
        }
        println!("You guessed:  {}", guess);

        match guess.cmp(&secret) {
            Ordering::Less => println!("Too small!"),
            Ordering::Equal => {
                println!("CORRECT!");
                break;
            },
            Ordering::Greater => println!("Too large!")
        }
    }
}

Although the language is clearly C-influenced, a number of differences make themselves apparent right off the bat.

fn. Unlike C#, which makes use of position and parentheses to determine whether a particular construct is a method, Rust follows a more JavaScript-ish style of using a keyword, fn, to indicate a function definition. The main function here takes no parameters and yields no results, hence no decorations inside the parentheses or declared return type. (You'll see an example of doing both shortly.)
let. New variables - or, to be more precise, value bindings - are declared using let. Rust is a strongly typed language, but makes heavy use of type inference (much like C#'s var keyword) to keep code syntactically terse. In this particular case, the type of the value guess is inferred from the return type of the function get_guess (which you'll examine in a moment). However, unlike most system languages (and like most functional languages), a value binding is immutable by default - once defined, a binding cannot be changed. If the binding needs to be modifiable, add the mut (short for mutable) modifier after the let, as in let mut x = 12. Rust makes a very distinct difference between mutable and immutable values, in a manner that's reminiscent of C++'s const.
if. As is becoming more common, the if construct is the same decision-making construct found in almost every other language, but the parentheses around the tested expression are optional. Fortunately for those of us who grew up in the C/C++/Java/C# world where parentheses weren't optional, it isn't an error to include them, and in some cases, it's helpful when working with more complicated expressions.
loop. Although Rust also supports traditional while and for loops, Rust prefers that any sort of infinite loop - such as the one used here - be written using the straight loop construct. This allows Rust to generate more optimal native code, according to the documentation, than using a construct like while true. The keywords continue and break interact with loops the same way as they do in other C-based languages.
match. Like many other languages before it, Rust also supports a pattern-matching construct, which, on the surface, appears to be quite similar to the traditional C/C++/Java/C# style switch statement. The value guess is a native 32-bit integer (because that's what's returned from the get_guess() function), but supports a compare method, cmp, which takes the value to compare against (the secret value) and returns an instance of the enumerated type Ordering, which can be Less, Greater, or Equal. The match construct takes that Ordering value, and then accepts a list of possible values, each paired with the code to execute. In this case, if it's Less or Greater, the program prints the appropriate message and passively allows the loop to circle back around. If it's Equal, the program uses the break keyword to exit the loop, fall off the end of main(), and terminate.
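
The constructs described above can be seen working together in a small sketch that needs neither user input nor the rand crate. The names classify and count_up_to are hypothetical helpers invented for illustration; they are not part of the guessing game itself.

```rust
use std::cmp::Ordering;

// Hypothetical helper: turns a comparison into the game's feedback text.
fn classify(guess: i32, secret: i32) -> &'static str {
    // match is an expression, so its value becomes the function's return value.
    match guess.cmp(&secret) {
        Ordering::Less => "Too small!",
        Ordering::Equal => "CORRECT!",
        Ordering::Greater => "Too large!",
    }
}

// Hypothetical helper: counts up from zero using `loop` and `break`.
fn count_up_to(limit: i32) -> i32 {
    let mut counter = 0; // `mut` is required because the binding is reassigned
    loop {
        if counter >= limit {
            break; // leaves the loop, just as in C-family languages
        }
        counter += 1;
    }
    counter
}

fn main() {
    let secret = 42; // immutable: a later `secret = 43;` would not compile
    println!("{}", classify(10, secret));
    println!("{}", classify(42, secret));
    println!("counted to {}", count_up_to(3));
}
```

Note that the match has to cover every Ordering variant; leaving one out is a compile-time error, which is one of the ways it differs from a C-style switch.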

By the way, curious readers may notice the & in front of the use of the local value secret and wonder if that's taking the address of secret, the way the same symbol does in C/C++. It's not, but what's actually happening there is a lot more complicated than what I can discuss right this second, so hold on to that thought for a bit.

The other part of the code to examine is the get_guess function, which looks like this:

fn get_guess() -> i32 {
    println!("Please input your guess:");
    let mut guess = String::new();
    io::stdin().read_line(&mut guess).
        expect("Failed to read line");

    let guess: i32 = match guess.trim().parse() {
        Ok(num) => num,
        Err(_) => -1
    };
    guess
}

First, get_guess is declared as taking no parameters, but it returns a signed 32-bit integer (i32). Rust, being a natively compiled language, has the full range of integral and floating-point types that are common in many other system languages, using these short type names (i32, u64, and so on).
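
As a quick illustration of those type names, here is a minimal sketch. The helper fits_in_i32 is a hypothetical name introduced only for this example; the constants i32::MIN and i32::MAX come from the standard library.

```rust
// Hypothetical helper: reports whether a 64-bit value would fit in an i32.
fn fits_in_i32(value: i64) -> bool {
    value >= i64::from(i32::MIN) && value <= i64::from(i32::MAX)
}

fn main() {
    // The number in the type name is the width in bits;
    // an `i` prefix means signed, `u` means unsigned, `f` means floating point.
    let small: i8 = -128;
    let big: u64 = 18_446_744_073_709_551_615; // underscores act as digit separators
    let ratio: f64 = 0.5;

    println!("i32 range: {} to {}", i32::MIN, i32::MAX);
    println!("small = {}, big = {}, ratio = {}", small, big, ratio);
    println!("fits: {}", fits_in_i32(3_000_000_000));
}
```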

Notice that allocating a new string uses the associated function new() to create a String. (The term “associated function” is what Rust calls those things that C# refers to as static methods.) More importantly, notice that guess is annotated with the modifier mut; this makes the value a mutable value - in essence, a variable. This is what allows Rust to put whatever the user types in at the command line into the guess variable via the read_line() method. Notice as well that the &mut appears before the use of guess as the parameter in the read_line. This has everything to do with Rust's concepts of ownership, which I'll get into shortly.

In the meantime, the returned value from read_line isn't a string, but an io::Result type, and the compiler issues warnings or errors if the Result instance isn't examined. In this particular case, calling the Result's expect method examines the result, and if the Result isn't a legitimate value, Rust terminates the program and prints the passed-in message. It's one way that Rust enforces the safety that it promises.

After having read in the value, get_guess trims the String (to get rid of the line ending that read_line put into the string), then parses it into an integer value. Rust follows the more modern approach of returning a Result type, an enumeration which can have either Ok or Err as possible values, along with an associated value (what other languages call a discriminated union). Rather than doing explicit if/else comparisons, idiomatic Rust uses pattern-matching to examine the results, and here we see that the match can either be an Ok result with an associated value (num), or an Err result (whose associated value you don't care about, represented by an underscore). More importantly, the match construct, like most things in Rust, is an expression, which means that it returns a value - in the case of Ok, it will be num, and in the case of Err, it will be -1. Either of these is then stored into the newly-defined local value guess, which is declared to be an i32 type using a type annotation (the i32 following the colon right after the name). Lastly, that second guess is the value that needs to be returned, and Rust, like many new languages, treats the last expression in the function as its implicit return value.
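
The parsing step can be tried in isolation, without the console. In this sketch, parse_guess and try_parse_guess are hypothetical names; the first mirrors the -1 sentinel used by get_guess, and the second is an assumed alternative that simply passes the Result from parse along to the caller.

```rust
use std::num::ParseIntError;

// Hypothetical helper mirroring the parsing step of get_guess.
fn parse_guess(input: &str) -> i32 {
    // trim() strips surrounding whitespace, including the line ending;
    // parse() returns a Result that is either Ok(value) or Err(error).
    match input.trim().parse::<i32>() {
        Ok(num) => num,
        Err(_) => -1, // the error details are ignored, as in get_guess
    }
}

// Assumed alternative: keep the failure visible instead of folding it into -1.
fn try_parse_guess(input: &str) -> Result<i32, ParseIntError> {
    input.trim().parse::<i32>()
}

fn main() {
    println!("{}", parse_guess("42\n"));
    println!("{}", parse_guess("forty-two\n"));
    println!("{:?}", try_parse_guess("7\n"));
    println!("{}", try_parse_guess("seven\n").is_err());
}
```

One design consequence of the sentinel approach is that a user who really types -1 is indistinguishable from one who types nonsense; returning the Result avoids that ambiguity at the cost of making the caller do the match.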

Give it a cargo run, and guess the number. (It shouldn't be too hard, because you print it out right at the beginning.)
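
For a less trivial way to win, the binary-search strategy mentioned earlier can be sketched with the same match on Ordering. The function guesses_needed is a hypothetical helper that plays against a known secret, assumed to lie between 1 and 100, and reports how many guesses it took.

```rust
use std::cmp::Ordering;

// Hypothetical helper: plays the game against a known secret using binary
// search and returns the number of guesses, or None if the secret is not
// in the range 1 to 100.
fn guesses_needed(secret: i32) -> Option<u32> {
    let (mut low, mut high) = (1, 100);
    let mut count = 0;
    while low <= high {
        let guess = low + (high - low) / 2; // midpoint of the remaining range
        count += 1;
        match guess.cmp(&secret) {
            Ordering::Less => low = guess + 1,     // "too small": discard the lower half
            Ordering::Greater => high = guess - 1, // "too large": discard the upper half
            Ordering::Equal => return Some(count),
        }
    }
    None
}

fn main() {
    // Try every possible secret and report the worst case.
    let worst = (1..=100).filter_map(guesses_needed).max();
    println!("worst case: {:?}", worst);
}
```

Because each guess halves the remaining range, no secret between 1 and 100 takes more than seven guesses.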

Composite Elements

Rust isn't just a procedural language, despite what the above might imply. Rust also contains a number of higher-level constructs, similar in many ways to a traditional OO language, including a few that C# doesn't have (but should).

First on this list are tuples, which will be familiar to any developer who's explored F# or some other of the more recent languages. A tuple is effectively an unnamed collection of fields of any type, declared and defined using a parenthesized list of types (for declarations) or values (for definitions), like so:

fn get_another_guess() -> (bool, i32) {
    println!("Please input your guess:");
    let mut guess = String::new();
    io::stdin().read_line(&mut guess).
        expect("Failed to read line");

    return match guess.trim().parse() {
        Ok(num) => (true, num),
        Err(_) => (false, -1)
    };
}

Here, the function declares that it returns a two-part tuple (sometimes called a pair), the first part being a Boolean value indicating whether or not the input was successfully parsed, and the second part being the value guessed. Tuples can be of any length, but are still strongly typed, such that a (bool, i32) value is distinctly different and separate from a (i32, bool) value.
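
On the calling side, a tuple can be taken apart either by destructuring or by position. The following sketch uses a hypothetical parse_pair function, which has the same shape as get_another_guess but takes its input as a parameter so that it can run without a console.

```rust
// Hypothetical helper: same return shape as get_another_guess, minus the console input.
fn parse_pair(input: &str) -> (bool, i32) {
    match input.trim().parse() {
        Ok(num) => (true, num),
        Err(_) => (false, -1),
    }
}

fn main() {
    // Destructure the tuple into two bindings in one step.
    let (ok, value) = parse_pair("17\n");
    println!("ok = {}, value = {}", ok, value);

    // Fields can also be reached by position.
    let result = parse_pair("oops");
    println!("ok = {}, value = {}", result.0, result.1);
}
```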

Next, Rust supports traditional named structures, structs, where the fields are individually named, like so:

struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let origin = Point { x: 0, y: 0 };

    println!("The origin is at ({}, {})",
        origin.x, origin.y);
}

Structs, like every other value in Rust, default to being immutable, so that once defined (such as the origin in the above example), the contents of a given struct remain constant forever. However, you can use the mut modifier to indicate that a struct can be modified, like so:

fn main() {
    let mut point = Point { x: 0, y: 0 };
    point.x = 5;
    println!("The point is at ({}, {})",
        point.x, point.y);
}

However, one thing that Rust distinctly lacks is the ability to selectively set mutability on an individual field basis; there's no ability to “mix” a mutable field in with an immutable (default) field in the Point struct. This is not an accident or a flaw in Rust's design; it has to do with Rust's view of how mutability works. Specifically, in the documentation, Rust states, “Mutability is a property of the binding, not of the structure itself.” This means that, to Rust programmers, mutability isn't an aspect that can (or should) be defined in the structure, but in the usage of the structure. If a value needs to be mutable, it's declared as such (using mut), and otherwise, it remains immutable. This is part of how Rust avoids some of the “const-correctness” mayhem that C++ wandered into.
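
A short sketch may make the "property of the binding" idea more concrete: the same Point value is first held by an immutable binding and then moved into a mutable one. The function shifted_right is a hypothetical helper added for illustration.

```rust
struct Point {
    x: i32,
    y: i32,
}

// Hypothetical helper: receives the value through a mutable parameter binding.
fn shifted_right(mut p: Point) -> Point {
    p.x += 1;
    p
}

fn main() {
    let fixed = Point { x: 0, y: 0 };
    // fixed.x = 5; // would not compile: `fixed` is not declared `mut`

    // Moving the same value into a `mut` binding makes every field writable.
    let mut movable = fixed;
    movable.x = 5;
    movable.y = 7;

    let moved = shifted_right(movable);
    println!("({}, {})", moved.x, moved.y);
}
```

Nothing about the Point type changed between the two bindings; only the declaration of the name holding it did.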

Rust also combines tuples and structs together into an interesting new type, what it calls tuple structs, a named type that has unnamed fields within it:

struct Color(i32, i32, i32);
struct Point(i32, i32, i32);

let black = Color(0, 0, 0);
let origin = Point(0, 0, 0);

In other languages, this is sometimes called a “case class.” Were these two values tuples, they would be equivalent, as they are each made up of three i32 types. Instead, because they are tuple structs, they are nominally-typed (meaning they are named), and therefore they are not equivalent, despite the contents being bit-equivalent (all zeroes).
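
The practical effect of nominal typing shows up at function boundaries. In this sketch, brightness is a hypothetical function that accepts only a Color; passing a Point with identical contents is rejected by the compiler.

```rust
struct Color(i32, i32, i32);
struct Point(i32, i32, i32);

// Hypothetical helper: only a Color is accepted here, never a Point.
fn brightness(c: &Color) -> i32 {
    // Fields of a tuple struct are accessed by position.
    c.0 + c.1 + c.2
}

fn main() {
    let black = Color(0, 0, 0);
    let origin = Point(0, 0, 0);

    println!("brightness = {}", brightness(&black));
    // brightness(&origin); // would not compile: a Point is not a Color
    println!("origin.0 = {}", origin.0);
}
```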

Rust also permits the definition of methods on a specific struct, like so:

#[derive(Debug)]
struct Point {
    x: i32,
    y: i32
}

impl Point {
    fn offset(&self, dx: i32, dy: i32) -> Point {
        Point { x: self.x + dx, y: self.y + dy }
    }
}

This provides methods in the same way that C++/Java/C# provide methods on objects:

let origin = Point { x: 0, y: 0 };
let other = origin.offset(2, 2);
println!("origin = {:?} and other = {:?}",
    origin, other);

The line above the Point struct is an attribute (in the same spirit as a C# attribute), and it asks the compiler to generate an implementation of the trait called Debug for Point. Traits in Rust are more like interfaces in that they can declare methods that must be implemented by anything that implements them, and in this particular case, Debug is a trait that allows Rust to print a “debug view” of the value, such as when println! is used with the {:?} placeholder. (This is usually sufficient for any non-user-facing output.) Rust supports a number of other attributes, to support conditional compilation as one example, or to mark methods that are test methods, similar to how MSTest or XUnit use .NET attributes.
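
To see what the derived Debug implementation produces, the pieces above can be assembled into one runnable sketch. The only addition is the {:#?} placeholder, which is the standard library's pretty-printing variant of {:?}.

```rust
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

impl Point {
    // Returns a new Point; the original is only borrowed through &self.
    fn offset(&self, dx: i32, dy: i32) -> Point {
        Point { x: self.x + dx, y: self.y + dy }
    }
}

fn main() {
    let origin = Point { x: 0, y: 0 };
    let other = origin.offset(2, 2);

    // {:?} asks for the Debug representation generated by the derive.
    println!("{:?}", other); // prints "Point { x: 2, y: 2 }"
    // {:#?} is the multi-line, pretty-printed variant.
    println!("{:#?}", origin);
}
```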

Traits are, as mentioned, similar to .NET interfaces, and are declared using the trait keyword:

trait HasArea {
    fn area(&self) -> f64;
}

Then, if there is a type to which the trait should apply, like a Circle (which, of course, has an area):

struct Circle {
    x: f64,
    y: f64,
    radius: f64,
}

Then the Rust programmer can create an implementation of HasArea::area for a Circle like so:

impl HasArea for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * (self.radius * self.radius)
    }
}

Unlike the .NET inheritance syntax, note that the trait being implemented comes before the type on which it is being implemented; this takes a few tries to get correct when first working with Rust.
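
Where traits start to pay off is in code written against the trait rather than a concrete type. The sketch below is an illustration under a few assumptions: Circle is trimmed to just its radius, and Square and describe are hypothetical names added to show one trait with two implementers and a generic function that accepts either.

```rust
trait HasArea {
    fn area(&self) -> f64;
}

// Simplified from the Circle above: only the radius matters for the area.
struct Circle {
    radius: f64,
}

// Hypothetical second type, added to show one trait with two implementers.
struct Square {
    side: f64,
}

impl HasArea for Circle {
    fn area(&self) -> f64 {
        std::f64::consts::PI * (self.radius * self.radius)
    }
}

impl HasArea for Square {
    fn area(&self) -> f64 {
        self.side * self.side
    }
}

// Generic over any type T that implements HasArea.
fn describe<T: HasArea>(shape: &T) -> String {
    format!("area = {:.2}", shape.area())
}

fn main() {
    println!("{}", describe(&Circle { radius: 1.0 }));
    println!("{}", describe(&Square { side: 2.0 }));
}
```

The T: HasArea bound plays roughly the role of a where T : IHasArea constraint on a C# generic method.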

There's certainly much more to the Rust language (including, among others, features like generics, operator overloading, or the Drop trait, which defines what should happen when a struct goes out of scope, similar in concept to how a C++ destructor works), but it's time to examine one of Rust's more interesting and important notions: Ownership.

Ownership

One of the principal pain points of C++ (and its predecessor, C) is that of ownership semantics: Who, exactly, is responsible for the release of a particular object or other block of memory? When using stack-allocated variables, this is trivial, because when the code block terminates, anything declared locally on the stack in that block is released, but when using dynamically-allocated entities, this becomes a huge issue. Various platforms have approached this in different ways. Greybeards will recall the days of Microsoft COM and its ownership semantic: reference counting, manipulated through AddRef and Release. Java and C# choose to use a garbage collector, which tracks object “reachability” and releases objects when they're no longer reachable from a root set of references. Classic C/C++ offers the least in the way of assistance, essentially leaving it entirely up to the developer to devise their own rules or scheme; more modern C++ libraries have begun to offer “smart pointer” semantics to allow the compiler to assist in various ways (using the C++11 shared_ptr and its kin). Rust takes yet another approach, one that's different from all of these (and therefore awkward to work with at first) and a little more abstract.

First, Rust dictates that a specified binding (value declaration) can only be consumed once. To understand what that means, let's start with a very simple declaration:

fn foo() {
    let v = vec![1, 2, 3];
}

Here, foo() creates a local value v of a vector type, essentially a resizable array, allocated on the heap. When foo() returns, Rust de-allocates everything, including the dynamically allocated vector. This is typical Rust.
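
That scope-based clean-up can be observed directly with the Drop trait mentioned earlier. In this sketch, Tracked and drop_order are hypothetical names; each Tracked value writes its name to a shared log at the moment Rust releases it.

```rust
use std::cell::RefCell;

// Hypothetical type that records its own clean-up in a shared log.
struct Tracked<'a> {
    name: &'static str,
    log: &'a RefCell<Vec<&'static str>>,
}

impl<'a> Drop for Tracked<'a> {
    // Called automatically when a Tracked value goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

// Returns the names in the order in which the values were dropped.
fn drop_order() -> Vec<&'static str> {
    let log = RefCell::new(Vec::new());
    {
        let _first = Tracked { name: "first", log: &log };
        let _second = Tracked { name: "second", log: &log };
        // Both values go out of scope at the closing brace below.
    }
    log.into_inner()
}

fn main() {
    println!("{:?}", drop_order());
}
```

Local values are released in the reverse of the order in which they were declared, so "second" is logged before "first", much as C++ destructors run for stack objects.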

In some cases, you may want to have a second reference to the same vector, which is trivial to write as well:

fn foo() {
    let v = vec![1, 2, 3];
    let v2 = v;
}

However, if what I said earlier holds, theoretically, when foo() exits, Rust should try to deallocate this vector twice: once for v, and a second time for v2. This kind of double-deallocation bug usually leads to ugly results in C++ code, which is why this kind of “aliasing” is frowned upon in other languages.

Rust takes a different approach: a given value can only be “consumed” (used) once. If code tries to use v after creating a second binding to the same vector, Rust complains:

fn foo() {
    let v = vec![1, 2, 3];
    let v2 = v;
    println!("v is {:?}", v);
        // error: use of moved value: `v`
}

By creating the second binding to the vector, Rust has essentially moved ownership of the vector to the second binding, v2, and v is no longer a viable path to that vector. That way, when foo() returns, only the one that has ownership of the vector - v2, in this case - needs to clean up the vector upon exit, thus avoiding a double-deallocation problem.
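
The flip side is that the new owner works exactly as expected, and when two independent vectors really are wanted, the standard clone() method makes an explicit copy. Both helper functions in this sketch are hypothetical names used only for illustration.

```rust
// Hypothetical helper: moves the vector, then reports through the new owner.
fn move_then_len() -> usize {
    let v = vec![1, 2, 3];
    let v2 = v; // ownership moves to v2; v can no longer be used
    v2.len()
}

// Hypothetical helper: clone() makes an independent copy, so both stay usable.
fn clone_then_lens() -> (usize, usize) {
    let v = vec![1, 2, 3];
    let mut v2 = v.clone();
    v2.push(4); // changes only the copy
    (v.len(), v2.len())
}

fn main() {
    println!("{}", move_then_len());
    println!("{:?}", clone_then_lens());
}
```

A clone allocates and copies the whole vector, which is why Rust makes it something you ask for rather than something that happens silently on assignment.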

As it turns out, this concept of move semantics happens any time a value is passed to a function, as well:

fn take(v: Vec<i32>) {
    // what happens here isn't important.
}

let v = vec![1, 2, 3];

take(v);

println!("v[0] is: {}", v[0]);
    // error: use of moved value: `v`

Again, it's the same basic scenario: by passing v into the function, Rust has determined that the vector that v pointed to is now owned by the function take, and therefore the vector will be deallocated when take() returns.

This seems extreme. Fortunately, Rust offers a few options to help mitigate some of the more draconian implications of this, and, more importantly, bring Rust in line with the expectations you might have when coming from other languages.

Copy

First, Rust allows developers to define types that permit copying (making a complete clone of the original) by applying a trait called Copy to the type. This is what Rust does for all of its primitive types (a la i32), so that developers can write:

// double() is an assumed helper that returns twice its argument
fn double(x: i32) -> i32 { x * 2 }

let a = 5;
let b: i32 = double(a);
println!("a is {} and b is {}", a, b);

There's no surprise in that bit of code. However, not every type is suitable for copying, and it wouldn't exactly work for passing values in to functions. Certainly, functions could always be written to hand back ownership of the values they operate on, but that would get tedious pretty quickly.
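
Both ideas from this section can be sketched side by side: opting a small type of your own into copy semantics with a derive, and the more tedious hand-it-back style for a type such as Vec that can't be Copy. The names Point, mirrored, and total are hypothetical and exist only for this example.

```rust
// Deriving Copy (which also requires Clone) opts a simple type into copy semantics.
#[derive(Debug, Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

// Takes a Point by value; because Point is Copy, the caller keeps its own.
fn mirrored(p: Point) -> Point {
    Point { x: -p.x, y: -p.y }
}

// A Vec is not Copy, so this function takes ownership and hands the vector
// back alongside the result it computed.
fn total(v: Vec<i32>) -> (Vec<i32>, i32) {
    let sum: i32 = v.iter().sum();
    (v, sum)
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let q = mirrored(p);
    println!("{:?} {:?}", p, q); // p is still usable after the call

    let v = vec![1, 2, 3];
    let (v, sum) = total(v); // rebind v to get ownership back
    println!("{:?} sums to {}", v, sum);
}
```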

Borrowing

Hand-in-hand with the concept of ownership is that of borrowing, meaning letting another entity make use of something you own for a while before handing it back. This is what the & syntax seen earlier does: in get_guess, for example, read_line(&mut guess) borrows the string, modifies its contents, and then hands it back to its original owner when the function returns.

For example, create a function that adds an element to a vector like this:

fn add_one(v: &mut Vec<i32>) {
    v.push(12);
}

This could then be called by taking a mutable reference to the vector, like so:

let mut v = vec![1, 2, 3];
println!("v = {:?}", v);
add_one(&mut v);
println!("v = {:?}", v);

Upon return, the vector referenced by v now has four elements in it.

Most of the time, however, Rust code wants to borrow a reference without mutating it, in which case the mut is left out of both the parameter type and the call:

fn print_vec(v: &Vec<i32>) {
    println!("vector = {:?}", v);
}
fn main() {
    let v = vec![1, 2, 3];
    print_vec(&v);
        // prints "vector = [1, 2, 3]"
}

Most of the time, Rust programmers borrow references to values, except for those specific cases where copy or move semantics are desired. More details (including how this actually looks under the hood) can be found in the Rust documentation.
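
As a closing sketch of the borrowing rules, the following combines shared and mutable borrows of one vector. The function largest is a hypothetical helper; it takes a slice (&[i32]), which is an assumption about style rather than something the text above requires, because a slice parameter accepts a borrowed Vec<i32> as well as an array.

```rust
// Shared borrow: reads the elements without taking ownership.
fn largest(values: &[i32]) -> Option<i32> {
    values.iter().copied().max()
}

// Mutable borrow: may change the vector, but still does not own it.
fn add_one(v: &mut Vec<i32>) {
    v.push(12);
}

fn main() {
    let mut v = vec![1, 2, 3];

    // Any number of shared borrows may coexist.
    let first = &v;
    let second = &v;
    println!("{:?} {:?}", first, second);

    // A mutable borrow must be the only borrow while it is in use,
    // so `first` and `second` cannot be used again after this call.
    add_one(&mut v);

    // Ownership never left main, so v is still usable here.
    println!("largest = {:?}", largest(&v));
}
```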

Wrapping Up

So where does Rust fit into the overall picture of the spectrum of programming languages? Rust, certainly, sees itself as part of the lower end of the spectrum, where it sits close to the hardware and doesn't try to abstract away the underlying system the same way a managed runtime (like the CLR or JVM) seeks to do. Clearly this is a language for those who enjoy feeling the bits and bytes between their toes when they dip into the programming pool.

But the system nature of Rust also has some interesting side benefits - for starters, a Rust binary is a standalone, completely independent executable. No additional runtime is required, making it perfect for the new wave of container-based development (such as Docker) that currently sweeps the fancy of CTOs and CEOs everywhere. And Rust certainly has the community-backed ecosystem support that building non-trivial programs requires - although getting used to Rust's ownership semantics can be a bit tricky at first, literally nothing holds a Rust developer back from building an HTTP-based Web API. In fact, depending on the problem in question, it can be quicker to develop (and faster to run!) than a corresponding ASP.NET WebAPI project would be.

Rust will probably not take over the .NET development world any time soon - it's simply a too-different solution from what .NET does to seriously consider the two to be direct competitors. However, if you're thinking about .NET native compilation, which will also create a standalone, self-contained executable, and particularly if the project requires avoiding the non-deterministic time hits that a garbage collector can introduce, then Rust should definitely be in your toolbox.
