Looking at Zig

I finally took some time this weekend to look at Zig, and I am very impressed. It’s a fairly low-level language, but could be appropriate for some performance-critical data science use cases. In my one initial test, it ran twice as fast as Rust, which in turn was about 50% faster than Go. It also appeared to be faster than C, which I find puzzling.

To explore the language, I rewrote a naive and computationally intensive brute-force solution to day 5 of last year’s Advent of Code. My unsophisticated solution took 5:40 in Go for both parts, fast enough that I didn’t bother finding a more streamlined solution (which would have been necessary in Python). For comparison, I also rewrote the same solution in Rust and C.

More on the performance comparisons in a future post. Here, I want to point out some observations on the language, and some pointers on how to use it. I’m not an experienced Zig user, so bear with me if you are.

Starting a project

The Zig ecosystem can be installed from your package manager, e.g., brew install zig on the Mac or pacman -S zig on Arch Linux. You will also want to install zls, the Zig Language Server, for IDE support.

To create a new project, create a new folder, enter it and run zig init. This creates a src folder with a couple of Zig files in it, plus two build files, build.zig and build.zig.zon, which control the build process.

To compile, run zig build. The binary is created in ./zig-out/bin and you can run it directly, e.g., zig-out/bin/test for a project named test. When you’re ready, you can compile with full optimizations by typing zig build -Doptimize=ReleaseFast, and it is true to this promise, as we’ll see.

The language

The Zig language looks a lot like C, but has more safety checks, and its numeric type names look a lot like Rust’s, e.g., i32 and i64 for integers.

The big difference is that Zig is very picky about memory: you have to allocate memory manually and free it yourself, or the allocator will report a lot of leak errors when the program finishes. Go uses garbage collection to manage memory automatically. Rust uses a “borrow checker” that forces you to keep track of which variable currently “owns” each value, but then deallocates it for you automatically. Zig does neither: it requires you to allocate memory and free it when you’re done. I found this very tedious, but it gets easier over time (as did the initially horrendous borrow checking in Rust).

For example:

//  Get a memory allocator
var gpa = std.heap.GeneralPurposeAllocator(.{}){};
const allocator = gpa.allocator();
defer _ = gpa.deinit();

// Create a vector for numbers, add a number to it
var nums = std.ArrayList(i64).init(allocator);
try nums.append(5);

// Create a copy of a string
const my_name = "jabba the hut";
const name = try allocator.dupe(u8, my_name);

// When finished, free up the list and string
nums.deinit();
allocator.free(name);
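
To see what happens when a free is forgotten, here is a minimal sketch, assuming the Zig 0.13 standard library and a debug build. The helper runAndCheck is hypothetical (not from my AoC solution); it relies on gpa.deinit() returning either .ok or .leak.

```zig
const std = @import("std");

// Hypothetical helper: allocates a copy of a string, optionally "forgets"
// to free it, and returns what the allocator reports when it shuts down.
fn runAndCheck(forget_to_free: bool) !std.heap.Check {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();

    const copy = try allocator.dupe(u8, "jabba the hut");
    if (!forget_to_free) allocator.free(copy);

    // In a debug build, deinit checks for memory that was never freed
    return gpa.deinit();
}

pub fn main() !void {
    const clean = try runAndCheck(false);
    std.debug.print("freed:     {s}\n", .{@tagName(clean)});

    const leaky = try runAndCheck(true);
    std.debug.print("forgotten: {s}\n", .{@tagName(leaky)});
}
```

In the leaking case the allocator also logs an error to stderr showing where the memory was allocated, which is the wall of errors mentioned above. With ReleaseFast the safety checks are off by default, so I would not expect the leak to be reported there.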

You can free things up when they go out of scope by using defer as above, but you have to be careful that the memory is no longer in use: the compiler will let you free memory that is still referenced elsewhere (e.g., a value returned from a function call). This led to some very time-consuming debugging.
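
A minimal sketch of that pitfall and one way around it, assuming the Zig 0.13 standard library. The function upperCopy is a hypothetical helper, not code from my AoC solution. The idea is that a function returning allocated memory does not defer the free itself; the caller does.

```zig
const std = @import("std");

// Hypothetical helper: returns a heap-allocated upper-case copy of `name`.
// The caller owns the result and is responsible for freeing it.
fn upperCopy(allocator: std.mem.Allocator, name: []const u8) ![]u8 {
    const buf = try allocator.alloc(u8, name.len);
    // No `defer allocator.free(buf);` here. It would compile, but the
    // buffer would be freed on return and the caller would receive a
    // slice pointing at freed memory.
    for (name, 0..) |c, i| {
        buf[i] = std.ascii.toUpper(c);
    }
    return buf;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const shout = try upperCopy(allocator, "jabba");
    // The caller defers the free, after its last use of the value
    defer allocator.free(shout);

    std.debug.print("{s}\n", .{shout});
}
```

The convention is simply that whoever ends up holding the value frees it. Nothing in the compiler enforces this, so it helps to say in a comment who owns the returned memory.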

As noted, Zig has some useful data structures which are missing from C, but are present in every modern language, such as vectors and hash maps. Everything in the standard library is made available by importing one file, and is accessed by prefixing with std. as shown here:

// At the beginning of every program
const std = @import("std");

// Use stdout for writing formatted output, note the arguments have to be 
// in a .{} list, don't forget the period
const stdout = std.io.getStdOut().writer();
try stdout.print("A number {d} and a string {s}\n", .{33, "hello"});

// Read a file into memory, automatically allocating space for it, up to 
// the size given, fails if the file is too big. Any data must be copied 
// if you want to use it after the buffer is freed when the function ends.
const data = try std.fs.cwd().readFileAlloc(allocator, "input.txt", 10000);
defer allocator.free(data);
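
Hash maps, mentioned above, follow the same allocator pattern as lists. Here is a minimal sketch, assuming std.AutoHashMap as it exists in Zig 0.13. The function countValues and the sample data are hypothetical, made up for illustration.

```zig
const std = @import("std");

// Hypothetical helper: counts how often each value occurs.
// The caller owns the returned map and must call deinit on it.
fn countValues(allocator: std.mem.Allocator, values: []const i64) !std.AutoHashMap(i64, u32) {
    var counts = std.AutoHashMap(i64, u32).init(allocator);
    // Release the map if one of the insertions below fails
    errdefer counts.deinit();

    for (values) |n| {
        // getOrPut returns a pointer to the slot and says whether it existed
        const entry = try counts.getOrPut(n);
        if (!entry.found_existing) entry.value_ptr.* = 0;
        entry.value_ptr.* += 1;
    }
    return counts;
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    const samples = [_]i64{ 5, 7, 5, 9, 5 };
    var counts = try countValues(allocator, &samples);
    defer counts.deinit();

    // get returns an optional, which is null when the key is absent
    if (counts.get(5)) |n| {
        std.debug.print("5 appears {d} times\n", .{n});
    }
    std.debug.print("8 present: {}\n", .{counts.contains(8)});
}
```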

Functions are central as you would expect, and programs are driven by a main function which calls other functions that take and return values:

pub fn main() !void {
    // stdout has to be defined before it can be used
    const stdout = std.io.getStdOut().writer();
    try stdout.print("{d} doubled is {d}\n", .{12, double(12)});
}

fn double(n: i32) i32 {
    return n * 2;
}

You will notice that some function calls start with try and some function return types start with !. This is because they might fail, and Zig’s error handling is based on return values that might be errors, much like in Rust. In Zig, try executes the function call and, if the call fails, immediately returns the error to the caller of the current function. The ! before the return type indicates that the function itself might return an error instead of a value. It’s simple and quite elegant.
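
A short sketch of how try, ! and catch fit together. The error set ParseError and the functions parseDigit and doubleDigit are hypothetical names made up for this example.

```zig
const std = @import("std");

// Hypothetical error set for this sketch
const ParseError = error{ Empty, NotADigit };

// Returns either a ParseError or a u8
fn parseDigit(text: []const u8) ParseError!u8 {
    if (text.len == 0) return error.Empty;
    const c = text[0];
    if (c < '0' or c > '9') return error.NotADigit;
    return c - '0';
}

// `try` hands any error from parseDigit back to our own caller,
// so this function's return type needs the `!` as well
fn doubleDigit(text: []const u8) !u8 {
    const d = try parseDigit(text);
    return d * 2;
}

pub fn main() void {
    // `catch` handles the error here instead of passing it on,
    // which is why main itself does not need `!`
    const a = doubleDigit("7") catch 0;
    const b = doubleDigit("x") catch 0;
    std.debug.print("{d} {d}\n", .{ a, b });

    // A switch on the error value distinguishes the failure cases
    if (parseDigit("")) |d| {
        std.debug.print("digit {d}\n", .{d});
    } else |err| switch (err) {
        error.Empty => std.debug.print("empty input\n", .{}),
        error.NotADigit => std.debug.print("not a digit\n", .{}),
    }
}
```

This prints 14 0 followed by empty input. The switch has to cover every error in ParseError, so adding a new error to the set makes the compiler point out each place that needs updating.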

If you’re interested, have a look at the AoC example, the documentation, and the brief but excellent Zig by Example. There are currently no books about Zig, but that will change.

2024-10-13

https://fastdatascience.io/post/2024-10-13-look_at_zig/ Andreas Kaempf

FastDataScience.io
