What I learned during Advent of Code 2023

December 27, 2023

Advent of Code is an Advent calendar of small programming puzzles. I participated in this year's edition, finishing it for the second time in row. The puzzles of all editions are always accessible.

The principle is to read the problem, get a puzzle input (more or less tailored to your account), process it anyway you like (with code you wrote, most of the time), and then put the (short) result in the website for it to validate your answer.

I wrote solutions in Rust again this year, so let's see what I learned about the language, trying to reveal as little as possible from this year's problems.

Rust

Tuples

In Rust, tuples are comparable, hashable, and have default values. This is very handy to sort a list of tuples, or initialize them with Default::default(). This can be handy in order to write shorter code.

Iterators

I started this year with an idea of solving the puzzles (that could) in a single pass. But I still want the parsing to be separate from the algorithmic resolution. For that, instead of parsing the input data into a Vec, the parsing function would return an iterator. This is what I put in my template:

type ParsedItem = u8;
fn parse(input: &str) -> impl Iterator<Item = ParsedItem> + Clone + '_ {
    input.lines().map(|x| x.parse().expect("not int"))
}

The idea would be to change the type of the parsed item, and edit the parser to return this type.

Note: the Iterator has to be Clone because I'm using it twice, once for part 1, and another for part 2. It has a lifetime because it takes ownership of the input, which is an &str with the same lifetime.

I'm aware that this means that the parsing would happen twice, once for each part, sort of defeating the point of a "single pass".

For days 1, 2, 4, 7, 9, 12, 13, 15, 18, 19 and 22, this approach worked well. But for days 5, 6, 8 and 20 I removed the iterator approach from parsing entirely. For days 3, 24 and 25, I wrote the parsing as an iterator, and then immediately used .collect() to put it in a collection data structure. For grid problems, I initially did the same: iterator, then collect (10, 11, 14); but caught on and removed the iterator as well for days 16, 17, 21 and 23.

What this shows is that there is no magic solution, it always depends on the problem. I think I learned enough about iterators, so that next time I might go back to a simpler approach.

Enums vs raw data comparison

Last year, in my solutions everything was parsed to an abstract representation and had a given data type. This year I opted for a simpler approach, mostly for speed of writing: when needed, compare a char directly in the code. It makes it less abstract, but the code is mostly write-only.

This contrasts with the Iterator approach: as I master more Rust features, I use them a bit less, only when I deem necessary. I used enum representation for 4 out of 25 days.

u8 vs char

In Rust, char represents an unicode character, the equivalent of a rune in go or wchar_t in C. But during AoC, all input is purely ASCII, so this is overkill to use a 4 bytes type to represent 1 byte of data. I started the month using Vec<char>, for grid rows, but finished the month mostly using Vec<u8>. ASCII u8 literals are possible by prefixing b to a single-quoted character: b'@'.

Borrow checker

I'm getting much better at guessing when something would or wouldn't trip the borrow checker (except maybe day 20 where I was a bit too optimistic). So often I won't use iter(), but use indexed accesses if I need to modify multiple arbitrary elements during an iteration.

I'll often take shortcuts too, and using multiple cloned String instead of &str.

When needing to add memoization to a recursive algorithm, I noticed again that one couldn't use the Entry api of HashMap, because it borrows from the HashMap.

usize-only indexing is still annoying

Not really a new learning, but it is still as annoying as before than one cannot index slices with anything other integer than usize. It makes the type pervasive when wanting to write code quickly. To prevent repetition of casting, I might often have two variables with the same data, one of which is casted to usize.

I think the rust compiler could allow indexing with any unsigned type smaller than the current platform's usize without any loss of correctness. I realize this is not as trivial because of the way slices implement the Index trait, but this would improve ergonomics. It could even add additional bound checking when indexing with signed negative numbers, but I realize this would go against "zero cost abstractions".

Z3

As a given problem was quite complex, I had to resort to using a solver. I chose one the most used nowadays, z3, and its library's Rust bindings. Since it depends on a C++ library, it a bit challenging.

Despite what the doc said, there were no examples (I'm not the only one who noticed). Luckily, another crate, z3d had examples for its uses that also included the z3 equivalent code.

It did not build on Fedora because the published crate did not use pkg-config to find z3.h, but a hardcoded path instead. It provided environment variables to configure this, but this isn't exactly the most portable way to build a dependency, since I wanted the project to build in the (ubuntu-based) github action CI, as well as my Fedora laptop.

The crate z3-sys (low-level bindings) has an embedded version of the z3 library that it could use instead with a feature flag, but this is a big project, and it takes a bit long to build on my laptop. In the end, I moved to the git version of z3 which added pkg-config support, and it works flawlessly on both my laptop and the CI.

I was also a bit annoyed by the ergonomics of using z3 directly; I saw that z3d provides an additional abstraction layer to make it a bit more ergonomic, but I did not try it.

ints helper

One of AoC's often recommended helper to have in your "toolbox", is a function that parses multiple integers. I finally wrote one (it returns an Iterator), and while it had a rough start, I still used it for 5 different problems.

I also finally wrote common LCM and GCD code for reuse.

Small things

I usually comment the debug statements after finishing. For debug helpers, I now start them with an _ (underscore) so that it won't show an unused code warning anymore.

Incremental debugging: just like LSPs, running the program in a loop in an another window to see the output of debug statement during parsing is very useful.

Algorithms and general tricks

Grid iteration

When working with grids, one should pick a coordinate system, and stick with it all the time. I use mostly (x, y); but (row, column) makes a lot of sense, especially for grid indexing.

I have become quite proficient at grid iteration now. The main tricks I use are lifted from watching Johnathan Paulson's real-time solves. The first is to just iterate over an inline array of offsets:

for dir in [(0, -1), (1, 0), (0, 1), (-1, 0)].into_iter()

And then for each, just check if they go over the map boundaries.

But that's just the beginning: if you need to remember in which direction you're heading, just use a number from 0 to 3, making them consecutive, clockwise for example: North = 0, East = 1, South, = 2, West = 3. Then you can use very simple map operations to change directions:

  • to rotate 90° clockwise, just add 1 (and then modulo 4). Add 3 (and then module 4) for -90° (or 270°).
  • to go in the opposite direction, add +2 (and then modulo 4).

You can also use the direction as an index into the array I showed in the above example. This modeling greatly simplifies code for grid problems.

Tortoise and Hare: not this year

A few times this year, detecting a cycle was necessary, and I thought I could reuse something I learned from last year (and the code I wrote). But this wasn't necessary.

Floyd's tortoise and hare algorithm is very useful to detect when a cycle has happened in a series of comparable things. It can return the start of the first cycle, and its period. But it's only really useful when there are hidden dependencies between different states of the series. If a given value always gives the same next state, then using Tortoise and Hare is overkill, and a simple set is sufficient.

Transposition

At least one time, I wrote quite complex code that could have been greatly simplified using transposition. No need to make an algorithm work in all four directions, if a simple transposition can make you apply it to a different direction.

To transpose a grid, iterate over columns first instead of rows to generate another grid. Then you can transpose again to get the original grid back.

This can be combined with horizontal or vertical flipping in order to do a rotation. All of this was explained in HyperNeutrino's solution explanation for a given day (contains spoilers).

Shoelace formula and Pick's theorem

These two showed up twice. The first time they weren't mandatory (ray casting was sufficient), but as I looked at other people's solution, I noticed how elegant it was; so I used it for the other day, when it was finally mandatory.

The shoelace formula can give an arbitrary polygon's area from its vertices coordinates. Pick's theorem gives the area of a polygon with integer coordinates. It's interesting because its formula uses the number of the points with integer coordinates.

So if one wants to count the points inside a polygon (with integer coordinates), the idea is to first compute the area A of the polygon with the shoelace formula, then use Pick's theorem to get the number i of points inside:

i = A - b / 2 + 1

b here is the number of points with integer coordinates on the polygon edge. So on a segment with integer bounds, all integer points on it should also be counted.

In addition, when working on grid, the area from the shoelace formula should include the points on the edge as well.

Gaussian Elimination

This one I put here, but did not learn. That's where I decided to use z3 instead. But I'm now aware that in order to solve multiple linear equations, there exists Gaussian Elimination, an algorithmic approach using matrices.

Visualization and input analysis

For Advent of Code, it's not required to write a general solution to every problem. Only one that solves it for the current input. Some problems might seem intractable in the general case, but after analyzing the input, it might in fact be specially crafted to have a simple solution, or other properties.

So for this edition, I learned to use the graphviz's dot language in order to generate graphs. Those can also be used for debugging graphs much more visually than ASCII text.

This article →