Skip to main content

Posts about rust

Packing bits with Rust & Ruby

The missing C of CHD

One element of the CHD (compress-hash-displace) algorithm that I didn't implement in my previous post was the "compress" part.

This algorithm generates an auxiliary table of seeds that are used to prevent hash collisions in the data set. These seeds need to be encoded somehow and transmitted along with the rest of the data in order to perform lookups later on. The number of seeds (called r in the algorithm) here is usually proportional to the number of elements in the input. Having a larger r means that it's easier to compute seeds that avoid collisions, and therefore faster to compute the perfect hash. Reducing r results in a more compact data structure at the expense of more compute up-front.

Packing seeds

Seeds are generally tried starting from 0, and typically don't end up being very large. Encoding these values as a basic array of 8/16/32-bit integers is a waste of space.

lots of zeros

I wanted to improve on my implementation of efficient encoding of hashes by doing some simple bit packing of the seeds.

The basic idea is that for a set of integers, we find the maximum value, and therefore the maximum number of bits (b) needed to represent that value. We can then encode all the integers using b bits instead of a fixed number of bits.

less zeros

There's a Rust crate bitpacking that does exactly this! And it runs super duper fast, assuming that you can arrange your data into groups of 32/128/256 integers. The API is really simple to use as well:

use bitpacking::{BitPacker, BitPacker4x};

fn main() {
    let data: Vec<u32> = (0..128).map(|i| i % 8).collect();
    let packer = BitPacker4x::new();
    let num_bits = packer.num_bits(&data);
    let mut compressed = vec![0u8; 4 * BitPacker4x::BLOCK_LEN];
    let len = packer.compress(&data, compressed.as_mut_slice(), num_bits);
    compressed.truncate(len);

    println!("Compressed data: {:?}", compressed);
}

Bridging the gap between Rust & Ruby

I wanted to use this from Ruby code though...time to bust out magnus!

Magnus is a crate which makes it really easy to write Ruby extensions using Rust. It takes care of most of the heavy lifting of converting to/from Ruby & Rust types.

#[magnus::wrap(class="BitPacking::BitPacker4x")]
struct BitPacker4x(bitpacking::BitPacker4x)

impl BitPacker4x {
  // ...
  fn compress(
      ruby: &Ruby,
      rb_self: &Self,
      decompressed: Vec<u32>,
      num_bits: u8,
  ) -> RString {
      let mut compressed = vec![0u8; 4 * Self::BLOCK_LEN];
      let len = rb_self.0 // refers to underlying bitpacking::BitPacker4x struct
          .compress(&decompressed, compressed.as_mut_slice(), num_bits);
      compressed.truncate(len);
      ruby.str_from_slice(compressed.as_slice())
  }
}

This lets me write Ruby code like this:

data = 128.times.map { |i| i % 8 }
packer = BitPacking::BitPacker4x.new
num_bits = packer.num_bits(data)
compressed = packer.compress(data, num_bits)

Here we have these 128 integers represented in 48 bytes, or 3 bits per integer.

BitPacking gem

I've packaged this up into the bitpacking gem.

I hope you find this useful!

Rust learning resources

For the past 5 years to so, I've been telling myself that I want to learn rust.

And for the past 4 years, I've finished the year doing little to no rust learning :(

This year I've actually made some progress! I wanted to share some of the things that finally helped me get past the learning hump I was struggling with.

Rustlings

Rustlings is a great little project that you run locally. It presents small example programs where you need to fix up some error, or implement some small piece of functionality.

I think that Rustlings helped me more than any other tool to get over the initial learning curve, and get comfortable with the basics of rust syntax.

Exercism

While rustlings gave me a decent foundation for the basics of rust syntax, exercism really has helped build out my knowledge of the standard library. It's a great resource for learning idiomatic ways of solving problems in different programming languages. I really enjoyed trying to solve a problem on my own first, and then looking at other people's solutions after the fact. You almost always learn something by looking at how somebody else has solved the same problem you have.

Exercism has helped me build out my rust "vocabulary" more than any other learning tool so far.

Feel free to check out my profile there!

Advent of Code

Advent of code is an annual set of programming puzzles / challenges. I really look forward to doing these every year, and last year I finally finished completing all the puzzles from all the years. Last year I completed the problems with Ruby, but this year I'm going to try to solve them all with Rust.

I've published most of my solutions for previous years in my adventofcode github repo

The Book

No discussion of rust learning resources would be complete without mentioning The Book, aka The Rust Programming Language book.

I have to admit that I didn't find this terribly useful as an initial resource. Several times I tried to learn rust by starting at the beginning of The Book, and working my way through the various chapters. I never made it very far.

I did find a great channel on YouTube, Let's Get Rusty, which goes over parts of the book in order. Watching Bogdan go through various examples from the book was very helpful.

Learning about learning

What have I learned from this?

I learn best when I have a goal to achieve, and have to make use of my knowledge to achieve the goal. It's hard to learn just by reading about a topic. I think that part of that is because actually trying to write code requires that you've actually internalized some knowledge. It's very humbling to read through some documentation, and then try and put it into practice right away, and struggle to write down the most basic examples of what you've just read :)

What about you? What are some ways you've found to be helpful in learning a new language?