FOSDEM and AI Plumbers


I attended FOSDEM for the very first time this February. It’s the biggest open-source conference in the world, and it’s completely free. It occupies the campus of Brussels Free University. Because there are no tickets, there are also no badges: you don’t know where anybody is from until you hear them speak. Most languages of Europe are represented. Little gangs of nerds arrive from Germany or Slovakia. French is spoken in abundance, and from all French-speaking corners of the Earth.
I spent most of my time in the Low-level AI Hacking Devroom, run by my friend Roman Shaposhnik, a real OSS OG. I think Roman has found a way to get real hackers engaged with LLMs: put them in toasters! The LLMs, I mean.
The spirit of the devroom was extremely refreshing. By now, a lot of AI activity revolves around paying mega-model providers for access, querying their APIs, and building wrappers around them. Developers struggle with a firehose of information about new models and new modes, and it all flies over your head. As an engineer, what can you actually do, other than send money to OpenAI?
Running OSS LLMs on commodity hardware is the answer that Roman and friends embrace and popularize. Following FOSDEM, the same team, Ainekko and AI Foundry, ran AI Plumbers, a FOSDEM Fringe event in Ghent. Roman’s talk, titled “llama.cpp is all you need,” covered various ways to run inference locally for free, as well as the history of llama.cpp. The project evolved in true hacker fashion, through folks obsessed with running models locally and learning from each other.
The OSS approaches to inference are numerous and diverse. Quantizing LLMs is a whole art form now: taking models layer by layer and shrinking them. The world authority on quantization is Iwan Kawrakow, who shared his insights at FOSDEM. He is not present on any social media. A former physicist, he writes pull requests that are now studied as ground truth on quantization. There is also work around the MLIR and XLA compilers, representing neural network architectures optimally for a variety of accelerator backends. The most exciting development for me was running inference on novel architectures such as Tenstorrent. They build a mesh of computational units controlled by baby RISC-V processors, parallelizing AI computations. For those who’ve been around for a while, the Tenstorrent mesh resembles a transputer!
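To make the quantization idea concrete, here is a minimal sketch of block-wise 8-bit quantization in plain Python, loosely in the spirit of llama.cpp’s Q8_0 format (the function names and layout here are illustrative assumptions, not llama.cpp’s actual API; the real formats pack scales and integer codes into tight binary structures):

```python
# Illustrative sketch: block-wise 8-bit quantization of a weight vector.
# Each block stores one float scale plus small integer codes in [-127, 127].

def quantize_q8_block(weights, block_size=32):
    """Split weights into blocks; store (scale, int8 codes) per block."""
    blocks = []
    for i in range(0, len(weights), block_size):
        block = weights[i:i + block_size]
        amax = max(abs(w) for w in block)        # largest magnitude in block
        scale = amax / 127.0 if amax else 1.0    # map amax to code 127
        codes = [round(w / scale) for w in block]
        blocks.append((scale, codes))
    return blocks

def dequantize_q8_block(blocks):
    """Reconstruct approximate float weights from (scale, codes) blocks."""
    return [q * scale for scale, codes in blocks for q in codes]
```

Each block of 32 float32 weights (128 bytes) becomes one scale plus 32 one-byte codes (about 36 bytes), roughly a 3.5x shrink, and the per-weight reconstruction error stays within the block’s scale.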
We met Yann Leger, the founder and CEO of Koyeb, an AI workflow company, at their Station F base in Paris during the AI Action Summit side events. The Hugging Face party was also held there. Yann shared news of a partnership with Tenstorrent, to be announced shortly. It was then announced at meetups in the South Bay and in San Francisco; I joined the latter, at the LlamaIndex offices. Koyeb is the first cloud provider of Tenstorrent chips! Now the opportunity for hackers to play with AI acceleration in a new way is available anywhere.
Overall, Roman and friends, OSS OGs, are leading the way in bringing proper hacker culture to AI. We hope to see more events like this, and we’re happy to host Roman at By the Bay again to share the good news!