Tag: llms

TokenFlow: Visualize LLM token streaming speeds

Have you ever wondered how the speed of your favorite LLM compares to other SoTA models? I recently saw a Reddit post where someone got a distilled version of Deepseek R1 running on a Raspberry Pi! It could generate output at a whopping 1.97 tokens per second. That sounds slow. Is that even usable? I don't know!

Meanwhile, Mistral announced that their Le Chat platform can output 1,100 tokens per second! That sounds pretty fast. But how fast, exactly? I don't know!

So, that's why I put together TokenFlow. It's a (very!) simple webpage that lets you see the speed of different LLMs in action. You can select from a few preset models / services or enter a custom speed, and boom! You watch it spit out tokens in real time, showing you exactly what a given inference speed feels like in practice.

Check it out: https://dave.ly/tokenflow/

The code is also available on GitHub.
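
The mechanics are simple: pace the output at a fixed tokens-per-second rate. Here's a minimal Python sketch of the idea (TokenFlow itself is a webpage, so this isn't its actual code, and treating whitespace-separated words as tokens is my simplification):

    import sys
    import time

    def stream_tokens(text, tokens_per_second):
        """Print text one word-as-"token" at a time, paced to simulate
        a given inference speed. Real tokenizers split words into
        smaller pieces, so this is only a rough approximation."""
        delay = 1.0 / tokens_per_second
        for token in text.split():
            sys.stdout.write(token + " ")
            sys.stdout.flush()
            time.sleep(delay)
        print()

    # The two speeds from above: R1 on a Raspberry Pi vs. Le Chat
    stream_tokens("The quick brown fox jumps over the lazy dog. " * 4, 1.97)
    stream_tokens("The quick brown fox jumps over the lazy dog. " * 4, 1100)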

Comparing reasoning in open-source LLMs

Alibaba recently released their "QwQ" model, which they claim is capable of chain-of-thought reasoning comparable to OpenAI's o1-mini model. It's pretty impressive -- even more so because you can run it on your own device (provided you have enough RAM).

While testing QwQ's chain-of-thought reasoning abilities, I decided to run my test prompt against Llama3.2 as well, and I was kind of shocked at how good it was. I had to come up with ever more ridiculous scenarios to try to break it.

That is pretty good, especially for a non-chain-of-thought model. Okay, come on. How do we break it? Can we?

Alright, magical unicorns for the win.
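
If you want to run this kind of comparison yourself, here's roughly how it looks with the ollama Python library (the prompt below is a made-up stand-in, not my actual test prompt):

    import ollama  # pip install ollama; assumes a local Ollama server is running

    # A made-up example in the spirit of my unicorn scenarios
    PROMPT = (
        "A magical unicorn can teleport 3 meters, but only while airborne. "
        "It jumps off a 2-meter ledge. Can it teleport before landing? Why?"
    )

    # Run the same prompt against both models and compare the answers
    for model in ("qwq", "llama3.2"):
        response = ollama.chat(
            model=model,
            messages=[{"role": "user", "content": PROMPT}],
        )
        print(f"=== {model} ===")
        print(response["message"]["content"])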

Project: Super Simple ChatUI

I've been playing around a lot with Ollama, an open-source project that lets you run LLMs locally on your own machine. It's been fun to mess around with. Some benefits: no rate limits, privacy (e.g., trying to create a pseudo-therapy bot, trying to simulate a foul-mouthed, smarmy sailor, or trying to generate ridiculous fake news articles about a Florida Man losing a fight to a wheel of cheese), and access to all sorts of models as they get released.

I decided to try my hand at creating a simplified interface for interacting with it. The result: Super Simple ChatUI.
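
At its core, a chat UI like this is just a loop around Ollama's chat endpoint with streaming turned on. Here's a bare-bones terminal sketch of the idea (not the actual Super Simple ChatUI code, which is a web interface; swap in whatever model you have pulled locally):

    import ollama  # assumes a local Ollama server with llama3.2 pulled

    history = []  # keep the full conversation so the model has context

    while True:
        user_input = input("you> ")
        if user_input.strip().lower() in ("exit", "quit"):
            break
        history.append({"role": "user", "content": user_input})

        # stream=True yields response chunks as they're generated,
        # so tokens show up immediately instead of all at once
        reply = ""
        for chunk in ollama.chat(model="llama3.2", messages=history, stream=True):
            piece = chunk["message"]["content"]
            print(piece, end="", flush=True)
            reply += piece
        print()
        history.append({"role": "assistant", "content": reply})

Streaming is the part worth keeping: as the TokenFlow post above shows, watching tokens arrive is a big part of how fast a model feels.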

As if I need more side projects. So it goes!