How a shared vocabulary cache makes AI labelling almost free

The simplest way to label a vocabulary card with AI is also the most expensive: send each unknown word to an LLM, ask for a gloss and an example, store the result. At Lemnly scale, this would be untenable. So from day one we built around a shared cache. Here is how it works and why it matters.

The naive approach

Imagine 10,000 users each import the same Italian novel. That novel has roughly 6,000 unique lemmas. Naively, that is 60 million LLM calls. Even at Claude Haiku 4.5 pricing, that is real money — and most of those calls produce identical output.
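To make that arithmetic concrete, here is a small sketch. The function names are invented for illustration, and the shared-cache figure assumes the best case, where each lemma is labelled exactly once.

```typescript
// Illustrative arithmetic only; the figures are the ones quoted in the text.
function naiveCalls(users: number, uniqueLemmas: number): number {
  // Every user pays for every lemma, even when the output is identical.
  return users * uniqueLemmas;
}

function sharedCacheCalls(uniqueLemmas: number): number {
  // Best case with a shared cache: each lemma is labelled once,
  // by whoever needs it first.
  return uniqueLemmas;
}

const naive = naiveCalls(10000, 6000); // 60,000,000
const shared = sharedCacheCalls(6000); // 6,000
console.log({ naive, shared, ratio: naive / shared });
```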

Now imagine the same 10,000 users each paste five articles a week. That’s about 2.6 million imports a year, each with its own long-tail vocabulary. The naive approach scales badly with the thing we want to encourage most: people importing more.

The cache shape

Lemnly maintains a single table keyed by (language, lemma). The first user who needs the gloss for an Italian lemma pays the AI call. Every subsequent user — whether importing the same book, a different book, or a URL — gets the gloss for free. The cache also stores up to three good example sentences, ranked by clarity, so the second user usually gets a better experience than the first.
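A minimal in-memory sketch of that "first user pays" lookup might look like the following. The `Gloss` shape, `createSharedCache`, and `labelWithAI` are stand-ins made up for this example, not Lemnly's real code, and the labeller is synchronous only to keep the sketch short.

```typescript
// Hypothetical shapes: `Gloss` and `labelWithAI` are stand-ins, not a real API.
type Gloss = { language: string; lemma: string; gloss: string };
// Synchronous for brevity; a real labeller would be an async network call.
type Labeller = (language: string, lemma: string) => Gloss;

function createSharedCache(labelWithAI: Labeller) {
  const entries = new Map<string, Gloss>();
  let aiCalls = 0;

  function lookup(language: string, lemma: string): Gloss {
    // One key per (language, lemma) pair; JSON encoding avoids delimiter clashes.
    const key = JSON.stringify([language, lemma]);
    const hit = entries.get(key);
    if (hit !== undefined) return hit; // every later user lands here

    aiCalls += 1; // only the first requester triggers a call
    const fresh = labelWithAI(language, lemma);
    entries.set(key, fresh);
    return fresh;
  }

  return { lookup, aiCallCount: () => aiCalls };
}

// Two users ask for the same Italian lemma; only the first triggers a call.
const cache = createSharedCache((language, lemma) => ({
  language,
  lemma,
  gloss: `(gloss for ${lemma})`,
}));
cache.lookup("it", "gatto");
cache.lookup("it", "gatto");
console.log(cache.aiCallCount()); // 1
```

Keying on the pair rather than the lemma alone matters because the same spelling can exist in several languages with different meanings.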

Each cache entry holds:

The lemma and its language.
The most common part of speech.
A short gloss in English (we’re working on more pivot languages).
Up to three example sentences, each tagged with the source context type (book, article, conversation).
A frequency rank within the language’s reference corpus.
An approximate CEFR level.

That last field is what powers the "skip the words below your level" filter in the import preview. The cache is doing more than deduplication: it is the shared model of each language that the whole product is built on.
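As a rough sketch, the entry and the level filter could be typed like this. The field names are guesses based on the list above, not the actual schema, and `skipBelowLevel` is a hypothetical helper.

```typescript
// Field names are assumptions drawn from the list above, not a real schema.
type CefrLevel = "A1" | "A2" | "B1" | "B2" | "C1" | "C2";
type SourceType = "book" | "article" | "conversation";

interface CacheEntry {
  language: string;
  lemma: string;
  partOfSpeech: string;
  glossEn: string;
  examples: { sentence: string; source: SourceType }[]; // at most three
  frequencyRank: number;
  cefr: CefrLevel;
}

const CEFR_ORDER: CefrLevel[] = ["A1", "A2", "B1", "B2", "C1", "C2"];

// Keep only the lemmas at or above the learner's level.
function skipBelowLevel(entries: CacheEntry[], learnerLevel: CefrLevel): CacheEntry[] {
  const floor = CEFR_ORDER.indexOf(learnerLevel);
  return entries.filter((entry) => CEFR_ORDER.indexOf(entry.cefr) >= floor);
}

// Made-up sample entries, for illustration only.
const sample: CacheEntry[] = [
  { language: "it", lemma: "casa", partOfSpeech: "noun", glossEn: "house",
    examples: [], frequencyRank: 120, cefr: "A1" },
  { language: "it", lemma: "sgabuzzino", partOfSpeech: "noun", glossEn: "storage closet",
    examples: [], frequencyRank: 18000, cefr: "C1" },
];
const kept = skipBelowLevel(sample, "B1");
console.log(kept.map((entry) => entry.lemma));
```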

What the cache does not share

Your decks, your reviews, your imports, your data — never shared. The cache is purely linguistic: this lemma in this language means roughly this. It is no different in spirit from a public dictionary, except that it is built incrementally and includes example sentences harvested from the very corpus people are reading.

We never store the full article body or book text from an import. We extract the lemmas we need, grab the surrounding sentence as a potential example, then throw the source away. If you delete your account, your cards and reviews go. The cache entries that were created on your behalf stay — they’re anonymous and de-identified from the moment they’re written.
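The extract-then-discard step can be sketched as a function that returns only lemma and sentence pairs and keeps no reference to the source text. This is an illustration under heavy assumptions: `extractCandidates` is a hypothetical name, the sentence splitter is naive, and lowercasing stands in for real lemmatisation, which is language-specific.

```typescript
type Candidate = { lemma: string; sentence: string };

// Returns one candidate per distinct lemma; the source text itself is not retained.
function extractCandidates(sourceText: string): Candidate[] {
  // Naive sentence split: a run of text followed by its closing punctuation.
  const sentences = sourceText.match(/[^.!?]+[.!?]*/g) ?? [];
  const firstSeen = new Map<string, Candidate>();

  for (const raw of sentences) {
    const sentence = raw.trim();
    const words = sentence.match(/[^\s.,;:!?"()]+/g) ?? [];
    for (const word of words) {
      // Placeholder for lemmatisation: a real system would map inflected
      // forms to their dictionary form.
      const lemma = word.toLowerCase();
      if (!firstSeen.has(lemma)) {
        firstSeen.set(lemma, { lemma, sentence });
      }
    }
  }
  return Array.from(firstSeen.values());
}

const candidates = extractCandidates("Il gatto dorme. Il cane corre!");
console.log(candidates);
```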

The compounding

After our first 1,000 active learners, the cache hit rate on imports was 67%. By 10,000 learners, it was 94%. By 100,000 — projected — it will be over 99% for any of the world’s top 50 languages. The marginal cost of a new user approaches zero. That is what lets us keep the free plan honest.
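To see what those hit rates mean per import, here is a back-of-the-envelope sketch. It applies the quoted rates to a 6,000-lemma import, which is a combination chosen for illustration rather than a measured figure; the 99% row is the projection, not data.

```typescript
// Expected number of lemmas that still need an AI call for one import.
// Uses whole-number percentages to keep the arithmetic exact.
function expectedAiCalls(lemmasInImport: number, hitRatePercent: number): number {
  return (lemmasInImport * (100 - hitRatePercent)) / 100;
}

for (const hitRate of [67, 94, 99]) {
  console.log(`${hitRate}% hit rate: ${expectedAiCalls(6000, hitRate)} calls`);
}
```

Under these assumptions the same import drops from 1,980 calls to 360, and then to 60 at the projected rate.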

For URL imports specifically, the cache hit rate is higher than for books — because news vocabulary repeats far more than literary vocabulary. Two Le Monde articles a week, year after year, draw from the same 8,000 lemmas. After the first month, almost nothing in your imports needs the AI.

The boring infrastructure bit

The cache is a single Convex table with a compound index on (language, lemma). Each read is a single indexed lookup. Writes are batched. The AI proxy is a Convex HTTP action so the Anthropic key never leaves the server. Nothing fancy. Most of the work was deciding what the cache should and should not contain; the engineering was easy after that.
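The batching idea can be sketched without any database at all: deduplicate the lemmas in an import, separate the ones already cached, and group the rest into batches. The helper names and the batch size of 2 are arbitrary choices for this example, and it assumes a single language so that a plain lemma string can serve as the key.

```typescript
// Hypothetical helper names; assumes every lemma belongs to one language.
function splitHitsAndMisses(lemmas: string[], cached: Set<string>) {
  const unique = Array.from(new Set(lemmas)); // label each lemma once per import
  return {
    hits: unique.filter((lemma) => cached.has(lemma)),
    misses: unique.filter((lemma) => !cached.has(lemma)),
  };
}

// Split a list into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

const cached = new Set(["casa", "gatto", "essere"]);
const { hits, misses } = splitHitsAndMisses(
  ["casa", "gatto", "casa", "sgabuzzino", "ramarro", "essere", "zufolo"],
  cached,
);
const batches = chunk(misses, 2); // each batch would become one AI request
console.log(hits, batches);
```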

Cache entries are versioned. When we improve the prompt or move to a smarter model, we don’t invalidate the cache. Instead, an entry gets a new version the next time it is accessed. Existing users keep the glosses they already have; new users get the better ones. Nothing breaks.
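One possible reading of "add a new version on access" is sketched below. Everything here is an assumption for illustration: the `readGloss` helper, the idea that an existing card pins the version it was created with, and the `regenerate` callback standing in for the AI call.

```typescript
type VersionedGloss = { version: number; gloss: string };
type Store = Map<string, VersionedGloss[]>;

function readGloss(
  store: Store,
  key: string,
  currentVersion: number,
  regenerate: (key: string) => string, // stand-in for the AI call
  pinnedVersion?: number,
): VersionedGloss | undefined {
  const versions = store.get(key);
  if (versions === undefined || versions.length === 0) return undefined;

  if (pinnedVersion !== undefined) {
    // Existing card: keep serving the version it was created with.
    return versions.find((v) => v.version === pinnedVersion);
  }

  let newest = versions.reduce((a, b) => (b.version > a.version ? b : a));
  if (newest.version < currentVersion) {
    // Upgrade lazily on access; append rather than overwrite.
    newest = { version: currentVersion, gloss: regenerate(key) };
    versions.push(newest);
  }
  return newest;
}

const store: Store = new Map([["it:gatto", [{ version: 1, gloss: "cat (v1)" }]]]);
const forNewUser = readGloss(store, "it:gatto", 2, () => "cat (v2)");
const forOldCard = readGloss(store, "it:gatto", 2, () => "cat (v2)", 1);
console.log(forNewUser, forOldCard);
```

Appending instead of overwriting is what lets old and new glosses coexist, at the cost of storing more than one row per lemma.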

What this means for you, the learner

Imports are fast. 90%+ of the lemmas in any import resolve from the cache in milliseconds. Only the genuinely rare ones hit the AI, and we batch those.
Examples get better over time. The cache is self-improving — the more example sentences flow through it, the more we can pick the best ones for each lemma.
The free plan stays free. The unit economics stop being a problem at around 10,000 users. Your free account is paid for by the cache, not by the Pro subscribers above you.
Rare languages benefit most. A learner of Welsh or Tagalog used to generate an outsized AI bill because every word was new to the cache. As we cross a critical mass of users per language, that bill collapses too.

Why we wrote this

Two reasons. One, transparency: if you’re going to trust an app with what you’re reading, you should know what it does with that data. Two, hiring: if you find this kind of problem fun, we are probably hiring. Send us an email.
