How Language Models Actually Generate Text - AI for Writing & Content Creation: A Practical Guide

You Are Talking to a Prediction Engine

Here is the single most useful fact about AI writing tools: they do not know anything. A large language model is a machine that predicts the next chunk of text, over and over, based on patterns it absorbed from a huge pile of writing. That's it. When you ask ChatGPT or Claude a question, it isn't looking up an answer. It's calculating, one small piece at a time, what text is likely to come next given everything you typed.

Once that clicks, the weird behavior stops being mysterious. The model rambles because rambling is statistically safe. It invents fake sources because a plausible-looking citation is a great prediction even when it's false. It changes its mind mid-paragraph because each new word is just the next best guess, not a step in a plan.

You are not chatting with a mind. You are steering a very good autocomplete.
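
If it helps to see the loop, here is a toy sketch in Python. Everything in it is made up for illustration: the odds table is hand-written, and a real model scores a huge vocabulary of tokens with a neural network instead of looking words up in a dictionary. Real tools also often pick among the likely options with some randomness, where this sketch always takes the top one.

```python
# Toy illustration only: this hand-written table is a made-up stand-in
# for the patterns a real model absorbs from its training data.
NEXT_WORD_ODDS = {
    "the": {"study": 0.6, "journal": 0.4},
    "study": {"was": 0.9, "found": 0.1},
    "was": {"published": 0.7, "written": 0.3},
    "published": {"in": 1.0},
    "in": {"2019": 0.55, "the": 0.45},
}


def autocomplete(start: str, max_words: int = 6) -> str:
    """Repeatedly append the most likely next word until none is known."""
    words = [start]
    for _ in range(max_words):
        options = NEXT_WORD_ODDS.get(words[-1])
        if not options:
            break  # nothing in the table follows this word
        # Greedy choice: take the highest-probability continuation.
        words.append(max(options, key=options.get))
    return " ".join(words)


print(autocomplete("the"))  # the study was published in 2019
```

Notice that nothing in the loop checks whether any study exists. It only asks which word tends to follow the last one.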

Tokens: The Lego Bricks of Text

Models don't read words. They read tokens — pieces of words. "Writing" might be one token. "Unbelievable" might split into "un," "believ," and "able." Roughly, 1,000 tokens is about 750 English words.
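
If you'd like that rule of thumb as arithmetic, here is a minimal Python sketch. It assumes the 1,000-to-750 ratio holds, which is only an approximation for English; real tokenizers vary by model, and the function names are invented for this example.

```python
import math

WORDS_PER_1000_TOKENS = 750  # rule of thumb from the text, English only


def estimate_tokens(text: str) -> int:
    """Rough token estimate based on a whitespace word count."""
    word_count = len(text.split())
    return math.ceil(word_count * 1000 / WORDS_PER_1000_TOKENS)


def words_that_fit(token_budget: int) -> int:
    """Roughly how many English words fit in a given token budget."""
    return token_budget * WORDS_PER_1000_TOKENS // 1000


print(estimate_tokens("word " * 600))  # 800
print(words_that_fit(1000))            # 750
```

Treat the result as a ballpark for planning, not a number your tool will agree with exactly.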

Why should you care? Three practical reasons.

Speed and cost. Every token in and out takes time and (on paid tools) money. A bloated 600-word prompt full of throat-clearing makes the model slower and no smarter.

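Limits are measured in tokens. When a tool says it has a "128k context," that's about 128,000 tokens, not words. By the rule of thumb above, that works out to roughly 96,000 English words, shared between everything you send and everything the model writes back.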

Rare or weird strings break. Unusual names, long numbers, and made-up words get chopped into odd token pieces, which is partly why models fumble spelling, math, and exact quotes. If you ask for an exact character count or to reverse a string letter by letter, expect mistakes — the model literally doesn't see individual letters the way you do.
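
The flip side of that last point is that exact, letter-level jobs are trivial for ordinary code. A minimal Python sketch, with function names invented for this example, shows the kind of task to do yourself (or ask the model to write code for) instead of asking the model to do it in its head:

```python
def exact_character_count(text: str, include_spaces: bool = True) -> int:
    """Count characters exactly, optionally ignoring whitespace."""
    if include_spaces:
        return len(text)
    return sum(1 for ch in text if not ch.isspace())


def reverse_letters(text: str) -> str:
    """Reverse a string one character at a time."""
    return text[::-1]


print(exact_character_count("Unbelievable"))  # 12
print(reverse_letters("Unbelievable"))        # elbaveilebnU
```

One caveat: Python counts code points here, so some emoji and accented characters built from several code points may not match what you see on screen.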

Prediction Is Why It Hallucinates

A hallucination is when the model states something false with total confidence. People imagine this is a bug. It isn't. It's the core mechanic working as designed.

The model's only job is to produce likely-sounding text. "The study was published in the Journal of Behavioral Economics in 2019" is an extremely likely-sounding sentence. Whether that journal ran that study is a question the model never asks itself. Fluency and truth are different things, and the model optimizes for fluency.

This tells you exactly when to be paranoid:

Specific facts, stats, and dates
Citations, quotes, and page numbers
Anything recent or niche
Names of people, products, or laws

You can lower the odds of invention. The biggest lever is giving the model the facts instead of asking it to recall them:

Using ONLY the notes below, write a 150-word summary.
If something isn't in the notes, write "[not in notes]" — do not guess.

NOTES:
<paste your research here>

When the model has source material in front of it, it predicts from your text instead of from the fog of its training data. You'll meet this idea again in the research chapter — for now, just remember: don't ask it to remember, give it something to read.
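
If you reuse this pattern often, you could wrap it in a small helper. The sketch below is one possible way to do it in Python, not a feature of any particular tool; the function name and the empty-notes check are assumptions added for illustration. It only builds the prompt text, which you would then paste or send to whichever model you use.

```python
PROMPT_TEMPLATE = """Using ONLY the notes below, write a {word_limit}-word summary.
If something isn't in the notes, write "[not in notes]". Do not guess.

NOTES:
{notes}"""


def build_grounded_prompt(notes: str, word_limit: int = 150) -> str:
    """Fill the template with your own research notes."""
    notes = notes.strip()
    if not notes:
        # Without notes the model falls back on recall, which is the
        # situation this prompt is meant to avoid.
        raise ValueError("notes are empty; the model would have nothing to read")
    return PROMPT_TEMPLATE.format(word_limit=word_limit, notes=notes)


print(build_grounded_prompt("Sales rose 4% in Q2. Churn was flat."))
```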

If you want to go deeper on spotting bad information — yours or the AI's — the course at /courses/ai-literacy-spot-misinformation-beginners is a solid companion.

The Context Window: Its Entire Short-Term Memory

The context window is everything the model can "see" at once: your instructions, the conversation so far, any pasted text, and its own replies. Think of it as a desk with a fixed size. New pages push old pages off the edge.
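
The desk metaphor can be sketched in a few lines of Python. This is an illustration of the idea only: how a real tool decides what to drop varies and is usually not visible to you, and the sketch measures size in words as a crude stand-in for tokens. The function name is invented for this example.

```python
from collections import deque


def trim_to_window(messages: list[str], max_words: int) -> list[str]:
    """Keep the newest messages that fit; older ones fall off the desk."""
    kept = deque()
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        size = len(message.split())
        if used + size > max_words:
            break  # this message and everything older no longer fits
        kept.appendleft(message)
        used += size
    return list(kept)


chat = [
    "always use British spelling",
    "draft the intro",
    "now shorten it please",
]
print(trim_to_window(chat, max_words=7))
# ['draft the intro', 'now shorten it please']
```

The spelling rule was the first thing said and the first thing lost.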

This explains a lot of frustrating moments.

It "forgot" your instruction. In a long chat, your early rules can slide out of view or get drowned out. Fix: restate the important constraint near your latest message instead of trusting it to remember from twenty messages ago.

Long documents get vague. Dump a 40-page PDF and ask for a summary, and the middle often gets thin. Models tend to pay the most attention to the start and end of what's in the window, so material buried in the middle is the most likely to be skimmed over. Fix: work in chunks, summarize each, then summarize the summaries.

Quality drifts in marathon chats. The longer a single conversation runs, the more cluttered the desk gets. When answers go sideways, start a fresh chat and paste in only the essentials. A clean window beats a long one.
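
The "work in chunks, summarize each, then summarize the summaries" fix can be sketched in Python as follows. The `summarize` argument is a placeholder you would supply yourself: any function that sends text to your model of choice and returns its reply. The 800-word chunk size is an arbitrary assumption, and splitting on word count ignores paragraph and section breaks, which a more careful version would respect.

```python
from typing import Callable


def split_into_chunks(text: str, chunk_words: int = 800) -> list[str]:
    """Split text into consecutive pieces of at most chunk_words words."""
    words = text.split()
    return [
        " ".join(words[i:i + chunk_words])
        for i in range(0, len(words), chunk_words)
    ]


def summarize_long_text(
    text: str,
    summarize: Callable[[str], str],  # hypothetical: your own model call
    chunk_words: int = 800,
) -> str:
    """Summarize each chunk, then summarize the combined summaries."""
    chunks = split_into_chunks(text, chunk_words)
    if not chunks:
        return ""
    if len(chunks) == 1:
        return summarize(chunks[0])
    partial_summaries = [summarize(chunk) for chunk in chunks]
    return summarize("\n\n".join(partial_summaries))
```

You can do exactly the same thing by hand in a chat window: paste one section at a time, collect the summaries in a document, then open a fresh chat and ask for a summary of those.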

Why It Rambles (and How to Stop It)

Left alone, models pad. Padding is low-risk prediction: transition sentences, restated questions, "it's important to note that." None of it is wrong, which is exactly why the model reaches for it.

You stop rambling by constraining the prediction. Tell it the length, the format, and what to leave out:

Answer in under 80 words.
No introduction, no summary sentence.
Use a 3-bullet list. Each bullet starts with a verb.

Specific limits give the model a smaller target to aim at, and a smaller target means less filler. Vague prompts reliably get vague, bloated drafts.
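
Limits like these are also easy to check mechanically, because a model will not always obey them. Here is a minimal Python sketch that tests a reply against the word limit and bullet count from the example prompt. The function name and the set of bullet markers are assumptions, and it makes no attempt to check the "starts with a verb" rule, which needs a human eye.

```python
def check_reply(reply: str, max_words: int = 80, expected_bullets: int = 3) -> list[str]:
    """Return a list of ways the reply breaks the stated limits."""
    problems = []
    # Count only chunks containing a letter or digit, so bullet markers
    # such as "-" are not counted as words.
    word_count = sum(
        1 for chunk in reply.split() if any(ch.isalnum() for ch in chunk)
    )
    if word_count >= max_words:
        problems.append(f"{word_count} words; asked for under {max_words}")
    bullet_lines = [
        line for line in reply.splitlines()
        if line.lstrip().startswith(("-", "*", "•"))
    ]
    if len(bullet_lines) != expected_bullets:
        problems.append(f"{len(bullet_lines)} bullets; asked for {expected_bullets}")
    return problems


good = "- Cut filler\n- State the length\n- Name the format"
print(check_reply(good))  # []
```

An empty list means the reply met both limits; anything else tells you what to ask the model to fix.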

The Mental Model to Keep

Boil this chapter down to four things and you'll out-prompt most people:

It predicts, it doesn't know. Treat every factual claim as a draft to verify.
It reads tokens, not words or letters. Don't trust it with exact counts, spelling tricks, or precise quotes from memory.
It only sees its context window. Feed it the facts, keep instructions close, start fresh when chats get long.
It defaults to filler. Constrain length and format or you'll drown in padding.

None of this makes the tool less powerful. It makes you more powerful, because now you know which jobs to hand it (drafting, restructuring, rephrasing from source material you supply) and which to keep a hand on (facts, citations, anything that has to be exactly right). Everything else in this book is just applying these four facts to real writing.