How a Transformer Model Loses Attribution: A Step-by-Step Example

The more powerful the transformer, the less it can remember its authors

Imagine training an AI on just two facts:

Person A says: “1 + 1 = 2”
Person B says: “2 + 1 = 3”

Now ask it: “What is 1 + 1 + 1?” The AI correctly answers: “3.” Ask it: “Who taught you this?” It cannot answer. This isn’t a bug — it’s how transformers work. And it reveals something profound: these systems master knowledge while systematically erasing its source.

The Paradox

The AI clearly learned from both A and B. The answer “3” requires knowing that 1 + 1 = 2 (from A) and 2 + 1 = 3 (from B). The model synthesized these facts to solve a problem neither teacher ever wrote down.

Yet it has no memory of A or B that it can recall. The relationships are preserved; the authors are forgotten. This erasure happens through four architectural stages inside every transformer. At each stage, content flows through while attribution dissolves.

Stage 1: Tokenization — Separating Authors from Content

The first thing a transformer does is break text into tokens:

“A said 1 + 1 = 2” → ["A", "said", "1", "+", "1", "=", "2"]

Each token is processed independently. The model doesn’t see “Author A claiming content X”; it sees seven discrete symbols in sequence.

This immediately breaks the binding between speaker and statement that a human reader takes for granted. To the model, “A,” “said,” and “1 + 1 = 2” are interchangeable parts that can be recombined.

Result: the AI can generate

  • “1 + 1 = 2” (content without author), or

  • “B said 1 + 1 = 2” (wrong author) — because tokenization treated them as separate, remixable pieces from the start.
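
A minimal sketch in Python makes this concrete. The whitespace tokenizer below is a toy stand-in (real models use subword tokenizers), but the loss of author–content binding is the same:

# Toy whitespace tokenizer; real models use subword tokenizers,
# but the effect on author–content binding is identical.
def tokenize(text):
    return text.split()

tokens_a = tokenize("A said 1 + 1 = 2")
tokens_b = tokenize("B said 2 + 1 = 3")
print(tokens_a)  # ['A', 'said', '1', '+', '1', '=', '2']

# Nothing in the token sequence marks "A" as the author of "1 + 1 = 2".
# The pieces recombine freely into a statement A never made:
remixed = [tokens_b[0]] + tokens_a[1:]
print(" ".join(remixed))  # B said 1 + 1 = 2   (wrong author)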

Stage 2: Embedding — Blending Discrete Authors

Next, every token becomes a vector — a list of numbers representing its meaning:

"A" → [0.23, -0.15, 0.67, 0.41, ...]
"B" → [-0.41, 0.28, -0.19, 0.33, ...]

After training, words used in similar ways move closer together:

"A" → [0.31, 0.12, 0.54, ...]
"B" → [0.28, 0.15, 0.51, ...]

A and B now occupy nearly the same spot in semantic space — not as distinct people, but as “entities that make statements.” Meanwhile, mathematical patterns (“1 + 1 = 2”) form tight, durable clusters. The math survives. The authorship does not.
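
A short sketch using the illustrative vectors above (truncated to three dimensions; these are not real model embeddings) shows just how close the two authors end up:

import math

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return dot / norm

# Illustrative post-training embeddings from the example above.
emb_a = [0.31, 0.12, 0.54]
emb_b = [0.28, 0.15, 0.51]

print(round(cosine(emb_a, emb_b), 3))  # ≈ 0.998: "A" and "B" point in
                                       # almost the same direction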

Stage 3: Attention — Aggregating Without Tracking Sources

Attention is the transformer’s core mechanism: each token “looks at” others to decide what matters. When predicting “1 + 1 + 1 = ?”, the model may weigh its memories like this:

Token (from training)        Attention weight
“2” (from A’s example)       23%
“3” (from B’s example)       31%
“A” (author name)             2%
“B” (author name)             1%

It uses both examples — but almost entirely ignores the authors.

Why? Because attention optimizes for prediction, not provenance. Numbers and operators help predict; names don’t. The model learns to drop irrelevant tokens, and author tokens fade away.

Each layer then blends these weighted inputs together. After 24 layers (or 96 in larger models), the mixture has been re-blended so many times that any trace of who contributed what is effectively unrecoverable.
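
A minimal sketch of the weighting step (with made-up relevance scores, not weights from any real model) shows how author tokens receive vanishing attention and are then averaged away:

import math

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical relevance scores assigned by the query "1 + 1 + 1 = ?";
# numbers and operators help prediction, author names do not.
tokens = ["2", "3", "A", "B"]
scores = [2.0, 2.3, -0.5, -1.2]

for tok, w in zip(tokens, softmax(scores)):
    print(f"{tok}: {w:.0%}")   # 2: 40%, 3: 55%, A: 3%, B: 2%

# Each attention head then outputs a single weighted average of value
# vectors; once averaged, and re-mixed layer after layer, there is no
# inverse operation that recovers which source contributed what.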

Stage 4: Generation — Relationships Without Attribution

Finally, transformers generate text one token at a time. Given “1 + 1 + 1 = ”, the model estimates:

P("3") = 0.76   ← highest
P("2") = 0.08
P("A") = 0.001
P("B") = 0.001

It outputs “3.” No high-probability token sequence both answers and attributes, such as “3 (derived from A and B)”; phrasing like that is vanishingly rare in the text the model learned to imitate, so it is almost never predicted.
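
A sketch of greedy decoding over the illustrative distribution above (made-up probabilities, not a real model’s output) makes the point concrete:

# Illustrative next-token probabilities after "1 + 1 + 1 = " (from above).
next_token_probs = {"3": 0.76, "2": 0.08, "A": 0.001, "B": 0.001}

# Greedy decoding emits the single most probable token.
best = max(next_token_probs, key=next_token_probs.get)
print(best)  # 3

# An attributed answer such as "3 (derived from A and B)" would require a
# long, low-probability sequence of tokens that training text almost never
# contains, so it is effectively never generated.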

By this final stage, all that remains are weights encoding patterns — not a ledger of which text taught which idea.

The model knows “after 1 + 1 + 1 =, output 3.” It doesn’t know “this knowledge came from A and B.”

Tracing the Dissolution

Stage          What’s Preserved        What’s Lost
Tokenization   The statement           Author–statement binding
Embedding      Semantic meaning        Discrete identity
Attention      Relationships           Source separation
Generation     Predictive coherence    Provenance

Through every layer, the relationship flows forward while authorship fades.

The Fundamental Trade-off

Could we design transformers to remember their teachers?

  • Add source tags? Tokenization still splits them; attention still blends them (see the sketch after this list).

  • Track provenance across layers? Intractable — attribution diffuses through billions of weighted interactions.

  • Train separate models per author? Preserves credit, kills synthesis.
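
To see why the first option fails, consider a toy sketch with a hypothetical <SRC:A> tag and made-up vectors and attention weights (an illustration, not a tested mitigation):

# A hypothetical <SRC:A> tag survives tokenization only as an ordinary token:
tagged = ["<SRC:A>", "1", "+", "1", "=", "2"]
print(tagged[0])   # '<SRC:A>', just another symbol in the sequence

# After attention, each position holds a weighted average of all value
# vectors. Made-up 2-D vectors and weights; the tag barely contributes.
tag_vec, math_vec = [1.0, 0.0], [0.0, 1.0]
w_tag, w_math = 0.02, 0.98          # attention still ignores the tag

mixed = [w_tag * t + w_math * m for t, m in zip(tag_vec, math_vec)]
print(mixed)  # [0.02, 0.98]: one blended state; after dozens more layers
              # of mixing, nothing recovers "this came from A"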

At their core, transformers must choose between:

SYNTHESIS  ↔  ATTRIBUTION

They choose synthesis.

The same mechanisms that make them powerful — tokenization, continuous embeddings, attention mixing, autoregressive prediction — also make authorship untraceable.

Conclusion

With just two training examples, we’ve seen how a transformer learns relationships while forgetting teachers. It proves it learned from A and B by answering “3,” yet has no mechanism to credit either.

This isn’t a glitch to patch; it’s a consequence of design. Each processing stage transmits content while dissolving origin.

As these systems scale to billions of texts and parameters, the problem doesn’t just grow; it stops being well-defined at all. Attribution isn’t missing from the weights; it was never represented in the model’s internal space to begin with.

The uncomfortable truth: AI systems master knowledge by forgetting its creators. Not by negligence, but by architecture.

The question ahead isn’t whether transformers erase authorship — they already do. It’s whether we’ll craft new legal, ethical, and economic frameworks that work with this reality — or demand fundamentally new architectures that can both synthesize and remember.

For now, transformers have made their choice: synthesis over attribution. You decide how to respond.

Suggested Citation

For attribution in academic contexts, please cite this work as:

Liao, S. (2025). How a transformer model loses attribution: a step-by-step example. OpenMercury Research.
