Teaching AI to Reason

A procedural dataset that encodes the breadth of human scientific knowledge as step-by-step reasoning problems. The goal is not a benchmark. The goal is to teach machines how humans think, discover, and invent.

2,022 generators
100+ scientific domains
10^81 unique problems
26 reasoning patterns
$ pip install engram-generator

The Arc

From counting to self-awareness. From following procedures to creating them.

Tier 0 — Fundamentals
2 + 3 = 5
Tier 2 — Algebra
d/dx(3x² + 2x) = 6x + 2
Tier 5 — Expert
curl F = (∂Fz/∂y - ∂Fy/∂z, ∂Fx/∂z - ∂Fz/∂x, ∂Fy/∂x - ∂Fx/∂y)
Tier 7 — Meta-Reasoning
"This proof has an error in step 3. Here is the correction."
Tier 8 — Creative
"These two problems share an isomorphic structure."
Tier 9 — Research
"To solve this class of problems, I would design the following algorithm."
Tier 10 — Self-Architecture
"My architecture struggles with length generalisation. Here is a proposed modification."

Why Memorisation is Impossible

The state space holds more unique problems than there are atoms in the observable universe. Every model, at every scale, must learn the algorithms.

All Google searches ever made: 10^13
Grains of sand on Earth: 10^19
Stars in the observable universe: 10^24
Atoms on Earth: 10^50
Engram Generator state space: 10^81
Atoms in the observable universe: 10^80
Model            Parameters   Can Memorise   Coverage   Verdict
GPT-2            124M         ~134,000       10^-76     MUST REASON
Llama-2 7B       7B           ~7.5M          10^-74     MUST REASON
Llama-2 70B      70B          ~75M           10^-73     MUST REASON
GPT-4 (est.)     ~1.8T        ~1.9B          10^-72     MUST REASON
Llama-3.1 405B   405B         ~438M          10^-72     MUST REASON
Models can store every algorithm.
Models cannot store the instances.
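
The Coverage figures follow from dividing each model's memorisable instance count by the 10^81 state space. A minimal Python sketch of that arithmetic, taking the "Can Memorise" figures as given inputs and assuming coverage is rounded to the nearest power of ten. The names below are illustrative and are not part of engram-generator.

```python
import math

# Order of magnitude of the generator state space, as stated above (10^81).
STATE_SPACE_LOG10 = 81

# "Can Memorise" figures copied from the comparison above. How they were
# estimated is not specified, so they are treated as given inputs.
CAN_MEMORISE = {
    "GPT-2": 134_000,
    "Llama-2 7B": 7_500_000,
    "Llama-2 70B": 75_000_000,
    "GPT-4 (est.)": 1_900_000_000,
    "Llama-3.1 405B": 438_000_000,
}


def coverage_exponent(instances: int) -> int:
    """Nearest power of ten of instances / 10^81, computed in log space."""
    return round(math.log10(instances) - STATE_SPACE_LOG10)


for model, instances in CAN_MEMORISE.items():
    print(f"{model:<15} 10^{coverage_exponent(instances)}")
```

Working in log space avoids forming the tiny ratio directly and makes the gap visible: even the largest entry falls about 72 orders of magnitude short of the full state space.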

The entire curriculum is 1.85 MB of compressed algorithms, yet it produces terabytes of unique instances. At a compression ratio of 1,250,000:1, that is roughly 2.3 TB. The only winning strategy is to learn the algorithms.
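
To make the "small algorithm, many instances" idea concrete, here is a toy seeded generator for polynomial derivative problems. It is a hypothetical sketch, not the engram-generator API: the function names, parameters, and output format are assumptions chosen for illustration.

```python
import random


def differentiate(coeffs: list[int]) -> list[int]:
    """Power rule on coeffs[k] * x^k; returns the derivative's coefficients."""
    return [k * c for k, c in enumerate(coeffs) if k > 0]


def generate_problem(seed: int, max_degree: int = 4, max_coeff: int = 9) -> dict:
    """Hypothetical toy generator: the same seed always yields the same problem."""
    rng = random.Random(seed)
    degree = rng.randint(1, max_degree)
    # Lower coefficients may be zero; the leading coefficient may not.
    coeffs = [rng.randint(0, max_coeff) for _ in range(degree)]
    coeffs.append(rng.randint(1, max_coeff))
    # One worked step per term, so the instance carries its own reasoning trace.
    steps = [
        f"d/dx({c}x^{k}) = {k * c}x^{k - 1}"
        for k, c in enumerate(coeffs)
        if k > 0
    ]
    return {"coeffs": coeffs, "steps": steps, "answer": differentiate(coeffs)}


def toy_state_space(max_degree: int = 4, max_coeff: int = 9) -> int:
    """Number of distinct polynomials this toy generator can emit."""
    return sum((max_coeff + 1) ** d * max_coeff for d in range(1, max_degree + 1))


problem = generate_problem(seed=7)
print(problem["coeffs"], problem["answer"])
print(toy_state_space())
```

Even this toy family of about twenty lines spans 99,990 distinct polynomials with the default settings, and widening the coefficient range or degree grows that count exponentially while the code stays the same size.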

100+ Scientific Domains

The breadth of formalised human knowledge, encoded as reasoning problems.

Mathematics (730+): Arithmetic through category theory, PDEs, algebraic geometry, measure theory
Computer Science (230+): Algorithms, cryptography, compilers, distributed systems, ML theory
Physics (200+): Classical mechanics to quantum field theory, plasma, particle physics
Engineering (100+): Signal processing, control theory, semiconductors, photonics, aerospace
Biology & Health (90+): Genetics, biochemistry, epidemiology, neuroscience, systems biology
Chemistry (80+): General, organic, physical, spectroscopy, polymer science
Quantum (50+): Formalism, information theory, field theory, error correction
Logic & Foundations (50+): Formal logic, model theory, computability, proof theory
Social & Cognitive (50+): Economics, game theory, linguistics, causal inference
Earth & Space (30+): Astronomy, geology, oceanography, geophysics, climate
Other (100+): Music theory, financial maths, medical imaging, persistent homology, wavelets

Character-Level Tokenizer

135 tokens. Every character is its own token. No BPE. No subword merging. Digits stay atomic. LaTeX stays intact. The model learns to read and write mathematical notation as a native language.
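
A character-level tokenizer of this kind can be sketched in a few lines of Python. The vocabulary below is a hypothetical stand-in, since the full 135-token set is not listed here, and the function names are illustrative rather than the project's actual API.

```python
import string

# Hypothetical vocabulary for illustration only: digits, ASCII letters,
# a few operators and LaTeX delimiters, and some Greek and logic symbols.
# The real 135-token vocabulary may differ.
VOCAB = list(string.digits + string.ascii_letters + " +-*/=^_(){}\\.,") + list("αβγδεθλμπσφω¬")
TOKEN_TO_ID = {ch: i for i, ch in enumerate(VOCAB)}
ID_TO_TOKEN = {i: ch for ch, i in TOKEN_TO_ID.items()}


def encode(text: str) -> list[int]:
    """One token id per character; raises KeyError on unknown characters."""
    return [TOKEN_TO_ID[ch] for ch in text]


def decode(ids: list[int]) -> str:
    """Inverse of encode: map each id back to its character."""
    return "".join(ID_TO_TOKEN[i] for i in ids)


latex = "\\frac{α}{2}"
ids = encode(latex)
print(len(ids), decode(ids) == latex)
```

Because every character maps to exactly one id, a number such as 2024 always becomes four digit tokens, and a LaTeX expression round-trips through encode and decode unchanged.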

Sample tokens: 0 1 2 3 4 5 6 7 8 9 α β γ δ ε θ λ μ π σ φ ω ¬