Teaching AI to Reason

Covering as much of human scientific knowledge as possible, encoded as procedural reasoning problems. The goal: guide AI along the same path of discovery that humans have walked - from first principles to frontier research.

2,022

generators

100+

scientific domains

10⁸¹

unique problems

26

reasoning patterns


$ pip install engram-generator

The Arc

A model trained on this curriculum doesn't just get better at maths. It climbs a ladder from mechanical computation to creative reasoning - the same ladder humans climb through years of education.

Tier 0–1 - Compute

Follow a procedure. Carry digits. Apply an operation.

125 + 859 → 5+9=14, carry 1 → 984
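The carry procedure in this example can be written out as a minimal sketch. The function name `add_with_carries` and the step format are illustrative assumptions, not part of the package.

```python
def add_with_carries(a: int, b: int) -> tuple[int, list[str]]:
    """Add two non-negative integers column by column, recording each step."""
    xs, ys = str(a)[::-1], str(b)[::-1]  # least-significant digit first
    digits, steps, carry = [], [], 0
    for i in range(max(len(xs), len(ys))):
        x = int(xs[i]) if i < len(xs) else 0
        y = int(ys[i]) if i < len(ys) else 0
        total = x + y + carry
        steps.append(f"{x}+{y}+{carry}={total}")
        digits.append(total % 10)   # digit written in this column
        carry = total // 10         # carried into the next column
    if carry:
        digits.append(carry)
    result = int("".join(map(str, reversed(digits))))
    return result, steps


result, steps = add_with_carries(125, 859)
print(result, steps)
```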

Tier 2–3 - Manipulate

Transform expressions. Apply rules symbolically. Recognise structure.

d/dx(-x³ - 2x² - 2x - 5) = -3x² - 4x - 2
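The power rule behind this example fits in a few lines. This is a sketch that assumes the polynomial is given as a coefficient list, highest degree first; the helper name `differentiate` is hypothetical.

```python
def differentiate(coeffs: list[int]) -> list[int]:
    """Apply the power rule to a polynomial given as coefficients, highest degree first."""
    degree = len(coeffs) - 1
    # d/dx (c * x^k) = (c * k) * x^(k-1); the constant term drops out.
    return [c * (degree - i) for i, c in enumerate(coeffs[:-1])]


# -x^3 - 2x^2 - 2x - 5
derivative = differentiate([-1, -2, -2, -5])
print(derivative)  # coefficients of -3x^2 - 4x - 2
```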

Tier 4–5 - Model

Map problems to formalisms. Physics, cryptography, quantum mechanics. The same maths, applied to the real world.

RSA encrypt: 19^5 mod 129 = 73
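The encryption step can be checked with Python's built-in modular exponentiation. The source gives only the plaintext, exponent, and modulus; the factors 3 and 43 below are simply the factorisation of 129, added here to show the matching decryption.

```python
from math import gcd

p, q = 3, 43             # factorisation of 129 (not stated in the source)
n = p * q
e = 5                    # public exponent from the example
m = 19                   # plaintext from the example

phi = (p - 1) * (q - 1)
assert gcd(e, phi) == 1  # e must be invertible modulo phi

c = pow(m, e, n)         # modular exponentiation: 19^5 mod 129
d = pow(e, -1, phi)      # private exponent via modular inverse (Python 3.8+)
print(c, d, pow(c, d, n))
```

Decrypting `c` with `d` recovers the original plaintext, which confirms the pair of exponents is consistent.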

Tier 6–7 - Reason

Construct proofs. Detect errors. Choose strategies. Explain why, not just what.

Prove log₂3 is irrational → strategy: contradiction
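The contradiction strategy named here runs as follows. This is the standard textbook argument, written out as a sketch rather than taken from a generator's output.

```latex
\textbf{Claim.} $\log_2 3$ is irrational.

\textbf{Proof sketch.} Suppose, for contradiction, that $\log_2 3 = p/q$ for
positive integers $p$ and $q$ (positive because $\log_2 3 > 0$). Then
\[
  2^{p/q} = 3 \quad\Longrightarrow\quad 2^{p} = 3^{q}.
\]
The left-hand side is even since $p \ge 1$, while the right-hand side is odd.
No integer is both even and odd, so no such $p, q$ exist. \qed
```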

Tier 8–9 - Create

See isomorphisms between problems. Design algorithms for new problem classes. Generate conjectures.

"These two problems share an isomorphic structure."

Tier 10 - Self-Improve

Analyse your own architecture. Predict scaling behaviour. Design your own loss function. Propose modifications to yourself.

"My architecture struggles with length generalisation. Here is a proposed modification."

Quick Start

Install and start generating in under a minute. See Samples for real output with rendered LaTeX.

Generate samples

from engram_generator.curriculum.registry import get_generator

gen = get_generator("rsa_encrypt", min_difficulty=3, max_difficulty=5)
samples = gen.generate(100)

for sample in samples[:3]:
    print(f"Input:  {sample.input_text}")
    print(f"Target: {sample.target_text}")
    print(f"Answer: {sample.answer}")

Use the skill tree

from engram_generator.curriculum.registry import get_all_generators
from engram_generator.curriculum.skill_tree import SkillTree

generators = get_all_generators()
tree = SkillTree(generators, retention_ratio=0.1)

# See what's unlocked
print(tree.get_unlocked_tasks())

# Level up by proving mastery
events = tree.update({"addition": 0.97, "subtraction": 0.85})
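To make the unlock-by-mastery idea concrete, here is a self-contained toy version. It is not the package's `SkillTree`: the prerequisite graph, the 0.95 threshold, and the function names are all assumptions for illustration, and `retention_ratio` is not modelled.

```python
MASTERY_THRESHOLD = 0.95  # assumed value; the real threshold is not stated in the source

# Hypothetical prerequisite graph: task -> tasks that must be mastered first.
PREREQUISITES = {
    "addition": [],
    "subtraction": [],
    "multiplication": ["addition"],
    "division": ["multiplication", "subtraction"],
}


def unlocked_tasks(mastered: set[str]) -> list[str]:
    """A task is unlocked once every one of its prerequisites is mastered."""
    return sorted(
        task for task, prereqs in PREREQUISITES.items()
        if all(p in mastered for p in prereqs)
    )


def update(mastered: set[str], accuracies: dict[str, float]) -> list[str]:
    """Record newly mastered tasks and return the tasks they newly unlock."""
    before = set(unlocked_tasks(mastered))
    for task, accuracy in accuracies.items():
        if accuracy >= MASTERY_THRESHOLD:
            mastered.add(task)
    return sorted(set(unlocked_tasks(mastered)) - before)


mastered: set[str] = set()
events = update(mastered, {"addition": 0.97, "subtraction": 0.85})
print(events)
```

Under these assumptions, only `addition` clears the threshold, so only the task that depends on it alone becomes available.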

Balanced training across 26 reasoning patterns

from engram_generator.curriculum.registry import get_all_generators
from engram_generator.curriculum.reasoning_patterns import (
    get_pattern_weights, get_pattern_summary,
)

gens = get_all_generators()
weights = get_pattern_weights(gens)

# Each of the 26 reasoning patterns gets equal training exposure
summary = get_pattern_summary(gens)
for pattern, count in sorted(summary.items(), key=lambda x: -x[1])[:5]:
    print(f"{pattern}: {count} generators -> 3.8% of training")
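One way to give every pattern equal exposure is to split each pattern's share evenly among its generators. The sketch below assumes that scheme; it is not necessarily how `get_pattern_weights` works, and the pattern labels and the `proof_strategy` generator name are hypothetical.

```python
from collections import Counter

# Hypothetical generator -> pattern mapping; the real one comes from the package.
generator_patterns = {
    "addition": "procedural",
    "subtraction": "procedural",
    "rsa_encrypt": "modelling",
    "proof_strategy": "proof",
}


def equal_pattern_weights(gen_to_pattern: dict[str, str]) -> dict[str, float]:
    """Weight generators so that every pattern receives the same total probability."""
    counts = Counter(gen_to_pattern.values())
    share = 1.0 / len(counts)  # each pattern's share of training
    return {g: share / counts[p] for g, p in gen_to_pattern.items()}


weights = equal_pattern_weights(generator_patterns)
print(weights)
print(f"{1 / 26:.1%}")  # per-pattern share when there are 26 patterns
```

With 26 patterns, each one's share is 1/26, which is where the 3.8% figure comes from.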

Why Memorisation is Impossible

The state space holds roughly 10⁸¹ unique problems, more than the estimated 10⁸⁰ atoms in the observable universe. No model at any scale can store it, so every model must learn the algorithms instead.

All Google searches ever made

10¹³

Grains of sand on Earth

10¹⁹

Stars in the observable universe

10²⁴

Atoms on Earth

10⁵⁰

Engram Generator state space

10⁸¹

Atoms in the observable universe

10⁸⁰

Model	Parameters	Can Memorise	Coverage of 10⁸¹
GPT-2	124M	~134,000	10⁻⁷⁶
Llama-2 7B	7B	~7.5M	10⁻⁷⁴
Llama-2 70B	70B	~75M	10⁻⁷³
GPT-4 (est.)	~1.8T	~1.9B	10⁻⁷²
Llama-3.1 405B	405B	~438M	10⁻⁷²
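The coverage column follows from dividing each "Can Memorise" figure by 10⁸¹. A short sketch of that arithmetic, working in log10 to avoid tiny floats (the variable names are illustrative):

```python
from math import log10

STATE_SPACE_EXPONENT = 81  # 10^81 unique problems

# "Can Memorise" column from the table above.
memorisable = {
    "GPT-2": 134_000,
    "Llama-2 7B": 7_500_000,
    "Llama-2 70B": 75_000_000,
    "GPT-4 (est.)": 1_900_000_000,
    "Llama-3.1 405B": 438_000_000,
}

# log10(count / 10^81) = log10(count) - 81
coverage_exponent = {
    model: log10(count) - STATE_SPACE_EXPONENT
    for model, count in memorisable.items()
}
for model, exponent in coverage_exponent.items():
    print(f"{model}: 10^{exponent:.1f}")
```

Rounding each exponent to the nearest integer reproduces the values in the table.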

100+ Scientific Domains

The breadth of formalised human knowledge, encoded as reasoning problems.

730+

Mathematics

Arithmetic through category theory, PDEs, algebraic geometry, measure theory

230+

Computer Science

Algorithms, cryptography, compilers, distributed systems, ML theory

200+

Physics

Classical mechanics to quantum field theory, plasma, particle physics

100+

Engineering

Signal processing, control theory, semiconductors, photonics, aerospace

90+

Biology & Health

Genetics, biochemistry, epidemiology, neuroscience, systems biology

80+

Chemistry

General, organic, physical, spectroscopy, polymer science

50+

Quantum

Formalism, information theory, field theory, error correction

50+

Logic & Foundations

Formal logic, model theory, computability, proof theory

50+

Social & Cognitive

Economics, game theory, linguistics, causal inference

30+

Earth & Space

Astronomy, geology, oceanography, geophysics, climate

100+

Other

Music theory, financial maths, medical imaging, persistent homology, wavelets

Character-Level Tokenizer

135 tokens. Every character is its own token. No BPE. No subword merging. Digits stay atomic. LaTeX stays intact. The model learns to read and write mathematical notation as a native language.

∀ ∃ ∧ ∨ ↔ ⊥ ≤ ≥ ≠ ≈ ∞ √ ∂ ∫ ∈ ⊂ ∅ ∩ ∪
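A character-level scheme like the one described can be sketched in a few lines. This is a toy illustration: the class name `CharTokenizer` is hypothetical, and the alphabet is a small stand-in because the actual 135-token vocabulary is not listed in the source.

```python
class CharTokenizer:
    """Minimal character-level tokenizer: one vocabulary entry per character."""

    def __init__(self, alphabet: str):
        self.itos = sorted(set(alphabet))                      # id -> character
        self.stoi = {ch: i for i, ch in enumerate(self.itos)}  # character -> id

    def encode(self, text: str) -> list[int]:
        # Raises KeyError on characters outside the vocabulary.
        return [self.stoi[ch] for ch in text]

    def decode(self, ids: list[int]) -> str:
        return "".join(self.itos[i] for i in ids)


# Toy alphabet standing in for the real vocabulary.
tok = CharTokenizer("0123456789+-=→ ∀∃∈≤")
ids = tok.encode("125 + 859 → 984")
print(len(ids), tok.decode(ids))
```

Because each digit is its own token, the sequence length equals the character count and decoding is an exact round trip.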

Verification Coverage

Training data is the product. Every generator that can be independently verified, is. No circular self-checking. Two independent methods: third-party library recomputation and double-blind textbook verification.

79%

independently verified

903

library-verified

694

double-blind verified

425

structurally unverifiable


Library Verification

Third-party libraries (scipy, numpy, sympy) independently recompute the answer. Generator code is never called. 903 generators, 21,672 samples, 100% pass rate.
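The recomputation idea can be sketched as follows. To stay self-contained, Python built-ins stand in for scipy, numpy, and sympy here; the sample records and the input text format are hypothetical, with field names borrowed from the Quick Start example.

```python
import re


def recompute_rsa_encrypt(input_text: str) -> str:
    """Independently recompute m^e mod n from the problem text alone."""
    m, e, n = map(int, re.findall(r"\d+", input_text))
    return str(pow(m, e, n))


# Hypothetical sample records; no generator code is involved in checking them.
samples = [
    {"input_text": "RSA encrypt: 19^5 mod 129", "answer": "73"},
    {"input_text": "RSA encrypt: 7^3 mod 33", "answer": "13"},
]

failures = [
    s for s in samples
    if recompute_rsa_encrypt(s["input_text"]) != s["answer"]
]
print(f"{len(samples) - len(failures)}/{len(samples)} pass")
```

The checker sees only the problem text and the claimed answer, so a bug in the generator cannot silently confirm itself.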

Double-Blind Verification

Textbook worked examples from Wikipedia are hardcoded as ground truth. Python independently recomputes the result. No generator code is called. 714 handlers, 714/714 pass.
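A handler in this style pairs a hardcoded reference answer with an independent recomputation. The sketch below assumes that structure; the three entries are standard worked values chosen for illustration, not the project's actual handlers.

```python
from math import gcd, hypot, isclose

# name -> (hardcoded reference answer, independent recomputation)
HANDLERS = {
    "gcd": (21, lambda: gcd(1071, 462)),
    "hypotenuse": (5.0, lambda: hypot(3.0, 4.0)),
    "modexp": (73, lambda: pow(19, 5, 129)),
}


def run_handlers(handlers) -> dict[str, bool]:
    """Return pass/fail per handler by comparing recomputed and reference values."""
    return {
        name: isclose(expected, recompute(), rel_tol=1e-9)
        for name, (expected, recompute) in handlers.items()
    }


results = run_handlers(HANDLERS)
print(f"{sum(results.values())}/{len(results)} pass")
```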

Formula-only (126)

Qualitative outputs: proofs, category theory, protocol descriptions. No numeric value to independently verify.

Reference (208)

Text-matching and lookup tasks. Verification requires a reference database, not formula recomputation.

Classification (91)

Categorical outputs at tiers 7–10. Expert-level analysis tasks with no formula to verify.

Filtering for training

>>> get_all_generators(verified_only=True) # returns 1,597 verified generators

Every generator exposes .is_verified and .verification_method properties.
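Those two properties make it possible to filter or group generators by hand. The sketch below uses a hypothetical stand-in class rather than the package's generator objects, and the method labels `"library"` and `"double_blind"` are assumed values.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GeneratorInfo:
    """Hypothetical stand-in exposing the two properties named above."""
    name: str
    is_verified: bool
    verification_method: Optional[str]  # assumed values: "library", "double_blind", None


generators = [
    GeneratorInfo("rsa_encrypt", True, "library"),
    GeneratorInfo("addition", True, "double_blind"),
    GeneratorInfo("proof_strategy", False, None),
]

# Keep only verified generators, then group their names by verification method.
verified = [g for g in generators if g.is_verified]
by_method: dict[str, list[str]] = {}
for g in verified:
    by_method.setdefault(g.verification_method, []).append(g.name)

print(by_method)
```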