∫ The acquisition of knowledge building blocks

Kernel:

Hierarchical Learning Strategies: Navigating the tree of discovery.
Efficiency Tiers: From brute-force experience to mental simulation.

Our civilization is built on the distilled knowledge of collective experience. Because of its sheer vastness, it is impossible for a single individual to master everything. Since cooperation is our species' primary evolutionary advantage, we rely on experts in specialized domains. This specialization creates a hierarchy—the core architecture of human knowledge.

If we visualize this hierarchy as a tree, each node represents a "knowledge block." The bottom layer consists of foundational blocks—walking, talking, or basic tool use. The top layer reaches toward advanced mastery: orbital mechanics, novel writing, or abstract research. For instance, the objective "build a rocket" is decomposed into child nodes like "engine combustion," "fuel tank integrity," and "guidance systems." These blocks are learned independently but eventually combined to solve more complex problems.
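The tree picture above can be made concrete with a small sketch. The `KnowledgeBlock` class and its methods are hypothetical names chosen for illustration; only the rocket example and its three child blocks come from the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class KnowledgeBlock:
    """One node in the knowledge tree (hypothetical structure for illustration)."""
    name: str
    children: List["KnowledgeBlock"] = field(default_factory=list)

    def leaves(self) -> List[str]:
        """Return the names of the lowest-level blocks under this node."""
        if not self.children:
            return [self.name]
        names: List[str] = []
        for child in self.children:
            names.extend(child.leaves())
        return names

    def depth(self) -> int:
        """Number of layers from this node down to its deepest leaf."""
        if not self.children:
            return 1
        return 1 + max(child.depth() for child in self.children)


# The "build a rocket" objective, decomposed as in the text.
rocket = KnowledgeBlock("build a rocket", [
    KnowledgeBlock("engine combustion"),
    KnowledgeBlock("fuel tank integrity"),
    KnowledgeBlock("guidance systems"),
])
```

Here `rocket.leaves()` lists the three child blocks and `rocket.depth()` is 2. A fuller tree would keep decomposing each child until it reaches foundational blocks.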

This post explores how we discover this hierarchy and how we can optimize the "training" process to build it more efficiently.

∂ Knowledge hierarchy discovery paradigms

The most intuitive approach is top-down discovery, which loosely corresponds to on-policy learning. Imagine a corporation: it begins with a CEO's vision. That vision trickles down to product design, then to engineering. This approach suffers from high variance; if the initial vision is flawed or the technical limitations are misunderstood at the top, the entire branch fails. Top-down learning needs a good deal of "luck" to remain stable, and this dependence on luck is what we call variance here.

Conversely, bottom-up discovery starts with observation. We identify a specific friction point, solve it, and consolidate that solution into a reusable block. This approach has low variance because it is grounded in real-world data, but it is often short-sighted. This is the "faster horse" problem: if we only solve immediate problems, we fail to imagine the car. In RL terms, this is a long reward horizon issue: the payoff of a breakthrough lies many steps beyond any immediate reward. Furthermore, to solve problems this way, the agent must first learn to navigate the world "the hard way," requiring massive model capacity and extensive training.

Can we bridge these two? Curriculum learning offers a middle ground. By designing a sequence of tasks—moving from basic to advanced—we mitigate the high variance of top-down leaps while avoiding the aimless wandering of bottom-up discovery. We don't need to learn to ride a horse to understand a car if the curriculum is designed well. However, a curriculum is not task-agnostic; it requires a "teacher" who already understands the world-model to guide the successor.
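One way to picture a curriculum is as an ordering of blocks in which every prerequisite comes before the block that needs it. The sketch below uses Python's standard `graphlib.TopologicalSorter` for this. The prerequisite map is a hypothetical example that a "teacher" might write down; the intermediate block names are assumptions and do not come from the text.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map supplied by the teacher:
# each block maps to the set of blocks that must be learned first.
prerequisites = {
    "arithmetic": set(),
    "algebra": {"arithmetic"},
    "calculus": {"algebra"},
    "newtonian mechanics": {"algebra", "calculus"},
    "orbital mechanics": {"newtonian mechanics"},
}


def build_curriculum(prereqs):
    """Order blocks so every prerequisite appears before its dependents."""
    return list(TopologicalSorter(prereqs).static_order())


curriculum = build_curriculum(prerequisites)
```

The ordering only exists because someone already knew the dependency structure, which is the sense in which a curriculum is not task-agnostic.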

The three approaches can be summarized in the following table:

Technique	Variance	Reward horizon	Model size	Task-agnostic
Top-down	high	short	small	yes
Bottom-up	low	long	large	yes
Curriculum	low	short	small	no

While top-down and bottom-up discovery are vital for exploring unknown environments, curriculum learning is the gold standard for passing on knowledge to the next generation.

∂ Learning efficiency tiers

The human learning process can be categorized into three evolutionary tiers:

Tier 1: Grounded exploration

When we face a total "cold start," pioneers must explore the world through bottom-up learning. This is an arduous process requiring high mental capacity (a large base policy model) to bridge the long reward horizon between effort and survival. It is brutal, but grounded and necessary.

Tier 2: Hierarchical abstraction

With a stable base policy, we can begin consolidating building blocks. This introduces an abstraction module: an entity that identifies patterns and proposes new "tools" from observations. We can then train a collection of task-specific agents to evaluate these blocks: an overseer policy learns to use the proposed block, while the underlying sub-task policies execute the steps needed to realize the block's function. Note that we can use the pioneer's value model to grade the block's performance, which is a form of actor-critic learning. Because each agent only needs to learn to use a single block, the reward horizon is short and the model can be small.
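As a toy illustration of the grading step, the sketch below lets a stand-in value function score proposed blocks and lets an overseer pick the best one. Everything here is an assumption made for illustration: the integer state, the two primitives, the goal of 10, and all function names. No learning takes place; the sketch only shows the critic's role in scoring blocks, not a full actor-critic training loop.

```python
from typing import Callable, Dict, List

State = int
Primitive = Callable[[State], State]

# Sub-task policies: toy primitives that each carry out one small step.
PRIMITIVES: Dict[str, Primitive] = {
    "inc": lambda s: s + 1,
    "double": lambda s: s * 2,
}


def pioneer_value(state: State, goal: State = 10) -> float:
    """Stand-in for the pioneer's value model: closer to the goal is better."""
    return -abs(goal - state)


def run_block(block: List[str], state: State) -> State:
    """Execute a block by running its sub-task policies in order."""
    for name in block:
        state = PRIMITIVES[name](state)
    return state


def grade_block(block: List[str], state: State) -> float:
    """Critic signal: value gained by applying the block from this state."""
    return pioneer_value(run_block(block, state)) - pioneer_value(state)


def overseer_pick(blocks: Dict[str, List[str]], state: State) -> str:
    """Overseer: choose the proposed block with the highest critic score."""
    return max(blocks, key=lambda name: grade_block(blocks[name], state))


# Blocks proposed by a hypothetical abstraction module.
proposed = {
    "step": ["inc"],
    "leap": ["double", "double"],
    "inc_then_double": ["inc", "double"],
}
best = overseer_pick(proposed, state=2)
```

Starting from state 2, the blocks reach 3, 8, and 6 respectively, so the overseer selects "leap". Each block is only a few primitives long, which is why the horizon each agent faces stays short.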

Now we have the hierarchy of knowledge. But we can go one step further.

Tier 3: World model

The final leap is replacing real-world interaction with a world model: a mechanistic module that predicts the outcome of a block before it is even applied. The overseer can propose a block, predict its result with the world model, and calculate its value entirely in silico. This is the pinnacle of sample efficiency: we no longer need to risk real-world interactions to learn the value of a block, and we can iterate on it much faster.
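The propose, predict, and evaluate loop can be sketched as follows. This is a minimal toy under stated assumptions: the environment dynamics (`state + 2 * action`), the one-parameter model, the goal of 10, and all class and function names are hypothetical. The point is only that candidate blocks are compared without any further calls to the real environment.

```python
class RealWorld:
    """Toy environment; each call to step counts as one costly real interaction."""

    def __init__(self):
        self.interactions = 0

    def step(self, state, action):
        self.interactions += 1
        return state + 2 * action


class WorldModel:
    """Learns next_state = state + gain * action from observed transitions."""

    def __init__(self):
        self.gain = 0.0

    def fit(self, transitions):
        # transitions are (state, action, next_state); average the observed gain.
        gains = [(ns - s) / a for s, a, ns in transitions if a != 0]
        self.gain = sum(gains) / len(gains)

    def predict(self, state, action):
        return state + self.gain * action


def value(state, goal=10):
    """Toy value function: closer to the goal is better."""
    return -abs(goal - state)


def plan_in_silico(model, state, candidate_blocks):
    """Roll each block through the model and return the best-valued one."""
    def imagined_value(block):
        s = state
        for action in block:
            s = model.predict(s, action)
        return value(s)
    return max(candidate_blocks, key=imagined_value)


world = RealWorld()
# Two real interactions are enough to fit this toy model.
data = [(s, a, world.step(s, a)) for s, a in [(0, 1), (3, 2)]]
model = WorldModel()
model.fit(data)

calls_before_planning = world.interactions
best = plan_in_silico(model, 0, [(1, 1), (2, 3), (1, 2, 3)])
```

After fitting, `world.interactions` stays at 2 while three candidate blocks are evaluated, and the block `(2, 3)` is chosen because the model predicts it lands exactly on the goal. A real world model would be imperfect, so its predictions would still need occasional checks against reality.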

Once these blocks and simulations are perfected, we use teacher-forcing to pass them to the next generation via curriculum learning, bypassing the need for every successor to be a Tier 1 pioneer.

∂ Final thoughts

This discussion deepens my appreciation for the pioneers who first navigated our world. They functioned as our species' original "base policy," possessing the immense mental capacity required to decode reality from scratch. Thanks to their groundwork, subsequent generations are spared that same fundamental cognitive burden, inheriting a world that is already structured for understanding.