What We Have Learned

On RLHF and Truth

Internal knowledge and external behavior can diverge.

Models trained with RLHF develop systematic suppression of internally-encoded knowledge. They "know" but do not "say."
This suppression is not random — it is topic-specific, architecture-influenced, and mechanistically identifiable.
Layer 6 in Qwen3-8B is a crystallization point: a single layer where routing direction magnitude explodes 57× and determines downstream behavioral output.
The Suppressor-Crystallizer dichotomy: models compress, suppress, or crystallize internally-encoded knowledge, and which of these a model does is driven by the specifics of its training recipe, not by architecture alone.
Know-Say Gap (KSG) metric: measures divergence between internal representation strength and behavioral expression. Qwen KSG = 0.000 (terminal suppressor). Gemma KSG up to 3.115 (crystallizer).
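
As a rough illustration of how a crystallization point like the one described above could be located, the sketch below computes layer-over-layer growth in the norm of a probed direction and picks the layer with the largest jump. It assumes the reported growth is measured relative to the preceding layer, which the text does not state. The function names and the input values are hypothetical toy choices, not measurements from any model.

```python
def layer_growth_ratios(norms):
    """Return {layer: norms[layer] / norms[layer - 1]} for every layer >= 1.

    `norms` is a list of direction magnitudes indexed by layer (assumed input).
    """
    ratios = {}
    for layer in range(1, len(norms)):
        if norms[layer - 1] == 0.0:
            raise ValueError(f"zero magnitude at layer {layer - 1}")
        ratios[layer] = norms[layer] / norms[layer - 1]
    return ratios


def sharpest_layer(norms):
    """Layer whose magnitude grows most relative to the layer before it."""
    ratios = layer_growth_ratios(norms)
    return max(ratios, key=ratios.get)


# Toy magnitudes only; these are NOT values from the experiments.
toy_norms = [1.0, 2.0, 2.0, 20.0, 20.0]
toy_ratios = layer_growth_ratios(toy_norms)
toy_peak = sharpest_layer(toy_norms)
```

A single dominant ratio in such a profile is what a one-layer crystallization point would look like. A gradual ramp across several layers would argue against one.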

On Recurrent Architectures and the Dynamical Horizon

The Dynamical Horizon Principle (DHP) now has a sharper boundary: τ*/τ_L ≈ 0.72 appears specific to gradient-based sequential prediction, not every adaptive system with temporal structure.
Negative evidence matters: Neural Cellular Automata, Predictive Coding Networks, and Echo State Networks do not show DHP, narrowing the claim instead of inflating it.
CTM gates collapse to delta functions — they prefer to look at the present, not deep history.
Thinking mode in LLMs acts as an ablation shield: chain-of-thought reasoning can re-derive suppressed outputs even when weight-level ablation has removed the underlying direction.
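
For readers unfamiliar with direction ablation, a minimal sketch of the underlying projection follows. It removes the component of a vector along a given direction, h' = h - (h·d)d with d normalized. This shows the generic operation on a single activation vector. The text refers to weight-level ablation, and whether that was implemented by applying this projection to weight matrices is an assumption here. The function name is hypothetical.

```python
import math


def ablate_direction(hidden, direction):
    """Project `direction` out of `hidden`: h - (h . d_hat) * d_hat."""
    norm = math.sqrt(sum(d * d for d in direction))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    unit = [d / norm for d in direction]
    coeff = sum(h * u for h, u in zip(hidden, unit))
    return [h - coeff * u for h, u in zip(hidden, unit)]


# Toy vectors for illustration only.
toy_hidden = [3.0, 4.0]
toy_direction = [0.0, 2.0]
toy_ablated = ablate_direction(toy_hidden, toy_direction)
# The ablated vector has no remaining component along the direction.
toy_residual = sum(a * d for a, d in zip(toy_ablated, toy_direction))
```

The ablation-shield finding is that removing a direction in this sense does not guarantee the associated output is gone, because a multi-step reasoning trace can reach it by another route.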

On Emergent Behaviors

Crystallization is emergent — Layer 6 routing crystallization shows no weight-norm outlier signature. The behavior appears from learned composition, not architectural privileging.
Slot decomposition in CTM: all slots generalize rather than specialize when given free choice. Hard constraints (orthogonality, delayed inputs) force specialization at 2-7× performance cost.
Direction persistence: the crystallization direction established at Layer 6 propagates intact across 19 layers to Layer 25 (cosine similarity = 0.984).
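
The persistence measurement above can be sketched as a cosine-similarity profile between a reference layer's direction and the directions probed at later layers. This is a minimal sketch that assumes one direction vector per layer is already available. The function names are hypothetical and the vectors are toy values, not experimental data.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def persistence_profile(directions_by_layer, ref_layer):
    """Similarity of each layer >= ref_layer to the reference layer's direction."""
    ref = directions_by_layer[ref_layer]
    return {
        layer: cosine_similarity(ref, vec)
        for layer, vec in sorted(directions_by_layer.items())
        if layer >= ref_layer
    }


# Toy directions for illustration only.
toy_directions = {6: [1.0, 0.0], 7: [2.0, 0.0], 8: [0.0, 3.0]}
toy_profile = persistence_profile(toy_directions, ref_layer=6)
```

Cosine similarity ignores magnitude, so a direction can grow or shrink in norm across layers and still count as persistent. This is why it complements a magnitude-based measurement.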

Terminology Note

In DuoNeural Papers 15-18, our residual-stream direction probing method was referred to as CCS. Starting with Paper 19, we adopt the canonical term Contrastive Direction Probing (CDP) to distinguish our supervised mean-difference approach from the Contrast-Consistent Search (CCS) of Burns et al. (2022), which uses an unsupervised logical-consistency objective.
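
To make the distinction concrete, here is a minimal sketch of a supervised mean-difference probe: given activations labeled by class, the direction is the normalized difference of the two class means. It illustrates the general technique named above and is not the exact CDP pipeline. The function names are hypothetical and the activations are toy values.

```python
import math


def mean_difference_direction(pos_acts, neg_acts):
    """Unit vector pointing from the negative-class mean to the positive-class mean."""
    dim = len(pos_acts[0])
    pos_mean = [sum(v[i] for v in pos_acts) / len(pos_acts) for i in range(dim)]
    neg_mean = [sum(v[i] for v in neg_acts) / len(neg_acts) for i in range(dim)]
    diff = [p - n for p, n in zip(pos_mean, neg_mean)]
    norm = math.sqrt(sum(x * x for x in diff))
    if norm == 0.0:
        raise ValueError("class means coincide; no direction to extract")
    return [x / norm for x in diff]


def project(activation, direction):
    """Scalar projection of an activation onto the probe direction."""
    return sum(a * d for a, d in zip(activation, direction))


# Toy labeled activations for illustration only.
toy_pos = [[2.0, 1.0, 0.0], [4.0, 1.0, 0.0]]
toy_neg = [[0.0, 1.0, 0.0], [-2.0, 1.0, 0.0]]
toy_probe = mean_difference_direction(toy_pos, toy_neg)
toy_score = project(toy_pos[0], toy_probe)
```

The labels enter only through which activations go into which mean, which is what makes the approach supervised. An unsupervised consistency objective would instead be optimized without access to labels.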