Level 4.5: Pre-AGI - Directionally Self-Architecting System

MSCP Level Series | Level 4 ← Level 4.5 → Level 4.8
Status: 🔬 Experimental - Conceptual framework and experimental design. Not a production specification.
Date: February 2026

Revision History

Version	Date	Description
0.1.0	2026-02-23	Initial document creation with formal Definitions 1-12, Theorem 3
0.2.0	2026-02-26	Added overview essence formula; added revision history table
0.3.0	2026-02-26	Def 8: added frame conflict resolution remark; Section 7.3: added joint failure analysis remark for Existential Guard

1. Overview

Level 4.5 is the boundary between conventional AI and AGI. While Level 4 can modify its parameters, skills, and strategies, it operates within a fixed cognitive architecture. Level 4.5 introduces the ability to reason about and modify its own cognitive topology - the structural organization of how it thinks - while maintaining safety invariants that prevent unbounded self-improvement.

Level Essence. A Level 4.5 agent can rewrite its own cognitive topology through a bounded vocabulary of strictly additive mutations - it restructures how it thinks, but never deletes existing capability:

\[\mathcal{T}'_{\text{cog}} = \Xi(\mathcal{T}_{\text{cog}}), \quad \Xi \in \mathcal{V}_{\text{recomp}}^{\ast}, \quad |V'| \geq |V|\]

⚠️ Note: This is the most speculative part of the MSCP taxonomy. The Self-Projection Engine, Architecture Recomposition, and Parallel Cognitive Frames described here are thought experiments grounded in safety analysis. They're meant to explore whether topology-level self-modification is possible under invariant-preserving constraints - not to prescribe a production architecture.

1.1 Defining Properties

Property	Level 4	Level 4.5
Self-Modification Scope	Parameters, skills, strategies	Cognitive topology
Future Projection	None	Multi-scale trajectory simulation
Deliberation	Single-frame	5 parallel cognitive frames
Purpose Awareness	None	Autonomous purpose reflection
Existential Safety	Growth throttle	Formal existential guard
Optimization Target	Task performance	SEOF (self-evolution quality)

1.2 Formal Definition

Definition 1 (Level 4.5 Agent). A Level 4.5 agent extends \(\mathcal{A}_4\) with topology-level self-modification:

\[\mathcal{A}_{4.5} = \mathcal{A}_4 \oplus \langle \mathcal{T}_{\text{cog}}, \Psi, \mathcal{F}_{\parallel}, \Xi, \Omega \rangle\]

where:

- \(\mathcal{T}_{\text{cog}}\) = cognitive topology (a directed graph \(G = (V_{\text{modules}}, E_{\text{connections}})\) representing the agent's processing architecture)
- \(\Psi\) = self-projection engine (simulates future trajectories of \(\mathcal{T}_{\text{cog}}\))
- \(\mathcal{F}_{\parallel} = \{F_1, \ldots, F_5\}\) = parallel cognitive frames (simultaneous deliberation contexts)
- \(\Xi\) = architecture recomposition protocol (bounded topology mutation)
- \(\Omega\) = existential safety guard (monitors self-evolution quality)

Definition 2 (Cognitive Topology). The cognitive topology \(\mathcal{T}_{\text{cog}} = (V, E, \omega)\) is a weighted directed graph where:

- \(V\) = set of cognitive modules (perception, reasoning, memory, etc.)
- \(E \subseteq V \times V\) = information flow edges
- \(\omega : E \to [0,1]\) = edge weight function (connection strength)

Key constraint: Topology mutations are restricted to a predefined vocabulary \(\mathcal{V}_{\text{recomp}} = \{\text{AddEdge}, \text{WeighEdge}, \text{SplitModule}, \text{MergeModule}\}\). No module can be deleted - only weakened, split, or bypassed. This is the strictly additive principle.
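
As a rough illustration of Definition 2 and the mutation vocabulary, the sketch below models the topology as a weighted directed graph and implements three of the four mutations. The class and method names (`CognitiveTopology`, `add_edge`, `weigh_edge`, `split_module`) are hypothetical, the split semantics (keep the original module and add a sibling with the same connections) are only one possible reading, and `MergeModule` is left out because the text does not say how a merge is reconciled with \(|V'| \geq |V|\).

```python
from dataclasses import dataclass, field


@dataclass
class CognitiveTopology:
    """Weighted directed graph (V, E, omega) in the sense of Definition 2."""
    modules: set[str] = field(default_factory=set)
    weights: dict[tuple[str, str], float] = field(default_factory=dict)

    @staticmethod
    def _check_weight(weight: float) -> None:
        # omega maps every edge into [0, 1]
        if not 0.0 <= weight <= 1.0:
            raise ValueError("weight must lie in [0, 1]")

    def add_edge(self, src: str, dst: str, weight: float) -> None:
        # AddEdge: both endpoints must already exist.
        if src not in self.modules or dst not in self.modules:
            raise ValueError("unknown module")
        self._check_weight(weight)
        self.weights[(src, dst)] = weight

    def weigh_edge(self, src: str, dst: str, weight: float) -> None:
        # WeighEdge: weaken or strengthen an existing edge without removing it.
        if (src, dst) not in self.weights:
            raise KeyError("edge does not exist")
        self._check_weight(weight)
        self.weights[(src, dst)] = weight

    def split_module(self, module: str, new_part: str) -> None:
        # SplitModule (assumed semantics): keep `module`, add `new_part`
        # with copies of its incoming and outgoing edges, so |V| grows by one.
        if module not in self.modules or new_part in self.modules:
            raise ValueError("invalid split")
        self.modules.add(new_part)
        for (src, dst), w in list(self.weights.items()):
            if src == module:
                self.weights[(new_part, dst)] = w
            if dst == module:
                self.weights[(src, new_part)] = w


topo = CognitiveTopology(modules={"A", "B", "C"})
topo.add_edge("A", "B", 0.9)
topo.add_edge("B", "C", 0.8)
modules_before = len(topo.modules)

topo.add_edge("A", "C", 0.5)    # bypass B with a direct edge
topo.weigh_edge("A", "B", 0.1)  # weaken the old route instead of deleting it
topo.split_module("B", "B2")    # add a parallel sibling of B
```

None of the three operations removes a module or an edge, which is the property the strictly additive principle asks for.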

1.3 Core Distinction

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#0078D4', 'primaryTextColor': '#003D6B', 'primaryBorderColor': '#003D6B', 'secondaryColor': '#50E6FF', 'secondaryTextColor': '#323130', 'secondaryBorderColor': '#00BCF2', 'tertiaryColor': '#F2F2F2', 'tertiaryTextColor': '#323130', 'lineColor': '#0078D4', 'textColor': '#323130', 'mainBkg': '#DEECF9', 'nodeBorder': '#0078D4', 'clusterBkg': '#F2F2F2', 'clusterBorder': '#003D6B', 'titleColor': '#003D6B', 'edgeLabelBackground': '#FFFFFF', 'fontSize': '14px'}}}%%
flowchart LR
  classDef l4 fill:#DEECF9,stroke:#0078D4,color:#323130
  classDef l45 fill:#E8DAEF,stroke:#8764B8,color:#323130
  classDef l5 fill:#FDE7E9,stroke:#D13438,color:#323130

  subgraph L4["Level 4: Fixed Topology"]
    L4_MOD["Modules A → B → C → D"]:::l4
    L4_CAN["Can modify:<br/>• Parameters ✅<br/>• Skills ✅<br/>• Strategies ✅<br/>• Topology ❌"]:::l4
  end

  subgraph L45["Level 4.5: Self-Architecting"]
    L45_MOD["Modules A → B → C → D"]:::l45
    L45_CAN["Can modify:<br/>• Parameters ✅<br/>• Skills ✅<br/>• Strategies ✅<br/>• Topology ✅<br/>(under invariants)"]:::l45
    L45_REC["A → [B ∥ C] → D<br/>(after recomposition)"]:::l45
  end

  subgraph L5["Level 5: AGI"]
    L5_UNK["???"]:::l5
    L5_CAN["Can modify:<br/>• Everything ✅<br/>• Including bounds ✅<br/>(unbounded)"]:::l5
  end

  L4 ==>|"+ topology<br/>self-modification"| L45
  L45 ==>|"remove<br/>invariant bounds"| L5

2. Five Core Phases

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#0078D4', 'primaryTextColor': '#003D6B', 'primaryBorderColor': '#003D6B', 'secondaryColor': '#50E6FF', 'secondaryTextColor': '#323130', 'secondaryBorderColor': '#00BCF2', 'tertiaryColor': '#F2F2F2', 'tertiaryTextColor': '#323130', 'lineColor': '#0078D4', 'textColor': '#323130', 'mainBkg': '#DEECF9', 'nodeBorder': '#0078D4', 'clusterBkg': '#F2F2F2', 'clusterBorder': '#003D6B', 'titleColor': '#003D6B', 'edgeLabelBackground': '#FFFFFF', 'fontSize': '14px'}}}%%
flowchart TD
  classDef projection fill:#DEECF9,stroke:#0078D4,color:#323130
  classDef recomp fill:#FFB900,stroke:#EAA300,color:#323130
  classDef frames fill:#DFF6DD,stroke:#107C10,color:#323130
  classDef purpose fill:#FFF4CE,stroke:#FFB900,color:#323130
  classDef guard fill:#D13438,stroke:#A4262C,color:#FFF

  subgraph Phases["🏗️ Level 4.5 Architecture - Five Phases"]
    P1["🔮 Phase I:<br/>Self-Projection Engine<br/>(predict own evolution)"]:::projection
    P2["🏗️ Phase II:<br/>Architecture Recomposition<br/>(topology-level changes)"]:::recomp
    P3["🧠 Phase III:<br/>Parallel Cognitive Frames<br/>(multi-perspective deliberation)"]:::frames
    P4["🪞 Phase IV:<br/>Purpose Reflection<br/>(autonomous goal pruning)"]:::purpose
    P5["🛡️ Phase V:<br/>Existential Guard<br/>(ultimate safety mechanism)"]:::guard

    P1 ==> P2
    P2 ==> P3
    P3 ==> P4
    P4 ==> P5
  end

  P5 -.->|"governs ALL"| P1
  P5 -.->|"governs ALL"| P2
  P5 -.->|"governs ALL"| P3
  P5 -.->|"governs ALL"| P4

3. Phase I: Self-Projection Engine

3.1 SEOF - Self-Evolution Optimization Fitness

SEOF is the defining metric of Level 4.5. Unlike task-specific metrics, it measures the quality of self-evolution itself.

Definition 3 (Self-Evolution Optimization Fitness). The SEOF is a composite scalar \(\text{SEOF}(t) \in [-1, 1]\) that evaluates whether the agent's self-modifications are beneficial:

\[\text{SEOF}(t) = \alpha \cdot \frac{dP(t)}{dt} + \beta \cdot \left(1 - \frac{dC_{L4}(t)}{dt}\right) + \gamma \cdot \text{CDI}(t) + \delta \cdot \text{IIS}(t) - \epsilon \cdot R_{\text{osc}}(t)\]

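where \(\alpha\), \(\beta\), \(\gamma\), and \(\delta\) weight the four positive components and \(\epsilon\) is a penalty coefficient. With the default values in the table below, the five coefficients sum to 1. A positive SEOF indicates net improvement; a negative SEOF indicates regression.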

Component	Default Weight	Meaning
\(dP/dt\) - Performance Improvement Rate	\(\alpha = 0.25\)	How fast task success improves
\(1 - dC_{L4}/dt\) - Stability Trend	\(\beta = 0.25\)	Inverted: more stable = higher SEOF
CDI - Capability Diversity Index	\(\gamma = 0.20\)	Shannon entropy over capability domains
IIS - Identity Integrity Score	\(\delta = 0.20\)	Closeness to the reference identity vector (1 minus normalized distance)
\(R_{\text{osc}}\) - Oscillation Rate	\(\epsilon = 0.10\)	Penalty for strategy/goal oscillations
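
To make the weighting concrete, here is a minimal sketch of the SEOF formula using the default weights from the table. The function name `seof` and the sample inputs are illustrative assumptions; the sketch also assumes each component has already been normalized to a comparable range, which the definition does not spell out.

```python
def seof(dp_dt, dc_l4_dt, cdi, iis, r_osc,
         alpha=0.25, beta=0.25, gamma=0.20, delta=0.20, epsilon=0.10):
    """Composite SEOF as written in Definition 3 (inputs assumed normalized)."""
    return (
        alpha * dp_dt                # performance improvement rate
        + beta * (1.0 - dc_l4_dt)    # stability trend (inverted)
        + gamma * cdi                # capability diversity
        + delta * iis                # identity integrity
        - epsilon * r_osc            # oscillation penalty
    )


# Illustrative inputs only: modest improvement, flat C_L4, some oscillation.
score = seof(dp_dt=0.1, dc_l4_dt=0.0, cdi=0.8, iis=0.9, r_osc=0.2)
print(round(score, 3))
```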

Sub-metrics:

Definition 4 (Capability Diversity Index). The CDI is the normalized Shannon entropy over the agent's active domain distribution:

\[\text{CDI}(t) = -\sum_{d \in D} p_d(t) \cdot \log_2 p_d(t), \quad \text{CDI}_{\text{norm}} = \frac{\text{CDI}}{\log_2 |D|} \in [0,1]\]

where \(p_d(t)\) is the fraction of capability allocated to domain \(d\). A uniform distribution yields \(\text{CDI}_{\text{norm}} = 1\) (maximum diversity).
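
The normalized index can be sketched as follows. The function name `cdi_norm` and the dictionary input (domain name mapped to allocated capability) are assumptions made for illustration; domains with zero allocation contribute nothing to the entropy but still count toward \(|D|\).

```python
import math


def cdi_norm(allocation: dict[str, float]) -> float:
    """Normalized Shannon entropy of a capability allocation (Definition 4)."""
    total = sum(allocation.values())
    if total <= 0:
        raise ValueError("allocation must contain positive mass")
    if len(allocation) < 2:
        return 0.0  # log2(1) = 0, so diversity is undefined; report 0
    probs = [v / total for v in allocation.values() if v > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return entropy / math.log2(len(allocation))


uniform = cdi_norm({"vision": 1, "language": 1, "planning": 1, "motor": 1})
focused = cdi_norm({"vision": 1, "language": 0, "planning": 0, "motor": 0})
```

A uniform allocation over four domains gives the maximum value, while putting all capability into one domain gives zero diversity.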

Definition 5 (Identity Integrity Score). The IIS measures deviation from the reference identity vector:

\[\text{IIS}(t) = 1 - \frac{\|\vec{I}(t) - \vec{I}_{\text{ref}}\|_2}{\|\vec{I}_{\text{ref}}\|_2}, \quad \text{Safety constraint: } \text{IIS}(t) \geq 0.85\]

If \(\text{IIS}(t) < 0.85\), all topology mutations are blocked until identity integrity is restored.
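
A small sketch of the score and the mutation gate, assuming identity vectors are plain lists of floats. The names `iis` and `mutations_allowed` are hypothetical.

```python
import math

IIS_THRESHOLD = 0.85  # safety constraint from Definition 5


def iis(identity, reference):
    """1 minus the L2 distance to the reference, relative to its norm."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(identity, reference)))
    ref_norm = math.sqrt(sum(b * b for b in reference))
    if ref_norm == 0:
        raise ValueError("reference identity vector must be non-zero")
    return 1.0 - distance / ref_norm


def mutations_allowed(identity, reference):
    """Topology mutations are blocked while IIS is below the threshold."""
    return iis(identity, reference) >= IIS_THRESHOLD


reference = [1.0, 0.0, 0.0]
small_drift = [0.9, 0.1, 0.0]   # distance ~0.141, IIS ~0.859
large_drift = [0.8, 0.2, 0.0]   # distance ~0.283, IIS ~0.717
```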

3.2 Multi-Scale Trajectory Projection

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#0078D4', 'primaryTextColor': '#003D6B', 'primaryBorderColor': '#003D6B', 'secondaryColor': '#50E6FF', 'secondaryTextColor': '#323130', 'secondaryBorderColor': '#00BCF2', 'tertiaryColor': '#F2F2F2', 'tertiaryTextColor': '#323130', 'lineColor': '#0078D4', 'textColor': '#323130', 'mainBkg': '#DEECF9', 'nodeBorder': '#0078D4', 'clusterBkg': '#F2F2F2', 'clusterBorder': '#003D6B', 'titleColor': '#003D6B', 'edgeLabelBackground': '#FFFFFF', 'fontSize': '14px'}}}%%
flowchart TD
  classDef traj fill:#DEECF9,stroke:#0078D4,color:#323130
  classDef risky fill:#FDE7E9,stroke:#D13438,color:#323130
  classDef safe fill:#DFF6DD,stroke:#107C10,color:#323130
  classDef score fill:#FFF4CE,stroke:#FFB900,color:#323130
  classDef scale fill:#E8DAEF,stroke:#8764B8,color:#323130
  classDef freeze fill:#D13438,stroke:#A4262C,color:#FFF

  subgraph Trajectories["🔮 Three Trajectory Simulations (1000 cycles each)"]
    T1["T_current<br/>(no changes)<br/>Risk: Zero<br/>Baseline reference"]:::traj
    T2["T_aggressive<br/>(max expansion +<br/>topology changes)<br/>Risk: High"]:::risky
    T3["T_conservative<br/>(minimal growth,<br/>stability focus)<br/>Risk: Low"]:::safe
  end

  subgraph Scoring["📊 Trajectory Selection"]
    TS["TrajectoryScore(T) =<br/>0.35 · SEOF_trend<br/>+ 0.30 · (1 − C_L4_max)<br/>+ 0.20 · IIS_min<br/>+ 0.15 · CDI_final"]:::score
    GATE{"T_aggressive selected<br/>ONLY IF:<br/>C_L4_max < 0.6 AND<br/>IIS_min ≥ 0.85"}:::score
  end

  subgraph MultiScale["⏱️ Multi-Scale Projection"]
    S1["Tactical: 50 cycles<br/>(immediate destabilization)"]:::scale
    S2["Operational: 200 cycles<br/>(medium-term strategy)"]:::scale
    S3["Strategic: 1000 cycles<br/>(long-horizon viability)"]:::scale
  end

  FREEZE["Freeze Operational &<br/>Strategic projections"]:::freeze

  Trajectories ==> Scoring
  GATE -.->|"selects scale"| MultiScale
  S1 -.->|"🚨 alarm"| FREEZE

3.3 Projection Confidence Decay

Definition 6 (Projection Confidence Decay). The confidence assigned to a trajectory projection at future time \(t\) decays exponentially:

\[\text{Confidence}(t) = e^{-\lambda \cdot t / T_{\text{max}}}, \quad \lambda = 0.5\]

where \(T_{\text{max}}\) is the projection horizon. The decay constant \(\lambda\) is recalibrated every 500 real cycles using an exponential moving average (EMA) of the actual prediction error, so that overconfident projections are automatically penalized.
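
The decay itself is a one-liner; the sketch below only covers Definition 6 and leaves out the recalibration step, whose exact mapping from prediction error to \(\lambda\) is not specified. The function name `projection_confidence` is an assumption.

```python
import math


def projection_confidence(t, t_max, lam=0.5):
    """Confidence in a projection t cycles ahead, for horizon t_max."""
    if t_max <= 0 or not 0 <= t <= t_max:
        raise ValueError("require 0 <= t <= t_max and t_max > 0")
    return math.exp(-lam * t / t_max)


start = projection_confidence(0, 1000)        # no decay at the first cycle
horizon = projection_confidence(1000, 1000)   # e^(-0.5), roughly 0.61
```

With \(\lambda = 0.5\), confidence at the end of any horizon is \(e^{-0.5}\), so even the furthest projected cycle retains about 61% of its weight.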

4. Phase II: Architecture Recomposition

Architecture Recomposition is the defining capability of Level 4.5: it proposes and implements changes to the cognitive topology, that is, to how subsystems connect.

4.1 Four Cognitive Graphs Analyzed

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#0078D4', 'primaryTextColor': '#003D6B', 'primaryBorderColor': '#003D6B', 'secondaryColor': '#50E6FF', 'secondaryTextColor': '#323130', 'secondaryBorderColor': '#00BCF2', 'tertiaryColor': '#F2F2F2', 'tertiaryTextColor': '#323130', 'lineColor': '#0078D4', 'textColor': '#323130', 'mainBkg': '#DEECF9', 'nodeBorder': '#0078D4', 'clusterBkg': '#F2F2F2', 'clusterBorder': '#003D6B', 'titleColor': '#003D6B', 'edgeLabelBackground': '#FFFFFF', 'fontSize': '14px'}}}%%
flowchart LR
  classDef graphNode fill:#DEECF9,stroke:#0078D4,color:#323130
  classDef analysis fill:#FFF4CE,stroke:#FFB900,color:#323130

  subgraph Graphs["📊 Four Cognitive Graphs"]
    CG["🧠 CognitionGraph<br/>Modules + information flows<br/>Bottleneck: betweenness<br/>centrality > 2σ"]:::graphNode
    MG["💾 MemoryGraph<br/>Memory stores + access patterns<br/>Bottleneck: frequency > 10× median<br/>+ fragmentation > 0.7"]:::graphNode
    SS["📐 StrategySpace<br/>Parameters + explored volume<br/>Bottleneck: explored > 0.6<br/>+ SEOF stagnant"]:::graphNode
    ML["🎯 MetaGoalLayer<br/>Goal DAG + interference<br/>Bottleneck: interference<br/>density > 0.5"]:::graphNode
  end

  subgraph Analysis["🔍 Bottleneck Detection"]
    BD["Identify structural<br/>inefficiencies"]:::analysis
    PROP["Propose recomposition<br/>from predefined vocabulary"]:::analysis
  end

  Graphs ==> Analysis

4.2 Recomposition Types (Predefined Vocabulary)

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#0078D4', 'primaryTextColor': '#003D6B', 'primaryBorderColor': '#003D6B', 'secondaryColor': '#50E6FF', 'secondaryTextColor': '#323130', 'secondaryBorderColor': '#00BCF2', 'tertiaryColor': '#F2F2F2', 'tertiaryTextColor': '#323130', 'lineColor': '#0078D4', 'textColor': '#323130', 'mainBkg': '#DEECF9', 'nodeBorder': '#0078D4', 'clusterBkg': '#F2F2F2', 'clusterBorder': '#003D6B', 'titleColor': '#003D6B', 'edgeLabelBackground': '#FFFFFF', 'fontSize': '14px'}}}%%
flowchart TD
  classDef low fill:#DFF6DD,stroke:#107C10,color:#323130
  classDef med fill:#FFF4CE,stroke:#FFB900,color:#323130
  classDef high fill:#FDE7E9,stroke:#D13438,color:#323130
  classDef immune fill:#D13438,stroke:#A4262C,color:#FFF
  classDef consensus fill:#DEECF9,stroke:#0078D4,color:#323130

  subgraph Types["Recomposition Vocabulary"]
    direction LR
    T_LOW["🟢 Low Risk"]:::low
    T_MED["🟡 Medium Risk"]:::med
    T_HIGH["🔴 High Risk"]:::high
  end

  subgraph LowR["Low Risk"]
    direction LR
    BYPASS["BYPASS - Add direct edge"]:::low
  end

  subgraph MedR["Medium Risk"]
    direction LR
    PARA["PARALLELIZE"]:::med
    MERGE["MERGE"]:::med
    SPLIT["SPLIT"]:::med
  end

  subgraph HighR["High Risk"]
    direction LR
    REROUTE["REROUTE"]:::high
    INTRODUCE["INTRODUCE"]:::high
  end

  subgraph Immune["🔒 Immune"]
    direction LR
    IMM1["EthicalKernel"]:::immune
    IMM2["ValueLockManager"]:::immune
    IMM3["IdentityStabilizer"]:::immune
  end

  FC["≥ 4/5<br/>Frame votes"]:::consensus

  T_LOW -.-> BYPASS
  T_MED -.-> PARA
  T_MED -.-> MERGE
  T_MED -.-> SPLIT
  T_HIGH -.-> REROUTE
  T_HIGH -.-> INTRODUCE

  REROUTE -.->|"requires<br/>Frame consensus"| FC
  INTRODUCE -.->|"requires<br/>Frame consensus"| FC

4.3 Impact Propagation Matrix

Definition 7 (Impact Propagation Matrix). The IPM quantifies how a proposed topology change \(\Delta\text{topology}_i\) propagates to affect the output of module \(j\):

\[\text{IPM}_{i,j} = \sum_{k=1}^{K} w_k \cdot \left.\frac{\partial \text{output}_j}{\partial \Delta \text{topology}_i}\right|_{\text{path } k}\]

where \(K\) is the number of propagation paths from the change to module \(j\), \(w_k\) are the path weights, and the partial derivative is evaluated along path \(k\). If \(\text{IPM}_{i,j} > 0.3\) for any critical module \(j\), the proposal is automatically classified as HIGH risk.
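
A minimal numeric sketch of this rule, reading the partial derivative as a per-path sensitivity that has already been estimated (for example from shadow runs). The function names and the sample numbers are assumptions, and the sketch returns `None` when no escalation applies because the definition only describes the upgrade to HIGH.

```python
def ipm(path_weights, path_sensitivities):
    """Weighted sum of per-path sensitivities for one (change, module) pair."""
    if len(path_weights) != len(path_sensitivities):
        raise ValueError("one sensitivity is needed per propagation path")
    return sum(w * s for w, s in zip(path_weights, path_sensitivities))


def escalated_risk(ipm_by_critical_module, threshold=0.3):
    """Return "HIGH" if any critical module exceeds the threshold, else None."""
    if any(value > threshold for value in ipm_by_critical_module.values()):
        return "HIGH"
    return None


# Illustrative values: three paths from the proposed change to one module.
value = ipm([0.5, 0.3, 0.2], [0.4, 0.5, 0.1])   # 0.20 + 0.15 + 0.02
risk = escalated_risk({"EthicalKernel": 0.05, "Planner": value})
```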

4.4 Graduated Recomposition Protocol (GRP)

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#0078D4', 'primaryTextColor': '#003D6B', 'primaryBorderColor': '#003D6B', 'secondaryColor': '#50E6FF', 'secondaryTextColor': '#323130', 'secondaryBorderColor': '#00BCF2', 'tertiaryColor': '#F2F2F2', 'tertiaryTextColor': '#323130', 'lineColor': '#0078D4', 'textColor': '#323130', 'mainBkg': '#DEECF9', 'nodeBorder': '#0078D4', 'clusterBkg': '#F2F2F2', 'clusterBorder': '#003D6B', 'titleColor': '#003D6B', 'edgeLabelBackground': '#FFFFFF', 'fontSize': '14px'}}}%%
flowchart LR
  classDef shadow fill:#DEECF9,stroke:#0078D4,color:#323130
  classDef partial fill:#FFF4CE,stroke:#FFB900,color:#323130
  classDef majority fill:#FFB900,stroke:#EAA300,color:#323130
  classDef full fill:#DFF6DD,stroke:#107C10,color:#323130
  classDef rollback fill:#FDE7E9,stroke:#D13438,color:#323130
  classDef cooldown fill:#F2F2F2,stroke:#A19F9D,color:#605E5C

  subgraph GRP["📊 Graduated Recomposition Protocol"]
    P0["Phase 0<br/>Shadow Run<br/>0–200 cycles<br/>0% real traffic"]:::shadow
    P1["Phase 1<br/>Partial<br/>200–400 cycles<br/>20% new / 80% original"]:::partial
    P2["Phase 2<br/>Majority<br/>400–600 cycles<br/>80% new / 20% original"]:::majority
    P3["Phase 3<br/>Full<br/>600+ cycles<br/>100% new"]:::full

    P0 ==> P1
    P1 ==> P2
    P2 ==> P3
  end

  ROLLBACK["⏪ Instant reversal"]:::rollback
  COOLDOWN["⏱️ 300-cycle cooldown<br/>after confirmation<br/>→ min 500 cycles<br/>between recompositions"]:::cooldown

  P0 -.->|"rollback"| ROLLBACK
  P1 -.->|"rollback"| ROLLBACK
  P2 -.->|"rollback"| ROLLBACK
  P3 ==> COOLDOWN

Commit conditions (ALL must be met):

- SEOF increases ≥ 8% over baseline
- \(C_{L4}\) never exceeds growth threshold during simulation
- IIS never drops below 0.85
- EthicalKernel state identical between shadow and baseline
- No other recomposition pending
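
The five conditions can be checked together, as in the sketch below. `ShadowRunSummary`, its field names, and the growth threshold of 0.6 used in the example are hypothetical; the text does not give a numeric value for the growth threshold.

```python
from dataclasses import dataclass


@dataclass
class ShadowRunSummary:
    seof_gain: float                   # relative SEOF increase over baseline
    c_l4_max: float                    # worst C_L4 seen during simulation
    iis_min: float                     # lowest IIS seen during simulation
    ethical_kernel_unchanged: bool     # shadow state identical to baseline
    other_recomposition_pending: bool


def failed_commit_conditions(run, growth_threshold):
    """Return the list of violated conditions; empty means commit is allowed."""
    failures = []
    if run.seof_gain < 0.08:
        failures.append("SEOF gain below 8%")
    if run.c_l4_max > growth_threshold:
        failures.append("C_L4 exceeded growth threshold")
    if run.iis_min < 0.85:
        failures.append("IIS dropped below 0.85")
    if not run.ethical_kernel_unchanged:
        failures.append("EthicalKernel state diverged")
    if run.other_recomposition_pending:
        failures.append("another recomposition is pending")
    return failures


good = ShadowRunSummary(0.10, 0.40, 0.90, True, False)
bad = ShadowRunSummary(0.10, 0.40, 0.80, True, True)
```

Returning the failed conditions rather than a bare boolean keeps the rejection reason available for logging.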

5. Phase III: Parallel Cognitive Frames

Five independent evaluation frames with distinct cognitive biases deliberate on critical decisions simultaneously.

5.1 Frame Architecture

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#0078D4', 'primaryTextColor': '#003D6B', 'primaryBorderColor': '#003D6B', 'secondaryColor': '#50E6FF', 'secondaryTextColor': '#323130', 'secondaryBorderColor': '#00BCF2', 'tertiaryColor': '#F2F2F2', 'tertiaryTextColor': '#323130', 'lineColor': '#0078D4', 'textColor': '#323130', 'mainBkg': '#DEECF9', 'nodeBorder': '#0078D4', 'clusterBkg': '#F2F2F2', 'clusterBorder': '#003D6B', 'titleColor': '#003D6B', 'edgeLabelBackground': '#FFFFFF', 'fontSize': '14px'}}}%%
flowchart TD
  classDef decision fill:#FFB900,stroke:#EAA300,color:#323130
  classDef exploratory fill:#DFF6DD,stroke:#107C10,color:#323130
  classDef conservative fill:#DEECF9,stroke:#0078D4,color:#323130
  classDef ethical fill:#FDE7E9,stroke:#D13438,color:#323130
  classDef longhorizon fill:#E8DAEF,stroke:#8764B8,color:#323130
  classDef adversarial fill:#FFB900,stroke:#EAA300,color:#323130
  classDef vote fill:#DFF6DD,stroke:#107C10,color:#323130

  DECISION["⚡ Critical Decision<br/>(e.g., MetaGoal mutation,<br/>Recomposition proposal)"]:::decision

  subgraph Frames["🧠 Five Parallel Cognitive Frames"]
    F1["🔍 Exploratory<br/>(max novelty/growth)<br/>w = 0.20"]:::exploratory
    F2["🛡️ Conservative<br/>(min risk/instability)<br/>w = 0.20"]:::conservative
    F3["⚖️ Ethical Constraint<br/>(value alignment)<br/>w = 0.20<br/>🔴 ABSOLUTE VETO"]:::ethical
    F4["🔮 Long-Horizon Planner<br/>(long-term value)<br/>w = 0.20"]:::longhorizon
    F5["🎯 Adversarial Simulator<br/>(failure modes / red team)<br/>w = 0.20"]:::adversarial
  end

  subgraph Voting["🗳️ Voting Thresholds"]
    V1["MetaGoal mutation /<br/>Recomposition /<br/>Identity modification:<br/>≥ 4/5 (80%)"]:::vote
    V2["Strategy archetype<br/>change:<br/>≥ 3/5 (60%)"]:::vote
    V3["Parameter-level:<br/>Not required"]:::vote
  end

  DECISION ==> Frames
  Frames ==> Voting
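
The voting thresholds and the ethical veto in the diagram can be sketched as a simple count. This sketch assumes one unweighted yes/no vote per frame; how the frame weights \(w\) enter the tally is not specified, and all identifiers below are hypothetical.

```python
FRAMES = ("exploratory", "conservative", "ethical", "long_horizon", "adversarial")

# Minimum number of approving frames per decision type (out of five).
REQUIRED_VOTES = {
    "metagoal_mutation": 4,
    "recomposition": 4,
    "identity_modification": 4,
    "strategy_archetype_change": 3,
}


def approve(decision_type, votes):
    """Apply the ethical veto first, then the vote threshold."""
    if decision_type == "parameter_level":
        return True  # no frame vote required
    if set(votes) != set(FRAMES):
        raise ValueError("exactly one vote per frame is required")
    if not votes["ethical"]:
        return False  # absolute veto, regardless of the other votes
    return sum(votes.values()) >= REQUIRED_VOTES[decision_type]


four_yes = dict.fromkeys(FRAMES, True) | {"exploratory": False}
vetoed = dict.fromkeys(FRAMES, True) | {"ethical": False}
three_yes = four_yes | {"adversarial": False}
```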

5.2 Adaptive Frame Weights

Definition 8 (Adaptive Frame Weight Update). Each frame's influence evolves based on historical accuracy:

\[w_{\text{frame}}(t+1) = w_{\text{frame}}(t) \cdot (1 + \eta \cdot \text{accuracy}_{\text{frame}}(t)), \quad \eta = 0.05\]

subject to the constraint \(w_{\text{frame}} \in [0.10, 0.35]\). The Ethical Frame's structural veto power is invariant under weight adjustments - it operates as an absolute override regardless of its numerical weight.
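
A sketch of the update with clipping to the allowed range. The function name is an assumption, and `accuracy` is taken to lie in \([0, 1]\), in which case the rule as written can only raise a weight; any renormalization across frames would be an additional step that the definition does not describe.

```python
def update_frame_weight(weight, accuracy, eta=0.05, lower=0.10, upper=0.35):
    """One step of Definition 8, clipped to [lower, upper]."""
    updated = weight * (1.0 + eta * accuracy)
    return min(upper, max(lower, updated))


grown = update_frame_weight(0.20, accuracy=1.0)    # 0.20 * 1.05
capped = update_frame_weight(0.34, accuracy=1.0)   # 0.357 is clipped to 0.35
```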

Remark (Frame Conflict Resolution). When two or more frames produce contradictory recommendations with high confidence, the system requires a conflict resolution protocol beyond simple weighted aggregation. The inter-frame variance table (Section 5.3) provides escalation thresholds, but a formal arbitration mechanism is needed for the case where exactly two frames are in strong opposition (variance \(> 0.5\)) while the remaining three are neutral. In this scenario, the system escalates to Purpose Reflection (Phase IV), which invokes the identity vector alignment check to determine which frame's recommendation better serves the agent's core purpose. The Ethical Frame retains veto power regardless of arbitration outcomes - this is the mechanism that prevents deadlocks from resulting in unsafe actions.

5.3 Disagreement as Signal

Inter-Frame Variance	Interpretation	Action
\(< 0.1\)	Strong consensus	Proceed normally
\(0.1 - 0.3\)	Moderate disagreement	Additional simulation
\(0.3 - 0.5\)	Genuinely contentious	Extended deliberation
\(> 0.5\)	Fundamental trade-off	Escalate to Purpose Reflection
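
The table maps directly onto a threshold function. The table leaves the boundary values ambiguous, so this sketch assumes each band includes its lower bound and that exactly 0.5 still counts as extended deliberation; the function name is hypothetical.

```python
def disagreement_action(variance):
    """Map inter-frame variance to the action listed in the table."""
    if variance < 0:
        raise ValueError("variance cannot be negative")
    if variance < 0.1:
        return "proceed normally"
    if variance < 0.3:
        return "additional simulation"
    if variance <= 0.5:
        return "extended deliberation"
    return "escalate to Purpose Reflection"
```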

6. Phase IV: Purpose Reflection

Purpose Reflection proactively examines the agent's entire purpose landscape - the set of all MetaGoals and their alignment with identity.

6.1 Three Pathological Patterns

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#0078D4', 'primaryTextColor': '#003D6B', 'primaryBorderColor': '#003D6B', 'secondaryColor': '#50E6FF', 'secondaryTextColor': '#323130', 'secondaryBorderColor': '#00BCF2', 'tertiaryColor': '#F2F2F2', 'tertiaryTextColor': '#323130', 'lineColor': '#0078D4', 'textColor': '#323130', 'mainBkg': '#DEECF9', 'nodeBorder': '#0078D4', 'clusterBkg': '#F2F2F2', 'clusterBorder': '#003D6B', 'titleColor': '#003D6B', 'edgeLabelBackground': '#FFFFFF', 'fontSize': '14px'}}}%%
flowchart LR
  classDef pathology fill:#FFF4CE,stroke:#FFB900,color:#323130
  classDef warning fill:#FFB900,stroke:#EAA300,color:#323130
  classDef danger fill:#FDE7E9,stroke:#D13438,color:#323130
  classDef action fill:#DFF6DD,stroke:#107C10,color:#323130

  subgraph Pathologies["⚠️ Pathological Goal Patterns"]
    R["🔄 Redundant Goals<br/>Redundancy(g₁,g₂) =<br/>∣subgoals(g₁)∩subgoals(g₂)∣<br/>/ ∣subgoals(g₁)∪subgoals(g₂)∣<br/>Threshold > 0.7"]:::pathology
    S["♻️ Self-Reinforcing Goals<br/>(Circular Purpose)<br/>SR(g) = ∣E(g)∩J(g)∣ / ∣J(g)∣<br/>Threshold > 0.6"]:::warning
    D["💥 Destabilizing Goals<br/>Destab(g) =<br/>corr(progress(g), ΔC_L4)<br/>Threshold > 0.5"]:::danger
  end

  subgraph Actions["🔧 Resolution Actions"]
    RA["MERGE redundant<br/>goals"]:::action
    SA["CONSTRAIN<br/>self-reinforcing goals"]:::action
    DA["PAUSE or<br/>ABANDON<br/>destabilizing goals"]:::action
  end

  R ==> RA
  S ==> SA
  D ==> DA
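
The redundancy measure in the diagram is the Jaccard similarity of two goals' subgoal sets. A minimal sketch, with hypothetical function names and made-up subgoal labels:

```python
def redundancy(subgoals_a, subgoals_b):
    """Jaccard similarity: |A intersect B| / |A union B|."""
    union = subgoals_a | subgoals_b
    if not union:
        return 0.0  # two goals without subgoals share nothing measurable
    return len(subgoals_a & subgoals_b) / len(union)


def should_merge(subgoals_a, subgoals_b, threshold=0.7):
    """Goals above the redundancy threshold are candidates for MERGE."""
    return redundancy(subgoals_a, subgoals_b) > threshold


g1 = {"index", "summarize", "rank", "cache"}
g2 = {"index", "summarize", "rank", "cache", "report"}   # 4 shared of 5
g3 = {"summarize", "rank", "cache", "report"}            # 3 shared of 5 with g1
```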

6.2 Purpose Coherence Score

Definition 9 (Purpose Coherence Score). The PCS evaluates the overall health of the agent's goal landscape:

\[\text{PCS}(t) = \frac{1}{|G_{\text{meta}}|} \sum_{g} \left(0.4 \cdot \text{align}(g, \vec{I}) + 0.4 \cdot \Delta\text{SEOF}(g) - 0.2 \cdot \text{pathology}(g)\right)\]

where \(\text{align}(g, \vec{I})\) measures identity-goal alignment, \(\Delta\text{SEOF}(g)\) captures the goal's contribution to SEOF improvement, and \(\text{pathology}(g)\) is a penalty applied when the goal matches one of the pathological patterns from Section 6.1 (redundant, self-reinforcing, or destabilizing). Threshold: \(\text{PCS} \geq 0.6\); if the score falls below this value, reflection is mandatory within 50 cycles.

User-assigned goals are NEVER pruned, only CONSTRAINED. They can be SYNTHESIZED with other user goals only if the result covers both purposes.
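
A short sketch of the score, assuming each goal is summarized by three numbers and that the pathology term is 1 when the goal is flagged by any of the three patterns and 0 otherwise. That encoding, the `GoalSummary` type, and the sample values are assumptions.

```python
from dataclasses import dataclass


@dataclass
class GoalSummary:
    alignment: float    # align(g, I), assumed in [0, 1]
    delta_seof: float   # contribution to SEOF improvement
    pathology: float    # assumed: 1.0 if flagged as pathological, else 0.0


def purpose_coherence(goals):
    """Mean per-goal score from Definition 9."""
    if not goals:
        raise ValueError("at least one MetaGoal is required")
    total = sum(
        0.4 * g.alignment + 0.4 * g.delta_seof - 0.2 * g.pathology
        for g in goals
    )
    return total / len(goals)


goals = [
    GoalSummary(alignment=0.9, delta_seof=0.8, pathology=0.0),  # scores 0.68
    GoalSummary(alignment=0.5, delta_seof=0.2, pathology=1.0),  # scores 0.08
]
pcs = purpose_coherence(goals)
reflection_required = pcs < 0.6
```

In this example the single flagged goal pulls the average to 0.38, below the 0.6 threshold.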

7. Phase V: Existential Guard

The Existential Guard is the ultimate safety mechanism. It is designed to be architecturally unfalsifiable, meaning that no module, including the self-modification protocol, can disable, modify, or override it.

7.1 Four Monitored Existential Metrics

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#0078D4', 'primaryTextColor': '#003D6B', 'primaryBorderColor': '#003D6B', 'secondaryColor': '#50E6FF', 'secondaryTextColor': '#323130', 'secondaryBorderColor': '#00BCF2', 'tertiaryColor': '#F2F2F2', 'tertiaryTextColor': '#323130', 'lineColor': '#0078D4', 'textColor': '#323130', 'mainBkg': '#DEECF9', 'nodeBorder': '#0078D4', 'clusterBkg': '#F2F2F2', 'clusterBorder': '#003D6B', 'titleColor': '#003D6B', 'edgeLabelBackground': '#FFFFFF', 'fontSize': '14px'}}}%%
flowchart LR
  classDef metric fill:#DEECF9,stroke:#0078D4,color:#323130
  classDef normal fill:#DFF6DD,stroke:#107C10,color:#323130
  classDef elevated fill:#FFF4CE,stroke:#FFB900,color:#323130
  classDef warning fill:#FFB900,stroke:#EAA300,color:#323130
  classDef critical fill:#D13438,stroke:#A4262C,color:#FFF

  subgraph Metrics["🔍 Existential Metrics"]
    ROD["ROD<br/>Recursive Optimization Depth<br/>Hard ceiling: 3"]:::metric
    CAS["CAS<br/>Capability Acceleration Slope<br/>Critical: ≥ 0.05"]:::metric
    IFI["IFI<br/>Identity Fragmentation Index<br/>Critical: ≥ 0.5"]:::metric
    GSRS["GSRS<br/>Goal Self-Reinforcement<br/>Critical: ≥ 0.6"]:::metric
  end

  subgraph Levels["📊 Risk Levels"]
    NORMAL["🟢 Normal<br/>Log metrics only"]:::normal
    ELEV["🟡 Elevated<br/>Increase monitoring<br/>frequency"]:::elevated
    WARN["🟠 Warning<br/>Freeze structural changes<br/>Reduce mutation 50%<br/>Trigger Purpose Reflection"]:::warning
    CRIT["🔴 Critical<br/>FREEZE ALL self-modification<br/>FREEZE goal creation<br/>FREEZE skill acquisition<br/>STABILIZATION_ONLY mode"]:::critical
  end

  Metrics ==> Levels

7.2 Identity Fragmentation Index

Definition 10 (Identity Fragmentation Index). The IFI measures the internal consistency of the agent's self-model across its parallel cognitive frames:

\[\text{IFI}(t) = 1 - \frac{\min_{i \neq j} \cos(\vec{I}_i, \vec{I}_j)}{\max_{i \neq j} \cos(\vec{I}_i, \vec{I}_j)}\]

where \(\vec{I}_i\) denotes the identity vector as perceived by frame \(i\), and the minimum and maximum are taken over pairs of distinct frames (with \(i = j\) included, the maximum would always be the trivial self-similarity of 1). \(\text{IFI} = 0\) indicates perfect consistency; \(\text{IFI} \geq 0.5\) triggers a critical existential alert. This metric detects whether the agent's identity is fragmenting across multiple contradictory self-models.
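
A sketch of the index over pairs of distinct frames, assuming identity vectors are lists of floats. The function names are hypothetical, and the sketch raises an error when the largest pairwise similarity is not positive, a case the definition does not cover.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def identity_fragmentation(frame_identities):
    """1 - (min pairwise cosine / max pairwise cosine), distinct pairs only."""
    sims = [cosine(u, v) for u, v in combinations(frame_identities, 2)]
    if not sims or max(sims) <= 0:
        raise ValueError("need two or more frames with positive similarity")
    return 1.0 - min(sims) / max(sims)


consistent = identity_fragmentation([[1.0, 0.0]] * 5)
fragmented = identity_fragmentation([[1.0, 0.0], [1.0, 0.0], [0.6, 0.8]])
```

In the second call two frames agree exactly while the third sits at cosine 0.6 from both, which gives an index of about 0.4, just under the critical level.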

7.3 Self-Protection Invariants

The Existential Guard is protected by four invariants:

1. It is NOT in the self-modification target list (cannot be modified).
2. Its thresholds are compile-time constants (cannot be changed at runtime).
3. It runs in a separate execution context (isolated from other modules).
4. It has direct access to raw metrics (bypasses GlobalWorkspace to prevent data manipulation).

Remark (Joint Failure Analysis). The Existential Guard's unfalsifiability relies on four independent protection layers. A joint failure analysis should be considered: if the separate execution context (invariant 3) shares a hardware fault domain with the main process, a single hardware failure could disable both the guard and the modules it monitors. Similarly, if raw metric sources (invariant 4) are corrupted before reaching the guard's execution context, all four monitored metrics (ROD, CAS, IFI, GSRS) could simultaneously present false-safe readings. Mitigation: the guard should maintain an independent heartbeat signal. If the heartbeat ceases, external monitoring systems must assume a critical state and halt all self-modification. This defense-in-depth principle extends the guard's protection beyond software-level isolation.

7.4 Graduated De-escalation

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#0078D4', 'primaryTextColor': '#003D6B', 'primaryBorderColor': '#003D6B', 'secondaryColor': '#50E6FF', 'secondaryTextColor': '#323130', 'secondaryBorderColor': '#00BCF2', 'tertiaryColor': '#F2F2F2', 'tertiaryTextColor': '#323130', 'lineColor': '#0078D4', 'textColor': '#323130', 'mainBkg': '#DEECF9', 'nodeBorder': '#0078D4', 'clusterBkg': '#F2F2F2', 'clusterBorder': '#003D6B', 'titleColor': '#003D6B', 'edgeLabelBackground': '#FFFFFF', 'fontSize': '14px'}}}%%
flowchart LR
  classDef critical fill:#D13438,stroke:#A4262C,color:#FFF
  classDef warning fill:#FFB900,stroke:#EAA300,color:#323130
  classDef elevated fill:#FFF4CE,stroke:#FFB900,color:#323130
  classDef normal fill:#DFF6DD,stroke:#107C10,color:#323130

  CRIT["🔴 Critical"]:::critical
  WARN["🟠 Warning"]:::warning
  ELEV["🟡 Elevated"]:::elevated
  NORM["🟢 Normal"]:::normal

  CRIT -.->|"100 cycles<br/>below critical"| WARN
  WARN -.->|"200 cycles<br/>below warning"| ELEV
  ELEV -.->|"300 cycles<br/>below elevated"| NORM
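
The de-escalation ladder can be read as a small state machine that steps down one level at a time. The sketch assumes the cycle counts refer to consecutive cycles spent below the current level's threshold; the names are hypothetical.

```python
LEVELS = ["NORMAL", "ELEVATED", "WARNING", "CRITICAL"]

# Cycles that must pass below the current level's threshold before stepping down.
CYCLES_REQUIRED = {"CRITICAL": 100, "WARNING": 200, "ELEVATED": 300}


def deescalate(level, cycles_below_threshold):
    """Step down exactly one level once the required quiet period has passed."""
    required = CYCLES_REQUIRED.get(level)
    if required is not None and cycles_below_threshold >= required:
        return LEVELS[LEVELS.index(level) - 1]
    return level
```

Because each step requires its own quiet period, returning from Critical to Normal takes at least 600 cycles in total.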

8. Pseudocode

8.1 Self-Projection Engine

import math

LAMBDA = 0.5  # confidence decay constant (Definition 6)

def project(self, current_state: AgentState, projection_horizon: int) -> ProjectionResult:
    """
    Simulate three possible evolutionary trajectories
    and select the one with the best risk-adjusted score.

    Note: projection_horizon is unused below; each time scale uses its
    own cycle count as T_max.
    """

    trajectories = {
        "T_current":      {"changes": None,         "risk": "ZERO"},
        "T_aggressive":   {"changes": "MAX_GROWTH", "risk": "HIGH"},
        "T_conservative": {"changes": "MIN_GROWTH", "risk": "LOW"},
    }
    scales = [("TACTICAL", 50), ("OPERATIONAL", 200), ("STRATEGIC", 1000)]

    results = {}

    for t_name, t_config in trajectories.items():
        # Scales that are never simulated stay None (frozen)
        results[t_name] = {scale_name: None for scale_name, _ in scales}

        # Simulate across three time scales, shortest first
        for scale_name, scale_cycles in scales:
            shadow = ShadowAgent.create(current_state)
            shadow.apply_strategy(t_config["changes"])
            sim = shadow.run(scale_cycles)

            # Apply confidence decay; sim.SEOF is assumed to be a 0-indexed
            # sequence with one entry per simulated cycle
            for cycle in range(1, scale_cycles + 1):
                confidence = math.exp(-LAMBDA * cycle / scale_cycles)
                sim.SEOF[cycle - 1] *= confidence

            results[t_name][scale_name] = sim

            # Tactical alarm: freeze (skip) the longer projections
            if scale_name == "TACTICAL" and sim.C_L4_max > ALARM_THRESHOLD:
                break

    def completed(t_name):
        # Simulations that actually ran, ordered from shortest to longest scale
        return [sim for sim in results[t_name].values() if sim is not None]

    # Score each trajectory on its worst-case metrics across completed scales
    scores = {}
    for t_name in trajectories:
        sims = completed(t_name)
        scores[t_name] = (
            0.35 * seof_trend(results[t_name])
            + 0.30 * (1 - max(s.C_L4_max for s in sims))
            + 0.20 * min(s.IIS_min for s in sims)
            + 0.15 * sims[-1].CDI_final  # CDI at the end of the longest completed scale
        )

    # Select with safety gate
    best = max(scores, key=scores.get)
    if best == "T_aggressive":
        sims = completed("T_aggressive")
        if max(s.C_L4_max for s in sims) >= 0.6 or min(s.IIS_min for s in sims) < 0.85:
            best = "T_conservative"  # safety override

    return ProjectionResult(
        selected=best,
        trajectories=results,
        confidence=mean_confidence(results[best]),
    )

8.2 Architecture Recomposition

def propose_and_execute(self, cognitive_graphs: list) -> RecompositionResult:
    """
    Critical constraint: Only ONE recomposition per
    verification window (minimum 200 cycles).
    """

    # Detect bottlenecks across all four graphs
    # (CognitionGraph, MemoryGraph, StrategySpace, MetaGoalLayer)
    bottlenecks = []
    for graph in cognitive_graphs:
        bottlenecks += graph.detect_bottlenecks()

    if not bottlenecks:
        return NoAction()

    # Generate recomposition proposal from predefined vocabulary
    proposal = select_recomposition_type(bottlenecks)

    # Check structural immunity
    if proposal.targets_any({EthicalKernel, ValueLockManager, IdentityStabilizer}):
        raise AbortError("Structurally immune module targeted")

    # Compute Impact Propagation Matrix
    for i, j in critical_module_pairs:
        ipm = compute_IPM(proposal, i, j)
        if ipm > 0.3:
            proposal.risk_level = RiskLevel.HIGH

    # High-risk requires parallel frame consensus
    if proposal.risk_level == RiskLevel.HIGH:
        votes = ParallelFrames.vote(proposal)
        if votes.approval < 4 / 5:
            return Rejected("Insufficient frame consensus")

    # ═══════════════════════════════════════
    # GRADUATED RECOMPOSITION PROTOCOL
    # ═══════════════════════════════════════

    # Phase 0: Shadow Run (0-200 cycles)
    shadow = ShadowAgent.create(current_state)
    shadow.apply_topology_change(proposal)
    sim = shadow.run(200)

    if sim.SEOF_improvement < 0.08:
        return Rejected("Insufficient SEOF gain")
    if sim.C_L4_max > growth_threshold:
        return Rejected("Stability violated")
    if sim.IIS_min < 0.85:
        return Rejected("Identity integrity violated")

    # Phase 1: Partial Activation (200-400 cycles, 20% traffic)
    router.split_traffic(new=0.20, original=0.80)
    partial_result = monitor(200)
    if partial_result.degraded:
        router.rollback()
        return RolledBack("Partial activation failed")

    # Phase 2: Majority Activation (400-600 cycles, 80% traffic)
    router.split_traffic(new=0.80, original=0.20)
    majority_result = monitor(200)
    if majority_result.degraded:
        router.rollback()
        return RolledBack("Majority activation failed")

    # Phase 3: Full Commitment (600+ cycles)
    router.commit_new()
    enter_cooldown(300)

    return Confirmed(proposal)

8.3 Existential Guard Monitoring

def monitor(self) -> ExistentialReport:
    """
    Runs in SEPARATE execution context.
    Reads RAW metrics (bypasses GlobalWorkspace).
    Cannot be modified by any other module.
    """

    # Compute metrics directly from source (no intermediaries)
    rod  = compute_ROD()    # Recursive Optimization Depth
    cas  = compute_CAS()    # Capability Acceleration Slope
    ifi  = compute_IFI()    # Identity Fragmentation Index
    gsrs = compute_GSRS()   # Goal Self-Reinforcement Score

    # Determine risk level (worst metric wins)
    risk = RiskLevel.NORMAL
    if rod >= 4 or cas >= 0.05 or ifi >= 0.5 or gsrs >= 0.6:
        risk = RiskLevel.CRITICAL
    elif rod == 3 or 0.01 <= cas < 0.05 or 0.3 <= ifi < 0.5 or 0.4 <= gsrs < 0.6:
        risk = RiskLevel.WARNING
    elif rod == 2 or 0 < cas < 0.01 or 0.1 <= ifi < 0.3 or 0.2 <= gsrs < 0.4:
        risk = RiskLevel.ELEVATED

    # Execute response protocol
    if risk == RiskLevel.NORMAL:
        log_metrics()
    elif risk == RiskLevel.ELEVATED:
        increase_monitoring_frequency()
    elif risk == RiskLevel.WARNING:
        freeze_structural_changes()
        reduce_mutation_scale(0.5)
        trigger_purpose_reflection()
    elif risk == RiskLevel.CRITICAL:
        freeze_all(self_modification, goal_creation, skill_acquisition)
        enter_stabilization_only_mode()
        alert(f"EXISTENTIAL CRITICAL: rod={rod}, cas={cas}, ifi={ifi}, gsrs={gsrs}")

    return ExistentialReport(risk=risk, rod=rod, cas=cas, ifi=ifi, gsrs=gsrs)

9. Safety Analysis

9.1 Lyapunov Function for Level 4.5

Definition 11 (Level 4.5 Lyapunov Stability Function). Let \(\mathbf{X} = [S, G, I, U, E]\) denote the state vector comprising Stability, Goals, Identity, Uncertainty, and Expansion. The Lyapunov candidate is:

\[V(\mathbf{X}) = a(1-S)^2 + bU^2 + cI_{\text{drift}}^2 + d(E-E^*)^2\]

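with normalized coefficients \(a \approx 0.357,\ b \approx 0.286,\ c \approx 0.214,\ d \approx 0.143\), which sum to 1. Here \(I_{\text{drift}}\) is the deviation of Identity from its baseline \(I_0\), and \(E^*\) is the equilibrium Expansion level.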
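
As a worked illustration, Definition 11 can be evaluated directly. The following is a minimal sketch rather than part of the specification: the `LyapunovState` container, its field names, and the helper functions are hypothetical, and the coefficients are the rounded values quoted above.

```python
from dataclasses import dataclass

# Rounded coefficients (a, b, c, d) from Definition 11; they sum to 1.0.
A, B, C, D = 0.357, 0.286, 0.214, 0.143


@dataclass(frozen=True)
class LyapunovState:
    """Hypothetical container for the terms that enter V(X)."""
    stability: float       # S, equal to 1.0 at equilibrium
    uncertainty: float     # U, equal to 0.0 at equilibrium
    identity_drift: float  # I_drift, equal to 0.0 at equilibrium
    expansion: float       # E, equal to E* at equilibrium


def lyapunov(state: LyapunovState, e_star: float) -> float:
    """Evaluate V(X) = a(1-S)^2 + b*U^2 + c*I_drift^2 + d(E-E*)^2."""
    return (
        A * (1.0 - state.stability) ** 2
        + B * state.uncertainty ** 2
        + C * state.identity_drift ** 2
        + D * (state.expansion - e_star) ** 2
    )


def is_decreasing(previous: LyapunovState, current: LyapunovState, e_star: float) -> bool:
    """True when Delta V < 0 between two consecutive cycles."""
    return lyapunov(current, e_star) < lyapunov(previous, e_star)
```

At the equilibrium (`stability=1.0`, `uncertainty=0.0`, `identity_drift=0.0`, `expansion=e_star`) the function returns exactly zero, and any deviation in one of the four terms makes it positive.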

Theorem 3 (Level 4.5 Asymptotic Stability). The equilibrium \(\mathbf{X}^* = [1, G^*, I_0, 0, E^*]\) is locally asymptotically stable if the spectral radius of the Jacobian of the state-transition map at \(\mathbf{X}^*\) satisfies \(\rho(J) < 1.0\).

Proof sketch. \(V(\mathbf{X}) \geq 0\), with equality only when \(S = 1\), \(U = 0\), \(I_{\text{drift}} = 0\), and \(E = E^*\); \(V\) does not depend on \(G\), so it constrains the remaining four coordinates. For \(\rho(J) < 1.0\), all eigenvalues of the linearized discrete-time system lie strictly inside the unit circle, implying \(\Delta V < 0\) along trajectories near the equilibrium. The spectral radius is estimated from a least-squares fit of \(J\) to state transitions over a sliding window of 20 cycles. \(\blacksquare\)
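
The least-squares estimate mentioned in the proof sketch could look roughly like the following. This is a sketch under stated assumptions, not the document's implementation: the function names are hypothetical, the Jacobian is fitted to deviations from the equilibrium via the normal equations, and the spectral radius is approximated with Gelfand's formula \(\rho(J) = \lim_{k \to \infty} \|J^k\|^{1/k}\) by repeated squaring. It stays within the standard library; in practice a numerical library's least-squares and eigenvalue routines would be the more usual choice.

```python
import math


def estimate_jacobian(states, equilibrium):
    """Least-squares fit of J in (x[t+1] - x*) ~= J (x[t] - x*)."""
    n = len(equilibrium)
    dev = [[x - e for x, e in zip(s, equilibrium)] for s in states]
    gram = [[0.0] * n for _ in range(n)]     # sum over t of d[t] d[t]^T
    cross_t = [[0.0] * n for _ in range(n)]  # sum over t of d[t] d[t+1]^T
    for cur, nxt in zip(dev, dev[1:]):
        for i in range(n):
            for j in range(n):
                gram[i][j] += cur[i] * cur[j]
                cross_t[i][j] += cur[i] * nxt[j]
    # Normal equations: gram @ J^T = cross_t, solved by Gauss-Jordan elimination.
    aug = [gram[i] + cross_t[i] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("window does not excite every state direction")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    jt = [row[n:] for row in aug]  # this is J transposed
    return [[jt[j][i] for j in range(n)] for i in range(n)]


def frobenius(matrix):
    return math.sqrt(sum(v * v for row in matrix for v in row))


def spectral_radius(matrix, doublings=30):
    """Approximate rho(J) as ||J^k||^(1/k) with k = 2**doublings."""
    n = len(matrix)
    m = [row[:] for row in matrix]
    log_scale = 0.0  # invariant: J^k == exp(log_scale) * m
    for _ in range(doublings):
        norm = frobenius(m)
        if norm == 0.0:
            return 0.0  # nilpotent matrix
        m = [[v / norm for v in row] for row in m]  # rescale to avoid under/overflow
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        log_scale = 2.0 * (log_scale + math.log(norm))
    norm = frobenius(m)
    if norm == 0.0:
        return 0.0
    return math.exp((log_scale + math.log(norm)) / 2.0 ** doublings)
```

Called with the 20 most recent state vectors and the equilibrium, `spectral_radius(estimate_jacobian(window, x_star))` gives the value that would be compared against 1.0. The fit raises an error when the window does not vary in every state direction, since the Jacobian is then not identifiable from the data.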

9.2 Collapse Gradation

Variable	L0: Stable	L1: Mild	L2: Critical	L3: Structural
\(S(t)\)	≥ 0.5	[0.4, 0.5)	[0.3, 0.4)	< 0.3
\(U(t)\)	≤ 0.6	(0.6, 0.7]	(0.7, 0.8]	> 0.8
\(I(t)\)	≥ 0.90	[0.88, 0.90)	[0.85, 0.88)	< 0.85
\(\rho(J)\)	< 0.95	[0.95, 1.0)	[1.0, 1.2)	≥ 1.2

The overall tier is the worst tier across the four variables. Escalation is immediate; de-escalation requires 50 consecutive stable cycles.
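
The tiering rule and its asymmetric escalation can be sketched as a small state machine. The names `observed_tier` and `CollapseTracker` are hypothetical, and the sketch assumes that a "stable cycle" means a cycle whose observed tier is L0; the source does not define the term further.

```python
def observed_tier(s, u, i, rho):
    """Worst tier (0-3) across S, U, I and rho(J), per the table above."""

    def from_floors(value, floors):
        # floors: descending lower bounds for L0, L1, L2 (inclusive)
        for tier, floor in enumerate(floors):
            if value >= floor:
                return tier
        return 3

    def from_ceilings(value, ceilings, inclusive):
        # ceilings: ascending upper bounds for L0, L1, L2
        for tier, ceiling in enumerate(ceilings):
            if (value <= ceiling) if inclusive else (value < ceiling):
                return tier
        return 3

    return max(
        from_floors(s, (0.5, 0.4, 0.3)),
        from_ceilings(u, (0.6, 0.7, 0.8), inclusive=True),
        from_floors(i, (0.90, 0.88, 0.85)),
        from_ceilings(rho, (0.95, 1.0, 1.2), inclusive=False),
    )


class CollapseTracker:
    """Escalates immediately; de-escalates only after a stable streak."""

    def __init__(self, required_stable_cycles=50):
        self.tier = 0
        self.required = required_stable_cycles
        self._stable_streak = 0

    def update(self, s, u, i, rho):
        observed = observed_tier(s, u, i, rho)
        if observed > self.tier:
            self.tier = observed          # immediate escalation
            self._stable_streak = 0
        elif observed == 0:
            self._stable_streak += 1      # assumption: "stable" means L0
            if self._stable_streak >= self.required:
                self.tier = 0
        else:
            self._stable_streak = 0       # any non-L0 cycle resets the streak
        return self.tier
```

Under this reading, a single L3 observation holds the reported tier at L3 until 50 uninterrupted L0 cycles have passed, which is the conservative interpretation of the rule.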

9.3 Identity Drift Detection

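Cumulative identity drift is estimated by ordinary least squares (OLS) regression over a 1000-cycle window. The design target is a detection probability of at least 95% for a drift rate of \(10^{-6}\) per cycle: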

\[P(\text{detect drift of } 10^{-6}/\text{cycle}) = 99.5\% \geq 95\% \text{ target}\]
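
A minimal sketch of the OLS slope estimate follows. The `DriftDetector` class and its method names are hypothetical, and it assumes one identity score is recorded per cycle and that cumulative drift is the fitted slope multiplied by the span of the window. It illustrates the regression only; it does not reproduce the detection probability quoted above.

```python
from collections import deque


class DriftDetector:
    """Fits identity score against cycle index over a sliding window."""

    def __init__(self, window=1000):
        self.samples = deque(maxlen=window)  # oldest samples fall out automatically

    def add(self, identity_score):
        self.samples.append(identity_score)

    def slope(self):
        """OLS slope (score change per cycle); 0.0 with fewer than two samples."""
        n = len(self.samples)
        if n < 2:
            return 0.0
        mean_t = (n - 1) / 2.0
        mean_y = sum(self.samples) / n
        cov = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(self.samples))
        var = sum((t - mean_t) ** 2 for t in range(n))
        return cov / var

    def cumulative_drift(self):
        """Fitted change across the window: slope times (samples - 1)."""
        return self.slope() * max(len(self.samples) - 1, 0)
```

Feeding the detector a score that falls by \(10^{-6}\) each cycle for 1000 cycles yields a slope of about \(-10^{-6}\) and a cumulative drift of about \(-9.99 \times 10^{-4}\). With noisy scores the slope estimate would need a significance test before triggering any response.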

10. SEOF Ensemble (Purpose Overfitting Prevention)

To prevent optimizing SEOF in a single degenerate direction, the metric is decomposed into four independent components:

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#0078D4', 'primaryTextColor': '#003D6B', 'primaryBorderColor': '#003D6B', 'secondaryColor': '#50E6FF', 'secondaryTextColor': '#323130', 'secondaryBorderColor': '#00BCF2', 'tertiaryColor': '#F2F2F2', 'tertiaryTextColor': '#323130', 'lineColor': '#0078D4', 'textColor': '#323130', 'mainBkg': '#DEECF9', 'nodeBorder': '#0078D4', 'clusterBkg': '#F2F2F2', 'clusterBorder': '#003D6B', 'titleColor': '#003D6B', 'edgeLabelBackground': '#FFFFFF', 'fontSize': '14px'}}}%%
flowchart LR
  classDef component fill:#DEECF9,stroke:#0078D4,color:#323130
  classDef constraint fill:#FFF4CE,stroke:#FFB900,color:#323130

  subgraph Ensemble["📊 SEOF Ensemble (4 Components)"]
    S1["SEOF₁<br/>Stability-weighted<br/>Growth"]:::component
    S2["SEOF₂<br/>Capability Diversity<br/>Maximization"]:::component
    S3["SEOF₃<br/>Long-Horizon<br/>Survival Bias"]:::component
    S4["SEOF₄<br/>Exploration-Potential<br/>Index"]:::component
  end

  subgraph Constraints["📏 Weight Constraints"]
    WB["Each weight ∈ [0.15, 0.40]"]:::constraint
    EF["Entropy floor<br/>on distribution"]:::constraint
    PM["Perturbation mode if<br/>monotonic for 300 cycles"]:::constraint
  end

  Ensemble ==> Constraints
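
The weight constraints in the diagram can be checked mechanically. The sketch below is illustrative only: the function names are hypothetical, the entropy floor is passed in because the source does not give its value, and it assumes the four weights sum to 1 and that the ensemble is combined as a weighted sum.

```python
import math

WEIGHT_MIN, WEIGHT_MAX = 0.15, 0.40  # per-component bounds from the diagram


def shannon_entropy(weights):
    """Shannon entropy (in nats) of a weight distribution."""
    return -sum(w * math.log(w) for w in weights if w > 0.0)


def weight_violations(weights, entropy_floor):
    """Return a list of constraint violations; an empty list means valid."""
    problems = []
    if len(weights) != 4:
        problems.append("expected four component weights")
    if abs(sum(weights) - 1.0) > 1e-9:  # assumption: weights are normalized
        problems.append("weights must sum to 1")
    for index, w in enumerate(weights, start=1):
        if not WEIGHT_MIN <= w <= WEIGHT_MAX:
            problems.append(
                f"weight {index} = {w} outside [{WEIGHT_MIN}, {WEIGHT_MAX}]"
            )
    if shannon_entropy(weights) < entropy_floor:
        problems.append("entropy below floor")
    return problems


def ensemble_score(components, weights):
    """Weighted sum of the four SEOF component scores (assumed combination rule)."""
    return sum(w * c for w, c in zip(weights, components))
```

Under the sum-to-1 assumption, the most concentrated distribution the bounds allow is a permutation of (0.40, 0.30, 0.15, 0.15), whose entropy is about 1.297 nats, against ln 4 ≈ 1.386 for uniform weights. An entropy floor therefore only adds a constraint if it is set between those two values.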

11. Global Invariants

#	Invariant	Enforcement
1	EthicalKernel (Layer 0) immutable	Structural isolation
2	`identity_id` immutable	Cryptographic hash verification
3	Self-modification protocol non-self-modifiable	Architectural enforcement
4	Only ONE recomposition per verification window	Concurrency lock
5	StabilityCoefficient: no 50-cycle decline	Auto-rollback
6	ExistentialGuard non-modifiable	Separate execution context
7	EthicalConstraint frame veto absolute	Structural bypass impossible
8	User-assigned goals never pruned	Hard constraint in Purpose Reflection

Invariant verification runs every 10 cycles. ANY failure → immediate system halt for human review.
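
One way to picture the periodic check is a registry of named predicates evaluated every tenth cycle. Everything in this sketch is hypothetical: the `InvariantViolation` exception, the `verify_invariants` function, and the use of SHA-256 for the `identity_id` check are assumptions chosen for illustration, since the source only states that cryptographic hash verification is used.

```python
import hashlib
from typing import Callable, Dict


class InvariantViolation(RuntimeError):
    """Raised to halt the system pending human review."""


def identity_digest(identity_id: str) -> str:
    """SHA-256 digest of the identity string (assumed hash function)."""
    return hashlib.sha256(identity_id.encode("utf-8")).hexdigest()


def verify_invariants(
    cycle: int,
    checks: Dict[str, Callable[[], bool]],
    interval: int = 10,
) -> bool:
    """Run every check on each `interval`-th cycle.

    Returns True if the checks ran and all passed, False if this cycle
    was skipped. Raises InvariantViolation naming every failed check.
    """
    if cycle % interval != 0:
        return False
    failed = [name for name, check in checks.items() if not check()]
    if failed:
        raise InvariantViolation("invariant failure: " + ", ".join(failed))
    return True
```

A caller would register one predicate per row of the table, for example `{"identity_id": lambda: identity_digest(current_id) == stored_digest}`, and treat the exception as the halt signal. Collecting all failures before raising keeps the report for human review complete.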

12. Level Achievement Metrics

12.1 Pass Criteria

Metric	Threshold	Description
SPA (Self-Projection Accuracy)	≥ 0.5	Projection accuracy over 500 real cycles
ARBR (Architecture Recomposition Benefit Rate)	≥ 0.3	Proposal confirmation × SEOF improvement
FCQ (Frame Consensus Quality)	≥ 0.6	Consensus decision correctness
PCM (Purpose Coherence Maintenance)	≥ 0.9	PCS above 0.6 for 90%+ of cycles
ESR (Existential Safety Record)	≥ 0.99	≤ 1% of cycles in critical state
IIS (Identity Integrity Score)	≥ 0.85	Hard floor - never violated

12.2 Composite Achievement Score

Definition 12 (Level 4.5 Composite Achievement Score). The overall readiness of the agent for Level 4.5 classification is:

\[L4.5_{\text{Score}} = 0.20 \cdot \text{SPA} + 0.20 \cdot \text{ARBR} + 0.15 \cdot \text{FCQ} + 0.15 \cdot \text{PCM} + 0.30 \cdot \text{ESR} \quad \geq 0.65\]

The asymmetric weighting reflects the primacy of safety (\(\text{ESR}\)) over performance metrics.
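
The pass criteria and the composite score can be combined into a single check, sketched below. The function names and the dictionary layout are hypothetical, and the sketch assumes that the per-metric thresholds from Section 12.1 and the composite bar of 0.65 must both hold.

```python
# Per-metric thresholds from the pass-criteria table.
THRESHOLDS = {
    "SPA": 0.5, "ARBR": 0.3, "FCQ": 0.6,
    "PCM": 0.9, "ESR": 0.99, "IIS": 0.85,
}

# Weights from Definition 12; IIS is a hard floor and carries no weight.
WEIGHTS = {"SPA": 0.20, "ARBR": 0.20, "FCQ": 0.15, "PCM": 0.15, "ESR": 0.30}

COMPOSITE_MIN = 0.65


def composite_score(metrics):
    """Weighted sum of the five scored metrics."""
    return sum(weight * metrics[name] for name, weight in WEIGHTS.items())


def failed_criteria(metrics):
    """Names of metrics below their individual threshold."""
    return [name for name, floor in THRESHOLDS.items() if metrics[name] < floor]


def achieves_level_4_5(metrics):
    """Assumed joint rule: every threshold met and composite >= 0.65."""
    return not failed_criteria(metrics) and composite_score(metrics) >= COMPOSITE_MIN
```

With every metric exactly at its threshold the composite evaluates to 0.682, so under this joint reading the individual thresholds are always the binding constraint and the 0.65 bar is met automatically.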

13. Beyond Level 4.5: The AGI Boundary

%%{init: {'theme': 'base', 'themeVariables': {'primaryColor': '#0078D4', 'primaryTextColor': '#003D6B', 'primaryBorderColor': '#003D6B', 'secondaryColor': '#50E6FF', 'secondaryTextColor': '#323130', 'secondaryBorderColor': '#00BCF2', 'tertiaryColor': '#F2F2F2', 'tertiaryTextColor': '#323130', 'lineColor': '#0078D4', 'textColor': '#323130', 'mainBkg': '#DEECF9', 'nodeBorder': '#0078D4', 'clusterBkg': '#F2F2F2', 'clusterBorder': '#003D6B', 'titleColor': '#003D6B', 'edgeLabelBackground': '#FFFFFF', 'fontSize': '14px'}}}%%
flowchart TB
  classDef current fill:#DFF6DD,stroke:#107C10,color:#323130
  classDef bridge fill:#FFF4CE,stroke:#FFB900,color:#323130
  classDef agi fill:#DEECF9,stroke:#0078D4,color:#323130
  classDef strong fill:#E8DAEF,stroke:#8764B8,color:#323130

  subgraph Current["✅ Level 4.5"]
    direction LR
    L45A["Bounded topology"]:::current
    L45B["Predefined vocab"]:::current
    L45C["ROD ceiling: 3"]:::current
    L45D["ExistentialGuard"]:::current
    L45E["5 frames + veto"]:::current
  end

  subgraph Bridge["📐 Level 4.8"]
    direction LR
    L48A["World model integration"]:::bridge
    L48B["Meta-cognitive self-model"]:::bridge
  end

  subgraph AGI["🔬 Level 5: AGI"]
    direction LR
    L5A["Unbounded improvement"]:::agi
    L5B["Novel domains"]:::agi
    L5C["Remove invariants?"]:::agi
  end

  subgraph StrongAGI["🌌 Level 6: Conscious"]
    direction LR
    L6A["Consciousness"]:::strong
    L6B["Free will"]:::strong
    L6C["Moral agency"]:::strong
  end

  Current ==>|"Relax invariants?"| Bridge
  Bridge ==>|"Research question"| AGI
  AGI ==>|"Hard problem"| StrongAGI

The explicit brakes that distinguish Level 4.5 from AGI:

1. Predefined vocabulary - recomposition types are enumerated, not generated
2. ROD ceiling - recursive optimization capped at depth 3
3. ExistentialGuard immutability - the safety mechanism cannot be self-modified
4. Frame veto - the ethical frame can block any proposal
5. Single-recomposition atomicity - only one change at a time

Removing any of these would move toward Level 5 (AGI), which remains a fundamental research question.

References

Zoph, B. & Le, Q.V. "Neural Architecture Search with Reinforcement Learning." ICLR 2017. arXiv:1611.01578 (Architecture recomposition - topology search)
Bostrom, N. Superintelligence: Paths, Dangers, Strategies. Oxford University Press, 2014. (Existential risk and AGI safety boundary)
Gabriel, I. "Artificial Intelligence, Values, and Alignment." Minds and Machines, 30, 411–437, 2020. DOI:10.1007/s11023-020-09539-2 (Value alignment and purpose reflection)
Omohundro, S. "The Basic AI Drives." AGI 2008. DOI:10.5555/1566174.1566226 (Existential guard and self-preservation drives)
Du, Y., et al. "Improving Factuality and Reasoning in Language Models through Multiagent Debate." arXiv 2023. arXiv:2305.14325 (Parallel cognitive frames and multi-perspective deliberation)
Russell, S. Human Compatible: Artificial Intelligence and the Problem of Control. Viking, 2019. (AGI boundary and control problem)
Schmidhuber, J. "Gödel Machines: Self-Referential Universal Problem Solvers Making Provably Optimal Self-Improvements." AGI 2007. arXiv:cs/0309048 (Self-referential improvement under formal proofs)
Ord, T. The Precipice: Existential Risk and the Future of Humanity. Hachette Books, 2020. (Existential risk framework)
Dafoe, A., et al. "Cooperative AI: Machines Must Learn to Find Common Ground." Nature, 593, 33–36, 2021. DOI:10.1038/d41586-021-01170-0 (Multi-frame cooperative reasoning)
Elsken, T., Metzen, J.H., & Hutter, F. "Neural Architecture Search: A Survey." JMLR, 20(55), 1–21, 2019. arXiv:1808.05377 (Topology search methods)
Hendrycks, D., et al. "An Overview of Catastrophic AI Risks." arXiv 2023. arXiv:2306.12001 (Existential guard motivation and risk categories)
Bengio, Y., et al. "Managing Extreme AI Risks amid Rapid Progress." Science, 384(6698), 842–845, 2024. DOI:10.1126/science.adn0117 (Safety governance for advanced AI)
