Self Model
To adapt to and interact with its environment effectively, an embodied agent must understand not only the external world but also its internal self. Inspired by human cognition of the self, we propose a self model for embodied AI. The self model serves as a core component of embodied AI systems by integrating perception, prediction, memory, and decision modules, thereby enabling agents with diverse embodiments to perform tasks such as manipulation, navigation, and question answering.
The Concept of Self: From Human to Agent
For humans, the concept of self is not only proof of one’s own existence but also the foundation of perception, thought, and emotion. Biologically, the core neural mechanisms of self include: a body schema for structural representation of the self, a forward model for dynamical/causal prediction, an inverse model that maps desired goals to control commands, agency mechanisms that distinguish self from environment, and a perceptual‑memory model that tracks temporally extended self‑states and interaction histories. This unified, subjective framework is equally applicable and necessary in embodied AI. However, existing research mostly focuses on isolated components and lacks an integrated framework that fuses these functions, which prevents agents from achieving true autonomy, adaptation, and continuous learning.
Therefore, to replicate human-like self-awareness in robots, we start from four fundamental pillars: self-perception (awareness of the agent’s own body and environment), self-prediction (anticipating the outcomes of actions), self-memory (maintaining continuity of internal state over time), and self-decision (selecting feasible, goal-directed actions). These pillars address gaps in current embodied capabilities, enabling robots not only to understand the external world but also to form a coherent internal representation of “themselves”.
Self Model in Embodied AI
The Self Model is a unified internal representation that enables an embodied agent to understand its own body, capabilities, memories, and decision processes. It bridges perception, memory, prediction, and decision into a closed‑loop cognitive architecture, allowing the agent to reason not only about the external world but also about itself — its actions, limitations, and consequences.
Four core modules:
- Perception – Real‑time awareness of body state (joints, collisions, morphology).
- Memory – A 3D semantic self‑map that records spatial and experiential history.
- Prediction – Forecasting action outcomes (e.g., grasp success) and diagnosing failures.
- Decision – Goal‑to‑action mapping guided by self‑identity and predicted results.
These modules form a perception–memory–prediction–decision loop, enabling continuous self‑calibration and adaptive behavior in complex environments.
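To make the loop concrete, the sketch below shows one possible way the four modules could be wired together in code. This is purely illustrative and not the paper’s implementation: the class names (`SelfModelAgent`, `SelfState`), the observation format, and the scoring logic are all assumptions, and the memory is a flat list standing in for the 3D semantic self‑map.

```python
from dataclasses import dataclass


@dataclass
class SelfState:
    """Snapshot of the agent's internal self-representation (illustrative)."""
    joint_positions: list
    in_collision: bool = False


class SelfModelAgent:
    """Hypothetical closed-loop agent wiring the four self-model modules."""

    def __init__(self):
        # Memory: experiential history (stand-in for the 3D semantic self-map).
        self.memory = []

    def perceive(self, observation) -> SelfState:
        # Perception: real-time awareness of body state.
        return SelfState(
            joint_positions=observation["joints"],
            in_collision=observation.get("collision", False),
        )

    def predict(self, state: SelfState, action) -> float:
        # Prediction: forecast the action's outcome as a success score.
        # (Trivial placeholder: any action while in collision is scored 0.)
        return 0.0 if state.in_collision else 1.0

    def decide(self, state: SelfState, goal, candidate_actions):
        # Decision: pick the candidate with the best predicted outcome.
        return max(candidate_actions, key=lambda a: self.predict(state, a))

    def step(self, observation, goal, candidate_actions):
        # One pass of the perception-memory-prediction-decision loop.
        state = self.perceive(observation)
        action = self.decide(state, goal, candidate_actions)
        self.memory.append((state, action))  # record the interaction
        return action
```

In a real system, `predict` would be a learned forward model and `memory` a spatial‑semantic map; the point here is only the closed‑loop structure in which each decision is conditioned on perceived self‑state and predicted consequences, and every step is written back to memory.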
The L0–L5 Hierarchy
To systematically evaluate self‑modeling capabilities, we propose a six‑level hierarchy (L0–L5) that characterizes developmental stages from non‑self‑representation to full self‑awareness.
- L0 – Non‑Self Model: Purely reactive, no self/non‑self distinction.
- L1 – Basic Self‑Awareness: Static physical self, short‑term memory, basic collision judgments.
- L2 – Basic Self‑Adaptation: Dynamic self‑environment coupling, generalized forward prediction.
- L3 – Socialized Self: Role‑aware behavior, social memory, recognition of others’ intentions.
- L4 – Sustained Self‑Evolution: Autobiographical memory, metacognitive monitoring, value‑oriented iteration.
- L5 – Full Self‑Awareness: Worldview and ethical reasoning, hierarchically organized decision‑making.
This hierarchy provides an operational taxonomy for evaluating progress toward autonomous, self‑aware embodied systems.
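Because the levels form an ordered scale, one simple way to operationalize the taxonomy in an evaluation harness is as an ordered enumeration. The level names and the capability-check helper below are illustrative choices, not part of the paper’s formulation.

```python
from enum import IntEnum


class SelfModelLevel(IntEnum):
    """The L0-L5 self-modeling hierarchy as an ordered scale (illustrative)."""
    L0_NON_SELF = 0              # purely reactive, no self/non-self distinction
    L1_BASIC_AWARENESS = 1       # static physical self, short-term memory
    L2_BASIC_ADAPTATION = 2      # dynamic self-environment coupling
    L3_SOCIALIZED_SELF = 3       # role-aware behavior, social memory
    L4_SUSTAINED_EVOLUTION = 4   # autobiographical memory, metacognition
    L5_FULL_AWARENESS = 5        # worldview and ethical reasoning


def meets_requirement(assessed: SelfModelLevel, required: SelfModelLevel) -> bool:
    """Check whether an assessed agent reaches a required capability stage."""
    return assessed >= required
```

Encoding the hierarchy as an `IntEnum` makes the developmental ordering explicit, so benchmarks can ask ordinal questions ("does this agent reach at least L2?") rather than treating the levels as unrelated labels.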
The paper “Self Model for Embodied Artificial Intelligence” presents a systematic formulation of this framework, including a detailed L1 implementation and experimental validation. For the full technical details, please see the paper below.
📄 Series of Papers
Self Model for Embodied Artificial Intelligence
Shuqiang Jiang, Sixian Zhang, Shida Tao, Xihong Zhu, Tianliang Qi, Xinhang Song
Journal of Computer Science and Technology, 2026