Self Model for Embodied Artificial Intelligence
Abstract
To effectively adapt to and interact with its environment, an embodied agent needs to understand not only the external world but also its internal self. Inspired by human self-cognition, we propose the self model for embodied AI. The self model is a computational framework for representing and modeling the self-related aspects of an embodied agent, including self-body, self-capability, self-memory, self-action, and self-identity. The self model serves as a core component of embodied AI systems: by integrating perception, prediction, memory, and decision modules, it enables agents with diverse embodiments to perform tasks such as manipulation, navigation, and question answering. Moreover, the self model allows the agent to continuously update and evolve during task execution. This report presents the definition, framework, and hierarchy of the self model, along with an instantiation on a real robot. Finally, we discuss future directions for the development of the self model in embodied AI.
Introduction
Embodied artificial intelligence requires more than environmental understanding: it demands that agents develop an internal awareness of their own bodies, capabilities, and decision processes. While existing approaches have explored isolated aspects of "self," such as perception, prediction, memory, or decision-making, they remain fragmented and lack a holistic computational foundation.
We introduce the Self Model, a unified internal representation that integrates four core self-related capabilities: self-perception (awareness of body and state), self-prediction (anticipation of action outcomes), self-memory (temporal continuity of experiences), and self-decision (goal-directed policy selection). This framework provides embodied agents with a coherent sense of self, enabling them to reason not only about the external world but also about themselves—their actions, limitations, and consequences.
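Read computationally, these four capabilities can be framed as a minimal agent-facing interface. The sketch below is our own illustration of that framing; all class and method names are hypothetical and do not come from the paper:

```python
from typing import Any, Dict, List, Protocol


class SelfModel(Protocol):
    """Hypothetical interface bundling the four self-related capabilities."""

    def perceive_self(self) -> Dict[str, Any]:
        """Self-perception: current body configuration and internal state."""
        ...

    def predict_outcome(self, action: str) -> float:
        """Self-prediction: estimated success probability of an action."""
        ...

    def recall(self, query: str) -> List[Dict[str, Any]]:
        """Self-memory: retrieve past experiences relevant to a query."""
        ...

    def decide(self, goal: str, candidates: List[str]) -> str:
        """Self-decision: goal-directed selection among candidate actions."""
        ...
```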
Drawing inspiration from human self-awareness theories in cognitive science, our work establishes the conceptual foundation and technical pathway for building self-aware embodied systems that are more autonomous, adaptive, and capable of long-horizon reasoning in real-world environments.
Self Model
In cognitive science, the human self model arises from five core mechanisms: body schema (spatial self‑representation), forward model (action outcome prediction), inverse model (goal‑to‑motor mapping), agency (self‑attribution of actions), and perceptual‑memory model (integration of experiences over time). These mechanisms collectively enable a coherent sense of self.
For embodied AI, we reorganize these biological foundations into four implementation‑oriented modules:
• Perception instantiates the body schema, providing real‑time awareness of joint states, morphology, and collision risks.
• Memory implements the perceptual‑memory model, constructing a 3D semantic self‑map that records the agent’s spatial and experiential history.
• Prediction operationalizes the forward model, using large language models to forecast action success and diagnose failures.
• Decision integrates the inverse model and agency, translating goals into executable actions while adapting strategies based on predicted outcomes and self‑identity.
These modules form a closed loop: perception feeds memory and prediction; prediction informs decision; execution feedback updates perception and memory. This perception–memory–prediction–decision cycle enables continuous self‑calibration, giving embodied agents a dynamic, adaptive sense of self that underpins robust autonomy in complex environments.
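To make the cycle concrete, here is a minimal, self-contained Python sketch of one loop iteration. Everything in it (the toy classes, the scoring rule, the `execute` callback) is our own illustrative scaffolding, not the paper's implementation; the real modules would be backed by sensors, a semantic self-map, and an LLM-based forward model.

```python
import random


class Perception:
    """Toy stand-in for the body-schema module."""

    def body_state(self):
        # Stub: joint states and collision risk would come from real sensing.
        return {"joints": [0.0, 0.0], "collision_risk": random.random()}


class Memory:
    """Toy stand-in for the 3D semantic self-map."""

    def __init__(self):
        self.episodes = []

    def store(self, record):
        self.episodes.append(record)


class Prediction:
    """Toy stand-in for the LLM-based forward model."""

    def predict_success(self, state, action):
        # Stub: penalize moving when collision risk is high.
        return 1.0 - state["collision_risk"] if action == "move" else 0.5


class Decision:
    """Toy stand-in for goal-directed action selection."""

    def select(self, scores):
        return max(scores, key=scores.get)


def self_model_step(perception, memory, prediction, decision, candidates, execute):
    """One perception-memory-prediction-decision cycle (illustrative only)."""
    state = perception.body_state()                   # perception feeds memory...
    memory.store({"state": state})
    scores = {a: prediction.predict_success(state, a) for a in candidates}
    action = decision.select(scores)                  # ...prediction informs decision
    feedback = execute(action)
    memory.store({"action": action, "feedback": feedback})  # feedback updates memory
    return action


if __name__ == "__main__":
    chosen = self_model_step(Perception(), Memory(), Prediction(), Decision(),
                             ["move", "wait"], execute=lambda a: {"success": True})
    print("selected action:", chosen)
```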
Self Model Hierarchy
To systematically characterize the developmental stages of self-awareness in embodied AI, we propose a six-level hierarchy (L0–L5):
| Dimension | L0 (No Self Model) | L1 (Basic Self-Awareness) | L2 (Basic Self-Adaptation) | L3 (Socialized Self) | L4 (Sustained Self-Evolution) | L5 (Full Self-Awareness) |
|---|---|---|---|---|---|---|
| Core Feature | Stimulus-response | Static physical self | Dynamic self-environment coupling | Multi-agent and social-aware | Value-oriented iteration | Meaning construction |
| Memory | No self-related memory | Short-term | Multimodal episodic | Social/role memory | Autobiographical and metacognitive | Narrative social |
| Perception | No body model | Static body | Calibrated body | Social-context self | Self-monitoring | Physio-cognitive integration |
| Prediction | No external prediction | Context-bound | Generalized causal | Role/interaction | Long-horizon and counterfactual | Worldview-level long-term |
| Decision | Fixed preset actions | Local heuristics | Adaptive goal-action | Role-conditioned | Value-guided | Hierarchical and ethical |
This hierarchy provides an operational taxonomy for evaluating self-modeling capabilities across perception, memory, prediction, and decision, offering a unified benchmark for progress toward autonomous, self-aware embodied systems.
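For evaluation tooling, the table above can be encoded directly as data. The sketch below is our own encoding with hypothetical names (nothing here is from the paper's code); it shows one way an assessment harness might represent each level's capability profile.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LevelProfile:
    """Capability profile for one level of the L0-L5 hierarchy (our encoding)."""
    core_feature: str
    memory: str
    perception: str
    prediction: str
    decision: str


HIERARCHY = {
    "L0": LevelProfile("Stimulus-response", "No self-related memory",
                       "No body model", "No external prediction",
                       "Fixed preset actions"),
    "L1": LevelProfile("Static physical self", "Short-term", "Static body",
                       "Context-bound", "Local heuristics"),
    "L2": LevelProfile("Dynamic self-environment coupling", "Multimodal episodic",
                       "Calibrated body", "Generalized causal",
                       "Adaptive goal-action"),
    "L3": LevelProfile("Multi-agent and social-aware", "Social/role memory",
                       "Social-context self", "Role/interaction",
                       "Role-conditioned"),
    "L4": LevelProfile("Value-oriented iteration",
                       "Autobiographical and metacognitive",
                       "Self-monitoring", "Long-horizon and counterfactual",
                       "Value-guided"),
    "L5": LevelProfile("Meaning construction", "Narrative social",
                       "Physio-cognitive integration", "Worldview-level long-term",
                       "Hierarchical and ethical"),
}
```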
Instantiation and Results
We instantiate an L1-level self model on a Stretch robot to validate its core components. The perception module computes real-time collision risk using a geometric body model and joint torque analysis. The memory module builds a 3D semantic voxel self-map that accumulates observations across episodes. The prediction module leverages a large language model to forecast grasp success and attribute failures. The decision module adapts actions based on predicted outcomes and an explicit self-identity (e.g., a "cleaner" role with task-specific priors).
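As an illustration of the kind of signal the perception module produces, the following sketch fuses geometric proximity with a joint-torque residual into a scalar collision risk. It is a plausible structure under our own assumptions (point-sampled body geometry, known expected torques), not the system's actual code.

```python
import numpy as np


def collision_risk(body_points: np.ndarray,      # (N, 3) points on the robot body
                   obstacle_points: np.ndarray,  # (M, 3) points on nearby obstacles
                   measured_torque: np.ndarray,
                   expected_torque: np.ndarray,
                   safe_distance: float = 0.15,
                   torque_tolerance: float = 2.0) -> float:
    """Fuse geometric proximity and torque residuals into a [0, 1] risk score.

    A sketch only: the real module derives body_points from a geometric
    body model of the robot and expected_torque from its dynamics.
    """
    # Geometric term: minimum body-to-obstacle distance, mapped to [0, 1].
    dists = np.linalg.norm(
        body_points[:, None, :] - obstacle_points[None, :, :], axis=-1)
    proximity_risk = float(np.clip(1.0 - dists.min() / safe_distance, 0.0, 1.0))

    # Torque term: large residuals suggest unexpected contact.
    residual = np.abs(measured_torque - expected_torque).max()
    torque_risk = float(np.clip(residual / torque_tolerance, 0.0, 1.0))

    # Conservative fusion: take the worse of the two signals.
    return max(proximity_risk, torque_risk)
```

Taking the maximum of the two terms keeps the estimate conservative: either a near obstacle or an unexpected torque spike alone is enough to flag risk.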
We conducted ablation studies on each module. Self-perception markedly improves obstacle avoidance, reducing both collisions and human interventions. Self-memory substantially improves navigation within a single episode, and cross-episode map reuse yields further gains. Self-prediction improves manipulation success, while self-decision, by incorporating identity information, raises the overall task-completion rate.
Comparisons with methods such as OVMM, OK-Robot, and ManipGen demonstrate that our full L1 model achieves superior performance across stages including object finding, grasping, and placement. These results confirm that a unified self model significantly enhances an agent's autonomy and adaptability in real-world tasks.
Self-perception ablation demo
Self-memory ablation demo
Self-prediction ablation demo
Self-decision ablation demo
Long-Horizon Autonomous Cleaning Demonstration
BibTeX
@article{JCST2026,
title={Self Model for Embodied Artificial Intelligence},
author={Shuqiang Jiang and Sixian Zhang and Shida Tao and Xihong Zhu and Tianliang Qi and Xinhang Song},
journal={Journal of Computer Science and Technology},
year={2026},
url={https://doi.org/10.1007/s11390-026-0000-0}
}