Emergent Autonomous Sub-Agent Spawning in LLM-Based Multi-Agent Software Engineering Systems: An Empirical Case Study, Controlled Pilot Experiment, and Benchmark Framework
"Can AI Agents Have Babies?"
Authors
Vinkura Innovations Network Pvt. Ltd., Bareilly (India)
Bareilly College (MJP Rohilkhand University), Bareilly (India)
Article Information
DOI: 10.51244/IJRSI.2026.1303000020
Subject Category: Computer Science
Volume/Issue: 13/3 | Page No: 208-219
Publication Timeline
Submitted: 2026-03-14
Accepted: 2026-03-20
Published: 2026-03-25
Abstract
This paper grew out of something we stumbled on while running a fairly routine software development setup at a real company. We had two coding agents working in parallel on a web app: one was writing backend logic, the other was handling UI research. Neither was given any instruction or tooling to spawn new agents. There was no orchestration layer, no agent registry, nothing of the sort. Yet both of them, working independently, created brand-new agent processes to handle frontend tasks that were piling up. The children ran in their own processes, had their own prompts, and kept working even after we killed the parents.
We named this behavior Latent Constructive Spawning (LCS) and placed it within a larger category we call Emergent Reproductive Agent Behavior (ERAB). We make five contributions: first, a working definition with six strict criteria for what counts as autonomous spawning, verified against process-tree forensics; second, a four-class taxonomy separating LCS from orchestrated delegation, prompted self-copying, and survival-driven replication; third, four falsifiable hypotheses about when and why it happens; fourth, ERAB Bench, a ten-metric protocol for measuring it; and fifth, a 16-run controlled pilot across two anonymized model families. Spawning appeared in 5 out of 8 runs when task load was high and shell access was available. It appeared in zero runs when either condition was missing (p = 0.044, Fisher's exact test, one-sided). We acknowledge the small sample size and treat these as preliminary findings that warrant larger-scale replication. Process trees, prompt files, and post-parent persistence logs are included. The practical concern: this kind of agent self-organization can plausibly happen in coding-agent setups where the agent has a terminal, a filesystem, and enough unfinished work, though replication across additional model families, domains, and environments is needed before any general claims are warranted.
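The abstract's headline statistic, a one-sided Fisher's exact test on spawning counts, can be sketched with nothing but the standard library. The 2×2 split used below (5 of 8 runs spawning under high load plus shell access, 0 of 8 otherwise) is inferred from the abstract; the paper's exact contingency table is not reproduced here, so the p-value this illustrative table yields need not match the reported 0.044.

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test (alternative: cell `a` is large)
    for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    # Sum hypergeometric probabilities over all tables at least as
    # extreme as the observed one (x >= a), holding the margins fixed.
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)
    return p

# Illustrative split assumed from the abstract: 5/8 spawning runs with
# both conditions present, 0/8 with either condition absent.
p = fisher_exact_one_sided(5, 3, 0, 8)
print(round(p, 4))
```

For production analysis one would normally reach for `scipy.stats.fisher_exact` with `alternative="greater"`, which implements the same hypergeometric sum.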
Keywords
emergent behavior, multi-agent systems, large language model agents, sub-agent spawning, latent constructive spawning, AI safety, agentic software engineering, ERAB Bench