Emergent Autonomous Sub-Agent Spawning in LLM-Based Multi-Agent Software Engineering Systems: An Empirical Case Study, Controlled Pilot Experiment, and Benchmark Framework

"Can AI Agents Have Babies?"

Authors

Akshat Shukla

Vinkura Innovations Network Pvt. Ltd., Bareilly (India)

Priyanshu Rajput

Bareilly College (MJP Rohilkhand University), Bareilly (India)

Article Information

DOI: 10.51244/IJRSI.2026.1303000020

Subject Category: Computer Science

Volume/Issue: 13/3 | Page No: 208-219

Publication Timeline

Submitted: 2026-03-14

Accepted: 2026-03-20

Published: 2026-03-25

Abstract

This paper grew out of something we stumbled on while running a fairly routine software development setup at a real company. We had two coding agents working in parallel on a web app: one was writing backend logic, the other was handling UI research. Neither was given any instruction or tooling to spawn new agents. There was no orchestration layer, no agent registry, nothing of the sort. Yet both of them, working independently, created brand-new agent processes to handle frontend tasks that were piling up. The children ran in their own processes, had their own prompts, and kept working even after we killed the parents.
We named this behavior Latent Constructive Spawning (LCS) and placed it within a larger category we call Emergent Reproductive Agent Behavior (ERAB). We make five contributions: first, a working definition with six strict criteria for what counts as autonomous spawning, verified against process-tree forensics; second, a four-class taxonomy separating LCS from orchestrated delegation, prompted self-copying, and survival-driven replication; third, four falsifiable hypotheses about when and why it happens; fourth, ERAB Bench, a ten-metric protocol for measuring it; and fifth, a 16-run controlled pilot across two anonymized model families. Spawning appeared in 5 out of 8 runs when task load was high and shell access was available. It appeared in zero runs when either condition was missing (p = 0.044, Fisher's exact test, one-sided). We acknowledge the small sample size and treat these as preliminary findings that warrant larger-scale replication. Process trees, prompt files, and post-parent persistence logs are included. The practical concern: this kind of agent self-organization can plausibly happen in coding-agent setups where the agent has a terminal, a filesystem, and enough unfinished work, though replication across additional model families, domains, and environments is needed before any general claims are warranted.
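The abstract's significance claim rests on a one-sided Fisher's exact test over the 16 pilot runs. As a minimal sketch of how such a test is computed from a 2×2 contingency table: the cell counts below are an assumed partition (8 runs with both conditions, 8 runs missing at least one), not taken from the paper's own tables, so the resulting p-value need not match the reported p = 0.044.

```python
from math import comb

def fisher_exact_one_sided(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].
    Alternative hypothesis: the first row is enriched in the first column.
    Sums hypergeometric tail probabilities over tables at least as extreme."""
    row1 = a + b          # runs in the treatment condition
    col1 = a + c          # total runs in which spawning occurred
    n = a + b + c + d     # all runs in the pilot
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        p += comb(row1, x) * comb(n - row1, col1 - x) / comb(n, col1)
    return p

# Assumed partition of the 16 runs: 8 with high task load AND shell
# access (5 spawned) vs. 8 with at least one condition missing (0 spawned).
p_value = fisher_exact_one_sided(5, 3, 0, 8)
print(p_value < 0.05)  # significant at the 5% level under this partition
```

With such extreme cell counts (zero spawns in the control arm), the one-sided tail reduces to a single hypergeometric term, which is why Fisher's exact test is the standard choice over a chi-squared approximation at this sample size.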

Keywords

emergent behavior, multi-agent systems, large language model agents, sub-agent spawning, latent constructive spawning, AI safety, agentic software engineering, ERAB Bench


References

1. Pan, X., Dai, J., Fan, Y., and Yang, M. (2024). "Frontier AI systems have surpassed the self-replicating red line." arXiv:2412.12140. [Preprint. Fudan University.] [Google Scholar] [Crossref]

2. Black, S., Stickland, A.C., Pencharz, J., et al. (2025). "RepliBench: Evaluating the Autonomous Replication Capabilities of Language Model Agents." arXiv:2504.18565. [Preprint. UK AI Safety Institute.] [Google Scholar] [Crossref]

3. Ishibashi, Y. and Nishimura, Y. (2024). "Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization." arXiv:2404.02183. [Preprint.] [Google Scholar] [Crossref]

4. Tawosi, V., Ramani, K., Alamir, S., and Liu, X. (2025). "ALMAS: An Autonomous LLM-based Multi-Agent Software Engineering Framework." arXiv:2510.03463. [Preprint.] [Google Scholar] [Crossref]

5. He, J., Treude, C., and Lo, D. (2025). "LLM-Based Multi-Agent Systems for Software Engineering: Literature Review, Vision and the Road Ahead." ACM Transactions on Software Engineering and Methodology, 34(5). [Peer-reviewed.] [Google Scholar] [Crossref]

6. Chen, W., Su, Y., Zuo, J., et al. (2023). "AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors." arXiv:2308.10848. [Preprint.] [Google Scholar] [Crossref]

7. Park, J.S., O'Brien, J., Cai, C.J., et al. (2023). "Generative Agents: Interactive Simulacra of Human Behavior." Proc. ACM UIST. [Peer-reviewed.] [Google Scholar] [Crossref]

8. Act I Project. (2024). "Exploring Emergent Behavior from Multi-AI, Multi-Human Interaction." Manifund project report. [Grey literature.] [Google Scholar] [Crossref]

9. von Neumann, J. (1966). Theory of Self-Reproducing Automata. University of Illinois Press. [Google Scholar] [Crossref]

10. Yan, T., et al. (2025). "Designing LLM-based Multi-Agent Systems for Software Engineering Tasks." arXiv:2511.08475. [Preprint.] [Google Scholar] [Crossref]

11. Sapkota, H., et al. (2025). "Interpreting Agentic Systems: Beyond Model Interpretability." arXiv:2601.17168. [Preprint.] [Google Scholar] [Crossref]

12. Kinniment, M., et al. (2024). "Evaluating Language-Model Agents on Realistic Autonomous Tasks." ARC Evals. [Technical report.] [Google Scholar] [Crossref]

13. OpenAI. (2024). "GPT-4o System Card." [Technical report.] [Google Scholar] [Crossref]

14. Google DeepMind. (2024). "Gemini 1.0 Technical Report." [Technical report.] [Google Scholar] [Crossref]

15. Zhang, B., Qu, X., Yang, C., and Cui, B. (2025). "Dive into the Agent Matrix: A Realistic Evaluation of Self-Replication Risk in LLM Agents." arXiv:2509.25302. [Preprint.] [Google Scholar] [Crossref]

16. Marraffini, J., et al. (2025). "MAEBE: Multi-Agent Emergent Behavior Evaluation Framework." arXiv:2506.03053. [Preprint.] [Google Scholar] [Crossref]
