Embedding Large Language Models into a layered Social-BDI cognitive architecture to build agents with explicit cognitive states and traceable decision chains, driven by Hawkes self-exciting point processes for high-fidelity public opinion simulation.
"All models are wrong, but some are useful." — George E. P. Box
Existing LLM-driven social simulation frameworks treat models as end-to-end behavior mappers without explicit cognitive states. POSIM addresses this by embedding LLMs into a structured cognitive architecture.
| Platform | Explicit Cognitive Modeling | Validation (M/P/S) | Real-Case Intervention | LLM Multi-Type Agents | Temporal Precision | Modular Design |
|---|---|---|---|---|---|---|
| S3 | ✗ | ✗/✓/✓ | ✗ | ✗ | ★★★ | ★★★ |
| HiSim | ✗ | ✗/✗/✓ | ✗ | ✗ | ★★ | ★★ |
| GA-S3 | ✗ | ✗/✗/✓ | ✗ | ✓ | ★★★ | ★★ |
| SPARK | ✗ | ✗/✓/✗ | ✗ | ✓ | ★★ | ★★ |
| FDE-LLM | ✗ | ✗/✗/✓ | ✗ | ✗ | ★★ | ★★ |
| TrendSim | ✗ | ✓/✗/✗ | ✗ | ✓ | ★★★★ | ★★★ |
| OASIS | ✗ | ✗/✓/✓ | ✗ | ✗ | ★★★★ | ★★★★ |
| LMAgent | ✗ | ✗/✗/✓ | ✗ | ✓ | ★★ | ★★ |
| POSIM (Ours) | ✓ | ✓/✓/✓ | ✓ | ✓ | ★★★★★ | ★★★★★ |
M = Mechanism validation; P = Phenomenon validation; S = Statistical validation.
Embeds LLMs within a layered cognitive framework (Perception → Belief → Desire → Intention → Action), incorporating emotional arousal and cognitive biases. Three cognitive subsystems are each powered by independent LLM calls, communicating through structured intermediate states. The entire behavioral generation process is fully traceable.
Hawkes self-exciting point processes jointly model exogenous event shocks and endogenous user interactions, combined with circadian rhythm modulation, reproducing non-stationary activity patterns at minute-level temporal resolution.
Drawing on classical V&V methodology: micro-level behavioral mechanism calibration → macro-level emergent phenomenon verification → statistical result consistency alignment, building simulation credibility layer by layer.
Agents, simulation environment, and strategy evaluation communicate through standard interfaces — swap the cognitive architecture, change the time engine, or plug in new evaluation metrics without touching other modules.
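The swap-a-module idea can be illustrated with a minimal Python sketch. Everything named here (`TimeEngine`, `FlatClock`, `CircadianClock`, `expected_actions`) is hypothetical and is not taken from the POSIM codebase; the cosine day/night curve, its peak hour, and its amplitude are illustrative assumptions only.

```python
import math
from typing import Protocol


class TimeEngine(Protocol):
    """Hypothetical interface: maps a minute of the day to an activity multiplier."""

    def activity_factor(self, minute_of_day: int) -> float: ...


class FlatClock:
    """Time engine with no time-of-day effect."""

    def activity_factor(self, minute_of_day: int) -> float:
        return 1.0


class CircadianClock:
    """Cosine day/night cycle; peak hour and amplitude are assumed values."""

    def __init__(self, peak_minute: int = 21 * 60, amplitude: float = 0.5):
        self.peak_minute = peak_minute
        self.amplitude = amplitude

    def activity_factor(self, minute_of_day: int) -> float:
        phase = 2 * math.pi * (minute_of_day - self.peak_minute) / 1440
        return 1.0 + self.amplitude * math.cos(phase)


def expected_actions(base_rate: float, engine: TimeEngine, minute_of_day: int) -> float:
    """The caller depends only on the interface, never on a concrete engine."""
    return base_rate * engine.activity_factor(minute_of_day)


# Swapping the engine requires no change to expected_actions.
print(expected_actions(10.0, FlatClock(), 9 * 60))
print(expected_actions(10.0, CircadianClock(), 21 * 60))
```

Because `expected_actions` only relies on the `activity_factor` method, any object with that method can be plugged in, which is the property the modular design aims for.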
POSIM consists of three core components: (1) Social-BDI Agents, (2) Hawkes process-driven simulation environment, and (3) Strategy evaluation module for counterfactual reasoning.
Figure 1. POSIM framework architecture. Left: Social-BDI agent cognitive pipeline. Upper-center: Hawkes process-driven simulation environment and virtual social media platform. Lower-right: Strategy evaluation module (Intervenor-Simulator-Evaluator).
The conditional intensity function models collective activity as the superposition of a background rate, exogenous event shocks (high intensity, slow decay), and endogenous user interactions (low intensity, fast decay):
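$$\lambda(t) = \underbrace{\mu}_{\text{background}} + \underbrace{\sum_{t_i < t} \alpha_{ext} e^{-\beta_{ext}(t - t_i)}}_{\text{exogenous}} + \underbrace{\sum_{t_j < t} \alpha_{int} e^{-\beta_{int}(t - t_j)}}_{\text{endogenous}}$$
Here $t_i$ are the times of past exogenous events and $t_j$ are the times of past user interactions.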
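As a rough illustration of the intensity function above, the following Python sketch evaluates it and samples user actions with Ogata's thinning algorithm. It is not POSIM's `hawkes_process.py`; the function names and all parameter values are assumptions chosen only to match the qualitative description (exogenous: high intensity, slow decay; endogenous: low intensity, fast decay), and circadian modulation is left out.

```python
import math
import random


def intensity(t, mu, ext_times, int_times, a_ext, b_ext, a_int, b_int):
    """lambda(t) = background + exogenous kernel sum + endogenous kernel sum."""
    exo = sum(a_ext * math.exp(-b_ext * (t - ti)) for ti in ext_times if ti < t)
    endo = sum(a_int * math.exp(-b_int * (t - tj)) for tj in int_times if tj < t)
    return mu + exo + endo


def simulate(horizon, mu, ext_times, a_ext, b_ext, a_int, b_int, seed=0):
    """Ogata thinning: propose from an upper bound, accept with prob lambda/bound."""
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        # Upper bound on lambda(s) for s > t until the next accepted event:
        # existing kernels only decay, an event exactly at t adds at most a_int,
        # and each exogenous shock still to come adds at most a_ext.
        pending = sum(1 for ti in ext_times if ti >= t)
        bound = (intensity(t, mu, ext_times, events, a_ext, b_ext, a_int, b_int)
                 + a_int + a_ext * pending)
        t += rng.expovariate(bound)
        if t >= horizon:
            return events
        lam = intensity(t, mu, ext_times, events, a_ext, b_ext, a_int, b_int)
        if rng.random() * bound <= lam:
            events.append(t)  # accepted point becomes a new endogenous event


# Assumed parameters: one external shock at t = 10, a_int / b_int < 1 for stability.
actions = simulate(horizon=50.0, mu=0.2, ext_times=[10.0],
                   a_ext=5.0, b_ext=0.1, a_int=0.5, b_int=2.0, seed=1)
print(len(actions), "simulated actions")
```

Keeping the ratio `a_int / b_int` below 1 prevents the endogenous cascade from growing without bound in this sketch.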
Extending the classical BDI architecture with an emotional dimension, building agents with explicit cognitive states and auditable multi-stage decision chains.
Fixed (Personality Anchor): Gender, location, occupation, followers, verification type
Highly Stable: Conformity, paranoia, catharsis, curiosity-seeking patterns
Dynamically Evolving: Stance & reasoning on event entities
Real-time Fluctuation: 6D emotion vector (happy, sad, angry, fear, surprise, disgust)
L1 (What to do & to whom): Select action type (like / repost / comment / original post) and target
L2 (How to express): Plan across 4 orthogonal dimensions: Emotion × Stance × Style × Narrative
L3 (What to say): Generate role-consistent social media text under L1+L2 constraints
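One way to picture the structured intermediate states and the traceable decision chain is the Python sketch below. The dataclass fields, the `step` function, and the `fake_llm` stub are all hypothetical; a real subsystem would call an actual LLM and, per the description above, the intention stage would run three levels rather than the single call used here for brevity.

```python
import json
from dataclasses import asdict, dataclass
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: prompt in, JSON string out


@dataclass
class Belief:
    stance: str
    emotion: dict  # six keys: happy, sad, angry, fear, surprise, disgust


@dataclass
class Desire:
    motive: str


@dataclass
class Intention:
    action: str  # L1: like / repost / comment / original post
    target: str  # L1: whom the action addresses
    plan: dict   # L2: emotion, stance, style, narrative
    text: str    # L3: generated post text


def step(perception: str, llm: LLM):
    """Run Belief -> Desire -> Intention, recording every intermediate state."""
    trace = []
    belief = Belief(**json.loads(llm("belief:" + perception)))
    trace.append(("belief", asdict(belief)))
    desire = Desire(**json.loads(llm("desire:" + json.dumps(asdict(belief)))))
    trace.append(("desire", asdict(desire)))
    intention = Intention(**json.loads(llm("intention:" + json.dumps(asdict(desire)))))
    trace.append(("intention", asdict(intention)))
    return intention, trace


def fake_llm(prompt: str) -> str:
    """Stand-in for a model call: returns canned JSON for each stage."""
    canned = {
        "belief": {"stance": "critical",
                   "emotion": {"happy": 0.0, "sad": 0.1, "angry": 0.7,
                               "fear": 0.0, "surprise": 0.1, "disgust": 0.1}},
        "desire": {"motive": "catharsis"},
        "intention": {"action": "comment", "target": "post_42",
                      "plan": {"emotion": "angry", "stance": "critical",
                               "style": "colloquial", "narrative": "personal"},
                      "text": "This is unbelievable."},
    }
    return json.dumps(canned[prompt.split(":", 1)[0]])


intention, trace = step("A trending post about the event", fake_llm)
for stage, state in trace:
    print(stage, state)
```

Because each stage writes a plain dictionary into `trace`, the path from perception to generated text can be inspected after the fact.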
Primary opinion participants
Colloquial, fragmented, emotion-driven. Impulsive expression under high arousal.
Key intermediary in two-step flow
Independent views, agenda-setting. Significant influence on downstream belief updates.
Information collection & dissemination
Formal, restrained, timely. Information confirmation at critical junctures.
Official stance & public governance
Low frequency, high authority. Post-fermentation statements with turning-point impact.
Three representative public opinion events from Sina Weibo spanning social controversy, campus incidents, and food safety. Simulation precision: 10 min/step.
An actress's earrings were identified as ¥2.3M luxury goods, sparking intense public debate on celebrity extravagance.
A disputed harassment allegation at Wuhan University; a court ruling reignited large-scale debate on justice and campus safety.
An internet celebrity publicly accused a restaurant chain of extensive use of prepared food, sparking food safety concerns.
POSIM's average behavioral, content, and topological scores (Beh.Avg, Cont.Avg, Topo.Avg) exceed the best baseline on all three real-world Weibo datasets.
Cognitive-behavior chain consistency (0–5), personality stability (0–1), decision robustness (0–1)
Opinion lifecycle, multi-agent heterogeneity, emotional polarization, scale-free topology & cascade power-law
9 quantitative metrics across behavior (3), content (3), and topology (3) layers
500 randomly sampled users, 12 simulation rounds, four methods under identical conditions.
| Method | Cognitive-Behavior Chain (0–5) ↑ | Personality Stability (0–1) ↑ | Decision Robustness (0–1) ↑ |
|---|---|---|---|
| Direct-Nothink | 1.47 ± 0.50 | 0.478 ± 0.263 | 0.629 ± 0.240 |
| Direct-Think | 1.75 ± 0.43 | 0.448 ± 0.269 | 0.603 ± 0.299 |
| CoT | 3.09 ± 0.29 | 0.516 ± 0.272 | 0.541 ± 0.356 |
| Social-BDI (Ours) | 4.64 ± 0.48 | 0.661 ± 0.215 | 0.695 ± 0.213 |
Key Finding: CoT's decision robustness is actually the lowest (0.541) — without stable state anchoring, input perturbations ripple through the entire reasoning chain. Social-BDI's explicit belief states provide a cognitive anchoring effect, maintaining decision stability.
All macro phenomena emerged spontaneously from agent interactions — none were pre-programmed.
| Data | Method | JSD ↓ | Act.ρ ↑ | RMSE ↓ | Beh.Avg ↑ | Confr. ↑ | \|ΔTTR\| ↓ | \|ΔS̄\| ↓ | Cont.Avg ↑ | Net. ↑ | Casc. ↑ | PL ↑ | Topo.Avg ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LE | Rule ABM | 0.427 | 0.808 | 0.158 | 0.741 | n/a | n/a | n/a | n/a | 0.479 | 0.633 | 0.543 | 0.552 |
| LE | w/ LLM | 0.289 | 0.799 | 0.162 | 0.783 | 0.544 | 0.185 | 0.319 | 0.680 | 0.565 | 0.735 | 0.918 | 0.739 |
| LE | w/ CoT | 0.394 | 0.806 | 0.151 | 0.754 | 0.584 | 0.141 | 0.123 | 0.774 | 0.755 | 0.777 | 0.756 | 0.763 |
| LE | POSIM | 0.193 | 0.809 | 0.154 | 0.821 | 0.790 | 0.030 | 0.029 | 0.910 | 0.895 | 0.830 | 0.961 | 0.896 |
| WL | Rule ABM | 0.318 | 0.681 | 0.126 | 0.746 | n/a | n/a | n/a | n/a | 0.869 | 0.696 | 0.642 | 0.736 |
| WL | w/ LLM | 0.237 | 0.722 | 0.119 | 0.789 | 0.453 | 0.172 | 0.360 | 0.640 | 0.528 | 0.667 | 0.580 | 0.592 |
| WL | w/ CoT | 0.229 | 0.744 | 0.116 | 0.800 | 0.474 | 0.091 | 0.365 | 0.673 | 0.941 | 0.740 | 0.673 | 0.784 |
| WL | POSIM | 0.073 | 0.750 | 0.118 | 0.853 | 0.841 | 0.010 | 0.203 | 0.876 | 0.850 | 0.758 | 0.965 | 0.858 |
| XF | Rule ABM | 0.312 | 0.664 | 0.187 | 0.721 | n/a | n/a | n/a | n/a | 0.528 | 0.614 | 0.279 | 0.474 |
| XF | w/ LLM | 0.244 | 0.671 | 0.190 | 0.746 | 0.765 | 0.117 | 0.076 | 0.858 | 0.774 | 0.695 | 0.482 | 0.650 |
| XF | w/ CoT | 0.293 | 0.699 | 0.181 | 0.742 | 0.774 | 0.134 | 0.014 | 0.875 | 0.767 | 0.696 | 0.460 | 0.641 |
| XF | POSIM | 0.148 | 0.727 | 0.168 | 0.804 | 0.843 | 0.019 | 0.046 | 0.926 | 0.885 | 0.696 | 0.513 | 0.698 |
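The three behavior-layer columns (JSD, ρ, RMSE) can be illustrated with a small pure-Python sketch. How POSIM actually computes them is not stated here, so this assumes JSD compares action-type distributions and that ρ (Pearson correlation) and RMSE compare simulated and real activity curves; all sample numbers are made up for illustration.

```python
import math


def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts)
    return [c / total for c in counts]


def jsd(p, q):
    """Jensen-Shannon divergence with log base 2, bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def pearson(x, y):
    """Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmse(x, y):
    """Root mean squared error between two equally long series."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))


# Hypothetical counts per action type: like, repost, comment, original post.
real_actions = normalize([120, 300, 450, 130])
sim_actions = normalize([100, 340, 420, 140])

# Hypothetical normalized activity per time step.
real_curve = [0.05, 0.20, 0.60, 0.40, 0.15]
sim_curve = [0.08, 0.25, 0.50, 0.35, 0.12]

print("JSD :", jsd(real_actions, sim_actions))
print("rho :", pearson(real_curve, sim_curve))
print("RMSE:", rmse(real_curve, sim_curve))
```

Lower JSD and RMSE and higher ρ indicate a closer match, consistent with the arrows in the table header.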
Ablation study on the LE dataset to verify the necessity of each module.
| Configuration | JSD ↓ | ρ ↑ | RMSE ↓ | Confr. ↑ | \|ΔTTR\| ↓ | \|ΔS̄\| ↓ | Net. ↑ | Casc. ↑ |
|---|---|---|---|---|---|---|---|---|
| Full POSIM | 0.193 | 0.809 | 0.154 | 0.790 | 0.030 | 0.029 | 0.895 | 0.830 |
| w/o Belief | 0.258 | 0.762 | 0.172 | 0.706 | 0.058 | 0.067 | 0.861 | 0.773 |
| w/o Desire | 0.267 | 0.779 | 0.169 | 0.682 | 0.071 | 0.083 | 0.853 | 0.788 |
| w/o Intention | 0.237 | 0.802 | 0.159 | 0.728 | 0.064 | 0.055 | 0.858 | 0.814 |
| w/o Hawkes | 0.177 | 0.235 | 0.362 | 0.787 | 0.207 | 0.028 | 0.822 | 0.754 |
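Each component has a clear functional division: removing Hawkes collapses temporal fit (ρ 0.809 → 0.235, RMSE 0.154 → 0.362), while removing Belief, Desire, or Intention mainly lowers Confr. (0.790 → 0.706, 0.682, and 0.728, respectively).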
Demonstrating POSIM as a computational experiment platform for cognitive priming and counterfactual strategy evaluation.
200 agents, 30 simulation steps. Two cognitive priming strategies applied at varying coverage rates (20%–100%).
Rational Cognition (RC): Negative emotion drops from 0.844 → 0.571 (−32.3%). Effect monotonically increases with coverage.
Empathy Priming (EP): Counterintuitive empathy paradox — negative emotion increases (0.878 vs 0.844). Deep understanding of others' suffering amplifies rather than mitigates negative sentiment.
Coverage crossing 60% shows a clear threshold effect — below this point, priming barely propagates through social networks.
Five PR strategies compared under identical external events on the Luxury-Earring dataset. The Intervenor-Simulator-Evaluator pipeline enables “what-if” analysis without real-world deployment.
| Strategy | Neg. Emotion ↓ | Anger ↓ | Intensity ↓ |
|---|---|---|---|
| Actual Response | 0.792 | 0.791 | 0.685 |
| Swift Apology | 0.749 | 0.749 | 0.612 |
| Proactive Transparency | 0.773 | 0.773 | 0.645 |
| Consumer Dialogue | 0.744 | 0.743 | 0.598 |
| Strategic Silence | 0.831 | 0.831 | 0.702 |
Consumer Dialogue achieves the best results across all metrics (−6.1% neg. emotion vs actual response). Strategic Silence is the worst — inaction amplifies anger (+4.9%).
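The percentages quoted above follow from the Neg. Emotion column as relative changes against the Actual Response row. A short Python check, using only values from the table (the helper name `relative_change` is just for illustration):

```python
# Neg. Emotion values copied from the strategy comparison table.
neg_emotion = {
    "Actual Response": 0.792,
    "Swift Apology": 0.749,
    "Proactive Transparency": 0.773,
    "Consumer Dialogue": 0.744,
    "Strategic Silence": 0.831,
}


def relative_change(value, baseline):
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


baseline = neg_emotion["Actual Response"]
changes = {
    name: relative_change(value, baseline)
    for name, value in neg_emotion.items()
    if name != "Actual Response"
}

for name, pct in sorted(changes.items(), key=lambda kv: kv[1]):
    print(f"{name:24s} {pct:+.1f}%")

best = min(changes, key=changes.get)    # largest reduction in negative emotion
worst = max(changes, key=changes.get)   # largest increase in negative emotion
print("best:", best, "| worst:", worst)
```

Consumer Dialogue comes out at about −6.1% and Strategic Silence at about +4.9%, matching the figures in the text.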
All strategies exhibit an immediate cooling & gradual rebound pattern: intervention triggers initial sentiment relief, but public opinion naturally restores toward baseline as new events unfold.
posim/
├── posim/                       # Core framework
│   ├── agents/                  # Agent module
│   │   ├── base_agent.py        # Base class (cognitive pipeline)
│   │   ├── citizen_agent.py     # Citizen agent
│   │   ├── kol_agent.py         # KOL agent
│   │   ├── media_agent.py       # Media agent
│   │   ├── government_agent.py  # Government agent
│   │   └── ebdi/                # Social-BDI cognitive architecture
│   │       ├── belief/          # Belief subsystem
│   │       ├── desire/          # Desire subsystem
│   │       ├── intention/       # Intention subsystem (3-level CoT)
│   │       └── memory/          # Streaming memory store
│   ├── engine/                  # Simulation engine
│   │   ├── simulator.py         # Main loop (async concurrent)
│   │   ├── hawkes_process.py    # Hawkes self-exciting process
│   │   └── time_engine.py       # Time engine (circadian)
│   ├── environment/             # Virtual social media platform
│   │   ├── recommendation.py    # Content recommendation
│   │   ├── social_network.py    # Three-layer social network
│   │   ├── hot_search.py        # Trending topics
│   │   └── event_queue.py       # External event queue
│   ├── evaluation/              # Evaluation framework
│   ├── llm/                     # LLM resource management
│   ├── prompts/                 # Prompt templates
│   └── config/                  # Configuration
├── scripts/                     # Simulation & evaluation scripts
├── data/                        # Datasets
└── requirements.txt
If this work is helpful to your research, please cite:
@misc{zhang2026posimmultiagentsimulationframework,
title = {POSIM: A Multi-Agent Simulation Framework for
Social Media Public Opinion Evolution and Governance},
author = {Yongmao Zhang and Kai Qiao and Zhengyan Wang and
Ningning Liang and Dekui Ma and Wenyao Sun and
Jian Chen and Bin Yan},
year = {2026},
eprint = {2603.23884},
archivePrefix = {arXiv},
primaryClass = {cs.GL},
url = {https://arxiv.org/abs/2603.23884}
}