POSIM: A Multi-Agent Simulation Framework for Social Media Public Opinion Evolution and Governance

POSIM embeds large language models (LLMs) into a layered Social-BDI cognitive architecture to build agents with explicit cognitive states and traceable decision chains. Agent activity is driven by Hawkes self-exciting point processes for high-fidelity public opinion simulation.

"All models are wrong, but some are useful." — George E. P. Box

Python 3.8+ · PyTorch 2.0+ · OpenAI-compatible LLM · MIT License

Why POSIM?

Existing LLM-driven social simulation frameworks treat models as end-to-end behavior mappers without explicit cognitive states. POSIM addresses this by embedding LLMs into a structured cognitive architecture.

Platform Explicit Cognitive Modeling Validation (M/P/S) Real-Case Intervention LLM Multi-Type Agents Temporal Precision Modular Design
S3  ✗/✓/✓  ★★★★★★
HiSim  ✗/✗/✓  ★★★★
GA-S3  ✗/✗/✓  ★★★★★
SPARK  ✗/✓/✗  ★★★★
FDE-LLM  ✗/✗/✓  ★★★★
TrendSim  ✓/✗/✗  ★★★★★★★
OASIS  ✗/✓/✓  ★★★★★★★★
LMAgent  ✗/✗/✓  ★★★★
POSIM (Ours)  ✓/✓/✓  ★★★★★★★★★★

M = Mechanism validation; P = Phenomenon validation; S = Statistical validation.

Key Contributions

Social-BDI Agent Architecture

Embeds LLMs within a layered cognitive framework (Perception → Belief → Desire → Intention → Action), incorporating emotional arousal and cognitive biases. Three cognitive subsystems are each powered by independent LLM calls, communicating through structured intermediate states. The entire behavioral generation process is fully traceable.
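The traceable, staged flow described above could be sketched as follows. This is a minimal illustration, not POSIM's implementation: the `LLMFn` callable, the `Trace` class, and the prompt format are all hypothetical names introduced here, and the LLM is stubbed so the example runs offline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical LLM interface: takes a prompt string, returns a completion string.
LLMFn = Callable[[str], str]


@dataclass
class Trace:
    """Ordered record of each stage's prompt and output."""
    steps: List[Dict[str, str]] = field(default_factory=list)

    def log(self, stage: str, prompt: str, output: str) -> None:
        self.steps.append({"stage": stage, "prompt": prompt, "output": output})


def run_pipeline(perception: str, llm: LLMFn) -> Trace:
    """Run belief -> desire -> intention with one LLM call per subsystem."""
    trace = Trace()
    state = perception
    for stage in ("belief", "desire", "intention"):
        prompt = f"[{stage}] Given: {state}"
        state = llm(prompt)  # the output of one stage is the input of the next
        trace.log(stage, prompt, state)
    return trace


# Stub LLM for demonstration only: echoes a shortened prompt.
demo_trace = run_pipeline("post about earrings", lambda p: "out:" + p[:8])
```

Because every stage writes its prompt and output into `demo_trace.steps`, any final action can be traced back through the intermediate states that produced it.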

Hybrid Time-Event Driven Environment

Hawkes self-exciting point processes jointly model exogenous event shocks and endogenous user interactions, combined with circadian rhythm modulation, reproducing non-stationary activity patterns at minute-level temporal resolution.
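How circadian modulation might combine with an activity intensity is sketched below. The cosine form, the `amplitude` and `peak_hour` defaults, and both function names are assumptions made for illustration; the source does not state the modulation function POSIM uses. The 10-minute step length matches the simulation precision reported for the datasets.

```python
import math


def circadian_factor(hour_of_day: float, amplitude: float = 0.5, peak_hour: float = 21.0) -> float:
    """Hypothetical daily modulation: 1 + amplitude * cos(phase), largest at peak_hour."""
    phase = 2.0 * math.pi * (hour_of_day - peak_hour) / 24.0
    return 1.0 + amplitude * math.cos(phase)


def expected_actions(base_intensity: float, hour_of_day: float, step_minutes: float = 10.0) -> float:
    """Expected action count in one step, treating the intensity (actions/minute) as constant within the step."""
    return base_intensity * circadian_factor(hour_of_day) * step_minutes
```

With these defaults, the same base intensity yields three times as many expected actions at 21:00 as at 09:00, which is one simple way to obtain non-stationary daily activity.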

Three-Tier Progressive Validation

Drawing on classical verification and validation (V&V) methodology, validation proceeds in three tiers that build simulation credibility layer by layer: micro-level behavioral mechanism calibration → macro-level emergent phenomenon verification → statistical result consistency alignment.

Highly Decoupled Modular Architecture

Agents, simulation environment, and strategy evaluation communicate through standard interfaces — swap the cognitive architecture, change the time engine, or plug in new evaluation metrics without touching other modules.
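One way to picture this decoupling is a structural interface that any time engine can satisfy. The sketch below uses `typing.Protocol` (available from Python 3.8); the `TimeEngine` protocol, the two engine classes, and `run_schedule` are hypothetical and do not mirror POSIM's actual interfaces.

```python
from typing import List, Protocol


class TimeEngine(Protocol):
    """Hypothetical interface: anything that reports an activity intensity at time t."""

    def intensity(self, t: float) -> float:
        ...


class ConstantEngine:
    """Flat activity rate with no self-excitation."""

    def __init__(self, rate: float) -> None:
        self.rate = rate

    def intensity(self, t: float) -> float:
        return self.rate


class DecayingShockEngine:
    """Baseline rate plus one shock at t=0 whose effect halves every half_life steps."""

    def __init__(self, base: float, shock: float, half_life: float) -> None:
        self.base = base
        self.shock = shock
        self.half_life = half_life

    def intensity(self, t: float) -> float:
        return self.base + self.shock * 0.5 ** (t / self.half_life)


def run_schedule(engine: TimeEngine, steps: int) -> List[float]:
    """The caller depends only on the interface, so engines are interchangeable."""
    return [engine.intensity(float(step)) for step in range(steps)]
```

Swapping `ConstantEngine(2.0)` for `DecayingShockEngine(1.0, 4.0, 1.0)` changes the activity profile without any change to `run_schedule`.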

Framework Overview

POSIM consists of three core components: (1) Social-BDI Agents, (2) Hawkes process-driven simulation environment, and (3) Strategy evaluation module for counterfactual reasoning.


Figure 1. POSIM framework architecture. Left: Social-BDI agent cognitive pipeline. Upper-center: Hawkes process-driven simulation environment and virtual social media platform. Lower-right: Strategy evaluation module (Intervenor-Simulator-Evaluator).

Hawkes Self-Exciting Point Process Time Engine

The conditional intensity function models collective activity as the superposition of a background rate, exogenous event shocks (high intensity, slow decay), and endogenous user interactions (low intensity, fast decay):

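$$\lambda(t) = \underbrace{\mu}_{\text{background}} + \underbrace{\sum_{t_i < t} \alpha_{ext}\, e^{-\beta_{ext}(t - t_i)}}_{\text{exogenous}} + \underbrace{\sum_{t_j < t} \alpha_{int}\, e^{-\beta_{int}(t - t_j)}}_{\text{endogenous}}$$

Here $t_i$ are the times of past exogenous events and $t_j$ the times of past user interactions; each $\alpha$ sets the size of the excitation jump and each $\beta$ its exponential decay rate.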
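A direct evaluation of this intensity can be written in a few lines. The sketch assumes that only events strictly before `t` contribute; the function name and argument names are illustrative and not taken from `hawkes_process.py`.

```python
import math
from typing import Sequence


def hawkes_intensity(
    t: float,
    mu: float,
    ext_times: Sequence[float],
    int_times: Sequence[float],
    alpha_ext: float,
    beta_ext: float,
    alpha_int: float,
    beta_int: float,
) -> float:
    """Evaluate lambda(t) = mu + exogenous sum + endogenous sum."""
    # Each past event adds a jump of size alpha that decays at rate beta.
    exo = sum(alpha_ext * math.exp(-beta_ext * (t - ti)) for ti in ext_times if ti < t)
    endo = sum(alpha_int * math.exp(-beta_int * (t - tj)) for tj in int_times if tj < t)
    return mu + exo + endo


# Illustrative parameters: a strong, slowly decaying external shock at t=0
# and a weak, quickly decaying user interaction at t=9.
lam = hawkes_intensity(
    t=10.0, mu=0.5, ext_times=[0.0], int_times=[9.0],
    alpha_ext=5.0, beta_ext=0.1, alpha_int=1.0, beta_int=1.0,
)
```

In this example both exponents equal -1, so `lam` is 0.5 + 6·e⁻¹: ten time units later, the external shock still contributes five times as much as an interaction that happened only one unit ago.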

Social-BDI Agent Architecture

Social-BDI extends the classical BDI architecture with an emotional dimension, yielding agents with explicit cognitive states and auditable multi-stage decision chains.

Perception → Belief → Desire → Intention → Action

Four-Layer Belief System

  • B_id (Role Identity): gender, location, occupation, followers, verification type. Stability: fixed (personality anchor).
  • B_psy (Psychological): conformity, paranoia, catharsis, curiosity-seeking patterns. Stability: highly stable.
  • B_evt (Event Opinion): stance and reasoning on event entities. Stability: dynamically evolving.
  • B_emo (Emotional Arousal): 6D emotion vector (happy, sad, angry, fear, surprise, disgust). Stability: real-time fluctuation.
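The four belief layers map naturally onto a small data structure. The following is a sketch under stated assumptions: the class and field names, the 0 to 1 value range, and the clamping rule are illustrative choices, not the representation used in `ebdi/belief/`.

```python
from dataclasses import dataclass, field
from typing import Dict

EMOTIONS = ("happy", "sad", "angry", "fear", "surprise", "disgust")


@dataclass(frozen=True)
class RoleIdentity:
    """B_id: fixed profile attributes, frozen so they cannot drift during a run."""
    gender: str
    location: str
    occupation: str
    followers: int
    verification_type: str


@dataclass
class BeliefState:
    identity: RoleIdentity
    # B_psy: trait name -> strength (the 0-1 scale is an assumption)
    psychological: Dict[str, float]
    # B_evt: event entity -> current stance and reasoning
    event_opinions: Dict[str, str] = field(default_factory=dict)
    # B_emo: one value per emotion dimension
    emotion: Dict[str, float] = field(default_factory=lambda: {e: 0.0 for e in EMOTIONS})

    def update_emotion(self, deltas: Dict[str, float]) -> None:
        """Apply per-dimension changes and clamp each value to [0, 1]."""
        for name, delta in deltas.items():
            if name not in self.emotion:
                raise KeyError(f"unknown emotion: {name}")
            self.emotion[name] = min(1.0, max(0.0, self.emotion[name] + delta))


belief = BeliefState(
    identity=RoleIdentity("female", "Wuhan", "student", 120, "none"),
    psychological={"conformity": 0.6},
)
belief.update_emotion({"angry": 0.7})
belief.update_emotion({"angry": 0.6})  # clamped at the upper bound
```

Freezing `RoleIdentity` while leaving `emotion` mutable mirrors the stability ordering in the list above: identity is fixed, emotional arousal changes at every step.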

Three-Level Chain-of-Thought Intention System

L1 · What to do & to whom: Select action type (like / repost / comment / original post) and target

L2 · How to express: Plan across 4 orthogonal dimensions: Emotion × Stance × Style × Narrative

L3 · What to say: Generate role-consistent social media text under L1+L2 constraints
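The three levels can be read as two structured decisions followed by a constrained generation prompt. The sketch below is hypothetical: `ActionChoice`, `ExpressionPlan`, `build_l3_prompt`, and the prompt wording are invented for illustration and are not POSIM's prompt templates.

```python
from dataclasses import dataclass
from typing import Optional

ACTION_TYPES = ("like", "repost", "comment", "original_post")


@dataclass
class ActionChoice:
    """L1: what to do and to whom."""
    action_type: str
    target_post_id: Optional[str]  # None when the action is an original post


@dataclass
class ExpressionPlan:
    """L2: how to express, one value per dimension."""
    emotion: str
    stance: str
    style: str
    narrative: str


def build_l3_prompt(persona: str, choice: ActionChoice, plan: ExpressionPlan) -> str:
    """L3: assemble the text-generation prompt under the L1 and L2 constraints."""
    if choice.action_type not in ACTION_TYPES:
        raise ValueError(f"unknown action type: {choice.action_type}")
    target = choice.target_post_id if choice.target_post_id is not None else "none (new post)"
    return (
        f"You are {persona}.\n"
        f"Action: {choice.action_type}; target: {target}.\n"
        f"Express with emotion={plan.emotion}, stance={plan.stance}, "
        f"style={plan.style}, narrative={plan.narrative}.\n"
        "Write the social media text."
    )


plan = ExpressionPlan("angry", "oppose", "sarcastic", "personal anecdote")
prompt = build_l3_prompt("a college student", ActionChoice("comment", "post_42"), plan)
```

Keeping L1 and L2 as explicit objects means the generated text can later be audited against the decisions that constrained it.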

Four Heterogeneous Agent Types

  • Citizen (primary opinion participants): Colloquial, fragmented, emotion-driven. Impulsive expression under high arousal.
  • KOL (key intermediary in two-step flow): Independent views, agenda-setting. Significant influence on downstream belief updates.
  • Media (information collection & dissemination): Formal, restrained, timely. Information confirmation at critical junctures.
  • Government (official stance & public governance): Low frequency, high authority. Post-fermentation statements with turning-point impact.

Experimental Datasets

Three representative public opinion events from Sina Weibo spanning social controversy, campus incidents, and food safety. Simulation precision: 10 min/step.

Luxury Earring (LE) · Social Controversy

An actress's earrings identified as ¥2.3M luxury goods, sparking intense public debate on celebrity extravagance.

1,530 users · 34,218 posts · ~46 h duration · 276 simulation steps

WHU Library (WL) · Campus Incident

Harassment allegation dispute at Wuhan University; court ruling reignited large-scale debate on justice and campus safety.

1,843 users · 51,647 posts · ~190 h duration · 1,140 simulation steps

Xibei Prepared Food (XF) · Food Safety

Internet celebrity publicly accused a restaurant chain of extensive prepared food use, sparking food safety concerns.

1,987 users · 14,892 posts · ~71 h duration · 426 simulation steps
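The step counts follow from the event durations and the 10 min/step precision stated above. A short check, with the helper name `sim_steps` chosen here for illustration:

```python
STEP_MINUTES = 10  # simulation precision stated for the datasets


def sim_steps(duration_hours: float, step_minutes: int = STEP_MINUTES) -> int:
    """Number of simulation steps needed to cover an event of the given duration."""
    return round(duration_hours * 60 / step_minutes)


durations_hours = {"LE": 46, "WL": 190, "XF": 71}
steps = {name: sim_steps(hours) for name, hours in durations_hours.items()}
# steps == {"LE": 276, "WL": 1140, "XF": 426}
```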

Results & Analysis

Overall Performance Improvement

POSIM's behavioral, content, and topological metrics outperform the best baseline across three real-world Weibo datasets:

  • Behavior Layer: +5.0%
  • Content Layer: +13.0%
  • Topology Layer: +8.5%

Three-Tier Validation Framework

1. Micro-Level Mechanism Calibration: cognitive-behavior chain consistency (0–5), personality stability (0–1), decision robustness (0–1)

2. Macro-Level Emergence Verification: opinion lifecycle, multi-agent heterogeneity, emotional polarization, scale-free topology & cascade power-law

3. Statistical Consistency Alignment: 9 quantitative metrics across behavior (3), content (3), and topology (3) layers

Tier 1 · Micro-Level Behavioral Mechanism Validation

500 randomly sampled users, 12 simulation rounds, four methods under identical conditions.

| Method | Cognitive-Behavior Chain (0–5) ↑ | Personality Stability (0–1) ↑ | Decision Robustness (0–1) ↑ |
|---|---|---|---|
| Direct-Nothink | 1.47 ± 0.50 | 0.478 ± 0.263 | 0.629 ± 0.240 |
| Direct-Think | 1.75 ± 0.43 | 0.448 ± 0.269 | 0.603 ± 0.299 |
| CoT | 3.09 ± 0.29 | 0.516 ± 0.272 | 0.541 ± 0.356 |
| Social-BDI (Ours) | 4.64 ± 0.48 | 0.661 ± 0.215 | 0.695 ± 0.213 |

Key Finding: CoT's decision robustness is actually the lowest (0.541) — without stable state anchoring, input perturbations ripple through the entire reasoning chain. Social-BDI's explicit belief states provide a cognitive anchoring effect, maintaining decision stability.

Tier 2 · Macro-Level Emergent Phenomena

All macro phenomena emerged spontaneously from agent interactions — none were pre-programmed.

Tier 3 · Statistical Consistency Calibration

Figure 6. Behavior & Activity Calibration. Each row corresponds to one event. Left: activity time series comparison; Right: behavioral type distribution comparison.
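Comparing two behavioral type distributions, as in the right-hand panels, is commonly done with the Jensen-Shannon divergence, which is presumably what the JSD column in the results reports. The sketch below assumes base-2 logarithms (so values lie between 0 and 1) and normalizes raw counts; whether POSIM uses this exact variant is not stated in the source, and the counts are made up for the example.

```python
import math
from typing import Sequence


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence with base-2 logs; inputs may be raw counts."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total_p, total_q = sum(p), sum(q)
    if total_p <= 0 or total_q <= 0:
        raise ValueError("each distribution needs positive total mass")
    p = [x / total_p for x in p]
    q = [x / total_q for x in q]
    m = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x_k = 0 contribute nothing (0 * log 0 is taken as 0).
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Made-up counts of like / repost / comment / original post actions.
real_counts = [500, 200, 250, 50]
simulated_counts = [450, 220, 260, 70]
jsd = js_divergence(real_counts, simulated_counts)
```

Identical distributions give 0 and fully disjoint ones give 1, which is why lower values indicate a closer behavioral match.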

Statistical Calibration Results

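| Data | Method | JSD ↓ | Act.ρ ↑ | RMSE ↓ | Beh.Avg ↑ | Confr. ↑ | \|ΔTTR\| ↓ | \|ΔS̄\| ↓ | Cont.Avg ↑ | Net. ↑ | Casc. ↑ | PL ↑ | Topo.Avg ↑ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| LE | Rule ABM | 0.427 | 0.808 | 0.158 | 0.741 | - | - | - | - | 0.479 | 0.633 | 0.543 | 0.552 |
| LE | w/ LLM | 0.289 | 0.799 | 0.162 | 0.783 | 0.544 | 0.185 | 0.319 | 0.680 | 0.565 | 0.735 | 0.918 | 0.739 |
| LE | w/ CoT | 0.394 | 0.806 | 0.151 | 0.754 | 0.584 | 0.141 | 0.123 | 0.774 | 0.755 | 0.777 | 0.756 | 0.763 |
| LE | POSIM | 0.193 | 0.809 | 0.154 | 0.821 | 0.790 | 0.030 | 0.029 | 0.910 | 0.895 | 0.830 | 0.961 | 0.896 |
| WL | Rule ABM | 0.318 | 0.681 | 0.126 | 0.746 | - | - | - | - | 0.869 | 0.696 | 0.642 | 0.736 |
| WL | w/ LLM | 0.237 | 0.722 | 0.119 | 0.789 | 0.453 | 0.172 | 0.360 | 0.640 | 0.528 | 0.667 | 0.580 | 0.592 |
| WL | w/ CoT | 0.229 | 0.744 | 0.116 | 0.800 | 0.474 | 0.091 | 0.365 | 0.673 | 0.941 | 0.740 | 0.673 | 0.784 |
| WL | POSIM | 0.073 | 0.750 | 0.118 | 0.853 | 0.841 | 0.010 | 0.203 | 0.876 | 0.850 | 0.758 | 0.965 | 0.858 |
| XF | Rule ABM | 0.312 | 0.664 | 0.187 | 0.721 | - | - | - | - | 0.528 | 0.614 | 0.279 | 0.474 |
| XF | w/ LLM | 0.244 | 0.671 | 0.190 | 0.746 | 0.765 | 0.117 | 0.076 | 0.858 | 0.774 | 0.695 | 0.482 | 0.650 |
| XF | w/ CoT | 0.293 | 0.699 | 0.181 | 0.742 | 0.774 | 0.134 | 0.014 | 0.875 | 0.767 | 0.696 | 0.460 | 0.641 |
| XF | POSIM | 0.148 | 0.727 | 0.168 | 0.804 | 0.843 | 0.019 | 0.046 | 0.926 | 0.885 | 0.696 | 0.513 | 0.698 |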
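The layer averages appear to be plain means in which each lower-is-better metric x is first converted to 1 − x. This rule is inferred from the reported numbers rather than stated in the source, so treat it as an assumption; the check below reproduces the POSIM row for LE to within rounding (±0.001).

```python
from typing import Sequence


def layer_average(higher_better: Sequence[float], lower_better: Sequence[float]) -> float:
    """Mean of a layer's metrics after flipping lower-is-better values to 1 - x."""
    scores = list(higher_better) + [1.0 - x for x in lower_better]
    return sum(scores) / len(scores)


# POSIM on the LE dataset, values copied from the results above.
beh_avg = layer_average(higher_better=[0.809], lower_better=[0.193, 0.154])          # reported: 0.821
cont_avg = layer_average(higher_better=[0.790], lower_better=[0.030, 0.029])         # reported: 0.910
topo_avg = layer_average(higher_better=[0.895, 0.830, 0.961], lower_better=[])       # reported: 0.896
```

The same rule reproduces the Rule ABM rows when the content columns, which that baseline does not report, are left out.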

Ablation Study

Ablation study on the LE dataset to verify the necessity of each module.

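| Configuration | JSD ↓ | ρ ↑ | RMSE ↓ | Confr. ↑ | \|ΔTTR\| ↓ | \|ΔS̄\| ↓ | Net. ↑ | Casc. ↑ |
|---|---|---|---|---|---|---|---|---|
| Full POSIM | 0.193 | 0.809 | 0.154 | 0.790 | 0.030 | 0.029 | 0.895 | 0.830 |
| w/o Belief | 0.258 | 0.762 | 0.172 | 0.706 | 0.058 | 0.067 | 0.861 | 0.773 |
| w/o Desire | 0.267 | 0.779 | 0.169 | 0.682 | 0.071 | 0.083 | 0.853 | 0.788 |
| w/o Intention | 0.237 | 0.802 | 0.159 | 0.728 | 0.064 | 0.055 | 0.858 | 0.814 |
| w/o Hawkes | 0.177 | 0.235 | 0.362 | 0.787 | 0.207 | 0.028 | 0.822 | 0.754 |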

Ablation Insights

Each component has a clear functional division:

  • w/o Hawkes: Act.ρ plummets from 0.809 → 0.235; uniform activation destroys temporal dynamics
  • w/o Belief: Confrontation similarity drops 0.790 → 0.706; agents lose deep understanding
  • w/o Desire: Content layer degrades most severely (Confr. → 0.682, lowest); motivation is the core driver
  • w/o Intention: Lexical diversity gap roughly doubles (|ΔTTR| 0.030 → 0.064) and network similarity falls (Net. 0.895 → 0.858); the three-level CoT supports both diversity and topology

Case Studies

Two case studies demonstrate POSIM as a computational experiment platform for cognitive priming and counterfactual strategy evaluation.


Case 1 · Cognitive Priming Experiment

200 agents, 30 simulation steps. Two cognitive priming strategies applied at varying coverage rates (20%–100%).

Rational Cognition (RC): Negative emotion drops from 0.844 → 0.571 (−32.3%). Effect monotonically increases with coverage.

Empathy Priming (EP): Counterintuitive empathy paradox — negative emotion increases (0.878 vs 0.844). Deep understanding of others' suffering amplifies rather than mitigates negative sentiment.

Coverage crossing 60% shows a clear threshold effect — below this point, priming barely propagates through social networks.
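Varying the coverage rate amounts to choosing which fraction of agents receives the priming. A minimal sketch of that selection step, assuming uniform random sampling with a fixed seed; the function name and the sampling rule are illustrative, since the source does not describe how primed agents are chosen.

```python
import random
from typing import Iterable, Set


def select_primed_agents(agent_ids: Iterable[int], coverage: float, seed: int = 0) -> Set[int]:
    """Pick a reproducible random subset of agents covering the given fraction."""
    if not 0.0 <= coverage <= 1.0:
        raise ValueError("coverage must be between 0 and 1")
    ids = list(agent_ids)
    count = round(len(ids) * coverage)
    return set(random.Random(seed).sample(ids, count))


# 200 agents at the coverage rates spanning 20% to 100%.
group_sizes = {c: len(select_primed_agents(range(200), c)) for c in (0.2, 0.4, 0.6, 0.8, 1.0)}
```

Fixing the seed keeps the primed group identical across strategies, so differences between runs come from the priming content rather than from who was selected.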

Empathy Paradox · Threshold Effect · Non-linear Diffusion

Case 2 · Counterfactual Strategy Evaluation

Five PR strategies compared under identical external events on the Luxury-Earring dataset. The Intervenor-Simulator-Evaluator pipeline enables “what-if” analysis without real-world deployment.

| Strategy | Neg. Emotion ↓ | Anger ↓ | Intensity ↓ |
|---|---|---|---|
| Actual Response | 0.792 | 0.791 | 0.685 |
| Swift Apology | 0.749 | 0.749 | 0.612 |
| Proactive Transparency | 0.773 | 0.773 | 0.645 |
| Consumer Dialogue | 0.744 | 0.743 | 0.598 |
| Strategic Silence | 0.831 | 0.831 | 0.702 |

Consumer Dialogue achieves the best results across all metrics (−6.1% negative emotion vs. the actual response). Strategic Silence is the worst: inaction raises negative emotion by 4.9% and anger by 5.1%.
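The percentages quoted here are relative changes against the Actual Response baseline. They can be recomputed from the negative emotion scores in the strategy table; the helper name is illustrative.

```python
def relative_change(value: float, baseline: float) -> float:
    """Percentage change of value relative to baseline."""
    return (value - baseline) / baseline * 100.0


neg_emotion = {
    "Actual Response": 0.792,
    "Swift Apology": 0.749,
    "Proactive Transparency": 0.773,
    "Consumer Dialogue": 0.744,
    "Strategic Silence": 0.831,
}
baseline = neg_emotion["Actual Response"]
changes = {name: round(relative_change(score, baseline), 1) for name, score in neg_emotion.items()}
# Consumer Dialogue: -6.1, Strategic Silence: 4.9
```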

All strategies exhibit an immediate cooling & gradual rebound pattern: intervention triggers initial sentiment relief, but public opinion naturally restores toward baseline as new events unfold.

Immediate Cooling · Gradual Rebound · What-If Analysis

Chart: Negative Emotion by Strategy. Lower negative emotion indicates better PR effectiveness; Consumer Dialogue is the best strategy (0.744) and Strategic Silence the worst (0.831).

Project Structure

posim/
├── posim/                          # Core framework
│   ├── agents/                     # Agent module
│   │   ├── base_agent.py           # Base class (cognitive pipeline)
│   │   ├── citizen_agent.py        # Citizen agent
│   │   ├── kol_agent.py            # KOL agent
│   │   ├── media_agent.py          # Media agent
│   │   ├── government_agent.py     # Government agent
│   │   └── ebdi/                   # Social-BDI cognitive architecture
│   │       ├── belief/             # Belief subsystem
│   │       ├── desire/             # Desire subsystem
│   │       ├── intention/          # Intention subsystem (3-level CoT)
│   │       └── memory/             # Streaming memory store
│   ├── engine/                     # Simulation engine
│   │   ├── simulator.py            # Main loop (async concurrent)
│   │   ├── hawkes_process.py       # Hawkes self-exciting process
│   │   └── time_engine.py          # Time engine (circadian)
│   ├── environment/                # Virtual social media platform
│   │   ├── recommendation.py       # Content recommendation
│   │   ├── social_network.py       # Three-layer social network
│   │   ├── hot_search.py           # Trending topics
│   │   └── event_queue.py          # External event queue
│   ├── evaluation/                 # Evaluation framework
│   ├── llm/                        # LLM resource management
│   ├── prompts/                    # Prompt templates
│   └── config/                     # Configuration
├── scripts/                        # Simulation & evaluation scripts
├── data/                           # Datasets
└── requirements.txt

Citation

If this work is helpful to your research, please cite:

@misc{zhang2026posimmultiagentsimulationframework,
  title   = {POSIM: A Multi-Agent Simulation Framework for 
             Social Media Public Opinion Evolution and Governance},
  author  = {Yongmao Zhang and Kai Qiao and Zhengyan Wang and 
             Ningning Liang and Dekui Ma and Wenyao Sun and 
             Jian Chen and Bin Yan},
  year    = {2026},
  eprint  = {2603.23884},
  archivePrefix = {arXiv},
  primaryClass = {cs.GL},
  url     = {https://arxiv.org/abs/2603.23884}
}