# Testing Autonomous System V2
## Quick Start Guide

### Step 1: Enable V2 System (Optional - Test Mode)

The V2 system can run **alongside** V1 for comparison. To enable it:

**Option A: Edit `bot.py` to start V2 on bot ready**

Add this to the `on_ready()` function in `bot/bot.py`:

```python
# After the existing setup code, add:
from utils.autonomous_v2_integration import start_v2_system_for_all_servers

# Start the V2 autonomous system
await start_v2_system_for_all_servers(client)
```

**Option B: Manual API testing (no code changes needed)**

Use the API endpoints to inspect what V2 is "thinking" without actually running it.
### Step 2: Test the V2 Decision System

#### Check what V2 is "thinking" for a server:

```bash
# Get the current social stats
curl http://localhost:3939/autonomous/v2/stats/<GUILD_ID>
```

Example response:

```json
{
  "status": "ok",
  "guild_id": 759889672804630530,
  "stats": {
    "loneliness": "0.42",
    "boredom": "0.65",
    "excitement": "0.15",
    "curiosity": "0.20",
    "chattiness": "0.70",
    "action_urgency": "0.48"
  }
}
```
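The stat values in the response are strings, not numbers. If you script against this endpoint, a small helper can coerce them — illustrative only, not part of the bot's codebase:

```python
import json

def parse_stats(payload: str) -> dict[str, float]:
    """Convert the string-valued stats in a /stats response to floats."""
    data = json.loads(payload)
    return {name: float(value) for name, value in data["stats"].items()}

# The example response from above:
sample = '''{"status": "ok", "guild_id": 759889672804630530,
             "stats": {"loneliness": "0.42", "boredom": "0.65",
                       "excitement": "0.15", "curiosity": "0.20",
                       "chattiness": "0.70", "action_urgency": "0.48"}}'''

stats = parse_stats(sample)
print(stats["boredom"])  # 0.65
```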
#### Trigger a manual V2 analysis:

```bash
# See what V2 would decide right now
curl http://localhost:3939/autonomous/v2/check/<GUILD_ID>
```

Example response:

```json
{
  "status": "ok",
  "guild_id": 759889672804630530,
  "analysis": {
    "stats": { ... },
    "interest_score": "0.73",
    "triggers": [
      "KEYWORD_DETECTED (0.60): Interesting keywords: vocaloid, miku",
      "CONVERSATION_PEAK (0.60): Lots of people are chatting"
    ],
    "recent_messages": 15,
    "conversation_active": true,
    "would_call_llm": true
  }
}
```

#### Get overall V2 status:

```bash
# See V2 status for all servers
curl http://localhost:3939/autonomous/v2/status
```

Example response:

```json
{
  "status": "ok",
  "servers": {
    "759889672804630530": {
      "server_name": "Example Server",
      "loop_running": true,
      "action_urgency": "0.52",
      "loneliness": "0.30",
      "boredom": "0.45",
      "excitement": "0.20",
      "chattiness": "0.70"
    }
  }
}
```
### Step 3: Monitor Behavior

#### Watch for V2 log messages:

```bash
docker compose logs -f bot | grep -E "🧠|🎯|🤔"
```

You'll see messages like:

```
🧠 Starting autonomous decision loop for server 759889672804630530
🎯 Interest score 0.73 - Consulting LLM for server 759889672804630530
🤔 LLM decision: YES, someone mentioned you (Interest: 0.73)
```

#### Compare V1 vs V2:

**V1 logs:**

```
💬 Miku said something general in #miku-chat
```

**V2 logs:**

```
🎯 Interest score 0.82 - Consulting LLM
🤔 LLM decision: YES
💬 Miku said something general in #miku-chat
```
### Step 4: Tune the System

Edit `bot/utils/autonomous_v2.py` to adjust behavior:

```python
# How sensitive is the decision system?
self.LLM_CALL_THRESHOLD = 0.6  # Lower = more responsive (more LLM calls)
self.ACTION_THRESHOLD = 0.5    # Lower = more chatty

# How fast do the stats build?
LONELINESS_BUILD_RATE = 0.01  # Higher = gets lonely faster
BOREDOM_BUILD_RATE = 0.01     # Higher = gets bored faster

# Check intervals
MIN_SLEEP = 30   # Seconds between checks during active chat
MAX_SLEEP = 180  # Seconds between checks when quiet
```
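To see how `MIN_SLEEP` and `MAX_SLEEP` trade off, here is a sketch of how the loop might pick its next check interval from `action_urgency` — the interpolation rule is an assumption for illustration; the actual scheduling logic is whatever `autonomous_v2.py` implements:

```python
MIN_SLEEP = 30   # seconds between checks during active chat
MAX_SLEEP = 180  # seconds between checks when quiet

def next_sleep(action_urgency: float) -> float:
    """Hypothetical rule: higher urgency -> shorter wait (linear interpolation)."""
    urgency = min(max(action_urgency, 0.0), 1.0)  # clamp to [0, 1]
    return MAX_SLEEP - (MAX_SLEEP - MIN_SLEEP) * urgency

print(next_sleep(0.0))  # 180.0 - quiet server, back off
print(next_sleep(1.0))  # 30.0  - active chat, check often
```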
### Step 5: Understanding the Stats

#### Loneliness (0.0 - 1.0)

- **Increases**: When not mentioned for >30 minutes
- **Decreases**: When mentioned or engaged
- **Effect**: At 0.7+, seeks attention
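Combined with `LONELINESS_BUILD_RATE` from Step 4, a per-check update could look like the sketch below — the recovery rate and the exact rule are assumptions; the real update lives in `autonomous_v2.py`:

```python
LONELINESS_BUILD_RATE = 0.01  # from Step 4

def tick_loneliness(current: float, minutes_since_mention: float) -> float:
    """Hypothetical: build loneliness while ignored, relax it toward 0 when engaged."""
    if minutes_since_mention > 30:
        return min(1.0, current + LONELINESS_BUILD_RATE)
    return max(0.0, current - 0.1)  # recovery rate is a made-up value

print(round(tick_loneliness(0.69, 45), 2))  # 0.7 - past the attention-seeking threshold
```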
#### Boredom (0.0 - 1.0)

- **Increases**: When the channel is quiet and Miku hasn't spoken in >1 hour
- **Decreases**: When she shares content or conversation happens
- **Effect**: At 0.7+, likely to share tweets/content

#### Excitement (0.0 - 1.0)

- **Increases**: During active conversations
- **Decreases**: Fades over time (decays fast)
- **Effect**: Higher = more likely to jump into the conversation

#### Curiosity (0.0 - 1.0)

- **Increases**: When interesting keywords are detected
- **Decreases**: Fades over time
- **Effect**: High curiosity = asks questions

#### Chattiness (0.0 - 1.0)

- **Set by mood**:
  - excited/bubbly: 0.85-0.9
  - neutral: 0.5
  - shy/sleepy: 0.2-0.3
  - asleep: 0.0
- **Effect**: Base multiplier for all interactions
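Since chattiness multiplies everything else, one plausible way the drives could roll up into an overall urgency is sketched below — illustrative only; the real formula is whatever `autonomous_v2.py` implements:

```python
def combined_urgency(loneliness: float, boredom: float, excitement: float,
                     curiosity: float, chattiness: float) -> float:
    """Hypothetical: average the four drives, then scale by mood-driven chattiness."""
    drive = (loneliness + boredom + excitement + curiosity) / 4
    return round(drive * chattiness, 2)

print(combined_urgency(0.8, 0.8, 0.8, 0.8, 0.0))  # 0.0 - an asleep mood mutes everything
```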
### Step 6: Trigger Examples

Test specific triggers by creating the right conditions:

#### Test MENTIONED trigger:

1. Mention @Miku in the autonomous channel
2. Check stats: `curl http://localhost:3939/autonomous/v2/check/<GUILD_ID>`
3. Should show: `"triggers": ["MENTIONED (0.90): Someone mentioned me!"]`

#### Test KEYWORD trigger:

1. Say "I love Vocaloid music" in the channel
2. Check stats
3. Should show: `"triggers": ["KEYWORD_DETECTED (0.60): Interesting keywords: vocaloid, music"]`

#### Test CONVERSATION_PEAK:

1. Have 3+ people chat within 5 minutes
2. Check stats
3. Should show: `"triggers": ["CONVERSATION_PEAK (0.60): Lots of people are chatting"]`

#### Test LONELINESS:

1. Don't mention Miku for 30+ minutes
2. Check stats: `curl http://localhost:3939/autonomous/v2/stats/<GUILD_ID>`
3. Watch loneliness increase over time
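How the individual trigger scores roll up into `interest_score` is defined in `autonomous_v2.py`. A minimal sketch, assuming the strongest trigger dominates (consistent with a bare mention scoring an instant 0.9):

```python
def interest_score(triggers: list[tuple[str, float]]) -> float:
    """Hypothetical combination: the strongest trigger wins."""
    return max((score for _, score in triggers), default=0.0)

print(interest_score([("MENTIONED", 0.90)]))  # 0.9
print(interest_score([]))                     # 0.0
```

Note that the Step 2 example reports 0.73 for two 0.60 triggers, so the real combination evidently rewards stacked triggers; treat this `max()` as a lower bound.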
### Step 7: Debugging

#### V2 won't start?

```bash
# Check that the import works
docker compose exec bot python -c "from utils.autonomous_v2 import autonomous_system_v2; print('OK')"
```

#### V2 never calls the LLM?

```bash
# Check the interest scores
curl http://localhost:3939/autonomous/v2/check/<GUILD_ID>
```

If `interest_score` is always < 0.6:

- The channel might be too quiet
- The stats might not be building
- Try mentioning Miku (instant 0.9 score)

#### V2 calls the LLM too much?

```python
# Increase the threshold in autonomous_v2.py:
self.LLM_CALL_THRESHOLD = 0.7  # Was 0.6
```
## Performance Monitoring

### Expected LLM Call Frequency

**Quiet server (few messages):**

- V1: ~10 random calls/day
- V2: ~2-5 targeted calls/day
- **GPU usage: LOWER with V2**

**Active server (100+ messages/day):**

- V1: ~10 random calls/day (same)
- V2: ~10-20 targeted calls/day (responsive to activity)
- **GPU usage: SLIGHTLY HIGHER, but far more relevant**

### Check GPU Usage

```bash
# Monitor the GPU while the bot is running
nvidia-smi -l 1
```

- V1: GPU spikes randomly every 15 minutes
- V2: GPU spikes only when something interesting happens

### Monitor the LLM Queue

If you notice lag:

1. Check how many LLM calls are queued
2. Increase `LLM_CALL_THRESHOLD` to reduce frequency
3. Increase the check intervals for quieter periods
## Migration Path

### Phase 1: Testing (Current)

- V1 running (scheduled actions)
- V2 running in parallel, logging decisions
- Compare behaviors
- Tune V2 parameters

### Phase 2: Gradual Replacement

```python
# In server_manager.py, comment out the V1 jobs:
# scheduler.add_job(
#     self._run_autonomous_for_server,
#     IntervalTrigger(minutes=15),
#     ...
# )

# Keep V2 running
autonomous_system_v2.start_loop_for_server(guild_id, client)
```

### Phase 3: Full Migration

- Disable all V1 autonomous jobs
- Keep only the V2 system
- Keep the manual triggers for testing
## Troubleshooting

### "Module not found: autonomous_v2"

```bash
# Restart the bot container
docker compose restart bot
```

### "Stats always show 0.00"

- The V2 decision loop might not be running
- Check: `curl http://localhost:3939/autonomous/v2/status`
- Should show: `"loop_running": true`
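For scripted health checks, a helper that flags servers whose loop has stopped, given a `/autonomous/v2/status` payload — illustrative, not part of the bot:

```python
import json

def stopped_loops(status_payload: str) -> list[str]:
    """Return the names of servers whose V2 loop is not running."""
    servers = json.loads(status_payload)["servers"]
    return [info["server_name"] for info in servers.values()
            if not info["loop_running"]]

sample = '''{"status": "ok", "servers": {
    "759889672804630530": {"server_name": "Example Server",
                           "loop_running": false, "action_urgency": "0.52"}}}'''
print(stopped_loops(sample))  # ['Example Server']
```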
### "Interest score always low"

- The channel might be genuinely quiet
- Try creating activity: post messages and images, mention Miku
- Loneliness/boredom build over time (30-60 min)

### "LLM called too frequently"

- Increase the thresholds in `autonomous_v2.py`
- Check which triggers are firing with `/autonomous/v2/check`
- Adjust the trigger scores if needed

## API Endpoints Reference

```
GET /autonomous/v2/stats/{guild_id}   - Get social stats
GET /autonomous/v2/check/{guild_id}   - Manual analysis (what would V2 do?)
GET /autonomous/v2/status             - V2 status for all servers
```
## Next Steps

1. Run V2 for 24-48 hours
2. Compare decision quality against V1
3. Tune the thresholds based on server activity
4. Gradually phase out V1 if V2 works well
5. Add a dashboard for real-time stats visualization