====== BOTTLENECK C1/C2 CONFIRMED: SYNCHRONOUS BLOCKING ======

TEST METHODOLOGY:
- Added detailed timing to broadcast_worker() loop
- Measured: Soprano wait time, RVC processing time, loop iteration time
- Test text: "This is a comprehensive test of the full Soprano plus RVC pipeline..."
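
The per-stage timing described above could be collected along these lines. This is a minimal sketch, not the actual instrumentation: StageTimer, run_demo, and the stage names are hypothetical, and the sleep calls stand in for the real Soprano and RVC work inside broadcast_worker().

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulates wall-clock time per named stage (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)   # stage name -> total seconds
        self.samples = defaultdict(list)   # stage name -> per-call seconds

    @contextmanager
    def measure(self, stage):
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed = time.perf_counter() - start
            self.totals[stage] += elapsed
            self.samples[stage].append(elapsed)


def run_demo(timer, chunks=3, blocks_per_chunk=2):
    """Stand-in loop: sleeps replace the real Soprano wait and RVC calls."""
    for _ in range(chunks):
        with timer.measure("loop_iteration"):
            with timer.measure("soprano_wait"):
                time.sleep(0.004)          # placeholder for waiting on a chunk
            for _ in range(blocks_per_chunk):
                with timer.measure("rvc_block"):
                    time.sleep(0.002)      # placeholder for one RVC block
    return timer


if __name__ == "__main__":
    t = run_demo(StageTimer())
    for stage, total in t.totals.items():
        print(f"{stage}: {total * 1000:.1f}ms over {len(t.samples[stage])} calls")
```

Keeping per-call samples as well as totals is what makes a first-call outlier visible next to the steady-state values.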

RAW DATA:
- Total elapsed: 14.49s
- Blocks processed: 26
- Audio duration: 6.50s
- Realtime factor: 0.45x (should be 1.49x if parallel)
- Performance loss: 69.9% (0.45x measured vs. 1.49x expected)

TIME BREAKDOWN:
- Total RVC blocking time: 10.07s (69.4% of elapsed)
- Total Soprano waiting time: 4.38s (30.2% of elapsed)
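
The derived figures can be recomputed from the raw numbers as a sanity check. A small sketch; the variable names are illustrative, and because the inputs are already rounded, the last digit may differ slightly from the values reported above.

```python
# Raw measurements as reported above (already rounded).
total_elapsed_s = 14.49
audio_s = 6.50
rvc_blocking_s = 10.07
soprano_wait_s = 4.38
soprano_isolated_rtf = 1.49   # isolated Soprano speed, from the notes

# Realtime factor: seconds of audio produced per second of wall clock.
rtf = audio_s / total_elapsed_s

# Loss relative to the expected parallel speed.
loss = 1 - rtf / soprano_isolated_rtf

# Share of elapsed time spent in each stage.
rvc_share = rvc_blocking_s / total_elapsed_s
soprano_share = soprano_wait_s / total_elapsed_s

# Time not attributed to either stage (loop overhead, rounding).
unaccounted_s = total_elapsed_s - rvc_blocking_s - soprano_wait_s

print(f"realtime factor: {rtf:.2f}x")
print(f"performance loss: {loss:.1%}")
print(f"RVC share: {rvc_share:.1%}, Soprano share: {soprano_share:.1%}")
print(f"unaccounted: {unaccounted_s:.2f}s")
```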

CRITICAL FINDING - FIRST BLOCK PENALTY:
- Block 1 RVC: 6107.8ms (6.1 SECONDS!)
- Block 2 RVC: 163.2ms
- Block 3+ RVC: ~155-165ms each

ROOT CAUSE: ROCm/HIP kernel JIT compilation + first-time memory allocation
This 6.1s penalty is ONE-TIME per server start, not per request.

STEADY-STATE BEHAVIOR (blocks 2-26):
- Soprano wait per chunk: ~400ms (generation time)
- RVC processes 2-3 blocks per chunk: 320-480ms total
- Loop iteration: 720-880ms (400ms + 320-480ms)
- Pattern: SYNCHRONOUS - Soprano waits for RVC to complete all blocks

EXPECTED BEHAVIOR:
(Note: 65.8 - 48.3 = 17.5%, likely due to different test length)
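If fixed: Loop iteration = max(400ms Soprano, ~385ms RVC per chunk) = ~400ms
(RVC is ~160ms per block at 2-3 blocks per chunk; ~385ms is the per-chunk average implied by the 785ms additive figure)
Pipeline limited by Soprano at 1.49x realtime (isolated component speed), provided RVC's per-chunk time stays below 400ms on average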
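
The additive vs. max() estimate can be worked through per chunk. A minimal sketch using the steady-state figures above (400ms Soprano per chunk, ~160ms RVC per block, 2-3 blocks per chunk); the function names are illustrative only.

```python
SOPRANO_MS_PER_CHUNK = 400   # steady-state Soprano wait per chunk
RVC_MS_PER_BLOCK = 160       # steady-state RVC time per block


def serial_iteration_ms(blocks_per_chunk):
    """Current behavior: Soprano and RVC run back to back."""
    return SOPRANO_MS_PER_CHUNK + blocks_per_chunk * RVC_MS_PER_BLOCK


def overlapped_iteration_ms(blocks_per_chunk):
    """Ideal overlap: the slower stage sets the pace."""
    return max(SOPRANO_MS_PER_CHUNK, blocks_per_chunk * RVC_MS_PER_BLOCK)


for n in (2, 3):
    print(n, serial_iteration_ms(n), overlapped_iteration_ms(n))
# prints:
# 2 720 400
# 3 880 480
```

For 3-block chunks RVC becomes the longer stage, so the Soprano-limited estimate depends on the average number of blocks per chunk.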

CONFIRMATION OF HYPOTHESIS C1/C2:
✅ Soprano generation BLOCKS while waiting for RVC processing
✅ RVC processing BLOCKS Soprano from generating next chunk
✅ No overlap/parallelism between components
✅ Measured loop iteration (780ms avg) matches expected additive behavior (785ms)
✅ 69.4% of elapsed time spent in RVC blocking (including the one-time 6.1s first block) = Soprano sitting idle

NEXT STEP:
Implement async producer-consumer pattern:
1. Soprano generates chunks → queue (continuous, does not wait for RVC)
2. RVC pulls from queue → processes (continuous, waits only when the queue is empty)
3. Both components run in parallel
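
The three steps above might be sketched with the standard library's threading and queue modules. This is an illustration under assumptions, not the planned implementation: fake_soprano and fake_rvc are hypothetical stand-ins that sleep instead of doing real work, and threads overlap only where the underlying calls release the GIL (sleep does; whether the real Soprano and RVC calls do is an assumption to verify).

```python
import queue
import threading
import time

_DONE = object()  # sentinel marking the end of the chunk stream


def producer(generate_chunks, q):
    """Push each generated chunk onto the queue, then signal completion."""
    try:
        for chunk in generate_chunks():
            q.put(chunk)   # waits only if the bounded queue is full
    finally:
        q.put(_DONE)


def consumer(convert_block, q, out):
    """Pull chunks until the sentinel arrives, converting block by block."""
    while True:
        chunk = q.get()    # waits only while the queue is empty
        if chunk is _DONE:
            break
        for block in chunk:
            out.append(convert_block(block))


def run_pipeline(generate_chunks, convert_block, maxsize=8):
    q = queue.Queue(maxsize=maxsize)
    out = []
    t_prod = threading.Thread(target=producer, args=(generate_chunks, q))
    t_cons = threading.Thread(target=consumer, args=(convert_block, q, out))
    t_prod.start()
    t_cons.start()
    t_prod.join()
    t_cons.join()
    return out


def fake_soprano(n_chunks=5, blocks_per_chunk=2, gen_s=0.04):
    """Hypothetical generator factory: sleeps, then yields a chunk of blocks."""
    def gen():
        for c in range(n_chunks):
            time.sleep(gen_s)
            yield [(c, b) for b in range(blocks_per_chunk)]
    return gen


def fake_rvc(block, proc_s=0.016):
    """Hypothetical converter: sleeps, then returns the block unchanged."""
    time.sleep(proc_s)
    return block


if __name__ == "__main__":
    start = time.perf_counter()
    result = run_pipeline(fake_soprano(), fake_rvc)
    print(f"{len(result)} blocks in {time.perf_counter() - start:.3f}s")
```

The bounded queue keeps Soprano from running arbitrarily far ahead of RVC. The sketch omits error handling: if the consumer thread dies, the producer can stall on a full queue.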

EXPECTED IMPROVEMENT:
Current: 0.45x realtime (with 6.1s first-block penalty)
Steady-state current: ~0.78x realtime (6.50s audio / 8.38s elapsed with the 6.1s first-block penalty removed)
After fix: ~1.49x realtime (limited by Soprano generation speed)
Improvement: 0.78x → 1.49x = +91% performance, achieves >1.0x target! ✅

