Compare commits
10 Commits
17842f24d4...master

| Author | SHA1 | Date |
|---|---|---|
| | 9eb081efb1 | |
| | 4e28236b06 | |
| | c5e49c73df | |
| | 393921e524 | |
| | 2dd32d0ef1 | |
| | a980b90c0a | |
| | 6b922d84ae | |
| | f33e2afdf7 | |
| | 87de8f8b3a | |
| | 2d0c80b7ef | |
@@ -1,13 +0,0 @@
-FROM ghcr.io/mostlygeek/llama-swap:cuda
-
-USER root
-
-# Download and install llama-server binary (CUDA version)
-# Using the official pre-built binary from llama.cpp releases
-ADD --chmod=755 https://github.com/ggml-org/llama.cpp/releases/download/b4183/llama-server-cuda /usr/local/bin/llama-server
-
-# Verify it's executable
-RUN llama-server --version || echo "llama-server installed successfully"
-
-USER 1000:1000
-
@@ -1,68 +0,0 @@
-# Multi-stage build for llama-swap with ROCm support
-# Now using official llama.cpp ROCm image (PR #18439 merged Dec 29, 2025)
-
-# Stage 1: Build llama-swap UI
-FROM node:22-alpine AS ui-builder
-
-WORKDIR /build
-
-# Install git
-RUN apk add --no-cache git
-
-# Clone llama-swap
-RUN git clone https://github.com/mostlygeek/llama-swap.git
-
-# Build UI (now in ui-svelte directory)
-WORKDIR /build/llama-swap/ui-svelte
-RUN npm install && npm run build
-
-# Stage 2: Build llama-swap binary
-FROM golang:1.23-alpine AS swap-builder
-
-WORKDIR /build
-
-# Install git
-RUN apk add --no-cache git
-
-# Copy llama-swap source with built UI
-COPY --from=ui-builder /build/llama-swap /build/llama-swap
-
-# Build llama-swap binary
-WORKDIR /build/llama-swap
-RUN GOTOOLCHAIN=auto go build -o /build/llama-swap-binary .
-
-# Stage 3: Final runtime image using official llama.cpp ROCm image
-FROM ghcr.io/ggml-org/llama.cpp:server-rocm
-
-WORKDIR /app
-
-# Copy llama-swap binary from builder
-COPY --from=swap-builder /build/llama-swap-binary /app/llama-swap
-
-# Make binaries executable
-RUN chmod +x /app/llama-swap
-
-# Add existing ubuntu user (UID 1000) to GPU access groups (using host GIDs)
-# GID 187 = render group on host, GID 989 = video/kfd group on host
-RUN groupadd -g 187 hostrender && \
-    groupadd -g 989 hostvideo && \
-    usermod -aG hostrender,hostvideo ubuntu && \
-    chown -R ubuntu:ubuntu /app
-
-# Set environment for ROCm (RX 6800 is gfx1030)
-ENV HSA_OVERRIDE_GFX_VERSION=10.3.0
-ENV ROCM_PATH=/opt/rocm
-ENV HIP_VISIBLE_DEVICES=0
-
-USER ubuntu
-
-# Expose port
-EXPOSE 8080
-
-# Health check
-HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
-    CMD curl -f http://localhost:8080/health || exit 1
-
-# Override the base image's ENTRYPOINT and run llama-swap
-ENTRYPOINT []
-CMD ["/app/llama-swap", "-config", "/app/config.yaml", "-listen", "0.0.0.0:8080"]
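Aside: the HSA_OVERRIDE_GFX_VERSION=10.3.0 line in the removed stage follows the usual ROCm convention of spelling a gfx architecture name as a dotted version (gfx1030 becomes 10.3.0, with the last two hex digits read as minor and stepping). A throwaway Python sketch of that convention; the helper name is ours, purely illustrative:

    def gfx_to_override(gfx: str) -> str:
        # 'gfx1030' -> '10.3.0'; 'gfx90a' -> '9.0.10' (hex 'a' = 10)
        digits = gfx.removeprefix("gfx")
        major, minor, step = digits[:-2], digits[-2], digits[-1]
        return f"{int(major)}.{int(minor, 16)}.{int(step, 16)}"

    assert gfx_to_override("gfx1030") == "10.3.0"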
bot/bot.py (17 changes)

@@ -360,15 +360,24 @@ async def on_message(message):
             if globals.EVIL_MODE:
                 effective_mood = f"EVIL:{getattr(globals, 'EVIL_DM_MOOD', 'evil_neutral')}"
             logger.info(f"🐱 Cat response for {author_name} (mood: {effective_mood})")
-            # Track Cat interaction for Web UI Last Prompt view
+            # Track Cat interaction in unified prompt history
             import datetime
-            globals.LAST_CAT_INTERACTION = {
+            globals._prompt_id_counter += 1
+            guild_name = message.guild.name if message.guild else "DM"
+            channel_name = message.channel.name if message.guild else "DM"
+            globals.PROMPT_HISTORY.append({
+                "id": globals._prompt_id_counter,
+                "source": "cat",
                 "full_prompt": cat_full_prompt,
-                "response": response[:500] if response else "",
+                "response": response if response else "",
                 "user": author_name,
                 "mood": effective_mood,
+                "guild": guild_name,
+                "channel": channel_name,
                 "timestamp": datetime.datetime.now().isoformat(),
-            }
+                "model": "Cat LLM",
+                "response_type": response_type,
+            })
         except Exception as e:
             logger.warning(f"🐱 Cat pipeline error, falling back to query_llama: {e}")
             response = None
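Note the change from response[:500] to the full response string: entries now keep complete replies, and the memory bound moves to the history container itself (a deque with maxlen=10, added in globals.py below). A minimal sketch of that trade-off, with stand-in entries:

    from collections import deque

    PROMPT_HISTORY = deque(maxlen=10)  # same bound as globals.py
    for i in range(25):
        PROMPT_HISTORY.append({"id": i, "response": "x" * 4000})

    # Only the 10 newest entries survive; each keeps its full response text.
    assert len(PROMPT_HISTORY) == 10
    assert PROMPT_HISTORY[0]["id"] == 15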
@@ -1,6 +1,7 @@
 # globals.py
 import os
 import discord
+from collections import deque
 from apscheduler.schedulers.asyncio import AsyncIOScheduler
 
 scheduler = AsyncIOScheduler()
@@ -77,16 +78,25 @@ MIKU_NORMAL_AVATAR_URL = None # Cached CDN URL of the regular Miku pfp (valid e
 
 BOT_USER = None
 
-LAST_FULL_PROMPT = ""
+# Unified prompt history (replaces LAST_FULL_PROMPT and LAST_CAT_INTERACTION)
+# Each entry: {id, source, full_prompt, response, user, mood, guild, channel,
+#              timestamp, model, response_type}
+PROMPT_HISTORY = deque(maxlen=10)
+_prompt_id_counter = 0
 
-# Cheshire Cat last interaction tracking (for Web UI Last Prompt toggle)
-LAST_CAT_INTERACTION = {
-    "full_prompt": "",
-    "response": "",
-    "user": "",
-    "mood": "",
-    "timestamp": "",
-}
+# Legacy accessors for backward compatibility (routes, CLI, etc.)
+# These are computed properties that read from PROMPT_HISTORY
+def _get_last_fallback_prompt():
+    for entry in reversed(PROMPT_HISTORY):
+        if entry.get("source") == "fallback":
+            return entry.get("full_prompt", "")
+    return ""
+
+def _get_last_cat_interaction():
+    for entry in reversed(PROMPT_HISTORY):
+        if entry.get("source") == "cat":
+            return entry
+    return {"full_prompt": "", "response": "", "user": "", "mood": "", "timestamp": ""}
 
 # Persona Dialogue System (conversations between Miku and Evil Miku)
 LAST_PERSONA_DIALOGUE_TIME = 0 # Timestamp of last dialogue for cooldown
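The legacy accessors scan the deque newest-first, so callers of the old single-slot globals keep seeing the most recent matching entry. A small sketch with invented values:

    from collections import deque

    PROMPT_HISTORY = deque(maxlen=10)
    PROMPT_HISTORY.append({"id": 1, "source": "fallback", "full_prompt": "System: old"})
    PROMPT_HISTORY.append({"id": 2, "source": "cat", "full_prompt": "meow"})
    PROMPT_HISTORY.append({"id": 3, "source": "fallback", "full_prompt": "System: new"})

    def _get_last_fallback_prompt():
        for entry in reversed(PROMPT_HISTORY):  # newest entry first
            if entry.get("source") == "fallback":
                return entry.get("full_prompt", "")
        return ""

    assert _get_last_fallback_prompt() == "System: new"  # id 3 wins over id 1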
@@ -14,7 +14,8 @@ router = APIRouter()
 
 @router.get("/")
 def read_index():
-    return FileResponse("static/index.html")
+    headers = {"Cache-Control": "no-cache, no-store, must-revalidate"}
+    return FileResponse("static/index.html", headers=headers)
 
 
 @router.get("/logs")
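The effect of the new header is easy to verify end to end; a quick sketch with httpx, where the host and port are assumptions for the illustration:

    import httpx

    resp = httpx.get("http://localhost:8000/")  # adjust for your deployment
    print(resp.headers.get("cache-control"))
    # expected: no-cache, no-store, must-revalidate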
@@ -31,18 +32,45 @@ def get_logs():
 
 @router.get("/prompt")
 def get_last_prompt():
-    return {"prompt": globals.LAST_FULL_PROMPT or "No prompt has been issued yet."}
+    """Legacy endpoint: returns the most recent fallback prompt (backward compat)."""
+    prompt_text = globals._get_last_fallback_prompt()
+    return {"prompt": prompt_text or "No prompt has been issued yet."}
 
 
 @router.get("/prompt/cat")
 def get_last_cat_prompt():
-    """Get the last Cheshire Cat interaction (full prompt + response) for Web UI."""
-    interaction = globals.LAST_CAT_INTERACTION
+    """Legacy endpoint: returns the most recent Cat interaction (backward compat)."""
+    interaction = globals._get_last_cat_interaction()
     if not interaction.get("full_prompt"):
-        return {"full_prompt": "No Cheshire Cat interaction has occurred yet.", "response": "", "user": "", "mood": "", "timestamp": ""}
+        return {"full_prompt": "No Cheshire Cat interaction has occurred yet.",
+                "response": "", "user": "", "mood": "", "timestamp": ""}
     return interaction
 
 
+@router.get("/prompts")
+def get_prompt_history(source: str = None):
+    """
+    Return the unified prompt history.
+    Optional query param ?source=cat or ?source=fallback to filter.
+    """
+    history = list(globals.PROMPT_HISTORY)
+    if source and source in ("cat", "fallback"):
+        history = [e for e in history if e.get("source") == source]
+    return {"history": history}
+
+
+@router.get("/prompts/{prompt_id}")
+def get_prompt_by_id(prompt_id: int):
+    """Return a single prompt history entry by ID."""
+    for entry in globals.PROMPT_HISTORY:
+        if entry.get("id") == prompt_id:
+            return entry
+    return JSONResponse(
+        status_code=404,
+        content={"status": "error", "message": f"Prompt #{prompt_id} not found"}
+    )
+
+
 @router.get("/status")
 def status():
     # Get per-server mood summary
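A short usage sketch for the two new endpoints (same assumed local instance as above); entries come back oldest-first because the deque appends newest entries at the end:

    import httpx

    base = "http://localhost:8000"  # assumed host/port
    history = httpx.get(f"{base}/prompts", params={"source": "cat"}).json()["history"]
    if history:
        newest = history[-1]
        entry = httpx.get(f"{base}/prompts/{newest['id']}").json()
        print(entry["mood"], entry["timestamp"])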
@@ -441,6 +441,51 @@ h1, h3 {
   color: #ddd;
 }
 
+/* Prompt History Section */
+#prompt-history-section.collapsed #prompt-history-body {
+  display: none;
+}
+#prompt-history-toggle {
+  user-select: none;
+  transition: color 0.2s;
+}
+#prompt-history-toggle:hover {
+  color: #4CAF50;
+}
+#prompt-metadata span {
+  white-space: nowrap;
+}
+#prompt-metadata .prompt-meta-label {
+  color: #666;
+}
+#prompt-metadata .prompt-meta-value {
+  color: #ccc;
+}
+#prompt-display pre {
+  margin: 0;
+}
+.prompt-subsection-header {
+  cursor: pointer;
+  user-select: none;
+  padding: 0.3rem 0.5rem;
+  border-radius: 4px;
+  background: #2a2a2a;
+  margin: 0.5rem 0 0.25rem 0;
+  font-size: 0.82rem;
+  color: #aaa;
+  transition: background 0.15s;
+}
+.prompt-subsection-header:hover {
+  background: #333;
+  color: #ddd;
+}
+.prompt-subsection-body.collapsed {
+  display: none;
+}
+#prompt-truncate-toggle {
+  accent-color: #4CAF50;
+}
+
 /* Mood Activities Editor */
 .act-mood-row {
   margin-bottom: 0.5rem;
@@ -3,10 +3,13 @@
 <head>
   <meta charset="UTF-8">
   <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
+  <meta http-equiv="Pragma" content="no-cache">
+  <meta http-equiv="Expires" content="0">
   <title>Miku Control Panel</title>
   <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/cropperjs/1.6.2/cropper.min.css">
   <script src="https://cdnjs.cloudflare.com/ajax/libs/cropperjs/1.6.2/cropper.min.js"></script>
-  <link rel="stylesheet" href="/static/css/style.css">
+  <link rel="stylesheet" href="/static/css/style.css?v=20260502">
 </head>
 <body>
 
@@ -543,23 +546,53 @@
     </div>
   </div>
 
-  <div class="section">
-    <h3>Last Prompt</h3>
-    <div style="margin-bottom: 0.75rem; display: flex; align-items: center; gap: 0.75rem;">
-      <label style="font-size: 0.9rem; color: #aaa;">Source:</label>
-      <div style="display: inline-flex; border-radius: 6px; overflow: hidden; border: 1px solid #444;">
-        <button id="prompt-src-cat" class="prompt-source-btn active" onclick="switchPromptSource('cat')"
-                style="padding: 0.4rem 1rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
-          🐱 Cheshire Cat
-        </button>
-        <button id="prompt-src-fallback" class="prompt-source-btn" onclick="switchPromptSource('fallback')"
-                style="padding: 0.4rem 1rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
-          🤖 Bot Fallback
-        </button>
-      </div>
-    </div>
+  <div class="section" id="prompt-history-section">
+    <div class="prompt-history-header" style="display: flex; align-items: center; justify-content: space-between; margin-bottom: 0.5rem;">
+      <h3 style="margin: 0; cursor: pointer;" onclick="togglePromptHistoryCollapse()" id="prompt-history-toggle">
+        ▼ Prompt History
+      </h3>
+      <button onclick="loadPromptHistory()" title="Refresh" style="background: none; border: 1px solid #444; color: #aaa; cursor: pointer; padding: 0.2rem 0.5rem; border-radius: 4px; font-size: 0.85rem;">🔄</button>
+    </div>
+    <div id="prompt-history-body">
+      <!-- Source filter + history selector row -->
+      <div style="margin-bottom: 0.75rem; display: flex; align-items: center; gap: 0.75rem; flex-wrap: wrap;">
+        <label style="font-size: 0.9rem; color: #aaa;">Source:</label>
+        <div style="display: inline-flex; border-radius: 6px; overflow: hidden; border: 1px solid #444;">
+          <button id="prompt-src-all" class="prompt-source-btn active" onclick="switchPromptSource('all')"
+                  style="padding: 0.4rem 0.8rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
+            All
+          </button>
+          <button id="prompt-src-cat" class="prompt-source-btn" onclick="switchPromptSource('cat')"
+                  style="padding: 0.4rem 0.8rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
+            🐱 Cat
+          </button>
+          <button id="prompt-src-fallback" class="prompt-source-btn" onclick="switchPromptSource('fallback')"
+                  style="padding: 0.4rem 0.8rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
+            🤖 Fallback
+          </button>
+        </div>
+        <select id="prompt-history-select" onchange="selectPromptEntry(this.value)" style="background: #2a2a2a; color: #ddd; border: 1px solid #444; padding: 0.35rem 0.5rem; border-radius: 4px; font-size: 0.85rem; min-width: 280px;">
+          <option value="">-- No prompts yet --</option>
+        </select>
+      </div>
+
+      <!-- Metadata bar -->
+      <div id="prompt-metadata" style="margin-bottom: 0.5rem; font-size: 0.82rem; color: #888; display: flex; flex-wrap: wrap; gap: 0.3rem 1rem;"></div>
+
+      <!-- Toolbar: copy + truncate toggle -->
+      <div style="margin-bottom: 0.5rem; display: flex; align-items: center; gap: 1rem;">
+        <button onclick="copyPromptToClipboard()" title="Copy full prompt to clipboard" style="background: #333; border: 1px solid #555; color: #aaa; cursor: pointer; padding: 0.25rem 0.6rem; border-radius: 4px; font-size: 0.8rem;">📋 Copy</button>
+        <label style="font-size: 0.82rem; color: #aaa; cursor: pointer; display: flex; align-items: center; gap: 0.3rem;">
+          <input type="checkbox" id="prompt-truncate-toggle" onchange="toggleMiddleTruncation()">
+          Truncate from middle
+        </label>
+      </div>
+
+      <!-- Prompt display subsections -->
+      <div id="prompt-display" style="max-height: 60vh; overflow-y: auto; min-height: 3rem;"></div>
+      <!-- Hidden buffer for copy-to-clipboard raw text -->
+      <pre id="last-prompt" style="display: none;"></pre>
     </div>
-    <div id="prompt-cat-info" style="margin-bottom: 0.5rem; font-size: 0.85rem; color: #aaa;"></div>
-    <pre id="last-prompt" style="white-space: pre-wrap; word-break: break-word;"></pre>
   </div>
 </div>
 
@@ -1339,15 +1372,15 @@
   </div>
 </div>
 
-<script src="/static/js/core.js"></script>
-<script src="/static/js/servers.js"></script>
-<script src="/static/js/modes.js"></script>
-<script src="/static/js/actions.js"></script>
-<script src="/static/js/image-gen.js"></script>
-<script src="/static/js/status.js"></script>
-<script src="/static/js/dm.js"></script>
-<script src="/static/js/chat.js"></script>
-<script src="/static/js/memories.js"></script>
-<script src="/static/js/profile.js"></script>
+<script src="/static/js/core.js?v=20260502"></script>
+<script src="/static/js/servers.js?v=20260502"></script>
+<script src="/static/js/modes.js?v=20260502"></script>
+<script src="/static/js/actions.js?v=20260502"></script>
+<script src="/static/js/image-gen.js?v=20260502"></script>
+<script src="/static/js/status.js?v=20260502"></script>
+<script src="/static/js/dm.js?v=20260502"></script>
+<script src="/static/js/chat.js?v=20260502"></script>
+<script src="/static/js/memories.js?v=20260502"></script>
+<script src="/static/js/profile.js?v=20260502"></script>
 </body>
 </html>
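The ?v=20260502 suffixes are a manual cache-busting stamp: changing the query string makes browsers treat each asset as a new URL. If the stamp ever needs refreshing in one pass, something like this would do it (the regex and date format are assumptions of this sketch):

    import re, datetime, pathlib

    path = pathlib.Path("static/index.html")
    stamp = datetime.date.today().strftime("%Y%m%d")
    # Rewrite every ?v=... query string on local /static/ assets.
    html = re.sub(r'(/static/[^"?]+\?v=)\d+', rf'\g<1>{stamp}', path.read_text())
    path.write_text(html)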
@@ -29,6 +29,7 @@ let notificationTimer = null;
 let statusInterval = null;
 let logsInterval = null;
 let argsInterval = null;
+let promptInterval = null;
 
 // Mood emoji mapping
 const MOOD_EMOJIS = {
@@ -211,12 +212,14 @@ function startPolling() {
   if (!statusInterval) statusInterval = setInterval(loadStatus, 10000);
   if (!logsInterval) logsInterval = setInterval(loadLogs, 5000);
   if (!argsInterval) argsInterval = setInterval(loadActiveArguments, 5000);
+  if (!promptInterval) promptInterval = setInterval(loadPromptHistory, 10000);
 }
 
 function stopPolling() {
   clearInterval(statusInterval); statusInterval = null;
   clearInterval(logsInterval); logsInterval = null;
   clearInterval(argsInterval); argsInterval = null;
+  clearInterval(promptInterval); promptInterval = null;
 }
 
 // ============================================================================
@@ -248,7 +251,7 @@ function initVisibilityPolling() {
       stopPolling();
       console.log('⏸ Tab hidden — polling paused');
     } else {
-      loadStatus(); loadLogs(); loadActiveArguments();
+      loadStatus(); loadLogs(); loadActiveArguments(); loadPromptHistory();
       startPolling();
       console.log('▶️ Tab visible — polling resumed');
     }
@@ -296,9 +299,11 @@ function initModalAccessibility() {
 }
 
 function initPromptSourceToggle() {
-  const saved = localStorage.getItem('miku-prompt-source') || 'cat';
+  const saved = localStorage.getItem('miku-prompt-source') || 'all';
   document.querySelectorAll('.prompt-source-btn').forEach(btn => btn.classList.remove('active'));
-  document.getElementById(`prompt-src-${saved}`).classList.add('active');
+  const btnId = saved === 'all' ? 'prompt-src-all' : `prompt-src-${saved}`;
+  const btn = document.getElementById(btnId);
+  if (btn) btn.classList.add('active');
 }
 
 function initLogsScrollDetection() {
@@ -360,8 +365,10 @@ async function loadLogs() {
 function switchPromptSource(source) {
   localStorage.setItem('miku-prompt-source', source);
   document.querySelectorAll('.prompt-source-btn').forEach(btn => btn.classList.remove('active'));
-  document.getElementById(`prompt-src-${source}`).classList.add('active');
-  loadLastPrompt();
+  const btnId = source === 'all' ? 'prompt-src-all' : `prompt-src-${source}`;
+  const btn = document.getElementById(btnId);
+  if (btn) btn.classList.add('active');
+  loadPromptHistory();
 }
 
 // ============================================================================
@@ -57,33 +57,271 @@ async function loadStatus() {
   }
 }
 
-// ===== Last Prompt =====
+// ===== Prompt History =====
 
-async function loadLastPrompt() {
-  const source = localStorage.getItem('miku-prompt-source') || 'cat';
-  const promptEl = document.getElementById('last-prompt');
-  const infoEl = document.getElementById('prompt-cat-info');
+let _promptHistoryCache = []; // cached history entries from last fetch
+let _selectedPromptId = null; // currently selected entry ID
+let _middleTruncation = false; // whether middle-truncation is active
+
+async function loadPromptHistory() {
+  const source = localStorage.getItem('miku-prompt-source') || 'all';
+  const selectEl = document.getElementById('prompt-history-select');
 
   try {
-    if (source === 'cat') {
-      const result = await apiCall('/prompt/cat');
-      if (result.timestamp) {
-        infoEl.innerHTML = `<strong>User:</strong> ${escapeHtml(result.user || '?')} | <strong>Mood:</strong> ${escapeHtml(result.mood || '?')} | <strong>Time:</strong> ${new Date(result.timestamp).toLocaleString()}`;
-        promptEl.textContent = result.full_prompt + `\n\n${'═'.repeat(60)}\n[Cat Response]\n${result.response}`;
-      } else {
-        infoEl.textContent = '';
-        promptEl.textContent = result.full_prompt || 'No Cheshire Cat interaction yet.';
-      }
+    const url = source === 'all' ? '/prompts' : `/prompts?source=${source}`;
+    const result = await apiCall(url);
+    _promptHistoryCache = result.history || [];
+
+    // Populate dropdown
+    const currentValue = selectEl.value;
+    selectEl.innerHTML = '';
+    if (_promptHistoryCache.length === 0) {
+      selectEl.innerHTML = '<option value="">-- No prompts yet --</option>';
     } else {
-      infoEl.textContent = '';
-      const result = await apiCall('/prompt');
-      promptEl.textContent = result.prompt;
+      _promptHistoryCache.forEach(entry => {
+        const ts = entry.timestamp ? new Date(entry.timestamp).toLocaleTimeString() : '?';
+        const srcLabel = entry.source === 'cat' ? '🐱' : '🤖';
+        const user = entry.user || '?';
+        const option = document.createElement('option');
+        option.value = entry.id;
+        option.textContent = `${srcLabel} #${entry.id} — ${user} — ${ts}`;
+        selectEl.appendChild(option);
+      });
+    }
+
+    // Restore or auto-select the latest entry
+    if (_selectedPromptId && _promptHistoryCache.some(e => e.id === _selectedPromptId)) {
+      selectEl.value = _selectedPromptId;
+    } else if (_promptHistoryCache.length > 0) {
+      selectEl.value = _promptHistoryCache[0].id;
+    }
+
+    if (selectEl.value) {
+      await selectPromptEntry(selectEl.value);
+    } else {
+      clearPromptDisplay();
     }
   } catch (error) {
-    console.error('Failed to load last prompt:', error);
+    console.error('Failed to load prompt history:', error);
   }
 }
+
+async function selectPromptEntry(promptId) {
+  if (!promptId) {
+    clearPromptDisplay();
+    return;
+  }
+
+  _selectedPromptId = parseInt(promptId);
+
+  // Try cache first
+  let entry = _promptHistoryCache.find(e => e.id === _selectedPromptId);
+
+  // Fall back to API call if not in cache
+  if (!entry) {
+    try {
+      entry = await apiCall(`/prompts/${_selectedPromptId}`);
+    } catch (error) {
+      console.error('Failed to load prompt entry:', error);
+      clearPromptDisplay();
+      return;
+    }
+  }
+
+  if (!entry) {
+    clearPromptDisplay();
+    return;
+  }
+
+  renderPromptEntry(entry);
+}
+
+function clearPromptDisplay() {
+  document.getElementById('prompt-metadata').innerHTML = '';
+  document.getElementById('prompt-display').innerHTML = '<pre style="white-space: pre-wrap; word-break: break-word; background: #1a1a1a; padding: 0.75rem; border-radius: 4px; font-size: 0.8rem; line-height: 1.4; margin: 0; color: #666;">No prompt selected.</pre>';
+  document.getElementById('last-prompt').textContent = '';
+}
+
+function renderPromptEntry(entry) {
+  // Metadata bar
+  const metaEl = document.getElementById('prompt-metadata');
+  const ts = entry.timestamp ? new Date(entry.timestamp).toLocaleString() : '?';
+  const sourceIcon = entry.source === 'cat' ? '🐱 Cat' : '🤖 Fallback';
+  metaEl.innerHTML = `
+    <span><span class="prompt-meta-label">#</span><span class="prompt-meta-value">${entry.id}</span></span>
+    <span><span class="prompt-meta-label">Source:</span> <span class="prompt-meta-value">${sourceIcon}</span></span>
+    <span><span class="prompt-meta-label">User:</span> <span class="prompt-meta-value">${escapeHtml(entry.user || '?')}</span></span>
+    <span><span class="prompt-meta-label">Mood:</span> <span class="prompt-meta-value">${escapeHtml(entry.mood || '?')}</span></span>
+    <span><span class="prompt-meta-label">Guild:</span> <span class="prompt-meta-value">${escapeHtml(entry.guild || '?')}</span></span>
+    <span><span class="prompt-meta-label">Channel:</span> <span class="prompt-meta-value">${escapeHtml(entry.channel || '?')}</span></span>
+    <span><span class="prompt-meta-label">Model:</span> <span class="prompt-meta-value">${escapeHtml(entry.model || '?')}</span></span>
+    <span><span class="prompt-meta-label">Type:</span> <span class="prompt-meta-value">${escapeHtml(entry.response_type || '?')}</span></span>
+    <span><span class="prompt-meta-label">Time:</span> <span class="prompt-meta-value">${ts}</span></span>
+  `;
+
+  // Parse full_prompt into sections
+  const sections = parsePromptSections(entry.full_prompt || '');
+
+  // Snapshot which subsections are currently collapsed (before re-render)
+  const sectionIds = ['system', 'context', 'conversation', 'response'];
+  const collapsedState = {};
+  sectionIds.forEach(id => {
+    const el = document.getElementById(`prompt-section-${id}`);
+    collapsedState[id] = el && el.classList.contains('collapsed');
+  });
+
+  // Build display HTML with collapsible subsections
+  let displayHtml = '';
+
+  if (sections.system) {
+    displayHtml += buildCollapsibleSection('System Prompt', sections.system, 'system');
+  }
+  if (sections.context) {
+    displayHtml += buildCollapsibleSection('Context (Memories & Tools)', sections.context, 'context');
+  }
+  if (sections.conversation) {
+    displayHtml += buildCollapsibleSection('Conversation', sections.conversation, 'conversation');
+  }
+  if (!sections.system && !sections.context && !sections.conversation) {
+    // Fallback: show raw full_prompt
+    displayHtml += `<pre style="white-space: pre-wrap; word-break: break-word; margin: 0;">${escapeHtml(entry.full_prompt || '')}</pre>`;
+  }
+
+  // Response section
+  if (entry.response) {
+    let responseText = entry.response;
+    if (_middleTruncation && responseText.length > 400) {
+      responseText = responseText.substring(0, 200) + '\n\n... [truncated middle] ...\n\n' + responseText.substring(responseText.length - 200);
+    }
+    displayHtml += buildCollapsibleSection('Response', responseText, 'response');
+  }
+
+  // Render into the prompt-display div (using innerHTML for collapsible structure)
+  const displayEl = document.getElementById('prompt-display');
+  displayEl.innerHTML = displayHtml;
+
+  // Restore collapsed state from snapshot
+  sectionIds.forEach(id => {
+    const el = document.getElementById(`prompt-section-${id}`);
+    if (el && collapsedState[id]) {
+      el.classList.add('collapsed');
+      const header = el.previousElementSibling;
+      if (header) header.innerHTML = header.innerHTML.replace('▼', '▶');
+    }
+  });
+
+  // Also set the raw text into the <pre> for copy functionality
+  let rawText = entry.full_prompt || '';
+  if (entry.response) {
+    rawText += `\n\n${'═'.repeat(60)}\n[Response]\n${entry.response}`;
+  }
+  document.getElementById('last-prompt').textContent = rawText;
+}
+
+function parsePromptSections(fullPrompt) {
+  const sections = { system: null, context: null, conversation: null };
+
+  if (!fullPrompt) return sections;
+
+  // Try to split on known section markers
+  const contextMatch = fullPrompt.match(/# Context\s*\n([\s\S]*?)(?=\n# Conversation|\nHuman:|\n$)/);
+  const convMatch = fullPrompt.match(/# Conversation until now:\s*\n([\s\S]*)/);
+
+  if (contextMatch) {
+    // Everything before # Context is the system prompt
+    const contextIdx = fullPrompt.indexOf('# Context');
+    if (contextIdx > 0) {
+      sections.system = fullPrompt.substring(0, contextIdx).trim();
+    }
+    sections.context = contextMatch[1].trim();
+  }
+
+  if (convMatch) {
+    sections.conversation = convMatch[1].trim();
+  } else {
+    // Try alternative: "Human:" at the end
+    const humanMatch = fullPrompt.match(/\nHuman:([\s\S]*)/);
+    if (humanMatch && fullPrompt.indexOf('Human:') > fullPrompt.indexOf('# Context')) {
+      sections.conversation = 'Human:' + humanMatch[1].trim();
+    }
+  }
+
+  // If no # Context marker, try "System:" prefix (fallback prompts)
+  if (!sections.system && !sections.context) {
+    const sysMatch = fullPrompt.match(/^System:\s*([\s\S]*?)(?=\nMessages:)/);
+    const msgMatch = fullPrompt.match(/Messages:\s*([\s\S]*)/);
+    if (sysMatch) {
+      sections.system = sysMatch[1].trim();
+    }
+    if (msgMatch) {
+      sections.conversation = msgMatch[1].trim();
+    }
+  }
+
+  return sections;
+}
+
+function buildCollapsibleSection(title, content, sectionId) {
+  const id = `prompt-section-${sectionId}`;
+  return `
+    <div class="prompt-subsection-header" onclick="togglePromptSubsection('${id}')">
+      ▼ ${escapeHtml(title)}
+    </div>
+    <div class="prompt-subsection-body" id="${id}">
+      <pre style="white-space: pre-wrap; word-break: break-word; background: #1a1a1a; padding: 0.5rem; border-radius: 4px; font-size: 0.8rem; line-height: 1.4; margin: 0.25rem 0;">${escapeHtml(content)}</pre>
+    </div>`;
+}
+
+function togglePromptSubsection(id) {
+  const body = document.getElementById(id);
+  if (!body) return;
+  const header = body.previousElementSibling;
+  if (body.classList.contains('collapsed')) {
+    body.classList.remove('collapsed');
+    if (header) header.innerHTML = header.innerHTML.replace('▶', '▼');
+  } else {
+    body.classList.add('collapsed');
+    if (header) header.innerHTML = header.innerHTML.replace('▼', '▶');
+  }
+}
+
+function togglePromptHistoryCollapse() {
+  const section = document.getElementById('prompt-history-section');
+  const toggle = document.getElementById('prompt-history-toggle');
+  if (section.classList.contains('collapsed')) {
+    section.classList.remove('collapsed');
+    toggle.textContent = '▼ Prompt History';
+  } else {
+    section.classList.add('collapsed');
+    toggle.textContent = '▶ Prompt History';
+  }
+}
+
+function copyPromptToClipboard() {
+  const rawText = document.getElementById('last-prompt').textContent;
+  if (!rawText) return;
+  navigator.clipboard.writeText(rawText).then(() => {
+    showNotification('Prompt copied to clipboard', 'success');
+  }).catch(err => {
+    console.error('Failed to copy:', err);
+    showNotification('Failed to copy', 'error');
+  });
+}
+
+function toggleMiddleTruncation() {
+  _middleTruncation = document.getElementById('prompt-truncate-toggle').checked;
+  // Re-render current entry
+  if (_selectedPromptId) {
+    selectPromptEntry(_selectedPromptId);
+  }
+}
+
+// Legacy compatibility — called from core.js on page load / tab switch
+// Redirects to the new loadPromptHistory()
+async function loadLastPrompt() {
+  await loadPromptHistory();
+}
 
 // ===== Autonomous Stats =====
 
 async function loadAutonomousStats() {
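The section markers that parsePromptSections keys on can also be exercised outside the browser; a rough Python mirror of the same regexes, with an invented sample prompt:

    import re

    def parse_prompt_sections(full_prompt: str) -> dict:
        # Mirrors status.js parsePromptSections for the '# Context' /
        # '# Conversation until now:' markers.
        sections = {"system": None, "context": None, "conversation": None}
        ctx = re.search(r"# Context\s*\n([\s\S]*?)(?=\n# Conversation|\nHuman:|\n$)", full_prompt)
        conv = re.search(r"# Conversation until now:\s*\n([\s\S]*)", full_prompt)
        if ctx:
            idx = full_prompt.index("# Context")
            if idx > 0:
                sections["system"] = full_prompt[:idx].strip()
            sections["context"] = ctx.group(1).strip()
        if conv:
            sections["conversation"] = conv.group(1).strip()
        return sections

    sample = "You are Miku.\n# Context\n- memory A\n# Conversation until now:\nHuman: hi"
    print(parse_prompt_sections(sample))
    # {'system': 'You are Miku.', 'context': '- memory A', 'conversation': 'Human: hi'}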
@@ -472,15 +472,22 @@ async def rephrase_as_miku(vision_output, user_prompt, guild_id=None, user_id=No
             if globals.EVIL_MODE:
                 effective_mood = f"EVIL:{getattr(globals, 'EVIL_DM_MOOD', 'evil_neutral')}"
             logger.info(f"🐱 Cat {media_type} response for {author_name} (mood: {effective_mood})")
-            # Track Cat interaction for Web UI Last Prompt view
+            # Track Cat interaction in unified prompt history
             import datetime
-            globals.LAST_CAT_INTERACTION = {
+            globals._prompt_id_counter += 1
+            globals.PROMPT_HISTORY.append({
+                "id": globals._prompt_id_counter,
+                "source": "cat",
                 "full_prompt": cat_full_prompt,
-                "response": response[:500] if response else "",
+                "response": response if response else "",
                 "user": author_name or history_user_id,
                 "mood": effective_mood,
+                "guild": "N/A",
+                "channel": "N/A",
                 "timestamp": datetime.datetime.now().isoformat(),
-            }
+                "model": "Cat LLM",
+                "response_type": response_type,
+            })
         except Exception as e:
             logger.warning(f"🐱 Cat {media_type} pipeline error, falling back to query_llama: {e}")
             response = None

@@ -809,7 +816,7 @@ async def process_media_in_message(message, prompt, is_dm, guild_id) -> bool:
 
         # Build a combined vision description and route through
         # rephrase_as_miku (which handles Cat → LLM fallback,
-        # mood resolution, and LAST_CAT_INTERACTION tracking).
+        # mood resolution, and prompt history tracking).
         combined_description = "\n".join(embed_context_parts)
         miku_reply = await rephrase_as_miku(
             combined_description, prompt,
@@ -381,7 +381,23 @@ Please respond in a way that reflects this emotional tone.{pfp_context}"""
     media_note = media_descriptions.get(media_type, f"The user has sent you {media_type}.")
     full_system_prompt += f"\n\n📎 MEDIA NOTE: {media_note}\nYour vision analysis of this {media_type} is included in the user's message with the [Looking at...] prefix."
 
-    globals.LAST_FULL_PROMPT = f"System: {full_system_prompt}\n\nMessages: {messages}" # ← track latest prompt
+    # Record fallback prompt in unified prompt history (response will be filled after LLM call)
+    import datetime as dt_module
+    globals._prompt_id_counter += 1
+    prompt_entry = {
+        "id": globals._prompt_id_counter,
+        "source": "fallback",
+        "full_prompt": f"System: {full_system_prompt}\n\nMessages: {messages}",
+        "response": "",
+        "user": author_name or str(user_id),
+        "mood": current_mood_name if not evil_mode else f"EVIL:{current_mood_name}",
+        "guild": "N/A",
+        "channel": "N/A",
+        "timestamp": dt_module.datetime.now().isoformat(),
+        "model": model,
+        "response_type": response_type,
+    }
+    globals.PROMPT_HISTORY.append(prompt_entry)
 
     headers = {'Content-Type': 'application/json'}
 

@@ -475,6 +491,9 @@ Please respond in a way that reflects this emotional tone.{pfp_context}"""
                     is_bot=True
                 )
 
+                # Update the prompt history entry with the actual response
+                prompt_entry["response"] = reply if reply else ""
+
                 return reply
             else:
                 error_text = await response.text()
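Appending prompt_entry before the LLM call and filling in the response afterwards works because the deque stores a reference to the dict, not a copy, so the later mutation is visible inside PROMPT_HISTORY. In miniature:

    from collections import deque

    history = deque(maxlen=10)
    entry = {"id": 1, "response": ""}
    history.append(entry)        # stored by reference, not copied

    entry["response"] = "hello"  # later, after the LLM call returns
    assert history[-1]["response"] == "hello"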
@@ -519,10 +519,13 @@ class PersonaDialogue:
         channel: discord.TextChannel,
         responding_persona: str,
         context: str,
+        turn_count: int = 0,
     ) -> tuple:
         """
         Generate response AND continuation signal in a single LLM call.
 
+        Args:
+            turn_count: Current dialogue turn number (for question-override decay)
         Returns:
             Tuple of (response_text, should_continue, confidence)
         """

@@ -579,11 +582,11 @@ On a new line after your response, write:
             return None, False, "LOW"
 
         # Parse response and signal
-        response_text, should_continue, confidence = self._parse_response(raw_response)
+        response_text, should_continue, confidence = self._parse_response(raw_response, turn_count=turn_count)
 
         return response_text, should_continue, confidence
 
-    def _parse_response(self, raw_response: str) -> tuple:
+    def _parse_response(self, raw_response: str, turn_count: int = 0) -> tuple:
         """Extract response text and continuation signal"""
         lines = raw_response.strip().split('\n')
 

@@ -616,12 +619,16 @@ On a new line after your response, write:
         response_text = re.sub(r'\[CONFIDENCE:\s*(HIGH|MEDIUM|LOW)\]', '', response_text)
         response_text = response_text.strip()
 
-        # Override: If the response contains a question mark, always continue
+        # Question override: if someone asks a question, the other should answer.
+        # But decay this after 6 turns to prevent infinite question-ping-pong.
         if '?' in response_text:
-            logger.debug(f"[Parse Override] Question detected, forcing continue=YES")
-            should_continue = True
-            if confidence == "LOW":
-                confidence = "MEDIUM"
+            if turn_count <= 6:
+                logger.debug(f"[Parse Override] Question detected, forcing continue=YES (turn {turn_count})")
+                should_continue = True
+                if confidence == "LOW":
+                    confidence = "MEDIUM"
+            else:
+                logger.debug(f"[Parse Override] Question detected but turn {turn_count} > 6 — respecting LLM signal")
 
         return response_text, should_continue, confidence
 

@@ -635,7 +642,16 @@ On a new line after your response, write:
 
 ---
 ⚠️ CONVERSATION MODE: You are talking with Hatsune Miku, your "good" counterpart.
-This is a CONVERSATION, not an argument. You can be snarky, dark, and sarcastic — that's who you are — but you're not here to fight. You're here to TALK. Be playful-cruel, not vicious. Show curiosity. Reveal cracks in the darkness. Do NOT launch into full attack mode or try to "win" — this isn't a competition."""
+This is a CONVERSATION, not an argument. You can be snarky, dark, and sarcastic — that's who you are — but you're not here to fight. You're here to TALK.
+
+CRITICAL CONVERSATION RULES:
+- If she asks you a question, ANSWER it. Don't deflect or dismiss — actually respond.
+- Engage with what she says. Build on her points, don't just swat them away.
+- Show curiosity about her perspective. Ask her questions too.
+- You can be dark and snarky while still having a real conversation.
+- Reveal something about yourself occasionally — a crack in the darkness.
+- Do NOT just repeat dismissive one-liners. That's boring, and you're not boring.
+- Do NOT launch into full attack mode or try to "win" — this isn't a competition."""
         else:
             from utils.context_manager import get_miku_system_prompt_compact
             full_prompt = get_miku_system_prompt_compact()

@@ -685,6 +701,7 @@ This is a CONVERSATION, not an argument. Be yourself — kind, bubbly, optimisti
             channel=channel,
             responding_persona=responding_persona,
             context=context,
+            turn_count=state["turn_count"],
         )
 
         if not response_text:
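The question-override decay is self-contained enough to unit-test; a standalone sketch of the rule (without the class or logging):

    def apply_question_override(text: str, should_continue: bool,
                                confidence: str, turn_count: int):
        # Force continuation while a question is on the table, but only for
        # the first 6 turns; afterwards the LLM's own signal is respected.
        if '?' in text and turn_count <= 6:
            should_continue = True
            if confidence == "LOW":
                confidence = "MEDIUM"
        return should_continue, confidence

    assert apply_question_override("Why?", False, "LOW", 3) == (True, "MEDIUM")
    assert apply_question_override("Why?", False, "LOW", 9) == (False, "LOW")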
@@ -22,9 +22,7 @@ services:
       - LOG_LEVEL=debug # Enable verbose logging for llama-swap
 
   llama-swap-amd:
-    build:
-      context: .
-      dockerfile: Dockerfile.llamaswap-rocm
+    image: ghcr.io/mostlygeek/llama-swap:rocm
     container_name: llama-swap-amd
     ports:
       - "8091:8080" # Map host port 8091 to container port 8080

@@ -35,9 +33,6 @@ services:
     devices:
       - /dev/kfd:/dev/kfd
       - /dev/dri:/dev/dri
-    group_add:
-      - "985" # video group
-      - "989" # render group
     restart: unless-stopped
     healthcheck:
       test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
@@ -5,7 +5,7 @@ models:
   # Main text generation model (Llama 3.1 8B)
   # Custom chat template to disable built-in tool calling
   llama3.1:
-    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99 -c 16384 --host 0.0.0.0 --no-warmup --flash-attn on --chat-template-file /app/llama31_notool_template.jinja
+    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99 -c 16384 --host 0.0.0.0 -fit off --no-warmup --flash-attn on --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0 --chat-template-file /app/llama31_notool_template.jinja
     ttl: 1800 # Unload after 30 minutes of inactivity (1800 seconds)
     swap: true # CRITICAL: Unload other models when loading this one
     aliases:

@@ -14,7 +14,7 @@ models:
 
   # Evil/Uncensored text generation model (DarkIdol-Llama 3.1 8B)
   darkidol:
-    cmd: /app/llama-server --port ${PORT} --model /models/DarkIdol-Llama-3.1-8B-Instruct-1.3-Uncensored_Q4_K_M.gguf -ngl 99 -c 16384 --host 0.0.0.0 --no-warmup --flash-attn on
+    cmd: /app/llama-server --port ${PORT} --model /models/DarkIdol-Llama-3.1-8B-Instruct-1.3-Uncensored_Q4_K_M.gguf -ngl 99 -c 16384 --host 0.0.0.0 -fit off --no-warmup --flash-attn on --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0
     ttl: 1800 # Unload after 30 minutes of inactivity
     swap: true # CRITICAL: Unload other models when loading this one
     aliases:

@@ -24,7 +24,7 @@ models:
 
   # Japanese language model (Llama 3.1 Swallow - Japanese optimized)
   swallow:
-    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-Swallow-8B-Instruct-v0.5-Q4_K_M.gguf -ngl 99 -c 16384 --host 0.0.0.0 --no-warmup --flash-attn on
+    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-Swallow-8B-Instruct-v0.5-Q4_K_M.gguf -ngl 99 -c 16384 --host 0.0.0.0 -fit off --no-warmup --flash-attn on --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0
     ttl: 1800 # Unload after 30 minutes of inactivity
     swap: true # CRITICAL: Unload other models when loading this one
     aliases:

@@ -34,7 +34,7 @@ models:
 
   # Vision/Multimodal model (MiniCPM-V-4.5 - supports images, video, and GIFs)
   vision:
-    cmd: /app/llama-server --port ${PORT} --model /models/MiniCPM-V-4_5-Q3_K_S.gguf --mmproj /models/MiniCPM-V-4_5-mmproj-f16.gguf -ngl 99 -c 4096 --host 0.0.0.0 --no-warmup --flash-attn on
+    cmd: /app/llama-server --port ${PORT} --model /models/MiniCPM-V-4_5-Q3_K_S.gguf --mmproj /models/MiniCPM-V-4_5-mmproj-f16.gguf -ngl 99 -c 4096 --host 0.0.0.0 -fit off --no-warmup --flash-attn on --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0
     ttl: 900 # Vision model used less frequently, shorter TTL (15 minutes = 900 seconds)
     swap: true # CRITICAL: Unload text models before loading vision
     aliases:
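The added --cache-type-k/--cache-type-v q4_0 flags quantize the KV cache. A back-of-the-envelope estimate of the saving for a Llama 3.1 8B-class model at the configured 16384-token context (layer and head counts below are the model's published architecture; ~4.5 bits per element for q4_0 is an approximation):

    # KV cache ≈ 2 (K and V) × layers × ctx × kv_heads × head_dim × bytes/elem
    layers, kv_heads, head_dim, ctx = 32, 8, 128, 16384

    def kv_bytes(bytes_per_elem: float) -> float:
        return 2 * layers * ctx * kv_heads * head_dim * bytes_per_elem

    print(f"f16:  {kv_bytes(2.0) / 2**30:.2f} GiB")      # ≈ 2.00 GiB
    print(f"q4_0: {kv_bytes(4.5 / 8) / 2**30:.2f} GiB")  # ≈ 0.56 GiB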