llama-swap: use pre-built images (:cuda, :rocm) with GPU-specific flags

- Drop custom Dockerfiles; docker-compose now uses the pre-built ghcr.io images, which ship llama-swap + llama-server with no pinned versions (always latest)
- NVIDIA GTX 1660 (6GB): add -fit off --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0 to fix an OOM segfault caused by the GPU-side KV cache default in the newer llama.cpp b9014
- AMD RX 6800 (16GB): flags unchanged; the KV cache stays on the GPU for maximum speed
- Both are running llama-swap v211 + llama.cpp b9014 (2026-05-05)
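A rough size estimate shows why the q4_0 cache types matter on the 6GB card. The sketch below uses the usual per-token KV formula (one K and one V tensor per layer) and ggml's q4_0 block layout of 32 values stored in 18 bytes (a 2-byte fp16 scale plus 16 bytes of packed 4-bit values). The model shape is a hypothetical Llama-3-8B-style configuration chosen for illustration, not one of the models configured in this commit.

```python
# Rough KV-cache size estimate. The model shape used below is a hypothetical
# Llama-3-8B-style configuration, not taken from this repository.

F16_BYTES_PER_ELEM = 2.0
# ggml q4_0: blocks of 32 values = one fp16 scale (2 bytes) + 16 bytes of
# packed 4-bit values, i.e. 18 bytes per 32 elements.
Q4_0_BYTES_PER_ELEM = 18 / 32


def kv_cache_bytes(n_layers: int, n_ctx: int, n_kv_heads: int,
                   head_dim: int, bytes_per_elem: float) -> int:
    """Bytes needed to hold K and V for n_ctx tokens across all layers."""
    elements = 2 * n_layers * n_ctx * n_kv_heads * head_dim  # 2 = K and V
    return int(elements * bytes_per_elem)


if __name__ == "__main__":
    shape = dict(n_layers=32, n_ctx=8192, n_kv_heads=8, head_dim=128)
    f16 = kv_cache_bytes(**shape, bytes_per_elem=F16_BYTES_PER_ELEM)
    q4 = kv_cache_bytes(**shape, bytes_per_elem=Q4_0_BYTES_PER_ELEM)
    print(f"f16 KV cache:  {f16 / 2**20:.0f} MiB")  # 1024 MiB
    print(f"q4_0 KV cache: {q4 / 2**20:.0f} MiB")   # 288 MiB
```

With --no-kv-offload the cache is held in system RAM rather than VRAM; the estimate above only shows how much the q4_0 types shrink the cache itself (roughly 3.6x versus f16).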
fix: preserve collapsible subsection state across polling re-renders
2026-05-05 16:53:34 +03:00 · 2026-05-02 16:17:26 +03:00 · 2026-05-02 16:08:47 +03:00 · 2026-05-02 15:55:19 +03:00 · 2026-05-02 15:45:54 +03:00 · 2026-05-02 15:27:18 +03:00
13 changed files with 476 additions and 166 deletions
--- a/Dockerfile.llamaswap
+++ b/Dockerfile.llamaswap
@@ -1,13 +0,0 @@
-FROM ghcr.io/mostlygeek/llama-swap:cuda
-
-USER root
-
-# Download and install llama-server binary (CUDA version)
-# Using the official pre-built binary from llama.cpp releases
-ADD --chmod=755 https://github.com/ggml-org/llama.cpp/releases/download/b4183/llama-server-cuda /usr/local/bin/llama-server
-
-# Verify it's executable
-RUN llama-server --version || echo "llama-server installed successfully"
-
-USER 1000:1000
-
--- a/Dockerfile.llamaswap-rocm
+++ b/Dockerfile.llamaswap-rocm
@@ -1,68 +0,0 @@
-# Multi-stage build for llama-swap with ROCm support
-# Now using official llama.cpp ROCm image (PR #18439 merged Dec 29, 2025)
-
-# Stage 1: Build llama-swap UI
-FROM node:22-alpine AS ui-builder
-
-WORKDIR /build
-
-# Install git
-RUN apk add --no-cache git
-
-# Clone llama-swap
-RUN git clone https://github.com/mostlygeek/llama-swap.git
-
-# Build UI (now in ui-svelte directory)
-WORKDIR /build/llama-swap/ui-svelte
-RUN npm install && npm run build
-
-# Stage 2: Build llama-swap binary
-FROM golang:1.23-alpine AS swap-builder
-
-WORKDIR /build
-
-# Install git
-RUN apk add --no-cache git
-
-# Copy llama-swap source with built UI
-COPY --from=ui-builder /build/llama-swap /build/llama-swap
-
-# Build llama-swap binary
-WORKDIR /build/llama-swap
-RUN GOTOOLCHAIN=auto go build -o /build/llama-swap-binary .
-
-# Stage 3: Final runtime image using official llama.cpp ROCm image
-FROM ghcr.io/ggml-org/llama.cpp:server-rocm
-
-WORKDIR /app
-
-# Copy llama-swap binary from builder
-COPY --from=swap-builder /build/llama-swap-binary /app/llama-swap
-
-    # Make binaries executable
-    RUN chmod +x /app/llama-swap
-    
-    # Add existing ubuntu user (UID 1000) to GPU access groups (using host GIDs)
-    # GID 187 = render group on host, GID 989 = video/kfd group on host
-    RUN groupadd -g 187 hostrender && \
-        groupadd -g 989 hostvideo && \
-        usermod -aG hostrender,hostvideo ubuntu && \
-        chown -R ubuntu:ubuntu /app
-    
-    # Set environment for ROCm (RX 6800 is gfx1030)
-    ENV HSA_OVERRIDE_GFX_VERSION=10.3.0
-    ENV ROCM_PATH=/opt/rocm
-    ENV HIP_VISIBLE_DEVICES=0
-    
-    USER ubuntu
-    
-    # Expose port
-    EXPOSE 8080
-    
-    # Health check
-    HEALTHCHECK --interval=30s --timeout=10s --start-period=30s --retries=3 \
-      CMD curl -f http://localhost:8080/health || exit 1
-    
-    # Override the base image's ENTRYPOINT and run llama-swap
-    ENTRYPOINT []
-    CMD ["/app/llama-swap", "-config", "/app/config.yaml", "-listen", "0.0.0.0:8080"]
--- a/bot/bot.py
+++ b/bot/bot.py
@@ -360,15 +360,24 @@ async def on_message(message):
                        if globals.EVIL_MODE:
                            effective_mood = f"EVIL:{getattr(globals, 'EVIL_DM_MOOD', 'evil_neutral')}"
                        logger.info(f"🐱 Cat response for {author_name} (mood: {effective_mood})")
-                        # Track Cat interaction for Web UI Last Prompt view
+                        # Track Cat interaction in unified prompt history
                        import datetime
-                        globals.LAST_CAT_INTERACTION = {
+                        globals._prompt_id_counter += 1
+                        guild_name = message.guild.name if message.guild else "DM"
+                        channel_name = message.channel.name if message.guild else "DM"
+                        globals.PROMPT_HISTORY.append({
+                            "id": globals._prompt_id_counter,
+                            "source": "cat",
                            "full_prompt": cat_full_prompt,
-                            "response": response[:500] if response else "",
+                            "response": response if response else "",
                            "user": author_name,
                            "mood": effective_mood,
+                            "guild": guild_name,
+                            "channel": channel_name,
                            "timestamp": datetime.datetime.now().isoformat(),
-                        }
+                            "model": "Cat LLM",
+                            "response_type": response_type,
+                        })
                except Exception as e:
                    logger.warning(f"🐱 Cat pipeline error, falling back to query_llama: {e}")
                    response = None
--- a/bot/globals.py
+++ b/bot/globals.py
@@ -1,6 +1,7 @@
 # globals.py
 import os
 import discord
+from collections import deque
 from apscheduler.schedulers.asyncio import AsyncIOScheduler

 scheduler = AsyncIOScheduler()
@@ -77,16 +78,25 @@ MIKU_NORMAL_AVATAR_URL = None  # Cached CDN URL of the regular Miku pfp (valid e

 BOT_USER = None

-LAST_FULL_PROMPT = ""
+# Unified prompt history (replaces LAST_FULL_PROMPT and LAST_CAT_INTERACTION)
+# Each entry: {id, source, full_prompt, response, user, mood, guild, channel,
+#              timestamp, model, response_type}
+PROMPT_HISTORY = deque(maxlen=10)
+_prompt_id_counter = 0

-# Cheshire Cat last interaction tracking (for Web UI Last Prompt toggle)
-LAST_CAT_INTERACTION = {
-    "full_prompt": "",
-    "response": "",
-    "user": "",
-    "mood": "",
-    "timestamp": "",
-}
+# Legacy accessors for backward compatibility (routes, CLI, etc.)
+# Plain helper functions (not properties) that scan PROMPT_HISTORY newest-first
+def _get_last_fallback_prompt():
+    for entry in reversed(PROMPT_HISTORY):
+        if entry.get("source") == "fallback":
+            return entry.get("full_prompt", "")
+    return ""
+
+def _get_last_cat_interaction():
+    for entry in reversed(PROMPT_HISTORY):
+        if entry.get("source") == "cat":
+            return entry
+    return {"full_prompt": "", "response": "", "user": "", "mood": "", "timestamp": ""}

 # Persona Dialogue System (conversations between Miku and Evil Miku)
 LAST_PERSONA_DIALOGUE_TIME = 0  # Timestamp of last dialogue for cooldown
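The unified history relies on two properties of collections.deque with maxlen: appending past the limit silently drops the oldest entry, and iteration runs oldest-to-newest, which is why the legacy accessors walk reversed(PROMPT_HISTORY). A minimal standalone sketch of that behavior follows; `add_entry` and `last_by_source` are hypothetical names, and itertools.count is swapped in for the module's `_prompt_id_counter += 1` pattern.

```python
from collections import deque
from itertools import count

# Standalone sketch mirroring globals.PROMPT_HISTORY; names are illustrative.
PROMPT_HISTORY: deque = deque(maxlen=10)
_ids = count(1)  # monotonically increasing IDs, never reused after eviction


def add_entry(source: str, full_prompt: str) -> dict:
    entry = {"id": next(_ids), "source": source, "full_prompt": full_prompt}
    PROMPT_HISTORY.append(entry)  # drops the oldest entry once 10 are stored
    return entry


def last_by_source(source: str):
    # Newest-first scan, like _get_last_cat_interaction()
    for entry in reversed(PROMPT_HISTORY):
        if entry.get("source") == source:
            return entry
    return None


if __name__ == "__main__":
    add_entry("cat", "first cat prompt")
    for i in range(11):
        add_entry("fallback", f"fallback prompt {i}")
    print([e["id"] for e in PROMPT_HISTORY])  # [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
    print(last_by_source("cat"))  # None: the only Cat entry was evicted
```

One consequence of this design: a legacy endpoint such as /prompt/cat returns its empty default as soon as ten newer fallback prompts push the last Cat entry out of the buffer.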
--- a/bot/routes/core.py
+++ b/bot/routes/core.py
@@ -14,7 +14,8 @@ router = APIRouter()

@router.get("/")
 def read_index():
-    return FileResponse("static/index.html")
+    headers = {"Cache-Control": "no-cache, no-store, must-revalidate"}
+    return FileResponse("static/index.html", headers=headers)


@router.get("/logs")
@@ -31,18 +32,45 @@ def get_logs():

@router.get("/prompt")
 def get_last_prompt():
-    return {"prompt": globals.LAST_FULL_PROMPT or "No prompt has been issued yet."}
+    """Legacy endpoint: returns the most recent fallback prompt (backward compat)."""
+    prompt_text = globals._get_last_fallback_prompt()
+    return {"prompt": prompt_text or "No prompt has been issued yet."}


@router.get("/prompt/cat")
 def get_last_cat_prompt():
-    """Get the last Cheshire Cat interaction (full prompt + response) for Web UI."""
-    interaction = globals.LAST_CAT_INTERACTION
+    """Legacy endpoint: returns the most recent Cat interaction (backward compat)."""
+    interaction = globals._get_last_cat_interaction()
    if not interaction.get("full_prompt"):
-        return {"full_prompt": "No Cheshire Cat interaction has occurred yet.", "response": "", "user": "", "mood": "", "timestamp": ""}
+        return {"full_prompt": "No Cheshire Cat interaction has occurred yet.",
+                "response": "", "user": "", "mood": "", "timestamp": ""}
    return interaction


+@router.get("/prompts")
+def get_prompt_history(source: str = None):
+    """
+    Return the unified prompt history.
+    Optional query param ?source=cat or ?source=fallback to filter.
+    """
+    history = list(globals.PROMPT_HISTORY)
+    if source and source in ("cat", "fallback"):
+        history = [e for e in history if e.get("source") == source]
+    return {"history": history}
+
+
+@router.get("/prompts/{prompt_id}")
+def get_prompt_by_id(prompt_id: int):
+    """Return a single prompt history entry by ID."""
+    for entry in globals.PROMPT_HISTORY:
+        if entry.get("id") == prompt_id:
+            return entry
+    return JSONResponse(
+        status_code=404,
+        content={"status": "error", "message": f"Prompt #{prompt_id} not found"}
+    )
+
+
@router.get("/status")
 def status():
    # Get per-server mood summary
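Because /prompts returns list(PROMPT_HISTORY) unchanged, clients receive entries oldest-first and the newest entry is the last element. The filter and lookup logic of the two new routes can be exercised without FastAPI; the sketch below restates it as plain functions. `filter_history` and `find_prompt` are hypothetical names, and returning None stands in for the route's 404 JSONResponse.

```python
from typing import Iterable, Optional

VALID_SOURCES = ("cat", "fallback")


def filter_history(history: Iterable[dict], source: Optional[str] = None) -> list:
    """Mirror of GET /prompts: a missing or unknown source returns everything."""
    entries = list(history)  # oldest-first, same order as the deque
    if source in VALID_SOURCES:
        entries = [e for e in entries if e.get("source") == source]
    return entries


def find_prompt(history: Iterable[dict], prompt_id: int) -> Optional[dict]:
    """Mirror of GET /prompts/{prompt_id}; None is where the route sends a 404."""
    return next((e for e in history if e.get("id") == prompt_id), None)


if __name__ == "__main__":
    history = [
        {"id": 1, "source": "cat"},
        {"id": 2, "source": "fallback"},
        {"id": 3, "source": "cat"},
    ]
    print([e["id"] for e in filter_history(history, "cat")])    # [1, 3]
    print([e["id"] for e in filter_history(history, "bogus")])  # [1, 2, 3]
    print(filter_history(history)[-1]["id"])                   # 3 (newest)
    print(find_prompt(history, 42))                            # None
```

As written, the route treats an unrecognized ?source= value as "no filter" rather than rejecting it with a 400.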
--- a/bot/static/css/style.css
+++ b/bot/static/css/style.css
@@ -441,6 +441,51 @@ h1, h3 {
  color: #ddd;
 }

+/* Prompt History Section */
+#prompt-history-section.collapsed #prompt-history-body {
+  display: none;
+}
+#prompt-history-toggle {
+  user-select: none;
+  transition: color 0.2s;
+}
+#prompt-history-toggle:hover {
+  color: #4CAF50;
+}
+#prompt-metadata span {
+  white-space: nowrap;
+}
+#prompt-metadata .prompt-meta-label {
+  color: #666;
+}
+#prompt-metadata .prompt-meta-value {
+  color: #ccc;
+}
+#prompt-display pre {
+  margin: 0;
+}
+.prompt-subsection-header {
+  cursor: pointer;
+  user-select: none;
+  padding: 0.3rem 0.5rem;
+  border-radius: 4px;
+  background: #2a2a2a;
+  margin: 0.5rem 0 0.25rem 0;
+  font-size: 0.82rem;
+  color: #aaa;
+  transition: background 0.15s;
+}
+.prompt-subsection-header:hover {
+  background: #333;
+  color: #ddd;
+}
+.prompt-subsection-body.collapsed {
+  display: none;
+}
+#prompt-truncate-toggle {
+  accent-color: #4CAF50;
+}
+
 /* Mood Activities Editor */
 .act-mood-row {
  margin-bottom: 0.5rem;
--- a/bot/static/index.html
+++ b/bot/static/index.html
@@ -3,10 +3,13 @@
 <head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
+  <meta http-equiv="Cache-Control" content="no-cache, no-store, must-revalidate">
+  <meta http-equiv="Pragma" content="no-cache">
+  <meta http-equiv="Expires" content="0">
  <title>Miku Control Panel</title>
  <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/cropperjs/1.6.2/cropper.min.css">
  <script src="https://cdnjs.cloudflare.com/ajax/libs/cropperjs/1.6.2/cropper.min.js"></script>
-  <link rel="stylesheet" href="/static/css/style.css">
+  <link rel="stylesheet" href="/static/css/style.css?v=20260502">
 </head>
 <body>

@@ -543,23 +546,53 @@
        </div>
      </div>

-      <div class="section">
-        <h3>Last Prompt</h3>
-        <div style="margin-bottom: 0.75rem; display: flex; align-items: center; gap: 0.75rem;">
-          <label style="font-size: 0.9rem; color: #aaa;">Source:</label>
-          <div style="display: inline-flex; border-radius: 6px; overflow: hidden; border: 1px solid #444;">
-            <button id="prompt-src-cat" class="prompt-source-btn active" onclick="switchPromptSource('cat')"
-              style="padding: 0.4rem 1rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
-              🐱 Cheshire Cat
-            </button>
-            <button id="prompt-src-fallback" class="prompt-source-btn" onclick="switchPromptSource('fallback')"
-              style="padding: 0.4rem 1rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
-              🤖 Bot Fallback
-            </button>
-          </div>
+      <div class="section" id="prompt-history-section">
+        <div class="prompt-history-header" style="display: flex; align-items: center; justify-content: space-between; margin-bottom: 0.5rem;">
+          <h3 style="margin: 0; cursor: pointer;" onclick="togglePromptHistoryCollapse()" id="prompt-history-toggle">
+            ▼ Prompt History
+          </h3>
+          <button onclick="loadPromptHistory()" title="Refresh" style="background: none; border: 1px solid #444; color: #aaa; cursor: pointer; padding: 0.2rem 0.5rem; border-radius: 4px; font-size: 0.85rem;">🔄</button>
+        </div>
+        <div id="prompt-history-body">
+          <!-- Source filter + history selector row -->
+          <div style="margin-bottom: 0.75rem; display: flex; align-items: center; gap: 0.75rem; flex-wrap: wrap;">
+            <label style="font-size: 0.9rem; color: #aaa;">Source:</label>
+            <div style="display: inline-flex; border-radius: 6px; overflow: hidden; border: 1px solid #444;">
+              <button id="prompt-src-all" class="prompt-source-btn active" onclick="switchPromptSource('all')"
+                style="padding: 0.4rem 0.8rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
+                All
+              </button>
+              <button id="prompt-src-cat" class="prompt-source-btn" onclick="switchPromptSource('cat')"
+                style="padding: 0.4rem 0.8rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
+                🐱 Cat
+              </button>
+              <button id="prompt-src-fallback" class="prompt-source-btn" onclick="switchPromptSource('fallback')"
+                style="padding: 0.4rem 0.8rem; border: none; cursor: pointer; font-size: 0.85rem; transition: all 0.2s;">
+                🤖 Fallback
+              </button>
+            </div>
+            <select id="prompt-history-select" onchange="selectPromptEntry(this.value)" style="background: #2a2a2a; color: #ddd; border: 1px solid #444; padding: 0.35rem 0.5rem; border-radius: 4px; font-size: 0.85rem; min-width: 280px;">
+              <option value="">-- No prompts yet --</option>
+            </select>
+          </div>
+
+          <!-- Metadata bar -->
+          <div id="prompt-metadata" style="margin-bottom: 0.5rem; font-size: 0.82rem; color: #888; display: flex; flex-wrap: wrap; gap: 0.3rem 1rem;"></div>
+
+          <!-- Toolbar: copy + truncate toggle -->
+          <div style="margin-bottom: 0.5rem; display: flex; align-items: center; gap: 1rem;">
+            <button onclick="copyPromptToClipboard()" title="Copy full prompt to clipboard" style="background: #333; border: 1px solid #555; color: #aaa; cursor: pointer; padding: 0.25rem 0.6rem; border-radius: 4px; font-size: 0.8rem;">📋 Copy</button>
+            <label style="font-size: 0.82rem; color: #aaa; cursor: pointer; display: flex; align-items: center; gap: 0.3rem;">
+              <input type="checkbox" id="prompt-truncate-toggle" onchange="toggleMiddleTruncation()">
+              Truncate from middle
+            </label>
+          </div>
+
+          <!-- Prompt display subsections -->
+          <div id="prompt-display" style="max-height: 60vh; overflow-y: auto; min-height: 3rem;"></div>
+          <!-- Hidden buffer for copy-to-clipboard raw text -->
+          <pre id="last-prompt" style="display: none;"></pre>
        </div>
-        <div id="prompt-cat-info" style="margin-bottom: 0.5rem; font-size: 0.85rem; color: #aaa;"></div>
-        <pre id="last-prompt" style="white-space: pre-wrap; word-break: break-word;"></pre>
      </div>
    </div>

@@ -1339,15 +1372,15 @@
  </div>
 </div>

-<script src="/static/js/core.js"></script>
-<script src="/static/js/servers.js"></script>
-<script src="/static/js/modes.js"></script>
-<script src="/static/js/actions.js"></script>
-<script src="/static/js/image-gen.js"></script>
-<script src="/static/js/status.js"></script>
-<script src="/static/js/dm.js"></script>
-<script src="/static/js/chat.js"></script>
-<script src="/static/js/memories.js"></script>
-<script src="/static/js/profile.js"></script>
+<script src="/static/js/core.js?v=20260502"></script>
+<script src="/static/js/servers.js?v=20260502"></script>
+<script src="/static/js/modes.js?v=20260502"></script>
+<script src="/static/js/actions.js?v=20260502"></script>
+<script src="/static/js/image-gen.js?v=20260502"></script>
+<script src="/static/js/status.js?v=20260502"></script>
+<script src="/static/js/dm.js?v=20260502"></script>
+<script src="/static/js/chat.js?v=20260502"></script>
+<script src="/static/js/memories.js?v=20260502"></script>
+<script src="/static/js/profile.js?v=20260502"></script>
 </body>
 </html>
--- a/bot/static/js/core.js
+++ b/bot/static/js/core.js
@@ -29,6 +29,7 @@ let notificationTimer = null;
 let statusInterval = null;
 let logsInterval = null;
 let argsInterval = null;
+let promptInterval = null;

 // Mood emoji mapping
 const MOOD_EMOJIS = {
@@ -211,12 +212,14 @@ function startPolling() {
  if (!statusInterval) statusInterval = setInterval(loadStatus, 10000);
  if (!logsInterval) logsInterval = setInterval(loadLogs, 5000);
  if (!argsInterval) argsInterval = setInterval(loadActiveArguments, 5000);
+  if (!promptInterval) promptInterval = setInterval(loadPromptHistory, 10000);
 }

 function stopPolling() {
  clearInterval(statusInterval); statusInterval = null;
  clearInterval(logsInterval); logsInterval = null;
  clearInterval(argsInterval); argsInterval = null;
+  clearInterval(promptInterval); promptInterval = null;
 }

 // ============================================================================
@@ -248,7 +251,7 @@ function initVisibilityPolling() {
      stopPolling();
      console.log('⏸ Tab hidden — polling paused');
    } else {
-      loadStatus(); loadLogs(); loadActiveArguments();
+      loadStatus(); loadLogs(); loadActiveArguments(); loadPromptHistory();
      startPolling();
      console.log('▶️ Tab visible — polling resumed');
    }
@@ -296,9 +299,11 @@ function initModalAccessibility() {
 }

 function initPromptSourceToggle() {
-  const saved = localStorage.getItem('miku-prompt-source') || 'cat';
+  const saved = localStorage.getItem('miku-prompt-source') || 'all';
  document.querySelectorAll('.prompt-source-btn').forEach(btn => btn.classList.remove('active'));
-  document.getElementById(`prompt-src-${saved}`).classList.add('active');
+  const btnId = saved === 'all' ? 'prompt-src-all' : `prompt-src-${saved}`;
+  const btn = document.getElementById(btnId);
+  if (btn) btn.classList.add('active');
 }

 function initLogsScrollDetection() {
@@ -360,8 +365,10 @@ async function loadLogs() {
 function switchPromptSource(source) {
  localStorage.setItem('miku-prompt-source', source);
  document.querySelectorAll('.prompt-source-btn').forEach(btn => btn.classList.remove('active'));
-  document.getElementById(`prompt-src-${source}`).classList.add('active');
-  loadLastPrompt();
+  const btnId = source === 'all' ? 'prompt-src-all' : `prompt-src-${source}`;
+  const btn = document.getElementById(btnId);
+  if (btn) btn.classList.add('active');
+  loadPromptHistory();
 }

 // ============================================================================
--- a/bot/static/js/status.js
+++ b/bot/static/js/status.js
@@ -57,33 +57,271 @@ async function loadStatus() {
  }
 }

-// ===== Last Prompt =====
+// ===== Prompt History =====

-async function loadLastPrompt() {
-  const source = localStorage.getItem('miku-prompt-source') || 'cat';
-  const promptEl = document.getElementById('last-prompt');
-  const infoEl = document.getElementById('prompt-cat-info');
+let _promptHistoryCache = [];   // cached history entries from last fetch
+let _selectedPromptId = null;  // currently selected entry ID
+let _middleTruncation = false; // whether middle-truncation is active
+
+async function loadPromptHistory() {
+  const source = localStorage.getItem('miku-prompt-source') || 'all';
+  const selectEl = document.getElementById('prompt-history-select');

  try {
-    if (source === 'cat') {
-      const result = await apiCall('/prompt/cat');
-      if (result.timestamp) {
-        infoEl.innerHTML = `<strong>User:</strong> ${escapeHtml(result.user || '?')} &nbsp;|&nbsp; <strong>Mood:</strong> ${escapeHtml(result.mood || '?')} &nbsp;|&nbsp; <strong>Time:</strong> ${new Date(result.timestamp).toLocaleString()}`;
-        promptEl.textContent = result.full_prompt + `\n\n${'═'.repeat(60)}\n[Cat Response]\n${result.response}`;
-      } else {
-        infoEl.textContent = '';
-        promptEl.textContent = result.full_prompt || 'No Cheshire Cat interaction yet.';
-      }
+    const url = source === 'all' ? '/prompts' : `/prompts?source=${source}`;
+    const result = await apiCall(url);
+    _promptHistoryCache = result.history || [];
+
+    // Populate dropdown
+    const currentValue = selectEl.value;
+    selectEl.innerHTML = '';
+    if (_promptHistoryCache.length === 0) {
+      selectEl.innerHTML = '<option value="">-- No prompts yet --</option>';
    } else {
-      infoEl.textContent = '';
-      const result = await apiCall('/prompt');
-      promptEl.textContent = result.prompt;
+      _promptHistoryCache.forEach(entry => {
+        const ts = entry.timestamp ? new Date(entry.timestamp).toLocaleTimeString() : '?';
+        const srcLabel = entry.source === 'cat' ? '🐱' : '🤖';
+        const user = entry.user || '?';
+        const option = document.createElement('option');
+        option.value = entry.id;
+        option.textContent = `${srcLabel} #${entry.id} — ${user} — ${ts}`;
+        selectEl.appendChild(option);
+      });
+    }
+
+    // Restore or auto-select the latest entry
+    if (_selectedPromptId && _promptHistoryCache.some(e => e.id === _selectedPromptId)) {
+      selectEl.value = _selectedPromptId;
+    } else if (_promptHistoryCache.length > 0) {
+      selectEl.value = _promptHistoryCache[_promptHistoryCache.length - 1].id;
+    }
+
+    if (selectEl.value) {
+      await selectPromptEntry(selectEl.value);
+    } else {
+      clearPromptDisplay();
    }
  } catch (error) {
-    console.error('Failed to load last prompt:', error);
+    console.error('Failed to load prompt history:', error);
  }
 }

+async function selectPromptEntry(promptId) {
+  if (!promptId) {
+    clearPromptDisplay();
+    return;
+  }
+
+  _selectedPromptId = parseInt(promptId, 10);
+
+  // Try cache first
+  let entry = _promptHistoryCache.find(e => e.id === _selectedPromptId);
+
+  // Fall back to API call if not in cache
+  if (!entry) {
+    try {
+      entry = await apiCall(`/prompts/${_selectedPromptId}`);
+    } catch (error) {
+      console.error('Failed to load prompt entry:', error);
+      clearPromptDisplay();
+      return;
+    }
+  }
+
+  if (!entry) {
+    clearPromptDisplay();
+    return;
+  }
+
+  renderPromptEntry(entry);
+}
+
+function clearPromptDisplay() {
+  document.getElementById('prompt-metadata').innerHTML = '';
+  document.getElementById('prompt-display').innerHTML = '<pre style="white-space: pre-wrap; word-break: break-word; background: #1a1a1a; padding: 0.75rem; border-radius: 4px; font-size: 0.8rem; line-height: 1.4; margin: 0; color: #666;">No prompt selected.</pre>';
+  document.getElementById('last-prompt').textContent = '';
+}
+
+function renderPromptEntry(entry) {
+  // Metadata bar
+  const metaEl = document.getElementById('prompt-metadata');
+  const ts = entry.timestamp ? new Date(entry.timestamp).toLocaleString() : '?';
+  const sourceIcon = entry.source === 'cat' ? '🐱 Cat' : '🤖 Fallback';
+  metaEl.innerHTML = `
+    <span><span class="prompt-meta-label">#</span><span class="prompt-meta-value">${entry.id}</span></span>
+    <span><span class="prompt-meta-label">Source:</span> <span class="prompt-meta-value">${sourceIcon}</span></span>
+    <span><span class="prompt-meta-label">User:</span> <span class="prompt-meta-value">${escapeHtml(entry.user || '?')}</span></span>
+    <span><span class="prompt-meta-label">Mood:</span> <span class="prompt-meta-value">${escapeHtml(entry.mood || '?')}</span></span>
+    <span><span class="prompt-meta-label">Guild:</span> <span class="prompt-meta-value">${escapeHtml(entry.guild || '?')}</span></span>
+    <span><span class="prompt-meta-label">Channel:</span> <span class="prompt-meta-value">${escapeHtml(entry.channel || '?')}</span></span>
+    <span><span class="prompt-meta-label">Model:</span> <span class="prompt-meta-value">${escapeHtml(entry.model || '?')}</span></span>
+    <span><span class="prompt-meta-label">Type:</span> <span class="prompt-meta-value">${escapeHtml(entry.response_type || '?')}</span></span>
+    <span><span class="prompt-meta-label">Time:</span> <span class="prompt-meta-value">${ts}</span></span>
+  `;
+
+  // Parse full_prompt into sections
+  const sections = parsePromptSections(entry.full_prompt || '');
+
+  // Snapshot which subsections are currently collapsed (before re-render)
+  const sectionIds = ['system', 'context', 'conversation', 'response'];
+  const collapsedState = {};
+  sectionIds.forEach(id => {
+    const el = document.getElementById(`prompt-section-${id}`);
+    collapsedState[id] = el && el.classList.contains('collapsed');
+  });
+
+  // Build display HTML with collapsible subsections
+  let displayHtml = '';
+
+  if (sections.system) {
+    displayHtml += buildCollapsibleSection('System Prompt', sections.system, 'system');
+  }
+  if (sections.context) {
+    displayHtml += buildCollapsibleSection('Context (Memories & Tools)', sections.context, 'context');
+  }
+  if (sections.conversation) {
+    displayHtml += buildCollapsibleSection('Conversation', sections.conversation, 'conversation');
+  }
+  if (!sections.system && !sections.context && !sections.conversation) {
+    // Fallback: show raw full_prompt
+    displayHtml += `<pre style="white-space: pre-wrap; word-break: break-word; margin: 0;">${escapeHtml(entry.full_prompt || '')}</pre>`;
+  }
+
+  // Response section
+  if (entry.response) {
+    let responseText = entry.response;
+    if (_middleTruncation && responseText.length > 400) {
+      responseText = responseText.substring(0, 200) + '\n\n... [truncated middle] ...\n\n' + responseText.substring(responseText.length - 200);
+    }
+    displayHtml += buildCollapsibleSection('Response', responseText, 'response');
+  }
+
+  // Render into the prompt-display div (using innerHTML for collapsible structure)
+  const displayEl = document.getElementById('prompt-display');
+  displayEl.innerHTML = displayHtml;
+
+  // Restore collapsed state from snapshot
+  sectionIds.forEach(id => {
+    const el = document.getElementById(`prompt-section-${id}`);
+    if (el && collapsedState[id]) {
+      el.classList.add('collapsed');
+      const header = el.previousElementSibling;
+      if (header) header.innerHTML = header.innerHTML.replace('▼', '▶');
+    }
+  });
+
+  // Also set the raw text into the <pre> for copy functionality
+  let rawText = entry.full_prompt || '';
+  if (entry.response) {
+    rawText += `\n\n${'═'.repeat(60)}\n[Response]\n${entry.response}`;
+  }
+  document.getElementById('last-prompt').textContent = rawText;
+}
+
+function parsePromptSections(fullPrompt) {
+  const sections = { system: null, context: null, conversation: null };
+
+  if (!fullPrompt) return sections;
+
+  // Try to split on known section markers
+  const contextMatch = fullPrompt.match(/# Context\s*\n([\s\S]*?)(?=\n# Conversation|\nHuman:|$)/);
+  const convMatch = fullPrompt.match(/# Conversation until now:\s*\n([\s\S]*)/);
+
+  if (contextMatch) {
+    // Everything before # Context is the system prompt
+    const contextIdx = fullPrompt.indexOf('# Context');
+    if (contextIdx > 0) {
+      sections.system = fullPrompt.substring(0, contextIdx).trim();
+    }
+    sections.context = contextMatch[1].trim();
+  }
+
+  if (convMatch) {
+    sections.conversation = convMatch[1].trim();
+  } else {
+    // Try alternative: "Human:" at the end
+    const humanMatch = fullPrompt.match(/\nHuman:([\s\S]*)/);
+    if (humanMatch && fullPrompt.indexOf('Human:') > fullPrompt.indexOf('# Context')) {
+      sections.conversation = 'Human:' + humanMatch[1].trim();
+    }
+  }
+
+  // If no # Context marker, try "System:" prefix (fallback prompts)
+  if (!sections.system && !sections.context) {
+    const sysMatch = fullPrompt.match(/^System:\s*([\s\S]*?)(?=\nMessages:)/);
+    const msgMatch = fullPrompt.match(/Messages:\s*([\s\S]*)/);
+    if (sysMatch) {
+      sections.system = sysMatch[1].trim();
+    }
+    if (msgMatch) {
+      sections.conversation = msgMatch[1].trim();
+    }
+  }
+
+  return sections;
+}
+
+function buildCollapsibleSection(title, content, sectionId) {
+  const id = `prompt-section-${sectionId}`;
+  return `
+    <div class="prompt-subsection-header" onclick="togglePromptSubsection('${id}')">
+      ▼ ${escapeHtml(title)}
+    </div>
+    <div class="prompt-subsection-body" id="${id}">
+      <pre style="white-space: pre-wrap; word-break: break-word; background: #1a1a1a; padding: 0.5rem; border-radius: 4px; font-size: 0.8rem; line-height: 1.4; margin: 0.25rem 0;">${escapeHtml(content)}</pre>
+    </div>`;
+}
+
+function togglePromptSubsection(id) {
+  const body = document.getElementById(id);
+  if (!body) return;
+  const header = body.previousElementSibling;
+  if (body.classList.contains('collapsed')) {
+    body.classList.remove('collapsed');
+    if (header) header.innerHTML = header.innerHTML.replace('▶', '▼');
+  } else {
+    body.classList.add('collapsed');
+    if (header) header.innerHTML = header.innerHTML.replace('▼', '▶');
+  }
+}
+
+function togglePromptHistoryCollapse() {
+  const section = document.getElementById('prompt-history-section');
+  const toggle = document.getElementById('prompt-history-toggle');
+  if (section.classList.contains('collapsed')) {
+    section.classList.remove('collapsed');
+    toggle.textContent = '▼ Prompt History';
+  } else {
+    section.classList.add('collapsed');
+    toggle.textContent = '▶ Prompt History';
+  }
+}
+
+function copyPromptToClipboard() {
+  const rawText = document.getElementById('last-prompt').textContent;
+  if (!rawText) return;
+  navigator.clipboard.writeText(rawText).then(() => {
+    showNotification('Prompt copied to clipboard', 'success');
+  }).catch(err => {
+    console.error('Failed to copy:', err);
+    showNotification('Failed to copy', 'error');
+  });
+}
+
+function toggleMiddleTruncation() {
+  _middleTruncation = document.getElementById('prompt-truncate-toggle').checked;
+  // Re-render current entry
+  if (_selectedPromptId) {
+    selectPromptEntry(_selectedPromptId);
+  }
+}
+
+// Legacy compatibility — called from core.js on page load / tab switch
+// Redirects to the new loadPromptHistory()
+async function loadLastPrompt() {
+  await loadPromptHistory();
+}
+
 // ===== Autonomous Stats =====

 async function loadAutonomousStats() {
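The Web UI splits each prompt on textual markers: "# Context" and "# Conversation until now:" for Cat prompts, and the "System: ... Messages:" layout that llm.py writes for fallback prompts. A Python port of parsePromptSections() and of the middle truncation applied to responses makes those rules easier to test in isolation. The function names are hypothetical, and the port deliberately ends the Context lookahead with `$` (end of string) rather than `\n$`, so a trailing Context block without a final newline still matches.

```python
import re


def parse_prompt_sections(full_prompt: str) -> dict:
    """Hypothetical Python port of parsePromptSections() from status.js."""
    sections = {"system": None, "context": None, "conversation": None}
    if not full_prompt:
        return sections

    context_match = re.search(
        r"# Context\s*\n([\s\S]*?)(?=\n# Conversation|\nHuman:|$)", full_prompt)
    conv_match = re.search(r"# Conversation until now:\s*\n([\s\S]*)", full_prompt)

    if context_match:
        # Everything before "# Context" is treated as the system prompt
        context_idx = full_prompt.find("# Context")
        if context_idx > 0:
            sections["system"] = full_prompt[:context_idx].strip()
        sections["context"] = context_match.group(1).strip()

    if conv_match:
        sections["conversation"] = conv_match.group(1).strip()
    else:
        human_match = re.search(r"\nHuman:([\s\S]*)", full_prompt)
        if human_match and full_prompt.find("Human:") > full_prompt.find("# Context"):
            sections["conversation"] = "Human:" + human_match.group(1).strip()

    # Fallback prompts: "System: ...\n\nMessages: ..."
    if not sections["system"] and not sections["context"]:
        sys_match = re.match(r"System:\s*([\s\S]*?)(?=\nMessages:)", full_prompt)
        msg_match = re.search(r"Messages:\s*([\s\S]*)", full_prompt)
        if sys_match:
            sections["system"] = sys_match.group(1).strip()
        if msg_match:
            sections["conversation"] = msg_match.group(1).strip()
    return sections


def truncate_middle(text: str, keep: int = 200, threshold: int = 400) -> str:
    """Keep the first and last `keep` characters of texts longer than `threshold`."""
    if len(text) <= threshold:
        return text
    return text[:keep] + "\n\n... [truncated middle] ...\n\n" + text[-keep:]


if __name__ == "__main__":
    cat_prompt = ("You are Miku.\n# Context\nMemory: likes leeks\n"
                  "# Conversation until now:\nHuman: hi")
    print(parse_prompt_sections(cat_prompt)["context"])  # Memory: likes leeks
    fallback = "System: Be cheerful.\n\nMessages: [{'role': 'user', 'content': 'hi'}]"
    print(parse_prompt_sections(fallback)["system"])  # Be cheerful.
    print(len(truncate_middle("x" * 1000)))  # 430
```

Parsing on free-text markers is fragile: if a memory or user message happens to contain "# Context" or "Human:", the split shifts, which is why the UI keeps a raw-prompt fallback view.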
--- a/bot/utils/image_handling.py
+++ b/bot/utils/image_handling.py
@@ -472,15 +472,22 @@ async def rephrase_as_miku(vision_output, user_prompt, guild_id=None, user_id=No
                if globals.EVIL_MODE:
                    effective_mood = f"EVIL:{getattr(globals, 'EVIL_DM_MOOD', 'evil_neutral')}"
                logger.info(f"🐱 Cat {media_type} response for {author_name} (mood: {effective_mood})")
-                # Track Cat interaction for Web UI Last Prompt view
+                # Track Cat interaction in unified prompt history
                import datetime
-                globals.LAST_CAT_INTERACTION = {
+                globals._prompt_id_counter += 1
+                globals.PROMPT_HISTORY.append({
+                    "id": globals._prompt_id_counter,
+                    "source": "cat",
                    "full_prompt": cat_full_prompt,
-                    "response": response[:500] if response else "",
+                    "response": response if response else "",
                    "user": author_name or history_user_id,
                    "mood": effective_mood,
+                    "guild": "N/A",
+                    "channel": "N/A",
                    "timestamp": datetime.datetime.now().isoformat(),
-                }
+                    "model": "Cat LLM",
+                    "response_type": response_type,
+                })
        except Exception as e:
            logger.warning(f"🐱 Cat {media_type} pipeline error, falling back to query_llama: {e}")
            response = None
@@ -809,7 +816,7 @@ async def process_media_in_message(message, prompt, is_dm, guild_id) -> bool:

                # Build a combined vision description and route through
                # rephrase_as_miku (which handles Cat → LLM fallback,
-                # mood resolution, and LAST_CAT_INTERACTION tracking).
+                # mood resolution, and prompt history tracking).
                combined_description = "\n".join(embed_context_parts)
                miku_reply = await rephrase_as_miku(
                    combined_description, prompt,
--- a/bot/utils/llm.py
+++ b/bot/utils/llm.py
@@ -381,7 +381,23 @@ Please respond in a way that reflects this emotional tone.{pfp_context}"""
        media_note = media_descriptions.get(media_type, f"The user has sent you {media_type}.")
        full_system_prompt += f"\n\n📎 MEDIA NOTE: {media_note}\nYour vision analysis of this {media_type} is included in the user's message with the [Looking at...] prefix."

-    globals.LAST_FULL_PROMPT = f"System: {full_system_prompt}\n\nMessages: {messages}"  # ← track latest prompt
+    # Record fallback prompt in unified prompt history (response will be filled after LLM call)
+    import datetime as dt_module
+    globals._prompt_id_counter += 1
+    prompt_entry = {
+        "id": globals._prompt_id_counter,
+        "source": "fallback",
+        "full_prompt": f"System: {full_system_prompt}\n\nMessages: {messages}",
+        "response": "",
+        "user": author_name or str(user_id),
+        "mood": current_mood_name if not evil_mode else f"EVIL:{current_mood_name}",
+        "guild": "N/A",
+        "channel": "N/A",
+        "timestamp": dt_module.datetime.now().isoformat(),
+        "model": model,
+        "response_type": response_type,
+    }
+    globals.PROMPT_HISTORY.append(prompt_entry)

    headers = {'Content-Type': 'application/json'}
    
@@ -475,6 +491,9 @@ Please respond in a way that reflects this emotional tone.{pfp_context}"""
                            is_bot=True
                        )

+                    # Update the prompt history entry with the actual response
+                    prompt_entry["response"] = reply if reply else ""
+
                    return reply
                else:
                    error_text = await response.text()
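llm.py appends `prompt_entry` before sending the request and writes `prompt_entry["response"]` only after the reply arrives. This works because the deque holds a reference to the dict, not a copy, so mutating `prompt_entry` updates the history in place. Below is a minimal sketch of that pattern. A hypothetical `fake_llm_call()` stands in for the real aiohttp request, and `query_with_history()` is an illustrative name.

```python
import asyncio
from collections import deque

PROMPT_HISTORY: deque = deque(maxlen=10)


async def fake_llm_call(prompt: str) -> str:
    # Hypothetical stand-in for the HTTP request to llama-swap.
    await asyncio.sleep(0)  # yield to the event loop like a real network call
    return f"reply to: {prompt}"


async def query_with_history(prompt: str, entry_id: int) -> str:
    # 1. Record the prompt first, with an empty response placeholder.
    entry = {"id": entry_id, "source": "fallback", "full_prompt": prompt, "response": ""}
    PROMPT_HISTORY.append(entry)
    # 2. Await the LLM; other coroutines may append entries meanwhile.
    reply = await fake_llm_call(prompt)
    # 3. Mutate the same dict object: the deque stores a reference, so the
    #    history sees the update without searching for the entry by id.
    entry["response"] = reply if reply else ""
    return reply
```

Suppose ten newer prompts are recorded before the reply arrives. The entry has then already been evicted, and the final write lands on an orphaned dict, which is harmless. As far as this hunk shows, only the success branch fills in the response, so a failed request leaves `response` empty.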
--- a/docker-compose.yml
+++ b/docker-compose.yml
@@ -22,9 +22,7 @@ services:
      - LOG_LEVEL=debug  # Enable verbose logging for llama-swap

  llama-swap-amd:
-    build:
-      context: .
-      dockerfile: Dockerfile.llamaswap-rocm
+    image: ghcr.io/mostlygeek/llama-swap:rocm
    container_name: llama-swap-amd
    ports:
      - "8091:8080"  # Map host port 8091 to container port 8080
@@ -35,9 +33,6 @@ services:
    devices:
      - /dev/kfd:/dev/kfd
      - /dev/dri:/dev/dri
-    group_add:
-      - "985"  # video group
-      - "989"  # render group
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8080/health"]
--- a/llama-swap-config.yaml
+++ b/llama-swap-config.yaml
@@ -5,7 +5,7 @@ models:
  # Main text generation model (Llama 3.1 8B)
  # Custom chat template to disable built-in tool calling
  llama3.1:
-    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99 -c 16384 --host 0.0.0.0 --no-warmup --flash-attn on --chat-template-file /app/llama31_notool_template.jinja
+    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-8B-Instruct-UD-Q4_K_XL.gguf -ngl 99 -c 16384 --host 0.0.0.0 -fit off --no-warmup --flash-attn on --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0 --chat-template-file /app/llama31_notool_template.jinja
    ttl: 1800  # Unload after 30 minutes of inactivity (1800 seconds)
    swap: true  # CRITICAL: Unload other models when loading this one
    aliases:
@@ -14,7 +14,7 @@ models:
  
  # Evil/Uncensored text generation model (DarkIdol-Llama 3.1 8B)
  darkidol:
-    cmd: /app/llama-server --port ${PORT} --model /models/DarkIdol-Llama-3.1-8B-Instruct-1.3-Uncensored_Q4_K_M.gguf -ngl 99 -c 16384 --host 0.0.0.0 --no-warmup --flash-attn on
+    cmd: /app/llama-server --port ${PORT} --model /models/DarkIdol-Llama-3.1-8B-Instruct-1.3-Uncensored_Q4_K_M.gguf -ngl 99 -c 16384 --host 0.0.0.0 -fit off --no-warmup --flash-attn on --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0
    ttl: 1800  # Unload after 30 minutes of inactivity
    swap: true  # CRITICAL: Unload other models when loading this one
    aliases:
@@ -24,7 +24,7 @@ models:
  
  # Japanese language model (Llama 3.1 Swallow - Japanese optimized)
  swallow:
-    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-Swallow-8B-Instruct-v0.5-Q4_K_M.gguf -ngl 99 -c 16384 --host 0.0.0.0 --no-warmup --flash-attn on
+    cmd: /app/llama-server --port ${PORT} --model /models/Llama-3.1-Swallow-8B-Instruct-v0.5-Q4_K_M.gguf -ngl 99 -c 16384 --host 0.0.0.0 -fit off --no-warmup --flash-attn on --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0
    ttl: 1800  # Unload after 30 minutes of inactivity
    swap: true  # CRITICAL: Unload other models when loading this one
    aliases:
@@ -34,7 +34,7 @@ models:
    
  # Vision/Multimodal model (MiniCPM-V-4.5 - supports images, video, and GIFs)
  vision:
-    cmd: /app/llama-server --port ${PORT} --model /models/MiniCPM-V-4_5-Q3_K_S.gguf --mmproj /models/MiniCPM-V-4_5-mmproj-f16.gguf -ngl 99 -c 4096 --host 0.0.0.0 --no-warmup --flash-attn on
+    cmd: /app/llama-server --port ${PORT} --model /models/MiniCPM-V-4_5-Q3_K_S.gguf --mmproj /models/MiniCPM-V-4_5-mmproj-f16.gguf -ngl 99 -c 4096 --host 0.0.0.0 -fit off --no-warmup --flash-attn on --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0
    ttl: 900  # Vision model used less frequently, shorter TTL (15 minutes = 900 seconds)
    swap: true  # CRITICAL: Unload text models before loading vision
    aliases:
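Per the commit message, only the 6 GB GTX 1660 needs the extra memory flags. The 16 GB RX 6800 keeps its KV cache on the GPU. One way to keep the two configs from drifting apart is to generate the `cmd:` strings from a shared base. Below is a minimal sketch of that approach. The `build_cmd()` helper and the profile names `nvidia-6gb` and `amd-16gb` are hypothetical. The flag values are copied from the config above, though the argument order differs slightly from the YAML.

```python
from typing import List, Optional

# Flags shared by every model, copied from llama-swap-config.yaml.
BASE_FLAGS = ["-ngl", "99", "--host", "0.0.0.0", "--no-warmup", "--flash-attn", "on"]

# Extra flags added for the 6 GB NVIDIA card only, per the commit message.
NVIDIA_6GB_EXTRA = [
    "-fit", "off",              # copied verbatim from the NVIDIA config
    "--no-kv-offload",          # keep the KV cache in system RAM instead of VRAM
    "--cache-type-k", "q4_0",   # quantize the K cache
    "--cache-type-v", "q4_0",   # quantize the V cache
]


def build_cmd(model_path: str, ctx: int, gpu: str, extra: Optional[List[str]] = None) -> str:
    """Assemble a llama-server command line for a hypothetical GPU profile."""
    # ${PORT} is left literal: llama-swap substitutes it when launching the model.
    args = ["/app/llama-server", "--port", "${PORT}", "--model", model_path, "-c", str(ctx)]
    args += BASE_FLAGS
    if gpu == "nvidia-6gb":
        args += NVIDIA_6GB_EXTRA
    elif gpu != "amd-16gb":
        raise ValueError(f"unknown GPU profile: {gpu}")
    args += extra or []  # model-specific arguments, e.g. --mmproj for vision
    return " ".join(args)
```

Model-specific arguments such as the vision model's `--mmproj` or the `--chat-template-file` used by `llama3.1` would go in `extra`.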
Author	SHA1	Message	Date
koko210Serve	9eb081efb1	llama-swap: use pre-built images (:cuda, :rocm) with GPU-specific flags - Drop custom Dockerfiles; docker-compose uses ghcr.io pre-built images which ship llama-swap + llama-server with no pinned versions (always latest) - NVIDIA GTX 1660 (6GB): add -fit off --no-kv-offload --cache-type-k q4_0 --cache-type-v q4_0 to fix OOM segfault with new llama.cpp b9014's GPU-side KV cache default - AMD RX 6800 (16GB): flags unchanged; KV cache stays on GPU for max speed - Both running llama-swap v211 + llama.cpp b9014 (2026-05-05)	2026-05-05 16:53:34 +03:00
koko210Serve	4e28236b06	fix: preserve collapsible subsection state across polling re-renders - Use stable section IDs (without Date.now()) so collapse state can be tracked across re-renders - Snapshot collapsed state before innerHTML replacement, restore after - Prevents the 10s polling from expanding all subsections every time	2026-05-02 16:17:26 +03:00
koko210Serve	c5e49c73df	fix: add cache-busting to prevent stale JS/CSS from breaking the UI - Added ?v=20260502 query param to all <script src=...> and <link> tags - Added Cache-Control: no-cache, no-store, must-revalidate to index route - Added <meta> cache-control tags in HTML head for extra coverage - This ensures the browser always fetches fresh HTML/JS/CSS after deploy, preventing the old loadLastPrompt() from running against new HTML (which would crash since #prompt-cat-info no longer exists)	2026-05-02 16:08:47 +03:00
koko210Serve	393921e524	fix: add min-height to #prompt-display and placeholder text in clearPromptDisplay() The empty #prompt-display div collapsed to 0 height, making it appear 'gone'. Added min-height: 3rem and a 'No prompt selected.' placeholder that clearPromptDisplay() now sets via innerHTML.	2026-05-02 15:55:19 +03:00
koko210Serve	2dd32d0ef1	fix: move <pre> outside #prompt-display to prevent innerHTML from destroying it The renderPromptEntry() function sets innerHTML on #prompt-display, which was wiping out the child <pre id="last-prompt"> element. This caused copyPromptToClipboard() to fail silently and the display to appear empty. Fix: keep <pre> as a hidden sibling outside #prompt-display, used only as a text buffer for the copy function.	2026-05-02 15:45:54 +03:00
koko210Serve	a980b90c0a	fix: escape content in buildCollapsibleSection, avoid double-escaping response	2026-05-02 15:27:18 +03:00
koko210Serve	6b922d84ae	frontend: rewrite Last Prompt as Prompt History viewer - status.js: replace loadLastPrompt() with loadPromptHistory() + helpers - fetch /prompts with optional source filter, populate dropdown - selectPromptEntry() renders metadata bar + collapsible subsections - parsePromptSections() splits full_prompt into System/Context/Conversation - buildCollapsibleSection() with toggle arrows (▼/▶) - copyPromptToClipboard() copies raw text - toggleMiddleTruncation() truncates response from middle - togglePromptHistoryCollapse() collapses entire section - legacy loadLastPrompt() delegates to loadPromptHistory() - core.js: add promptInterval to polling (10s), visibility resume - update switchPromptSource() for 'all' filter + new button IDs - update initPromptSourceToggle() default to 'all' - declare promptInterval variable	2026-05-02 15:25:05 +03:00
koko210Serve	f33e2afdf7	frontend: new Prompt History section HTML + CSS - Replace single <pre> Last Prompt with rich Prompt History viewer - Add source filter buttons (All/Cat/Fallback), history dropdown selector - Add metadata bar, copy-to-clipboard button, middle-truncation toggle - Add collapsible section CSS classes for expandable subsections	2026-05-02 15:19:10 +03:00
koko210Serve	87de8f8b3a	backend: replace LAST_FULL_PROMPT/LAST_CAT_INTERACTION with unified PROMPT_HISTORY deque - globals.py: add collections.deque(maxlen=10) PROMPT_HISTORY with _prompt_id_counter - globals.py: add legacy accessor functions _get_last_fallback_prompt() and _get_last_cat_interaction() - bot.py: append to PROMPT_HISTORY instead of setting LAST_CAT_INTERACTION, remove 500-char truncation, add guild/channel/model fields - image_handling.py: same pattern for Cat media responses - llm.py: append fallback prompts to PROMPT_HISTORY with response filled after LLM reply - routes/core.py: new GET /prompts and GET /prompts/{id} endpoints, legacy /prompt and /prompt/cat use accessor functions	2026-05-02 15:17:15 +03:00
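The backend commit adds `GET /prompts`, which takes an optional source filter, and `GET /prompts/{id}`, but routes/core.py is not in this section. The lookup logic behind those routes can be sketched without a web framework. `list_prompts()` and `get_prompt()` are hypothetical names. Two behaviours are also assumptions: accepting `"all"` as a no-op filter and returning trimmed summaries for the list view.

```python
from collections import deque
from typing import List, Optional

PROMPT_HISTORY: deque = deque(maxlen=10)

SUMMARY_KEYS = ("id", "source", "user", "timestamp")  # assumed dropdown fields


def list_prompts(source: Optional[str] = None) -> List[dict]:
    """Newest-first summaries, optionally filtered by source ("cat", "fallback").
    None or "all" returns every entry."""
    matches = [
        e for e in reversed(PROMPT_HISTORY)
        if source in (None, "all") or e["source"] == source
    ]
    return [{k: e.get(k) for k in SUMMARY_KEYS} for e in matches]


def get_prompt(prompt_id: int) -> Optional[dict]:
    """Full entry for one id, or None (a web route would map this to a 404)."""
    for e in PROMPT_HISTORY:
        if e["id"] == prompt_id:
            return e
    return None
```

Returning summaries keeps the 10-second polling payload small. Only the selected entry's full prompt and response are fetched by id.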