update: Major Overhaul to all subsystems

2026-03-07 11:32:18 +02:00
parent 810e81b323
commit c62188fda6
107 changed files with 20414 additions and 929 deletions
--- a/.claude/backend-mqtt-alerts-prompt.md
+++ b/.claude/backend-mqtt-alerts-prompt.md
@@ -0,0 +1,153 @@
+# Backend Task: Subscribe to Vesper MQTT Alert Topics
+
+> Use this document as a prompt / task brief for implementing the backend side
+> of the Vesper MQTT alert system. The firmware changes are complete.
+> Full topic spec: `docs/reference/mqtt-events.md`
+
+---
+
+## What the firmware now publishes
+
+The Vesper firmware (v155+) publishes on three status topics:
+
+### 1. `vesper/{device_id}/status/heartbeat` (unchanged)
+- Every 30 seconds, retained, QoS 1
+- You already handle this, so **no change is needed** except one: suppress any log entry or display update triggered by heartbeat arrival, and update `last_seen` silently. Only surface an event when the device goes *silent* (no heartbeat for 90 s, i.e. three missed intervals).
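+
+A minimal sketch of the silent-device check follows. It assumes a single backend process that keeps `last_seen` in memory, with calls serialised on one thread. `HeartbeatTracker`, `on_heartbeat` and `check` are hypothetical names and are not part of the firmware spec. The tracker uses `time.monotonic()` so that wall-clock adjustments cannot trigger false "silent" events, and it reports each device only once per silence. The optional `now` argument exists for testing.
+
+```python
+# Hypothetical silent-device tracker; not part of the firmware spec.
+import time
+from typing import Dict, List, Optional, Set
+
+HEARTBEAT_TIMEOUT_S = 90.0  # 3 missed heartbeats at the 30 s interval
+
+
+class HeartbeatTracker:
+    """Keeps last_seen per device and reports each device once when it goes silent."""
+
+    def __init__(self, timeout_s: float = HEARTBEAT_TIMEOUT_S) -> None:
+        self.timeout_s = timeout_s
+        self.last_seen: Dict[str, float] = {}
+        self.silent: Set[str] = set()
+
+    def on_heartbeat(self, device_id: str, now: Optional[float] = None) -> None:
+        """Update last_seen silently: no log entry, no UI event."""
+        self.last_seen[device_id] = time.monotonic() if now is None else now
+        self.silent.discard(device_id)  # device is alive again
+
+    def check(self, now: Optional[float] = None) -> List[str]:
+        """Return devices that crossed the timeout since the previous check."""
+        now = time.monotonic() if now is None else now
+        newly_silent = []
+        for device_id, seen in list(self.last_seen.items()):
+            if now - seen > self.timeout_s and device_id not in self.silent:
+                self.silent.add(device_id)
+                newly_silent.append(device_id)
+        return newly_silent
+```
+
+Call `check()` periodically (for example from a background task every few seconds) and surface one event per returned device.
+
+Because the heartbeat is retained, the broker redelivers the last heartbeat as soon as the backend (re)subscribes, even if the device is offline. With paho-mqtt you can recognise these deliveries via `msg.retain` and skip them when updating `last_seen`.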
+
+### 2. `vesper/{device_id}/status/alerts` (NEW)
+- Published only when a subsystem state changes (HEALTHY → WARNING, WARNING → CRITICAL, etc.)
+- QoS 1, not retained
+- One message per state transition; not repeated until the state changes again
+
+**Alert payload:**
+```json
+{ "subsystem": "FileManager", "state": "WARNING", "msg": "ConfigManager health check failed" }
+```
+**Cleared payload (recovery):**
+```json
+{ "subsystem": "FileManager", "state": "CLEARED" }
+```
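+
+A sketch of validating the alert payload before it reaches the database follows. It assumes the state vocabulary used in this brief: `WARNING`, `CRITICAL` and `FAILED` are active states, and `CLEARED` marks recovery. `AlertEvent` and `parse_alert` are illustrative names. Rejecting unknown states is a design choice here; you may prefer to log them so that a state added in later firmware is not silently dropped.
+
+```python
+# Hypothetical validation step for status/alerts payloads.
+import json
+from dataclasses import dataclass
+from typing import Optional
+
+ACTIVE_STATES = {"WARNING", "CRITICAL", "FAILED"}  # stored as active alerts
+CLEARED = "CLEARED"                                # recovery marker: delete the alert
+
+
+@dataclass(frozen=True)
+class AlertEvent:
+    subsystem: str
+    state: str
+    msg: str = ""
+
+    @property
+    def is_cleared(self) -> bool:
+        return self.state == CLEARED
+
+
+def parse_alert(raw: bytes) -> Optional[AlertEvent]:
+    """Decode a status/alerts payload; return None if malformed or the state is unknown."""
+    try:
+        data = json.loads(raw)
+    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
+        return None
+    if not isinstance(data, dict):
+        return None
+    subsystem, state = data.get("subsystem"), data.get("state")
+    if not isinstance(subsystem, str) or not isinstance(state, str):
+        return None
+    if state != CLEARED and state not in ACTIVE_STATES:
+        return None
+    return AlertEvent(subsystem, state, str(data.get("msg", "")))
+```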
+
+### 3. `vesper/{device_id}/status/info` (NEW)
+- Published on significant device state changes (playback start/stop, etc.)
+- QoS 0, not retained
+
+```json
+{ "type": "playback_started", "payload": { "melody_uid": "ABC123" } }
+```
+
+---
+
+## What to implement in the backend (FastAPI + MQTT)
+
+### Subscribe to new topics
+
+Add to your MQTT subscription list:
+```python
+client.subscribe("vesper/+/status/alerts", qos=1)
+client.subscribe("vesper/+/status/info",   qos=0)
+```
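+
+The handlers in this brief take a `device_id`, but the payloads do not carry one, so it has to be recovered from the topic. A minimal routing sketch follows, assuming topics of the exact form `vesper/{device_id}/status/{kind}`. `split_status_topic` and `route` are hypothetical helpers. In a paho-mqtt `on_message(client, userdata, msg)` callback you would call `route(msg.topic, msg.payload, handlers)`, with for example `handlers = {"alerts": on_alerts_message, "info": on_info_message}`.
+
+```python
+# Hypothetical topic router for vesper/{device_id}/status/{kind}.
+import json
+from typing import Callable, Dict, Optional, Tuple
+
+Handler = Callable[[str, dict], None]
+
+
+def split_status_topic(topic: str) -> Optional[Tuple[str, str]]:
+    """'vesper/<device_id>/status/<kind>' -> (device_id, kind); None for other topics."""
+    parts = topic.split("/")
+    if len(parts) != 4 or parts[0] != "vesper" or parts[2] != "status" or not parts[1]:
+        return None
+    return parts[1], parts[3]
+
+
+def route(topic: str, raw: bytes, handlers: Dict[str, Handler]) -> bool:
+    """Decode the JSON payload and call handlers[kind](device_id, payload).
+
+    Returns False, without calling anything, for unknown topics, kinds or bad JSON.
+    """
+    parsed = split_status_topic(topic)
+    if parsed is None:
+        return False
+    device_id, kind = parsed
+    handler = handlers.get(kind)
+    if handler is None:
+        return False
+    try:
+        payload = json.loads(raw)
+    except ValueError:
+        return False
+    if not isinstance(payload, dict):
+        return False
+    handler(device_id, payload)
+    return True
+```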
+
+### Database model — active alerts per device
+
+Create a table (or document) to store the current alert state per device:
+
+```sql
+CREATE TABLE device_alerts (
+    device_id   TEXT NOT NULL,
+    subsystem   TEXT NOT NULL,
+    state       TEXT NOT NULL,    -- WARNING | CRITICAL | FAILED
+    message     TEXT,
+    updated_at  TIMESTAMP NOT NULL,
+    PRIMARY KEY (device_id, subsystem)
+);
+```
+
+Or equivalent in your ORM / MongoDB / Redis structure.
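+
+If the backend uses SQLite, the upsert and delete can be sketched with the standard-library `sqlite3` module, as below. PostgreSQL accepts the same `ON CONFLICT ... DO UPDATE` clause, with its driver's placeholder style instead of `?`. `apply_alert` is a hypothetical helper, and SQLite needs version 3.24 or newer for this clause. Storing timestamps as UTC ISO-8601 strings is an assumption of this sketch, not a requirement of the schema.
+
+```python
+# Hypothetical persistence helper for the device_alerts table.
+import sqlite3
+from datetime import datetime, timezone
+
+SCHEMA = """
+CREATE TABLE IF NOT EXISTS device_alerts (
+    device_id   TEXT NOT NULL,
+    subsystem   TEXT NOT NULL,
+    state       TEXT NOT NULL,
+    message     TEXT,
+    updated_at  TIMESTAMP NOT NULL,
+    PRIMARY KEY (device_id, subsystem)
+)
+"""
+
+
+def apply_alert(conn: sqlite3.Connection, device_id: str, subsystem: str,
+                state: str, message: str = "") -> None:
+    """CLEARED deletes the row; any other state upserts it by (device_id, subsystem)."""
+    with conn:  # commits on success, rolls back on exception
+        if state == "CLEARED":
+            conn.execute(
+                "DELETE FROM device_alerts WHERE device_id = ? AND subsystem = ?",
+                (device_id, subsystem),
+            )
+        else:
+            conn.execute(
+                """
+                INSERT INTO device_alerts (device_id, subsystem, state, message, updated_at)
+                VALUES (?, ?, ?, ?, ?)
+                ON CONFLICT (device_id, subsystem) DO UPDATE SET
+                    state = excluded.state,
+                    message = excluded.message,
+                    updated_at = excluded.updated_at
+                """,
+                (device_id, subsystem, state, message,
+                 datetime.now(timezone.utc).isoformat()),
+            )
+```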
+
+### MQTT message handler — alerts topic
+
+```python
+from datetime import datetime, timezone
+
+# `db` and `ws_broadcast` are placeholders for your persistence layer and WebSocket hub.
+def on_alerts_message(device_id: str, payload: dict):
+    subsystem = payload["subsystem"]
+    state     = payload["state"]
+    message   = payload.get("msg", "")
+
+    if state == "CLEARED":
+        # Subsystem recovered: remove its alert from the active set
+        db.device_alerts.delete(device_id=device_id, subsystem=subsystem)
+    else:
+        # Upsert keyed by (device_id, subsystem): create or overwrite
+        db.device_alerts.upsert(
+            device_id=device_id,
+            subsystem=subsystem,
+            state=state,
+            message=message,
+            updated_at=datetime.now(timezone.utc),
+        )
+
+    # Optionally push a WebSocket event to the console UI
+    ws_broadcast(device_id, {"event": "alert_update", "subsystem": subsystem, "state": state})
+```
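+
+`ws_broadcast` is not defined in this brief. One catch applies if you use paho-mqtt with `loop_start()`: the MQTT callbacks run on paho's own network thread, while FastAPI WebSockets live on the asyncio event loop, so events must be handed over thread-safely. A sketch follows, assuming one `asyncio.Queue` per connected console client. `Broadcaster` is a hypothetical class.
+
+```python
+# Hypothetical thread-safe fan-out behind ws_broadcast().
+import asyncio
+from typing import Any, Dict, Set
+
+
+class Broadcaster:
+    def __init__(self, loop: asyncio.AbstractEventLoop) -> None:
+        self.loop = loop
+        self.queues: Set[asyncio.Queue] = set()
+
+    def subscribe(self) -> asyncio.Queue:
+        """Call on the event loop, e.g. when a console WebSocket connects."""
+        q: asyncio.Queue = asyncio.Queue()
+        self.queues.add(q)
+        return q
+
+    def unsubscribe(self, q: asyncio.Queue) -> None:
+        self.queues.discard(q)
+
+    def _fan_out(self, message: Dict[str, Any]) -> None:
+        # Runs on the event-loop thread, so touching the queues is safe here.
+        for q in list(self.queues):
+            q.put_nowait(message)
+
+    def ws_broadcast(self, device_id: str, event: Dict[str, Any]) -> None:
+        """Safe to call from the MQTT client's network thread."""
+        self.loop.call_soon_threadsafe(self._fan_out, {"device_id": device_id, **event})
+```
+
+In a FastAPI WebSocket endpoint, call `q = broadcaster.subscribe()`, loop on `await websocket.send_json(await q.get())`, and call `unsubscribe(q)` on disconnect. The queues are unbounded here. A bounded queue would need a policy for `asyncio.QueueFull`.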
+
+### MQTT message handler — info topic
+
+```python
+def on_info_message(device_id: str, payload: dict):
+    event_type = payload["type"]
+    data       = payload.get("payload", {})
+
+    # Store or forward as needed — e.g. update device playback state
+    if event_type == "playback_started":
+        db.devices.update(device_id, playback_active=True, melody_uid=data.get("melody_uid"))
+    elif event_type == "playback_stopped":
+        db.devices.update(device_id, playback_active=False, melody_uid=None)
+```
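+
+A table-driven variant of the info handler follows, sketched against an in-memory dict that stands in for `db.devices` (an assumption for illustration). `dispatch_info` and `INFO_HANDLERS` are hypothetical names. The dispatcher ignores event types it does not know, because the topic covers more than the two playback events shown here ("playback start/stop, etc.").
+
+```python
+# Hypothetical table-driven info dispatcher; `devices` stands in for db.devices.
+from typing import Any, Callable, Dict
+
+Devices = Dict[str, Dict[str, Any]]
+InfoHandler = Callable[[Devices, str, Dict[str, Any]], None]
+
+
+def _playback_started(devices: Devices, device_id: str, data: Dict[str, Any]) -> None:
+    devices.setdefault(device_id, {}).update(playback_active=True, melody_uid=data.get("melody_uid"))
+
+
+def _playback_stopped(devices: Devices, device_id: str, data: Dict[str, Any]) -> None:
+    devices.setdefault(device_id, {}).update(playback_active=False, melody_uid=None)
+
+
+INFO_HANDLERS: Dict[str, InfoHandler] = {
+    "playback_started": _playback_started,
+    "playback_stopped": _playback_stopped,
+}
+
+
+def dispatch_info(devices: Devices, device_id: str, payload: Dict[str, Any]) -> bool:
+    """Dispatch on payload["type"]; unknown or missing types are ignored (returns False)."""
+    event_type = payload.get("type")
+    handler = INFO_HANDLERS.get(event_type) if isinstance(event_type, str) else None
+    if handler is None:
+        return False
+    data = payload.get("payload")
+    handler(devices, device_id, data if isinstance(data, dict) else {})
+    return True
+```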
+
+### API endpoint — get active alerts for a device
+
+```
+GET /api/devices/{device_id}/alerts
+```
+
+Returns the current active alert set (the upserted rows from the table above):
+
+```json
+[
+  { "subsystem": "FileManager",  "state": "WARNING",  "message": "SD mount failed",    "updated_at": "..." },
+  { "subsystem": "TimeKeeper",   "state": "WARNING",  "message": "NTP sync failed",    "updated_at": "..." }
+]
+```
+
+An empty array means the device is fully healthy (no active alerts).
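+
+A sketch of the query behind this endpoint follows, assuming the `device_alerts` table above lives in SQLite. `get_active_alerts` is a hypothetical helper. Ordering by subsystem keeps the response stable between calls. A FastAPI route decorated with `@app.get("/api/devices/{device_id}/alerts")` can return its result directly, since FastAPI serialises a list of dicts to JSON.
+
+```python
+# Hypothetical read helper for GET /api/devices/{device_id}/alerts.
+import sqlite3
+from typing import Any, Dict, List
+
+
+def get_active_alerts(conn: sqlite3.Connection, device_id: str) -> List[Dict[str, Any]]:
+    """Return the current alert set for one device; [] means fully healthy."""
+    rows = conn.execute(
+        "SELECT subsystem, state, message, updated_at FROM device_alerts "
+        "WHERE device_id = ? ORDER BY subsystem",
+        (device_id,),
+    ).fetchall()
+    return [
+        {"subsystem": s, "state": st, "message": m or "", "updated_at": u}
+        for s, st, m, u in rows
+    ]
+```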
+
+### Console UI guidance
+
+- Device list: show a coloured dot next to each device (green = no alerts, yellow = warnings, red = critical/failed). Update via WebSocket push.
+- Device detail page: show an "Active Alerts" section that lists the current alert set. Do not render a scrolling alert log; show only the current state.
+- When a `CLEARED` event arrives, remove the entry from the UI immediately.
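+
+The dot colour is a pure function of the active alert set, so it can be computed wherever that set is available. The sketch below is in Python for consistency with the rest of this brief. `status_colour` is an illustrative name, and it treats any state other than `CRITICAL` or `FAILED` as yellow.
+
+```python
+# Hypothetical mapping from the active alert set to the device-list dot colour.
+from typing import Iterable, Mapping
+
+
+def status_colour(alerts: Iterable[Mapping[str, str]]) -> str:
+    """green = no alerts, red = any CRITICAL/FAILED, yellow = anything else active."""
+    states = {a["state"] for a in alerts}
+    if states & {"CRITICAL", "FAILED"}:
+        return "red"
+    if states:
+        return "yellow"
+    return "green"
+```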
+
+---
+
+## What NOT to do
+
+- **Do not log every heartbeat** as a visible event. Heartbeats are internal housekeeping.
+- **Do not poll the device** for health status — the device pushes on change.
+- **Do not store alerts as an append-only log** — upsert by `(device_id, subsystem)`. The server holds the current state, not a history.
+
+---
+
+## Testing
+
+1. Flash a device with firmware v155+
+2. Subscribe manually to both topics (`mosquitto_sub` accepts several `-t` filters in one command):
+   ```bash
+   mosquitto_sub -h <broker> -v -t "vesper/+/status/alerts" -t "vesper/+/status/info"
+   ```
+3. Remove the SD card from the device. Expect a `FileManager` `WARNING` alert within 5 minutes (the next health check cycle), or trigger a health check immediately by publishing this command to `vesper/{device_id}/control`:
+   ```json
+   { "v": 2, "cmd": "system.health" }
+   ```
+4. Reinsert the SD card. Expect a `FileManager` `CLEARED` alert on the next health check.