bellsystems-cp/strategies/DATABASE_MIGRATION.md
2026-04-17 15:51:27 +03:00


Database Migration Strategy

BellSystems CP v2 — Firestore + SQLite → Postgres

This is the living plan. Update it as phases complete. Never start a phase without reading the notes from the previous one.


Database Split — Target State

| Data | Target | Source | Used by Flutter? |
|---|---|---|---|
| Devices | Firestore | Firestore | YES (keep) |
| App users (device owners) | Firestore | Firestore | YES (keep) |
| Published melodies | Firestore | Firestore | YES (keep) |
| Draft melodies | Postgres | SQLite | No |
| Built melodies | Postgres | SQLite | No |
| CRM customers | Postgres | Firestore | No |
| CRM products | Postgres | Firestore | No |
| CRM orders | Postgres | Firestore (subcollection) | No |
| Console settings | Postgres | Firestore | No |
| Public features settings | Postgres | Firestore | No |
| Staff / admin users | Postgres | Firestore | No |
| Firmware versions | Postgres | Firestore | No |
| Notes / Issues | Postgres | New (done) | No |
| Support tickets | Postgres | New (done) | No |
| CRM comms log | Postgres | SQLite | No |
| CRM media references | Postgres | SQLite | No |
| CRM sync state | Postgres | SQLite | No |
| CRM quotations + items | Postgres | SQLite | No |
| Mfg audit log | Postgres | SQLite | No |
| Device alerts | Postgres | SQLite | No |
| MQTT commands | Postgres | SQLite | No |
| MQTT heartbeats | Postgres | SQLite | No |
| Device logs | Postgres (partitioned) | SQLite | No |
| Staff audit log | Postgres | New | No |

Rule: Everything that FlutterFlow touches directly stays in Firestore forever. The Console backend continues to write to those Firestore collections exactly as today. We only stop reading from Firestore in the Console — never stop writing to it.


Deployment Context — Critical

This project runs in two environments:

| Environment | SQLite data | Firestore data | Where migrations run |
|---|---|---|---|
| Local (Windows + Docker Desktop) | Empty / stale test data | Live (correct) | Development & testing only |
| VPS (production Docker) | Live, correct data | Live (correct) | All Phase 1 migrations run here |

What this means for each phase:

  • Phase 0 (schema): Alembic migrations can be developed and tested locally, then the same migrations are run on the VPS via docker compose exec backend alembic upgrade head. The VPS is authoritative.
  • Phase 1 (SQLite → Postgres): Migration scripts must be run on the VPS only. The local SQLite is not a valid source. Do not run Phase 1 migration scripts locally and assume they reflect real data.
  • Phase 2 (Firestore → Postgres): Can be run on either environment (Firestore is the same), but the VPS run is the one that matters. Run locally first to verify the scripts work, then run on the VPS.
  • Phases 3 to 5: All service cutover and testing happen on the VPS.

The deployment workflow:

  1. Develop and test code locally
  2. Push code to VPS (git pull or equivalent)
  3. Run docker compose exec backend alembic upgrade head on the VPS to apply schema changes
  4. Run migration scripts on the VPS when Phase 1 begins
  5. Verify everything on the VPS before marking a phase complete

Non-negotiable Safety Rules

  1. Never modify a Firestore collection during migration: scripts only read from it. Never delete, update, or rename documents until you have personally verified that the Postgres data is complete and correct.
  2. Every migration script runs in a transaction — if any row fails, the entire script rolls back cleanly.
  3. Idempotent scripts — every script uses ON CONFLICT DO NOTHING or equivalent. Safe to run twice.
  4. Count verification before commit — each script prints Source: N docs/rows → Postgres: N rows ✓ and aborts if counts don't match.
  5. Migration run log — a _migration_runs table in Postgres records what ran, when, how many rows, and success/failure. Check it after each script.
  6. One domain at a time — complete and verify a full domain (schema + migration script + service cutover + smoke test) before starting the next.
  7. No data loss = no rushing — downtime during migration is acceptable. Data loss is not.
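Rules 3 to 5 are easiest to keep consistent if every script calls one shared check. The sketch below is a minimal, hypothetical `verify_counts` helper (the function name and the `CountMismatchError` class are assumptions, not existing code) that formats the rule-4 status line and raises on a shortfall. It does no I/O, so it should be called inside the transaction, where raising rolls the inserts back.

```python
class CountMismatchError(RuntimeError):
    """Raised when Postgres holds fewer rows than the migration source."""


def verify_counts(label: str, source_count: int, pg_count: int) -> str:
    """Return the rule-4 status line, or raise if rows are missing.

    Only a shortfall is a failure: pg_count can exceed source_count when the
    table already held rows before the script ran (for example live writes),
    and ON CONFLICT DO NOTHING leaves those untouched.
    """
    if pg_count < source_count:
        raise CountMismatchError(
            f"{label}: source={source_count} pg={pg_count} "
            f"(missing {source_count - pg_count})"
        )
    return f"{label}: Source: {source_count} docs/rows → Postgres: {pg_count} rows ✓"
```

Treating "more rows than the source" as acceptable matches the `pg_count < source_count` check in the per-script pattern; a stricter equality check would make re-runs fail on tables that receive live writes.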

Phase 0 — Schema Foundation

Status: COMPLETE. Alembic revision b1c2d3e4f5a6 applied locally and on the VPS (2026-04-17, via docker compose exec backend alembic upgrade head).

What exists already in Postgres

  • entries + entry_links (notes/issues module)
  • support_tickets + ticket_messages (tickets module)
  • Alembic version history in alembic_version

What Phase 0 adds

Add the _migration_runs tracking table and all new table definitions via Alembic before any data moves.

New tables to create in this phase (schema only, no data yet):

  • _migration_runs — tracks what migration scripts have run
  • crm_products — flat columns, no JSONB needed
  • crm_customers: core columns + JSONB for contacts, notes, owned_items, location, technical_issues, install_support, transaction_history, crm_summary; TEXT[] for tags and linked_user_ids
  • crm_orders — core columns + JSONB for items, discount, shipping, payment_status, timeline
  • staff — replaces admin_users Firestore collection
  • console_settings — key/value or typed columns, replaces Firestore settings doc
  • public_features — typed columns, replaces Firestore public_features doc
  • crm_comms_log — mirrors current SQLite schema, adds proper TIMESTAMPTZ columns
  • crm_media — mirrors current SQLite schema
  • crm_sync_state — key/value
  • crm_quotations + crm_quotation_items — mirrors current SQLite schema
  • mfg_audit_log — mirrors current SQLite schema
  • device_alerts — mirrors current SQLite schema
  • commands — mirrors current SQLite schema
  • heartbeats — mirrors current SQLite schema
  • melody_drafts — mirrors current SQLite schema
  • built_melodies — mirrors current SQLite schema
  • device_logs: partitioned by month on received_at
  • audit_log — new staff action audit system (see schema below)

Key schema decisions

device_logs — monthly partitioning

```sql
CREATE TABLE device_logs (
    id            BIGSERIAL,
    device_serial TEXT NOT NULL,
    level         TEXT NOT NULL,
    message       TEXT NOT NULL,
    device_timestamp BIGINT,
    received_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (id, received_at)
) PARTITION BY RANGE (received_at);

-- Partitions created monthly by a background job or manually:
CREATE TABLE device_logs_2025_01 PARTITION OF device_logs
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
-- etc.

CREATE INDEX idx_device_logs_serial_time ON device_logs(device_serial, received_at DESC);
CREATE INDEX idx_device_logs_level       ON device_logs(level, received_at DESC);
```

Dropping a partition to purge old data: DROP TABLE device_logs_2024_06; — instant, no DELETE scan.

crm_customers — JSONB for flexible arrays

```sql
CREATE TABLE crm_customers (
    id                   TEXT PRIMARY KEY,           -- keep Firestore UUID as-is
    firestore_id         TEXT UNIQUE,                -- same value during transition, nullable later
    title                TEXT,
    name                 TEXT NOT NULL,
    surname              TEXT,
    organization         TEXT,
    religion             TEXT,
    language             TEXT NOT NULL DEFAULT 'el',
    folder_id            TEXT UNIQUE NOT NULL,
    relationship_status  TEXT NOT NULL DEFAULT 'lead',
    nextcloud_folder     TEXT,
    contacts             JSONB NOT NULL DEFAULT '[]',
    notes                JSONB NOT NULL DEFAULT '[]',
    location             JSONB,
    tags                 TEXT[] NOT NULL DEFAULT '{}',
    owned_items          JSONB NOT NULL DEFAULT '[]',
    linked_user_ids      TEXT[] NOT NULL DEFAULT '{}',
    technical_issues     JSONB NOT NULL DEFAULT '[]',
    install_support      JSONB NOT NULL DEFAULT '[]',
    transaction_history  JSONB NOT NULL DEFAULT '[]',
    crm_summary          JSONB,
    created_at           TIMESTAMPTZ NOT NULL,
    updated_at           TIMESTAMPTZ NOT NULL
);
CREATE INDEX idx_crm_customers_rel_status ON crm_customers(relationship_status);
CREATE INDEX idx_crm_customers_tags       ON crm_customers USING GIN(tags);
CREATE INDEX idx_crm_customers_name       ON crm_customers(name, surname);
```

crm_orders — separate table (was Firestore subcollection)

```sql
CREATE TABLE crm_orders (
    id                   TEXT PRIMARY KEY,
    customer_id          TEXT NOT NULL REFERENCES crm_customers(id) ON DELETE CASCADE,
    order_number         TEXT UNIQUE NOT NULL,
    title                TEXT,
    created_by           TEXT,
    status               TEXT NOT NULL DEFAULT 'negotiating',
    status_updated_date  TIMESTAMPTZ,
    status_updated_by    TEXT,
    items                JSONB NOT NULL DEFAULT '[]',
    subtotal             NUMERIC(12,2) NOT NULL DEFAULT 0,
    discount             JSONB,
    total_price          NUMERIC(12,2) NOT NULL DEFAULT 0,
    currency             TEXT NOT NULL DEFAULT 'EUR',
    shipping             JSONB,
    payment_status       JSONB NOT NULL DEFAULT '{}',
    invoice_path         TEXT,
    notes                TEXT,
    timeline             JSONB NOT NULL DEFAULT '[]',
    created_at           TIMESTAMPTZ NOT NULL,
    updated_at           TIMESTAMPTZ NOT NULL
);
CREATE INDEX idx_crm_orders_customer ON crm_orders(customer_id);
CREATE INDEX idx_crm_orders_status   ON crm_orders(status);
```
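Because orders were a Firestore subcollection, the migration has to recover customer_id from each document's location. A minimal sketch, assuming document paths of the form `<customers collection>/<customer_id>/orders/<order_id>`; `split_order_path` is a hypothetical helper. With the Firebase Admin SDK the path string is available as `doc.reference.path`, and `doc.reference.parent.parent.id` yields the same customer ID directly.

```python
from __future__ import annotations


def split_order_path(path: str) -> tuple[str, str]:
    """Return (customer_id, order_id) from an orders subcollection document path.

    collection_group("orders") matches every collection named "orders", so any
    path that does not have the expected four-segment shape is rejected
    instead of guessed at.
    """
    parts = path.strip("/").split("/")
    if len(parts) != 4 or parts[2] != "orders":
        raise ValueError(f"unexpected order path: {path!r}")
    _customers_collection, customer_id, _orders, order_id = parts
    return customer_id, order_id
```

The extracted customer_id must already exist in crm_customers, otherwise the REFERENCES constraint rejects the row; that is why step 2.4 runs before 2.5.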

staff — replaces Firestore admin_users

```sql
CREATE TABLE staff (
    id               TEXT PRIMARY KEY,    -- keep Firestore doc ID as-is during transition
    firestore_id     TEXT UNIQUE,         -- same as id during transition
    email            TEXT UNIQUE NOT NULL,
    name             TEXT NOT NULL,
    role             TEXT NOT NULL DEFAULT 'staff',
    permissions      JSONB NOT NULL DEFAULT '{}',
    hashed_password  TEXT NOT NULL,
    is_active        BOOLEAN NOT NULL DEFAULT TRUE,
    created_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at       TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

audit_log — new system, no migration source

```sql
CREATE TABLE audit_log (
    id           BIGSERIAL PRIMARY KEY,
    occurred_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    actor_id     TEXT NOT NULL,
    actor_name   TEXT NOT NULL,
    action       TEXT NOT NULL,   -- CREATE | UPDATE | DELETE | COMMAND | LOGIN | LOGOUT | etc.
    entity_type  TEXT NOT NULL,   -- customer | order | device | melody | product | staff | ticket | note | quotation | etc.
    entity_id    TEXT NOT NULL,
    entity_label TEXT,            -- denormalized human name: "Church of St. George", "SN-0042", etc.
    changes      JSONB,           -- {"field": {"old": x, "new": y}, ...}; null for CREATE/DELETE/COMMAND
    meta         JSONB            -- extra context: ip_address, command_name, etc.
);
-- Indexes covering the exact filter combos we need:
CREATE INDEX idx_audit_actor      ON audit_log(actor_id,    occurred_at DESC);
CREATE INDEX idx_audit_entity     ON audit_log(entity_type, entity_id,  occurred_at DESC);
CREATE INDEX idx_audit_action     ON audit_log(action,                  occurred_at DESC);
CREATE INDEX idx_audit_occurred   ON audit_log(occurred_at DESC);
```

Phase 1 — SQLite → Postgres (Data Migration)

Status: COMPLETE (all 12 scripts ran successfully on the VPS, 2026-04-17). Prerequisite: Phase 0 complete (all tables exist in Postgres).

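No downtime required: SQLite is a local file and can be read while the app is running. After migration is verified, services are switched to read from Postgres. Rows written to SQLite after a script has read it are not copied, so re-run that (idempotent) script just before switching its service over.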

Migration order (least dependencies first)

| Step | Table | Script |
|---|---|---|
| 1.1 | melody_drafts | migration/migrate_melody_drafts.py |
| 1.2 | built_melodies | migration/migrate_built_melodies.py |
| 1.3 | mfg_audit_log | migration/migrate_mfg_audit_log.py |
| 1.4 | device_alerts | migration/migrate_device_alerts.py |
| 1.5 | crm_sync_state | migration/migrate_crm_sync_state.py |
| 1.6 | crm_quotations | migration/migrate_crm_quotations.py |
| 1.7 | crm_quotation_items | migration/migrate_crm_quotation_items.py |
| 1.8 | crm_media | migration/migrate_crm_media.py |
| 1.9 | crm_comms_log | migration/migrate_crm_comms_log.py |
| 1.10 | commands | migration/migrate_commands.py |
| 1.11 | heartbeats | migration/migrate_heartbeats.py |
| 1.12 | device_logs | migration/migrate_device_logs.py (largest, batched) |
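Step 1.12 is batched because one multi-row INSERT cannot carry unlimited bind parameters: the PostgreSQL wire protocol uses a 16-bit parameter count, and asyncpg, for one, refuses statements with more than 32767 arguments. A minimal sketch of the batching arithmetic, assuming rows are dicts with a fixed column set; `batch_size_for`, `batched`, and `MAX_BIND_PARAMS` are hypothetical names, and the limit should be adjusted if a different driver is in use.

```python
from __future__ import annotations

from itertools import islice
from typing import Iterable, Iterator

MAX_BIND_PARAMS = 32767  # assumed driver limit (asyncpg's cap); adjust for other drivers


def batch_size_for(columns_per_row: int, max_params: int = MAX_BIND_PARAMS) -> int:
    """Largest number of rows whose values fit in one multi-row INSERT."""
    if columns_per_row <= 0:
        raise ValueError("columns_per_row must be positive")
    return max_params // columns_per_row


def batched(rows: Iterable[dict], size: int) -> Iterator[list[dict]]:
    """Yield consecutive lists of at most `size` rows, preserving order."""
    it = iter(rows)
    while chunk := list(islice(it, size)):
        yield chunk
```

With the six device_logs columns, `batch_size_for(6)` gives 5461 rows per INSERT. Executing every batch inside the same `session.begin()` block keeps rule 2's all-or-nothing behaviour; committing per batch would trade that for shorter transactions.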

Per-script pattern

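```python
# Every script follows this structure.
# read_all_from_sqlite, pg_session, log_migration_run and PgModel are project
# helpers/models (assumed to come from migration/utils.py and the ORM models).
from sqlalchemy import func, select
from sqlalchemy.dialects.postgresql import insert  # Postgres insert has on_conflict_do_nothing()


async def run():
    sqlite_rows = await read_all_from_sqlite("table_name")  # list of column -> value dicts
    source_count = len(sqlite_rows)
    print(f"Source: {source_count} rows")

    async with pg_session() as session:
        async with session.begin():  # commits on success, rolls back on any exception
            if sqlite_rows:  # skip the INSERT when the source table is empty
                await session.execute(
                    insert(PgModel).values(sqlite_rows).on_conflict_do_nothing()
                )
            pg_count = await session.scalar(select(func.count()).select_from(PgModel))
            # Check inside the transaction so a mismatch rolls the inserts back (rule 4)
            if pg_count < source_count:
                raise RuntimeError(f"Count mismatch: source={source_count} pg={pg_count}")

    print(f"Postgres: {pg_count} rows ✓")
    await log_migration_run("table_name", source_count, pg_count)
```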

Service cutover per domain

After each group is migrated and verified:

  1. Update service to import from database.postgres instead of database.core
  2. Replace aiosqlite queries with SQLAlchemy async queries
  3. Smoke test via the Console UI — verify the page loads correctly
  4. Leave SQLite file untouched for 48h as a fallback

Phase 2 — Firestore → Postgres (Data Migration)

Status: COMPLETE (all 5 scripts ran successfully on the VPS, 2026-04-17). Prerequisite: Phase 1 complete.

Requires shared.firebase.get_db() to read from Firestore. Scripts run with Firebase Admin SDK — same SDK already initialized in the backend.

Migration order

| Step | Collection | Script | Notes |
|---|---|---|---|
| 2.1 | settings (doc) | migration/migrate_settings.py | Single document |
| 2.2 | public_features (doc) | migration/migrate_public_features.py | Single document |
| 2.3 | crm_products | migration/migrate_crm_products.py | No dependencies |
| 2.4 | crm_customers | migration/migrate_crm_customers.py | Strip legacy negotiating/has_problem fields |
| 2.5 | orders (subcollection) | migration/migrate_crm_orders.py | Uses collection_group("orders") |
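Step 2.4 is mostly a field-by-field copy. The parts that need care are the legacy fields and the NOT NULL array columns, which must receive an empty list rather than None when a Firestore document omits them. A minimal sketch, assuming each customer arrives as `doc.to_dict()` plus the doc ID and that Firestore field names match the crm_customers column names (an assumption); `customer_row` and the column groupings are illustrative, derived from the crm_customers schema above.

```python
from __future__ import annotations

LEGACY_FIELDS = {"negotiating", "has_problem"}  # dropped per step 2.4
# Columns declared NOT NULL DEFAULT '[]' (JSONB) or '{}' (TEXT[]) in crm_customers
LIST_COLUMNS = (
    "contacts", "notes", "tags", "owned_items", "linked_user_ids",
    "technical_issues", "install_support", "transaction_history",
)
# Scalar columns; missing ones become None
SCALAR_COLUMNS = (
    "title", "name", "surname", "organization", "religion", "folder_id",
    "nextcloud_folder", "location", "crm_summary", "created_at", "updated_at",
)
# Mirror the schema DEFAULTs so every row dict has the same keys
COLUMN_DEFAULTS = {"language": "el", "relationship_status": "lead"}


def customer_row(doc_id: str, data: dict) -> dict:
    """Map one Firestore customer document to a crm_customers row dict."""
    data = {k: v for k, v in data.items() if k not in LEGACY_FIELDS}
    row: dict = {"id": doc_id, "firestore_id": doc_id}
    for col in SCALAR_COLUMNS:
        row[col] = data.get(col)  # a missing NOT NULL value (e.g. name) fails loudly on insert
    for col, default in COLUMN_DEFAULTS.items():
        row[col] = data.get(col) or default
    for col in LIST_COLUMNS:
        row[col] = data.get(col) or []  # None or missing -> empty list, never NULL
    return row
```

Giving every row the same keys keeps a multi-row INSERT uniform. Nested Firestore values (timestamps, references) inside the JSONB columns still need type conversion before insert.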

Converting Firestore types

Use the existing _convert_firestore_value helpers in devices/service.py — copy into a shared migration/utils.py. Key conversions:

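  • DatetimeWithNanoseconds → .isoformat() string
  • GeoPoint → {"lat": x, "lng": y} dict
  • DocumentReference → .id string (just the doc ID, no path)

The string conversion is for values that end up inside JSONB columns, since JSON has no datetime type. Top-level timestamps should be passed to TIMESTAMPTZ columns as datetime objects, which is what the verification checklist expects.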
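A minimal sketch of such a converter for migration/utils.py. It is not the existing `_convert_firestore_value` helper, whose exact behaviour may differ; it uses duck typing so it runs without Firestore installed, assuming GeoPoint exposes `latitude`/`longitude` and DocumentReference exposes `id` and `path` (both true of google-cloud-firestore), and it recurses into maps and arrays.

```python
from datetime import datetime
from typing import Any


def convert_firestore_value(value: Any) -> Any:
    """Recursively turn Firestore SDK types into JSON-friendly Python values."""
    if isinstance(value, datetime):  # DatetimeWithNanoseconds subclasses datetime
        return value.isoformat()
    if hasattr(value, "latitude") and hasattr(value, "longitude"):  # GeoPoint
        return {"lat": value.latitude, "lng": value.longitude}
    if hasattr(value, "path") and hasattr(value, "id"):  # DocumentReference
        return value.id  # just the doc ID, no path
    if isinstance(value, dict):  # Firestore map
        return {k: convert_firestore_value(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):  # Firestore array
        return [convert_firestore_value(v) for v in value]
    return value  # str, int, float, bool, None pass through unchanged
```

Because it turns datetimes into strings, apply it to payloads bound for JSONB columns and keep top-level created_at/updated_at values as datetime objects.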

Cutover

After each Firestore collection is migrated and verified:

  1. Switch service to read/write Postgres
  2. Keep all Firestore write calls — continue writing to Firestore on every mutation so the data stays current there for any emergency rollback
  3. After 48h of stable operation, remove the redundant Firestore writes (one service at a time)
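Step 2 above is a dual-write: Postgres serves all reads, while every mutation is mirrored to Firestore. A minimal sketch with in-memory stand-ins; the `Store` protocol, `DualWriteRepo`, `MemoryStore`, and the choice to log (rather than fail the request) when the Firestore mirror write fails are all assumptions, not the Console's actual service code. The Console services are async; this sketch is synchronous only to stay short, and the same ordering applies with `await`.

```python
from __future__ import annotations

import logging
from typing import Protocol

log = logging.getLogger("dual_write")


class Store(Protocol):
    def get(self, key: str) -> dict | None: ...
    def put(self, key: str, value: dict) -> None: ...


class DualWriteRepo:
    """Reads from Postgres; writes go to Postgres first, then mirror to Firestore."""

    def __init__(self, postgres: Store, firestore: Store) -> None:
        self.postgres = postgres
        self.firestore = firestore

    def get(self, key: str) -> dict | None:
        return self.postgres.get(key)  # Firestore is no longer read

    def put(self, key: str, value: dict) -> None:
        self.postgres.put(key, value)  # source of truth: let failures propagate
        try:
            self.firestore.put(key, value)  # rollback copy kept current during the 48h window
        except Exception:
            # Assumed policy: don't fail the request, but make the drift visible
            log.exception("Firestore mirror write failed for %s", key)


class MemoryStore:
    """In-memory stand-in used to exercise the pattern."""

    def __init__(self) -> None:
        self.data: dict[str, dict] = {}

    def get(self, key: str) -> dict | None:
        return self.data.get(key)

    def put(self, key: str, value: dict) -> None:
        self.data[key] = dict(value)
```

Writing Postgres first means a Postgres failure never leaves Firestore ahead of the source of truth. Step 3 then amounts to deleting the mirror call, one service at a time.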

Phase 3 — Staff Auth Cutover

Status: COMPLETE (Postgres auth live, 2026-04-17). Prerequisite: Phase 2 step 2.5 complete, staff table verified.

This is the highest-risk phase because auth affects every request.

Steps

  1. Migrate admin_users Firestore collection → staff Postgres table (script: migration/migrate_staff.py)
  2. Verify: compare email list, role list, permission maps between Firestore and Postgres
  3. Update auth/dependencies.py to query Postgres staff table instead of Firestore
  4. Update staff/service.py to read/write Postgres
  5. Update seed_admin.py to write to Postgres (keep old Firestore version as seed_admin_firestore_legacy.py)
  6. Test: log in as each role, verify permissions work
  7. Only after 24h stable — remove Firestore reads from auth

Rollback plan

The JWT token payload doesn't change: it still contains sub (staff ID) and permissions. Rolling back is just reverting the auth and staff modules to their Firestore versions (auth/router.py, auth/dependencies.py, staff/service.py, staff/router.py).


Phase 4 — Audit Log System

Status: COMPLETE (shared/audit.py live, wired into auth + staff, 2026-04-17). Prerequisite: Phase 0 (audit_log table created).

The audit log system can be built and wired in incrementally — it doesn't block other phases. Wire it into each service as that service is cut over to Postgres.

The logging utility

backend/shared/audit.py — a single async function all services call:

```python
from sqlalchemy.ext.asyncio import AsyncSession


async def log_action(
    db: AsyncSession,
    actor_id: str,
    actor_name: str,
    action: str,           # "CREATE" | "UPDATE" | "DELETE" | "COMMAND" | ...
    entity_type: str,      # "customer" | "order" | "device" | ...
    entity_id: str,
    entity_label: str | None = None,
    changes: dict | None = None,   # {"field": {"old": x, "new": y}}
    meta: dict | None = None,      # {"ip": ..., "command_name": ...}
) -> None: ...
```

How to capture diffs

In service update functions:

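```python
old_data = existing_record.to_dict()          # snapshot before the update
await session.execute(update_stmt)
await session.refresh(existing_record)        # reload so the object reflects the UPDATE
new_data = existing_record.to_dict()          # snapshot after
changes = {
    k: {"old": old_data.get(k), "new": new_data.get(k)}
    for k in old_data.keys() | new_data.keys()  # also catches added and removed keys
    if old_data.get(k) != new_data.get(k)
}
if changes:  # don't log no-op updates
    await log_action(session, actor_id, actor_name, "UPDATE", "customer",
                     customer_id, label, changes)
```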

Action types

| Action | When |
|---|---|
| CREATE | Any new record created |
| UPDATE | Any field changed |
| DELETE | Any record deleted |
| COMMAND | MQTT command sent to device |
| PUBLISH | Melody published to Firestore |
| UNPUBLISH | Melody unpublished |
| LOGIN | Staff login |
| LOGOUT | Staff logout |
| PERMISSION_CHANGE | Staff permissions updated |
| STATUS_CHANGE | Order/customer/ticket status changed (convenience; also captured as UPDATE) |
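Defining the action names once keeps typos out of the free-text action column. A minimal sketch, assuming a hypothetical `AuditAction` enum and `actions_for_update` helper placed in shared/audit.py. Because the enum subclasses str, members compare equal to the plain strings log_action accepts; pass `.value` when formatting them into text, since newer Python versions render a mixed-in enum's str() as `AuditAction.UPDATE`.

```python
from enum import Enum


class AuditAction(str, Enum):
    CREATE = "CREATE"
    UPDATE = "UPDATE"
    DELETE = "DELETE"
    COMMAND = "COMMAND"                # MQTT command sent to a device
    PUBLISH = "PUBLISH"                # melody published to Firestore
    UNPUBLISH = "UNPUBLISH"
    LOGIN = "LOGIN"
    LOGOUT = "LOGOUT"
    PERMISSION_CHANGE = "PERMISSION_CHANGE"
    STATUS_CHANGE = "STATUS_CHANGE"    # logged in addition to UPDATE


def actions_for_update(changes: dict) -> list:
    """Actions to log for one update: always UPDATE, plus STATUS_CHANGE if status moved."""
    actions = [AuditAction.UPDATE]
    if "status" in changes:
        actions.append(AuditAction.STATUS_CHANGE)
    return actions
```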

API endpoint

GET /api/audit-log with query params:

  • actor_id — filter by staff member
  • entity_type + entity_id — filter by a specific record
  • action — filter by action type
  • from_date / to_date — date range
  • limit / offset — pagination (default limit: 50, max: 200)
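These filters line up with the audit_log indexes defined in Phase 0. A minimal sketch of turning the query params into a parameterized statement, assuming asyncpg-style `$n` placeholders; `build_audit_query` is hypothetical, and two behaviours are assumptions rather than documented rules: entity_id is rejected unless entity_type is also given (matching idx_audit_entity's column order), and to_date is treated as exclusive.

```python
from __future__ import annotations

from datetime import datetime

DEFAULT_LIMIT, MAX_LIMIT = 50, 200


def build_audit_query(
    actor_id: str | None = None,
    entity_type: str | None = None,
    entity_id: str | None = None,
    action: str | None = None,
    from_date: datetime | None = None,
    to_date: datetime | None = None,
    limit: int | None = None,
    offset: int = 0,
) -> tuple[str, list]:
    """Return (sql, params) for GET /api/audit-log using $1, $2, ... placeholders."""
    if entity_id is not None and entity_type is None:
        raise ValueError("entity_id requires entity_type")
    filters = [
        ("actor_id = ", actor_id),
        ("entity_type = ", entity_type),
        ("entity_id = ", entity_id),
        ("action = ", action),
        ("occurred_at >= ", from_date),
        ("occurred_at < ", to_date),  # exclusive upper bound
    ]
    clauses, params = [], []
    for prefix, value in filters:
        if value is not None:
            params.append(value)  # values are bound, never interpolated
            clauses.append(f"{prefix}${len(params)}")
    where = f" WHERE {' AND '.join(clauses)}" if clauses else ""
    limit = min(max(limit or DEFAULT_LIMIT, 1), MAX_LIMIT)  # default 50, capped at 200
    params += [limit, max(offset, 0)]
    sql = (
        f"SELECT * FROM audit_log{where} ORDER BY occurred_at DESC"
        f" LIMIT ${len(params) - 1} OFFSET ${len(params)}"
    )
    return sql, params
```

For example, `build_audit_query(entity_type="customer", entity_id="c1")` produces a WHERE clause on `entity_type = $1 AND entity_id = $2`, which idx_audit_entity can serve together with the ORDER BY.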

Phase 5 — MQTT Live Data Cutover

Status: COMPLETE — Postgres live ingestion + partition manager active 2026-04-17

What changed

  • New backend/database/pg_mqtt.py — all MQTT functions rewritten for Postgres (raw SQL, no ORM)
  • database/__init__.py — re-exports from pg_mqtt instead of core
  • main.py — removed db.init_db() / db.close_db(), added db.partition_manager_loop()
  • mqtt/router.py WebSocket auth — reads from Postgres staff table instead of Firestore admin_users
  • device_logs partitioned writes, heartbeats/commands as plain tables
  • purge_loop still runs for heartbeats/commands; device_logs purged via partition drops

Steps (original plan, now implemented)

  1. Update database/core.py insert_log, insert_heartbeat, insert_command to write to Postgres
  2. Update read functions (get_logs, get_heartbeats, etc.) similarly
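  3. The partition management background job: at startup and then monthly (or via a cron), ensure the current and next month's partitions exist:
```python
from datetime import date

from dateutil.relativedelta import relativedelta  # python-dateutil
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession


async def ensure_current_partitions(db: AsyncSession) -> None:
    """Create this month's and next month's device_logs partitions if missing."""
    for month_offset in (0, 1):  # current + next month
        d = date.today().replace(day=1) + relativedelta(months=month_offset)
        partition_name = f"device_logs_{d.strftime('%Y_%m')}"
        start = d.isoformat()
        end = (d + relativedelta(months=1)).isoformat()
        # Table names can't be bind parameters; name and bounds are derived from
        # a date, never from user input, so interpolating them is safe here.
        await db.execute(text(f"""
            CREATE TABLE IF NOT EXISTS {partition_name}
            PARTITION OF device_logs
            FOR VALUES FROM ('{start}') TO ('{end}')
        """))
    # Nothing is committed here: the caller must commit (DDL is transactional in Postgres).
```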

Log retention

  • Keep last 6 months of partitions
  • Cron job runs monthly: checks for partitions older than 6 months and drops them
  • Dropping a partition = DROP TABLE device_logs_2024_09; — instantaneous, no row-by-row delete
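The monthly retention check only needs partition names, since they encode the month. A minimal sketch, assuming the device_logs_YYYY_MM naming used by ensure_current_partitions and reading "last 6 months" as the current month plus the five before it; `partitions_to_drop` is hypothetical and only returns names, leaving the DROP TABLE to the caller.

```python
from __future__ import annotations

import re
from datetime import date

PARTITION_RE = re.compile(r"^device_logs_(\d{4})_(\d{2})$")


def partitions_to_drop(names: list[str], today: date, keep_months: int = 6) -> list[str]:
    """Return monthly partitions older than the retention window, oldest first."""
    # Express months as a single integer (year * 12 + month index) so subtraction is easy
    current = today.year * 12 + (today.month - 1)
    oldest_kept = current - (keep_months - 1)
    doomed = []
    for name in names:
        match = PARTITION_RE.match(name)
        if match is None:
            continue  # skip the parent table and anything that isn't a monthly partition
        month_index = int(match.group(1)) * 12 + (int(match.group(2)) - 1)
        if month_index < oldest_kept:
            doomed.append(name)
    return sorted(doomed)
```

On 2026-04-17 this keeps device_logs_2025_11 through the current and future partitions and returns everything from device_logs_2025_10 backwards.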

Verification Checklist (run after each phase)

  • SELECT COUNT(*) in Postgres matches source count for every migrated table
  • Sample 10 random records — compare field by field against source
  • Timestamps are stored as TIMESTAMPTZ, not TEXT strings
  • All JSONB columns parse correctly (no null where arrays expected)
  • Relevant Console pages load without errors
  • API endpoints return correct data
  • _migration_runs table shows success for all scripts
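Two checklist items (timestamps are real TIMESTAMPTZ values, JSONB arrays are never null) can be checked mechanically on the sampled rows. A minimal sketch, assuming rows are fetched as dicts and JSONB values have already been decoded; asyncpg, for instance, returns TIMESTAMPTZ as timezone-aware datetimes but returns json/jsonb as text unless a codec is registered. `check_row` and its column lists are hypothetical.

```python
from __future__ import annotations

from datetime import datetime


def check_row(row: dict, timestamp_cols: list[str], array_cols: list[str]) -> list[str]:
    """Return problems found in one sampled Postgres row (empty list = OK)."""
    problems = []
    for col in timestamp_cols:
        value = row.get(col)
        if value is not None and not isinstance(value, datetime):
            problems.append(f"{col}: expected datetime, got {type(value).__name__}")
        elif isinstance(value, datetime) and value.tzinfo is None:
            problems.append(f"{col}: naive datetime (column should be TIMESTAMPTZ)")
    for col in array_cols:
        if not isinstance(row.get(col), list):
            problems.append(f"{col}: expected an array, got {row.get(col)!r}")
    return problems
```

Run it over the 10 sampled rows per table, for example with `["created_at", "updated_at"]` and `["contacts", "notes"]` for crm_customers, alongside the field-by-field comparison against the source.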

Files & Locations

```
backend/
├── migration/                  ← all migration scripts live here
│   ├── utils.py                ← shared helpers (Firestore type converters, PG connection, etc.)
│   ├── migrate_melody_drafts.py
│   ├── migrate_crm_customers.py
│   ├── migrate_crm_orders.py
│   └── ... (one file per table)
├── shared/
│   └── audit.py                ← audit log utility (Phase 4)
└── alembic/versions/           ← never edit by hand
```

Current Status Summary

| Phase | Description | Status |
|---|---|---|
| 0 | Schema foundation (all tables in Postgres) | COMPLETE: applied on VPS 2026-04-17 |
| 1 | SQLite → Postgres (data migration) | COMPLETE: all 12 scripts ran successfully on VPS 2026-04-17 |
| 2 | Firestore → Postgres (data migration) | COMPLETE: all 5 scripts ran successfully on VPS 2026-04-17 |
| 3 | Staff auth cutover | COMPLETE: Postgres auth live 2026-04-17 |
| 4 | Audit log system | COMPLETE: shared/audit.py live, wired into auth + staff 2026-04-17 |
| 5 | MQTT live data cutover | COMPLETE: Postgres live ingestion + partition manager 2026-04-17 |

Update this table as each phase completes.


Phase 1 — Run Order & Commands

Run each command on the VPS in order. Verify the output of each before proceeding.

```bash
# 1.1
docker compose exec backend python -m migration.migrate_melody_drafts

# 1.2
docker compose exec backend python -m migration.migrate_built_melodies

# 1.3
docker compose exec backend python -m migration.migrate_mfg_audit_log

# 1.4
docker compose exec backend python -m migration.migrate_device_alerts

# 1.5
docker compose exec backend python -m migration.migrate_crm_sync_state

# 1.6  (FK enforcement suppressed: crm_customers not in PG yet)
docker compose exec backend python -m migration.migrate_crm_quotations

# 1.7
docker compose exec backend python -m migration.migrate_crm_quotation_items

# 1.8
docker compose exec backend python -m migration.migrate_crm_media

# 1.9
docker compose exec backend python -m migration.migrate_crm_comms_log

# 1.10
docker compose exec backend python -m migration.migrate_commands

# 1.11
docker compose exec backend python -m migration.migrate_heartbeats

# 1.12  (largest: batched, shows progress)
docker compose exec backend python -m migration.migrate_device_logs
```
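Running the twelve steps by hand works; if a single invocation is preferred, a small runner can stop at the first failing script so nothing downstream runs against a partial migration. A minimal sketch, assuming it is started from an interactive shell on the VPS host (so `docker compose exec` can attach a TTY) and that each script exits non-zero when it raises, which is Python's default for an uncaught exception. The module list mirrors the commands above; `run_steps`, `command_for`, and `subprocess_runner` are hypothetical.

```python
from __future__ import annotations

import subprocess
from typing import Callable, Sequence

PHASE1_MODULES = [
    "migrate_melody_drafts", "migrate_built_melodies", "migrate_mfg_audit_log",
    "migrate_device_alerts", "migrate_crm_sync_state", "migrate_crm_quotations",
    "migrate_crm_quotation_items", "migrate_crm_media", "migrate_crm_comms_log",
    "migrate_commands", "migrate_heartbeats", "migrate_device_logs",
]


def command_for(module: str) -> list[str]:
    """Same command as in the run order above, as an argv list."""
    return ["docker", "compose", "exec", "backend", "python", "-m", f"migration.{module}"]


def run_steps(modules: Sequence[str], runner: Callable[[list[str]], int]) -> list[str]:
    """Run modules in order and stop at the first non-zero exit code.

    Returns the modules that completed successfully.
    """
    done = []
    for module in modules:
        cmd = command_for(module)
        print("$", " ".join(cmd))
        if runner(cmd) != 0:
            print(f"FAILED: {module}; later steps were not run")
            break
        done.append(module)
    return done


def subprocess_runner(cmd: list[str]) -> int:
    """Run the command with inherited stdin/stdout so progress output stays visible."""
    return subprocess.run(cmd).returncode
```

Usage: `run_steps(PHASE1_MODULES, subprocess_runner)`. The runner only automates the ordering; checking each script's "Source → Postgres" line and the _migration_runs table is still manual.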

After all scripts complete, verify the run log:

```bash
docker compose exec postgres psql -U bellsystems_user -d bellsystems_db \
  -c "SELECT script_name, ran_at, source_rows, dest_rows, success FROM _migration_runs ORDER BY ran_at;"
```

Phase 2 — Run Order & Commands

crm_customers MUST run before crm_orders (FK dependency).

```bash
# 2.1
docker compose exec backend python -m migration.migrate_settings

# 2.2
docker compose exec backend python -m migration.migrate_public_features

# 2.3
docker compose exec backend python -m migration.migrate_crm_products

# 2.4  (required before 2.5)
docker compose exec backend python -m migration.migrate_crm_customers

# 2.5  (depends on 2.4)
docker compose exec backend python -m migration.migrate_crm_orders
```

Phase 3 — Run Order & Commands

Apply the new Alembic revision first (adds ui_prefs column + makes permissions nullable):

```bash
# Apply schema change
docker compose exec backend alembic upgrade head

# 3.1: migrate Firestore admin_users → Postgres staff table
docker compose exec backend python -m migration.migrate_staff

# Verify
docker compose exec postgres psql -U bellsystems_user -d bellsystems_db \
  -c "SELECT id, email, role, is_active FROM staff ORDER BY role, name;"
```

After verifying the staff table is populated correctly:

```bash
# Restart the backend so it picks up the new auth/staff code
docker compose restart backend
```

Then test: log in as each role in the Console UI and verify permissions work.

After 24h stable operation, Firestore reads from auth are fully removed (already done in code).

Rollback: revert auth/router.py, auth/dependencies.py, staff/service.py, staff/router.py to the Firestore versions — the JWT payload is unchanged so tokens remain valid during rollback.