
bonamin a605143c5d Phase 5 of Migration

2026-04-17 15:51:27 +03:00

25 KiB


Database Migration Strategy

BellSystems CP v2 — Firestore + SQLite → Postgres

This is the living plan. Update it as phases complete. Never start a phase without reading the notes from the previous one.

Database Split — Target State

Data	Target	Source	Flutter uses?
Devices	Firestore	Firestore	YES — keep
App users (device owners)	Firestore	Firestore	YES — keep
Published melodies	Firestore	Firestore	YES — keep
Draft melodies	Postgres	SQLite	No
Built melodies	Postgres	SQLite	No
CRM customers	Postgres	Firestore	No
CRM products	Postgres	Firestore	No
CRM orders	Postgres	Firestore (subcollection)	No
Console settings	Postgres	Firestore	No
Public features settings	Postgres	Firestore	No
Staff / admin users	Postgres	Firestore	No
Firmware versions	Postgres	Firestore	No
Notes / Issues	Postgres	New (done)	No
Support tickets	Postgres	New (done)	No
CRM comms log	Postgres	SQLite	No
CRM media references	Postgres	SQLite	No
CRM sync state	Postgres	SQLite	No
CRM quotations + items	Postgres	SQLite	No
Mfg audit log	Postgres	SQLite	No
Device alerts	Postgres	SQLite	No
MQTT commands	Postgres	SQLite	No
MQTT heartbeats	Postgres	SQLite	No
Device logs	Postgres (partitioned)	SQLite	No
Staff audit log	Postgres	New	No

Rule: Everything that FlutterFlow touches directly stays in Firestore forever. The Console backend continues to write to those Firestore collections exactly as today. We only stop reading from Firestore in the Console — never stop writing to it.

Deployment Context — Critical

This project runs in two environments:

Environment	SQLite data	Firestore data	Where migrations run
Local (Windows + Docker for Desktop)	Empty / stale test data	Live (correct)	Development & testing only
VPS (production Docker)	Live correct data	Live (correct)	All Phase 1 migrations run here

What this means for each phase:

Phase 0 (schema): Alembic migrations can be developed and tested locally, then the same migrations are run on the VPS via docker compose exec backend alembic upgrade head. The VPS is authoritative.
Phase 1 (SQLite → Postgres): Migration scripts must be run on the VPS only. The local SQLite is not a valid source. Do not run Phase 1 migration scripts locally and assume they reflect real data.
Phase 2 (Firestore → Postgres): Can be run on either environment (Firestore is the same), but the VPS run is the one that matters. Run locally first to verify the scripts work, then run on the VPS.
Phase 3–5: All service cutover and testing happens on the VPS.

The deployment workflow:

Develop and test code locally
Push code to VPS (git pull or equivalent)
Run docker compose exec backend alembic upgrade head on the VPS to apply schema changes
Run migration scripts on the VPS when Phase 1 begins
Verify everything on the VPS before marking a phase complete

Non-negotiable Safety Rules

Never write to a Firestore collection from a migration script; only read from it. Never delete, update, or rename documents until you have personally verified that the Postgres data is complete and correct.
Every migration script runs in a transaction — if any row fails, the entire script rolls back cleanly.
Idempotent scripts — every script uses ON CONFLICT DO NOTHING or equivalent. Safe to run twice.
Count verification before commit — each script prints Source: N docs/rows → Postgres: N rows ✓ and aborts if counts don't match.
Migration run log — a _migration_runs table in Postgres records what ran, when, how many rows, and success/failure. Check it after each script.
One domain at a time — complete and verify a full domain (schema + migration script + service cutover + smoke test) before starting the next.
No data loss = no rushing — downtime during migration is acceptable. Data loss is not.
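Here is a minimal sketch of how the transaction, idempotency, and count-verification rules combine. It uses Python's built-in sqlite3 only as a stand-in, so it runs without a server; SQLite 3.24+ accepts the same `ON CONFLICT ... DO NOTHING` clause as Postgres. `copy_rows` and the `items` table are hypothetical. Running it twice inserts nothing new, and a single failing row rolls back the whole batch.

```python
import sqlite3


def copy_rows(conn: sqlite3.Connection, rows: list) -> int:
    """Insert rows idempotently and verify counts inside one transaction.

    Hypothetical helper: sqlite3 stands in for Postgres here.
    """
    source_count = len(rows)
    with conn:  # commits on success, rolls back if anything inside raises
        conn.executemany(
            "INSERT INTO items (id, name) VALUES (?, ?) ON CONFLICT (id) DO NOTHING",
            rows,
        )
        dest_count = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
        if dest_count < source_count:
            # Raising here, before the block exits, aborts the commit.
            raise RuntimeError(f"Count mismatch: source={source_count} dest={dest_count}")
    print(f"Source: {source_count} rows → Destination: {dest_count} rows ✓")
    return dest_count


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id TEXT PRIMARY KEY, name TEXT NOT NULL)")
rows = [("a", "first"), ("b", "second")]
copy_rows(conn, rows)
copy_rows(conn, rows)  # rerun: conflicting ids are skipped, count stays 2
```

The count check has to happen inside the `with conn:` block. A check made after the block exits can only report a problem that has already been committed.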

Phase 0 — Schema Foundation

Status: COMPLETE. Alembic revision b1c2d3e4f5a6 was applied locally first, then on the VPS with docker compose exec backend alembic upgrade head (2026-04-17) before Phase 1 started.

What exists already in Postgres

entries + entry_links (notes/issues module)
support_tickets + ticket_messages (tickets module)
Alembic version history in alembic_version

What Phase 0 adds

Add the _migration_runs tracking table and all new table definitions via Alembic before any data moves.

New tables to create in this phase (schema only, no data yet):

_migration_runs — tracks what migration scripts have run
crm_products — flat columns, no JSONB needed
crm_customers — core columns + JSONB for contacts, notes, owned_items, location, technical_issues, install_support, transaction_history, crm_summary, plus TEXT[] for tags and linked_user_ids
crm_orders — core columns + JSONB for items, discount, shipping, payment_status, timeline
staff — replaces admin_users Firestore collection
console_settings — key/value or typed columns, replaces Firestore settings doc
public_features — typed columns, replaces Firestore public_features doc
crm_comms_log — mirrors current SQLite schema, adds proper TIMESTAMPTZ columns
crm_media — mirrors current SQLite schema
crm_sync_state — key/value
crm_quotations + crm_quotation_items — mirrors current SQLite schema
mfg_audit_log — mirrors current SQLite schema
device_alerts — mirrors current SQLite schema
commands — mirrors current SQLite schema
heartbeats — mirrors current SQLite schema
melody_drafts — mirrors current SQLite schema
built_melodies — mirrors current SQLite schema
device_logs — partitioned by month on received_at
audit_log — new staff action audit system (see schema below)

Key schema decisions

`device_logs` — monthly partitioning

CREATE TABLE device_logs (
    id            BIGSERIAL,
    device_serial TEXT NOT NULL,
    level         TEXT NOT NULL,
    message       TEXT NOT NULL,
    device_timestamp BIGINT,
    received_at   TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (id, received_at)
) PARTITION BY RANGE (received_at);

-- Partitions created monthly by a background job or manually:
CREATE TABLE device_logs_2025_01 PARTITION OF device_logs
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
-- etc.

CREATE INDEX idx_device_logs_serial_time ON device_logs(device_serial, received_at DESC);
CREATE INDEX idx_device_logs_level       ON device_logs(level, received_at DESC);

Dropping a partition to purge old data: DROP TABLE device_logs_2024_06; — instant, no DELETE scan.
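The bounds for any month's partition can be computed with the standard library alone, which helps when creating historic partitions by hand. This is a minimal sketch, and `partition_ddl` and `next_month` are hypothetical helper names. The upper bound is the first day of the following month because a range partition's `TO` bound is exclusive.

```python
from datetime import date


def next_month(d: date) -> date:
    """First day of the month after d (December rolls over to January)."""
    return date(d.year + (d.month == 12), d.month % 12 + 1, 1)


def partition_ddl(year: int, month: int) -> tuple:
    """Return (partition_name, CREATE TABLE statement) for one month of device_logs."""
    start = date(year, month, 1)
    end = next_month(start)  # exclusive upper bound
    name = f"device_logs_{start:%Y_%m}"
    ddl = (
        f"CREATE TABLE IF NOT EXISTS {name} PARTITION OF device_logs "
        f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}')"
    )
    return name, ddl


print(partition_ddl(2025, 12)[1])
```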

`crm_customers` — JSONB for flexible arrays

CREATE TABLE crm_customers (
    id                   TEXT PRIMARY KEY,           -- keep Firestore UUID as-is
    firestore_id         TEXT UNIQUE,                -- same value during transition, nullable later
    title                TEXT,
    name                 TEXT NOT NULL,
    surname              TEXT,
    organization         TEXT,
    religion             TEXT,
    language             TEXT NOT NULL DEFAULT 'el',
    folder_id            TEXT UNIQUE NOT NULL,
    relationship_status  TEXT NOT NULL DEFAULT 'lead',
    nextcloud_folder     TEXT,
    contacts             JSONB NOT NULL DEFAULT '[]',
    notes                JSONB NOT NULL DEFAULT '[]',
    location             JSONB,
    tags                 TEXT[] NOT NULL DEFAULT '{}',
    owned_items          JSONB NOT NULL DEFAULT '[]',
    linked_user_ids      TEXT[] NOT NULL DEFAULT '{}',
    technical_issues     JSONB NOT NULL DEFAULT '[]',
    install_support      JSONB NOT NULL DEFAULT '[]',
    transaction_history  JSONB NOT NULL DEFAULT '[]',
    crm_summary          JSONB,
    created_at           TIMESTAMPTZ NOT NULL,
    updated_at           TIMESTAMPTZ NOT NULL
);
CREATE INDEX idx_crm_customers_rel_status ON crm_customers(relationship_status);
CREATE INDEX idx_crm_customers_tags       ON crm_customers USING GIN(tags);
CREATE INDEX idx_crm_customers_name       ON crm_customers(name, surname);
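When a Firestore customer document is mapped onto this table, missing or null array fields should become empty lists so that the `NOT NULL DEFAULT '[]'` columns are satisfied. This is a minimal sketch that assumes each document arrives as a plain dict. `customer_row` is a hypothetical helper, and only a subset of columns is shown (timestamps are omitted). Because it copies only known keys, legacy fields such as `negotiating` and `has_problem` are dropped as a side effect.

```python
from typing import Any, Dict

# Array-valued columns that must never be NULL (NOT NULL DEFAULT '[]' / '{}').
JSONB_ARRAYS = ("contacts", "notes", "owned_items", "technical_issues",
                "install_support", "transaction_history")
TEXT_ARRAYS = ("tags", "linked_user_ids")
# Nullable scalar or object columns copied as-is.
SCALARS = ("title", "name", "surname", "organization", "religion",
           "nextcloud_folder", "location", "crm_summary")


def customer_row(doc_id: str, doc: Dict[str, Any]) -> Dict[str, Any]:
    """Map a Firestore customer dict to a crm_customers row (subset of columns).

    Hypothetical helper: whitelisted keys only, so unknown/legacy fields are
    dropped; missing or null arrays become [] instead of NULL.
    """
    row: Dict[str, Any] = {"id": doc_id, "firestore_id": doc_id}
    row["folder_id"] = doc["folder_id"]  # NOT NULL: fail loudly if absent
    for key in SCALARS:
        row[key] = doc.get(key)
    for key in JSONB_ARRAYS + TEXT_ARRAYS:
        value = doc.get(key)
        row[key] = list(value) if value else []
    # Mirror the column defaults when the document has no value.
    row["language"] = doc.get("language") or "el"
    row["relationship_status"] = doc.get("relationship_status") or "lead"
    return row
```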

`crm_orders` — separate table (was Firestore subcollection)

CREATE TABLE crm_orders (
    id                   TEXT PRIMARY KEY,
    customer_id          TEXT NOT NULL REFERENCES crm_customers(id) ON DELETE CASCADE,
    order_number         TEXT UNIQUE NOT NULL,
    title                TEXT,
    created_by           TEXT,
    status               TEXT NOT NULL DEFAULT 'negotiating',
    status_updated_date  TIMESTAMPTZ,
    status_updated_by    TEXT,
    items                JSONB NOT NULL DEFAULT '[]',
    subtotal             NUMERIC(12,2) NOT NULL DEFAULT 0,
    discount             JSONB,
    total_price          NUMERIC(12,2) NOT NULL DEFAULT 0,
    currency             TEXT NOT NULL DEFAULT 'EUR',
    shipping             JSONB,
    payment_status       JSONB NOT NULL DEFAULT '{}',
    invoice_path         TEXT,
    notes                TEXT,
    timeline             JSONB NOT NULL DEFAULT '[]',
    created_at           TIMESTAMPTZ NOT NULL,
    updated_at           TIMESTAMPTZ NOT NULL
);
CREATE INDEX idx_crm_orders_customer ON crm_orders(customer_id);
CREATE INDEX idx_crm_orders_status   ON crm_orders(status);

`staff` — replaces Firestore `admin_users`

CREATE TABLE staff (
    id               TEXT PRIMARY KEY,    -- keep Firestore doc ID as-is during transition
    firestore_id     TEXT UNIQUE,         -- same as id during transition
    email            TEXT UNIQUE NOT NULL,
    name             TEXT NOT NULL,
    role             TEXT NOT NULL DEFAULT 'staff',
    permissions      JSONB NOT NULL DEFAULT '{}',
    hashed_password  TEXT NOT NULL,
    is_active        BOOLEAN NOT NULL DEFAULT TRUE,
    created_at       TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at       TIMESTAMPTZ NOT NULL DEFAULT now()
);

`audit_log` — new system, no migration source

CREATE TABLE audit_log (
    id           BIGSERIAL PRIMARY KEY,
    occurred_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    actor_id     TEXT NOT NULL,
    actor_name   TEXT NOT NULL,
    action       TEXT NOT NULL,   -- CREATE | UPDATE | DELETE | COMMAND | LOGIN | LOGOUT | etc.
    entity_type  TEXT NOT NULL,   -- customer | order | device | melody | product | staff | ticket | note | quotation | etc.
    entity_id    TEXT NOT NULL,
    entity_label TEXT,            -- denormalized human name: "Church of St. George", "SN-0042", etc.
    changes      JSONB,           -- {"field": {"old": x, "new": y}, ...} — null for CREATE/DELETE/COMMAND
    meta         JSONB            -- extra context: ip_address, command_name, etc.
);
-- Indexes covering the exact filter combos we need:
CREATE INDEX idx_audit_actor      ON audit_log(actor_id,    occurred_at DESC);
CREATE INDEX idx_audit_entity     ON audit_log(entity_type, entity_id,  occurred_at DESC);
CREATE INDEX idx_audit_action     ON audit_log(action,                  occurred_at DESC);
CREATE INDEX idx_audit_occurred   ON audit_log(occurred_at DESC);

Phase 1 — SQLite → Postgres (Data Migration)

Status: COMPLETE. All 12 scripts ran successfully on the VPS (2026-04-17).
Prerequisite: Phase 0 complete (all tables exist in Postgres)

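No downtime required: the SQLite file is local and can be read while the app is running. Rows written after a script has read the file are picked up by re-running that script before cutover, since every script is idempotent. After the migration is verified, services are switched to read from Postgres.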

Migration order (least dependencies first)

Step	Table	Script
1.1	`melody_drafts`	`migration/migrate_melody_drafts.py`
1.2	`built_melodies`	`migration/migrate_built_melodies.py`
1.3	`mfg_audit_log`	`migration/migrate_mfg_audit_log.py`
1.4	`device_alerts`	`migration/migrate_device_alerts.py`
1.5	`crm_sync_state`	`migration/migrate_crm_sync_state.py`
1.6	`crm_quotations`	`migration/migrate_crm_quotations.py`
1.7	`crm_quotation_items`	`migration/migrate_crm_quotation_items.py`
1.8	`crm_media`	`migration/migrate_crm_media.py`
1.9	`crm_comms_log`	`migration/migrate_crm_comms_log.py`
1.10	`commands`	`migration/migrate_commands.py`
1.11	`heartbeats`	`migration/migrate_heartbeats.py`
1.12	`device_logs`	`migration/migrate_device_logs.py` (largest — batched)
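Batching keeps each INSERT under the database driver's per-statement bind-parameter limit and shows progress on the largest table. This is a minimal sketch of the slicing step, assuming the rows are already loaded into memory. `batches` and the batch size are hypothetical. Each batch would go through the same `on_conflict_do_nothing()` insert as the per-script pattern. All batches can share one transaction, so the all-or-nothing rule still holds.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batches(rows: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` rows, printing progress."""
    if size <= 0:
        raise ValueError("size must be positive")
    total = len(rows)
    for start in range(0, total, size):
        chunk = rows[start:start + size]
        print(f"{start + len(chunk)}/{total} rows")
        yield chunk
```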

Per-script pattern

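# Every script follows this structure
# (pg_session, read_all_from_sqlite, PgModel and log_migration_run are project helpers)
from sqlalchemy import func, select
from sqlalchemy.dialects.postgresql import insert  # PG insert provides on_conflict_do_nothing()

async def run():
    sqlite_rows = await read_all_from_sqlite("table_name")  # list of dicts keyed by column
    source_count = len(sqlite_rows)
    print(f"Source: {source_count} rows")

    async with pg_session() as session:
        async with session.begin():  # one transaction: any exception rolls everything back
            if sqlite_rows:  # .values([]) is invalid, so skip empty tables
                await session.execute(
                    insert(PgModel).values(sqlite_rows).on_conflict_do_nothing()
                )
            pg_count = await session.scalar(select(func.count()).select_from(PgModel))
            # Verify BEFORE the block exits, so a mismatch aborts the commit.
            # Rows from an earlier run may already exist, so only a shortfall fails.
            if pg_count < source_count:
                raise RuntimeError(f"Count mismatch: source={source_count} pg={pg_count}")

    print(f"Postgres: {pg_count} rows ✓")
    await log_migration_run("table_name", source_count, pg_count)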

Service cutover per domain

After each group is migrated and verified:

Update service to import from database.postgres instead of database.core
Replace aiosqlite queries with SQLAlchemy async queries
Smoke test via the Console UI — verify the page loads correctly
Leave SQLite file untouched for 48h as a fallback

Phase 2 — Firestore → Postgres (Data Migration)

Status: COMPLETE. All 5 scripts ran successfully on the VPS (2026-04-17).
Prerequisite: Phase 1 complete

Requires shared.firebase.get_db() to read from Firestore. Scripts run with Firebase Admin SDK — same SDK already initialized in the backend.

Migration order

Step	Collection	Script	Notes
2.1	`settings` (doc)	`migration/migrate_settings.py`	Single document
2.2	`public_features` (doc)	`migration/migrate_public_features.py`	Single document
2.3	`crm_products`	`migration/migrate_crm_products.py`	No dependencies
2.4	`crm_customers`	`migration/migrate_crm_customers.py`	Strip legacy `negotiating`/`has_problem` fields
2.5	`orders` (subcollection)	`migration/migrate_crm_orders.py`	Uses `collection_group("orders")`
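`collection_group("orders")` returns documents from every collection named `orders`, so each order's parent customer has to be recovered from its path. In the Python Firestore client, that path is `snapshot.reference.path`. This is a minimal sketch that assumes orders sit directly under `crm_customers/{id}`, and `split_order_path` and `money` are hypothetical helpers. Prices go through `Decimal(str(x))` so that a float such as 19.99 lands in a `NUMERIC(12,2)` column without binary rounding noise.

```python
from __future__ import annotations

from decimal import ROUND_HALF_UP, Decimal


def split_order_path(path: str) -> tuple:
    """Return (customer_id, order_id) from 'crm_customers/{cid}/orders/{oid}'.

    Raises on any other shape, so an unrelated 'orders' subcollection is never
    attached to the wrong parent.
    """
    parts = path.split("/")
    if len(parts) != 4 or parts[0] != "crm_customers" or parts[2] != "orders":
        raise ValueError(f"unexpected order path: {path}")
    return parts[1], parts[3]


def money(value: float | int | None) -> Decimal:
    """Convert a Firestore number (or None) to a 2-decimal Decimal for NUMERIC(12,2)."""
    return Decimal(str(value or 0)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```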

Converting Firestore types

Use the existing _convert_firestore_value helpers in devices/service.py — copy into a shared migration/utils.py. Key conversions:

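DatetimeWithNanoseconds → .isoformat() string when the value is nested inside a JSONB column. For TIMESTAMPTZ columns, pass it through as-is, since it is a datetime subclass.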
GeoPoint → {"lat": x, "lng": y} dict
DocumentReference → .id string (just the doc ID, no path)
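A recursive converter along these lines covers all three conversions for values headed into JSONB. This is a hedged sketch, not the existing `_convert_firestore_value` from devices/service.py. It uses duck typing so that it runs without the Firebase SDK installed: `latitude`/`longitude` identify a GeoPoint, and `id`/`path` identify a DocumentReference. The real helper can use `isinstance` checks against the SDK classes instead. `DatetimeWithNanoseconds` subclasses `datetime`, so a plain `isinstance(value, datetime)` check catches it.

```python
from datetime import datetime
from typing import Any


def convert_firestore_value(value: Any) -> Any:
    """Recursively turn Firestore SDK values into JSON-safe Python values (sketch)."""
    if isinstance(value, datetime):  # includes DatetimeWithNanoseconds
        return value.isoformat()
    if hasattr(value, "latitude") and hasattr(value, "longitude"):  # GeoPoint
        return {"lat": value.latitude, "lng": value.longitude}
    if hasattr(value, "id") and hasattr(value, "path"):  # DocumentReference
        return value.id
    if isinstance(value, dict):
        return {k: convert_firestore_value(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [convert_firestore_value(v) for v in value]
    return value
```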

Cutover

After each Firestore collection is migrated and verified:

Switch service to read/write Postgres
Keep all Firestore write calls — continue writing to Firestore on every mutation so the data stays current there for any emergency rollback
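After 48h of stable operation, remove the redundant Firestore writes (one service at a time). This applies only to these Console-only collections; collections that FlutterFlow uses keep receiving Firestore writes, per the rule above.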
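The "keep writing to Firestore" step can be expressed as a small wrapper around the two writes. This is a minimal sketch, and `dual_write` and the injected writer callables are hypothetical. The policy shown is one design choice: a Postgres failure aborts the operation, while a Firestore failure is only logged. If the rollback copy must never drift, the stricter alternative is to fail the request on a Firestore error.

```python
import logging
from typing import Any, Awaitable, Callable, Dict

log = logging.getLogger("dual_write")

# A writer takes (doc_id, data) and performs one async write.
Writer = Callable[[str, Dict[str, Any]], Awaitable[None]]


async def dual_write(pg_write: Writer, fs_write: Writer,
                     doc_id: str, data: Dict[str, Any]) -> None:
    """Write to Postgres first, then mirror the same data to Firestore.

    If the Postgres write raises, nothing is mirrored and the error propagates.
    If only the Firestore write fails, it is logged and the call still succeeds,
    because Firestore is kept here only as a rollback copy.
    """
    await pg_write(doc_id, data)
    try:
        await fs_write(doc_id, data)
    except Exception:
        log.exception("Firestore mirror write failed for %s", doc_id)
```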

Phase 3 — Staff Auth Cutover

Status: COMPLETE. Postgres auth live 2026-04-17.
Prerequisite: Phase 2 step 2.5 complete, staff table verified

This is the highest-risk phase because auth affects every request.

Steps

Migrate admin_users Firestore collection → staff Postgres table (script: migration/migrate_staff.py)
Verify: compare email list, role list, permission maps between Firestore and Postgres
Update auth/dependencies.py to query Postgres staff table instead of Firestore
Update staff/service.py to read/write Postgres
Update seed_admin.py to write to Postgres (keep old Firestore version as seed_admin_firestore_legacy.py)
Test: log in as each role, verify permissions work
Only after 24h stable — remove Firestore reads from auth
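The "compare email list, role list, permission maps" verification can be scripted rather than eyeballed. This is a minimal sketch that assumes both sides have been loaded as lists of dicts with `email`, `role`, and `permissions` keys. `compare_staff` is hypothetical, and matching on lower-cased email is an assumption.

```python
from typing import Any, Dict, List


def compare_staff(fs_users: List[Dict[str, Any]],
                  pg_staff: List[Dict[str, Any]]) -> List[str]:
    """Return differences between Firestore admin_users and Postgres staff.

    Rows are matched by lower-cased email and compared on role and permissions.
    An empty list means both sides agree.
    """
    fs = {u["email"].lower(): u for u in fs_users}
    pg = {s["email"].lower(): s for s in pg_staff}
    problems = [f"missing in Postgres: {e}" for e in sorted(fs.keys() - pg.keys())]
    problems += [f"extra in Postgres: {e}" for e in sorted(pg.keys() - fs.keys())]
    for email in sorted(fs.keys() & pg.keys()):
        for field in ("role", "permissions"):
            if fs[email].get(field) != pg[email].get(field):
                problems.append(f"{email}: {field} differs")
    return problems
```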

Rollback plan

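The JWT token payload doesn't change: it still contains sub (staff ID) and permissions. Rolling back means reverting the auth and staff modules to their Firestore versions. That covers auth/dependencies.py and staff/service.py, plus auth/router.py and staff/router.py as listed in the rollback note under Phase 3 — Run Order & Commands.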

Phase 4 — Audit Log System

Status: COMPLETE. shared/audit.py is live and wired into auth and staff (2026-04-17).
Prerequisite: Phase 0 (audit_log table created)

The audit log system can be built and wired in incrementally — it doesn't block other phases. Wire it into each service as that service is cut over to Postgres.

The logging utility

backend/shared/audit.py — a single async function all services call:

async def log_action(
    db: AsyncSession,
    actor_id: str,
    actor_name: str,
    action: str,           # "CREATE" | "UPDATE" | "DELETE" | "COMMAND" | ...
    entity_type: str,      # "customer" | "order" | "device" | ...
    entity_id: str,
    entity_label: str | None = None,
    changes: dict | None = None,   # {"field": {"old": x, "new": y}}
    meta: dict | None = None,      # {"ip": ..., "command_name": ...}
) -> None:
    """Insert one audit_log row describing a staff action."""
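The sketch below shows one way to fill in the body, under two assumptions. First, the SQL runs through SQLAlchemy's `text()` with named parameters. Second, `changes` and `meta` are sent as JSON text and converted with `CAST(... AS JSONB)`. That cast form sidesteps the `::jsonb` syntax, which is easy to trip over next to `text()`'s `:name` parameters. `AUDIT_INSERT` and `audit_params` are hypothetical names. Only the pure parameter-building part is shown, so it runs without a database. Inside `log_action`, the call would be `await db.execute(text(AUDIT_INSERT), audit_params(...))`.

```python
from __future__ import annotations

import json
from typing import Any

# id and occurred_at are filled by their column defaults (BIGSERIAL, now()).
AUDIT_INSERT = (
    "INSERT INTO audit_log (actor_id, actor_name, action, entity_type, entity_id, "
    "entity_label, changes, meta) VALUES (:actor_id, :actor_name, :action, "
    ":entity_type, :entity_id, :entity_label, CAST(:changes AS JSONB), CAST(:meta AS JSONB))"
)


def _json_or_none(obj: dict | None) -> str | None:
    # default=str stringifies datetimes/Decimals that can appear in diffs.
    return None if obj is None else json.dumps(obj, default=str)


def audit_params(
    actor_id: str,
    actor_name: str,
    action: str,
    entity_type: str,
    entity_id: str,
    entity_label: str | None = None,
    changes: dict | None = None,
    meta: dict | None = None,
) -> dict[str, Any]:
    """Build bind parameters for AUDIT_INSERT (mirrors log_action's arguments)."""
    return {
        "actor_id": actor_id,
        "actor_name": actor_name,
        "action": action,
        "entity_type": entity_type,
        "entity_id": entity_id,
        "entity_label": entity_label,
        "changes": _json_or_none(changes),
        "meta": _json_or_none(meta),
    }
```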

How to capture diffs

In service update functions:

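old_data = existing_record.to_dict()          # snapshot before the update
await session.execute(update_stmt)
await session.refresh(existing_record)        # reload so the object reflects the UPDATE
new_data = existing_record.to_dict()          # snapshot after
changes = {
    k: {"old": old_data.get(k), "new": new_data.get(k)}  # .get(): key may be new
    for k in new_data
    if old_data.get(k) != new_data.get(k)
}
await log_action(
    session, actor_id, actor_name, "UPDATE", "customer",
    customer_id, entity_label=label, changes=changes,
)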

Action types

Action	When
`CREATE`	Any new record created
`UPDATE`	Any field changed
`DELETE`	Any record deleted
`COMMAND`	MQTT command sent to device
`PUBLISH`	Melody published to Firestore
`UNPUBLISH`	Melody unpublished
`LOGIN`	Staff login
`LOGOUT`	Staff logout
`PERMISSION_CHANGE`	Staff permissions updated
`STATUS_CHANGE`	Order/customer/ticket status changed (convenience — also captured as UPDATE)

API endpoint

GET /api/audit-log with query params:

actor_id — filter by staff member
entity_type + entity_id — filter by a specific record
action — filter by action type
from_date / to_date — date range
limit / offset — pagination (default limit: 50, max: 200)
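These filters map onto the audit_log indexes from Phase 0, which pair actor, entity, or action with occurred_at DESC. Below is a minimal sketch of a parameterized query builder for this endpoint. `build_audit_query` is hypothetical, and it makes two assumptions: the named `:param` placeholders target SQLAlchemy `text()`, and `to_date` is exclusive. Column names come from a fixed tuple, never from the request, so only values are bound.

```python
from __future__ import annotations

from datetime import datetime
from typing import Any

DEFAULT_LIMIT, MAX_LIMIT = 50, 200


def build_audit_query(actor_id: str | None = None, entity_type: str | None = None,
                      entity_id: str | None = None, action: str | None = None,
                      from_date: datetime | None = None, to_date: datetime | None = None,
                      limit: int | None = None, offset: int = 0) -> tuple[str, dict[str, Any]]:
    """Build a parameterized SELECT for GET /api/audit-log filters (sketch)."""
    clauses: list[str] = []
    params: dict[str, Any] = {}
    for col, val in (("actor_id", actor_id), ("entity_type", entity_type),
                     ("entity_id", entity_id), ("action", action)):
        if val is not None:
            clauses.append(f"{col} = :{col}")
            params[col] = val
    if from_date is not None:
        clauses.append("occurred_at >= :from_date")
        params["from_date"] = from_date
    if to_date is not None:
        clauses.append("occurred_at < :to_date")  # exclusive upper bound (assumption)
        params["to_date"] = to_date
    # Clamp pagination: default 50, never above 200, never negative.
    params["limit"] = min(max(limit or DEFAULT_LIMIT, 1), MAX_LIMIT)
    params["offset"] = max(offset, 0)
    where = f"WHERE {' AND '.join(clauses)} " if clauses else ""
    sql = f"SELECT * FROM audit_log {where}ORDER BY occurred_at DESC LIMIT :limit OFFSET :offset"
    return sql, params
```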

Phase 5 — MQTT Live Data Cutover

Status: COMPLETE — Postgres live ingestion + partition manager active 2026-04-17

What changed

New backend/database/pg_mqtt.py — all MQTT functions rewritten for Postgres (raw SQL, no ORM)
database/__init__.py — re-exports from pg_mqtt instead of core
main.py — removed db.init_db() / db.close_db(), added db.partition_manager_loop()
mqtt/router.py WebSocket auth — reads from Postgres staff table instead of Firestore admin_users
device_logs partitioned writes, heartbeats/commands as plain tables
purge_loop still runs for heartbeats/commands; device_logs purged via partition drops

Steps (original plan, now implemented; the Postgres versions live in database/pg_mqtt.py rather than database/core.py)

Update database/core.py insert_log, insert_heartbeat, insert_command to write to Postgres
Update read functions (get_logs, get_heartbeats, etc.) similarly
The partition management background job runs at startup and then on a recurring schedule (or via cron). Each run ensures that both the current and next month's partitions exist:

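from datetime import date

from dateutil.relativedelta import relativedelta  # third-party: python-dateutil
from sqlalchemy import text
from sqlalchemy.ext.asyncio import AsyncSession

async def ensure_current_partitions(db: AsyncSession) -> None:
    first_of_month = date.today().replace(day=1)
    for month_offset in (0, 1):  # current + next month
        d = first_of_month + relativedelta(months=month_offset)
        partition_name = f"device_logs_{d.strftime('%Y_%m')}"
        start = d.isoformat()
        end = (d + relativedelta(months=1)).isoformat()  # exclusive upper bound
        # Identifiers cannot be bound as parameters; both values come from the
        # dates computed above, never from user input, so interpolation is safe.
        await db.execute(text(f"""
            CREATE TABLE IF NOT EXISTS {partition_name}
            PARTITION OF device_logs
            FOR VALUES FROM ('{start}') TO ('{end}')
        """))
    # DDL is transactional in Postgres: the caller must commit for it to persist.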

Log retention

Keep last 6 months of partitions
Cron job runs monthly: checks for partitions older than 6 months and drops them
Dropping a partition = DROP TABLE device_logs_2024_09; — instantaneous, no row-by-row delete
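The monthly retention job has to decide which partitions fall outside the window. This is a minimal sketch, and `partitions_to_drop` is hypothetical. It assumes "last 6 months" means the current month plus the six months before it. Names that do not match `device_logs_YYYY_MM` are never returned.

```python
import re
from datetime import date
from typing import List

PARTITION_RE = re.compile(r"^device_logs_(\d{4})_(\d{2})$")


def partitions_to_drop(names: List[str], today: date, keep_months: int = 6) -> List[str]:
    """Return device_logs partitions older than the retention window (sketch)."""
    # Months counted from year 0, so "N months earlier" is plain subtraction.
    cutoff = today.year * 12 + (today.month - 1) - keep_months
    old = []
    for name in names:
        m = PARTITION_RE.match(name)
        if m and int(m.group(1)) * 12 + int(m.group(2)) - 1 < cutoff:
            old.append(name)
    return sorted(old)
```

The partition names can be listed from the catalog with `SELECT c.relname FROM pg_inherits i JOIN pg_class c ON c.oid = i.inhrelid WHERE i.inhparent = 'device_logs'::regclass;`. Each name returned by the helper is then passed to DROP TABLE.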

Verification Checklist (run after each phase)

SELECT COUNT(*) in Postgres matches source count for every migrated table
Sample 10 random records — compare field by field against source
Timestamps are stored as TIMESTAMPTZ, not TEXT strings
All JSONB columns parse correctly (no null where arrays expected)
Relevant Console pages load without errors
API endpoints return correct data
_migration_runs table shows success for all scripts
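The "sample 10 random records" check can be automated once both sides are loaded as dicts keyed by record ID. This is a minimal sketch, and `sample_mismatches` is hypothetical. It assumes both sides were normalized the same way first, for example with timestamps converted to the same type. Otherwise every timestamp field would be flagged. A fixed `seed` makes a run reproducible.

```python
from __future__ import annotations

import random
from typing import Any


def sample_mismatches(source: dict[str, dict[str, Any]], dest: dict[str, dict[str, Any]],
                      k: int = 10, seed: int | None = None) -> list[str]:
    """Compare k random source records field by field against the destination."""
    rng = random.Random(seed)
    ids = rng.sample(sorted(source), min(k, len(source)))
    problems = []
    for rec_id in ids:
        if rec_id not in dest:
            problems.append(f"{rec_id}: missing in destination")
            continue
        for field, value in source[rec_id].items():
            got = dest[rec_id].get(field)
            if got != value:
                problems.append(f"{rec_id}.{field}: {value!r} != {got!r}")
    return sorted(problems)
```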

Files & Locations

backend/
├── migration/                  ← all migration scripts live here
│   ├── utils.py                ← shared helpers (Firestore type converters, PG connection, etc.)
│   ├── migrate_melody_drafts.py
│   ├── migrate_crm_customers.py
│   ├── migrate_crm_orders.py
│   └── ... (one file per table)
├── shared/
│   └── audit.py                ← audit log utility (Phase 4)
└── alembic/versions/           ← never edit by hand

Current Status Summary

Phase	Description	Status
0	Schema foundation (all tables in Postgres)	COMPLETE — applied on VPS 2026-04-17
1	SQLite → Postgres (data migration)	COMPLETE — all 12 scripts ran successfully on VPS 2026-04-17
2	Firestore → Postgres (data migration)	COMPLETE — all 5 scripts ran successfully on VPS 2026-04-17
3	Staff auth cutover	COMPLETE — Postgres auth live 2026-04-17
4	Audit log system	COMPLETE — shared/audit.py live, wired into auth + staff 2026-04-17
5	MQTT live data cutover	COMPLETE — Postgres live ingestion + partition manager 2026-04-17

Update this table as each phase completes.

Phase 1 — Run Order & Commands

Run each command on the VPS in order. Verify the output of each before proceeding.

# 1.1
docker compose exec backend python -m migration.migrate_melody_drafts

# 1.2
docker compose exec backend python -m migration.migrate_built_melodies

# 1.3
docker compose exec backend python -m migration.migrate_mfg_audit_log

# 1.4
docker compose exec backend python -m migration.migrate_device_alerts

# 1.5
docker compose exec backend python -m migration.migrate_crm_sync_state

# 1.6  (FK enforcement suppressed — crm_customers not in PG yet)
docker compose exec backend python -m migration.migrate_crm_quotations

# 1.7
docker compose exec backend python -m migration.migrate_crm_quotation_items

# 1.8
docker compose exec backend python -m migration.migrate_crm_media

# 1.9
docker compose exec backend python -m migration.migrate_crm_comms_log

# 1.10
docker compose exec backend python -m migration.migrate_commands

# 1.11
docker compose exec backend python -m migration.migrate_heartbeats

# 1.12  (largest — batched, shows progress)
docker compose exec backend python -m migration.migrate_device_logs

After all scripts complete, verify the run log:

docker compose exec postgres psql -U bellsystems_user -d bellsystems_db \
  -c "SELECT script_name, ran_at, source_rows, dest_rows, success FROM _migration_runs ORDER BY ran_at;"

Phase 2 — Run Order & Commands

crm_customers MUST run before crm_orders (FK dependency).

# 2.1
docker compose exec backend python -m migration.migrate_settings

# 2.2
docker compose exec backend python -m migration.migrate_public_features

# 2.3
docker compose exec backend python -m migration.migrate_crm_products

# 2.4  (required before 2.5)
docker compose exec backend python -m migration.migrate_crm_customers

# 2.5  (depends on 2.4)
docker compose exec backend python -m migration.migrate_crm_orders

Phase 3 — Run Order & Commands

Apply the new Alembic revision first (adds ui_prefs column + makes permissions nullable):

# Apply schema change
docker compose exec backend alembic upgrade head

# 3.1 — migrate Firestore admin_users → Postgres staff table
docker compose exec backend python -m migration.migrate_staff

# Verify
docker compose exec postgres psql -U bellsystems_user -d bellsystems_db \
  -c "SELECT id, email, role, is_active FROM staff ORDER BY role, name;"

After verifying the staff table is populated correctly:

# Restart the backend so it picks up the new auth/staff code
docker compose restart backend

Then test: log in as each role in the Console UI and verify permissions work.

The final step of the plan (removing Firestore reads from auth after 24h of stable operation) is already done in code.

Rollback: revert auth/router.py, auth/dependencies.py, staff/service.py, staff/router.py to the Firestore versions — the JWT payload is unchanged so tokens remain valid during rollback.
