Transcription Platform — Full Handover

TRANSCRIPTION_HANDOVER.md
# Legal Transcription and Document Automation Platform
# Handover and Project Reference Document
# ============================================
# Status:  ON HOLD - Ready to build
# Server:  jfmsrv01 (192.168.100.197)
# Updated: 2026-05-28
# ============================================


═══════════════════════════════════════════════════════════════
SECTION 1 — PROJECT STATUS AND CONTEXT
═══════════════════════════════════════════════════════════════

STATUS: ON HOLD — ALL PLANNING COMPLETE, BUILD NOT YET STARTED

The server infrastructure is fully ready. All planning documents,
architecture decisions, build scripts, and design documents are
complete. The project is paused intentionally and can be resumed
at any time by running the build scripts in order.

WHAT IS DONE
────────────
- Full master project plan written and reviewed (see Section 3)
- Server and storage design plan written and reviewed (see Section 4)
- All five mandatory foundations designed:
    1. Feature flags
    2. Tenant settings
    3. Storage abstraction (with path traversal protection)
    4. AI abstraction (with privacy tier enforcement)
    5. Event-based modules
- Build scripts ready in /jfmsrv01-build/ on the server
- Server infrastructure fully provisioned:
    PostgreSQL, Redis, PHP 8.3, Python 3.12, ffmpeg,
    LibreOffice, ClamAV, Nginx, Supervisor, Node.js
- Stack decision made: Laravel/PHP + Python workers + PostgreSQL + Redis

WHAT IS NOT YET DONE
─────────────────────
- Storage layout (script ready, not run)
- Application skeleton (script ready, not run)
- Admin safety tools (script ready, not run)
- First firm creation (script ready, not run)
- Any application code
- Nginx vhost for transcription platform
- Any database tables for the transcription platform


═══════════════════════════════════════════════════════════════
SECTION 2 — HOW TO RESUME THIS PROJECT
═══════════════════════════════════════════════════════════════

All build scripts are in /jfmsrv01-build/ on the server.
Run them in order. Each is idempotent (safe to re-run).

STEP 1 — Storage layout
────────────────────────
  sudo bash /jfmsrv01-build/01_storage_layout.sh

  Creates:
    /data/jfmsrv01/firms/{firm_id}/recordings
    /data/jfmsrv01/firms/{firm_id}/transcripts
    /data/jfmsrv01/firms/{firm_id}/documents
    /data/jfmsrv01/firms/{firm_id}/templates
    /data/jfmsrv01/firms/{firm_id}/attachments
    /data/jfmsrv01/firms/{firm_id}/exports
    /data/jfmsrv01/firms/{firm_id}/temp
    /data/jfmsrv01/firms/{firm_id}/reports
    /data/jfmsrv01/firms/{firm_id}/backups
    /data/jfmsrv01/quarantine/
    /data/jfmsrv01/processing/
    /data/jfmsrv01/imports/
    /data/jfmsrv01/shared/
    /backups/jfmsrv01/
  Also creates:
    ClamAV scan helper script
    Temp/processing cleanup cron job (every 30 minutes)
    Firm storage creation helper script
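
The firm storage helper itself is a shell script. Below is a minimal Python sketch of the same idempotent behaviour; `create_firm_storage` is a hypothetical name. The subfolder names come from the list above. Ownership is assumed to come from running as the jfmsrv01 user (as with `sudo -u jfmsrv01` in Section 12) rather than from an explicit chown.

```python
import os
from pathlib import Path

# Per-firm subfolders, as listed in the Step 1 output.
FIRM_SUBDIRS = (
    "recordings", "transcripts", "documents", "templates",
    "attachments", "exports", "temp", "reports", "backups",
)


def create_firm_storage(base: Path, firm_id: int, mode: int = 0o750) -> Path:
    """Create <base>/firms/<firm_id>/<subdir> idempotently; return the firm root.

    Hypothetical Python equivalent of the firm storage helper script.
    """
    # Reject anything that is not a positive integer (bool is an int subclass).
    if isinstance(firm_id, bool) or not isinstance(firm_id, int) or firm_id <= 0:
        raise ValueError(f"firm_id must be a positive integer, got {firm_id!r}")
    firm_root = base / "firms" / str(firm_id)
    for name in FIRM_SUBDIRS:
        path = firm_root / name
        # exist_ok=True makes re-runs safe, matching the idempotent-script rule.
        path.mkdir(parents=True, exist_ok=True)
        # mkdir's mode argument is masked by the umask, so set it explicitly.
        os.chmod(path, mode)
    os.chmod(firm_root, mode)
    return firm_root
```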

STEP 2 — Application skeleton
───────────────────────────────
  sudo bash /jfmsrv01-build/02_app_skeleton.sh

  Creates:
    Laravel application at /opt/jfmsrv01/app
    All five mandatory foundations
    Database migrations (firms, users, feature_flags, matters,
    notifications, audit_logs)
    21 event classes
    20 feature definitions seeded
    4 Supervisor queue workers
    .env configuration file

STEP 3 — Admin safety tools
────────────────────────────
  sudo bash /jfmsrv01-build/03_admin_safety_tools.sh

  Creates:
    CHANGELOG.md
    HANDOVER.md
    backup.sh (source/database/files/tenant/full)
    smoke_test.sh
    patch.sh (patch workflow)
    diagnostics.sh (JSON for admin panel)
    route_list.sh
    log_tail.sh

STEP 4 — Create first firm
───────────────────────────
  sudo bash /jfmsrv01-build/04_first_firm.sh

  Creates:
    First firm record in database
    Platform admin user
    Firm admin user
    Firm storage directories
    Level 1 features enabled

STEP 5 — Configure Nginx
──────────────────────────
  After the app is running, add an Nginx vhost:
    /etc/nginx/sites-available/transcription
  Choose a port (e.g. 8090) or set up a domain.
  Run: sudo nginx -t && sudo systemctl reload nginx

THEN — Continue with Phase 1 build list (see Section 6)


═══════════════════════════════════════════════════════════════
SECTION 3 — MASTER PROJECT PLAN SUMMARY
═══════════════════════════════════════════════════════════════

Full document: legal_transcription_platform_master_plan.txt
Available on handoff portal under: transcription/master-plan.html

CORE PURPOSE
─────────────
Law firms provide audio recordings, dictation, uploaded letters,
emails, existing documents, or live speech. The system transcribes
the content, identifies the document type, selects the correct
Word template, inserts the content, and saves a professional legal
document into the correct client/matter/lawyer folder.

The system assists legal professionals. It must never claim
AI-generated work is final legal advice. Final review belongs
to the lawyer.

DEPLOYMENT MODELS (all use the same codebase)
───────────────────────────────────────────────
  Model 1: Hosted multi-tenant — multiple firms, one server
  Model 2: Hosted single-tenant dedicated — one firm, one server
  Model 3: Onsite single-tenant — installed at law firm premises
  Model 4: Hosted + local connector agent (preferred for SMB access)

SYSTEM HIERARCHY
─────────────────
  Platform / Host Admin
    -> Firm / Tenant
        -> Users (Lawyers, Secretaries, Reviewers, Firm Admins, Read-only)

THE FIVE MANDATORY FOUNDATIONS (must exist from day one)
──────────────────────────────────────────────────────────
  1. Feature flags
     - Enable/disable globally, per plan, per firm, per role, per user
     - Disabled features disappear from menus, blocked at routes,
       background jobs do not run

  2. Tenant settings
     - Every firm has its own branding, users, permissions, templates,
       folders, AI settings, email, jurisdiction, backup, retention

  3. Storage abstraction
     - All file access through StorageService
     - Supports: local disk, SMB, SFTP, FTP, cloud/object storage
     - Path traversal protection built in
     - Per-firm isolated storage paths

  4. AI abstraction with privacy tiers
     - All AI calls through AiService
     - Privacy tiers: 1=on-premise, 2=private API, 3=shared API
     - Firm's permitted tier enforced on every AI call
     - AI router chooses by: task, cost, accuracy, speed, privacy,
       context size, firm preference, fallback availability

  5. Event-based modules
     - Modules communicate through events, not direct calls
     - 21 event classes defined (AudioUploaded, TranscriptionCompleted,
       DocumentApproved, BackupFailed, etc.)

STACK DECISION (locked in)
───────────────────────────
  Web framework:  Laravel / PHP 8.3
  Workers:        Python 3.12 (transcription, AI, document processing)
  Database:       PostgreSQL (row-level security, better JSON, integrity)
  Queue/cache:    Redis (AOF persistence, noeviction policy)
  Web server:     Nginx
  Documents:      LibreOffice headless + python-docx
  Audio:          ffmpeg
  Virus scanning: ClamAV

WHY THIS STACK
───────────────
  Laravel: best multi-tenant SaaS support in the PHP ecosystem
  Python workers: better AI/transcription library ecosystem than PHP
  PostgreSQL: row-level security for tenant isolation, superior JSON
  Redis: reliable queue backend with persistence
  Everything already installed on jfmsrv01

TENANCY MODEL (MVP)
────────────────────
  Shared database with firm_id on every tenant-owned table.
  Separate per-firm storage folders.
  Designed to support future database-per-firm without rewrites.

CRITICAL RULES FOR ALL CODE
─────────────────────────────
  - Every tenant-owned model must have firm_id
  - Every query must be tenant-scoped by default (TenantScope)
  - All file operations through StorageService (never hard-code paths)
  - All AI calls through AiService (enforces privacy tier)
  - All feature checks through FeatureFlagService
  - All important actions through AuditService::log()
  - Every background job must carry firm_id
  - No firm name, path, API key, or branding ever hard-coded
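
Inside Laravel, TenantScope applies the firm filter automatically. The Python workers read the same tables, so they need an equivalent guard. The following is a minimal sketch that assumes a psycopg-style driver (`%s` placeholders). `TenantContext` and `scoped_select` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import Any, Sequence, Tuple

# Table and column names must come from code, never from user input.
_IDENT = re.compile(r"[a-z_][a-z0-9_]*")


@dataclass(frozen=True)
class TenantContext:
    """The firm a request or background job is acting for (hypothetical)."""
    firm_id: int

    def __post_init__(self) -> None:
        if isinstance(self.firm_id, bool) or not isinstance(self.firm_id, int) or self.firm_id <= 0:
            raise ValueError("every query and job must carry a valid firm_id")


def scoped_select(ctx: TenantContext, table: str, columns: Sequence[str],
                  where: str = "", params: Sequence[Any] = ()) -> Tuple[str, Tuple[Any, ...]]:
    """Build a SELECT whose first condition is always firm_id = ctx.firm_id.

    Values are returned separately for %s placeholders (psycopg parameter style).
    """
    for name in (table, *columns):
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    sql = f"SELECT {', '.join(columns)} FROM {table} WHERE firm_id = %s"
    if where:
        # Parenthesise the caller's condition so an OR cannot bypass the firm filter.
        sql += f" AND ({where})"
    return sql, (ctx.firm_id, *params)
```

The parentheses matter because AND binds tighter than OR. Without them, a caller condition containing OR would return rows belonging to other firms.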


═══════════════════════════════════════════════════════════════
SECTION 4 — SERVER AND STORAGE DESIGN SUMMARY
═══════════════════════════════════════════════════════════════

Full document: legal_transcription_server_storage_design_plan.txt
Available on handoff portal under: transcription/storage-design.html

SERVER SPECIFICATION
─────────────────────
  VM:          jfmsrv01
  OS:          Ubuntu 24.04 LTS
  vCPU:        8 cores
  RAM:         32 GB
  OS disk:     150 GB SSD
  Data disk:   2 TB SSD (mount at /data)
  Backup:      separate off-server target

STORAGE LAYOUT
───────────────
  /data/jfmsrv01/
    firms/
      {firm_id}/
        recordings/      — uploaded audio files
        transcripts/     — raw and cleaned transcripts
        documents/       — generated Word documents
        templates/       — firm Word templates
        attachments/     — email attachments, uploaded docs
        exports/         — firm exports
        temp/            — cleaned after 24 hours
        reports/         — generated reports
        backups/         — per-firm backups
    quarantine/          — virus scan failures (admin review only)
    processing/          — active worker temp files (alert if stale >2h)
    imports/             — connector staging (alert if stale >1h)
    shared/              — platform-level assets only
    system/              — lock files, alerts, health check outputs

  /backups/jfmsrv01/
    source/              — application source backups
    database/            — database dumps
    files/               — file/data backups
    tenants/             — per-firm backups
    handovers/           — changelog and handover history
    reports/             — smoke test and backup reports
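
The age thresholds above lend themselves to a small modification-time check. The following is a sketch that assumes the cleanup cron job compares file mtimes. `find_stale`, `clean_temp`, and the choice to only report (not delete) stale files in processing/ and imports/ are assumptions.

```python
import time
from pathlib import Path
from typing import List, Optional

# Ages in seconds, taken from the layout notes above.
STALE_LIMITS = {
    "processing": 2 * 3600,  # alert if a worker file is older than 2 hours
    "imports": 1 * 3600,     # alert if connector staging is older than 1 hour
}
TEMP_MAX_AGE = 24 * 3600     # firm temp/ files are removed after 24 hours


def find_stale(directory: Path, max_age: float, now: Optional[float] = None) -> List[Path]:
    """Return files under `directory` whose mtime is more than max_age seconds old."""
    now = time.time() if now is None else now
    return sorted(
        p for p in directory.rglob("*")
        if p.is_file() and now - p.stat().st_mtime > max_age
    )


def clean_temp(data_root: Path, now: Optional[float] = None) -> List[Path]:
    """Delete expired files in every firms/<firm_id>/temp folder; return what was removed."""
    removed = []
    for temp_dir in sorted(data_root.glob("firms/*/temp")):
        for path in find_stale(temp_dir, TEMP_MAX_AGE, now):
            path.unlink()
            removed.append(path)
    return removed
```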

FILESYSTEM PERMISSIONS
───────────────────────
  Application user:   jfmsrv01
  All data dirs:      mode 750, owned by jfmsrv01
  Web server (www-data) has NO direct access to the data disk
  Quarantine:         readable only by jfmsrv01 and admin processes

DATABASE DESIGN
────────────────
  Primary database:   legaltranscribe (PostgreSQL)
  Tenancy model:      shared database, firm_id on all tenant tables

  Tenant-scoped tables (must have firm_id):
    users, recordings, transcription_jobs, transcripts,
    transcript_versions, generated_documents, document_versions,
    templates, template_versions, ai_jobs, ai_usage_logs,
    ai_review_results, email_connections, email_drafts, sent_emails,
    file_connectors, storage_connectors, audit_logs,
    feature_flags (firm-level), firm_settings, backup_jobs, reports,
    notifications, matters/clients, queue_jobs, failed_jobs,
    retention_policies, firm_plan_assignments, storage_quotas,
    local_connector_agents

REDIS CONFIGURATION
────────────────────
  Persistence:    AOF (appendonly yes, appendfsync everysec)
  Binding:        localhost only (127.0.0.1)
  Memory policy:  noeviction (queue jobs must never be dropped)
  Auth:           strong password (see /root/jfmsrv01_credentials.txt)

SECURITY REQUIREMENTS
──────────────────────
  - ClamAV scans all uploaded files before acceptance
  - Files failing scan go to quarantine (never deleted automatically)
  - Path traversal validation in StorageService
  - Storage quotas per firm (80% warning, 100% blocks uploads)
  - Retention periods configurable per firm
  - Per-firm audit logs
  - Platform admin impersonation fully logged
  - AI provider privacy tier enforced per firm
  - TLS required for any public-facing service
  - All secrets in .env, never in code or logs
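
The quota rule can be expressed as a three-state check. The following is a minimal sketch that assumes "100% blocks uploads" means rejecting any upload that would take usage past the quota. `check_quota` and `QuotaState` are hypothetical names.

```python
from enum import Enum


class QuotaState(Enum):
    OK = "ok"
    WARNING = "warning"   # projected usage at or above the warning threshold
    BLOCKED = "blocked"   # upload would take the firm past 100% of its quota


def check_quota(used_bytes: int, quota_bytes: int, incoming_bytes: int = 0,
                warn_percent: int = 80) -> QuotaState:
    """Classify an upload against a firm's storage quota (hypothetical helper)."""
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    projected = used_bytes + incoming_bytes
    if projected > quota_bytes:
        return QuotaState.BLOCKED
    # Compare as integers to avoid floating-point rounding at the 80% boundary.
    if projected * 100 >= quota_bytes * warn_percent:
        return QuotaState.WARNING
    return QuotaState.OK
```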

LOCAL CONNECTOR AGENT
──────────────────────
  Purpose: Allows firms to access local SMB shares without
           installing the full platform onsite

  Agent responsibilities:
    - Watch local folders for new audio/documents
    - Securely upload to hosted platform via HTTPS API
    - Download completed documents to local output folders
    - Queue locally during network outages, retry on reconnect
    - Log all actions to firm audit log on platform

  Agent communication:
    - HTTPS (TLS) only, outbound connections only (no inbound ports needed)
    - Per-agent tokens managed via platform admin panel
    - Tokens revocable immediately from admin

  Agent installation:
    - Windows Service or Linux systemd service
    - Supports auto-update from platform
    - Local status page for firm IT administrator

  Per-agent audit trail:
    - Every upload: filename, size, timestamp, agent ID
    - Every download: filename, destination, timestamp, agent ID
    - Errors and connectivity failures visible in diagnostics
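
The agent specification does not fix a retry strategy. The following sketch shows the "queue locally, retry on reconnect" behaviour under three assumptions: uploads are sent in FIFO order, retries use exponential backoff, and the agent is written in Python. `Outbox` and the delay values are also assumptions. A real agent would persist the queue to disk so it survives a service restart, which this in-memory version does not.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque


@dataclass
class Outbox:
    """Agent-side queue that holds files while the platform is unreachable."""
    base_delay: float = 5.0     # seconds before the first retry (assumed value)
    max_delay: float = 300.0    # cap on the retry interval (assumed value)
    pending: Deque[str] = field(default_factory=deque)
    failures: int = 0

    def enqueue(self, path: str) -> None:
        self.pending.append(path)

    def next_delay(self) -> float:
        # Exponential backoff: base_delay * 2**failures, capped at max_delay.
        return min(self.max_delay, self.base_delay * (2 ** self.failures))

    def flush(self, upload: Callable[[str], None]) -> int:
        """Upload queued files in order; stop at the first connection error.

        A file leaves the queue only after upload() returns, so nothing is
        dropped while the network is down.
        """
        sent = 0
        while self.pending:
            try:
                upload(self.pending[0])
            except ConnectionError:
                self.failures += 1
                break
            self.pending.popleft()
            self.failures = 0
            sent += 1
        return sent
```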


═══════════════════════════════════════════════════════════════
SECTION 5 — MODULE LIST (35 MODULES)
═══════════════════════════════════════════════════════════════

  1.  Core platform/auth/users/roles/permissions
  2.  Firm/tenant management and white-label branding
  3.  Shared admin theme/UI kit
  4.  Feature flag/module management
  5.  Storage abstraction
  6.  AI provider/model router with privacy tier enforcement
  7.  Event system
  8.  Notification system
  9.  Matter/client management
  10. File connector manager (SMB, SFTP, FTP/FTPS, local folders)
  11. Recording watcher/import queue
  12. Transcription engine
  13. AI cost metering and usage guard
  14. Transcript viewer/editor
  15. Document type classifier
  16. Word template manager
  17. Template merge/document generation engine
  18. Document status and watermarking module
  19. AI legal review/recommendation module
  20. Second-AI checking/quality assurance module
  21. Existing-letter upload/response workflow
  22. Email response workflow
  23. Email sending module
  24. Live dictation via browser
  25. VoIP/SIP dictation integration
  26. Document approval workflow
  27. Backup/restore module
  28. Audit log and reporting module
  29. Retention policy and compliance module
  30. Queue management and failed job panel
  31. Plan/licence management module
  32. Local connector agent and agent management panel
  33. Diagnostics/smoke test/crawler module
  34. Changelog and handover module
  35. Deployment/update module


═══════════════════════════════════════════════════════════════
SECTION 6 — BUILD PHASES
═══════════════════════════════════════════════════════════════

PHASE 1 — Foundation and basic transcription portal (START HERE)
──────────────────────────────────────────────────────────────────
  Prerequisites: Run scripts 01 through 04 first (see Section 2)

  Then build:
  - Login / users / roles (spatie/laravel-permission)
  - Firm setup / branding
  - Tenant settings (Firm model already created by script)
  - Feature flags (FeatureFlagService + 20 definitions already seeded)
  - Storage abstraction (StorageService already created)
  - AI abstraction with privacy tier (AiService already created)
  - Event system (21 events already created)
  - Notification system foundation (in-app only)
  - Common admin theme (shared, not page-specific CSS)
  - Basic matter/client module
  - Manual audio upload (with ClamAV scan)
  - Transcription job and queue worker
  - Transcript review/editor
  - AI document type detection
  - Manual or AI template selection
  - Word document generation
  - Document draft status and watermark
  - Output folder saving
  - Audit log
  - Changelog/handover/backup/smoke-test

PHASE 2 — File automation and admin controls
──────────────────────────────────────────────
  - SMB/SFTP/FTP watched folders
  - Per-user/folder rules
  - Email sending
  - AI checker
  - Better audit logs
  - Cost metering
  - Admin diagnostics
  - Queue management and failed job panel
  - Firm onboarding wizard
  - Plan/licence management
  - Retention policy configuration

PHASE 3 — Legal review and email response
───────────────────────────────────────────
  - AI legal review/recommendations by jurisdiction
  - Email reading/reply workflow
  - Existing-letter response workflow
  - Document approval workflow
  - Better prompt management
  - Tenant export/import
  - Local connector agent and management panel

PHASE 4 — Live browser dictation
──────────────────────────────────
  - Browser microphone dictation
  - Real-time template display
  - Voice commands
  - Lawyer edits while speaking

PHASE 5 — VoIP/SIP dictation
──────────────────────────────
  - VoIP/SIP dial-in dictation
  - Phone-based workflows
  - PBX integration

PHASE 6 — Advanced integrations
─────────────────────────────────
  - Advanced firm onboarding
  - More AI provider options
  - Document comparison
  - Matter management integrations
  - Microsoft 365/Exchange deeper integration
  - Enhanced retention and compliance reporting


═══════════════════════════════════════════════════════════════
SECTION 7 — PROGRESSIVE PRODUCT LEVELS
═══════════════════════════════════════════════════════════════

Level 1 — Basic Transcription
  Login, upload audio, transcribe, view/edit/download transcript,
  basic user management, basic matter/client list

Level 2 — Template Documents
  Word template library, document type detection, template selection,
  generate legal document, draft watermarking, save to folder

Level 3 — File Automation
  SMB/SFTP/FTP connectors, watched folders, automatic import/export,
  per-user folder rules, local connector agent

Level 4 — AI Drafting and Review
  AI cleanup, AI classification, AI template selection,
  AI legal review by jurisdiction, second-AI checker,
  confidence scores and warnings

Level 5 — Email Integration
  Connect mailbox, read emails, AI-assisted replies,
  save drafts, send approved emails, save to matter folders

Level 6 — Live Dictation
  Browser microphone, live text on screen, voice commands,
  live template filling, edit while speaking

Level 7 — VoIP Dictation
  SIP/VoIP dial-in, phone-based dictation,
  matter/template selection by voice or keypad


═══════════════════════════════════════════════════════════════
SECTION 8 — AI PROVIDER PRIVACY TIERS
═══════════════════════════════════════════════════════════════

Every AI call must go through AiService, which enforces the firm's
configured privacy tier. This is critical for legal confidentiality.

TIER 1 — On-premise / self-hosted
  No data leaves firm infrastructure.
  Examples: locally hosted Whisper, local LLM.
  Use when: firm requires maximum data sovereignty.

TIER 2 — Private API with no-training guarantee
  Data sent to third-party API.
  Provider has signed a data processing agreement.
  Provider does not train on submitted data.
  Use when: firm needs cloud AI but requires contractual protection.

TIER 3 — Shared API (standard terms)
  Data may be used by provider under standard terms.
  Only permitted if firm has explicitly approved this tier.
  Must show a warning to firm admin before enabling.

Per-firm tier enforcement:
  - Firm's ai_privacy_tier field sets the least private tier the firm
    permits (the highest tier number allowed)
  - AiService blocks any provider whose tier number is higher than the
    firm's ai_privacy_tier
  - Blocked attempts are logged in the audit trail
  - Platform admin cannot override firm's tier without changing it
    in firm settings (which is itself audited)
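
The enforcement rules above can be sketched as a filter in front of the AI router. This sketch makes two assumptions: the router has already ranked candidates by task, cost, accuracy and the other criteria in Section 3, and the audit log accepts a plain message. `Provider`, `select_provider` and `PrivacyTierError` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Provider:
    name: str
    tier: int              # 1 = on-premise, 2 = private API, 3 = shared API
    available: bool = True


class PrivacyTierError(Exception):
    """Raised when no permitted provider is available for the firm."""


def select_provider(firm_tier: int, candidates: Iterable[Provider],
                    audit: Callable[[str], None]) -> Provider:
    """Return the first available candidate that the firm's tier permits.

    Candidates are assumed to arrive already ranked by the router; this
    function only enforces the privacy tier and falls back down the list.
    """
    for provider in candidates:
        if provider.tier > firm_tier:
            # Less private than the firm allows: block and record the attempt.
            audit(f"blocked {provider.name}: tier {provider.tier} > firm tier {firm_tier}")
            continue
        if provider.available:
            return provider
    raise PrivacyTierError(f"no available provider permitted at firm tier {firm_tier}")
```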

AI tasks and their typical tier requirements:
  Transcription:        Tier 1 preferred (audio is most sensitive)
  Transcript cleanup:   Tier 2 minimum recommended
  Classification:       Tier 2 or 3 acceptable
  Legal review:         Tier 1 or 2 strongly recommended
  Email drafting:       Tier 2 minimum recommended
  Second-AI checking:   Match primary AI tier


═══════════════════════════════════════════════════════════════
SECTION 9 — FEATURE FLAGS REFERENCE
═══════════════════════════════════════════════════════════════

20 feature definitions are seeded when the app skeleton runs.
The table below shows each feature, its default state, and which
plan levels include it.

Feature Key              | Default | Plans
─────────────────────────────────────────────────────────────
transcription            | ON      | all
transcript_viewer        | ON      | all
matters                  | ON      | all
audit_log                | ON      | all
notifications            | ON      | all
backups                  | ON      | all
templates                | OFF     | standard, professional, enterprise
document_generation      | OFF     | standard, professional, enterprise
document_watermark       | OFF     | standard, professional, enterprise
file_connectors          | OFF     | professional, enterprise
folder_watcher           | OFF     | professional, enterprise
local_agent              | OFF     | professional, enterprise
ai_cleanup               | OFF     | professional, enterprise
ai_classification        | OFF     | professional, enterprise
ai_legal_review          | OFF     | enterprise
ai_second_checker        | OFF     | enterprise
email_integration        | OFF     | professional, enterprise
email_ai_reply           | OFF     | enterprise
live_dictation           | OFF     | enterprise
voip_dictation           | OFF     | enterprise
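
The table gives defaults and plan membership, but not the precedence between the global, plan, firm, role and user levels. The following sketch shows one plausible resolution order, written in Python rather than as the PHP FeatureFlagService. The global switch and the plan act as hard gates; after that, the most specific explicit setting wins. This precedence, `FlagOverrides` and `is_enabled` are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Optional, Set, Tuple

# A few seeded definitions from the table: key -> (default, plans).
# plans=None stands for "all" plans.
FEATURES: Dict[str, Tuple[bool, Optional[FrozenSet[str]]]] = {
    "transcription": (True, None),
    "templates": (False, frozenset({"standard", "professional", "enterprise"})),
    "ai_legal_review": (False, frozenset({"enterprise"})),
}


@dataclass
class FlagOverrides:
    """Explicit settings that apply to the current request (hypothetical shape)."""
    globally_disabled: Set[str] = field(default_factory=set)
    firm: Dict[str, bool] = field(default_factory=dict)
    role: Dict[str, bool] = field(default_factory=dict)
    user: Dict[str, bool] = field(default_factory=dict)


def is_enabled(feature: str, plan: str, overrides: FlagOverrides) -> bool:
    # Unknown features and global kill switches always win.
    if feature not in FEATURES or feature in overrides.globally_disabled:
        return False
    default, plans = FEATURES[feature]
    # A feature outside the firm's plan cannot be switched on by any override.
    if plans is not None and plan not in plans:
        return False
    # Otherwise the most specific explicit setting wins: user, then role, then firm.
    for layer in (overrides.user, overrides.role, overrides.firm):
        if feature in layer:
            return layer[feature]
    return default
```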


═══════════════════════════════════════════════════════════════
SECTION 10 — BEFORE CODING CHECKLIST
═══════════════════════════════════════════════════════════════

Before writing any application code, confirm:

  [x] Stack recommendation document — done (see Section 3)
  [x] Project architecture document — done (this file + master plan)
  [x] Module list — done (see Section 5)
  [x] Security/privacy plan — done (see Section 4)
  [x] AI provider/model routing plan with privacy tiers — done
  [x] Data retention and compliance plan — done (master plan)
  [x] Backup/restore plan — done (storage design)
  [x] White-label/multi-tenant plan — done (master plan)
  [x] Feature flag plan — done (see Section 9)
  [x] Storage abstraction plan — done (see Section 4)
  [x] Event/module communication plan — done (master plan)
  [x] Notification plan — done (master plan)
  [x] Queue resilience and failed job handling plan — done
  [x] Local connector agent specification — done (see Section 4)
  [x] Testing/smoke-test plan — done (master plan)
  [x] Handover/changelog process — done (RULES.md)

  [ ] Run 01_storage_layout.sh
  [ ] Run 02_app_skeleton.sh
  [ ] Run 03_admin_safety_tools.sh
  [ ] Run 04_first_firm.sh
  [ ] Configure Nginx vhost for transcription platform
  [ ] Build Phase 1 application code (see Section 6)


═══════════════════════════════════════════════════════════════
SECTION 11 — KEY FILE LOCATIONS (when built)
═══════════════════════════════════════════════════════════════

  /opt/jfmsrv01/app/                  Laravel application
  /opt/jfmsrv01/app/.env              Environment config (secrets, never share)
  /opt/jfmsrv01/app/app/Models/       Eloquent models
  /opt/jfmsrv01/app/app/Services/     Service classes
  /opt/jfmsrv01/app/app/Events/       Event classes (21 created)
  /opt/jfmsrv01/app/app/Models/Scopes/TenantScope.php
  /opt/jfmsrv01/app/app/Services/StorageService.php
  /opt/jfmsrv01/app/app/Services/AiService.php
  /opt/jfmsrv01/app/app/Services/FeatureFlagService.php
  /opt/jfmsrv01/app/app/Services/AuditService.php
  /opt/jfmsrv01/app/database/migrations/
  /opt/jfmsrv01/scripts/              Admin scripts
  /opt/jfmsrv01/scripts/CHANGELOG.md  Change history
  /opt/jfmsrv01/scripts/HANDOVER.md   Current state
  /data/jfmsrv01/firms/               Per-firm file storage
  /data/jfmsrv01/quarantine/          Virus scan failures
  /backups/jfmsrv01/                  Backup destination
  /etc/nginx/sites-available/transcription  Nginx vhost (to be created)
  /etc/supervisor/conf.d/jfmsrv01-worker.conf  Queue workers


═══════════════════════════════════════════════════════════════
SECTION 12 — USEFUL COMMANDS (when built)
═══════════════════════════════════════════════════════════════

  # Run migrations
  cd /opt/jfmsrv01/app && php artisan migrate

  # Check queue workers
  sudo supervisorctl status

  # Run smoke test
  sudo bash /opt/jfmsrv01/scripts/smoke_test.sh

  # Patch workflow
  sudo bash /opt/jfmsrv01/scripts/patch.sh pre "description"
  sudo bash /opt/jfmsrv01/scripts/patch.sh post "description"
  sudo bash /opt/jfmsrv01/scripts/patch.sh post "description" db

  # Create firm storage (argument is the firm_id)
  sudo -u jfmsrv01 /opt/jfmsrv01/scripts/create_firm_storage.sh 2

  # View error log
  tail -100 /opt/jfmsrv01/app/storage/logs/laravel.log

  # Artisan tinker
  cd /opt/jfmsrv01/app && php artisan tinker


═══════════════════════════════════════════════════════════════
SECTION 13 — KNOWN DECISIONS AND RATIONALE
═══════════════════════════════════════════════════════════════

Decision: Laravel + Python workers (not pure Python, not pure PHP)
Reason: Laravel has the best multi-tenant SaaS support in PHP. Python
has better AI/transcription libraries (Whisper, transformers, etc.).
Separating them allows each to do what it does best.

Decision: PostgreSQL over MariaDB
Reason: Row-level security for tenant isolation at DB layer, better
JSON/JSONB support for settings/metadata, stronger integrity, better
full-text search for future transcript/document search.

Decision: Redis with AOF persistence and noeviction
Reason: Queue jobs in a legal platform represent real billable work.
Silent job loss on Redis restart is unacceptable.

Decision: Shared database with firm_id (not separate DB per firm)
Reason: Simpler for MVP, easier to manage. Designed so that a firm
can be migrated to its own database later without rewrites.

Decision: ClamAV on all uploads before acceptance
Reason: Files come from law firm clients, email attachments, and
SMB shares. All are untrusted. Scan before accepting into firm storage.
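
The following is a sketch of the scan-before-accept rule. It assumes `clamdscan` is talking to a running clamd daemon. `scan_and_accept` is a hypothetical name, while the exit codes (0 clean, 1 infected, 2 error) are ClamAV's own. The scanner command is a parameter, so it can be swapped for `clamscan` or a test stub.

```python
import shutil
import subprocess
from pathlib import Path
from typing import Sequence, Tuple

# clamscan and clamdscan exit codes: 0 = no virus found, 1 = virus found, 2 = error.
DEFAULT_SCANNER = ("clamdscan", "--no-summary")


def scan_and_accept(upload: Path, firm_dir: Path, quarantine: Path,
                    scanner: Sequence[str] = DEFAULT_SCANNER) -> Tuple[str, Path]:
    """Scan one file, then move it to firm storage or to quarantine.

    A scan error leaves the file where it is and raises, so an unscanned
    file is never accepted.
    """
    result = subprocess.run([*scanner, str(upload)], capture_output=True, text=True)
    if result.returncode == 0:
        dest, status = firm_dir / upload.name, "accepted"
    elif result.returncode == 1:
        # Quarantined files are kept for admin review, never deleted automatically.
        dest, status = quarantine / upload.name, "quarantined"
    else:
        raise RuntimeError(f"scan failed ({result.returncode}): {result.stderr.strip()}")
    dest.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(upload), str(dest))
    return status, dest
```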

Decision: Privacy tier enforcement in AiService
Reason: Legal recordings and documents contain highly sensitive
client information. Firms must control which AI providers can
access their data. This cannot be an optional setting.

Decision: Path traversal validation in StorageService
Reason: Any manipulated filename or folder path must not be able to
cause the application to read or write files outside the firm's
permitted base path. StorageService validates every resolved path.
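
The validation can be sketched with `pathlib`. First resolve the requested path, which collapses `..` and follows symlinks. Then require that the result still sits under the firm's root. `resolve_in_firm` and `PathTraversalError` are hypothetical names. The layout follows Section 4, and `Path.is_relative_to` needs Python 3.9 or later (the server has 3.12).

```python
from pathlib import Path


class PathTraversalError(ValueError):
    """Raised when a requested path would escape the firm's storage root."""


def resolve_in_firm(base: Path, firm_id: int, relative: str) -> Path:
    """Resolve `relative` under <base>/firms/<firm_id>, rejecting escapes."""
    if "\x00" in relative:
        raise PathTraversalError("null byte in path")
    firm_root = (base / "firms" / str(firm_id)).resolve()
    # resolve() collapses ".." and follows symlinks, so check the final location.
    candidate = (firm_root / relative).resolve()
    if not candidate.is_relative_to(firm_root):
        raise PathTraversalError(f"path escapes firm storage: {relative!r}")
    return candidate
```

Checking the resolved location, rather than scanning the string for `..`, also catches absolute paths and symlinks that point outside the firm folder.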

Decision: Event-based module communication
Reason: Prevents tight coupling between transcription, templates,
email, AI review, and document modules. Future features can subscribe
to existing events without modifying existing code.
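
Laravel ships its own event and listener system, and that is what the application would use. The sketch below only illustrates the decoupling in Python: the publisher dispatches `TranscriptionCompleted` (one of the 21 events) without knowing who listens. The event's fields and `EventBus` are assumptions. The event carries `firm_id`, as the critical rules in Section 3 require.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class TranscriptionCompleted:
    """Event named in the foundations list; these fields are assumed."""
    firm_id: int
    transcript_id: int


class EventBus:
    """Minimal in-process dispatcher: publishers never import their subscribers."""

    def __init__(self) -> None:
        self._listeners: DefaultDict[type, List[Callable[[object], None]]] = defaultdict(list)

    def subscribe(self, event_type: type, listener: Callable[[object], None]) -> None:
        self._listeners[event_type].append(listener)

    def dispatch(self, event: object) -> int:
        """Call every listener registered for the event's exact type; return the count."""
        listeners = self._listeners.get(type(event), [])
        for listener in listeners:
            listener(event)
        return len(listeners)
```

A new module, such as a document classifier, subscribes to `TranscriptionCompleted` without any change to the transcription code that dispatches it.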