# Legal Transcription and Document Automation Platform - Master Project Plan

Version: 2026-05-26 (Updated)

## Paste-ready build request

I want to build a white-label, multi-tenant Legal Transcription and Document Automation Platform.

The system must support two core deployment models from day one:

1. Single-firm onsite deployment, where the server is installed at one law firm's premises.
2. Hosted multi-tenant deployment, where one data-centre-hosted server can securely serve multiple firms using a Platform -> Firm -> User structure.

The project must not be built as a throwaway prototype. It must be a simple but expandable foundation.

The MVP must start as a simple transcription portal with feature flags already built in, so advanced functions can be added later without redesigning the whole system.

## Critical foundation rule

The biggest thing is this: feature flags, tenant settings, storage abstraction, AI abstraction, and event-based modules must be built from day one.

Those five foundations will stop the project from boxing itself in later.

These must not be treated as future upgrades. They must exist in the MVP, even if they are basic at first.

The five mandatory foundations are:

- Feature flags.
- Tenant settings.
- Storage abstraction.
- AI abstraction.
- Event-based modules.

The MVP can be simple, but the architecture must be ready to grow.

## Core purpose

Law firms need to provide audio recordings, dictation, uploaded letters, emails, existing documents, or live speech. The system must transcribe the content, identify the document or letter type, select the correct Word document template, insert the content into the template, and save a final professional legal document into the correct client, matter, lawyer, user, or firm folder.

The final document must be suitable for use as a legal document, while still requiring lawyer review and approval.

The system must assist legal professionals, but it must not claim that AI-generated work is final legal advice. Final legal review and approval belongs to the lawyer.

## Preferred technical approach

- Web-based admin and user portal.
- Linux server deployment.
- Clean shared theme/UI kit across the whole website.
- Modular backend structure.
- Queue workers for long-running jobs.
- Multi-tenant design from the beginning.
- Feature flags from the beginning.
- AI-provider abstraction from the beginning.
- Storage abstraction from the beginning.
- Event-based module system from the beginning.
- White-label branding from the beginning.
- Changelog and handover workflow from the beginning.

A Laravel/PHP portal with Python workers for transcription, AI, document processing, and background jobs may be acceptable, but the build process should recommend the best stack before starting.

A stack recommendation document must be produced and approved before any code is written. The recommendation must consider team capability, long-term maintainability, available libraries for Word document generation, transcription, queue management, and multi-tenancy. The chosen stack and the reasoning behind the decision must be recorded in the handover file before the first patch begins.

The system must support multiple AI providers and models, with a model router that can choose by task, cost, accuracy, privacy, speed, context size, fallback availability, and firm preference.

## Deployment models

### Model 1: Single-firm onsite deployment

- The server is installed at the law firm's premises.
- Normally only one firm/tenant is enabled.
- The firm uses its own local storage, network shares, email, templates, users, and AI settings.
- It should still use the same multi-tenant architecture internally so it can later be moved into hosted mode if required.

### Model 2: Hosted multi-tenant deployment

- The server is hosted by us in a data centre.
- Multiple law firms can use the same platform.
- Each firm is a separate tenant.
- Each firm has its own users, permissions, branding, templates, recordings, transcripts, documents, email settings, AI settings, storage connectors, audit logs, backups, and feature flags.
- Firms must never be able to see or access another firm's data.

### Model 3: Hosted single-tenant dedicated mode

- The same platform runs for one firm only on dedicated infrastructure.
- Useful for larger firms or clients that want dedicated hosting without sharing infrastructure.
- Still uses the same Platform -> Firm -> User structure internally.
- The system simply has one active firm.

### Model 4: Hosted platform with local connector agent

- Preferred option for firms that need access to local SMB shares or recording folders but do not want to install the full application onsite.
- A small local agent runs at the firm's office.
- The agent watches local folders and securely uploads recordings and documents to the hosted platform.
- The agent can also download completed documents back into the firm's local folders.
- This avoids a full onsite installation while still supporting local file workflows.
- See the Local Connector Agent section for full specification requirements.

The system hierarchy must be:

- Platform / Host Admin
  - Firm / Tenant
    - Users
      - Lawyers
      - Secretaries
      - Reviewers
      - Firm Admins
      - Read-only users

## Platform admin / host admin

The platform admin is the owner/operator of the hosted system.

Platform admins can:

- Create firms.
- Disable firms.
- Manage firm plans and features.
- Configure global AI providers.
- Configure global storage options.
- Configure platform-level backups.
- View platform health.
- View system-wide errors.
- Impersonate firm users for support where permitted.
- See billing and usage summaries.
- Manage server-wide settings.

Platform admin access must be heavily audited.

Platform admins should not casually browse confidential firm documents. Access to firm documents should require an explicit support or impersonation action and must be logged.

## Main user roles

- System admin / platform admin.
- Firm admin.
- Lawyer.
- Secretary / paralegal.
- Transcription reviewer.
- Read-only / auditor.
- Client-specific restricted users if required later.

## Access control

- Every user must log in.
- Permissions must be role-based.
- Each firm must only see its own users, templates, files, jobs, transcripts, documents, logs, settings, and reports.
- System admin can see platform-level data.
- System admin can impersonate or perform actions as any selected user for troubleshooting where permitted.
- All impersonation must be logged.
- Disabled modules must be hidden from users and blocked at route/controller level.
- Every query must be tenant-scoped by default.
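The access-control rules above can be sketched as a single check that runs before any controller action: module enabled, same tenant, then role permission. This is a minimal sketch in Python, assuming a hypothetical `ROLE_PERMISSIONS` map and permission names such as `documents.approve`; a real build would load roles and permissions per firm rather than hard-code them.

```python
from dataclasses import dataclass

# Hypothetical role -> permission map. A real build would load this per firm.
ROLE_PERMISSIONS = {
    "firm_admin": {"documents.view", "documents.approve", "templates.manage"},
    "lawyer": {"documents.view", "documents.approve"},
    "secretary": {"documents.view"},
    "read_only": {"documents.view"},
}


class AccessDenied(Exception):
    pass


@dataclass(frozen=True)
class User:
    user_id: int
    firm_id: int
    role: str


def authorise(user: User, permission: str, module: str,
              resource_firm_id: int, enabled_modules: set) -> None:
    """Raise AccessDenied unless the module, tenant, and role checks all pass."""
    if module not in enabled_modules:
        # Disabled modules are blocked at this layer, not just hidden in menus.
        raise AccessDenied(f"module '{module}' is disabled")
    if user.firm_id != resource_firm_id:
        # Cross-tenant access is refused before roles are even considered.
        raise AccessDenied("resource belongs to another firm")
    if permission not in ROLE_PERMISSIONS.get(user.role, set()):
        raise AccessDenied(f"role '{user.role}' lacks '{permission}'")
```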

## White-label requirements

- Branding per firm/client.
- Logos per firm.
- Colour/theme settings per firm.
- Email templates per firm.
- Word document templates per firm.
- Storage locations per firm.
- AI/provider settings can be global, firm-specific, or user-specific.
- Jurisdiction settings per firm.
- Document naming rules per firm.
- Backup destinations per firm.
- Feature flags per firm.
- Custom domains or subdomains per firm where required.
- The same server should be able to host multiple firms securely.
- It must be easy to export, duplicate, or clone a firm setup for another client.

## Tenant isolation requirements

Tenant isolation is critical because this system handles confidential legal recordings and documents.

Every table that stores tenant data must include a firm_id or tenant_id unless it is truly global platform data.

Tenant isolation must apply to:

- Users.
- Recordings.
- Transcripts.
- Documents.
- Templates.
- AI jobs.
- AI usage.
- Email connections.
- File connectors.
- Backups.
- Audit logs.
- Settings.
- Feature flags.
- Generated documents.
- Matter/client records.
- Notifications.
- Queue jobs.

The system must prevent:

- One firm seeing another firm's users.
- One firm seeing another firm's files.
- One firm seeing another firm's templates.
- One firm seeing another firm's AI logs.
- One firm accessing another firm's storage connectors.
- One firm triggering jobs on another firm's data.
- One firm's large batch jobs starving other tenants of queue worker capacity.

## Recommended data-isolation strategy

The architecture should support more than one tenancy model, but the MVP should choose one safe default.

Option A: Shared database with tenant_id on every tenant-owned table.

- Easier to build and manage.
- Good for MVP.
- Must be extremely strict with tenant scoping.
- Requires strong tests to prevent data leakage.

Option B: Separate database per firm.

- Stronger isolation.
- Easier firm export/delete.
- More complex to manage.
- Useful later for larger firms or higher security requirements.

Option C: Hybrid model.

- Shared platform database.
- Separate storage folders/buckets per firm.
- Optional separate database for large or high-security firms later.

The system should be designed so that it can start with Option A or Option C (hybrid) without blocking future support for Option B (a separate database per firm).
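For Option A, tenant scoping can be centralised so that no query is ever written without a firm filter. The sketch below uses Python's built-in `sqlite3` purely for illustration; the table and class names (`transcripts`, `TenantScopedRepository`) are hypothetical. In a Laravel portal the same idea could be implemented with an Eloquent global scope.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # Every tenant-owned table carries firm_id, indexed for scoped lookups.
    conn.execute(
        "CREATE TABLE transcripts ("
        " id INTEGER PRIMARY KEY, firm_id INTEGER NOT NULL,"
        " matter_number TEXT, body TEXT)"
    )
    conn.execute("CREATE INDEX idx_transcripts_firm ON transcripts (firm_id)")


class TenantScopedRepository:
    """All reads and writes are bound to the firm this object was created for."""

    def __init__(self, conn: sqlite3.Connection, firm_id: int):
        self._conn = conn
        self._firm_id = firm_id

    def add(self, matter_number: str, body: str) -> int:
        cur = self._conn.execute(
            "INSERT INTO transcripts (firm_id, matter_number, body) VALUES (?, ?, ?)",
            (self._firm_id, matter_number, body),
        )
        return cur.lastrowid

    def get(self, transcript_id: int):
        # An id that belongs to another firm simply returns None.
        return self._conn.execute(
            "SELECT id, matter_number, body FROM transcripts WHERE id = ? AND firm_id = ?",
            (transcript_id, self._firm_id),
        ).fetchone()

    def list_all(self) -> list:
        return self._conn.execute(
            "SELECT id, matter_number, body FROM transcripts WHERE firm_id = ? ORDER BY id",
            (self._firm_id,),
        ).fetchall()
```

A matching automated test would create records for two firms and assert that each repository sees only its own rows, repeated for every tenant-owned table.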

## Critical foundation 1: Feature flags from day one

The system must support enabling and disabling features globally, per plan/package, per firm, per role, per user, and where useful per document type or workflow.

This allows the platform to start as a simple transcription portal and then gradually reveal advanced features such as:

- Template documents.
- Folder automation.
- AI review.
- Email replies.
- Live dictation.
- VoIP dictation.
- Legal recommendations.
- Second-AI checking.
- Advanced workflow automation.

When a feature is disabled:

- It must disappear from menus.
- It must disappear from dashboards.
- Buttons for that feature must not be shown.
- Direct route access must be blocked.
- Related background jobs must not run.
- Disabled modules must not confuse users.

Feature flags must be part of the MVP, not added later.
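One way to resolve a flag across the levels listed above is to treat the global switch and the plan as hard gates, then let the most specific explicit setting (firm, then role, then user) decide. The data shapes and this precedence order are assumptions for the sketch, not a settled design.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FlagContext:
    firm_id: int
    plan: str
    role: str
    user_id: int


@dataclass
class FeatureFlags:
    # Hypothetical in-memory shapes; a real system would load these from the database.
    global_enabled: dict = field(default_factory=dict)   # feature -> bool
    plan_features: dict = field(default_factory=dict)    # plan -> set of features
    firm_overrides: dict = field(default_factory=dict)   # (firm_id, feature) -> bool
    role_overrides: dict = field(default_factory=dict)   # (firm_id, role, feature) -> bool
    user_overrides: dict = field(default_factory=dict)   # (user_id, feature) -> bool

    def is_enabled(self, feature: str, ctx: FlagContext) -> bool:
        # Hard gate 1: a feature switched off globally is off everywhere.
        if not self.global_enabled.get(feature, False):
            return False
        # Hard gate 2: the firm's plan must include the feature.
        if feature not in self.plan_features.get(ctx.plan, set()):
            return False
        # Then the most specific explicit setting wins: firm, role, user.
        enabled = self.firm_overrides.get((ctx.firm_id, feature), True)
        enabled = self.role_overrides.get((ctx.firm_id, ctx.role, feature), enabled)
        return self.user_overrides.get((ctx.user_id, feature), enabled)
```

The same `is_enabled` call would back menu rendering, route guards, and job dispatch, so all three agree. With this order a firm admin could pilot a feature for one user while it stays off for the rest of the firm; if that is not wanted, the firm level should become a hard gate as well.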

## Critical foundation 2: Tenant settings from day one

The system must be multi-tenant from the beginning.

Every firm/client must have its own settings, including:

- Branding.
- Logo.
- Colours.
- Users.
- Permissions.
- Templates.
- Folders.
- AI provider preferences.
- Email settings.
- Enabled features.
- Jurisdiction.
- Backup destinations.
- Document naming rules.
- Retention rules.
- Workflow rules.

Do not hard-code firm-specific behaviour into the core system.
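Tenant settings can be layered so that code always asks for a setting by name and never branches on which firm it is running for. This sketch uses Python's `collections.ChainMap`; the keys and default values are hypothetical.

```python
from collections import ChainMap
from typing import Optional

# Hypothetical platform-wide defaults; per-firm values live in tenant settings.
PLATFORM_DEFAULTS = {
    "jurisdiction": "Australia",
    "primary_colour": "#1f3a5f",
    "document_naming_rule": "{matter_number}_{doc_type}_{date}",
    "retention_years": 7,
}


def effective_settings(firm_settings: dict, user_settings: Optional[dict] = None) -> ChainMap:
    """Look each key up in user settings, then firm settings, then platform defaults."""
    return ChainMap(user_settings or {}, firm_settings, PLATFORM_DEFAULTS)
```

Writes to a `ChainMap` go to its first mapping only, so settings should be saved through an explicit per-level API rather than through this merged view.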

## Critical foundation 3: Storage abstraction from day one

The system must not be tied to one storage location.

All recordings, transcripts, templates, generated documents, attachments, reports, and backups must use a storage layer that can later support:

- Local disk.
- SMB shares.
- SFTP.
- FTP/FTPS.
- Cloud/object storage.
- Firm-specific storage locations.

This makes it easier to add new storage options later without rewriting the transcription, template, email, backup, or document modules.

In hosted mode, every firm must have isolated storage.

Example hosted storage layout:

- /storage/firms/{firm_id}/recordings
- /storage/firms/{firm_id}/transcripts
- /storage/firms/{firm_id}/documents
- /storage/firms/{firm_id}/templates
- /storage/firms/{firm_id}/attachments
- /storage/firms/{firm_id}/exports
- /storage/firms/{firm_id}/backups

The storage abstraction must prevent one firm from reading another firm's folder.
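A minimal sketch of the storage layer, assuming a hypothetical `StorageBackend` interface with only a local-disk implementation. The key point is that every path is resolved inside the firm's own root, so `../` tricks or absolute paths cannot reach another firm's folder; SMB, SFTP, or object-storage backends would implement the same two methods.

```python
from abc import ABC, abstractmethod
from pathlib import Path


class StorageBackend(ABC):
    """Interface every backend (local, SMB, SFTP, object storage) would implement."""

    @abstractmethod
    def write(self, firm_id: int, relative_path: str, data: bytes) -> None: ...

    @abstractmethod
    def read(self, firm_id: int, relative_path: str) -> bytes: ...


class LocalDiskStorage(StorageBackend):
    def __init__(self, root: str):
        self._root = Path(root).resolve()

    def _firm_path(self, firm_id: int, relative_path: str) -> Path:
        # Mirrors the /storage/firms/{firm_id}/... layout above.
        firm_root = (self._root / "firms" / str(int(firm_id))).resolve()
        target = (firm_root / relative_path).resolve()
        # Reject "../" tricks, absolute paths, and symlinks that escape the firm's folder.
        if not target.is_relative_to(firm_root):
            raise PermissionError(f"path escapes storage for firm {firm_id}")
        return target

    def write(self, firm_id: int, relative_path: str, data: bytes) -> None:
        target = self._firm_path(firm_id, relative_path)
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(data)

    def read(self, firm_id: int, relative_path: str) -> bytes:
        return self._firm_path(firm_id, relative_path).read_bytes()
```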

## Critical foundation 4: AI abstraction from day one

The system must not be built around one AI provider.

Create an AI provider/router layer that can support multiple AI APIs and models for different tasks, including:

- Transcription.
- Live transcription.
- Transcript cleanup.
- Document classification.
- Template selection.
- Legal review.
- Email response drafting.
- Second-AI checking.
- Summarisation.
- Admin diagnostics.
- Coding/developer assistance.

The AI router should choose or recommend models based on:

- Task type.
- Accuracy.
- Speed.
- Cost.
- Privacy tier.
- Context size.
- Firm preference.
- Fallback availability.
- Data sovereignty requirements.

### AI provider privacy tiers

Because this system handles confidential legal recordings and documents, AI providers must be categorised by privacy tier.

The three tiers are:

- Tier 1: On-premise or self-hosted models. No data leaves the firm's infrastructure. Examples include a locally hosted Whisper instance for transcription or a local LLM for transcript cleanup.
- Tier 2: Private API with contractual no-training guarantees. Data is sent to a third-party API but the provider has agreed not to use it for training. The firm must have a data processing agreement in place.
- Tier 3: Shared API with standard terms. Data may be used for model improvement under the provider's standard terms.

Per-firm settings must allow restriction to a minimum privacy tier. For example, a firm can be configured to only allow Tier 1 or Tier 2 providers.

The AI router must respect the firm's tier restriction when selecting a provider for any task.

The system must log which AI provider and which privacy tier handled each job, per firm, for audit purposes.

Legal firms must be informed that recordings and transcripts should only be sent to AI providers that meet their data handling requirements and that they have reviewed and approved. The platform must not override a firm's tier restriction.

AI providers that train on firm data without an explicit agreement must not be used.
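A sketch of the router's core rule: drop any model whose privacy tier is less private than the firm allows, then rank what remains. Provider names, the numeric `accuracy` score, and the two ranking modes are assumptions; a real router would weigh more of the criteria listed above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    provider: str
    model: str
    tasks: frozenset
    privacy_tier: int        # 1 = on-premise, 2 = private API with no-training terms, 3 = shared API
    accuracy: float          # 0..1, assumed to come from the platform's own evaluations
    cost_per_minute: float
    available: bool = True


def choose_model(options, task: str, max_tier: int, prefer: str = "accuracy") -> ModelOption:
    """Return the best eligible option without ever exceeding the firm's tier limit."""
    eligible = [
        o for o in options
        if task in o.tasks and o.available and o.privacy_tier <= max_tier
    ]
    if not eligible:
        # Never fall back to a less private tier than the firm allows.
        raise LookupError(f"no permitted provider for '{task}' at tier <= {max_tier}")
    if prefer == "cost":
        return min(eligible, key=lambda o: (o.cost_per_minute, -o.accuracy))
    return max(eligible, key=lambda o: (o.accuracy, -o.cost_per_minute))
```

The returned option carries both `provider` and `privacy_tier`, which is what the per-job audit record needs.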

## Critical foundation 5: Event-based modules from day one

The system must use an internal event system so modules can communicate without being tightly coupled.

Example events:

- AudioUploaded.
- AudioImported.
- TranscriptionStarted.
- TranscriptionCompleted.
- TranscriptEdited.
- TranscriptApproved.
- DocumentTypeDetected.
- TemplateSelected.
- DocumentGenerated.
- DocumentApproved.
- AiReviewRequested.
- AiReviewCompleted.
- EmailDraftCreated.
- EmailSent.
- BackupCompleted.
- BackupFailed.
- JobFailed.
- NotificationTriggered.
- RetentionExpiryWarning.
- SupportImpersonationStarted.
- SupportImpersonationEnded.

This allows future features to be added without rewriting existing modules.

For example:

- When TranscriptionCompleted happens, the system can later trigger AI cleanup.
- When DocumentGenerated happens, the system can later trigger AI legal review.
- When DocumentApproved happens, the system can later trigger email sending.
- When BackupFailed happens, the system can later trigger admin alerts.
- When JobFailed happens, the notification system can alert the relevant user or admin.
- When RetentionExpiryWarning happens, firm admins can be notified before any data is purged.
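An in-process version of the event system fits in a few lines. This is an illustrative sketch with hypothetical class names; in production, handlers would normally push work onto the queue rather than run inline, so one slow or failing listener does not block the others.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Event:
    name: str                  # e.g. "TranscriptionCompleted"
    firm_id: int               # every event carries its tenant
    payload: dict = field(default_factory=dict)


class EventBus:
    def __init__(self) -> None:
        self._handlers = defaultdict(list)

    def subscribe(self, event_name: str, handler: Callable[[Event], None]) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event: Event) -> None:
        # The publisher does not know who, if anyone, is listening.
        for handler in list(self._handlers[event.name]):
            handler(event)
```

Adding AI cleanup later then means subscribing one more handler to `TranscriptionCompleted`, with no change to the transcription module.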

## File input/connectors

The system must connect to multiple recording/document locations, including:

- SMB/network shares.
- SFTP.
- FTP/FTPS.
- Local folders.
- Web upload.
- Email inbox ingestion later if useful.
- API/webhook ingestion later if useful.

In the website config area:

- Admins can create multiple file connectors.
- Each connector can be assigned to a firm, user, lawyer, matter, workflow, or document type.
- Admins can test each connector.
- Admins can browse available folders where permitted.
- The system can watch folders for new audio files.
- The system can place completed documents into configurable output folders.
- Each user may have different source and destination folders.

When using SMB, SFTP, FTP, or external storage:

- Connectors must belong to a firm.
- Credentials must be encrypted.
- Connectors must be permission-controlled.
- Connector tests must only show that firm's configured paths.
- Background jobs must always run in firm context.

## Audio transcription workflow

1. Detect or receive a new audio file.
2. Identify the firm, user, lawyer, matter, or client if possible from folder path, filename, metadata, or rules.
3. Transcribe the audio.
4. Store the raw transcript.
5. Optionally detect speakers.
6. Clean up the transcript into a professional legal drafting style where enabled.
7. Determine the document type, such as:
   - Will.
   - Letter of advice.
   - Response to a letter.
   - Affidavit.
   - Statutory declaration.
   - File note.
   - Email draft.
   - Court document.
   - Contract clause.
   - General correspondence.
   - Other firm-defined templates.
8. Select the correct Word document template from the firm's template library.
9. Insert the transcript/content into the correct areas of the template.
10. Save the completed document using the firm's configured naming rule, for example by lawyer, matter number, case file number, client name, or date.
11. Place the final document into the user's configured Transcribed Folder.
12. Show the job result in the website.
13. Allow the lawyer/secretary to review, edit, approve, download, email, or resend.
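Steps 2 and 10 both depend on firm-configurable rules. The sketch below assumes a hypothetical filename convention (`<LAWYER>_<MATTER>_...`, for example `JSMITH_M2024-0153_dictation.wav`) and a naming rule written with Python format fields; each firm would configure its own pattern and rule.

```python
import re

# Hypothetical convention: "<LAWYER>_<MATTER>_<anything>", matter numbers like M2024-0153.
FILENAME_PATTERN = re.compile(r"^(?P<lawyer>[A-Za-z]+)_(?P<matter_number>M\d{4}-\d{4})_")


def parse_recording_name(filename: str) -> dict:
    """Extract whatever the convention allows; an empty dict means 'needs manual assignment'."""
    match = FILENAME_PATTERN.match(filename)
    return match.groupdict() if match else {}


def build_document_name(rule: str, fields: dict) -> str:
    """Fill a naming rule such as '{matter_number}_{doc_type}_{date}'.

    A missing field raises KeyError, so incomplete metadata is caught early.
    """
    name = rule.format_map(fields)
    # Replace characters that are invalid on Windows/SMB shares.
    return re.sub(r'[<>:"/\\|?*]', "-", name) + ".docx"
```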

## Template handling

- Templates are Word documents.
- Templates are stored per firm and optionally per user/lawyer.
- Templates can have placeholders.
- The system must support template categories and document types.
- Admins can upload, edit metadata, replace, activate/deactivate, and test templates.
- The system must be able to choose a template automatically based on transcript content.
- Users can also manually select the template.
- The final document must look like a proper legal document, not just plain text pasted into a file.
- Template merge should preserve headers, footers, numbering, styles, logos, tables, page layout, and legal formatting.
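Template testing can start with a check that every placeholder has a value before a merge runs. This sketch works on plain text with an assumed `{{ name }}` placeholder syntax. A real Word merge has to operate on the .docx structure (placeholders can be split across formatting runs), typically through a library such as python-docx, so this only illustrates the validation logic.

```python
import re

# Assumed placeholder syntax: {{ field_name }} with lowercase names.
PLACEHOLDER = re.compile(r"\{\{\s*([a-z_]+)\s*\}\}")


def fill_placeholders(template_text: str, values: dict) -> str:
    """Refuse to merge if any placeholder lacks a value, then substitute all of them."""
    missing = sorted(set(PLACEHOLDER.findall(template_text)) - set(values))
    if missing:
        raise ValueError(f"missing values for placeholders: {', '.join(missing)}")
    # A function replacement inserts values literally (no backslash processing).
    return PLACEHOLDER.sub(lambda m: values[m.group(1)], template_text)
```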

## Live dictation workflow

Lawyers must be able to dictate directly into the system using either:

- A web browser microphone/WebRTC plugin.
- A VoIP/SIP dial-in number through PBX integration.

While logged in:

- The lawyer can select a template.
- The lawyer can say commands like:
  - Prepare template for a Will.
  - This is for [client/contact details].
  - Start transcribing.
  - New paragraph.
  - Insert clause.
  - Replace the last sentence.
  - Undo that.
  - Stop transcribing.
  - Save document.
  - Email this to my secretary.
- Words should appear on screen as they are spoken.
- The document should appear in the selected template live on screen.
- The lawyer must be able to edit the document while dictating.
- The system should support AI cleanup after live dictation, but it must not change critical legal meaning without user approval.
- The system must preserve a raw transcript and an edited/final version separately.
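The command handling, and the rule that the raw transcript is kept separately from the edited document, can be sketched as a small session object. Exact phrase matching stands in here for the AI command interpretation the plan calls for, and only three of the listed commands are implemented; `DictationSession` is a hypothetical name.

```python
from dataclasses import dataclass, field


@dataclass
class DictationSession:
    raw_transcript: list = field(default_factory=list)       # every utterance, never edited
    paragraphs: list = field(default_factory=lambda: [[]])   # the edited document
    _history: list = field(default_factory=list)             # snapshots for "undo that"
    _replace_next: bool = False

    def _snapshot(self) -> None:
        self._history.append([p.copy() for p in self.paragraphs])

    def hear(self, utterance: str) -> None:
        self.raw_transcript.append(utterance)   # raw record first, whatever follows
        command = utterance.strip().lower().rstrip(".")
        if command == "new paragraph":
            self._snapshot()
            self.paragraphs.append([])
        elif command == "undo that":
            if self._history:
                self.paragraphs = self._history.pop()
        elif command == "replace the last sentence":
            self._replace_next = True            # the next sentence replaces the previous one
        else:
            self._snapshot()
            current = self.paragraphs[-1]
            if self._replace_next and current:
                current[-1] = utterance.strip()
            else:
                current.append(utterance.strip())
            self._replace_next = False

    def text(self) -> str:
        return "\n\n".join(" ".join(p) for p in self.paragraphs)
```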

## Responding to existing letters and documents

The system must support a workflow where a lawyer uploads or selects an existing letter/document from:

- File upload.
- SMB/network share.
- SFTP/FTP folder.
- Email attachment later if added.

Then the user can say or request:

- Respond to this letter.
- Prepare a response in the style of our standard legal correspondence.
- Use this matter number.
- Use the template for dispute response.

The system should read the uploaded/source document, understand the context, and draft a response into the correct template.

## AI features

- Multiple AI providers must be supported.
- The user/admin can choose which AI provider/model to use.
- The system can recommend the best model for each task.
- AI tasks include:
  - Transcription.
  - Speaker detection.
  - Document classification.
  - Template selection.
  - Legal-style formatting.
  - Proofreading.
  - Summarisation.
  - Risk/quality checking.
  - Command interpretation during live dictation.
  - Draft response generation.
  - Coding/admin helper features.
- A second AI checker must be available as an option.
- The second AI can review the first AI's transcript/document and check:
  - Whether the template choice is correct.
  - Whether key facts were missed.
  - Whether names, dates, or matter numbers look inconsistent.
  - Whether the final document follows the selected template.
  - Whether the output looks complete.
  - Whether the AI may have invented or changed meaning.
- AI checks must produce a confidence score and warnings.
- AI must not present its output as correct legal advice. It should assist drafting only. Final legal review belongs to the lawyer.
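Part of the second checker's job (dates and matter numbers) can be backed by a deterministic pre-check that runs alongside the second AI rather than replacing it. The sketch assumes matter numbers like `M2024-0153` and dates like `3 May 2026`, and the confidence formula is an arbitrary placeholder, not a calibrated score.

```python
import re

# Assumed formats for this sketch only.
MATTER_RE = re.compile(r"\bM\d{4}-\d{4}\b")
DATE_RE = re.compile(
    r"\b\d{1,2} (?:January|February|March|April|May|June|July|August"
    r"|September|October|November|December) \d{4}\b"
)


def consistency_check(transcript: str, document: str) -> dict:
    """Flag matter numbers and dates that were dropped or introduced by drafting."""
    warnings = []
    for label, pattern in (("matter number", MATTER_RE), ("date", DATE_RE)):
        source = set(pattern.findall(transcript))
        output = set(pattern.findall(document))
        for missing in sorted(source - output):
            warnings.append(f"{label} '{missing}' in transcript but not in document")
        for added in sorted(output - source):
            warnings.append(f"{label} '{added}' in document but not in transcript (possibly invented)")
    # Placeholder score: start at 1.0, subtract 0.25 per warning, floor at 0.
    confidence = max(0.0, 1.0 - 0.25 * len(warnings))
    return {"confidence": confidence, "warnings": warnings}
```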

## AI legal review and recommendation module

After a transcription/document has been completed, the document page must include an optional button such as:

AI Legal Review / Recommendations

This feature must be controlled by permissions and feature flags.

When enabled, AI can review the completed document and provide recommendations based on the selected jurisdiction, for example:

- Australia.
- Western Australia.
- New South Wales.
- Victoria.
- Queensland.
- United Kingdom.
- New Zealand.
- Other country/state options added later.

The system must not claim to provide final legal advice. It must assist the lawyer by highlighting possible issues, missing information, wording concerns, formatting problems, unclear clauses, risk areas, and possible legal considerations.

The lawyer remains responsible for reviewing and approving everything.

This module should:

- Let the firm choose the default country/state/jurisdiction.
- Let the user override the jurisdiction per document.
- Show AI recommendations separately from the document.
- Allow recommendations to be accepted, ignored, or manually edited.
- Keep the original document unchanged unless the user chooses to apply changes.
- Keep a full audit trail of AI recommendations and user actions.
- Include confidence levels and warnings.
- Prefer current, reliable legal sources where legal research is involved.
- Support a second-AI checker to review the first AI's recommendations.
- Clearly mark the difference between:
  - Drafting suggestions.
  - Legal-risk flags.
  - Formatting issues.
  - Missing information.
  - Questions for the lawyer.
  - Suggested wording changes.

The AI legal review feature must be optional and able to be disabled globally, per firm, per user role, or per document type.
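Keeping recommendations separate from the document, each with a category and an audit trail, suggests a data model along these lines. Field and class names are hypothetical. Resolving a recommendation only records the lawyer's decision; applying accepted wording to the document would be a separate, explicit step, so the original document stays unchanged until the user chooses otherwise.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class Category(Enum):
    DRAFTING = "Drafting suggestion"
    LEGAL_RISK = "Legal-risk flag"
    FORMATTING = "Formatting issue"
    MISSING_INFO = "Missing information"
    QUESTION = "Question for the lawyer"
    WORDING = "Suggested wording change"


class Status(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    IGNORED = "ignored"
    EDITED = "edited"


@dataclass
class Recommendation:
    document_id: int
    jurisdiction: str
    category: Category
    text: str                       # the AI's original wording, never overwritten
    confidence: float
    status: Status = Status.PENDING
    final_text: Optional[str] = None
    audit: list = field(default_factory=list)   # (UTC timestamp, user, action)

    def resolve(self, user: str, status: Status, edited_text: Optional[str] = None) -> None:
        if self.status is not Status.PENDING:
            raise ValueError("recommendation has already been resolved")
        if status is Status.PENDING:
            raise ValueError("choose accepted, ignored, or edited")
        if status is Status.EDITED and not edited_text:
            raise ValueError("an edited recommendation needs the edited wording")
        self.status = status
        if status is Status.EDITED:
            self.final_text = edited_text
        elif status is Status.ACCEPTED:
            self.final_text = self.text
        self.audit.append((datetime.now(timezone.utc).isoformat(), user, status.value))
```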

## Email response workflow

The system must be able to connect to a user's email/mailbox where permitted and allow the lawyer to respond to emails through the website.

Supported email access methods should include:

- Microsoft 365 / Microsoft Graph.
- Exchange / OWA / EWS if practical.
- IMAP/SMTP where required.
- SMTP-only sending as a fallback.

The email workflow should allow:

1. User logs into the website.
2. User opens the email-response area.
3. System displays emails or selected emails available for response.
4. User selects an email.
5. AI reads the email content and attachments where permitted.
6. The email is shown inside the website as an item awaiting a response.
7. The user can choose a response template or allow AI to suggest one.
8. The user can dictate a response, type a response, or ask AI to draft a proposed reply.
9. AI prepares a reply using the firm's style, selected template, matter details, and user instructions.
10. The lawyer reviews and edits the reply.
11. The reply can be sent from the user's email account or saved as a draft.
12. All actions are logged.

The email reply system must support:

- Uploading or linking attachments.
- Reading attached letters/documents where permitted.
- Drafting a reply to an email.
- Drafting a letter in response to an email.
- Saving the response into the matter/client folder.
- Emailing the final document or reply.
- Creating an audit trail of original email, AI draft, user edits, and final sent version.

The AI must never send emails automatically unless the user has explicitly approved the final message.
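The approval rule can be enforced in code by approving a specific version of the draft: the send step refuses anything that has changed since a person approved it. Class and function names are hypothetical, and `transport` stands in for whichever Microsoft Graph, EWS, or SMTP sender is configured.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class EmailDraft:
    to: str
    subject: str
    body: str
    approved_by: Optional[str] = None
    approved_hash: Optional[str] = None

    def content_hash(self) -> str:
        # JSON encoding keeps field boundaries unambiguous before hashing.
        payload = json.dumps([self.to, self.subject, self.body])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def approve(self, user: str) -> None:
        self.approved_by = user
        self.approved_hash = self.content_hash()


def send(draft: EmailDraft, transport: Callable[[EmailDraft], None]) -> None:
    """Send only the exact version a person approved."""
    if draft.approved_hash is None:
        raise PermissionError("draft has not been approved")
    if draft.approved_hash != draft.content_hash():
        # Any change after approval, by AI or by a user, needs fresh approval.
        raise PermissionError("draft changed after approval")
    transport(draft)
```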

## Backups

The server must have a strong backup system.

Admins can schedule backups as often as required.

Backups can include:

- Database.
- Uploaded recordings.
- Transcripts.
- Final documents.
- Templates.
- Configs.
- Logs.
- Source code.
- Changelogs.
- Handover files.

Backups can be sent to:

- Local disk.
- SMB share.
- SFTP.
- Cloud storage later if useful.

Backups must support encryption.

Restore testing must be included.

The website should show backup status, recent backups, failed backups, and storage usage.

In hosted mode, backups must support:

- Full platform backup.
- Per-firm backup.
- Per-firm export.
- Per-firm restore.
- Database backup.
- File backup.
- Template backup.
- Document backup.
- Settings backup.

In hosted mode, it must be possible to restore:

- The whole server.
- One firm.
- One firm's templates/settings.
- One firm's documents where practical.

Per-firm backup retention policies should be configurable.
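Per-firm backup and restore testing can be sketched with the standard library: archive one firm's folder, store a checksum manifest inside the archive, and verify every file against it. `MANIFEST.json` is a hypothetical name. Encryption is deliberately left out; it would be applied to the finished archive with a vetted tool or library rather than hand-written.

```python
import hashlib
import io
import json
import tarfile
from pathlib import Path

MANIFEST_NAME = "MANIFEST.json"   # hypothetical name for the checksum manifest


def backup_firm(firm_root: Path, archive_path: Path) -> dict:
    """Archive one firm's folder (tar.gz) with a SHA-256 manifest inside it.

    archive_path must be outside firm_root, or the archive would include itself.
    """
    manifest = {}
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in sorted(p for p in firm_root.rglob("*") if p.is_file()):
            rel = path.relative_to(firm_root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
            tar.add(path, arcname=rel)
        data = json.dumps(manifest, sort_keys=True).encode("utf-8")
        info = tarfile.TarInfo(MANIFEST_NAME)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return manifest


def verify_backup(archive_path: Path) -> list:
    """Restore test: read every file back and compare it with the manifest."""
    problems = []
    with tarfile.open(archive_path, "r:gz") as tar:
        manifest = json.loads(tar.extractfile(MANIFEST_NAME).read())
        for rel, expected in manifest.items():
            try:
                actual = hashlib.sha256(tar.extractfile(rel).read()).hexdigest()
            except KeyError:
                problems.append(f"missing: {rel}")
                continue
            if actual != expected:
                problems.append(f"checksum mismatch: {rel}")
    return problems
```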

## Billing and usage tracking for hosted mode

Hosted mode must track usage per firm.

Track:

- Number of users.
- Storage used.
- Recordings uploaded.
- Transcription minutes.
- AI tokens/cost.
- Generated documents.
- Email replies.
- Live dictation minutes.
- VoIP dictation minutes.
- Backup size.
- Active modules/features.

This allows future billing per firm.

### Plan and licence management

The system must support a plan or package model for hosted mode.

Plans define what features and limits a firm has access to. Examples:

- Basic: transcription and transcript viewer only.
- Standard: transcription, templates, and document generation.
- Professional: all of Standard plus AI review, email integration, and watched folders.
- Enterprise: all features, custom limits, dedicated support tier.

Plans must be enforceable by the system:

- Maximum number of users per firm.
- Maximum transcription minutes per month.
- Maximum storage per firm.
- Which feature flags are available under each plan.

When a firm approaches a plan limit:

- The system must show soft warnings before the limit is hit, not hard errors at the moment of failure.
- Platform admins can override limits per firm where required.

Plan assignment must be part of the firm onboarding process. Platform admins must be able to change a firm's plan at any time.
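Plan limits with soft warnings reduce to comparing usage against the plan's limits, with platform-admin overrides taking precedence. The `Plan` shape, metric names, and the 80% warning threshold are assumptions for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

WARNING_THRESHOLD = 0.8   # assumed: warn once usage reaches 80% of a limit


@dataclass
class Plan:
    name: str
    limits: dict          # e.g. {"users": 10, "transcription_minutes": 3000}


def check_usage(plan: Plan, usage: dict, overrides: Optional[dict] = None) -> dict:
    """Return soft warnings (approaching a limit) and exceeded limits separately."""
    limits = {**plan.limits, **(overrides or {})}   # platform-admin overrides win
    warnings, exceeded = [], []
    for metric, limit in limits.items():
        used = usage.get(metric, 0)
        if used >= limit:
            exceeded.append(f"{metric}: {used:g} of {limit:g} used")
        elif used >= WARNING_THRESHOLD * limit:
            warnings.append(f"{metric}: {used:g} of {limit:g} used ({used / limit:.0%})")
    return {"warnings": warnings, "exceeded": exceeded}
```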

## Firm onboarding

The platform must include a firm onboarding process.

A platform admin should be able to create a new firm and configure:

- Firm name.
- Admin user.
- Branding.
- Domain/subdomain.
- Enabled features.
- Plan/package assignment.
- Default jurisdiction.
- AI provider/model settings and permitted privacy tier.
- Storage location.
- Template library.
- Email settings.
- Backup policy.
- Retention policy.

The onboarding wizard should support:

- Creating a blank firm.
- Cloning settings from another firm.
- Importing templates.
- Importing users.
- Applying a plan/package.

## Firm export/import

For hosted and white-label growth, the system should support exporting and importing a firm setup.

A firm export should include:

- Firm settings.
- Branding.
- Roles.
- Permissions.
- Feature flags.
- Templates.
- Prompt templates.
- Workflow rules.
- Document types.
- Naming rules.

Sensitive items should not be exported in plain text:

- Passwords.
- API keys.
- Email credentials.
- Storage credentials.
- Private keys.
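A firm export can reuse the normal settings structure with secrets replaced before anything is written out. The set of sensitive key names is an assumption; an alternative is to export secrets encrypted under a key the receiving system already holds.

```python
# Assumed key names that hold secrets anywhere in a firm's settings tree.
SENSITIVE_KEYS = {"password", "api_key", "email_credentials",
                  "storage_credentials", "private_key", "client_secret"}
REDACTED = "<redacted: re-enter after import>"


def export_firm_setup(settings):
    """Return a copy of the settings tree with every sensitive value replaced."""
    if isinstance(settings, dict):
        return {key: REDACTED if key in SENSITIVE_KEYS else export_firm_setup(value)
                for key, value in settings.items()}
    if isinstance(settings, list):
        return [export_firm_setup(item) for item in settings]
    # Scalars (str, int, bool, None) are immutable, so they can be shared as-is.
    return settings
```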

## Custom domains and branding

Hosted mode should support custom domains or subdomains per firm.

Each firm may have:

- Its own login page branding.
- Logo.
- Colour scheme.
- Email footer.
- Document branding.
- Template library.
- Optional custom domain.

The shared theme should remain common, but firm branding should be applied through tenant settings.

## Security requirements

Hosted mode must be held to a higher security standard than onsite mode.

Requirements:

- Strong tenant isolation.
- Encrypted secrets.
- Encrypted backups.
- Per-firm audit logs.
- Platform admin audit logs.
- Optional MFA.
- Strong password policy.
- Firm-level retention policy.
- Firm-level data export/delete process.
- Clear support impersonation logging.
- AI usage isolation.
- Storage isolation.
- Background job tenant isolation.
- No cross-firm search leakage.

Support access must be controlled.

Options:

- Firm grants temporary support access.
- Platform admin impersonates a user.
- Support mode expires after a set time.
- Every support action is logged.
- Firm admin can view support access history.

The system must make it clear when a platform admin is acting inside a firm.
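Time-limited support access can be sketched as a grant object that every support action must pass through. Names are hypothetical, and the sketch assumes all timestamps are UTC.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class SupportGrant:
    firm_id: int
    granted_by: str           # firm admin who approved the access
    platform_admin: str
    expires_at: datetime      # assumed to be UTC
    log: list = field(default_factory=list)

    def is_active(self, now: datetime) -> bool:
        return now < self.expires_at

    def record(self, now: datetime, action: str) -> None:
        """Log one support action; refuse once the grant has expired."""
        if not self.is_active(now):
            raise PermissionError("support access has expired")
        self.log.append(f"{now.isoformat()} {self.platform_admin} in firm {self.firm_id}: {action}")

    def banner(self) -> str:
        # Shown on every page while support mode is active.
        return (f"Support mode: {self.platform_admin} is acting inside this firm "
                f"until {self.expires_at:%Y-%m-%d %H:%M} UTC")


def grant_support(firm_id: int, firm_admin: str, platform_admin: str,
                  hours: int, now: datetime) -> SupportGrant:
    return SupportGrant(firm_id, firm_admin, platform_admin, now + timedelta(hours=hours))
```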

## Data retention and legal compliance

Legal firms in Australia and other jurisdictions face specific obligations around document retention under privacy legislation, law society rules, and record-keeping requirements.

The system must support:

- Configurable retention periods per firm. For example, a Western Australian legal firm may need to retain client files for seven years.
- Per-document-type retention overrides. Different document categories may have different retention requirements.
- Automated retention expiry warnings sent to firm admins before any data reaches its retention date.
- A retention policy review and approval step before any purge is executed. The system must never silently auto-delete data.
- An explicit right-to-erasure or data deletion workflow for any client data that falls under privacy law obligations.
- A record of what was deleted, when, by whom, and under which policy, kept in the audit log even after the data itself is removed.

Platform admins must be able to view and manage retention policies per firm. Firm admins must be able to view their own firm's retention settings.

Retention enforcement must be a background job that runs on a schedule and is visible in the admin diagnostics panel.
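The scheduled retention check reduces to a date calculation plus a rule that expiry only ever leads to review, never deletion. The sketch assumes retention is counted from the date a matter was closed and uses a 90-day warning window; both would be per-firm settings.

```python
from datetime import date, timedelta


def add_years(d: date, years: int) -> date:
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February in a non-leap target year: use 28 February.
        return d.replace(year=d.year + years, day=28)


def retention_status(closed_on: date, retention_years: int,
                     today: date, warn_days: int = 90) -> str:
    expiry = add_years(closed_on, retention_years)
    if today >= expiry:
        return "due_for_review"   # never auto-delete: a person must approve the purge
    if today >= expiry - timedelta(days=warn_days):
        return "warning"          # raise RetentionExpiryWarning
    return "retained"
```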

### AI provider data agreements

Because legal recordings and documents may contain highly sensitive client information, the platform must not send any firm's data to an AI provider unless:

- The firm has been configured with a permitted privacy tier.
- The selected AI provider meets or exceeds that tier.
- The provider's data handling terms are appropriate for legal confidential data.

AI providers that train on submitted data under standard terms must not be used unless the firm has been explicitly warned and has approved that tier in their settings.

The platform should include guidance in the admin area reminding firms to review AI provider agreements before enabling any Tier 3 provider.

## Matter and client management

The system handles documents, transcripts, and recordings that must be associated with specific legal matters and clients. Without a basic matter and client structure, folder routing, document naming, and job assignment cannot function properly.

A basic matter/client module must be included in Phase 1 or early Phase 2.

This module must include:

- A matters table per firm, containing at minimum: matter number, client name, assigned lawyer, matter type, and status.
- The ability to link recordings, transcripts, and generated documents to a matter.
- Matter number lookup during upload and import, so the system can attempt to identify and link a job to the correct matter automatically.
- Folder naming rules that can reference matter fields such as matter number, client name, lawyer name, and matter type.
- A basic matter list and detail view in the portal.

This module does not need to be a full practice management system. It only needs to provide enough structure for folder routing, document naming, and job association to work correctly.

Future phases may add deeper matter management, client portals, or integration with external practice management systems.

## Notification system

The system must include an internal notification layer to support legal workflows where multiple people are involved across different stages of a job.

Notifications must be triggered by the event system. Key events that should produce notifications include:

- TranscriptionCompleted: notify the assigned user or lawyer that their transcript is ready.
- DocumentGenerated: notify the relevant user that a document has been generated and is awaiting review.
- DocumentApproved: notify the secretary or assigned reviewer.
- AiReviewCompleted: notify the lawyer that AI recommendations are ready to review.
- JobFailed: notify the firm admin and the user who submitted the job.
- BackupFailed: notify platform admins and firm admins.
- RetentionExpiryWarning: notify firm admins before a retention deadline.
- SupportImpersonationStarted: notify the firm admin when platform support access begins.

Notification requirements:

- In-app notifications visible when the user is logged in.
- Optional email notifications per user, configurable in user preferences.
- Notification preferences must be per user, allowing each person to choose which events they want to be notified about and by which method.
- Admin-level alerts for system health events such as backup failures, AI provider errors, connector failures, and worker failures.
- Notifications must be tenant-scoped. A user must never see notifications from another firm.
- Notifications must be stored in the database and marked as read/unread.
- Older notifications should be automatically cleaned up after a configurable period.
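A minimal in-process Python sketch of how the event system could drive in-app notifications (Phase 1 is in-app only). `EventBus`, `InAppNotifications` and the payload keys `assigned_user_id` and `transcript_id` are hypothetical names. A production system might put a queue behind the bus instead of calling handlers directly.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Event:
    name: str      # e.g. "TranscriptionCompleted"
    firm_id: str   # every event carries its tenant
    payload: dict


class EventBus:
    """Simple publish/subscribe: publishers never import the subscribing module."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Event], None]]] = defaultdict(list)

    def subscribe(self, event_name: str, handler: Callable[[Event], None]) -> None:
        self._handlers[event_name].append(handler)

    def publish(self, event: Event) -> None:
        for handler in self._handlers[event.name]:
            handler(event)


@dataclass
class Notification:
    firm_id: str
    user_id: str
    event_name: str
    message: str
    read: bool = False


class InAppNotifications:
    def __init__(self, preferences: dict[tuple[str, str], set[str]]) -> None:
        # (firm_id, user_id) -> event names the user has opted in to.
        self.preferences = preferences
        self.stored: list[Notification] = []

    def on_transcription_completed(self, event: Event) -> None:
        user_id = event.payload["assigned_user_id"]
        wanted = self.preferences.get((event.firm_id, user_id), set())
        if event.name in wanted:
            self.stored.append(Notification(
                event.firm_id, user_id, event.name,
                f"Transcript {event.payload['transcript_id']} is ready"))

    def unread_for(self, firm_id: str, user_id: str) -> list[Notification]:
        # Filtering on firm_id as well as user_id keeps the list tenant-scoped.
        return [n for n in self.stored
                if n.firm_id == firm_id and n.user_id == user_id and not n.read]
```

The transcription module would only publish `TranscriptionCompleted`; the notification module subscribes with `bus.subscribe("TranscriptionCompleted", notes.on_transcription_completed)`. This keeps the two modules decoupled, which is the point of event-based modules.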

## Queue and worker resilience

The system uses queue workers for transcription, document generation, AI review, backups, folder watching, and other long-running tasks. Worker reliability is critical because a missed or lost job in a legal workflow can mean lost billable work or a missed deadline.

Requirements:

- Failed jobs must be retried automatically up to a configurable maximum number of attempts.
- Jobs that still fail after the final attempt must be moved to a dead letter queue (the failed jobs queue) rather than silently lost.
- Platform and firm admins must be able to see all failed jobs, including the error message, the number of attempts, the firm context, and the job type.
- Admins must be able to manually retry a failed job from the admin panel.
- Each firm must have a per-firm job history showing what ran, when it ran, the result, the duration, the AI provider used, and the estimated cost where available.
- Worker health must be monitored and included in the diagnostics module.
- Concurrency limits must be enforceable per firm in hosted mode, so that one firm submitting a large batch of recordings does not starve other tenants of worker capacity.
- The queue system must be visible in the admin panel, showing queue depth, active workers, pending jobs, and recent completions.
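The retry, dead-letter and per-firm concurrency rules above can be expressed as two small policy functions. This is a Python sketch under stated assumptions. Most queue frameworks provide their own retry counters and failed-job storage, so this only illustrates the policy. `pick_dispatchable`, `record_failure` and the `Job` fields are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str
    firm_id: str
    job_type: str
    attempts: int = 0
    last_error: str = ""


def pick_dispatchable(pending: list[Job], running: list[Job],
                      per_firm_limit: int, free_workers: int) -> list[Job]:
    """Pick up to free_workers jobs (pending is assumed FIFO) without letting
    any single firm exceed per_firm_limit jobs in flight."""
    in_flight = Counter(job.firm_id for job in running)
    chosen: list[Job] = []
    for job in pending:
        if len(chosen) >= free_workers:
            break
        if in_flight[job.firm_id] < per_firm_limit:
            chosen.append(job)
            in_flight[job.firm_id] += 1
    return chosen


def record_failure(job: Job, error: Exception, max_attempts: int,
                   retry_queue: list[Job], dead_letter: list[Job]) -> None:
    """Retry until max_attempts runs have failed, then dead-letter the job."""
    job.attempts += 1
    job.last_error = f"{type(error).__name__}: {error}"
    if job.attempts < max_attempts:
        retry_queue.append(job)
    else:
        # Kept with error, attempt count, firm and job type for the admin panel.
        dead_letter.append(job)
```

With a per-firm limit of 2, if firm A already has one job running and has queued two more ahead of firm B's job, firm B's job is still dispatched in the same round. A large batch from one firm therefore cannot starve other tenants.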

## Document draft status and watermarking

Documents go through a lifecycle from AI-generated draft to lawyer-approved final. The system must make this status clear at every stage.

Requirements:

- Every generated document must have a clear status: Draft, Under Review, Approved, or Archived.
- Documents in Draft or Under Review status must carry a visible DRAFT watermark when downloaded or printed. This watermark must be configurable per firm and can be disabled if not required.
- The document detail page must clearly show whether the document was AI-generated, AI-assisted, or manually created.
- Approved documents can optionally be stamped with the approval date and the name of the approving lawyer, configurable per firm.
- The status must be shown clearly in the document list view and the document detail view.
- Status transitions must be logged in the audit trail.
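The document lifecycle above fits a small state machine. The four statuses come from this plan. The allowed transitions in `ALLOWED` and the helper names are assumptions for illustration only. The sketch decides whether a DRAFT watermark is needed; it does not show how the watermark is drawn into the Word file.

```python
from dataclasses import dataclass
from enum import Enum


class DocStatus(Enum):
    DRAFT = "Draft"
    UNDER_REVIEW = "Under Review"
    APPROVED = "Approved"
    ARCHIVED = "Archived"


# Hypothetical transition table; the plan defines the statuses, not the edges.
ALLOWED = {
    DocStatus.DRAFT: {DocStatus.UNDER_REVIEW, DocStatus.ARCHIVED},
    DocStatus.UNDER_REVIEW: {DocStatus.DRAFT, DocStatus.APPROVED, DocStatus.ARCHIVED},
    DocStatus.APPROVED: {DocStatus.ARCHIVED},
    DocStatus.ARCHIVED: set(),
}


@dataclass
class Document:
    document_id: str
    firm_id: str
    status: DocStatus = DocStatus.DRAFT


def change_status(doc: Document, new: DocStatus, actor: str, audit_log: list[dict]) -> None:
    if new not in ALLOWED[doc.status]:
        raise ValueError(f"{doc.status.value} -> {new.value} is not allowed")
    # Log before changing state so every successful transition is recorded.
    audit_log.append({"firm_id": doc.firm_id, "document_id": doc.document_id,
                      "from": doc.status.value, "to": new.value, "actor": actor})
    doc.status = new


def needs_draft_watermark(doc: Document, firm_watermark_enabled: bool) -> bool:
    return firm_watermark_enabled and doc.status in (DocStatus.DRAFT, DocStatus.UNDER_REVIEW)
```

Under this table a document cannot jump from Draft straight to Approved; it has to pass through Under Review.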

## Accessibility and browser support

The web portal must be usable across the range of devices and users typical in a legal environment.

Requirements:

- The portal must meet WCAG 2.1 AA accessibility standards as a minimum.
- Supported browsers must be documented and tested: current versions of Chrome, Edge, Firefox, and Safari.
- The portal must be usable on tablet devices for lawyers who review documents away from their desk.
- Touch-friendly controls should be considered in the shared UI kit.
- Accessibility requirements apply to the shared admin theme and must not be treated as a future upgrade.

## Local connector agent

The local connector agent is the preferred solution for firms that need access to local SMB shares or network recording folders but do not want the full platform installed onsite. It appears as mode 4 in the Deployment strategy section. It needs a more detailed specification here because it is a significant piece of infrastructure.

### Agent responsibilities

- Watch one or more configured local folders for new audio files or documents.
- Securely upload new files to the hosted platform for processing.
- Poll the hosted platform for completed documents and download them into the firm's configured output folders.
- Queue uploads and downloads locally so that network outages do not cause job loss.
- Retry uploads and downloads automatically when connectivity is restored.
- Log all upload and download actions locally and report them to the hosted platform for audit purposes.
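The local queue-and-retry responsibility can be sketched with Python's built-in `sqlite3` module. A file-backed SQLite database is one way to make pending transfers survive network outages and agent restarts. The class name, the table layout and the `send` callable are hypothetical. `send` stands in for whatever actually performs the upload or download.

```python
import sqlite3
from typing import Callable


class TransferQueue:
    """Durable local queue of pending uploads/downloads for the connector agent."""

    def __init__(self, path: str = ":memory:") -> None:
        # A real agent would pass a file path so the queue survives restarts.
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS transfers ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " direction TEXT NOT NULL CHECK (direction IN ('upload', 'download')),"
            " path TEXT NOT NULL,"
            " attempts INTEGER NOT NULL DEFAULT 0,"
            " done INTEGER NOT NULL DEFAULT 0)"
        )
        self.db.commit()

    def enqueue(self, direction: str, path: str) -> None:
        self.db.execute("INSERT INTO transfers (direction, path) VALUES (?, ?)", (direction, path))
        self.db.commit()

    def pending(self) -> list[tuple[int, str, str]]:
        return self.db.execute(
            "SELECT id, direction, path FROM transfers WHERE done = 0 ORDER BY id").fetchall()

    def drain(self, send: Callable[[str, str], None]) -> int:
        """Attempt every pending transfer once; failures stay queued for the next pass."""
        completed = 0
        for row_id, direction, path in self.pending():
            try:
                send(direction, path)
            except (ConnectionError, TimeoutError):
                self.db.execute("UPDATE transfers SET attempts = attempts + 1 WHERE id = ?", (row_id,))
            else:
                self.db.execute("UPDATE transfers SET done = 1 WHERE id = ?", (row_id,))
                completed += 1
            self.db.commit()
        return completed
```

While the platform is unreachable, `drain` leaves every row queued and increments its attempt count. Once connectivity returns, the next pass clears the backlog. The `attempts` column can also feed the agent's local status output.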

### Agent communication

- The agent must communicate with the hosted platform using a secure API connection.
- Authentication must use per-agent tokens registered and managed through the platform admin panel.
- Connections must use TLS.
- The agent must not expose any inbound ports. All communication must be initiated by the agent outbound to the platform.

### Agent installation and updates

- The agent must be installable on Windows and Linux as a background service.
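- The agent should support automatic updates published by the hosted platform. Because the agent accepts no inbound connections, it must check for and download updates itself over its outbound connection, with a fallback to manual update if auto-update fails.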
- The installation process must be documented clearly enough for a non-developer to complete it.
- The agent must have a simple local status page or command-line status command so an administrator at the firm can confirm it is running and check its last sync time.

### Agent registration and management

- Each agent must be registered to a specific firm in the platform admin panel.
- Platform admins and firm admins must be able to view registered agents, their last seen time, their version, and their current status.
- Platform admins must be able to revoke an agent's access token, which will immediately prevent that agent from communicating with the platform.
- Each agent must have a unique identifier visible in both the agent status output and the platform admin panel.

### Agent audit logging

- Every file the agent uploads must be recorded in the firm's audit log on the hosted platform.
- Every file the agent downloads must be recorded in the firm's audit log on the hosted platform.
- Agent errors and connectivity failures must be logged and visible in the admin diagnostics panel.

## Developer/admin helper features

The project must include the same kind of safety and build-support features used in the billing software workflow:

- Every code change must update a changelog.
- Every code change must update a handover file for the next AI/developer.
- Take a backup before applying every patch.
- Use fast source backups for code/UI-only changes.
- Only do database backups when schema or production data is changed.
- Include route lists, schema summaries, error log tails, and smoke test reports.
- Include a public/private handover publishing option if required.
- Include a website crawler/link checker to find broken pages.
- Include render checks for key pages.
- Include application log checks.
- Include an admin diagnostics page.
- Include module health checks.
- Include AI-assisted error explanation, but do not expose secrets.
- If a patch causes an error, try to repair the patch in place rather than rolling back, unless rollback is necessary for safety or outage recovery.
- Maintain one common shared theme across the whole website, not page-specific CSS everywhere.
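For the AI-assisted error explanation, a redaction pass over log text before it leaves the system is one safety net. This is a best-effort Python sketch using the standard `re` module. The patterns are hypothetical examples and will not catch every secret format, so they do not replace keeping secrets out of logs in the first place. `redact_secrets` is an illustrative name.

```python
import re

# Hypothetical patterns; a real list would be tuned to the chosen stack's log format.
_BEARER = re.compile(r"\bBearer\s+[A-Za-z0-9._~+/=-]+", re.IGNORECASE)
_KEY_VALUE = re.compile(
    r"\b(password|passwd|secret|api[_-]?key|access[_-]?token|token)\b(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)


def redact_secrets(text: str) -> str:
    """Mask obvious credentials before log text is sent to an AI provider."""
    text = _BEARER.sub("Bearer [REDACTED]", text)
    # Keep the key name and separator so the explanation still makes sense.
    return _KEY_VALUE.sub(lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text)
```

For example, `redact_secrets("db password=hunter2 host=db1")` returns `"db password=[REDACTED] host=db1"`. The AI can still see which setting was involved without seeing its value.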

## Coding/build governance

- The system should be built module by module.
- One AI/developer can code, but another AI/reviewer should check the work before it is considered complete.
- Each module should have tests.
- Each module should have a clear interface.
- Avoid mixing transcription logic, template logic, email logic, file connector logic, and admin UI logic together.
- Add feature flags so incomplete modules can be disabled.
- Add permissions around every module.
- Add seed/demo data for testing.
- Add sample templates and sample recordings only if they are safe to distribute, for example containing no real client or confidential data.
- Never hard-code firm names, paths, passwords, API keys, or branding.
- All code must assume multi-tenancy from the beginning, even if the first deployment only has one firm.
- Do not build anything that assumes there is only one firm.

Every module must be tenant-aware:

- Transcription.
- Templates.
- AI jobs.
- AI review.
- Email.
- Storage.
- Backups.
- Audit logs.
- Feature flags.
- Live dictation.
- VoIP dictation.
- Document generation.
- Notifications.
- Matter/client records.
- Queue jobs.
- Local connector agents.

This is mandatory because the same platform must support both onsite single-firm installs and hosted multi-firm SaaS-style deployment.

## Suggested modules

1. Core platform/auth/users/roles/permissions.
2. Firm/tenant management and white-label branding.
3. Shared admin theme/UI kit.
4. Feature flag/module management.
5. Storage abstraction.
6. AI provider/model router with privacy tier enforcement.
7. Event system.
8. Notification system.
9. Matter/client management.
10. File connector manager: SMB, SFTP, FTP/FTPS, local folders.
11. Recording watcher/import queue.
12. Transcription engine.
13. AI cost metering and usage guard.
14. Transcript viewer/editor.
15. Document type classifier.
16. Word template manager.
17. Template merge/document generation engine.
18. Document status and watermarking module.
19. AI legal review/recommendation module.
20. Second-AI checking/quality assurance module.
21. Existing-letter upload/response workflow.
22. Email response workflow.
23. Email sending module.
24. Live dictation via browser.
25. VoIP/SIP dictation integration.
26. Document approval workflow.
27. Backup/restore module.
28. Audit log and reporting module.
29. Retention policy and compliance module.
30. Queue management and failed job panel.
31. Plan/licence management module.
32. Local connector agent and agent management panel.
33. Diagnostics/smoke test/crawler module.
34. Changelog and handover module.
35. Deployment/update module.

## Progressive product levels

The system should support progressive feature levels so a firm can start basic and slowly enable more advanced functions.

### Level 1: Basic Transcription

- Login.
- Upload audio.
- Transcribe audio.
- View transcript.
- Edit transcript.
- Download transcript.
- Basic user management.
- Basic matter/client list.

### Level 2: Template Documents

- Word template library.
- Document type detection.
- Template selection.
- Generate legal document.
- Document draft status and watermarking.
- Save completed document to configured folder.

### Level 3: File Automation

- SMB/SFTP/FTP/local folder connectors.
- Watched recording folders.
- Automatic import.
- Automatic output folder saving.
- Per-user/per-lawyer folder rules.
- Local connector agent support.

### Level 4: AI Drafting and Review

- AI cleanup.
- AI document classification.
- AI template selection.
- AI legal review/recommendations by jurisdiction.
- Second-AI checker.
- Confidence scores and warnings.

### Level 5: Email Integration

- Connect mailbox.
- Read selected emails.
- AI-assisted email/document replies.
- Save drafts.
- Send approved emails.
- Save replies to matter folders.

### Level 6: Live Dictation

- Browser microphone dictation.
- Live text on screen.
- Voice commands.
- Live template filling.
- Lawyer edits while speaking.

### Level 7: VoIP Dictation

- SIP/VoIP dial-in.
- Phone-based dictation.
- Matter/template selection by voice or keypad.
- Dictation saved into the correct workflow.

## Growth and long-term maintainability requirements

The platform must be designed so it can grow safely over time without needing major rewrites.

Core growth principles:

- Start simple.
- Build every feature as a separate module.
- Hide unfinished or disabled features behind feature flags.
- Keep the user interface clean and only show features that are enabled for that firm/user.
- Avoid hard-coding anything that may differ between firms.
- Make settings configurable through the admin portal wherever practical.
- Keep firm-specific branding, templates, folders, AI settings, email settings, and permissions separate from the core code.

## Plugin/module architecture

The system should support a module-style structure where major features can be enabled, disabled, upgraded, or replaced independently.

Each major module should have:

- Its own routes.
- Its own controllers/services.
- Its own permissions.
- Its own settings.
- Its own queue jobs where needed.
- Its own tests.
- Its own health check.
- Its own changelog entry when changed.

Modules should not depend tightly on each other.

For example:

- The transcription module should not directly depend on the email module.
- The template module should not directly depend on the VoIP module.
- The AI review module should be able to work with documents created by upload, dictation, email, or future sources.
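A minimal Python sketch of a shared module contract and a registry that runs health checks. The core depends only on the `Module` protocol, never on a concrete module, which keeps modules loosely coupled. `Module`, `ModuleRegistry`, `TranscriptionModule` and the permission strings are hypothetical.

```python
from typing import Protocol


class Module(Protocol):
    """What the core needs from any module; modules never import each other."""
    name: str
    permissions: tuple[str, ...]

    def health_check(self) -> bool: ...


class TranscriptionModule:
    name = "transcription"
    permissions = ("transcription.upload", "transcription.view")

    def health_check(self) -> bool:
        return True  # a real check might ping the transcription provider


class ModuleRegistry:
    def __init__(self) -> None:
        self._modules: dict[str, Module] = {}

    def register(self, module: Module) -> None:
        if module.name in self._modules:
            raise ValueError(f"Module already registered: {module.name}")
        self._modules[module.name] = module

    def health_report(self) -> dict[str, bool]:
        report = {}
        for name, module in self._modules.items():
            try:
                report[name] = bool(module.health_check())
            except Exception:
                # One broken module must not break the diagnostics page.
                report[name] = False
        return report
```

The diagnostics page can render `health_report()` directly. A module whose check raises, such as an email module that cannot reach its mail server, shows as unhealthy instead of taking the page down.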

## Configuration-first design

Anything likely to change per firm should be stored in configuration/database settings, not hard-coded.

This includes:

- Branding.
- Logo.
- Colours.
- Document templates.
- Output folder rules.
- Source folder rules.
- Email settings.
- AI provider/model choices and permitted privacy tier.
- Jurisdiction/country/state.
- Document naming rules.
- Retention rules.
- Backup destinations.
- Enabled modules.
- Default workflows.
- User permissions.
- Plan/package assignment.
- Notification preferences.
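Layered settings, where a firm value wins and everything else falls back to a platform default, can be sketched with Python's standard `collections.ChainMap`. The keys and default values below are placeholders, not decided settings, and `TenantSettings` is a hypothetical name.

```python
from collections import ChainMap

# Placeholder platform defaults; real keys and values are a design decision.
PLATFORM_DEFAULTS = {
    "branding.primary_colour": "#333333",
    "document.naming_rule": "{matter_number} - {document_type}",
    "jurisdiction": None,
}


class TenantSettings:
    def __init__(self, defaults: dict, firm_overrides: dict[str, dict]) -> None:
        self._defaults = defaults
        self._firm_overrides = firm_overrides  # firm_id -> {setting_key: value}

    def for_firm(self, firm_id: str) -> ChainMap:
        """Firm values win; anything the firm has not set falls back to defaults."""
        return ChainMap(self._firm_overrides.get(firm_id, {}), self._defaults)

    def get(self, firm_id: str, key: str):
        view = self.for_firm(firm_id)
        if key not in view:
            raise KeyError(f"Unknown setting: {key}")
        return view[key]
```

A new firm works immediately on platform defaults and only stores what it changes. That also makes it easy to show admins which settings they have customised.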

## Workflow builder foundation

Even if the first version only has simple workflows, the database and code should allow future workflow rules.

Example future rules:

- If file appears in this folder, assign it to this lawyer.
- If transcript contains "Will", suggest the Will template.
- If matter number appears in filename, save output to matching matter folder.
- If AI confidence is low, send to reviewer.
- If document is approved, email it to secretary.
- If jurisdiction is WA, use WA-specific recommendation prompts.
- If document type is disabled, do not suggest it.
- If AI provider does not meet firm's privacy tier, block the job and notify the admin.
- If retention date is within 30 days, notify firm admin.
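For the database to hold future rules, conditions are better stored as data (field, operator, value, action) than as code. This Python sketch evaluates such rows against a job's context. The operator names, action strings and the 0.6 confidence threshold are hypothetical. Keyword matching is crude (it would also match "I will call"), which is why the Will rule only *suggests* a template.

```python
import re


def _contains_word(actual, expected) -> bool:
    # Whole-word, case-insensitive match, so "Will" does not match "willing".
    return actual is not None and re.search(
        rf"\b{re.escape(str(expected))}\b", str(actual), re.IGNORECASE) is not None


def _less_than(actual, expected) -> bool:
    return actual is not None and actual < expected


OPERATORS = {
    "contains_word": _contains_word,
    "equals": lambda actual, expected: actual == expected,
    "less_than": _less_than,
}


def matching_actions(rules: list[dict], context: dict) -> list[str]:
    """Return the action of every enabled rule whose condition matches, in rule order."""
    actions = []
    for rule in rules:
        if not rule.get("enabled", True):
            continue
        if OPERATORS[rule["op"]](context.get(rule["field"]), rule["value"]):
            actions.append(rule["action"])
    return actions


# Hypothetical rules mirroring some of the examples above; 0.6 is arbitrary.
EXAMPLE_RULES = [
    {"field": "transcript", "op": "contains_word", "value": "Will", "action": "suggest_template:will"},
    {"field": "ai_confidence", "op": "less_than", "value": 0.6, "action": "send_to_reviewer"},
    {"field": "jurisdiction", "op": "equals", "value": "WA", "action": "use_prompt_set:wa"},
]
```

Carrying out the returned actions (sending to a reviewer, notifying an admin) would be handled by the relevant modules, typically by publishing events, so the rule engine itself stays independent of them.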

## Audit and version history

Every important object should have history.

Track versions of:

- Transcripts.
- AI-cleaned transcripts.
- Generated documents.
- Edited documents.
- Templates.
- AI recommendations.
- Email drafts.
- Sent emails.
- Settings changes.
- Retention policy changes.
- Plan/feature changes.

Users should be able to see:

- Original transcript.
- AI-modified version.
- User-edited version.
- Final approved version.

This is especially important for legal work, where the firm must be able to show what the AI changed and what the lawyer approved.

## API-first thinking

Even if the first version is only a website, the backend should be designed so future apps and integrations can use APIs.

Future integrations may include:

- Mobile app.
- Desktop dictation tool.
- Microsoft Word add-in.
- Outlook add-in.
- PBX/VoIP systems.
- Document management systems.
- Practice management systems.
- Billing systems.
- Client portals.
- Local connector agent.

Use clear internal APIs/services from the beginning.

## Prompt/template management

AI prompts should be managed like templates, not hidden deep in code.

Admin users should eventually be able to manage:

- Transcription cleanup prompts.
- Document classification prompts.
- Legal review prompts.
- Jurisdiction-specific prompts.
- Email response prompts.
- Firm tone/style prompts.
- Second-AI checker prompts.

Prompts should have:

- Versions.
- Test mode.
- Approval status.
- Rollback ability.

## Testing and smoke checks

From the MVP onward, include simple automated checks.

At minimum:

- Login page renders.
- Dashboard renders.
- Upload page renders.
- Transcription job can be created.
- Template list renders.
- Document generation route works.
- Disabled features are hidden.
- Direct access to disabled features is blocked.
- Logs show no new fatal errors.
- Tenant isolation: confirm a user from Firm A cannot access Firm B data.
- AI privacy tier: confirm a job is blocked when the selected provider does not meet the firm's tier requirement.
- Notification: confirm a notification is created when a transcription job completes.
- Failed job: confirm a failed job appears in the failed jobs panel.
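The AI privacy tier check is a good candidate for a small, directly testable gate. This Python sketch assumes privacy tiers can be ordered, with higher meaning stricter. The tier names, `ensure_provider_allowed` and the exception class are hypothetical, because the tier model itself is still to be defined in the AI provider/model-routing plan. The two test functions follow pytest's `test_` naming convention.

```python
from enum import IntEnum


class PrivacyTier(IntEnum):
    # Hypothetical tiers; higher values mean stricter privacy requirements.
    STANDARD = 1
    RESTRICTED = 2
    LOCAL_ONLY = 3


class PrivacyTierViolation(Exception):
    pass


def ensure_provider_allowed(provider_tier: PrivacyTier, firm_required: PrivacyTier) -> None:
    """Block the AI job before any data leaves the platform."""
    if provider_tier < firm_required:
        raise PrivacyTierViolation(
            f"provider tier {provider_tier.name} is below firm requirement {firm_required.name}")


def test_job_blocked_when_provider_below_firm_tier() -> None:
    try:
        ensure_provider_allowed(PrivacyTier.STANDARD, PrivacyTier.LOCAL_ONLY)
    except PrivacyTierViolation:
        return
    raise AssertionError("job should have been blocked")


def test_job_allowed_when_provider_meets_tier() -> None:
    ensure_provider_allowed(PrivacyTier.LOCAL_ONLY, PrivacyTier.RESTRICTED)
```

Keeping the gate as a plain function means the AI router and the smoke test call the same code, so the test checks the real rule, not a copy of it.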

For every patch, follow these steps in order:

- Create backup.
- Apply change.
- Clear and rebuild caches where needed.
- Run lint.
- Run route check.
- Run smoke test.
- Check logs.
- Update changelog.
- Update handover file.

## MVP priority and build phases

### Phase 1: Foundation and basic transcription portal

Phase 1 should build:

- Login/users/roles.
- Firm setup/branding.
- Tenant settings.
- Feature flags.
- Storage abstraction.
- AI abstraction with privacy tier foundation.
- Event system.
- Notification system foundation (in-app only).
- Common admin theme.
- Basic matter/client module.
- Manual audio upload.
- Transcription.
- Transcript review.
- AI document type detection.
- Manual or AI template selection.
- Word document generation.
- Document draft status and watermarking.
- Output folder saving.
- Audit log foundation.
- Changelog/handover/backup/smoke-test foundation.

### Phase 2: File automation and admin controls

Phase 2 should build:

- SMB/SFTP/FTP watched folders.
- Per-user/folder rules.
- Email sending.
- Second-AI checker.
- Better audit logs.
- Cost metering.
- Admin diagnostics.
- Queue management and failed job panel.
- Firm onboarding wizard foundation.
- Plan/licence management foundation.
- Retention policy configuration.

### Phase 3: Legal review and email response workflows

Phase 3 should build:

- AI legal review/recommendations by jurisdiction.
- Email reading/reply workflow.
- Existing-letter response workflow.
- Document approval workflow.
- Better prompt management.
- Tenant export/import foundation.
- Local connector agent and agent management panel.

### Phase 4: Live browser dictation

Phase 4 should build:

- Live browser dictation.
- Real-time template display.
- Voice commands.
- Lawyer edits while speaking.

### Phase 5: VoIP/SIP dictation

Phase 5 should build:

- VoIP/SIP dial-in dictation.
- Phone-based workflows.
- PBX integration.

### Phase 6: Advanced integrations and reporting

Phase 6 should build:

- Advanced firm onboarding.
- More AI provider recommendations.
- Document comparison.
- Matter management integrations.
- Microsoft 365/Exchange deeper integration.
- More reporting and billing features.
- Enhanced retention and compliance reporting.

## Recommended MVP build rule

The first working version should be boring, stable, and simple.

Do not build live dictation, VoIP, email AI replies, or legal review first.

Build the foundation first:

1. Users and firms.
2. Permissions.
3. Tenant settings.
4. Feature flags.
5. Storage abstraction.
6. AI abstraction with privacy tier support.
7. Event system.
8. Notification system foundation.
9. Shared theme.
10. Basic matter/client module.
11. Manual upload.
12. Transcription job.
13. Transcript viewer/editor.
14. Template upload.
15. Document generation.
16. Document draft status and watermark.
17. Audit log.
18. Backup/changelog/handover process.

Then add advanced features one at a time.

## Before coding checklist

Before coding, produce:

1. A stack recommendation document. This must be approved before coding begins, and the decision must be recorded in the handover file.
2. A project architecture document.
3. A database schema.
4. A module list with responsibilities.
5. A build sequence.
6. Security/privacy plan.
7. AI provider/model-routing plan including privacy tier model.
8. Data retention and compliance plan.
9. Backup/restore plan.
10. White-label/multi-tenant plan.
11. Feature flag plan.
12. Storage abstraction plan.
13. Event/module communication plan.
14. Notification plan.
15. Queue resilience and failed job handling plan.
16. Local connector agent specification.
17. Testing/smoke-test plan including tenant isolation tests.
18. Handover/changelog process.

Then start building Phase 1 only, safely and incrementally.

Do not try to build everything at once. First create the architecture and MVP build plan, then ask before starting the first patch or install script.

## Deployment strategy

Do not build separate hosted and on-prem products.

Build one multi-tenant, tenant-aware platform that can run in different deployment modes:

1. Hosted multi-tenant mode
- One data-centre-hosted platform serves multiple firms.
- Each firm is a tenant.
- Data, users, templates, AI settings, storage, audit logs, backups, and feature flags are isolated by firm.

2. Hosted single-tenant dedicated mode
- The same platform runs for one firm only.
- Useful for larger firms or clients that want dedicated infrastructure.
- Still uses the same Platform -> Firm -> User structure.
- The system simply has one active firm.

3. Onsite single-tenant mode
- The same platform can be installed onsite if a firm requires local control.
- Still uses the same tenant-aware codebase.
- No separate on-prem codebase should be created.

4. Hosted platform with local connector agent
- Preferred option for firms that need access to local SMB shares or recording folders.
- A small local agent runs at the firm's office.
- The agent watches local folders and securely uploads recordings/documents to the hosted platform.
- The agent can also download completed documents back into the firm's local folders.
- This avoids installing the full application onsite while still supporting local file workflows.
- See the Local Connector Agent section for full specification.

Core rule:

The system must always be tenant-aware, even when only one firm is using it.

Do not write code that assumes there is only one firm.

The same codebase must support:
- multi-firm hosted deployment,
- single-firm hosted dedicated deployment,
- single-firm onsite deployment,
- hosted deployment with optional local connector agents.

This keeps the product easier to maintain, easier to support, and easier to grow.