Paperless-ngx Stack
A document management system that transforms your physical documents into a searchable online archive. Scan, index, and archive all your documents with powerful OCR and AI-powered organization.
Services Overview
- webserver: Main Paperless-ngx application with web interface and API
- db: PostgreSQL database for document metadata and extracted text (the full-text search index itself is maintained separately by the webserver)
- broker: Redis message broker for background task processing
- gotenberg: Document conversion service for Office files and web pages
- tika: Text extraction service for various file formats
- backup-files: Automated file backups to AWS S3 using resticprofile
- backup-database: Automated PostgreSQL database dumps
Key Features
- OCR Processing: Automatic text extraction from scanned documents
- AI Tagging: Machine-learning-based document classification and tagging
- Full-Text Search: Fast searching across all document contents
- Document Types: Support for PDF, images, Office documents, emails
- Web Interface: Modern, responsive web UI for document management
- REST API: Full API for integration with other applications
- Barcode Support: QR code and barcode recognition for automated filing
- Email Integration: Import documents via email
- Multi-user: User management with permission controls
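The REST API uses token authentication, and full-text search is exposed through the documents endpoint. A minimal sketch of building such a search request with only the Python standard library follows; the domain and token are placeholders (assumptions), and the request is built but not sent.

```python
import urllib.parse
import urllib.request


def build_search_request(base_url: str, token: str, query: str) -> urllib.request.Request:
    """Build (but do not send) a full-text search request against /api/documents/."""
    # Full-text search goes through the `query` parameter of the documents endpoint.
    params = urllib.parse.urlencode({"query": query})
    url = f"{base_url.rstrip('/')}/api/documents/?{params}"
    req = urllib.request.Request(url, method="GET")
    # Token authentication: each user can create an API token in Paperless-ngx.
    req.add_header("Authorization", f"Token {token}")
    req.add_header("Accept", "application/json")
    return req


if __name__ == "__main__":
    # Placeholder domain and token; substitute your TRAEFIK_DOMAIN and a real token.
    request = build_search_request("https://paperless.example.com", "0123abcd", "invoice 2023")
    print(request.full_url)
    # Sending it: urllib.request.urlopen(request) returns paginated JSON
    # with "count" and "results" fields.
```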
Links & Documentation
- Official Website: https://paperless-ngx.com/
- GitHub Repository: https://github.com/paperless-ngx/paperless-ngx
- Documentation: https://docs.paperless-ngx.com/
- Docker Hub: https://hub.docker.com/r/paperlessngx/paperless-ngx
- Demo: https://demo.paperless-ngx.com/ (login: demo / demo)
- Community: https://github.com/paperless-ngx/paperless-ngx/discussions
Configuration
Environment Variables
Copy stack.env to stack.env.real and configure:
- PAPERLESS_*: Application-specific settings (database, OCR languages, secret key)
- TZ: Timezone
- TRAEFIK_DOMAIN: Domain for web access
- CONSUME_PATH: Directory for automatic document consumption
- AWS_*: AWS S3 credentials for backups
- SERVICE_DATA_ROOT_PATH: Base path for service data
- USERMAP_UID/USERMAP_GID: User/group IDs for file permissions
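A quick pre-flight check can catch missing values before the stack starts. The sketch below is a hypothetical helper, not part of this stack: it assumes stack.env.real uses simple KEY=VALUE lines, requires the variables listed above, and treats the PAPERLESS_* and AWS_* groups as satisfied when at least one non-empty key with that prefix exists.

```python
from pathlib import Path

# Variables taken from the list above; prefix groups need at least one non-empty key.
REQUIRED_KEYS = ["TZ", "TRAEFIK_DOMAIN", "CONSUME_PATH", "SERVICE_DATA_ROOT_PATH",
                 "USERMAP_UID", "USERMAP_GID"]
REQUIRED_PREFIXES = ["PAPERLESS_", "AWS_"]


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Strip surrounding quotes; this does not implement full env-file quoting rules.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def find_problems(env: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the file looks complete."""
    problems = [f"{key} is missing or empty" for key in REQUIRED_KEYS if not env.get(key)]
    for prefix in REQUIRED_PREFIXES:
        if not any(k.startswith(prefix) and v for k, v in env.items()):
            problems.append(f"no {prefix}* variable is set")
    for key in ("USERMAP_UID", "USERMAP_GID"):
        if env.get(key) and not env[key].isdigit():
            problems.append(f"{key} must be a numeric ID")
    return problems


if __name__ == "__main__":
    path = Path("stack.env.real")
    if path.exists():
        for problem in find_problems(parse_env(path.read_text())):
            print("WARNING:", problem)
```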
OCR Languages
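Configure PAPERLESS_OCR_LANGUAGE and PAPERLESS_OCR_LANGUAGES for multi-language OCR support. PAPERLESS_OCR_LANGUAGE sets the Tesseract language(s) used for OCR; combine several with "+" (e.g. deu+eng). PAPERLESS_OCR_LANGUAGES lists additional Tesseract language packs to install in the container, space separated (e.g. tur ces).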
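The two variables interact: every language used for OCR must either ship with the image or be installed via PAPERLESS_OCR_LANGUAGES. A minimal consistency check is sketched below, assuming the image bundles English, German, Italian, Spanish and French (as the Paperless-ngx docs describe) and that package names in PAPERLESS_OCR_LANGUAGES may use "-" where Tesseract codes use "_" (e.g. chi-tra vs chi_tra). The function name is hypothetical.

```python
# Assumed to ship with the official image; anything else must be listed in
# PAPERLESS_OCR_LANGUAGES so the container installs it.
BUNDLED = {"eng", "deu", "ita", "spa", "fra"}


def missing_ocr_languages(ocr_language: str, ocr_languages: str = "") -> list[str]:
    """Return codes from PAPERLESS_OCR_LANGUAGE that are neither bundled nor installed."""
    wanted = [code for code in ocr_language.split("+") if code]
    # PAPERLESS_OCR_LANGUAGES is space separated; normalise "-" to "_" so
    # package-style names match Tesseract language codes.
    extra = {code.replace("-", "_") for code in ocr_languages.split()}
    return [code for code in wanted if code not in BUNDLED and code not in extra]
```

For example, PAPERLESS_OCR_LANGUAGE=deu+tur without tur in PAPERLESS_OCR_LANGUAGES is reported as ["tur"].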
Network Access
- Web Interface: Accessible via Traefik at configured domain
- Document Consumption: Place documents in the consume directory (CONSUME_PATH) for automatic processing
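Because the consume directory is watched, a slow copy can expose a half-written file to the consumer. One common pattern, sketched here under the assumption of a hypothetical staging directory that sits outside the consume directory but on the same filesystem, is to copy there first and then move the finished file in with an atomic rename.

```python
import os
import shutil
from pathlib import Path


def drop_into_consume(src: Path, consume_dir: Path, staging_dir: Path) -> Path:
    """Copy src into consume_dir so the watcher only ever sees a complete file.

    staging_dir is a hypothetical scratch directory; it must be outside
    consume_dir and on the same filesystem, so os.replace() is an atomic rename.
    """
    staging_dir.mkdir(parents=True, exist_ok=True)
    staged = staging_dir / src.name
    shutil.copy2(src, staged)       # the slow copy happens outside the watched folder
    target = consume_dir / src.name
    os.replace(staged, target)      # atomic move into the consume directory
    return target
```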
Document Processing Pipeline
- Intake: Documents added via web upload, email, or consume folder
- OCR: Text extraction using Tesseract with configured languages
- Text Extraction: Tika extracts the text of Office documents
- PDF Generation: Gotenberg converts office documents to searchable PDFs
- Classification: Machine-learning-based assignment of tags, correspondents, and document types
- Storage: Organized storage with full-text search indexing
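To make the division of labour between services concrete, the pipeline above can be modelled as a routing table. This is a simplified, hypothetical illustration: Paperless-ngx itself selects a parser by MIME type, supports more formats than listed here, and may skip OCR for PDFs that already contain text depending on its OCR mode settings.

```python
from pathlib import PurePath

# Illustrative subset only; not the list of formats Paperless-ngx supports.
OFFICE_SUFFIXES = {".docx", ".doc", ".odt", ".xlsx", ".xls", ".ods"}
OCR_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def processing_steps(filename: str) -> list[str]:
    """Return the (simplified) service steps a file passes through."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in OFFICE_SUFFIXES:
        # Office files: Tika supplies the text, Gotenberg renders a PDF.
        steps = ["tika: extract text", "gotenberg: convert to PDF"]
    elif suffix in OCR_SUFFIXES:
        steps = ["tesseract: OCR with PAPERLESS_OCR_LANGUAGE"]
    else:
        raise ValueError(f"unsupported file type: {filename}")
    # Every document then goes through classification and indexing in the webserver.
    return steps + ["classify: tags, correspondent, document type", "index: full-text search"]
```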
Backup Strategy
Database: Hourly PostgreSQL dumps with 2-hour retention
Files: Automated S3 backups of documents and media using resticprofile
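The database retention rule amounts to deleting any dump older than two hours, so with hourly dumps only the most recent few are kept. The sketch below illustrates that policy as a pure function; the backup-database container applies its own logic, and the dump names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def dumps_to_delete(dumps: dict[str, datetime], now: datetime,
                    retention: timedelta = timedelta(hours=2)) -> list[str]:
    """Return dump names older than the retention window, oldest first."""
    expired = [name for name, created in dumps.items() if now - created > retention]
    return sorted(expired, key=lambda name: dumps[name])
```

With dumps at 09:00, 10:00, 11:00 and 12:00 and now = 12:00, only the 09:00 dump is returned; the 10:00 dump is exactly two hours old and is kept.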
Dependencies
- External Traefik reverse proxy network
- AWS S3 bucket for backups
- Consume directory for document intake