Docker Deployment¶
Deploy SolanaLM using Docker and Docker Compose.
Prerequisites¶
- Docker 20.10+
- Docker Compose 2.0+
- 8GB+ RAM
- GPU support (optional, requires NVIDIA Container Toolkit)
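The version minimums above can be checked from a script. The sketch below is one possible way to do that, assuming GNU `sort -V` is available; the helper names `version_ge` and `check_prereqs` are illustrative and not part of SolanaLM.

```shell
#!/bin/sh
# version_ge A B: succeed when version A is greater than or equal to B.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# check_prereqs: compare the installed Docker and Compose versions
# against the minimums listed in the prerequisites.
check_prereqs() {
  docker_v=$(docker version --format '{{.Server.Version}}') || return 1
  compose_v=$(docker compose version --short) || return 1
  if ! version_ge "$docker_v" 20.10; then
    echo "Docker $docker_v is older than 20.10" >&2
    return 1
  fi
  if ! version_ge "${compose_v#v}" 2.0; then
    echo "Docker Compose $compose_v is older than 2.0" >&2
    return 1
  fi
  echo "Docker $docker_v and Docker Compose $compose_v meet the minimums"
}
```

Calling `check_prereqs` after sourcing the script prints a one-line summary or an error naming the outdated component.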
Quick Start (Development)¶
# Clone the repository
git clone https://github.com/solanalm/solanalm.git
cd solanalm
# Start development services
docker-compose -f docker/docker-compose.yml up -d
# Check status
docker-compose -f docker/docker-compose.yml ps
# View logs
docker-compose -f docker/docker-compose.yml logs -f
Production Deployment¶
For production, use the dedicated production Compose file, which adds TLS termination and Docker secrets:
# Initialize Docker Swarm (required for secrets)
docker swarm init
# Create secrets (printf avoids the trailing newline that echo would store in the secret)
printf '%s' "your-secure-jwt-secret-at-least-32-chars" | docker secret create jwt_secret -
printf '%s' "your-secure-admin-api-key-at-least-32-chars" | docker secret create admin_api_key -
printf '%s' "your-secure-postgres-password" | docker secret create postgres_password -
# Create treasury wallet
solana-keygen new -o treasury-keypair.json --no-bip39-passphrase
docker secret create treasury_keyfile treasury-keypair.json
# Set up TLS (place certificates in docker/nginx/ssl/)
# - fullchain.pem
# - privkey.pem
# Deploy production stack
docker stack deploy -c docker/docker-compose.production.yml solanalm
# Or use docker-compose (less secure: secrets are passed as environment variables)
docker-compose -f docker/docker-compose.production.yml up -d
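The placeholder secret strings in the steps above are meant to be replaced with real values of at least 32 characters. A minimal sketch of generating and checking such values follows; `gen_secret` and `secret_long_enough` are hypothetical helper names, and reading from `/dev/urandom` is an assumption about the host.

```shell
#!/bin/sh
# gen_secret [N]: print N random bytes as 2*N lowercase hex characters
# (default N=32, giving a 64-character value). No trailing newline.
gen_secret() {
  head -c "${1:-32}" /dev/urandom | od -An -tx1 | tr -d ' \n'
}

# secret_long_enough VALUE [MIN]: succeed only when VALUE has at least
# MIN characters (default 32).
secret_long_enough() {
  [ "${#1}" -ge "${2:-32}" ]
}

# Example (not run here):
#   gen_secret | docker secret create jwt_secret -
```

Because `gen_secret` emits no trailing newline, its output can be piped straight into `docker secret create`.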
Production Features¶
The docker-compose.production.yml includes:
| Feature | Description |
|---|---|
| Nginx TLS | TLS 1.2/1.3 termination with modern ciphers |
| Docker Secrets | Credentials stored as Docker secrets, not in environment variables |
| Rate Limiting | Nginx-level limits (30 req/min inference, 10 req/min private) |
| Security Headers | HSTS, CSP, X-Frame-Options, etc. |
| Health Checks | Automatic container health monitoring |
| Resource Limits | Memory/CPU constraints per service |
| Non-root User | Application runs as unprivileged user |
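Nginx TLS termination depends on the two certificate files named in the deployment steps. A small pre-deploy check, sketched here under the assumption that the files live in `docker/nginx/ssl/`, can catch a missing file before the stack starts; `check_tls_files` is an illustrative name.

```shell
#!/bin/sh
# check_tls_files [DIR]: succeed only when fullchain.pem and privkey.pem
# both exist and are non-empty in DIR (default: docker/nginx/ssl).
check_tls_files() {
  dir="${1:-docker/nginx/ssl}"
  missing=0
  for f in fullchain.pem privkey.pem; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

This only checks that the files are present; it does not validate the certificate chain or key.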
Docker Compose Configuration¶
Basic Setup¶
# docker-compose.yml
version: '3.8'
services:
  gateway:
    build:
      context: .
      dockerfile: docker/Dockerfile.gateway
    ports:
      - "8001:8001"
    environment:
      - SOLANA_NETWORK=devnet
      - GATEWAY_HOST=0.0.0.0
      - GATEWAY_PORT=8001
      - DATABASE_URL=postgresql://postgres:postgres@db:5432/solanalm
      - REDIS_URL=redis://redis:6379
    depends_on:
      - db
      - redis
    restart: unless-stopped
  inference-node:
    build:
      context: .
      dockerfile: docker/Dockerfile.node
    environment:
      - NODE_TYPE=inference
      - NODE_ID=inference-1
      - GATEWAY_URL=http://gateway:8001
      - WALLET_ADDRESS=${WALLET_ADDRESS}
    depends_on:
      - gateway
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
    restart: unless-stopped
  training-node:
    build:
      context: .
      dockerfile: docker/Dockerfile.node
    environment:
      - NODE_TYPE=training
      - NODE_ID=training-1
      - GATEWAY_URL=http://gateway:8001
      - WALLET_ADDRESS=${WALLET_ADDRESS}
    depends_on:
      - gateway
    restart: unless-stopped
  proxy-node:
    build:
      context: .
      dockerfile: docker/Dockerfile.node
    environment:
      - NODE_TYPE=proxy
      - NODE_ID=proxy-1
      - GATEWAY_URL=http://gateway:8001
      - WALLET_ADDRESS=${WALLET_ADDRESS}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
    depends_on:
      - gateway
    restart: unless-stopped
  db:
    image: postgres:15-alpine
    environment:
      - POSTGRES_USER=postgres
      - POSTGRES_PASSWORD=postgres
      - POSTGRES_DB=solanalm
    volumes:
      - postgres_data:/var/lib/postgresql/data
    restart: unless-stopped
  redis:
    image: redis:7-alpine
    volumes:
      - redis_data:/data
    restart: unless-stopped
volumes:
  postgres_data:
  redis_data:
Dockerfiles¶
Gateway Dockerfile¶
# docker/Dockerfile.gateway
FROM python:3.12-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Poetry
RUN pip install poetry
# Copy dependency files
COPY pyproject.toml poetry.lock ./
# Install dependencies (--only main replaces the --no-dev flag, which current Poetry releases no longer accept)
RUN poetry config virtualenvs.create false \
    && poetry install --only main --no-interaction
# Copy application code
COPY . .
# Expose port
EXPOSE 8001
# Health check
HEALTHCHECK --interval=30s --timeout=10s --start-period=5s --retries=3 \
CMD curl -f http://localhost:8001/health || exit 1
# Run gateway
CMD ["python", "scripts/run_gateway.py"]
Node Dockerfile¶
# docker/Dockerfile.node
FROM python:3.12-slim
WORKDIR /app
# Install system dependencies
RUN apt-get update && apt-get install -y \
build-essential \
curl \
&& rm -rf /var/lib/apt/lists/*
# Install Poetry
RUN pip install poetry
# Copy dependency files
COPY pyproject.toml poetry.lock ./
# Install dependencies (--only main replaces the --no-dev flag, which current Poetry releases no longer accept)
RUN poetry config virtualenvs.create false \
    && poetry install --only main --no-interaction
# Copy application code
COPY . .
# Expose port
EXPOSE 8100
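# Run node (exec-form CMD does not expand variables, so run through sh to read NODE_TYPE and NODE_ID at runtime)
CMD ["sh", "-c", "exec python scripts/run_node.py --node-type \"$NODE_TYPE\" --node-id \"$NODE_ID\""]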
GPU-Enabled Dockerfile¶
# docker/Dockerfile.gpu
FROM nvidia/cuda:12.1.1-runtime-ubuntu22.04
WORKDIR /app
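# Install Python and dependencies
# Note: Ubuntu 22.04's default repositories provide Python 3.10; installing
# python3.12 requires an additional package source such as the deadsnakes PPA.
RUN apt-get update && apt-get install -y \
    python3.12 \
    python3-pip \
    build-essential \
    && rm -rf /var/lib/apt/lists/*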
# Install PyTorch with CUDA
RUN pip3 install torch --index-url https://download.pytorch.org/whl/cu121
# Install Poetry and dependencies (--only main replaces the --no-dev flag, which current Poetry releases no longer accept)
RUN pip install poetry
COPY pyproject.toml poetry.lock ./
RUN poetry config virtualenvs.create false \
    && poetry install --only main --no-interaction
COPY . .
EXPOSE 8100
CMD ["python3", "scripts/run_node.py", "--node-type", "inference"]
Environment Configuration¶
Create a .env file:
# .env
# Solana Configuration
SOLANA_NETWORK=devnet
SOLANA_RPC_URL=https://api.devnet.solana.com
# Wallet (for nodes)
WALLET_ADDRESS=YourSolanaWalletAddress
# API Keys (for proxy nodes)
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
# Database
POSTGRES_USER=postgres
POSTGRES_PASSWORD=your-secure-password
DATABASE_URL=postgresql://postgres:your-secure-password@db:5432/solanalm
# Redis
REDIS_URL=redis://redis:6379
# Security
JWT_SECRET=your-jwt-secret-key
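Compose substitutes an empty string for any variable that is missing from `.env`, which can be hard to spot later. The sketch below reports keys that have no non-empty assignment in a given file; `env_missing_keys` is a hypothetical helper, and which keys to treat as required is an assumption to adapt per deployment.

```shell
#!/bin/sh
# env_missing_keys FILE KEY...: print each KEY that has no non-empty
# KEY=value line in FILE. Succeeds only when nothing is missing.
env_missing_keys() {
  file="$1"
  shift
  rc=0
  for key in "$@"; do
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "$key"
      rc=1
    fi
  done
  return "$rc"
}

# Example (not run here):
#   env_missing_keys .env SOLANA_NETWORK WALLET_ADDRESS DATABASE_URL JWT_SECRET
```

Commented-out lines and empty assignments such as `WALLET_ADDRESS=` are reported as missing.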
Scaling Nodes¶
Scale Inference Nodes¶
# Scale to 3 inference nodes
docker-compose up -d --scale inference-node=3
# Check running containers
docker-compose ps
Custom Scaling Configuration¶
# docker-compose.override.yml
version: '3.8'
services:
  inference-node:
    deploy:
      replicas: 3
      resources:
        limits:
          cpus: '2'
          memory: 4G
        reservations:
          cpus: '1'
          memory: 2G
GPU Support¶
Install NVIDIA Container Toolkit¶
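# Ubuntu/Debian (uses the libnvidia-container repository; the older nvidia-docker repository and apt-key are deprecated)
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | \
  sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg
curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
  sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
  sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
# Register the NVIDIA runtime with Docker, then restart the daemon
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker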
GPU Docker Compose¶
# docker-compose.gpu.yml
version: '3.8'
services:
  inference-node-gpu:
    build:
      context: .
      dockerfile: docker/Dockerfile.gpu
    environment:
      - NODE_TYPE=inference
      - NODE_ID=gpu-inference-1
      - GATEWAY_URL=http://gateway:8001
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Monitoring¶
Add Prometheus and Grafana¶
# docker-compose.monitoring.yml
version: '3.8'
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./monitoring/prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
volumes:
  prometheus_data:
  grafana_data:
Prometheus Configuration¶
# monitoring/prometheus.yml
global:
  scrape_interval: 15s
scrape_configs:
  - job_name: 'gateway'
    static_configs:
      - targets: ['gateway:8001']
  - job_name: 'nodes'
    static_configs:
      - targets: ['inference-node:8100']
Networking¶
Custom Network¶
version: '3.8'
networks:
  solanalm-net:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
services:
  gateway:
    networks:
      solanalm-net:
        ipv4_address: 172.20.0.2
External Access¶
services:
  gateway:
    ports:
      - "0.0.0.0:8001:8001" # All interfaces
      # Or specific IP
      # - "192.168.1.100:8001:8001"
Persistent Storage¶
Volume Configuration¶
volumes:
  postgres_data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /data/postgres
  model_cache:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /data/models
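With the `local` driver and bind options, Docker does not create the host directory named in `device`; the mount fails if the path is missing. A minimal sketch for preparing the directories first follows, with `ensure_bind_dirs` as an illustrative name.

```shell
#!/bin/sh
# ensure_bind_dirs DIR...: create each host directory (and any parents)
# so that bind-backed volumes can mount it. Stops at the first failure.
ensure_bind_dirs() {
  for d in "$@"; do
    mkdir -p "$d" || return 1
  done
}

# Example (not run here; /data usually needs root privileges):
#   ensure_bind_dirs /data/postgres /data/models
```

The call is safe to repeat, since `mkdir -p` succeeds when the directory already exists.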
Model Caching¶
services:
  inference-node:
    volumes:
      - model_cache:/app/models
    environment:
      - MODEL_CACHE_DIR=/app/models
Health Checks¶
Custom Health Checks¶
services:
  gateway:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8001/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s
  inference-node:
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:8100/health"]
      interval: 30s
      timeout: 10s
      retries: 3
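Deployment scripts often need to block until a service reports healthy. The sketch below is a generic retry loop that could wrap the same `curl` probe used in the health checks; `wait_until` is a hypothetical helper, not something SolanaLM ships.

```shell
#!/bin/sh
# wait_until TIMEOUT_SECONDS CMD...: rerun CMD about once per second until
# it succeeds. Fails once TIMEOUT_SECONDS attempts have been used up.
wait_until() {
  limit="$1"
  shift
  attempts=0
  until "$@" >/dev/null 2>&1; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge "$limit" ]; then
      return 1
    fi
    sleep 1
  done
  return 0
}

# Example (not run here):
#   wait_until 60 curl -f http://localhost:8001/health
```

Passing the probe as arguments keeps the loop independent of any one service, so the same helper can wait on the gateway or on a node port.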
Common Operations¶
Restart Services¶
# Restart all
docker-compose restart
# Restart specific service
docker-compose restart gateway
# Restart with rebuild
docker-compose up -d --build gateway
View Logs¶
# All logs
docker-compose logs -f
# Specific service
docker-compose logs -f gateway
# Last 100 lines
docker-compose logs --tail=100 gateway
Update Containers¶
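A minimal sketch of an update flow, assuming the services are built from the local Dockerfiles shown earlier: rebuild the image with fresh base layers, then recreate only that service. The `update_service` helper and the `COMPOSE` variable are illustrative names, not part of SolanaLM.

```shell
#!/bin/sh
# Compose command to use; override with e.g. COMPOSE="docker compose".
COMPOSE="${COMPOSE:-docker-compose}"

# update_service SERVICE: rebuild SERVICE (pulling newer base images),
# then recreate its container without restarting its dependencies.
update_service() {
  svc="$1"
  if [ -z "$svc" ]; then
    echo "usage: update_service SERVICE" >&2
    return 1
  fi
  # $COMPOSE is left unquoted on purpose so "docker compose" splits in two.
  $COMPOSE build --pull "$svc" && $COMPOSE up -d --no-deps "$svc"
}

# Example (not run here):
#   git pull && update_service gateway
```

For services that use a published image instead of a build, `docker-compose pull` followed by `docker-compose up -d` serves the same purpose.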
Cleanup¶
# Stop all services
docker-compose down
# Remove volumes (caution: deletes data)
docker-compose down -v
# Remove unused resources (caution: -a also removes every image not used by a container)
docker system prune -a
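Because `docker-compose down -v` deletes the Postgres volume, taking a dump first is a reasonable precaution. The sketch below assumes the `db` service, `postgres` user, and `solanalm` database from the basic Compose file; `backup_name` and `backup_db` are hypothetical helpers.

```shell
#!/bin/sh
# Compose command to use; override with e.g. COMPOSE="docker compose".
COMPOSE="${COMPOSE:-docker-compose}"

# backup_name [PREFIX]: print a timestamped file name such as
# PREFIX-YYYYmmdd-HHMMSS.sql (default PREFIX: solanalm).
backup_name() {
  printf '%s-%s.sql' "${1:-solanalm}" "$(date +%Y%m%d-%H%M%S)"
}

# backup_db: dump the solanalm database from the db service into a
# timestamped file in the current directory and print the file name.
backup_db() {
  out="$(backup_name solanalm)"
  # -T disables TTY allocation so the dump is written cleanly to the file.
  $COMPOSE exec -T db pg_dump -U postgres solanalm > "$out" && echo "$out"
}

# Example (not run here):
#   backup_db && docker-compose down -v
```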
Troubleshooting¶
Container Won't Start¶
# Check logs
docker-compose logs gateway
# Check container status
docker-compose ps
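# Inspect container (resolve the container ID from the service name; Compose v2 names containers with hyphens, e.g. solanalm-gateway-1)
docker inspect $(docker-compose ps -q gateway)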
Connection Issues¶
# Check network
docker network inspect solanalm_default
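# Test connectivity (the slim Python image does not include ping, so open a TCP connection to Postgres instead)
docker-compose exec gateway python -c "import socket; socket.create_connection(('db', 5432), timeout=5); print('db reachable')"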
Performance Issues¶
# Check resource usage
docker stats
# Show usage for one service's containers
docker stats $(docker-compose ps -q inference-node)
Next Steps¶
- Kubernetes Deployment - Scale with K8s
- Production Guide - Production best practices