SDL Best Practices

Write efficient, secure, and cost-effective SDL configurations.

Follow these best practices to optimize your deployments on Akash Network.


Resource Optimization

Right-Size Your Resources

Don’t over-provision - Start small and scale up based on actual usage.

# **Bad: Over-provisioned
profiles:
compute:
web:
resources:
cpu:
units: 8.0 # Too much for a simple web app
memory:
size: 16Gi # Excessive
storage:
size: 500Gi # Way more than needed
# **Good: Right-sized
profiles:
compute:
web:
resources:
cpu:
units: 0.5 # Sufficient for most web apps
memory:
size: 512Mi # Appropriate
storage:
size: 1Gi # Adequate

Use Fractional CPU Units

For lightweight applications, use fractional CPU units to reduce costs:

resources:
cpu:
units: 0.1 # 100 millicores
# or
units: "100m" # Same as 0.1

Common CPU allocations:

  • Static sites: 0.1 - 0.25
  • Web applications: 0.5 - 1.0
  • Databases: 1.0 - 2.0
  • AI/ML workloads: 4.0+

Memory Sizing Guidelines

# Minimum viable sizes
memory:
size: 128Mi # Minimal static sites
size: 256Mi # Small apps
size: 512Mi # Standard web apps
size: 1Gi # Medium apps with caching
size: 2Gi+ # Databases, heavy workloads

Cost Management

Fund with ACT (Stable USD-Pegged Pricing)

Deployment funding uses ACT (USD-pegged compute credit). Fund your deployment with ACT (via Console credit card or by burning AKT to mint ACT). Use ACT for escrow unless the circuit breaker is in effect, in which case you can top up with AKT to keep deployments running. The network uses ACT (denom uact) for escrow and provider payouts.


Security Best Practices

Use Private Container Registries

Protect proprietary images with credentials:

services:
web:
image: registry.example.com/private/app:latest
credentials:
host: https://registry.example.com
username: myuser
password: mypassword # Use environment variables in production

Security tips:

  • Never commit credentials to version control
  • Use environment variables or secrets management
  • Rotate credentials regularly
  • Use read-only registry tokens when possible

Limit Exposure

Only expose ports that need to be publicly accessible:

# **Bad: Exposing everything
services:
web:
expose:
- port: 80
to:
- global: true # Public
- port: 3306 # Database port
to:
- global: true # **Don't expose databases publicly!
# **Good: Selective exposure
services:
web:
expose:
- port: 80
to:
- global: true # Public web traffic
database:
expose:
- port: 3306
to:
- service: web # **Only accessible to web service

Use Accept Lists for Custom Domains

Restrict access to specific domains:

expose:
- port: 80
accept:
- example.com
- www.example.com
to:
- global: true

Reliability and Availability

Use Health Checks (HTTP Options)

Configure timeouts and retries for production reliability:

expose:
- port: 80
to:
- global: true
http_options:
max_body_size: 104857600 # 100MB
read_timeout: 60000 # 60 seconds
send_timeout: 60000 # 60 seconds
next_tries: 3 # Retry 3 times
next_timeout: 10000 # 10 second timeout between retries
next_cases: # Retry on these errors
- error
- timeout
- 500
- 502
- 503

Use Persistent Storage for Stateful Apps

Never use ephemeral storage for critical data:

# **Bad: Using ephemeral storage for database
storage:
- size: 10Gi # Lost on restart!
# **Good: Using persistent storage
storage:
- size: 1Gi # Ephemeral for temp files
- name: db-data
size: 10Gi
attributes:
persistent: true # **Survives restarts
class: beta3 # Storage class

Performance Optimization

Optimize Storage Configuration

Separate ephemeral and persistent storage:

profiles:
compute:
web:
resources:
storage:
- size: 1Gi # Ephemeral: OS, temp files
- name: app-data
size: 5Gi
attributes:
persistent: true # Persistent: Application data
class: beta3

Storage best practices:

  • Use ephemeral storage for temporary files, caches, logs
  • Use persistent storage for databases, user uploads, configuration
  • Don’t over-allocate - storage costs add up
  • Consider using object storage (S3-compatible) for large files

Configure Storage Mounts

Mount persistent storage at the correct paths:

services:
database:
image: postgres
params:
storage:
db-data:
mount: /var/lib/postgresql/data
readOnly: false

Use RAM Storage for Shared Memory (SHM)

RAM storage is for shared memory (/dev/shm) only, not general caching:

storage:
- name: shm
size: 512Mi
attributes:
persistent: false
class: ram # Shared memory only (/dev/shm)

Note: RAM storage class is specifically for applications that require shared memory (e.g., Chrome, machine learning frameworks). For general caching, use ephemeral storage or an in-memory database like Redis.


Multi-Service Deployments

Service-to-Service Communication

Use internal networking for service communication:

services:
frontend:
image: nginx:1.25.3
env:
- API_URL=http://backend:3000 # Use service name as hostname
expose:
- port: 80
to:
- global: true
backend:
image: node-api
expose:
- port: 3000
to:
- service: frontend # Only accessible to frontend

Internal networking benefits:

  • No public exposure of internal services
  • Lower latency
  • Automatic service discovery

Environment Variable Management

Organize environment variables logically:

services:
web:
env:
# Application config
- NODE_ENV=production
- PORT=3000
# Database connection
- DB_HOST=database
- DB_PORT=5432
- DB_NAME=myapp
# External services
- REDIS_URL=redis://cache:6379
- API_KEY=your-api-key # Use secrets management in production

GPU Workloads

Specify GPU Requirements Precisely

Be specific about GPU requirements to ensure compatibility:

resources:
gpu:
units: 1
attributes:
vendor:
nvidia:
- model: rtx4090 # Specific model
ram: 24GB # Optional: minimum VRAM
interface: pcie # Optional: interface type

GPU selection tips:

  • Specify exact model when possible (e.g., a100, rtx4090)
  • Use wildcards sparingly (may get slower GPUs)
  • Include RAM requirement for VRAM-intensive workloads
  • Consider cost vs. performance tradeoffs

GPU Vendor Options

# NVIDIA GPUs
vendor:
nvidia:
- model: a100
- model: rtx4090
- model: rtx3090

Provider Selection

Use Provider Attributes

Target specific provider characteristics:

placement:
us-west:
attributes:
region: us-west # Geographic region
tier: premium # Provider tier
datacenter: equinix # Specific datacenter
pricing:
web:
denom: uact
amount: 100

Common attributes:

  • region: Geographic location (us-west, eu-central, asia-east)
  • tier: Provider quality tier
  • datacenter: Specific datacenter provider
  • Custom attributes set by providers

Use Signed Providers (Audited)

For production workloads, prefer audited providers:

placement:
production:
signedBy:
anyOf:
- akash1... # Auditor address
allOf:
- akash1... # Required auditor
pricing:
web:
denom: uact
amount: 150 # May cost more for audited providers

Testing and Validation

Test Locally First

Validate your SDL before deploying:

TypeScript:

import { SDL } from "@akashnetwork/chain-sdk";
const yamlContent = `... your SDL here ...`;
try {
const sdl = SDL.fromString(yamlContent, "beta3", "mainnet");
console.log("SDL is valid!");
} catch (error) {
console.error("SDL validation failed:", error.message);
}

Go:

import "pkg.akt.dev/go/sdl"
sdlDoc, err := sdl.ReadFile("deploy.yaml")
if err != nil {
log.Fatalf("SDL validation failed: %v", err)
}
// Validate deployment groups
groups, err := sdlDoc.DeploymentGroups()
if err != nil {
log.Fatalf("Invalid deployment groups: %v", err)
}

Start with Sandbox

Test deployments on sandbox before mainnet:

# Sandbox configuration
placement:
test:
pricing:
web:
denom: uact
amount: 10 # Sandbox tokens are free from faucet

Sandbox Limitations:

  • Limited provider resources (smaller CPU/memory/storage available)
  • Limited or no GPU availability
  • Fewer providers overall

If you receive no bids on sandbox (especially for GPU or high-resource deployments), deploy directly to mainnet where more providers and resources are available.

Use Version Control

Track SDL changes with git:

Terminal window
git init
git add deploy.yaml
git commit -m "Initial SDL configuration"

Documentation and Maintenance

Comment Your SDL

Add comments to explain complex configurations:

services:
web:
image: nginx:1.25.3 # Pinned version for stability
expose:
- port: 80
http_options:
max_body_size: 10485760 # 10MB - prevents large upload attacks

Pin Image Versions

Use specific image tags instead of latest:

# **Bad: Unpredictable updates
image: nginx:latest
# **Good: Predictable, reproducible
image: nginx:1.25.3
# **Also good: Digest for immutability
image: nginx@sha256:abc123...

Keep SDL Files Organized

Structure for multi-environment deployments:

deployments/
├── base.yaml # Common configuration
├── dev.yaml # Development overrides
├── staging.yaml # Staging configuration
└── production.yaml # Production configuration

Common Pitfalls to Avoid

**Don’t Use Excessive Resources

# Wastes money and reduces available providers
cpu:
units: 32.0
memory:
size: 128Gi

**Don’t Expose Databases Publicly

# Security risk!
services:
database:
expose:
- port: 5432
to:
- global: true # **Never do this

**Don’t Use Ephemeral Storage for Databases

# Data loss on restart!
storage:
- size: 10Gi # **Not persistent

**Don’t Forget to Set Pricing

# Will fail to deploy without pricing
placement:
akash:
# **Missing pricing section

Checklist for Production Deployments

Before deploying to production, verify:

  • Resources are right-sized (not over-provisioned)
  • Pricing is set in placement section
  • Image versions are pinned (not latest)
  • Sensitive data uses credentials, not hardcoded values
  • Databases use persistent storage with appropriate size
  • Only necessary ports are exposed publicly
  • HTTP options are configured for reliability
  • Provider attributes target appropriate infrastructure
  • SDL is tested on sandbox first (or mainnet for GPU/high-resource workloads)
  • Configuration is documented with comments
  • Backup strategy is in place for persistent data