The Cluster Service is the core orchestration component that manages the provider’s Kubernetes cluster. It handles resource reservations, deployment lifecycle, hostname management, and inventory tracking.
Purpose
The Cluster Service acts as the bridge between Akash blockchain events and Kubernetes resources by:
- Resource Reservation - Reserves cluster resources when bidding on orders
- Deployment Management - Creates and updates tenant deployments
- Hostname Tracking - Manages custom hostname assignments
- Inventory Synchronization - Tracks available resources in real-time
- Lease Lifecycle - Monitors lease state and handles cleanup
Architecture
Service Structure
Implementation: cluster/service.go
```go
type service struct {
    session session.Session
    client  Client
    bus     pubsub.Bus
    sub     pubsub.Subscriber

    inventory *inventoryService
    hostnames *hostnameService

    managers  map[mtypes.LeaseID]*deploymentManager
    managerch chan *deploymentManager

    config Config
}
```
Key Components
- Inventory Service - Tracks available cluster resources
- Hostname Service - Manages custom hostname reservations
- Deployment Managers - One per active lease, manages Kubernetes resources
- Event Bus - Receives blockchain events (lease won, manifest received)
- Kubernetes Client - Interacts with cluster API
Service Initialization
1. Service Creation
The provider service creates the cluster service on startup:
Source: service.go
```go
cluster, err := cluster.NewService(
    ctx,
    session,
    bus,
    client,
    waiter,
    clusterConfig,
)
```
2. Configuration
Cluster configuration is loaded from provider settings:
```yaml
inventory-resource-poll-period: 5s
inventory-resource-debug-frequency: 10
cpu-commit-level: 1.0
memory-commit-level: 1.0
storage-commit-level: 1.0
blocked-hostnames:
  - ".internal.local"
deployment-ingress-domain: provider.example.com
deployment-ingress-static-hosts: true
```
Configuration Parameters:
- `inventory-resource-poll-period` - How often to poll inventory (default: 5s)
- `cpu-commit-level` - CPU overcommit ratio (1.0 = no overcommit)
- `memory-commit-level` - Memory overcommit ratio
- `storage-commit-level` - Storage overcommit ratio
- `blocked-hostnames` - List of blocked hostnames/domains
- `deployment-ingress-domain` - Provider’s base domain
- `deployment-ingress-static-hosts` - Enable static hostname assignment
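To make the commit levels concrete, here is a minimal sketch (a hypothetical helper, not the provider's actual code) of how an overcommit ratio scales the raw capacity a node advertises:

```go
package main

import "fmt"

// adjustCapacity applies an overcommit ratio to a raw resource quantity.
// With a commit level of 1.0 the node advertises exactly its physical
// capacity; 1.5 would advertise 50% more than it physically has.
func adjustCapacity(raw uint64, commitLevel float64) uint64 {
    return uint64(float64(raw) * commitLevel)
}

func main() {
    rawCPU := uint64(8000)                   // 8 cores, in millicores
    fmt.Println(adjustCapacity(rawCPU, 1.0)) // 8000: no overcommit
    fmt.Println(adjustCapacity(rawCPU, 1.5)) // 12000: 50% overcommit
}
```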
3. Initial State Discovery
On startup, the cluster service discovers existing state:
```go
// Find existing deployments
deployments, err := findDeployments(ctx, log, client)

// Find existing hostnames
allHostnames, err := client.AllHostnames(ctx)

// Initialize inventory service
inventory, err := newInventoryService(ctx, cfg, log, sub, client, waiter, deployments)

// Initialize hostname service
hostnames, err := newHostnameService(ctx, cfg, activeHostnames)
```
Rebuilds State From:
- Kubernetes namespaces (pattern: `lease-*`)
- Existing ingress resources
- Active pods and services
- Persistent volumes
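For illustration, here is a standalone client-go sketch of the namespace discovery step; this is not the provider's own code, and it assumes a reachable kubeconfig in the default location:

```go
package main

import (
    "context"
    "fmt"
    "strings"

    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/client-go/kubernetes"
    "k8s.io/client-go/tools/clientcmd"
)

func main() {
    // Build a client from the default kubeconfig (~/.kube/config).
    config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
    if err != nil {
        panic(err)
    }
    clientset, err := kubernetes.NewForConfig(config)
    if err != nil {
        panic(err)
    }

    // List all namespaces and keep those matching the lease-* pattern.
    namespaces, err := clientset.CoreV1().Namespaces().List(context.Background(), metav1.ListOptions{})
    if err != nil {
        panic(err)
    }
    for _, ns := range namespaces.Items {
        if strings.HasPrefix(ns.Name, "lease-") {
            fmt.Println("found lease namespace:", ns.Name)
        }
    }
}
```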
Resource Reservation
Reservation Process
When the bid engine decides to bid on an order:
```go
// 1. Bid engine requests reservation
reservation, err := cluster.Reserve(orderID, resourceGroup)

// 2. Inventory service checks availability
available := inventory.checkAvailability(resourceGroup)

// 3. Create reservation if resources available
if available {
    reservation = inventory.reserve(orderID, resourceGroup)
}

// 4. Reservation held until lease won or bid lost
```
Reservation Lifecycle
```
Order Created
      ↓
  Bid Placed → Reserve() → Resources Reserved
                               ↓          ↓
                          Lease Won   Lease Lost
                               ↓          ↓
                            Deploy    Unreserve()
```
What Gets Reserved
CPU & Memory:
```yaml
resources:
  cpu: 1000m    # 1 CPU core
  memory: 2Gi   # 2 GB RAM
```
Storage:
```yaml
storage:
  - size: 10Gi
    class: beta3  # Or: default, beta2, beta1
```
Endpoints:
```yaml
expose:
  - port: 80
    as: 80
    to:
      - global: true  # Reserves external port
```
GPUs:
```yaml
resources:
  gpu:
    units: 1
    attributes:
      vendor:
        nvidia:
          - model: rtx4090
```
Deployment Management
Deployment Manager
Each active lease has a dedicated deployment manager:
Implementation: cluster/manager.go
```go
type deploymentManager struct {
    lease  mtypes.LeaseID
    mgroup *manifest.Group
    state  deploymentState

    bus     pubsub.Bus
    client  Client
    session session.Session

    updatech   chan *manifest.Group
    teardownch chan struct{}
}
```
Deployment States
```go
const (
    dsDeployActive   = iota // Running normally
    dsDeployPending         // Waiting for manifest
    dsDeployComplete        // Lease closed
    dsDeployError           // Error state
)
```
Deployment Lifecycle
1. Lease Won Event
When a lease is won and the tenant's manifest is received, the cluster service creates a deployment manager:
```go
// Event received from blockchain
case event.ManifestReceived:
    leaseID := ev.LeaseID
    mgroup := ev.ManifestGroup()

    // Create deployment manager
    manager := newDeploymentManager(s, leaseID, mgroup, true)
    s.managers[leaseID] = manager
```
2. Manifest Received
The deployment manager receives the manifest and deploys to Kubernetes:
```go
// Deployment manager processes manifest
func (dm *deploymentManager) deploy(mgroup *manifest.Group) error {
    // 1. Reserve hostnames
    withheldHostnames, err := dm.hostnameService.ReserveHostnames(...)

    // 2. Create Kubernetes resources
    err = dm.client.Deploy(dm.lease, mgroup)

    // 3. Publish event
    dm.bus.Publish(event.ManifestReceived{
        LeaseID: dm.lease,
        Group:   mgroup,
    })
}
```
3. Kubernetes Resources Created
The Kubernetes client creates resources in the lease namespace:
Namespace: `lease-<owner>-<dseq>-<gseq>-<oseq>`
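A minimal sketch of how that pattern maps onto lease identifiers (the leaseNamespace helper is hypothetical; the provider's Kubernetes client derives the name internally):

```go
package main

import "fmt"

// leaseNamespace composes a namespace name following the
// lease-<owner>-<dseq>-<gseq>-<oseq> pattern described above.
func leaseNamespace(owner string, dseq uint64, gseq, oseq uint32) string {
    return fmt.Sprintf("lease-%s-%d-%d-%d", owner, dseq, gseq, oseq)
}

func main() {
    fmt.Println(leaseNamespace("akash1qqq...", 123456, 1, 1))
    // Output: lease-akash1qqq...-123456-1-1
}
```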
Resources Created:
- Deployments - For stateless services
- StatefulSets - For persistent storage services
- Services - ClusterIP, NodePort, LoadBalancer
- Ingress - For HTTP(S) endpoints
- PersistentVolumeClaims - For storage
- ConfigMaps - For environment variables
- Secrets - For sensitive data
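To make the Deployment-versus-StatefulSet split concrete, here is a small decision sketch; the serviceSpec type and workloadKind helper are hypothetical, while the real client inspects the manifest's storage attributes:

```go
package main

import "fmt"

// serviceSpec is a hypothetical, pared-down view of a manifest service.
type serviceSpec struct {
    Name              string
    PersistentStorage bool
}

// workloadKind mirrors the Deployment-vs-StatefulSet split described above.
func workloadKind(svc serviceSpec) string {
    if svc.PersistentStorage {
        return "StatefulSet" // stable pod identity plus PersistentVolumeClaims
    }
    return "Deployment" // stateless; replicas are interchangeable
}

func main() {
    fmt.Println(workloadKind(serviceSpec{Name: "web"}))                         // Deployment
    fmt.Println(workloadKind(serviceSpec{Name: "db", PersistentStorage: true})) // StatefulSet
}
```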
4. Deployment Update
When a tenant updates their deployment:
```go
case event.ManifestReceived:
    manager := s.managers[ev.LeaseID]
    if manager != nil {
        // Update existing deployment
        err := manager.update(ev.ManifestGroup())
    }
```
Update Process:
- Parse new manifest
- Compare with existing resources
- Update changed resources
- Rolling update for pods
- Update hostname reservations
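As a rough illustration of the compare step, the sketch below flags services whose container image changed between manifests; changedServices is a hypothetical helper, and the real comparison covers far more than images (resources, environment, expose settings):

```go
package main

import "fmt"

// changedServices flags services whose image differs between the old and
// new manifest, using the image as a coarse fingerprint of the service.
func changedServices(oldImages, newImages map[string]string) []string {
    var changed []string
    for name, image := range newImages {
        if oldImages[name] != image {
            changed = append(changed, name)
        }
    }
    return changed
}

func main() {
    before := map[string]string{"web": "nginx:1.24", "db": "postgres:15"}
    after := map[string]string{"web": "nginx:1.25", "db": "postgres:15"}
    fmt.Println(changedServices(before, after)) // [web]
}
```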
5. Lease Closure
When a lease ends:
```go
// 1. Receive lease closed event
case event.LeaseClosed:
    manager := s.managers[ev.LeaseID]

    // 2. Teardown deployment
    manager.teardown()

    // 3. Delete Kubernetes resources
    client.TeardownLease(ev.LeaseID)

    // 4. Release hostnames
    hostnameService.ReleaseHostnames(ev.LeaseID)

    // 5. Unreserve inventory
    inventory.unreserve(ev.LeaseID.OrderID())

    // 6. Remove manager
    delete(s.managers, ev.LeaseID)
```
Event Processing
The cluster service runs a perpetual event loop:
```go
func (s *service) run(ctx context.Context, deployments []Deployment) {
    // Create managers for existing deployments
    for _, deployment := range deployments {
        manager := newDeploymentManager(s, deployment.LeaseID(), ...)
        s.managers[deployment.LeaseID()] = manager
    }

    // Event loop
    for {
        select {
        case ev := <-s.sub.Events():
            switch ev := ev.(type) {
            case event.ManifestReceived:
                s.handleManifestReceived(ev)
            case event.LeaseWon:
                s.handleLeaseWon(ev)
            case event.LeaseClosed:
                s.handleLeaseClosed(ev)
            }
        case manager := <-s.managerch:
            s.handleManagerComplete(manager)
        case req := <-s.statusch:
            req <- s.getStatus()
        }
    }
}
```
Event Types
Blockchain Events:
- `LeaseWon` - New lease awarded to provider
- `LeaseClosed` - Lease ended by tenant or provider
- `ManifestReceived` - Tenant sent deployment manifest
Internal Events:
- `DeploymentManagerComplete` - Deployment fully terminated
- `InventoryUpdated` - Resource availability changed
- `HostnameReserved` - Hostname assigned to deployment
Hostname Integration
The cluster service integrates with the hostname service:
```go
// Reserve hostnames for deployment
withheldHostnames, err := s.hostnames.ReserveHostnames(
    ctx,
    hostnames,
    leaseID,
)

// Release hostnames on lease close
err = s.hostnames.ReleaseHostnames(leaseID)

// Transfer hostname between deployments (same owner)
err = s.TransferHostname(
    ctx,
    newLeaseID,
    hostname,
    serviceName,
    externalPort,
)
```
Inventory Integration
The cluster service relies on the inventory service for resource tracking:
```go
// Reserve resources for bid
reservation, err := s.inventory.Reserve(orderID, resourceGroup)

// Unreserve if bid lost or lease closed
err = s.inventory.Unreserve(orderID)

// Check current inventory status
status := s.inventory.Status()
```
Inventory Tracks:
- Available CPU, memory, storage
- Allocated resources per lease
- GPU availability and models
- External port pool
- Storage classes and capacity
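For a feel of what such a status might look like, here is a simplified, hypothetical snapshot type and availability check; the actual types live in cluster/inventory.go and are considerably richer:

```go
package main

import "fmt"

// inventoryStatus is a hypothetical, simplified view of what the
// inventory service tracks.
type inventoryStatus struct {
    AvailableCPU    uint64            // millicores
    AvailableMemory uint64            // bytes
    AvailableGPUs   map[string]uint64 // model → count
    ExternalPorts   uint              // remaining pool size
}

// canReserve is an illustrative availability check before bidding.
func canReserve(s inventoryStatus, cpu, mem uint64) bool {
    return s.AvailableCPU >= cpu && s.AvailableMemory >= mem
}

func main() {
    status := inventoryStatus{
        AvailableCPU:    16000,
        AvailableMemory: 64 << 30,
        AvailableGPUs:   map[string]uint64{"rtx4090": 2},
        ExternalPorts:   100,
    }
    fmt.Println(canReserve(status, 1000, 2<<30)) // true
}
```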
Source Code Reference
Primary Implementation:
- `cluster/service.go` - Main cluster service
- `cluster/manager.go` - Deployment manager
- `cluster/inventory.go` - Inventory tracking
- `cluster/hostname.go` - Hostname management
- `cluster/kube.go` - Kubernetes client wrapper
Key Functions:
- `NewService()` - Initialize cluster service
- `Reserve()` - Reserve resources for a bid
- `Unreserve()` - Release reserved resources
- `newDeploymentManager()` - Create deployment manager
- `findDeployments()` - Discover existing deployments
Related Documentation
- Provider Service Overview - High-level architecture
- Bid Engine - Bidding logic
- Manifest Service - Manifest handling
- Operators - Inventory, IP, Hostname operators