Don’t need persistent storage? Skip to Provider Installation.
This guide shows how to enable persistent storage on your Akash provider using Rook-Ceph.
Time: 45-60 minutes
Prerequisites
Before starting, ensure you have:
Hardware Requirements
See Hardware Requirements - Persistent Storage for detailed specifications:
- Minimum: 4 SSDs across all nodes, OR 2 NVMe SSDs across all nodes
- Drives must be:
  - Dedicated exclusively to persistent storage
  - Unformatted (no partitions or filesystems)
  - NOT used for OS or ephemeral storage
- Recommended: Distribute across multiple nodes for redundancy
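If you are unsure which drives on a node are SSD/NVMe versus HDD, one quick inventory check (run on each node) is:

```
# TYPE=disk rows are physical drives; ROTA=0 usually indicates SSD/NVMe, ROTA=1 a spinning HDD
lsblk -d -o NAME,SIZE,ROTA,TYPE,MODEL
```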
Network Requirements
- Minimum: 10 GbE NIC cards for storage nodes
- Recommended: 25 GbE or faster for better performance
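To confirm the link speed on a storage node, one option is ethtool (replace eth0 with your actual interface name):

```
# Reports the negotiated link speed, e.g. "Speed: 10000Mb/s" for 10 GbE
ethtool eth0 | grep -i speed
```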
Ceph Requirements
For production use:
- Minimum 3 OSDs for redundancy and high availability
- Minimum 2 Ceph managers
- Minimum 3 Ceph monitors
- Minimum 60 GB disk space at `/var/lib/ceph/` (each monitor requires 60 GB)
OSDs per drive:
- HDD: 1 OSD max
- SSD: 1 OSD max
- NVMe: 2 OSDs max
Important: Do NOT run multiple OSDs on a single SAS/SATA drive. NVMe drives can achieve improved performance with 2 OSDs per drive.
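As a quick sanity check before installing, you can confirm free space on the filesystem that will hold monitor data (adjust the path if your monitor data will live on a different mount):

```
# Each monitor needs ~60 GB; check the filesystem backing /var/lib
df -h /var/lib
```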
STEP 1 - Identify Storage Nodes and Devices
List Available Nodes
```
kubectl get nodes
```
Check Available Drives
SSH into each potential storage node and list unformatted drives:
```
lsblk -f
```
Look for drives with no FSTYPE (unformatted). Example output:
```
NAME   FSTYPE   LABEL   UUID                                   MOUNTPOINT
sda
sdb
sdc    ext4             a1b2c3d4-5678-90ab-cdef-1234567890ab   /
```
In this example, sda and sdb are unformatted and can be used for Ceph.
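If you prefer to check every storage node in one pass, a small loop over SSH works too (this assumes passwordless SSH access and that node1, node2, node3 are your node hostnames):

```
for node in node1 node2 node3; do
  echo "== $node =="
  ssh "$node" 'lsblk -f'
done
```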
Wipe Drives (if needed)
If drives have existing partitions or filesystems, wipe them:
```
# WARNING: This destroys all data on the drive!
sudo wipefs -a /dev/sda
sudo wipefs -a /dev/sdb
```
STEP 2 - Install Rook-Ceph Operator
Run from a control plane node:
Add Rook Helm Repository
```
helm repo add rook-release https://charts.rook.io/release
helm repo update
```
Create Rook-Ceph Namespace
```
kubectl create namespace rook-ceph
```
Install Operator
Standard installation (default kubelet path /var/lib/kubelet):
```
helm install rook-ceph-operator rook-release/rook-ceph \
  --namespace rook-ceph \
  --version 1.18.7 \
  --wait \
  --timeout 10m
```
Custom kubelet path (if you configured a custom path in Kubernetes setup):
If you configured a custom kubelet directory (e.g., /data/kubelet), you need to set the CSI kubelet directory:
```
helm install rook-ceph-operator rook-release/rook-ceph \
  --namespace rook-ceph \
  --version 1.18.7 \
  --set csi.kubeletDirPath="/data/kubelet" \
  --wait \
  --timeout 10m
```
Verify Operator
```
kubectl -n rook-ceph get pods
```
Expected output:
```
NAME                     READY   STATUS    RESTARTS   AGE
rook-ceph-operator-xxx   1/1     Running   0          2m
rook-discover-xxx        1/1     Running   0          2m
rook-discover-yyy        1/1     Running   0          2m
```
STEP 3 - Deploy Ceph Cluster
Create Cluster Configuration
Create a file with your storage node and device information:
```
cat > rook-ceph-cluster.values.yml << 'EOF'
operatorNamespace: rook-ceph

configOverride: |
  [global]
  osd_pool_default_pg_autoscale_mode = on
  osd_pool_default_size = 3
  osd_pool_default_min_size = 2

cephClusterSpec:
  dataDirHostPath: /var/lib/rook  # Change if using custom mount point

  mon:
    count: 3

  mgr:
    count: 2

  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter: "^sd[ab]"  # Adjust to match your devices
    config:
      osdsPerDevice: "1"  # Set to "2" for NVMe drives
    nodes:
      - name: "node1"  # Replace with your actual node names
      - name: "node2"
      - name: "node3"

cephBlockPools:
  - name: akash-deployments
    spec:
      failureDomain: host
      replicated:
        size: 3
      parameters:
        min_size: "2"
        bulk: "true"
    storageClass:
      enabled: true
      name: beta2  # SSD storage class
      isDefault: true
      reclaimPolicy: Delete
      allowVolumeExpansion: true
      parameters:
        imageFormat: "2"
        imageFeatures: layering
        csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
        csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
        csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
        csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
        csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
        csi.storage.k8s.io/fstype: ext4

# Do not create default Ceph file systems or object stores
cephFileSystems:
cephObjectStores:

# Spawn rook-ceph-tools for troubleshooting
toolbox:
  enabled: true
EOF
```
Important Configuration:
- dataDirHostPath: Default is `/var/lib/rook`
  - If using a custom mount point (e.g., RAID array at `/data`), change to `/data/rook`
  - This directory stores Ceph monitor and manager data (not OSD data)
- deviceFilter: Adjust to match your drives
  - SATA/SAS drives: `"^sd[ab]"`
  - NVMe drives: `"^nvme[01]n1"`
- osdsPerDevice:
  - NVMe drives: `"2"`
  - HDD/SSD drives: `"1"`
- nodes: Replace with your actual storage node names from `kubectl get nodes`
- storageClass name:
  - HDD: `beta1`
  - SSD: `beta2`
  - NVMe: `beta3`
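For reference, if your storage nodes use NVMe drives rather than SATA/SAS, the storage section of the values file might look like the sketch below (adjust the device filter and node names to your hardware, and name the storage class beta3 instead of beta2):

```
  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter: "^nvme[01]n1"   # match your NVMe device names
    config:
      osdsPerDevice: "2"          # NVMe drives support up to 2 OSDs per drive
    nodes:
      - name: "node1"             # replace with your actual node names
      - name: "node2"
      - name: "node3"
```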
Install Cluster
```
helm install rook-ceph-cluster rook-release/rook-ceph-cluster \
  --namespace rook-ceph \
  --version 1.18.7 \
  -f rook-ceph-cluster.values.yml
```
This will take 5-10 minutes to deploy.
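If you want to follow progress while you wait, you can watch the pods come up:

```
# mon, mgr, osd-prepare, and osd pods appear in roughly that order
kubectl -n rook-ceph get pods -w
```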
STEP 4 - Verify Ceph Cluster
Check Cluster Status
```
kubectl -n rook-ceph get cephcluster
```
Expected output:
```
NAME        DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH
rook-ceph   /var/lib/rook     3          5m    Ready   Cluster created successfully   HEALTH_OK
```
Wait until PHASE is Ready and HEALTH is HEALTH_OK.
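If you prefer to block until the cluster is ready rather than polling, something like the following works on recent kubectl versions (this assumes the CephCluster resource is named rook-ceph, as in the output above):

```
kubectl -n rook-ceph wait cephcluster/rook-ceph \
  --for=jsonpath='{.status.phase}'=Ready --timeout=15m
```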
Check OSDs
```
kubectl -n rook-ceph get pods -l app=rook-ceph-osd
```
You should see OSD pods running on your storage nodes.
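To confirm the OSDs landed on the nodes you listed in the values file, add -o wide to show the node column:

```
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide
```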
Check Ceph Status
Use the Ceph toolbox to check cluster health:
```
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
```
Expected output:
```
  cluster:
    id:     a1b2c3d4-5678-90ab-cdef-1234567890ab
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: a(active), standbys: b
    osd: 6 osds: 6 up, 6 in
```
Check Storage Class
```
kubectl get storageclass
```
Expected output:
```
NAME              PROVISIONER                  RECLAIMPOLICY   VOLUMEBINDINGMODE   AGE
beta2 (default)   rook-ceph.rbd.csi.ceph.com   Delete          Immediate           5m
```
STEP 5 - Test Persistent Storage
Create a test PVC to verify storage is working:
```
cat > test-pvc.yaml << 'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: beta2
EOF
```
```
kubectl apply -f test-pvc.yaml
```
Verify PVC
```
kubectl get pvc test-pvc
```
Expected output:
```
NAME       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
test-pvc   Bound    pvc-a1b2c3d4-5678-90ab-cdef-1234567890ab   1Gi        RWO            beta2          10s
```
Status should be Bound.
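As an optional extra check, you can mount the PVC in a throwaway pod and write to it. The pod name and image below are placeholders, not part of the standard flow; the last command removes the pod so the PVC cleanup below can proceed:

```
cat > test-pod.yaml << 'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: test-pvc-pod
spec:
  containers:
    - name: writer
      image: busybox
      command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc
EOF
kubectl apply -f test-pod.yaml
kubectl wait pod/test-pvc-pod --for=condition=Ready --timeout=2m
kubectl exec test-pvc-pod -- cat /data/hello.txt   # should print "hello"
kubectl delete pod test-pvc-pod
```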
Cleanup
```
kubectl delete pvc test-pvc
```
Troubleshooting
Check Operator Logs
```
kubectl -n rook-ceph logs -l app=rook-ceph-operator
```
Check Ceph Logs
```
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph -s
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd tree
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph osd status
```
Common Issues
OSDs not starting:
- Verify drives are unformatted: `lsblk -f`
- Check deviceFilter matches your drives
- Review OSD pod logs: `kubectl -n rook-ceph logs <osd-pod-name>`
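If no OSD pods appear at all, the OSD prepare jobs' logs usually explain why a device was skipped (the label selector below is the one Rook commonly applies to its prepare jobs):

```
kubectl -n rook-ceph logs -l app=rook-ceph-osd-prepare --tail=100
```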
HEALTH_WARN:
- Check `ceph status` for specific warnings
- Common warnings during initial setup are normal and will resolve
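For the exact warning text, `ceph health detail` via the toolbox is usually more informative than `ceph status`:

```
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph health detail
```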
No PVC binding:
- Verify storage class exists: `kubectl get sc`
- Check CSI provisioner is running: `kubectl -n rook-ceph get pods -l app=csi-rbdplugin-provisioner`
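The PVC's events typically name the exact provisioning error:

```
kubectl describe pvc test-pvc
```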
Next Steps
Your persistent storage is now ready!
→ Provider Installation - Install the Akash provider
Optional enhancements:
- TLS Certificates - Automatic SSL certificates
- IP Leases - Enable static IPs
Note: You’ll need to configure storage classes in the inventory operator during provider installation to advertise persistent storage capabilities. This is covered in the Provider Installation guide.