At Akash we use the Kubernetes Rook Operator coupled with the Ceph distributed file system to provision Provider persistent storage.
Refer to the Akash Provider guide if your provider has not yet been built.
We encourage becoming familiar with Rook and Ceph prior to configuring Akash persistent storage via this guide. The current persistent storage implementation is based on the Rook Ceph Helm charts.
Please take into consideration the following Akash recommendations:
- Persistent storage should only be enabled on Kubernetes nodes that are NOT serving as control-plane/master nodes. This does not apply if you are running all-in-one node deployment.
- Ceph will only deploy its BlueStore on unformatted volumes. A node must have unformatted volumes attached in order to serve persistent storage.
- Ceph uses BlueStore as its default backend to store the objects in a monolithic database-like fashion.
- To read more on Ceph Architecture go here.
Get started within the following sections:
- Persistent Storage Requirements
- Environment Review
- Deploy Persistent Storage
- Check Persistent Storage Health
- Provider Attributes and Pricing Adjustments
- Label Nodes For Storage Classes
- Inventory Operator
- Verify Node Labels for Storage Classes
- Additional Verifications
- Teardown
Persistent Storage Requirements
Environment Overview
When planning persistent storage, take into account the network between the storage nodes: it adds latency, which reduces disk throughput and IOPS. This may make networked storage unsuitable for IOPS-heavy applications such as a Solana validator.
In such cases the “all-in-one” provider configuration may be preferable, as it keeps the network out of the storage path. In other words, for the best disk performance, the pods should run on the node where persistent storage has been deployed.
It is advised to run the control plane / etcd on separate nodes for the sake of performance and security. We recommend benchmarking your storage with this script before and after deploying persistent storage, so you know the performance difference before you start advertising your provider on the Akash network.
Environment Requirements
For hosting of persistent storage, please note the following strict requirements for production use.
At least three Ceph OSDs are normally required for redundancy and high availability.
Single storage node configuration
- At least 3 HDD or SSD disks with 1 OSD per disk (for a total of 3 OSDs)
- At least 2 NVMe disks with 2 OSDs per disk (for a total of 4 OSDs)
Three storage nodes configuration
- At least 1 HDD/SSD/NVMe disk with 1 OSD per disk across 3 storage nodes (for a total of 3 OSDs)
Maximum OSDs per single drive
- HDD: 1 OSD
- SSD: 1 OSD
- NVMe: 2 OSDs
Additional Requirements
- Minimum two Ceph managers
- Minimum three Ceph monitors
- Minimum recommended disk space at `/var/lib/ceph/` is greater than 60 GiB, as each Ceph Monitor (`ceph-mon`) requires 60 GiB of disk space
- Additional Ceph minimum hardware requirements may be reviewed in the following document:
- Running multiple OSDs on a single SAS / SATA drive is NOT a good idea. NVMe drives, however, can achieve improved performance by being split into two or more OSDs.
- Running an OSD and a monitor or a metadata server on a single drive is also NOT a good idea.
Ceph Prerequisites
In order to configure the Ceph storage cluster, at least one of these local storage options is required:
- Raw devices (no partitions or formatted filesystems)
- Raw partitions (no formatted filesystem)
- PVs available from a storage class in `block` mode
Networking Requirements
We recommend 10 Gbps internal networking (NICs and infrastructure) for Ceph nodes; consider this the absolute minimum for storage providers. 10G networking is relatively affordable, and higher speeds such as 25G or 100G are common in HPC environments, especially when Ceph is involved.
- Minimum Network Requirement: 10 GbE NIC cards for Ceph nodes
Environment Review
Retrieve Node Names
Gather the Kubernetes names of all nodes within your cluster. We will use the node names in a subsequent step.
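For example, from a host with `kubectl` access to the cluster:

```
# List all nodes in the cluster; note the NAME column for later steps
kubectl get nodes
```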
Example Output
Ensure Unformatted Drives
- Rook-Ceph will automatically discover free, raw partitions. Use the following command on the host that will serve persistent storage to ensure the intended partition has no file system.
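One way to check this, assuming the `lsblk` utility is present on the storage host:

```
# Show block devices; the FSTYPE column should be empty for any
# disk or partition intended for Ceph
lsblk -f
```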
Example/Expected Output
- In this example we can see that `xvdf` is unformatted and ready for persistent storage use
LVM Package
Ceph OSDs have a dependency on LVM in the following scenarios:
- OSDs are created on raw devices or partitions
- If encryption is enabled (`encryptedDevice: true` in the cluster CR)
- A `metadata` device is specified
For persistent storage use, OSDs are created on raw partitions. Issue the following command on each node serving persistent storage to install the LVM package.
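A minimal sketch for Debian/Ubuntu-based nodes (the package manager and package name may differ on other distributions):

```
# Install the LVM2 package required by Ceph OSDs
sudo apt-get update
sudo apt-get install -y lvm2
```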
Storage Class Types
In the subsequent sections persistent storage attributes will be defined. Use the chart below to determine your provider’s storage class.
| Class Name | Throughput / Approx. Matching Device | Number of OSDs |
|---|---|---|
| beta1 | HDD | 1 |
| beta2 | SSD | 1 |
| beta3 | NVMe | 1 or 2 |
Deploy Persistent Storage
Helm Install
Install Helm and add the Akash repo if not done previously by following the steps in this guide.
All steps in this section should be conducted from the Kubernetes control plane node on which Helm has been installed.
Rook has published the following Helm charts for the Ceph storage provider:
- Rook Ceph Operator: Starts the Ceph Operator, which will watch for Ceph CRs (custom resources)
- Rook Ceph Cluster: Creates Ceph CRs that the operator will use to configure the cluster
The Helm charts are intended to simplify deployment and upgrades.
Persistent Storage Deployment
- Note - if any issues are encountered during the Rook deployment, tear down the Rook-Ceph components via the steps listed here and begin anew.
- Deployment typically takes approximately 10 minutes to complete.
Migration procedure
If you already have the `akash-rook` helm chart installed, make sure to use the following documentation:
Rook Ceph repository
Add Repo
- Add the Rook repo to Helm
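For example, using the chart repository published by the Rook project:

```
# Add the Rook release chart repository and refresh the local index
helm repo add rook-release https://charts.rook.io/release
helm repo update
```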
- Expected/Example Result
Verify Repo
- Verify the Rook repo has been added
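For example:

```
# The rook-release repository should appear in the list
helm repo list | grep rook-release
```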
- Expected/Example Result
Deployment Steps
STEP 1 - Install Ceph Operator Helm Chart
TESTING
Scroll further for PRODUCTION
For additional Operator chart values refer to this page.
All In One Provisioner Replicas
For all-in-one deployments, you will likely want only one replica of the CSI provisioners.
- Add the following to the `rook-ceph-operator.values.yml` file created in the subsequent step (a sketch is shown below)
- By setting `provisionerReplicas` to `1`, you ensure that only a single replica of the CSI provisioner is deployed. It defaults to `2` when it is not explicitly set.
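A minimal sketch of that addition, assuming the Operator chart exposes the CSI provisioner replica count as `csi.provisionerReplicas` (verify against the Operator chart values page linked above):

```
# Append the single-replica CSI provisioner setting to the operator values file
cat >> rook-ceph-operator.values.yml <<'EOF'
csi:
  provisionerReplicas: 1
EOF
```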
Default Resource Limits
You can disable the default resource limits by using the following YAML config; this is useful when testing:
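A hedged sketch of such a config; the exact value keys should be confirmed against the Operator chart values page, and this should only be used for testing:

```
# For testing only: clear the operator's default resource requests/limits
cat >> rook-ceph-operator.values.yml <<'EOF'
resources: {}
EOF
```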
Install the Operator Chart
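For example (release name and namespace follow common Rook conventions; the `-f` flag applies the testing values file created above):

```
# Install the Rook Ceph Operator chart with the testing overrides
helm install --create-namespace --namespace rook-ceph rook-ceph \
  rook-release/rook-ceph -f rook-ceph-operator.values.yml
```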
PRODUCTION
No customization is required by default.
- Install the Operator chart:
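For example, with default chart values:

```
# Install the Rook Ceph Operator chart into the rook-ceph namespace
helm install --create-namespace --namespace rook-ceph rook-ceph rook-release/rook-ceph
```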
STEP 2 - Install Ceph Cluster Helm Chart
For additional Cluster chart values refer to this page.
For custom storage configuration refer to this example.
TESTING / ALL-IN-ONE SETUP
For a production multi-node setup, please skip this section and scroll further to the PRODUCTION SETUP section.
Preliminary Steps
- Device Filter: Update `deviceFilter` to correspond with your specific disk configurations.
- Storage Class: Modify the `storageClass` name from `beta3` to an appropriate one, as outlined in the Storage Class Types table.
- Node Configuration: Under the `nodes` section, list the nodes designated for Ceph storage, replacing placeholders like `node1`, `node2`, etc., with your Kubernetes node names.
Configuration for All-in-One or Single Storage Node
When setting up an all-in-one production provider or a single storage node with multiple storage drives (minimum requirement: 3 drives, or 2 drives if `osdsPerDevice` is set to 2), adjust the values as follows (an illustrative fragment is sketched below):
- Failure Domain: Set `failureDomain` to `osd`.
- Size Settings:
  - `size` and `osd_pool_default_size` should always be set to `osdsPerDevice + 1` when `failureDomain` is set to `osd`.
  - Set `min_size` and `osd_pool_default_min_size` to `2`.
  - Set `size` and `osd_pool_default_size` to `3`. Note: these can be set to `2` if you have a minimum of 3 drives and `osdsPerDevice` is `1`.
- Resource Allocation: To ensure Ceph services receive sufficient resources, comment out or remove the `resources:` field before execution.
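The following is a partial, illustrative sketch of the Cluster chart values described above; the file name, device filter, pool name, and exact layout are assumptions, so use the example values file linked at the top of this step as the authoritative template:

```
# Illustrative fragment only - adapt the full example Cluster values file linked above
cat > rook-ceph-cluster.values.yml <<'EOF'
configOverride: |
  [global]
  osd_pool_default_size = 3        # osdsPerDevice + 1 when failureDomain is osd
  osd_pool_default_min_size = 2

cephClusterSpec:
  storage:
    useAllNodes: false
    useAllDevices: false
    deviceFilter: "^nvme."          # match your disks
    config:
      osdsPerDevice: "2"
    nodes:
      - name: "node1"               # your Kubernetes node name

cephBlockPools:
  - name: akash-deployments         # pool name is illustrative
    spec:
      failureDomain: osd            # osd for all-in-one / single storage node
      replicated:
        size: 3
      parameters:
        min_size: "2"
    storageClass:
      enabled: true
      name: beta3                   # match your Storage Class Type
EOF
```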
PRODUCTION SETUP
Core Configuration
- Device Filter: Update `deviceFilter` to match your disk specifications.
- Storage Class: Change the `storageClass` name from `beta3` to a suitable one, as specified in the Storage Class Types table.
- OSDs Per Device: Adjust `osdsPerDevice` according to the guidelines provided in the aforementioned table.
- Node Configuration: In the `nodes` section, add your nodes for Ceph storage, ensuring to replace `node1`, `node2`, etc., with the actual names of your Kubernetes nodes.
Configuration for a Single Storage Node
For a setup involving a single storage node with multiple storage drives (minimum: 3 drives, or 2 drives if `osdsPerDevice` = 2):
- Failure Domain: Set `failureDomain` to `osd`.
- Size Settings:
  - `size` and `osd_pool_default_size` should always be set to `osdsPerDevice + 1` when `failureDomain` is set to `osd`.
  - Set `min_size` and `osd_pool_default_min_size` to `2`.
  - Set `size` and `osd_pool_default_size` to `3`. Note: these can be set to `2` if you have a minimum of 3 drives and `osdsPerDevice` is `1`.
- Install the Cluster chart:
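For example, assuming the values file prepared above is named `rook-ceph-cluster.values.yml`:

```
# Install the Rook Ceph Cluster chart using the prepared values file
helm install --create-namespace --namespace rook-ceph rook-ceph-cluster \
  --set operatorNamespace=rook-ceph rook-release/rook-ceph-cluster \
  -f rook-ceph-cluster.values.yml
```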
STEP 3 - Label the storageClass
This label is mandatory and is used by Akash's `inventory-operator` to find the storageClass.
- Change `beta3` to the `storageClass` you picked earlier
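A sketch, assuming your storageClass is named `beta3`; the `akash.network=true` label key shown here is an assumption, so verify it against the current Akash provider documentation:

```
# Label the storage class so the Akash inventory-operator can discover it
kubectl label sc beta3 akash.network=true
```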
STEP 4 - Update Failure Domain (Single Storage Node or All-In-One Scenarios Only)
When running a single storage node or all-in-one, make sure to change the failure domain from `host` to `osd` for the `.mgr` pool.
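One way to do this is from the rook-ceph-tools (toolbox) pod, creating a CRUSH rule with an `osd` failure domain and assigning it to the `.mgr` pool; this is a sketch, the rule name is illustrative, and the toolbox must be enabled in the Cluster chart:

```
# Locate the toolbox pod
TOOLS_POD=$(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o name)

# Create a replicated CRUSH rule whose failure domain is osd instead of host
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- \
  ceph osd crush rule create-replicated replicated_rule_osd default osd

# Point the .mgr pool at the new rule
kubectl -n rook-ceph exec -it "$TOOLS_POD" -- \
  ceph osd pool set .mgr crush_rule replicated_rule_osd
```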
Check Persistent Storage Health
Persistent Storage Status Check
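For example (column output varies by Rook version):

```
# The CephCluster should report PHASE Ready and HEALTH_OK once deployment settles
kubectl -n rook-ceph get cephclusters

# The Akash storage class (e.g. beta3) should be present
kubectl get sc
```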
Expected Output
Provider Attributes and Pricing Adjustments
Attribute Adjustments
- Conduct the steps in this section on the Kubernetes control plane from which the provider was configured in prior steps
- Adjust the following key-value pairs as necessary within the `provider-storage.yaml` file created below:
  - Update the value of the `capabilities/storage/2/class` key to the correct class type (e.g. `beta2`). Reference the Storage Class Types doc section for additional details.
  - Update the region value from the current `us-west` to an appropriate value such as `us-east` OR `eu-west`
- Ensure that the necessary environment variables are in place prior to issuing the commands that follow
Caveat on Attributes Updates in Active Leases
- If your provider has active leases, attributes that were used during the creation of those leases cannot be updated
- Example - if a lease was created and is active on your provider with `key=region` and `value=us-east`, it would not be possible to update the `region` attribute without first closing those active leases
Helm Chart Update
Capture and Edit provider.yaml File
- In this section we will capture the current provider settings and add necessary persistent storage elements
- NOTE - the `bidpricestoragescale` setting in the `provider.yaml` file will be ignored if the bid pricing script is used.
Capture Current Provider Settings and Write to File
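A sketch, assuming the provider was installed as the `akash-provider` Helm release in the `akash-services` namespace:

```
# Write the provider's current Helm values to provider.yaml for editing
helm -n akash-services get values akash-provider > provider.yaml
```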
Update provider.yaml File With Persistent Storage Settings
- Open the `provider.yaml` file with your favorite editor (e.g. `vi` or `vim`) and add the following
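An illustrative sketch of the storage attribute entries (the attribute index and class value are examples; align them with your storage class and with the numbering already present in your attributes list):

```
# Print the example attribute entries to copy under the attributes: list in provider.yaml
cat <<'EOF'
- key: capabilities/storage/2/class
  value: beta2
- key: capabilities/storage/2/persistent
  value: true
EOF
```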
And add this attribute if you are not using the bid pricing script:
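An illustrative sketch; both the format and the scale values of `bidpricestoragescale` shown here are assumptions, so consult the provider pricing documentation for the correct values:

```
# Print an example top-level bidpricestoragescale entry for provider.yaml
# (ignored when the bid pricing script is used)
cat <<'EOF'
bidpricestoragescale: "0.00016,beta2=0.00016"
EOF
```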
Finalized provider.yaml File
- After the additions discussed above, your `provider.yaml` file should look something like this:
Upgrade the Helm Install
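A sketch, assuming the provider was installed as the `akash-provider` release from the `akash/provider` chart:

```
# Re-apply the provider chart with the updated provider.yaml
helm upgrade akash-provider akash/provider -n akash-services -f provider.yaml
```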
Expected/Example Output
Verify Provider Settings
- Issue the following command to verify values applied by Helm
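For example:

```
# Display the values currently applied to the provider release
helm -n akash-services get values akash-provider
```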
Example/Expected Output
Provider Status
- Note - the Helm upgrade will spawn a new provider pod
- It is possible the prior provider pod will initially show a status of deleting and then eventually disappear from the output
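For example:

```
# The new provider pod should reach Running; a prior pod may briefly linger
kubectl -n akash-services get pods | grep provider
```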
Expected/Example Output
Label Nodes For Storage Classes
Each node serving persistent storage will automatically be labeled with `akash.network/capabilities.storage.class.beta3=1` by the `inventory-operator`. (This could be `beta2` or `beta1` instead of `beta3`, depending on your type of storage.)
NOTE - currently the Helm Charts for persistent storage support only a single storageClass per cluster. All nodes in the cluster should be marked as `beta2` - as an example - and cannot have a mix of `beta2` and `beta3` nodes.
- Ensure that this command is issued - one at a time - for all nodes serving persistent storage
List Kubernetes Node Names
- Use this command to capture the node names for the subsequent step
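For example:

```
# Note the NAME column for the verification step that follows
kubectl get nodes
```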
Inventory Operator
When your Akash Provider was initially installed, a step was included to also install the Akash Inventory Operator. In this step we will make any necessary changes to the inventory operator for your specific persistent storage type (i.e. `beta1`, `beta2`, or `beta3`).
Default Helm Chart - values.yaml file
- The default `values.yaml` settings for the inventory operator are as follows (a command sketch to display them is shown below)
- As the default cluster storage type includes `beta2`, no update is necessary if this is your persistent storage type; no further action is needed for the inventory operator and you may skip the remainder of this step
- If your persistent storage type is instead `beta1` or `beta3`, proceed to the `Update Cluster Storage Cluster Setting` section next
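A sketch of how to display the chart defaults, assuming the chart is published as `akash/akash-inventory-operator`; the `inventoryConfig.cluster_storage` list shown in the comment reflects the documented default of `default` plus `beta2`:

```
# Show the chart's default values; the relevant section looks roughly like:
#
#   inventoryConfig:
#     cluster_storage:
#       - default
#       - beta2
helm show values akash/akash-inventory-operator
```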
Update Cluster Storage Cluster Setting
- Again, this step is only necessary if you have the `beta1` or `beta3` persistent storage type
- Use the following command to update the cluster storage settings
- In the following command example we are updating the chart with the `beta3` persistent storage type, i.e. `inventoryConfig.cluster_storage[1]=beta3`. Adjust as necessary for your needs.
- The `default` label can be used and left as is in all circumstances.
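A sketch of such an upgrade, assuming the operator was installed as the `inventory-operator` release from the `akash/akash-inventory-operator` chart:

```
# Upgrade the inventory operator, declaring beta3 as the cluster storage class
helm upgrade inventory-operator akash/akash-inventory-operator -n akash-services \
  --set 'inventoryConfig.cluster_storage[0]=default' \
  --set 'inventoryConfig.cluster_storage[1]=beta3'
```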
Expected Output
Verify Node Labels For Storage Classes
Overview
Each node serving persistent storage will automatically be labeled with `akash.network/capabilities.storage.class.beta3=1` by the `inventory-operator`. (This could be `beta2` or `beta1` instead of `beta3`, depending on your type of storage.)
As these labels are applied automatically, in this section we will verify proper labeling.
NOTE - currently the Helm Charts for persistent storage support only a single storageClass per cluster. All nodes in the cluster should be marked as `beta2` - as an example - and cannot have a mix of `beta2` and `beta3` nodes.
Node Label Verification
Verification Template
- Replace `<node-name>` with an actual node name as gathered via `kubectl get nodes`
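For example:

```
# Replace <node-name> with an actual node name from `kubectl get nodes`
kubectl describe node <node-name> | grep akash.network
```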
Example/Expected Output
Additional Verifications
Several provider verifications and troubleshooting options are presented in this section that aid in persistent storage investigations, including:
- Ceph Status and Health
- Ceph Configuration and Detailed Health
- Ceph Related Pod Status
- Kubernetes General Events
Ceph Status and Health
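For example, via the rook-ceph-tools (toolbox) pod, which must be enabled in the Cluster chart:

```
# Overall Ceph status and health from the toolbox pod
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- ceph status
```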
Example Output
Ceph Configuration and Detailed Health
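For example:

```
# Detailed configuration, health, nodes, and recent events for the Ceph cluster
kubectl -n rook-ceph describe cephclusters
```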
Example Output (Tail Only)
- Ensure the name is correct in the Nodes section
- The `Health` key should have a value of `HEALTH_OK`, as shown in the example output below
- Review any output of interest in the Events section
Ceph Related Pod Status
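For example:

```
# All pods in the rook-ceph namespace should be Running or Completed
kubectl -n rook-ceph get pods
```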
Example Output
Kubernetes General Events
- Enters a scrolling events output which would display persistent storage logs and issues if present
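For example:

```
# Stream events from all namespaces; press Ctrl+C to exit
kubectl get events --all-namespaces -w
```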
Example Output from a Healthy Cluster
Teardown
If a problem is experienced during persistent storage enablement, review and follow the steps provided in these guides to begin anew.
- https://rook.io/docs/rook/latest-release/Helm-Charts/ceph-cluster-chart/?h=ceph+cluster+helm+chart#uninstalling-the-chart
- https://rook.io/docs/rook/latest-release/Helm-Charts/operator-chart/?h=ceph+operator+helm+chart#uninstalling-the-chart
- https://rook.io/docs/rook/latest-release/Getting-Started/ceph-teardown/?h=cleaning+up+cluste