Summary
A decentralized cloud computing exchange connects those who need computing resources (tenants) with those who have computing capacity to lease (providers). Based on the Akash Network Whitepaper.
Specification
- Summary
- Specification
- Actors
- Distributed Exchange
- Deployments
- Automation
- History
- Copyright
Workflow
- Tenants define desired infrastructure, workloads to run on infrastructure, and how workloads can connect to one another.
- Desired lifetime of resources is expressed via collateral requirements.
- Orders are generated from the tenant’s definition.
- Datacenters bid on open orders.
- The bid with the lowest price is matched with the order to create a lease.
- Once a lease is created, the workloads and topology are delivered to the datacenter.
- The datacenter deploys the workloads and allows connectivity as specified by the tenant.
- If a datacenter fails to maintain the lease, its collateral is transferred to the tenant, and a new order is created for the desired resources.
Actors
Tenants
A tenant is an entity hosting an application on the Akash Network.
Datacenters
Each datacenter will host an agent which mediates between the Akash Network and datacenter-local infrastructure.
The datacenter agent is responsible for:
- Bidding on orders fulfillable by the datacenter.
- Managing active leases for which it is the provider.
Validators
An Akash node that is elected to be a validator in the DPoS consensus scheme.
Marketplace Facilitators
Marketplace facilitators maintain the distributed exchange (marketplace). Validators will initially perform this function.
Distributed Exchange
Global Parameters
Name | Description |
---|---|
reconfirmation-period | Number of blocks between required lease confirmations |
collateral-interest-rate | Interest rate awarded to datacenters for collateral posted with fulfillment orders |
Models
ComputeUnit
Field | Definition |
---|---|
cpu | Number of vCPUs |
memory | Amount of memory in GB |
disk | Amount of block storage in GB |
ResourceGroup
Field | Definition |
---|---|
compute | compute unit definition |
price | Price of compute unit per time unit |
collateral | Collateral per compute unit |
count | Number of defined compute units |
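As a concrete illustration, the two models above can be expressed as plain Go structs. This is a sketch only; the field types (vCPU count, GB sizes, integer token amounts) and the package name are assumptions, not part of the specification.

```go
package market

// ComputeUnit describes a single unit of leased compute.
type ComputeUnit struct {
	CPU    uint32 // number of vCPUs
	Memory uint64 // amount of memory in GB
	Disk   uint64 // amount of block storage in GB
}

// ResourceGroup describes a homogeneous group of compute units together
// with its pricing and collateral terms.
type ResourceGroup struct {
	Compute    ComputeUnit // compute unit definition
	Price      uint64      // price of a compute unit per time unit
	Collateral uint64      // collateral per compute unit
	Count      uint32      // number of defined compute units
}
```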
Deployment
A Deployment represents the state of a tenant’s application. It includes desired infrastructure and pricing parameters, as well as workload definitions and connectivity.
Field | Definition |
---|---|
infrastructure | List of deployment infrastructure definitions |
wait-duration | Amount of time to wait before matching generated orders with fulfillment orders |
DeploymentInfrastructure
A DeploymentInfrastructure represents a set of resources (including pricing) that a tenant would like to be provisioned in a single datacenter. Orders are created from deployment infrastructure as necessary.
Field | Definition |
---|---|
region | Geographic region of datacenter |
persist | Whether or not to maintain active lease if current lease is broken |
resources | List of resource groups for this datacenter |
Within the resources list, resource group fields are interpreted as follows:
Field | Definition |
---|---|
price | Maximum price tenant is willing to pay. |
collateral | Amount of collateral that the datacenter must post when creating a fulfillment order |
Order
An Order is generated for each deployment infrastructure present in the deployment.
Field | Definition |
---|---|
region | Geographic region of datacenter |
resources | List of resource groups for this datacenter |
wait-duration | Number of blocks to wait before matching the order with fulfillment orders |
Fulfillment
A Fulfillment represents a datacenter’s interest in providing the resources requested in an order.
Field | Definition |
---|---|
order | ID of order which is being bid on. |
resources | List of resource groups for this datacenter. |
The resources list must match the order’s resources list for each resource group with the following rules:
- The compute, count, and collateral fields must be the same.
- The price field represents the datacenter’s offering price and must be less than or equal to the order’s price.
The total collateral required to post a fulfillment order is the sum of the collateral fields present in the order’s resources list.
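Continuing the illustrative market package from the sketch above, the matching rules and the total-collateral calculation might look like the following. The Order and Fulfillment structs add assumed ID fields for bookkeeping; they are not part of the on-chain schema.

```go
package market

// Order mirrors the Order model above, plus an assumed ID field.
type Order struct {
	ID           uint64
	Region       string
	Resources    []ResourceGroup
	WaitDuration uint64 // number of blocks to wait before matching
}

// Fulfillment mirrors the Fulfillment model above, plus an assumed ID field.
type Fulfillment struct {
	ID        uint64
	Order     uint64 // ID of the order being bid on
	Resources []ResourceGroup
}

// matchesOrder checks the rules above: for each resource group the compute,
// count, and collateral fields must be equal, and the offered price must be
// less than or equal to the order's price.
func matchesOrder(o Order, f Fulfillment) bool {
	if len(o.Resources) != len(f.Resources) {
		return false
	}
	for i, want := range o.Resources {
		got := f.Resources[i]
		if got.Compute != want.Compute || got.Count != want.Count || got.Collateral != want.Collateral {
			return false
		}
		if got.Price > want.Price {
			return false
		}
	}
	return true
}

// requiredCollateral sums the collateral fields of the order's resource
// groups, giving the total collateral a fulfillment must post.
func requiredCollateral(o Order) uint64 {
	var total uint64
	for _, rg := range o.Resources {
		total += rg.Collateral
	}
	return total
}
```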
Lease
A Lease represents a matching order and fulfillment order.
Field | Definition |
---|---|
deployment-order | ID of order |
fulfillment-order | ID of fulfillment order |
LeaseConfirmation
A LeaseConfirmation represents a confirmation that the resources are being provided by the datacenter. Its creation may initiate a transfer of tokens from the tenant to the datacenter.
Field | Definition |
---|---|
lease | ID of lease being confirmed |
Transactions
SubmitDeployment
Sent by a tenant to deploy their application on Akash. An order will be created for each datacenter configuration described in the deployment.
UpdateDeployment
Sent by a tenant to update their application on Akash.
CancelDeployment
Sent by a tenant to cancel their application on Akash.
SubmitFulfillment
Sent by a datacenter to bid on an order.
CancelFulfillment
Sent by a datacenter to cancel an existing fulfillment order.
SubmitLeaseConfirmation
Sent by a datacenter to confirm a lease that it is engaged in. This should be called once every reconfirmation-period blocks.
SubmitLease
Sent by a validator to match an order with a fulfillment order.
SubmitStaleLease
Sent by a validator after finding a lease that has not been confirmed within reconfirmation-period blocks.
Workflows
Tenants
Tenants submit their deployment to the network via SubmitDeployment.
Marketplace Facilitators
Every time a new block is created, each facilitator runs MatchOpenOrders and InvalidateStaleLeases.
MatchOpenOrders
For each order that is ready to be fulfilled (state=open, wait-duration has transpired):
- Find the matching fulfillment order with the lowest price.
- Emit a SubmitLease transaction to initiate a lease for the matching orders.
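A minimal sketch of this step, still in the illustrative market package: pick the lowest-priced matching fulfillment for each ready order and hand the pair to an assumed emit callback that issues the SubmitLease transaction.

```go
package market

// totalPrice sums price multiplied by count across a fulfillment's resource groups.
func totalPrice(f Fulfillment) uint64 {
	var total uint64
	for _, rg := range f.Resources {
		total += rg.Price * uint64(rg.Count)
	}
	return total
}

// matchOpenOrders processes orders that are already open and past their
// wait-duration. For each one it finds the matching fulfillment with the
// lowest price and emits a lease for the pair.
func matchOpenOrders(ready []Order, bids map[uint64][]Fulfillment, emitLease func(orderID, fulfillmentID uint64)) {
	for _, o := range ready {
		var best *Fulfillment
		for i := range bids[o.ID] {
			f := &bids[o.ID][i]
			if !matchesOrder(o, *f) {
				continue
			}
			if best == nil || totalPrice(*f) < totalPrice(*best) {
				best = f
			}
		}
		if best != nil {
			emitLease(o.ID, best.ID) // SubmitLease transaction
		}
	}
}
```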
InvalidateStaleLeases
For each active lease that has not been confirmed in reconfirmation-period:
- Emit a SubmitStaleLease transaction.
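The corresponding staleness check might be as simple as the sketch below; the Lease bookkeeping fields (in particular the last-confirmed height) and the emit callback are assumptions for illustration, not the on-chain Lease schema.

```go
package market

// Lease pairs an order with a fulfillment, plus an assumed record of the
// block height at which it was last confirmed.
type Lease struct {
	ID            uint64
	LastConfirmed uint64
}

// invalidateStaleLeases emits a SubmitStaleLease transaction for every
// active lease that has gone unconfirmed for longer than the
// reconfirmation period.
func invalidateStaleLeases(height, reconfirmationPeriod uint64, active []Lease, emitStale func(leaseID uint64)) {
	for _, l := range active {
		if height-l.LastConfirmed > reconfirmationPeriod {
			emitStale(l.ID)
		}
	}
}
```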
Datacenters
Every time a new block is created, each datacenter runs ConfirmCurrentLeases and BidOnOpenOrders.
ConfirmCurrentLeases
For each lease currently provided by the datacenter:
- Emit a SubmitLeaseConfirmation transaction for the lease.
BidOnOpenOrders
For each open order:
- If the datacenter is out of collateral, exit.
- If the datacenter is not able to fulfill the order, skip to the next order.
- Emit a SubmitFulfillment transaction for the order.
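A sketch of the agent's bidding loop, again using the illustrative market types; the collateral bookkeeping, the canFulfill capacity check, and the emit callback are assumed local details of the agent.

```go
package market

// bidOnOpenOrders walks the open orders, stops when the datacenter can no
// longer cover the required collateral, skips orders it cannot serve, and
// bids on the rest by emitting a SubmitFulfillment with the order's
// resource groups.
func bidOnOpenOrders(open []Order, availableCollateral uint64, canFulfill func(Order) bool, emitFulfillment func(orderID uint64, offer []ResourceGroup)) {
	for _, o := range open {
		need := requiredCollateral(o)
		if need > availableCollateral {
			return // out of collateral: stop bidding
		}
		if !canFulfill(o) {
			continue // cannot serve this order: try the next one
		}
		// Offer the requested resource groups; a real agent would likely
		// lower the price field to undercut competing bids.
		emitFulfillment(o.ID, o.Resources)
		availableCollateral -= need
	}
}
```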
Deployments
Once resources have been procured, clients must distribute their workloads to providers so that they can execute on the leased resources. We refer to the current state of the client’s workloads on the Akash Network as a “deployment”.
A tenant describes their desired deployment in a “manifest”. The manifest contains workload definitions, configuration, and connection rules. Providers use workload definitions and configuration to execute the workloads on the resources they’re providing, and use the connection rules to build an overlay network and firewall configurations.
A hash of the manifest is known as the deployment “version” and is stored on the blockchain-based distributed database.
Workflow
- Stack infrastructure is submitted to the ledger.
- Ask orders are generated for resources defined in the stack infrastructure.
- Providers (data centers) bid on orders.
- Leases are reached by matching bid and ask orders.
- Stack manifest is distributed to deployment data centers (lease providers).
- Datacenters deploy workloads and distribute connection parameters to all other deployment datacenters.
- Overlay network is established to allow for connectivity between workloads.
Manifest Distribution
Each on-chain deployment contains a hash of the manifest. This hash represents the deployment version.
The manifest contains sensitive information which should only be shared with participants of the deployment. This poses a problem for self-managed deployments - Akash must distribute the workload definition autonomously, without revealing its contents to unnecessary participants.
To address these issues, we devised a peer-to-peer file sharing scheme in which lease participants distribute the manifest to one another as needed. The protocol runs off-chain over a TLS connection; each participant can verify the manifest they received by computing its hash and comparing this with the deployment version that is stored on the blockchain-backed distributed database.
In addition to providing private, secure, autonomous manifest distribution, the peer-to-peer protocol also enables fast distribution of large manifests to a large number of datacenters.
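The verification step is straightforward. Below is a minimal sketch, assuming the deployment version is a SHA-256 hash of the serialized manifest (the actual hash algorithm and lookup of the on-chain version are assumptions):

```go
package manifest

import (
	"bytes"
	"crypto/sha256"
	"errors"
)

// Verify checks a manifest received from a peer against the deployment
// version stored on the blockchain-backed distributed database.
func Verify(manifestBytes, onChainVersion []byte) error {
	sum := sha256.Sum256(manifestBytes)
	if !bytes.Equal(sum[:], onChainVersion) {
		return errors.New("manifest hash does not match on-chain deployment version")
	}
	return nil
}
```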
Overlay Network
By default, a workload’s network is isolated - nothing can connect to it. While this is secure, it is not practical for real-world applications. For example, consider a simple web application: end-tenant browsers should have access to the web tier workload, and the web tier needs to communicate to the database workload. Furthermore, the web tier may not be hosted in the same datacenter as the database.
On the Akash Network, clients can selectively allow communications to and between workloads by defining a connection topology within the manifest. Datacenters use this topology to configure firewall rules and to create a secure network between individual workloads as needed.
To support secure cross-datacenter communications, providers expose workloads to each other through an mTLS tunnel. Each workload-to-workload connection uses a distinct tunnel.
Before establishing these tunnels, providers generate a TLS certificate for each required tunnel and exchange these certificates with the necessary peer providers. Each provider’s root certificate is stored on the blockchain-based distributed database, enabling peers to verify the authenticity of the certificates they receive.
Once certificates are exchanged, providers establish an authenticated tunnel and connect the workload’s network to it. All of this is transparent to the workloads themselves - they can connect to one another through stable addresses and standard protocols.
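For illustration, a provider-side tunnel endpoint could be configured with Go's standard TLS library roughly as follows. The certificate file paths and the source of the peer provider's root certificate (which would in practice be read from the chain) are assumptions.

```go
package tunnel

import (
	"crypto/tls"
	"crypto/x509"
	"fmt"
	"net"
)

// listenMutualTLS starts a listener that presents this provider's tunnel
// certificate and requires the peer to present one signed by the peer
// provider's root certificate.
func listenMutualTLS(addr, certFile, keyFile string, peerRootPEM []byte) (net.Listener, error) {
	cert, err := tls.LoadX509KeyPair(certFile, keyFile)
	if err != nil {
		return nil, err
	}
	peerRoots := x509.NewCertPool()
	if !peerRoots.AppendCertsFromPEM(peerRootPEM) {
		return nil, fmt.Errorf("invalid peer root certificate")
	}
	cfg := &tls.Config{
		Certificates: []tls.Certificate{cert},
		ClientCAs:    peerRoots,
		ClientAuth:   tls.RequireAndVerifyClientCert, // mutual TLS
	}
	return tls.Listen("tcp", addr, cfg)
}
```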
Models
Stack
A stack is a description of all components necessary to deploy an application on the Akash Network.
A stack includes:
- Infrastructure requirements.
- Manifest of workloads to deploy on procured infrastructure.
Manifest
A manifest describes workloads and how they should be deployed.
A manifest includes:
- Workloads to be executed.
- Data center placement for each workload.
- Connectivity rules describing which entities are allowed to connect to each workload.
Deployment
A deployment represents the current state of a stack as fulfilled by the Akash Network.
- Infrastructure procured via the cloud exchange (leases).
- Manifest distribution state.
- Overlay network state.
Workload
Field | Description |
---|---|
name | Workload name |
container | Docker container |
compute | resources needed for each instance |
count | number of instances to run |
connections | List of allowed incoming connections |
Connection
Field | Description |
---|---|
port | TCP port |
workload | Workload name to allow incoming connections from |
datacenter | Datacenter to allow incoming connections from |
global | If true, allow all connections, regardless of source |
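To make the two models above concrete, here is a hypothetical web-plus-database manifest fragment expressed with illustrative Go types; the field types, image names, and values are assumptions, not part of the specification.

```go
package manifest

// ComputeUnit is repeated from the earlier model sketch so this example
// stands on its own.
type ComputeUnit struct {
	CPU    uint32
	Memory uint64
	Disk   uint64
}

// Connection mirrors the Connection model above.
type Connection struct {
	Port       uint16 // TCP port
	Workload   string // workload allowed to connect, if any
	Datacenter string // datacenter allowed to connect from, if any
	Global     bool   // if true, allow all connections regardless of source
}

// Workload mirrors the Workload model above.
type Workload struct {
	Name        string       // workload name
	Container   string       // Docker container image
	Compute     ComputeUnit  // resources needed for each instance
	Count       uint32       // number of instances to run
	Connections []Connection // allowed incoming connections
}

// A web tier that is reachable from anywhere, and a database that only
// accepts connections from the web tier. Image names are hypothetical.
var exampleWorkloads = []Workload{
	{
		Name:      "web",
		Container: "example/web:1.0",
		Compute:   ComputeUnit{CPU: 2, Memory: 4, Disk: 20},
		Count:     4,
		Connections: []Connection{
			{Port: 443, Global: true},
		},
	},
	{
		Name:      "db",
		Container: "example/db:1.0",
		Compute:   ComputeUnit{CPU: 4, Memory: 16, Disk: 100},
		Count:     1,
		Connections: []Connection{
			{Port: 5432, Workload: "web"},
		},
	},
}
```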
LeasedWorkload
Field | Description |
---|---|
lease | Lease ID |
workload | Workload name |
certificate | SSL certificate for workload |
addresses | List of (address,port) for connecting to remote workload |
Automation
The dynamic nature of cloud infrastructure is both a blessing and a curse for operations management. That new resources can be provisioned at will is a blessing; the exploding management overhead and complexity of said resources is a curse. The goal of DevOps — the practice of managing deployments programmatically — is to alleviate the pain points of cloud infrastructure by leveraging its strengths.
The Akash Network was built from the ground up to provide DevOps engineers with a simple but powerful toolset for creating highly automated deployments. The toolset comprises the primitives that enable non-management applications — generic workloads and overlay networks — and can be leveraged to create autonomous, self-managed systems.
Self-managed deployments on Akash are a simple matter of creating workloads that manage the deployment they are part of. A DevOps engineer may employ a workload that updates DNS entries as providers join or leave the deployment; tests response times of web tier applications; and scales infrastructure up and down (in accordance with permissions and constraints defined by the client) as needed based on any number of input metrics. The “management tier” may be spread across all datacenters for a deployment, with global state maintained by a distributed database running over the secure overlay network.
Examples
Latency-Optimized Deployment
Many web-based applications are “latency-sensitive” - lower response times from application servers translate into a dramatically improved end-tenant experience. Modern deployments of such applications employ content delivery networks (CDNs) to deliver static content such as images to end tenants quickly.
CDNs provide reduced latency by distributing content so that it is geographically close to the tenants that are accessing it. Deployments on the Akash Network can not only replicate this approach, but beat it - Akash gives clients the ability to place dynamic content close to an application’s tenants.
To implement a self-managed “dynamic delivery network” on Akash, a DevOps engineer would include a management tier in their deployment which monitors the geographical location of clients. This management tier would add and remove datacenters across the globe, provisioning more resources in regions where tenant activity is high, and fewer resources in regions where tenant participation is low.
Machine Learning Deployment
Machine learning applications employ a large number of nodes to parallelize computations involving large datasets. They do their work in “batches” - there is no “steady state” of capacity that is required.
A machine learning application on Akash may use a management tier to proactively procure resources within a single datacenter. As a machine learning task begins, the management tier can “scale up” the number of nodes for it; when a task completes, the resources provisioned for it can be relinquished.
History
March 8, 2018: Initial Design based on Akash Whitepaper
Copyright
All content herein is licensed under Apache 2.0.