Development ICF Whitepaper
Introduction
Regardless of how large the DePIN sector grows, there will, by design, always be unused capacity on the equipment in this ecosystem. Brokers try to maintain some unrented capacity on their market so new clients have inventory to rent. Products focused on specific computation tasks often don’t have consistent demand throughout the day. Even tasks with effectively unlimited demand, such as cryptocurrency mining, scale up and down with the economics of the underlying tokens. As a result, there is a significant degree of equipment underutilization globally.
On the other side of this market are a plethora of computation marketplaces that this underutilized equipment could be applied to. These marketplaces range from general to very specific. For example, there are half a dozen “API as a service” marketplaces where equipment can serve any open-source, pure function such as AI inference; generic job-board-style marketplaces that can be used for batch compute transformations, rendering, or AI training; and task-specific marketplaces such as Morpheus that focus on a specific niche and provide a better UX for tasks where the demand justifies that extra degree of polish.
The skillset involved in operating a data center and scaling hardware is very different from the one needed to manage a software stack that integrates all of these marketplaces to source demand, so we likely won’t see many AI neoclouds doing all of this integration work. Furthermore, the excess capacity is (hopefully) only 10-30% of broker marketplace capacity, so it won’t be the priority of brokers either. Software-focused companies concentrate on optimizing their own task-dedicated product line and simply overprovision as much as necessary to ensure that hardware limits don’t throttle their growth. This behavior creates an ecosystem-wide opportunity for Ceti AI. Rather than having each such company implement all of these integrations and the automation to scale them, Ceti AI can do this work once and avoid duplicating effort.
The Intelligent Compute Fabric (ICF) is a system that sources demand from a variety of marketplace integrations and distributes it globally to our partners. It’s a sort of switchboard that routes demand for compute to supply of compute.
In addition to saving effort by deduplicating work amongst our partners, the ICF is also likely to increase the rates the ecosystem earns for this work by participating in a legal form of price fixing. For example, if each broker were to write their own OpenRouter integration and manage their own inference servers on that marketplace, we would quickly drive the listing cost on that marketplace to zero. If they opt into the ICF instead, there is only one marketplace listing without that price competition, and we can list inference just below the web2 rates instead of racing to zero amongst web3 competitors.
The ICF has four primary functions:
- Source demand for computation.
- Manage distributed applications that can fulfill this demand.
- Dispatch demand to the applications in a fair way.
- Compensate partners for work done.
There are also numerous secondary functions, such as resource-usage monitoring and security code, that are essential but do not in themselves generate revenue. This document reviews the overall architecture of the ICF and dives into each of these primary functions to describe their component architecture.
Architecture Overview
At a high level, each primary function has a dedicated subsystem, along with a few shared systems, such as databases, that facilitate the work of multiple subsystems. The subsystems correspond neatly to the primary functions described above.
Demand Integrations source demand for computation from everywhere we can. This list of work sources is expected to grow over time as the ecosystem matures. We have already implemented multiple inference marketplaces and are closely following the testnet efforts of multiple training marketplaces. Work on the demand integrations is expected to be a long-tail effort with incremental gains, but we’re entirely open to bringing on a large demand source like Venice, Perplexity, or Anthropic if we can demonstrate scale, lower costs, and acceptable reliability for them.
The Distributed Application Controller is responsible for allocating applications to the available inventory of machines. It has to gracefully react to machines going offline, clusters being enlisted or removed, changing demand patterns in inference across models and throughout the day, and changes to the pricing of various models or available async work. This is most of the “intelligence” in the Intelligent Compute Fabric.
The Dispatcher is responsible for assigning and routing work to an instance of a distributed application. It authorizes the purchaser, load balances according to capacity and latency requirements, and tracks work performed for billing purposes.
The Payment Manager manages partner accounts, provides transparency into our billing and payment model, tracks the work done by each ICF enlisted cluster, and facilitates payments to our partners.
Distributed Applications are scheduled by the Distributed Application Controller and execute the work sent to them by the Dispatcher from the Demand Integrations.
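To make this division of responsibilities concrete, the sketch below expresses the four subsystems as Go interfaces. All type and method names here are illustrative assumptions, not the actual ICF API; it is only meant to show how the pieces hand work to one another.

```go
package icf

// A minimal sketch of the four primary subsystems as Go interfaces.
// All names are illustrative, not the actual ICF API.

// WorkRequest describes a unit of demand pulled from a marketplace.
type WorkRequest struct {
	Source     string // which demand integration produced it
	Deployment string // which distributed application can serve it
	Payload    []byte
}

// WorkResult is returned by a distributed application.
type WorkResult struct {
	RequestSource string
	Output        []byte
	BilledUnits   float64 // e.g. tokens generated
}

// DemandIntegration sources demand for computation.
type DemandIntegration interface {
	Poll() ([]WorkRequest, error)
}

// ApplicationController allocates applications to the available inventory.
type ApplicationController interface {
	Reconcile() error // react to inventory, demand, and pricing changes
}

// Dispatcher routes work to an instance of a distributed application.
type Dispatcher interface {
	Dispatch(req WorkRequest) (WorkResult, error)
}

// PaymentManager tracks work performed and compensates partners.
type PaymentManager interface {
	Record(res WorkResult)
	Settle(billingCycle string) error
}
```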
Demand Source Integration
While the ICF is designed to be extensible to perform any type of work, we chose to focus on sourcing AI demand and allocating it to NVIDIA GPUs. This is the applicable market sector experiencing the most growth, and the one with the fewest established players to disrupt.
Inference
Currently, inference has the best-defined marketplaces in AI. These integrations tend to judge you on your error rate, latency, and, most importantly, cost. Whether web2 or web3, they have standardized on either subscription plans or paying by the token. On the web2 side, there are dedicated API marketplaces at major fintech giants like Stripe that can wrap arbitrary APIs, inspect the traffic for billing purposes, and handle all the aspects of charging users for access. On the web3 side, there are frontends like Venice that could plug demand into the ICF to avoid having to manage the underlying cloud themselves, session-based marketplaces like Morpheus or Gaia, and new restaking AVSs coming online like Ritual. We think we have already identified the largest players in this space and have either integrated with them or are looking to do so. We’d of course be happy to plug directly into something like Venice or Perplexity that use open-source models, but otherwise there is a long tail of marketplace services that we can source incremental demand from.
Job Boards
The less established marketplaces for AI have to do with fine-tuning. These systems generally take two forms: job boards for larger, terminating tasks and synthetic clusters for smaller, continuous tasks.
The job-board-style marketplaces are still being developed, but their form factor is akin to batch compute systems like Microsoft Cosmos, Google BigQuery, or Snowflake. Users post work to be done; the provider pulls all the resources from this description and posts results back to the user. We’re not aware of any such marketplaces in the web2 space that we can plug compute into, but there are a number under development in web3, such as Golem, CoopHive, and SymRes.
Overall, this style of work tends to pay less per unit time but comes without strict latency requirements, which makes it ideal for filling gaps in demand from the inference system.
We’re focusing on AI training to start, but the system is compatible with any other batch-transform work, such as batch inference or rendering, as long as the work can be estimated and completed with capacity not needed for the higher-paying work.
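As a rough illustration of that admission rule, the sketch below only accepts a batch job when its estimated work fits within capacity that inference is not expected to need before the job’s deadline. The fields, the standard-compute-unit accounting, and the minimum rate are all illustrative assumptions rather than the actual ICF policy.

```go
package icf

// A simplified sketch of the batch-job admission check described above.
// All fields and estimation inputs are illustrative assumptions.

type BatchJob struct {
	EstimatedSCUHours float64 // estimated work, in standard compute unit hours
	DeadlineHours     float64 // hours until results must be posted
	PricePerSCUHour   float64
}

type CapacityForecast struct {
	SpareSCUs          float64 // SCUs expected to sit idle over the window
	InferenceFloorSCUs float64 // SCUs reserved for latency-sensitive inference
}

// ShouldAccept returns true when the job can be completed with spare capacity
// and pays at least the configured floor rate for batch work.
func ShouldAccept(job BatchJob, forecast CapacityForecast, minRate float64) bool {
	usable := forecast.SpareSCUs - forecast.InferenceFloorSCUs
	if usable <= 0 {
		return false
	}
	if job.PricePerSCUHour < minRate {
		return false
	}
	// Can the job finish before its deadline with the usable capacity?
	return job.EstimatedSCUHours <= usable*job.DeadlineHours
}
```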
Scalable Demand Sources
The synthetic-cluster-style marketplaces look more akin to Bitcoin mining. Characteristically, they have practically infinite work to be done but usually dilute payment as the amount of work being done scales; this is what we see with PoW hashing networks, GenSyn, and the like. These tend to be the most permissionless systems. They also tend to be the lowest paying. They are ideal for consuming whatever excess capacity the ICF has at its disposal after the batch job queue has been cleared or when there is not enough guaranteed capacity to commit to more batch work.
Managing Applications on Partner Clouds
The ICF is designed as a federated Kubernetes system. It can manage an inventory of nodes on connected clusters, deploy applications to them as Kubernetes pods, and route work to them. The applications fulfilling work requests run on ICF enlisted clouds.
Autoscaler Data System
The ICF gathers two information feeds from ICF enlisted clusters. First, it maintains an inventory of nodes and their resources (e.g. GPUs). Second, it receives time-series resource utilization data (e.g. CPU, RAM, disk, network IO, GPU vRAM, and TensorCore utilization).
Each enlisted cloud publishes this data to Ceti AI for scheduling and refinement purposes. It is combined with the benchmark data we have for each application to estimate the capacity of the enlisted nodes. That estimate is then combined with pricing information to calculate a revenue-optimized allocation of ICF enabled resources.
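The sketch below shows one way the inventory and benchmark feeds could combine into a capacity estimate per application. The category names, map shapes, and throughput units are illustrative assumptions, not real benchmark data.

```go
package icf

// A sketch of combining benchmark data and node inventory into a capacity
// estimate. Categories, applications, and units are illustrative assumptions.

// Benchmarks maps hardware category -> application -> throughput
// (e.g. tokens per second for an inference server).
type Benchmarks map[string]map[string]float64

// Inventory maps hardware category -> number of enlisted nodes currently idle.
type Inventory map[string]int

// EstimateCapacity returns the total throughput each application could reach
// if every idle node in every category were assigned to it.
func EstimateCapacity(b Benchmarks, inv Inventory) map[string]float64 {
	capacity := make(map[string]float64)
	for category, nodes := range inv {
		for app, throughput := range b[category] {
			capacity[app] += float64(nodes) * throughput
		}
	}
	return capacity
}
```

Multiplying each application’s estimated capacity by its price per unit of output (e.g. dollars per token) gives the revenue ceiling that the allocation step optimizes against.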
Distributed Application Management
Applications are scheduled to ICF enabled clusters using a federated Kubernetes controller. In practice, this amounts to a set of Helm charts created by the Federated Autoscaler, which are mirrored to ICF enlisted clusters by Kubernetes controllers on the Ceti AI cluster and on each ICF enlisted cluster. On the ICF enlisted cluster, these deployment configurations are picked up by the local control plane and deployed as any local deployment manifest would be.
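A heavily simplified sketch of that mirroring loop is shown below: a controller on the enlisted cluster polls for its desired deployments and applies anything that differs from what is running locally. The ControlPlane and LocalCluster interfaces are hypothetical stand-ins for the real federated Kubernetes and Helm machinery, not its API.

```go
package icf

import (
	"context"
	"time"
)

// A simplified sketch of the mirroring loop on an ICF enlisted cluster.
// ControlPlane and LocalCluster are illustrative stand-ins for the real
// federated Kubernetes controllers and Helm tooling.

type DeploymentSpec struct {
	Name     string
	Chart    string // Helm chart reference
	Replicas int
}

type ControlPlane interface {
	DesiredDeployments(ctx context.Context) ([]DeploymentSpec, error)
}

type LocalCluster interface {
	Current(name string) (DeploymentSpec, bool)
	Apply(spec DeploymentSpec) error
}

// Mirror runs the reconcile loop until the context is cancelled.
func Mirror(ctx context.Context, cp ControlPlane, local LocalCluster, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-ticker.C:
			desired, err := cp.DesiredDeployments(ctx)
			if err != nil {
				continue // transient errors are retried on the next tick
			}
			for _, spec := range desired {
				if current, ok := local.Current(spec.Name); !ok || current != spec {
					_ = local.Apply(spec) // failures surface via cluster monitoring
				}
			}
		}
	}
}
```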
Batch jobs are something of a hybrid between an application and a function call to an application, and they frequently require a data pull before execution can begin. Both the data pull and the execution are scheduled as Kubernetes Jobs instead of Deployments but otherwise use all the same subsystems. The only difference is that the Ceti AI dispatcher proactively schedules this work at the behest of the autoscaler rather than reacting to external calls.
Autoscaler
The autoscaler is constantly monitoring inventory changes, capacity updates, resource utilization, and demand per deployment configuration. We assume that nodes can be preempted from their work at any time to be rented or otherwise used by the provider. Similarly nodes can be added to clusters and new clusters can enlist with the ICF at any time. In all cases the autoscaler is notified of these inventory changes and reallocates inventory to distributed applications dynamically.
Even absent inventory changes, the Autoscaler reacts to changing supply and demand conditions. For example, cards are reallocated over time to models with more demand, and during off-peak hours more cards are allocated to latency-insensitive tasks such as model fine-tuning. To do this it watches utilization metrics across all ICF enlisted clusters and reacts whenever configurable thresholds are crossed.
Distributed applications are allocated capacity according to the economic value each application generates per unit of underlying hardware capacity. Usually this means that inference servers get the highest priority, batch processing jobs second priority, and scalable demand jobs last priority, but it is entirely determined by the economics of the market.
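The sketch below captures that priority rule: deployments are ranked by estimated revenue per unit of capacity and idle capacity is handed out greedily in that order. The struct fields and the SCU accounting are illustrative assumptions; the usual inference-then-batch-then-scalable ordering simply falls out of the revenue figures.

```go
package icf

import "sort"

// A sketch of greedy allocation by economic value per unit of capacity.
// Names, fields, and the SCU accounting are illustrative assumptions.

type DeploymentDemand struct {
	Name          string
	RevenuePerSCU float64 // estimated $ per standard compute unit hour
	DemandedSCUs  float64 // capacity the current demand could consume
}

type Allocation struct {
	Name string
	SCUs float64
}

// Allocate distributes availableSCUs across deployments, highest value first.
func Allocate(deployments []DeploymentDemand, availableSCUs float64) []Allocation {
	sort.Slice(deployments, func(i, j int) bool {
		return deployments[i].RevenuePerSCU > deployments[j].RevenuePerSCU
	})
	var out []Allocation
	for _, d := range deployments {
		if availableSCUs <= 0 {
			break
		}
		grant := d.DemandedSCUs
		if grant > availableSCUs {
			grant = availableSCUs
		}
		out = append(out, Allocation{Name: d.Name, SCUs: grant})
		availableSCUs -= grant
	}
	return out
}
```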
Dispatching Work
The ICF has to dispatch (route) requests from the demand integrations to the distributed applications according to capacity and latency requirements. This makes it a sort of switchboard for global AI workloads.
Two things we are sensitive to on this topic are:
- Provably fair billing. We need to be able to observe the output of work to ensure capacity isn’t being spoofed.
- Firewalls. Many distributed applications are going to run in environments that don’t allow inbound network access or don’t have a public IP/DNS available for us.
The solution to both, in most cases, is to route the requests/responses through a Ceti AI cloud so that we can observe the work and ensure there is a reachable endpoint a buyer can call. Minimizing latency while doing this may require Ceti AI clouds with dispatchers in various regions. Once a request is inside our cloud, the ICF maintains an active bidirectional streaming connection to a proxy on each ICF enlisted cluster. This lets us reach distributed applications while making the fewest assumptions about the network topology of the ICF enlisted cloud.
The last part of this system is an API gateway for each demand source integration that requires one. These typically work by creating a unique path on our primary domain, e.g. taoceti.ai/stripe-inference would be a suitable domain+path to register our inference services with the Stripe marketplace. Inside our cluster, we check an authorization header on each request to ensure it came from Stripe and then proceed with dispatching.
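A minimal sketch of such a gateway handler follows: it checks an authorization header and then hands the request body to the dispatcher. The header scheme, shared secret, path, and the dispatchToICF placeholder are all illustrative assumptions rather than the actual integration contract.

```go
package main

import (
	"io"
	"log"
	"net/http"
)

// A minimal sketch of a per-integration API gateway: a handler mounted at a
// unique path that checks an authorization header before forwarding to the
// dispatcher. Header scheme, secret, and dispatch call are assumptions.

const expectedToken = "replace-with-shared-secret" // assumption: one shared secret per integration

// dispatchToICF is a placeholder for the real dispatch path (the proxy
// connection into an ICF enlisted cluster).
func dispatchToICF(body []byte) ([]byte, error) {
	return body, nil
}

func stripeInferenceHandler(w http.ResponseWriter, r *http.Request) {
	if r.Header.Get("Authorization") != "Bearer "+expectedToken {
		http.Error(w, "unauthorized", http.StatusUnauthorized)
		return
	}
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "bad request", http.StatusBadRequest)
		return
	}
	result, err := dispatchToICF(body)
	if err != nil {
		http.Error(w, "dispatch failed", http.StatusBadGateway)
		return
	}
	w.Write(result)
}

func main() {
	// The unique path registered with the marketplace, e.g. /stripe-inference.
	http.HandleFunc("/stripe-inference", stripeInferenceHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```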
Compensating Partners
We saw two possibilities for the compensation structure for ICF enlisted clouds. The first option was to compensate each node for that node’s work in isolation. The second option was to socialize rewards amongst similar hardware.
The advantage of a node-by-node payment structure is that each node would have full visibility into the work it did and the rate of payment for that work, so it could fully audit the revenue it was entitled to. This is a trust-minimizing solution.
The advantage of a pooled reward system is that it gives all node operators the mean payment and reduces a lot of manipulation possibilities in the system. The difference between median and mean payment can be quite high because the economic value of the different demand sources in the ICF varies widely. The outcomes here resemble smoothing pools in Ethereum staking, where we observe that most people trend towards using smoothing pools over time because the median expected value is higher.
Also, in the case of the ICF, unlike staking, each node has visibility into the type of work assigned to it and there is a stickiness of that assignment. We don’t randomly shuffle the work each node is doing periodically, so once a node is assigned a distributed application it can expect to run that application unless reassigned by the autoscaler for some reason. This creates annoying manipulation possibilities where nodes keep exiting and rejoining repeatedly until they are assigned high paying work.
While we’d like to leave the choice up to the end user on this, the game theory here creates an adverse selection problem where dishonest nodes will tend to dominate the lottery system over time. So, that leaves us with a socialized payment scheme as our only real choice. Within that constraint we do want to provide as much information as we can to convince a skeptical hardware provider that they are being compensated fairly.
Hardware Categories
We’ll be using hardware categories in several places in the ICF. The Federated Autoscaler will use the benchmark data for these categories to assign work in a revenue-maximizing manner. The Dispatcher will assign requests to categories according to their capacity under the current deployments. In general, too many categories create more benchmarking work than is useful, while too few categories unjustly punish hardware that outperforms the median of its category. Eventually we’ll probably benchmark each equipment spec, assign it a standard compute unit (SCU) capacity, and distribute rewards according to each node’s SCU. To begin, we’ll just have a reference spec for each hardware category, and each node in that category will earn equivalent rewards despite minor differences in capabilities.
Initial hardware categories will consist of commodity cards (e.g. RTX 4090) and multiples of enterprise cards (e.g. 1-8 H100s). Enlisting a node at all will require a threshold spec of non-GPU resources such as disk storage to manage models and network IO suitable for inference. Within a hardware category we want the resources to be nearly fungible so the Autoscaler can reason using categories rather than on a node by node basis. These categories are likely to evolve over time as we get a better picture of the ICF enlisted hardware. We’ll focus our benchmarking efforts on hardware segments with the most machines and the highest deviation on capability between the slowest and fastest machine within each category.
Providers are compensated for capacity rather than work done. Dispatch history will be used to roll up estimated income per hardware category prior to payments being collected from demand source integrations.
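A worked sketch of that socialized payout follows: revenue rolled up per hardware category is split across the category’s nodes pro rata by the hours they were enlisted, rather than by the specific requests each happened to serve. The figures in the example and the hours-based weighting are illustrative assumptions.

```go
package icf

// A sketch of the socialized payout: each category's revenue is split across
// its nodes by enlisted capacity-hours. Weighting and figures are assumptions.

type NodeUptime struct {
	NodeID        string
	Category      string
	EnlistedHours float64
}

// Payouts splits each category's revenue across its nodes by enlisted hours.
func Payouts(categoryRevenue map[string]float64, nodes []NodeUptime) map[string]float64 {
	hoursByCategory := make(map[string]float64)
	for _, n := range nodes {
		hoursByCategory[n.Category] += n.EnlistedHours
	}
	payouts := make(map[string]float64)
	for _, n := range nodes {
		total := hoursByCategory[n.Category]
		if total == 0 {
			continue
		}
		payouts[n.NodeID] = categoryRevenue[n.Category] * n.EnlistedHours / total
	}
	return payouts
}

// Example: if an "8xH100" category earned $12,000 in a cycle and two clusters
// were enlisted for 700 and 300 hours, they receive $8,400 and $3,600.
```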
Payment Infrastructure
We need to strike a balance between supporting customer payment preferences and being able to scale the ICF to potentially tens of thousands of providers. To begin with, we expect most providers (except CETI owned hardware) to be sourced via a broker. Ceti AI’s payment relationship in that case will be with the broker, not directly with each provider. The broker is expected to take a percentage and forward the rest to the provider on their own terms. The number of brokers is small enough that we can launch the ICF this way, using manual invoice and payment systems according to each broker’s preference.
Provider Dashboards
The most essential information we need to make externally available is our view of each ICF enlisted cloud. There are a variety of off-the-shelf Grafana dashboards for displaying cloud metrics; the only change we have to make is adding authorization so each provider can only access their own data. At minimum, providers can expect to see all the standard Prometheus metrics we can get from Kubernetes about their nodes, plus failure logs for diagnostic and notification purposes. This information will be retained only for the current and previous billing cycles and is used to improve service quality and, internally, by our Autoscaler for scheduling purposes.
Payment information is a little trickier because not all of our demand source integrations are web3-native tools, and therefore not all ICF revenue can be proven trustlessly on-chain. The best we can do for some demand source integrations is forward the information we get from the demand source APIs. We will have accounting dashboards that show rates per hardware category over time, the amount due to the broker for each provider cluster, and a history of payments.
We could provide information from the dispatch history on the work each node has done, but because revenue will be socialized amongst a hardware category, the benefit of this is unclear. Similarly, we could display aggregate information from each demand source integration, but this would mostly benefit competitors who want to know which demand source integrations to focus on rather than any of our users. The most relevant aggregate information is the total revenue earned by the ICF, which we plan to show on a public page on our website at some point. This would be reflected in our CETI buybacks eventually in any case.
Internal Dashboards
- Total inventory and system health
- Application benchmark data
- Estimated economic value of each deployment configuration
- Autoscaler inventory allocation
- Aggregate dispatch history by demand source integration