Hardware we build on

Four nodes to pick from.

The cluster architecture is the same across all four. Pick by build quality, support story, and budget. We provision, configure, and deploy on whichever you choose.

NVIDIA

DGX Spark

Premium build, NVIDIA-supported OS, the gold-standard option.

Memory: 128 GB unified
Chip: GB10 Grace Blackwell
Storage: 4 TB NVMe
Draw: ~240W

Private AI Stack

Private AI infrastructure, sized for your workload.

How much private compute you need is set by the inference throughput you run, not a fixed product tier. A single node sits on a desk. Four nodes make a 512 GB cluster on one switch. Past that, clusters scale out for as much aggregate throughput as the workload demands.

DGX Spark units and a mini-PC on a desk running a live training dashboard, Table Mountain through the window in a Cape Town home lab.

RecommendedIndicative hardware ~R260k-R900k

One-node cluster

Four nodes pooled into a single 512 GB cluster, sized for a department's day-to-day experimentation and inference.

512 GB

Unified memory

pooled across 4 nodes

~400B-class

Model size

pooled inference

4× 200G

Switched fabric

any-to-any

~960W

Total draw

standard wall circuit

Topology

Powered from a standard 10A wall circuit. 10GbE management per node over your existing network. No dedicated power or networking infrastructure required.

In the cluster

Compute ×4

Nodes

128 GB unified memory and up to 4 TB NVMe per node. Four nodes pooled into one cluster for distributed serving of a single large model.

MikroTik CRS804-4DDQ-hRM

Interconnect

Four 400G QSFP56-DD ports run at 200G to match each node's NIC. One per node, fully populated for a cluster. A second cluster adds its own switch and routes across.

QSFP56-DD → QSFP112 DAC

Cabling

One identical short-run passive-copper cable per node, switch to node NIC. Nothing exotic to source.

Indicative figures only. They include a safety margin and will move with supplier pricing, exchange rates, and import costs. Final pricing is confirmed on a written quote. As an all-in example, a single-node deployment lands around R0.4M to R1.2M depending on the node SKU you pick: ~R260k-R900k hardware plus the 13 to 20 day setup below. Final scope depends on your throughput targets, identity provider, and compliance environment.

Model strategy

A portfolio, not one model.

A private AI platform runs a portfolio of model routing, RAG, guardrails, observability, and workload-specific endpoints. A cluster is four nodes; capacity scales from a single node to a multi-cluster fabric, with the model chosen to fit the workload.

Tier 1

Single node

1 node · 128 GB · up to ~200B

Gemma 4
Qwen3.6-35B-A3B
Qwen3-Coder-Next
Qwen3 Embedding 8B · BGE-M3
Parakeet · Voxtral · Kokoro

Embeddings, reranking, private chatbot, summarisation, policy Q&A, voice pre/post-processing.

Tier 2Default

Four-node cluster

4 nodes · 512 GB · 405B-class

NVIDIA Nemotron 3 Super
Qwen3.6-35B-A3B
Qwen3-Coder-Next
Gemma 4 31B
Mistral Large 3 (distributed)

Private enterprise assistant, developer guardrails, repo analysis, agentic coding, RAG, multimodal document understanding, model bake-offs.

Tier 3

Multi-cluster fabric

8+ nodes · distributed inference

Kimi K2.6
DeepSeek V4 Pro
GLM-5.1
MiMo-V2.5-Pro
NVIDIA Nemotron 3 Ultra

Frontier coding agents, long-horizon autonomous workflows, 1M-token reasoning, multi-agent orchestration, regulated high-value workloads.

A single node typically handles models up to ~200B, and a pooled four-node cluster reaches ~405B-class. Larger frontier models run via NVFP4 variants, sharding, and multi-cluster serving, all validated per workload before client production.

See the full model registry

Serving architecture

Every request passes the guardrail layer.

No model is reached directly. Requests enter through one gateway, clear policy and guardrails, then route to the endpoint and node sized for the workload, with audit and observability on every path.

Client application

API gateway

Policy & guardrail layer

Audit logs

PII detection & redactionPrompt-injection detectionData classificationTool allow-lists

Model router

RAG pipelineObservability

Workload endpoints

General assistantDeveloper agentFrontier reasoningEmbedding & rerankingSpeech-to-textText-to-speech

Node pool

Tier 1-2 workloads

Multi-cluster fabric

Frontier models

Professional services

Stack Configuration

The hardware arrives configured at the DGX OS level. From there, six configuration steps turn it into a governed, observable platform wired into your network. Roughly 13 to 20 days for a standard stack.

3-4 days

Monitoring

Prometheus and Grafana across the hardware, inference, and model-cost layers, tracking utilization, TTFT, latency, and per-team token spend, with alerts.

2-3 days

Model usage per user

Built on DeepEval's cost and efficiency metrics, tracking spend and tokens per user and per task, with insights into which models complete the work economically and which burn budget.

2-3 days

Authentication

An API gateway in front of every endpoint, per-team keys with rotation, LDAP/AD or SSO integration, and TLS everywhere.

2-3 days

Model registry

A central registry of models, versions, and quantisations, with staged promotion to production, one-step rollback, and provenance for every deployed weight.

3-5 days

Guardrails

PII detection and redaction pre- and post-model, prompt-injection detection, output filtering, tool allow-lists, and red-team testing.

1-2 days

Network integration to VPNs

The lab wired into your corporate network over site-to-site or client VPN. Private endpoints only, firewall rules scoped per team, nothing exposed to the public internet.

Deployment

From order to operational in 4.5 to 8 weeks.

Hardware is purchased from a trusted vendor.

01 / PHASE2-4 weeks
Procurement
All hardware ordered at once
Compute nodes, switch and cables.
02 / PHASE2.5-4 weeks
Configuration
Begins on hardware arrival
Six workstreams, ~13-20 days, run on-site once everything is racked.
03 / PHASE4.5-8 weeks
Operational
Working, governed, observed
Total order-to-operational for a standard deployment on an existing network.

Want to size the right setup for your organization?

Start with a conversation. We will work through your workloads, the data-residency constraints, and the budget envelope, and come back with a concrete spec.

Talk to us See the private lab

Four nodes to pick from.

DGX Spark

AI TOP ATOM

A9 Mega AI

NUCBOX EVO X2

DGX Spark

AI TOP ATOM

A9 Mega AI

NUCBOX EVO X2

DGX Spark

AI TOP ATOM

A9 Mega AI

NUCBOX EVO X2

Private AI infrastructure, sized for your workload.

One-node cluster

Compute ×4

MikroTik CRS804-4DDQ-hRM

QSFP56-DD → QSFP112 DAC

A portfolio, not one model.

Single node

Four-node cluster

Multi-cluster fabric

Every request passes the guardrail layer.

Policy & guardrail layer

Model router

Stack Configuration

Monitoring

Model usage per user

Authentication

Model registry

Guardrails

Network integration to VPNs

From order to operational in 4.5 to 8 weeks.

Procurement

Configuration

Operational

Want to size the right setup for your organization?