NPU Labs
Reliability · 23 June 2026 · 5 min read

Claude was banned by the US government. Then it crashed ten times in twelve days.

The week the cloud blinked. Three weeks you do not want to be running production on someone else's GPUs.

Marcus is a backend engineer at a fintech in Cape Town. He runs the AI code review bot for a forty-person team and the customer support agent in their mobile app. On Friday, June 12 2026, at exactly 23:21 his time, Marcus's bot stopped working.

Half a world away, Anthropic had just received an order from the US Commerce Department to disable Claude Fable 5 and Claude Mythos 5 for every foreign national on earth. Marcus is a foreign national. So is his bot. So are his customers.

At 5:21 PM ET that same Friday, Anthropic disabled its two newest, most capable models for the entire world. The order relied on national-security export controls, and it was the first time the United States had applied export controls directly to an AI model rather than to the chips it runs on. Three hours earlier, those same models were the default tier in tens of thousands of developer environments. Including Marcus's.

Ten days earlier, on June 2, a bug in Claude Code's sub-agent system caused agents to multiply in an infinite loop, taking out the web interface, the developer console, and the newly launched Claude Code platform in a single failure window. Marcus noticed the merge queue had stalled. The code review bot was still 'thinking' on a PR from three hours ago. By the time Anthropic's postmortem went up, he had manually reviewed eleven PRs.

Between June 5 and June 16, independent trackers counted ten distinct disruptions across the Claude surface. On June 23 the platform went down again at 14:19 UTC, this time across every model. On the same day, at 06:28 UTC, Anthropic re-suspended Fable 5 and Mythos 5 after a brief restoration window.

If you are renting your AI capacity from one foreign-jurisdiction provider, the first three weeks of June 2026 are the most expensive education your company will buy this year. Marcus is the cheapest case study you will read about it.

  • 10 outages in 12 days
  • 8,000+ failure reports in one window
  • 99.12% Claude.ai 90-day uptime
  • 1st AI model export control in history

The directive

The Commerce Department invoked export-control authority. The directive itself was narrow on paper: prohibit distribution of Fable 5 and Mythos 5 to any foreign national. The scope on the ground was anything but. It applied to non-US citizens anywhere on earth, including non-citizens working inside the United States, including Anthropic's own non-citizen engineering staff. Anthropic's only practical compliance path was to disable the models globally for every customer.

The trigger, according to Anthropic's public statement and follow-up reporting, was a method of bypassing the safeguards that limit Fable 5's use for cybersecurity tasks: specifically, the analysis of code for exploitable vulnerabilities. The company described the bypass as narrow and non-universal. Capabilities of a similar shape exist in several other frontier models. None of that mattered. Once an export-control directive lands, the model goes off.

The directive marks the first time the US government has applied export controls directly to an AI model rather than to the chips and hardware that power them.
Fortune, June 13 2026

A timeline

  1. June 2

    Claude Code sub-agent system enters an infinite multiplication loop. Web interface, developer console, and Claude Code platform all return elevated error rates for hours.

  2. June 5 → 16

    Ten distinct disruptions across the Claude surface. Opus 4.8 and Haiku 4.5 errors persist through a 2:00 PM ET fix attempt on June 16.

  3. June 12, 5:21 PM ET

    Anthropic disables Fable 5 and Mythos 5 worldwide following a legally binding Commerce Department export-control directive.

  4. June 22

    Ninety-minute global outage cuts access to the flagship Claude models for tens of thousands of developers.

  5. June 23, 06:28 UTC

    Fable 5 and Mythos 5 re-suspended after a brief restoration window.

  6. June 23, 14:19 UTC

    Elevated error rates across Claude.ai, Console, API, Code, and Cowork. Multiple models affected simultaneously.

What it actually breaks

An outage on a chatbot is annoying. An outage on a CI/CD pipeline that calls Claude on every pull request stalls the merge queue. An outage on a customer-facing agent darkens the support channel for as long as it lasts. An outage during a long-running multi-step agentic task does not pause and resume: the task fails midway, often leaving partial state that needs manual cleanup.
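One common way to limit that partial-state damage is to checkpoint after every completed step, so a rerun picks up where the failed run stopped. The sketch below is a minimal illustration under stated assumptions, not how Claude Code or any particular pipeline behaves: `run_with_checkpoints`, the step functions, and the JSON checkpoint file are all hypothetical names, and it assumes each step's state can be serialised as JSON.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical step signature: takes the accumulated state, returns the new state.
Step = Callable[[dict], dict]


def run_with_checkpoints(steps: list[tuple[str, Step]], checkpoint: Path) -> dict:
    """Run named steps in order, saving progress after each completed step."""
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
    else:
        saved = {"done": [], "state": {}}

    for name, step in steps:
        if name in saved["done"]:
            continue  # completed in an earlier run, do not repeat it
        # A step that calls a model may raise if the provider is down.
        # The exception propagates, but the checkpoint on disk is untouched.
        saved["state"] = step(saved["state"])
        saved["done"].append(name)
        # Write to a temp file, then rename, so a crash never leaves half a file.
        tmp = checkpoint.with_suffix(".tmp")
        tmp.write_text(json.dumps(saved))
        tmp.replace(checkpoint)

    return saved["state"]
```

Calling the function again with the same checkpoint path after an outage skips the steps already recorded as done. It does not undo side effects of a step that failed halfway through, which is why steps should be small and idempotent where possible.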

On June 22 at lunch, Marcus was demoing the customer support agent to leadership. The support channel went silent for ninety minutes. Leadership did not ask questions about model selection or vendor lock-in. They asked why nobody had a backup. Marcus did not have a good answer.

Two Mondays in June

Marcus

Models rented from Anthropic

  • June 12 23:21: code review bot dead, no warning
  • June 22 lunch: leadership demo cratered
  • June 23: stops trying, takes a long walk
  • Quarterly: 19 hours of unplanned downtime

Mirror-universe Marcus

Models on a four-node rack in Cape Town

  • June 12: weights stay put, no directive applies
  • June 22: agent never noticed Anthropic was down
  • June 23: heads home at 5pm
  • Quarterly: 99.95% uptime measured on his own dashboard

Anthropic's own status page puts 90-day uptime at 99.12 percent for Claude.ai, 99.28 percent for Claude Code, and 99.41 percent for the API. That sounds high until you translate it. 99.12 percent over 90 days is 19 hours of downtime. Enterprise software conventionally measures itself at 99.9 percent, which is about 130 minutes over the same period. The delta is roughly an order of magnitude.
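The translation from an uptime percentage to hours of downtime is plain arithmetic, and it is worth running yourself against any SLA you are offered. A minimal sketch, assuming a 90-day window of 2,160 hours; the function name `downtime_hours` is ours, and the percentages are the ones quoted in this article:

```python
def downtime_hours(uptime_percent: float, window_days: int = 90) -> float:
    """Hours of downtime implied by an uptime percentage over a window."""
    window_hours = window_days * 24
    return window_hours * (100.0 - uptime_percent) / 100.0


figures = [
    ("Claude.ai", 99.12),
    ("Claude Code", 99.28),
    ("API", 99.41),
    ("three nines", 99.9),
    ("mirror-universe rack", 99.95),
]

for label, pct in figures:
    hours = downtime_hours(pct)
    print(f"{label:>22}: {pct}% -> {hours:.1f} h ({hours * 60:.0f} min)")
```

The same function works for any window: pass `window_days=30` to check a monthly SLA instead of a quarterly one.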

Why it is not really about Anthropic

Anthropic went from nine billion dollars of annualised revenue at the end of 2025 to north of thirty billion by April 2026, and its infrastructure had to carry that growth. Enterprise contracts above a million dollars a year doubled in two months. The number of organisations whose production AI workloads pass through Anthropic's API in 2026 is unrecognisable from the number that did so in 2025. Nothing scales smoothly through that. The wonder is not that there were ten outages in twelve days; it is that there were not more.

The point also is not that an Anthropic engineer made a mistake or that the Commerce Department made an unreasonable call. The point is that the shape of the dependency itself is the problem. A single foreign-jurisdiction model provider is a single point of failure. Outages are an availability risk. Export-control directives are a sovereignty risk. The two are different in mechanism and identical in outcome: your team cannot work, your customers cannot get answered, your pipeline does not ship.

Claude Code is increasingly deployed as a component inside CI/CD pipelines that execute multi-step tasks autonomously. When an outage interrupts one of those tasks mid-run, the pipeline does not pause and resume; it fails.
TechTimes, June 16 2026

What the open models can actually do now

The category of open-weight model has moved. Quietly through 2025 and audibly through 2026, the gap between the best closed models and the best open ones has closed enough that the question is no longer capability but operating model.

Qwen 3.7 from Alibaba sits inside the same reasoning ballpark as GPT-5.5 and ships under Apache 2.0. GLM-5.1 from Tsinghua matches Claude Opus on coding benchmarks and ships under MIT. DeepSeek V4 Pro leads many open benchmarks on mathematics and ships under MIT. Mistral Large 3 ships under Apache 2.0. Meta's Llama 4 is the most widely deployed open-weight model in production enterprise contexts globally. The licences matter: Apache 2.0 and MIT are the licences your legal team will sign off on without a six-week negotiation.

On price, the dynamic has flipped. DeepSeek V4 Flash is currently at $0.01 per million tokens for inference. That is not a quote for a marketing graph. That is an actual price you can pay today on Together.ai, Fireworks, or your own GPU.

Open models worth your time in 2026

  • Qwen 3.7: Apache 2.0 · Alibaba · reasoning in GPT-5.5 territory
  • GLM-5.1: MIT · Tsinghua · matches Claude Opus on coding benchmarks
  • DeepSeek V4 Pro: MIT · leads open benchmarks on mathematics
  • Mistral Large 3: Apache 2.0 · multilingual strength, European jurisdiction
  • Llama 4: Meta licence · most widely deployed open-weight in enterprise

What changes when the model runs on hardware you own

A four-node NVIDIA DGX Spark cluster on Marcus's office desk, status LEDs glowing.

The mirror universe

A four-node rack in Cape Town.

Imagine Marcus's mirror universe. Same fintech, same forty engineers, same code review bot, same customer support agent. Different inference layer: Qwen 3.7 and GLM-5.1 running on a four-node cluster in the company's own rack.

On June 12, when the directive comes down in Washington, the model weights on Marcus's cluster do not move. On June 22, when Anthropic loses ninety minutes, Marcus's support agent does not notice. On June 23, when the platform falls over at 14:19 UTC, Marcus is in a meeting about something else.

Three things change immediately. First, export-control immunity. A model already running on nodes in your rack does not get pulled by a directive issued in Washington. The model weights are yours. The export control is on the act of distribution to a foreign national, and that distribution already happened.

Second, blast radius. An outage at Anthropic does not stop your engineers, your customer-facing agents, your CI/CD pipeline, or your support queue. The blast radius of an Anthropic incident becomes other people's problem.

Third, jurisdiction. Your inputs, your outputs, your audit trails stay inside the same legal envelope your data already sits in. For regulated industries this stops being a paragraph in the compliance brief and starts being a single check-box.

The trade-off, honestly

Owning the stack is not free. You carry the hardware capex, the ops rotation, the model-upgrade cycle, the evaluation pipeline, and the runtime engineering to make a multi-model gateway behave reliably under load. Done badly, that is worse than a cloud bill. Done well, it costs less, runs faster on your workloads, and removes the dependency that quietly failed your team for three of the last twelve mornings.

The teams who are already doing it well share a few things. They picked a small set of capable open models rather than chasing every release. They put a router in front of the models with retries, caching, and a per-team usage budget. They wrote an evaluation pipeline that runs before every model upgrade. And they treated the AI runtime as production infrastructure with on-call rotations, dashboards, and post-incident reviews. None of that is mysterious. All of it is engineering work that compounds.
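To make the router idea concrete, here is a minimal sketch of one with retries, a cache, a per-team budget, and ordered fallback between backends. Everything in it is an assumption for illustration: `Router`, `BudgetExceeded`, and the `Backend` callables are hypothetical, the budget is counted in requests rather than tokens, and a production gateway would add timeouts, streaming, and metrics on top.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical backend: takes a prompt, returns a completion, raises on failure.
Backend = Callable[[str], str]


class BudgetExceeded(Exception):
    """Raised when a team has no requests left in its budget."""


@dataclass
class Router:
    backends: list[tuple[str, Backend]]  # tried in order: primary first
    budgets: dict[str, int]              # remaining requests per team
    retries: int = 2                     # extra attempts per backend
    backoff_s: float = 0.5               # base delay, doubled on each retry
    cache: dict[str, str] = field(default_factory=dict)

    def complete(self, team: str, prompt: str) -> str:
        if prompt in self.cache:
            return self.cache[prompt]    # cache hits do not touch the budget
        if self.budgets.get(team, 0) <= 0:
            raise BudgetExceeded(team)

        last_error = None
        for _name, backend in self.backends:
            for attempt in range(self.retries + 1):
                try:
                    answer = backend(prompt)
                except Exception as err:  # sketch only: narrow this in real code
                    last_error = err
                    if attempt < self.retries:
                        time.sleep(self.backoff_s * 2 ** attempt)
                    continue
                self.budgets[team] -= 1
                self.cache[prompt] = answer
                return answer
        raise RuntimeError("all backends failed") from last_error
```

With a local model listed first and a hosted API second (or the other way round), an outage on either side degrades to extra latency instead of a dead support channel. The exact-match cache is the simplest possible choice; it only helps workloads that repeat prompts verbatim.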

What NPU Labs does about it

We build Marcus's mirror universe. Private clusters on hardware enterprises own. Open models, deployed with continuous evaluation. Models wired into the tools your team already uses. The runtime layer so a single model upgrade does not take production down. On-call rotations so your team does not have to learn vLLM at 3 AM. The same stack our products RapportIQ, NPU Agent, and Agentic WoW run on.

If the last three weeks were unsettling to watch, the conversation worth having is what your stack looks like when the cloud blinks next. We can tell you, in your own environment, with your own workloads, exactly what it would take.

Mirror-universe Marcus. Same engineer, same desk, different inference layer.

Don't be Marcus.

Or your CI/CD pipeline. Or your support queue. Or your AI agent at 9am on a Monday.

Talk to us about what your stack looks like when the cloud blinks next. We promise no skip-levels.