
DeepSeek-V4: China Trains AI for $5.6M While OpenAI Spends $100M

A Hangzhou startup is rewriting the rules of artificial intelligence with a fraction of Silicon Valley's budget. Their secret weapon: the mHC architecture.

Sarah Chen · January 29, 2026 · 14 min read
[Image: abstract visualization of an artificial neural network. Photo by Cash Macanaya on Unsplash]

Key takeaways

DeepSeek-V4 arrives in February 2026 promising to beat GPT-5 at coding with training costs 20 times lower. Here's how the mHC architecture works, who the mysterious founder is, and why 15 countries have already banned this AI.

The Sputnik Moment of Artificial Intelligence

Imagine a startup with a fraction of OpenAI's budget building an AI that competes head-to-head with GPT-5, training its models for $5.6 million where OpenAI spends $100 million or more, and publishing the weights for free so anyone can use them.

That's exactly what DeepSeek is doing.

On January 27, 2025, a week after DeepSeek released its R1 model, NVIDIA lost $600 billion in market value in a single day: the largest single-day loss for any company in Wall Street history. Microsoft lost another $150 billion. Analysts called it AI's "Sputnik moment."

And now, in February 2026, DeepSeek-V4 is coming. Internal leaks suggest it could surpass Claude Opus 4.5 in programming with a 1 trillion parameter model and a 1 million token context window.

The key is an innovation called mHC (Manifold-Constrained Hyper-Connections) that makes it possible to train larger models on less hardware, and it fundamentally changes the economics of artificial intelligence.

What Is DeepSeek and Why Should You Care

DeepSeek isn't a typical Silicon Valley startup. It's a Chinese company founded in Hangzhou by a former quantitative trader who decided to use his hedge fund profits to build AGI.

The Numbers That Matter

| Metric | DeepSeek | OpenAI | Anthropic |
| --- | --- | --- | --- |
| Flagship model training cost | $5.6M | $100M+ | Not disclosed |
| API price (input / 1M tokens) | $0.28 | $2.50-$3.00 | $3.00 |
| API price (output / 1M tokens) | $0.42 | $10.00 | $15.00 |
| Open-source model | Yes (public weights) | No | No |
| Valuation | ~$3.4B | $300B+ | $60B+ |

The magnitude of this difference is easy to underestimate: for the money it costs to use GPT-4o, you can make roughly 10-40 times more queries with DeepSeek, depending on your input/output mix. A 100K-token context that costs $5.50 on GPT-4 costs $0.90 on DeepSeek.
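
To put numbers on that, here is a tiny back-of-the-envelope helper using the list prices from the table above. The exact ratio depends on how input-heavy or output-heavy your requests are, and prices change often, so treat it as a rough sanity check rather than a quote:

```python
def request_cost(input_tokens: int, output_tokens: int, in_price: float, out_price: float) -> float:
    """API cost in dollars, with prices quoted per 1M tokens."""
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a 100K-token prompt with a 4K-token answer, at the table's list prices
deepseek = request_cost(100_000, 4_000, in_price=0.28, out_price=0.42)
gpt4o = request_cost(100_000, 4_000, in_price=2.50, out_price=10.00)

print(f"DeepSeek: ${deepseek:.3f}, GPT-4o: ${gpt4o:.3f}, ratio: {gpt4o / deepseek:.0f}x")
# -> DeepSeek: $0.030, GPT-4o: $0.290, ratio: 10x
```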

DeepSeek V3 vs the Giants (Current Benchmarks)

| Benchmark | DeepSeek V3 | GPT-4o | Claude 3.5 Sonnet |
| --- | --- | --- | --- |
| MMLU-Pro (general knowledge) | 75.9 | ~77 | ~76 |
| GPQA (advanced science) | 59.1 | ~58 | ~59 |
| MATH-500 (mathematics) | 90.2% | ~85% | ~88% |
| LiveCodeBench (programming) | #1 | Top 5 | Top 3 |

DeepSeek already leads in mathematics and programming. With V4, they're aiming to completely dominate the coding niche.

The mHC Architecture: The Secret to Efficiency

This is where things get technical, but let me break it down so it makes sense.

The Problem mHC Solves

Think of it like training a very large neural network. As you add more layers, information flowing through the network tends to amplify uncontrollably. It's like a microphone producing feedback: a small signal gets amplified until it becomes unbearable noise.

This phenomenon is called gradient explosion and it's one of the main obstacles to training larger models. The traditional solution is to use more computing power to stabilize training. More GPUs, more money, more time.

mHC proposes something different: instead of letting information amplify without control, it projects it onto a mathematical structure called a manifold that guarantees the total amount of information is conserved.
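
A toy numpy sketch makes the contrast visible. This is not DeepSeek's code; the four-stream setup and the random weights are illustrative. It just compares free-for-all mixing against mixing whose columns are normalized so the total signal can never grow:

```python
import numpy as np

rng = np.random.default_rng(0)

def total_signal_after(layers: int, make_mixer, x: np.ndarray) -> float:
    """Push a non-negative signal through `layers` mixing steps and return its total."""
    for _ in range(layers):
        x = make_mixer() @ x
    return float(x.sum())

n_streams, layers = 4, 48
x0 = np.ones(n_streams)

def unconstrained():
    # Columns can sum to more than 1, so the total compounds layer after layer.
    return np.abs(rng.normal(1.0, 0.3, (n_streams, n_streams)))

def conserved():
    # Each column is normalized to sum to 1, so the total signal is preserved exactly.
    m = np.abs(rng.normal(1.0, 0.3, (n_streams, n_streams)))
    return m / m.sum(axis=0, keepdims=True)

print("unconstrained:", total_signal_after(layers, unconstrained, x0.copy()))  # astronomically large
print("conserved:    ", total_signal_after(layers, conserved, x0.copy()))      # stays at 4.0
```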

How It Works (Simplified)

  1. Multiple parallel streams: Instead of a single path for information, mHC uses 4 "streams" that process data in parallel

  2. Sinkhorn-Knopp constraint: A mathematical algorithm that ensures that when information mixes between streams it can be redistributed but never amplified uncontrollably (see the sketch after this list)

  3. Controlled maximum gain: While traditional architectures can amplify signals up to 3000 times, mHC limits amplification to ~1.6 times
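
Point 2 is the heart of it. Below is a minimal sketch of the Sinkhorn-Knopp idea: alternately normalize the rows and columns of a non-negative mixing matrix until both sum to (approximately) 1, making it doubly stochastic, so mixing between streams redistributes signal without amplifying it. The function and the 4x4 example are illustrative, not DeepSeek's implementation:

```python
import numpy as np

def sinkhorn_knopp(matrix: np.ndarray, n_iters: int = 20, eps: float = 1e-8) -> np.ndarray:
    """Alternately normalize rows and columns so both sum to ~1 (doubly stochastic)."""
    m = np.array(matrix, dtype=np.float64)
    for _ in range(n_iters):
        m /= m.sum(axis=1, keepdims=True) + eps  # rows sum to 1
        m /= m.sum(axis=0, keepdims=True) + eps  # columns sum to 1
    return m

# Hypothetical mixing weights across 4 parallel streams
raw = np.abs(np.random.default_rng(1).normal(1.0, 0.5, (4, 4)))
mix = sinkhorn_knopp(raw)

print(mix.sum(axis=1))  # each row sums to ~1
print(mix.sum(axis=0))  # each column sums to ~1
```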

The Concrete Results

| Metric | With mHC | Without mHC |
| --- | --- | --- |
| Training overhead | +6.7% | Baseline |
| Stability | High | Low in large models |
| BIG-Bench Hard (reasoning) | +2.1% improvement | Baseline |
| Scalability | Tested up to 27B params | Limited by gradient explosion |

The point is that the 6.7% of extra compute during training saves you from needing 10x more GPUs to stabilize the process. It's a small investment with an enormous return.

Liang Wenfeng: The Founder Nobody Knows

If you ask me who's the most important person in AI that almost nobody knows, I'd say it's Liang Wenfeng.

From Rural Village to Billionaire Hedge Fund Manager

| Fact | Detail |
| --- | --- |
| Birth | 1985, Mililing village, Wuchuan, Guangdong |
| Parents | Both primary school teachers |
| Education | Bachelor's (2007) and master's (2010) in electronic engineering, Zhejiang University |
| Estimated net worth | ~$4.5 billion (2025) |

Liang founded High-Flyer, a quantitative trading hedge fund, in 2016. By 2021, it managed over 100 billion yuan (~$14 billion). The strategy: use machine learning algorithms to predict market movements.

From Hedge Fund to AI Lab

In April 2023, Liang announced he would convert part of High-Flyer into an AGI laboratory. In July, that lab spun off as DeepSeek.

What's rarely mentioned is how he funded it: 100% with his own money. When Silicon Valley VCs offered investment, Liang turned them down. He didn't want the pressure of quick "exits" or investor interference.

The Master Move: 10,000 A100 GPUs

Before the U.S. restricted exports of advanced chips to China, Liang acquired 10,000 NVIDIA A100 GPUs. Those chips, now banned, are the infrastructure that allows DeepSeek to compete with OpenAI.

According to some reports, DeepSeek may have up to 50,000 Hopper GPUs including some H100s obtained through intermediaries. NVIDIA denies this, claiming DeepSeek only uses "legally acquired H800s" (a limited version allowed for China).

DeepSeek-V4: What We Know (and What We Don't)

DeepSeek-V4 hasn't been officially released yet. It's expected in mid-February 2026. But leaks and GitHub updates reveal a lot.

Leaked Specifications

| Feature | DeepSeek-V4 (leaked) |
| --- | --- |
| Parameters | ~1 trillion (MoE model) |
| Context window | 1 million tokens (with DSA) |
| Architecture | MoE + mHC + Engram |
| Consumer hardware | 2x RTX 4090 or 1x RTX 5090 |
| Efficiency vs Transformers | -50% overhead |
| Open weights | Yes (probable) |

The "Reasoning Core"

One of the most interesting innovations is what DeepSeek calls the Reasoning Core: a separate module within the model, specialized in step-by-step reasoning. Think of it as a "deep thinking mode" the model can activate for complex problems.

This is similar to what OpenAI did with o1/o3, but integrated directly into the base architecture.

Projected Comparison with Competitors

| Aspect | DeepSeek V4 | Claude Opus 4.5 | GPT-5 |
| --- | --- | --- | --- |
| SWE-bench (coding) | >80% (leak) | 80.9% | ~78% |
| Context | 1M tokens | 200K | 128K |
| Expected price | 10-40x cheaper | Premium | Premium |
| Open weights | Yes | No | No |
| Availability | Global (with restrictions) | Global | Global |

Why 15 Countries Have Banned DeepSeek

This is where geopolitics comes into play. DeepSeek isn't just a technical threat to Silicon Valley; it's a national security threat according to several governments.

Countries That Have Banned or Restricted DeepSeek

| Country/Region | Date | Scope | Official Reason |
| --- | --- | --- | --- |
| Italy | January 2025 | Total | GDPR violation |
| Australia | February 2025 | Government | National security risk |
| Taiwan | February 2025 | Government + schools | National information danger |
| South Korea | February 2025 | Government | Data collection |
| Czech Republic | July 2025 | Public administration | Servers in China/Russia |
| India | 2025 | Finance Ministry | Data vulnerability |
| USA | Various | NASA, Navy, Congress, Texas | Foreign access |

Companies like Microsoft, Mitsubishi Heavy Industries, and Toyota have also banned internal use.

The Legal Problem

The underlying reason is simple: according to DeepSeek's privacy policy, all user data is stored on servers in China. And according to China's National Intelligence Law (2017), any organization must "support, assist, and cooperate with national intelligence efforts."

In other words: the Chinese government can legally demand access to your DeepSeek conversations without notifying you.

The Impact on the Industry: End of OpenAI's Business Model?

DeepSeek doesn't just threaten Silicon Valley's technology; it threatens its business model.

The Math That Scares OpenAI

If DeepSeek can train comparable models for $5.6 million instead of $100 million, and then publishes them for free with open weights, why would you pay $20 per month for ChatGPT Plus?

Jack Clark, Anthropic co-founder, said it clearly:

"DeepSeek means AI proliferation is guaranteed."

Developers are already running DeepSeek models locally on their own servers. No monthly subscriptions. No usage limits. No data handed over to any company.

Silicon Valley's Response

U.S. companies are responding in several ways:

  1. Price reductions: OpenAI and Anthropic have significantly lowered API prices over the past year

  2. Emphasis on security: Positioning themselves as the "safe" choice against Chinese models

  3. Vertical integration: Microsoft, Google, and Amazon use their models in their own products where AI cost is secondary

  4. Differentiation: Focusing on enterprise use cases where trust and support matter more than price

How to Use DeepSeek (If You Decide To)

If after reading all this you want to try DeepSeek, here are your options.

Option 1: Official API

  • Site: platform.deepseek.com
  • Price: $0.28/1M tokens input, $0.42/1M tokens output
  • Warning: Your data goes to servers in China
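
The API follows the OpenAI chat-completions format, so the standard openai Python client works once you point it at DeepSeek's endpoint. A minimal sketch (the model name shown is the current V3-era chat model; check the docs once V4 lands):

```python
from openai import OpenAI

# DeepSeek's API is OpenAI-compatible: same client, different base URL.
client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # current general-purpose chat model
    messages=[{"role": "user", "content": "Summarize the Sinkhorn-Knopp algorithm in two sentences."}],
)
print(response.choices[0].message.content)
```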

Option 2: Run It Locally

DeepSeek publishes model weights on Hugging Face and GitHub. You can download them and run them on your own hardware; a minimal loading sketch follows the hardware requirements below.

Requirements for V3 (current version):

  • Minimum: 2x RTX 4090 (24GB VRAM each)
  • Recommended: GPU with 80GB+ VRAM or distributed cluster

Expected requirements for V4:

  • Minimum: 1x RTX 5090 or 2x RTX 4090
  • For full 1M token context: significantly more
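
For reference, here is what loading a DeepSeek checkpoint locally looks like with the Hugging Face transformers library. The repository name, precision, and memory flags are illustrative; the full V3/V4 checkpoints need far more VRAM than consumer cards offer, so most people run the smaller or quantized variants. Check DeepSeek's Hugging Face page for the exact repository names:

```python
# Illustrative local-inference sketch (not an official DeepSeek script).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"  # example repository name; smaller variants exist

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to reduce VRAM usage
    device_map="auto",           # shard layers across whatever GPUs are available
    trust_remote_code=True,      # DeepSeek repos ship custom modeling code
)

prompt = "Write a Python function that checks whether a string is a palindrome."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```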

Option 3: Intermediary Providers

Companies like Together AI, Fireworks, and others offer access to DeepSeek models from U.S. infrastructure. You pay a bit more but your data doesn't go directly to China.

Frequently Asked Questions

Is DeepSeek safe to use?

It depends on your definition of "safe." Technically it works well. But if you're concerned about the Chinese government accessing your conversations, the answer is no: according to their own terms of service, they store data in China under Chinese law. For casual personal use that's probably irrelevant; for sensitive company information, it clearly isn't.

Is DeepSeek better than ChatGPT?

In some things yes, in others no. DeepSeek V3 leads in mathematics (MATH-500: 90.2%) and programming (LiveCodeBench: #1). GPT-4o has better general knowledge (MMLU-Pro: 77 vs 75.9). For most users, the practical difference is minimal; the price difference is enormous.

Can I run DeepSeek on my computer?

Yes, if you have enough hardware. Smaller models (7B, 16B parameters) run on consumer GPUs. The full V3 model needs minimum 2x RTX 4090. V4 will probably need more, though they promise optimizations for consumer hardware.

Why is DeepSeek so cheap?

Three reasons: (1) Architecture innovations like mHC that reduce compute requirements, (2) Lower labor costs in China, (3) They don't need to generate returns for investors because Liang funds it with his own money. Basically, DeepSeek operates more like a research project than a business.

When does DeepSeek-V4 come out?

Expected mid-February 2026. No exact date confirmed. GitHub updates and internal leaks suggest development is advanced.

Conclusion: The Future of AI Is No Longer Decided Only in Silicon Valley

DeepSeek represents something bigger than a company or an AI model. It represents the end of American hegemony in artificial intelligence.

For decades, the world's most advanced technologies were developed in U.S. labs with budgets no other country could match. DeepSeek proves that efficiency and innovation can compensate for lack of raw resources.

And this has enormous implications:

For developers: Access to frontier AI no longer requires paying premium subscriptions. Models comparable to GPT-4 are available for free with open weights.

For companies: The cost of integrating AI into products drops 10-40x. What was previously viable only for Big Tech is now within reach of startups.

For the industry: The AI race is no longer won with more money. It's won with better research. OpenAI, Anthropic, and Google will have to innovate faster, not just spend more.

For geopolitics: China is no longer "years behind" the U.S. in AI. They're competing head-to-head, and on some benchmarks, winning.

When DeepSeek-V4 launches in February, we'll probably see another "DeepSeek Monday" in the markets. But the real impact won't be on NVIDIA's stock. It'll be on how we think about who controls the future of artificial intelligence.


Would you use a Chinese AI knowing your data goes to servers under Beijing's jurisdiction? Or would you rather pay 10x more for the "security" of Western models? The answer to that question will define the AI market for years to come.

Written by Sarah Chen
Tech educator focused on AI tools. Making complex technology accessible since 2018.

Tags: deepseek, artificial intelligence, chinese ai, mhc architecture, liang wenfeng, open source, llm, machine learning, tech geopolitics
