Highlight Report

Rethink your AI barriers and budgets—capability just got more affordable


Enterprise AI is at a critical moment. AI models continue to grow in capability and adoption keeps rising, yet enterprises are about to confront unsustainable compute costs. GPU shortages drive some of this, but much of it can now be addressed through a more flexible deployment approach.

Inflection AI is betting that a new approach—mixture-of-experts (MoE)—will change the compute vs. capability equation. By selectively activating only the necessary parts of a model for each task, MoE offers a path to high-performing and cost-effective AI. If Inflection AI’s approach works, enterprises could finally get AI that delivers strong reasoning without breaking infrastructure budgets.

Smarter architectures deliver the right-sized bang for a more controllable buck

Inflection is not alone in chasing efficiency. Alibaba’s new QwQ-32B model claims to rival DeepSeek’s much larger models while staying lightweight and cost-efficient, a sign that the AI race is shifting from brute-force scale to smarter architectures. Writer has succeeded with its alternative, too (see Look beyond the DeepSeek hype—Writer has already blazed that trail).

Enterprises continue to prioritize cost-effectiveness even amid the excitement of the possibilities AI unleashes. We asked 260 enterprise leaders, all with experience applying generative AI, what they prioritized in their enterprise AI delivery partners. Their answer? A practical approach to business value at the right cost (see Exhibit 1).

Exhibit 1: Enterprise leaders demand practicality at the right price—even amid the excitement of AI

Sample: 260 enterprise leaders with GenAI experience
Source: HFS Research, 2025

The future of AI, at least in the enterprise, is no longer about sheer size—it’s about intelligence, cost control, and deployment flexibility.

Mixture-of-experts is AI that thinks without wasting compute

Enterprise AI has been dominated by monolithic models that activate every parameter for every task—whether needed or not. This results in excessive GPU usage, higher costs, and scalability challenges. Inflection’s MoE approach challenges this paradigm:

  • Selective computation: Instead of lighting up the entire network, MoE models activate only the most relevant “experts” for each task, reducing the compute load.
  • Lower infrastructure costs: By using compute more efficiently, MoE models consume fewer GPU resources, making them more cost-effective than traditional models.
  • Faster and more scalable: Targeted activation means quicker responses and the ability to fine-tune smaller subsets of the model rather than retraining an entire LLM.

This isn’t just a theoretical benefit—Inflection claims its largest MoE model achieves top-tier benchmark scores, while its smaller, optimized versions can run on anything from cloud clusters to handheld devices. For enterprises, this translates to a realistic AI strategy that scales intelligently instead of consuming endless GPU resources.
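
Inflection has not published the internals of its models, so the following is a minimal Python sketch of the general routing pattern described above, using hypothetical dimensions, expert counts, and a simple top-2 softmax router drawn from the published MoE literature rather than Inflection’s actual design. A small gating network scores the experts for each token, and only the top-scoring experts run, so per-token compute scales with the number of experts activated rather than the total parameter count.

    import numpy as np

    rng = np.random.default_rng(0)

    D, H = 64, 256          # model width, expert hidden width (illustrative)
    NUM_EXPERTS, TOP_K = 8, 2

    # Each "expert" is a small feed-forward network; only TOP_K of them
    # run per token, so compute scales with TOP_K, not NUM_EXPERTS.
    experts = [
        (rng.standard_normal((D, H)) * 0.02, rng.standard_normal((H, D)) * 0.02)
        for _ in range(NUM_EXPERTS)
    ]
    router_w = rng.standard_normal((D, NUM_EXPERTS)) * 0.02  # gating network

    def moe_layer(x):
        """x: (tokens, D) -> (tokens, D), routing each token to TOP_K experts."""
        logits = x @ router_w                          # (tokens, NUM_EXPERTS)
        top = np.argsort(logits, axis=1)[:, -TOP_K:]   # indices of chosen experts
        out = np.zeros_like(x)
        for t in range(x.shape[0]):
            chosen = logits[t, top[t]]
            gates = np.exp(chosen - chosen.max())
            gates /= gates.sum()                       # softmax over chosen experts only
            for gate, e in zip(gates, top[t]):
                w_in, w_out = experts[e]
                h = np.maximum(x[t] @ w_in, 0.0)       # ReLU feed-forward expert
                out[t] += gate * (h @ w_out)
        return out

    tokens = rng.standard_normal((4, D))
    print(moe_layer(tokens).shape)  # (4, 64): same output shape, a fraction of the FLOPs

With 8 experts and top-2 routing, each token exercises roughly a quarter of the expert parameters, which is where the lower GPU load and faster responses described above come from.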

Inflection AI prioritizes enterprise needs through hardware, fine-tuning, and testing, too

While MoE is the newest piece in Inflection’s enterprise AI strategy, it’s not the only sign of the firm’s enterprise focus. Inflection is also delivering:

  • Hardware flexibility: Optimizing for Intel Gaudi hardware, not just NVIDIA GPUs, gives enterprises more deployment options and helps avoid provider lock-in.
  • Fine-tuned enterprise AI: Acquiring BoostKPI (for structured data insights) and Jelled.ai (for institutional memory) signals Inflection’s commitment to custom AI that fits enterprise workflows.
  • Real-world testing at scale: By maintaining millions of consumer users of its Pi assistant as a live testbed, Inflection continues to fine-tune its models to handle complex reasoning and emotional intelligence tasks.

The Bottom Line: Rethink your enterprise AI investment plans—competing for the future may be more affordable than you feared.

Inflection AI’s move to mixture-of-experts models signals a fundamental change in how AI is built, deployed, and priced. For enterprises, this could mean AI that delivers top-tier reasoning at a fraction of today’s compute costs.

And Inflection isn’t alone in this push. Alibaba, Writer, and DeepSeek are all proving that enterprises don’t need the biggest models—they need the smartest, most efficient ones. The winners in enterprise AI won’t be the ones with the most parameters; they’ll be the ones that deliver the best balance of performance, cost, and deployment flexibility.

If your AI strategy still depends on—or is budgeting for—massive, compute-heavy models, it’s time for a rethink. The future of AI belongs to architectures such as MoE that make AI affordable, scalable, and adaptable to real business needs.
