Enterprise AI is at a critical moment. While AI models continue to grow in capability and adoption increases, enterprises are about to confront unsustainable compute costs. GPU shortages drive some of this, but much of it can now be resolved through a more flexible deployment approach.
Inflection AI is betting that a new approach—mixture-of-experts (MoE)—will change the compute vs. capability equation. By selectively activating only the necessary parts of a model for each task, MoE offers a path to high-performing and cost-effective AI. If Inflection AI’s approach works, enterprises could finally get AI that delivers strong reasoning without breaking infrastructure budgets.
Inflection is not alone in chasing efficiency. Alibaba’s new QwQ-32B model claims to rival DeepSeek’s larger AI models while staying lightweight and cost-efficient. This suggests the AI race is shifting from sheer brute-force scale to smarter architectures. Writer has succeeded with its alternative, too (Look beyond the DeepSeek hype—Writer has already blazed that trail).
Enterprises continue to prioritize cost-effectiveness even amid the excitement of the possibilities AI unleashes. We asked 240 enterprise leaders, all with experience in applying generative AI, what they prioritized in their enterprise AI delivery partners. Their answer? A practical approach to business value at the right cost (see Exhibit 1).
[Exhibit 1] Sample: 260 enterprise leaders with GenAI experience. Source: HFS Research, 2025.
The future of AI, at least in the enterprise, is no longer about sheer size—it’s about intelligence, cost control, and deployment flexibility.
Enterprise AI has been dominated by monolithic models that activate every parameter for every task—whether needed or not. This results in excessive GPU usage, higher costs, and scalability challenges. Inflection’s MoE approach challenges this paradigm by routing each input to only the experts relevant to it, leaving the rest of the model idle.
This isn’t just a theoretical benefit—Inflection claims its largest MoE model achieves top-tier benchmark scores, while its smaller, optimized versions can run on anything from cloud clusters to handheld devices. For enterprises, this translates to a realistic AI strategy that scales intelligently instead of consuming endless GPU resources.
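The core mechanism behind these savings can be sketched in a few lines. The snippet below is a minimal, illustrative top-k routing example, not Inflection AI’s actual implementation: a gating function scores each expert for a given input, and only the highest-scoring experts run, while the rest consume no compute. All names and values here are hypothetical.

```python
import math

def softmax(scores):
    # Numerically stable softmax over the gate's raw scores
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def moe_forward(x, experts, gate_weights, k=1):
    """Route input x to the top-k experts and blend their outputs.

    A dense model would run every expert; here, only k of them compute,
    which is where the GPU savings come from.
    """
    scores = [sum(w * xi for w, xi in zip(ws, x)) for ws in gate_weights]
    probs = softmax(scores)
    top_k = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    output = sum(probs[i] * experts[i](x) for i in top_k)
    return output, top_k

# Two toy "experts": each is just a function of the input vector.
experts = [lambda x: sum(x) * 2.0, lambda x: sum(x) * -1.0]
gate_weights = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical gating parameters

output, active = moe_forward([3.0, -1.0], experts, gate_weights, k=1)
# Only one of the two experts ran for this input
```

In a production MoE model the experts are large feed-forward sub-networks and the gate is learned, but the principle is the same: per-token compute scales with k, not with the total number of experts.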
While MoE is the newest piece of Inflection’s enterprise AI strategy, it is not the firm’s only enterprise-focused commitment; the company is also delivering on several other fronts.
Inflection AI’s move to mixture-of-experts models signals a fundamental change in how AI is built, deployed, and priced. For enterprises, this could mean AI that delivers top-tier reasoning at a fraction of today’s compute costs.
And Inflection isn’t alone in this push. Alibaba, Writer, and DeepSeek are all proving that enterprises don’t need the biggest models—they need the smartest, most efficient ones. The winners in enterprise AI won’t be the ones with the most parameters; they’ll be the ones that deliver the best balance of performance, cost, and deployment flexibility.
If your AI strategy still depends on—or is budgeting for—massive, compute-heavy models, it’s time for a rethink. The future of AI belongs to architectures such as MoE that make AI affordable, scalable, and adaptable to real business needs.