The latest GenAI breakthroughs from Google and OpenAI free the enterprise to use data for decision-making on a scale never seen before.
The sheer volume of data Google’s Gemini 1.5 Pro can ingest for instant analysis opens exciting (and potentially terrifying) opportunities for the enterprise. We may be steps away from capturing every detail of our working days in close to real time and identifying our best and worst decisions and their impact. Applied to workflows and processes, this puts leaders on the road to having the data, and the capacity, to make multi-modal queries of that data through natural-language text, voice, or a quick sketch.
Some may raise fears about employee surveillance, so leaders must prepare to address privacy concerns. But the capability to track and measure almost everything is coming soon to a hyperscaler near you.
Source: HFS and DALL-E, 2024
Our research among leading enterprise GenAI users found that most use cases can be defined as delivering prediction, personalization, or productivity—the three Ps of GenAI value creation (see Exhibit 2).
Sample: November 2023 survey of 104 enterprise leaders actively exploring and deploying GenAI
Source: HFS Research, 2024
Prediction
Prediction use cases will be supercharged by the volume of data the new GenAI models can handle simultaneously. More data fills gaps in models (think GenAI-driven soil analysis that recommends the best crop for a field) and creates more opportunities for validation. Leaders should not underestimate the impact of more accurate predictions on use cases such as just-in-time supply chains, cash-flow forecasting, or even planning for macroeconomic trends.
Personalization
Personalization will reach a new ceiling, delivering unheard-of levels of customer satisfaction. For example, teams at customer contact centers may finally achieve perfect customer knowledge. At the very least, for every likely bot-led interaction, the upgraded AI will inform our ability to know, understand, and adapt to previous customer interactions and purchases. Combine this quality of personalization with high-quality prediction, and we arrive in a world where customer service teams reach out proactively because they already know what the customer needs. Personalization is already used in patient experience in the healthcare sector. We should expect that to scale, too.
Productivity
Productivity—particularly in summarization tasks such as monitoring social and other media for market-impacting conversations—will be accelerated, too. The next upgrade in GenAI technology will manage an exponentially larger volume of documents for consumption, summarization, and categorization to speed compliance checking and contract management, for example.
Google’s Gemini 1.5 Pro offers a significant upgrade in the capacity of its context window. A context window is the maximum amount of data a large language model (LLM) can consider simultaneously. The more tokens a context window can handle, the greater the volume of content it can analyze, understand, and reason about.
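To make the idea concrete, here is a minimal sketch of how a team might check whether a document fits a given context window. It uses a rough rule of thumb of about four characters per token for English text; this heuristic, and the per-page character count below, are our assumptions for illustration, not part of any model’s actual tokenizer.

```python
# Rough heuristic (~4 characters per token for English text) for checking
# whether a document fits in a model's context window. Real token counts
# depend on the model's tokenizer, so treat these numbers as estimates.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Roughly estimate the token count of a piece of text."""
    return int(len(text) / chars_per_token)

def fits_in_context(text: str, context_window: int) -> bool:
    """Check whether the estimated token count fits the context window."""
    return estimate_tokens(text) <= context_window

# A 402-page transcript at an assumed ~2,000 characters per page comes to
# roughly 200,000 tokens under this heuristic:
transcript = "x" * (402 * 2000)
print(fits_in_context(transcript, 32_000))     # False: too big for a 32k window
print(fits_in_context(transcript, 1_000_000))  # True: fits in a 1M window
```

The point of the sketch is the order-of-magnitude difference: a document that overwhelms a 32,000-token window fits comfortably once the window grows to 1 million tokens.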
Google has extended the number of tokens its context window can handle to 1 million. That is up from the 32,000 tokens Gemini 1.0 can handle, and it eclipses the 200,000 tokens Anthropic’s Claude 2.1 can manage. Gemini 1.5 Pro has managed 10 million tokens in experiments. The breakthrough is the application of the “Outrageously Large Neural Networks” approach, which the team claims delivers a 1,000x performance increase by realizing the promise of conditional computation.
Conditional computation routes each input sample through only part of the model, enabling performance upgrades while reducing latency and power use.
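The routing idea can be sketched in a few lines. This is our own toy illustration of a sparsely gated mixture-of-experts layer (the mechanism the “Outrageously Large Neural Networks” work popularized), not Google’s implementation: a gating function scores the experts for each input, and only the top-k experts actually run, so most of the model’s parameters stay idle for any single input.

```python
# Toy sketch of conditional computation via a sparsely gated
# mixture-of-experts layer. All sizes and weights are illustrative.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS, DIM, TOP_K = 8, 16, 2

# Each "expert" here is just a small linear layer.
experts = [rng.standard_normal((DIM, DIM)) for _ in range(NUM_EXPERTS)]
gate_weights = rng.standard_normal((DIM, NUM_EXPERTS))

def moe_forward(x: np.ndarray) -> np.ndarray:
    """Route input x through only its top-k experts, weighted by the gate."""
    scores = x @ gate_weights                 # one score per expert
    top = np.argsort(scores)[-TOP_K:]         # indices of the top-k experts
    gate = np.exp(scores[top]) / np.exp(scores[top]).sum()  # softmax over top-k
    # Only TOP_K of NUM_EXPERTS experts compute anything for this input:
    # that selective activation is the conditional computation.
    return sum(g * (x @ experts[i]) for g, i in zip(gate, top))

out = moe_forward(rng.standard_normal(DIM))
print(out.shape)  # (16,)
```

With 8 experts and top-2 routing, each input touches only a quarter of the expert parameters, which is why sparse models can grow total capacity without a proportional growth in per-input compute.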
Gemini 1.5 Pro’s huge context window enabled it to ingest the entire 402-page transcript from the Apollo 11 moon mission. When prompted for comedic moments, it accurately identified the humor shared between the crew and base. Revealing its multi-modal capabilities, Gemini 1.5 Pro tracked the moment of Neil Armstrong’s descent from the lunar lander from nothing more than a quick sketch of a booted foot planting on the surface.
In another example, the team uploaded a Buster Keaton silent movie, and Gemini 1.5 Pro accurately identified plot points and details humans may easily miss.
Not only is our relationship with data changing at pace but so is how we communicate with data to derive outcomes. Can’t think of the right words? Draw your prompt. Can’t recall who agreed to what at that meeting three weeks ago? Ask Gemini 1.5.
And yes, this high token capacity also extends to code, making Gemini 1.5 Pro capable of suggesting modifications and explaining how chunks of more than 100,000 lines of code work.
On the same day Google enticed us with Gemini 1.5 Pro, OpenAI announced Sora. The press excitement and fearmongering in response to Sora’s ability to create video from text has focused mainly on concerns that technology like this could be misused to create fake news content. Meta is known to be working on similar text-to-video tech, and we wrote about Bitmagic’s work in this space in our report on the Three waves of GenAI Hot Tech powering the future of work. But OpenAI stated that it intends to help teach AI to understand the physical world as it is, moving in three dimensions. The purpose is to help us solve problems where real-world interaction is required.
Early applications are likely to be in gaming, interaction in the metaverse, and, of course, in digital twins. The model understands not only the prompt but also the reality of the physical environment in which the AI is operating.
Enterprise applications will likely cover realistic training scenarios where we can test our decision-making without creating disasters. As the video outputs become more and more accurate and the AI more and more capable, could we digitally clone ourselves to be productive in multiple scenarios?
It is no great leap to imagine applying the learned understanding of the physical world to deploy advanced robotic solutions to work in hazardous environments or locations where humans just cannot fit.
Neither Google’s nor OpenAI’s solutions are ready for market as we write in mid-February 2024. Gemini 1.5 Pro is currently available for testing only, and Sora is undergoing a significant testing and safety development period before release. This early warning presents a rare occasion when leaders have time to consider how such developments will affect their work and their ability to create value.
Our advice is to use that time wisely. Consider what scaling up the data you can query may do to your business model. The range of things you can measure is about to grow exponentially. But just because you can measure and track, should you? Ask yourself how an AI’s understanding of the physical layouts of your stores, factories, and customer journeys—or even its understanding of you—could help you meet your goals.