Accelerate scaling with Azure OpenAI Service Provisioned

With new enhancements to Azure OpenAI Service Provisioned, we’ve taken a big step forward in making AI accessible and ready for businesses.

In today’s fast-paced digital environment, businesses need more than just powerful AI models – they need AI solutions that are adaptable, reliable, and scalable. With the upcoming availability of Data Zones and new enhancements to the Provisioned offering in Azure OpenAI Service, we’re taking a big step forward in making AI broadly available and enterprise-ready. These capabilities represent a fundamental shift in how organizations can deploy, manage, and optimize generative AI models.

With the launch of Azure OpenAI Service Data Zones in the European Union and the United States, enterprises can now scale their AI workloads even more easily while complying with regional data location requirements. In the past, differences in model region availability forced customers to manage multiple resources, often slowing development and complicating operations. Azure OpenAI Service Data Zones can remove this friction by offering flexible data processing for multiple regions while ensuring that data is processed and stored within a selected data boundary.
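For illustration, here is a minimal sketch of creating a Data Zone Provisioned deployment with the azure-mgmt-cognitiveservices Python SDK. The SKU name DataZoneProvisionedManaged, the model version, and all resource names are assumptions for the sketch, not confirmed values from this post; check the Azure OpenAI documentation for what applies to your subscription.

```python
# pip install azure-identity azure-mgmt-cognitiveservices
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment,
    DeploymentModel,
    DeploymentProperties,
    Sku,
)

# Hypothetical subscription and resource names -- replace with your own.
client = CognitiveServicesManagementClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
)

deployment = Deployment(
    # Assumed SKU name for a Data Zone Provisioned deployment; capacity is in PTUs.
    sku=Sku(name="DataZoneProvisionedManaged", capacity=15),
    properties=DeploymentProperties(
        model=DeploymentModel(format="OpenAI", name="gpt-4o", version="2024-08-06"),
    ),
)

poller = client.deployments.begin_create_or_update(
    resource_group_name="<resource-group>",
    account_name="<azure-openai-resource>",
    deployment_name="gpt-4o-datazone",
    deployment=deployment,
)
print(poller.result().properties.provisioning_state)
```

Because the data boundary is a property of the deployment rather than the resource, a single Azure OpenAI resource can host deployments with different processing boundaries side by side.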

This is a regulatory win that also enables enterprises to seamlessly scale their AI operations across regions and optimize performance and reliability without having to navigate the complexities of traffic management across multiple systems.

Leya, a tech startup creating a genAI platform for legal professionals, has been exploring the possibility of deploying data zones.

“The ability to deploy Azure OpenAI Service Data Zones offers Leya a cost-effective way to securely scale AI applications to thousands of lawyers, ensuring compliance and peak performance. It helps us achieve better customer quality and control, with quick access to the latest Azure OpenAI innovations.”—Sigge Labor, CTO, Leya

Data Zones will be available for both Standard (PayGo) and Provisioned offers starting November 1, 2024.


Industry-leading performance

Businesses depend on predictability, especially when deploying mission-critical applications. That’s why we are introducing a 99% latency service level agreement (SLA) for token generation. This latency SLA ensures that tokens are generated at faster and more consistent speeds, especially at high volumes.

The Provisioned offer gives your application predictable performance. Whether you’re in e-commerce, healthcare, or financial services, the ability to rely on low-latency, high-reliability AI infrastructure translates directly into better customer experiences and more efficient operations.
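As a rough way to see what this latency promise means in practice, the sketch below measures time-to-first-token and average time between content chunks on a streaming call with the openai Python package. The deployment name gpt-4o, the environment variables, and the API version are assumptions for the illustration; this is not an official benchmarking harness.

```python
import os
import time

from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-06-01",
)

start = time.perf_counter()
first_token_at = None
token_chunks = 0

# Stream a completion and record when content chunks arrive.
stream = client.chat.completions.create(
    model="gpt-4o",  # assumed deployment name
    messages=[{"role": "user", "content": "Write a haiku about autumn."}],
    stream=True,
)
for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        if first_token_at is None:
            first_token_at = time.perf_counter()
        token_chunks += 1

if first_token_at is None:
    raise RuntimeError("stream returned no content")

total = time.perf_counter() - start
ttft = first_token_at - start
print(f"time to first token: {ttft:.3f}s")
print(f"avg time between chunks: {(total - ttft) / max(token_chunks - 1, 1) * 1000:.1f} ms")
```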

Lower start-up costs

To make testing, scaling, and management easier, we’re reducing hourly rates for Provisioned Global and Provisioned Data Zone deployments starting November 1, 2024. This cost reduction lets customers adopt these new features without the burden of high expenses. Provisioned continues to offer discounts for monthly and annual commitments.

Deployment option | Hourly PTU | One-month reservation per PTU | One-year reservation per PTU
Provisioned Global | Current: $2.00 per hour; from November 1, 2024: $1.00 per hour | $260 per month | $221 per month
Provisioned Data Zone (new) | From November 1, 2024: $1.10 per hour | $260 per month | $221 per month

We’re also reducing the minimum deployment entry point for Provisioned Global deployments by 70% and shrinking scaling increments by up to 90%, lowering the barrier for businesses to get started with Provisioned earlier in the development lifecycle.

Provisioned offer deployment minimums and increments (in PTUs)

Model | Global (new) | Data Zone (new) | Regional
GPT-4o | Minimum: 15 (was 50); increment: 5 (was 50) | Minimum: 15; increment: 5 | Minimum: 50; increment: 50
GPT-4o-mini | Minimum: 15 (was 25); increment: 5 (was 25) | Minimum: 15; increment: 5 | Minimum: 25; increment: 25

For developers and IT teams, this means faster deployment and less friction when moving from the Standard offering to Provisioned. As businesses grow, these smooth transitions become vital to maintaining agility as AI applications scale globally.
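To make the increments concrete, here is a tiny sketch that rounds a requested capacity up to a valid deployment size, using the values from the table above. The helper name is ours, purely for illustration.

```python
def valid_ptu_capacity(requested: int, minimum: int, increment: int) -> int:
    """Round a requested PTU count up to the nearest valid deployment size."""
    if requested <= minimum:
        return minimum
    steps = -(-(requested - minimum) // increment)  # ceiling division
    return minimum + steps * increment

# Provisioned Global GPT-4o as of November 1, 2024: minimum 15 PTUs, increment 5.
print(valid_ptu_capacity(18, 15, 5))   # -> 20
print(valid_ptu_capacity(10, 15, 5))   # -> 15
# The same model on a Regional deployment still requires minimum 50, increment 50.
print(valid_ptu_capacity(60, 50, 50))  # -> 100
```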

Caching efficiency: A game changer for high-volume applications

Another new feature is Prompt Caching, which offers cheaper and faster inference for repeated API requests. Cached input tokens are 50% cheaper on the Standard offering. For applications that frequently send the same system prompts and instructions, this improvement provides a significant cost and performance advantage.

By reusing cached prompt prefixes, organizations can maximize their throughput without processing the same tokens over and over, all while reducing costs. This is especially beneficial for high-traffic environments, where even a small performance gain can translate into tangible business value.
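A minimal sketch of how an application would lean on this: keep the long, static system prompt byte-identical across calls so the shared prefix can be cached, then inspect the usage details to see how many prompt tokens were served from cache. The cached_tokens field and the minimum prompt length needed to trigger caching (roughly 1,024 tokens) are assumptions based on recent API versions; verify against the API version you use.

```python
import os

from openai import AzureOpenAI  # pip install openai

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-10-01-preview",  # assumed version that reports cache usage
)

# A long, static system prompt: keep it byte-identical across requests so the
# cached prefix can be reused. Caching only kicks in past a minimum prompt
# length, so a real prompt would be much longer than this placeholder.
SYSTEM_PROMPT = "You are a contract-review assistant. " + "Follow the firm's style guide. " * 200

for question in ["Summarize clause 7.", "List the termination conditions."]:
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed deployment name
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
    )
    details = getattr(resp.usage, "prompt_tokens_details", None)
    cached = getattr(details, "cached_tokens", None) if details else None
    print(f"prompt tokens: {resp.usage.prompt_tokens}, cached: {cached}")
```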

A new era of model flexibility and performance

One of the key benefits of Provisioned is its flexibility, with one simple hourly, monthly, and annual price that applies to all available models. We’ve also heard your feedback that it’s difficult to understand how many tokens per minute (TPM) you get for each model on Provisioned deployments. We now provide a simplified view of the number of input and output tokens per minute for each Provisioned deployment, so customers no longer have to rely on detailed conversion tables or calculators.

We maintain the flexibility customers love with Provisioned. With monthly and annual commitments, you can still change the model and version, such as between GPT-4o and GPT-4o-mini, within the reservation period without losing any discount. This agility allows enterprises to experiment, iterate, and evolve their AI deployments without incurring unnecessary costs or having to restructure their infrastructure.

Enterprise readiness in action

These innovations are not just theoretical; they are already delivering results across industries. Companies like AT&T, H&R Block, Mercedes-Benz, and others are using Azure OpenAI Service not just as a tool, but as a transformational asset that changes the way they operate and engage with customers.

Beyond models: The enterprise-grade promise

It’s clear that the future of artificial intelligence is about much more than just offering the latest models. While powerful models like GPT-4o and GPT-4o-mini provide the foundation, it’s the supporting infrastructure, such as Provisioned deployments, Data Zones, latency SLAs, prompt caching, and simplified deployment processes, that makes Azure OpenAI Service truly ready for business.

Microsoft’s vision is to provide not only cutting-edge AI models, but also enterprise-grade tools and support that enable businesses to scale these models with confidence, security and cost-effectiveness. From enabling low-latency, high-reliability deployments to offering a flexible and simplified infrastructure, Azure OpenAI Service enables enterprises to fully embrace the future of AI-driven innovation.

Get started today

As the AI landscape continues to evolve, the need for scalable, flexible, and reliable AI solutions becomes ever more critical to business success. With the latest enhancements to Azure OpenAI Service, Microsoft is delivering on that promise, giving customers not only access to world-class AI models, but also the tools and infrastructure to put them to work at scale.

Now is the time for businesses to unlock the full potential of generative AI with Azure and move from experimentation to real-world enterprise-grade applications that deliver measurable results. Whether you’re scaling a virtual assistant, developing real-time voice applications, or transforming customer service with AI, Azure OpenAI Service provides the enterprise-ready platform you need to innovate and grow.
