Commit 38b047e

Merge pull request #175 from microsoft/copilot/fix-174
Add lesson 07 on image and video generation with new Azure OpenAI models (gpt-image-1 and sora)
2 parents 2a92814 + fac94d8 commit 38b047e

3 files changed: 271 additions, 3 deletions


03-CoreGenerativeAITechniques/06-LocalModelRunners.md

Lines changed: 2 additions & 2 deletions
@@ -315,6 +315,6 @@ Running AI models locally with AI Toolkit for Visual Studio Code, Docker Model R
315315

316316
## Next Steps
317317

318-
You've learned how to run AI models locally using AI Toolkit for Visual Studio Code, Docker Model Runner, and Foundry Local. Next, you'll explore how to create AI agents that can perform tasks autonomously.
318+
You've learned how to run AI models locally using AI Toolkit for Visual Studio Code, Docker Model Runner, and Foundry Local. Next, you'll explore the latest Azure OpenAI models for image and video generation.
319319

320-
👉 [Check out AI Agents](./04-agents.md)
320+
👉 [Image and Video Generation with New Azure OpenAI Models](./07-ImageVideoGenerationNewModels.md)
Lines changed: 266 additions & 0 deletions
@@ -0,0 +1,266 @@
1+
# Image and Video Generation with New Azure OpenAI Models
2+
3+
In this lesson, we'll explore how to use the latest Azure OpenAI models for generating images and videos in your .NET applications. We'll cover `gpt-image-1` for advanced image generation and `sora` for video generation, providing you with cutting-edge capabilities for creating visual content.
4+
5+
---
6+
7+
## Image and Video Generation with New Azure OpenAI Models
8+
9+
Image and video generation is one of the most visible uses of generative AI in creative applications. With `gpt-image-1` for images and `sora` for videos, both available through Azure OpenAI, you can create visual content from text descriptions. These models offer improved quality and a better understanding of complex prompts.
10+
11+
### Image Generation with gpt-image-1
12+
13+
The `gpt-image-1` model represents a significant advancement in image generation capabilities. Let's explore how to use it in a .NET application:
14+
15+
> 🧑‍💻**Sample code**: [Here is a working example of this application](./src/ImageGeneration-01/) you can follow along with.
16+
17+
#### How to run the sample code
18+
19+
To run the sample code, you'll need to:
20+
21+
1. Make sure you have set up a GitHub Codespace with the appropriate environment as described in the [Setup guide](../02-SetupDevEnvironment/readme.md)
22+
2. Ensure you have configured your Azure OpenAI API key and settings as described in the [Azure OpenAI setup guide](../02-SetupDevEnvironment/getting-started-azure-openai.md)
23+
3. Open a terminal in your codespace (Ctrl+` or Cmd+`)
24+
4. Navigate to the sample code directory:
25+
```bash
26+
cd 03-CoreGenerativeAITechniques/src/ImageGeneration-01
27+
```
28+
5. Run the application:
29+
```bash
30+
dotnet run
31+
```
32+
33+
#### Setting up the Azure OpenAI Client for gpt-image-1
34+
35+
First, we need to set up our configuration and create an Azure OpenAI client specifically for the `gpt-image-1` model:
36+
37+
```csharp
38+
var builder = new ConfigurationBuilder().AddUserSecrets<Program>();
39+
var configuration = builder.Build();
40+
41+
var model = configuration["model"]; // Set to "gpt-image-1" in your configuration
42+
var url = configuration["api_url"];
43+
var apiKey = configuration["api_key"];
44+
45+
AzureOpenAIClient azureClient = new(new Uri(url), new System.ClientModel.ApiKeyCredential(apiKey));
46+
var client = azureClient.GetImageClient(model);
47+
```
48+
49+
In this code:
50+
1. We load the configuration from user secrets
51+
2. We extract the model name (should be "gpt-image-1"), API URL, and API key from the configuration
52+
3. We create an Azure OpenAI client using the URL and API key
53+
4. We get an image client specifically for the `gpt-image-1` model
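
If a user secret is missing, `url` or `apiKey` is `null`, and `new Uri(url)` then fails with an error that does not name the missing key. The following is a minimal sketch of failing early with a clearer message. `RequireSetting` is a hypothetical helper (it is not part of the sample), and a dictionary with placeholder values stands in for the configuration object so the snippet runs on its own.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the value for `key`, or throws an error that names the missing key.
static string RequireSetting(IReadOnlyDictionary<string, string?> settings, string key)
{
    if (!settings.TryGetValue(key, out var value) || string.IsNullOrWhiteSpace(value))
    {
        throw new InvalidOperationException(
            $"Missing configuration value '{key}'. Set it with: dotnet user-secrets set \"{key}\" \"<value>\"");
    }
    return value;
}

// Stand-in for the values loaded from user secrets (placeholder values, not real credentials).
var settings = new Dictionary<string, string?>
{
    ["model"] = "gpt-image-1",
    ["api_url"] = "https://example-resource.openai.azure.com/",
    ["api_key"] = "<your-api-key>"
};

var model = RequireSetting(settings, "model");
var url = new Uri(RequireSetting(settings, "api_url"));
Console.WriteLine($"Using model '{model}' at {url.Host}");
```

In the lesson's code, the same check could be applied to `configuration["model"]`, `configuration["api_url"]`, and `configuration["api_key"]` before the client is created.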
54+
55+
#### Creating Advanced Prompts and Options
56+
57+
The `gpt-image-1` model supports more sophisticated prompts and enhanced options:
58+
59+
```csharp
60+
string prompt = "A kitten playing soccer on the moon. Use a comic style";
61+
62+
// generate an image using the prompt with advanced options
63+
ImageGenerationOptions options = new()
64+
{
65+
Size = GeneratedImageSize.W1024xH1024,
66+
Quality = "medium"
67+
};
68+
```
69+
70+
The `gpt-image-1` model provides:
71+
- Enhanced understanding of complex prompts
72+
- Better style interpretation
73+
- Improved image quality and consistency
74+
- More accurate object placement and composition
75+
76+
#### Generating and Processing the Image
77+
78+
With our client, prompt, and options configured for `gpt-image-1`, we can generate the image:
79+
80+
```csharp
81+
GeneratedImage image = await client.GenerateImageAsync(prompt, options);
82+
83+
// Save the image to a file
84+
string path = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), $"genimage{DateTimeOffset.Now.Ticks}.png");
85+
File.WriteAllBytes(path, image.ImageBytes.ToArray());
86+
```
87+
88+
This code:
89+
1. Calls the `GenerateImageAsync` method with our prompt and options
90+
2. Creates a file path on the desktop with a unique filename
91+
3. Saves the generated image bytes to the file
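
In a Codespace or container the Desktop folder may not exist, so it can help to create the target folder before writing the file. The sketch below shows one way to do that. `BuildOutputPath` and the `generated-images` folder name are illustrative assumptions, not part of the sample, and a few placeholder bytes stand in for `image.ImageBytes.ToArray()`.

```csharp
using System;
using System.IO;

// Hypothetical helper: creates `directory` if needed and returns a timestamped file path inside it.
static string BuildOutputPath(string directory, string prefix, string extension)
{
    Directory.CreateDirectory(directory); // does nothing when the folder already exists
    string fileName = $"{prefix}{DateTimeOffset.Now.Ticks}{extension}";
    return Path.Combine(directory, fileName);
}

// Assumed location: a folder under the system temp directory.
string outputDir = Path.Combine(Path.GetTempPath(), "generated-images");
string path = BuildOutputPath(outputDir, "genimage", ".png");

// Placeholder bytes stand in for image.ImageBytes.ToArray() from the lesson.
byte[] imageBytes = { 0x89, 0x50, 0x4E, 0x47 };
File.WriteAllBytes(path, imageBytes);
Console.WriteLine($"Saved {imageBytes.Length} bytes to {path}");
```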
92+
93+
### Video Generation with Sora
94+
95+
The `sora` model enables video generation from text prompts, bringing motion and temporal dynamics to your AI-generated content. Let's explore how to use it:
96+
97+
> 🧑‍💻**Sample code**: [Here are working examples for video generation](./src/VideoGeneration-AzureSora-01/) and [using the AzureSoraSDK](./src/VideoGeneration-AzureSoraSDK-02/) you can follow along with.
98+
99+
#### Using REST API Approach
100+
101+
The first approach uses direct REST API calls to interact with the Sora model:
102+
103+
```csharp
104+
// Configuration setup
105+
var builder = new ConfigurationBuilder().AddUserSecrets<Program>();
106+
var configuration = builder.Build();
107+
string endpoint = configuration["endpoint"];
108+
string apiKey = configuration["api_key"];
109+
string model = "sora";
110+
111+
// HTTP client setup
112+
var client = new HttpClient();
113+
client.DefaultRequestHeaders.Add("api-key", apiKey);
114+
client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
115+
```
116+
117+
#### Creating a Video Generation Job
118+
119+
To generate a video with Sora, we first create a video generation job:
120+
121+
```csharp
122+
string prompt = "Two puppies playing soccer on the moon. Use a cartoon style.";
123+
124+
// Create a video generation job
125+
string createUrl = $"{endpoint}/openai/v1/video/generations/jobs?api-version=preview";
126+
var body = new
127+
{
128+
prompt = prompt,
129+
width = 480,
130+
height = 480,
131+
n_seconds = 5,
132+
model = model
133+
};
134+
135+
var bodyJson = JsonSerializer.Serialize(body);
136+
var response = await client.PostAsync(createUrl, new StringContent(bodyJson, Encoding.UTF8, "application/json"));
137+
response.EnsureSuccessStatusCode();
138+
139+
var responseJson = await response.Content.ReadAsStringAsync();
140+
using var doc = JsonDocument.Parse(responseJson);
141+
string jobId = doc.RootElement.GetProperty("id").GetString();
142+
```
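
Building the URL with `$"{endpoint}/openai/..."` produces a double slash when the configured endpoint ends with `/`. Here is a small sketch of a helper that normalizes the endpoint before building the job URLs used in this lesson. `BuildJobsUrl` is a hypothetical name, and the endpoint and job id are placeholders.

```csharp
using System;

// Hypothetical helper: builds the jobs URL from the lesson, with or without a job id.
static string BuildJobsUrl(string endpoint, string? jobId = null)
{
    string baseUrl = endpoint.TrimEnd('/'); // tolerate a trailing slash in the configured endpoint
    string jobSegment = jobId is null ? "" : "/" + Uri.EscapeDataString(jobId);
    return $"{baseUrl}/openai/v1/video/generations/jobs{jobSegment}?api-version=preview";
}

string endpoint = "https://example-resource.openai.azure.com/"; // placeholder endpoint
Console.WriteLine(BuildJobsUrl(endpoint));            // URL for creating a job
Console.WriteLine(BuildJobsUrl(endpoint, "job-123")); // URL for checking a job (made-up id)
```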
143+
144+
#### Polling for Job Completion
145+
146+
Video generation takes time, so we need to poll for the job status:
147+
148+
```csharp
149+
// Poll for job status
150+
string statusUrl = $"{endpoint}/openai/v1/video/generations/jobs/{jobId}?api-version=preview";
151+
string status = null;
152+
JsonElement statusResponse = default;
153+
154+
do
155+
{
156+
await Task.Delay(5000); // Wait 5 seconds between polls
157+
var statusResp = (await client.GetAsync(statusUrl)).EnsureSuccessStatusCode();
158+
var statusJson = await statusResp.Content.ReadAsStringAsync();
159+
var statusDoc = JsonDocument.Parse(statusJson);
160+
statusResponse = statusDoc.RootElement;
161+
status = statusResponse.GetProperty("status").GetString();
162+
Console.WriteLine($"{DateTime.Now:dd-MMM-yyyy HH:mm:ss} Job status: {status}");
163+
} while (status != "succeeded" && status != "failed" && status != "cancelled");
164+
```
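
The loop above runs until the service reports a terminal status, so a job that never finishes would keep it polling indefinitely. One way to bound the wait is to add a timeout, as sketched below. `PollUntilTerminalAsync` is a hypothetical helper, the status request is abstracted as a delegate, and a fake status sequence replaces the HTTP call so the snippet runs offline.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical helper: polls getStatus until a terminal status is returned or the timeout would be exceeded.
static async Task<string> PollUntilTerminalAsync(
    Func<Task<string>> getStatus, TimeSpan interval, TimeSpan timeout)
{
    var stopwatch = Stopwatch.StartNew();
    while (true)
    {
        string status = await getStatus();
        if (status is "succeeded" or "failed" or "cancelled")
        {
            return status;
        }
        // Give up if waiting one more interval would pass the deadline.
        if (stopwatch.Elapsed + interval > timeout)
        {
            throw new TimeoutException($"Job is still '{status}' after {stopwatch.Elapsed.TotalSeconds:F0}s.");
        }
        await Task.Delay(interval);
    }
}

// Fake status source standing in for the GET request to the status URL.
var fakeStatuses = new Queue<string>(new[] { "queued", "running", "succeeded" });
string finalStatus = await PollUntilTerminalAsync(
    () => Task.FromResult(fakeStatuses.Dequeue()),
    interval: TimeSpan.FromMilliseconds(10),
    timeout: TimeSpan.FromSeconds(5));
Console.WriteLine($"Final status: {finalStatus}");
```

With the real API, the delegate would wrap the `GetAsync` call and return the parsed `status` value, and the interval would stay at several seconds as in the lesson.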
165+
166+
#### Downloading the Generated Video
167+
168+
Once the job succeeds, we can download the generated video:
169+
170+
```csharp
171+
if (status == "succeeded")
172+
{
173+
string generationId = statusResponse.GetProperty("generations")[0].GetProperty("id").GetString();
174+
string videoUrl = $"{endpoint}/openai/v1/video/generations/{generationId}/content/video?api-version=preview";
175+
176+
var videoResp = await client.GetAsync(videoUrl);
177+
if (videoResp.IsSuccessStatusCode)
178+
{
179+
string outputFilename = Path.Combine(outputDir, $"sora_{DateTime.Now:ddMMMyyyy_HHmmss}.mp4");
180+
using (var fs = new FileStream(outputFilename, FileMode.Create, FileAccess.Write))
181+
{
182+
await videoResp.Content.CopyToAsync(fs);
183+
}
184+
Console.WriteLine($"SORA Generated video saved: '{outputFilename}'");
185+
}
186+
}
187+
```
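
`GetProperty("generations")[0]` throws if the array is missing or empty. A more defensive variant is sketched here with a hypothetical `TryGetFirstGenerationId` helper, which returns `false` instead of throwing. The JSON payloads are made up and contain only the fields the lesson reads (`status` and `generations[0].id`).

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: reads generations[0].id without throwing when it is absent.
static bool TryGetFirstGenerationId(string json, out string? generationId)
{
    generationId = null;
    using var doc = JsonDocument.Parse(json);
    var root = doc.RootElement;
    if (root.ValueKind == JsonValueKind.Object
        && root.TryGetProperty("generations", out var generations)
        && generations.ValueKind == JsonValueKind.Array
        && generations.GetArrayLength() > 0
        && generations[0].ValueKind == JsonValueKind.Object
        && generations[0].TryGetProperty("id", out var id)
        && id.ValueKind == JsonValueKind.String)
    {
        generationId = id.GetString();
        return true;
    }
    return false;
}

// Made-up payloads for illustration; real responses contain more fields.
string withVideo = "{\"status\":\"succeeded\",\"generations\":[{\"id\":\"gen_123\"}]}";
string withoutVideo = "{\"status\":\"succeeded\",\"generations\":[]}";

Console.WriteLine(TryGetFirstGenerationId(withVideo, out var found) ? found : "none");
Console.WriteLine(TryGetFirstGenerationId(withoutVideo, out _) ? "unexpected" : "none");
```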
188+
189+
#### Using AzureSoraSDK (Alternative Approach)
190+
191+
For a more streamlined experience, you can use [AzureSoraSDK](https://github.com/DrHazemAli/AzureSoraSDK), a community-maintained SDK that simplifies Sora integration:
192+
193+
```csharp
194+
// Configure the client
195+
var options = new SoraClientOptions
196+
{
197+
Endpoint = endpoint,
198+
ApiKey = apiKey,
199+
DeploymentName = "sora",
200+
ApiVersion = "preview"
201+
};
202+
203+
// Create client
204+
using var client = new SoraClient(options.Endpoint, options.ApiKey, options.DeploymentName);
205+
206+
// Submit video generation job
207+
var jobId = await client.SubmitVideoJobAsync(
208+
prompt: "A serene waterfall in a lush forest with sunlight filtering through trees",
209+
width: 1280,
210+
height: 720,
211+
nSeconds: 10);
212+
213+
// Wait for completion and download
214+
var videoUri = await client.WaitForCompletionAsync(jobId);
215+
var outputPath = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "sora_videos", "output.mp4");
216+
await client.DownloadVideoAsync(videoUri, outputPath);
217+
```
218+
219+
> 🗒️**Note:** Video generation with Sora typically takes several minutes to complete, depending on the length and complexity of the requested video.
220+
221+
> 🙋 **Need help?**: If you encounter any issues running these examples, [open an issue in the repository](https://github.com/microsoft/Generative-AI-for-beginners-dotnet/issues/new?template=Blank+issue) and we'll help you troubleshoot.
222+
223+
## Key Differences and Capabilities
224+
225+
### gpt-image-1 vs Previous Models
226+
- **Enhanced Prompt Understanding**: Better interpretation of complex, detailed prompts
227+
- **Improved Quality**: Higher resolution and more detailed images
228+
- **Better Style Consistency**: More accurate representation of artistic styles
229+
- **Object Placement**: Better spatial understanding and object relationships
230+
231+
### Sora Video Generation Features
232+
- **Temporal Consistency**: Maintains object and scene consistency across frames
233+
- **Complex Motion**: Handles intricate movements and interactions
234+
- **Style Flexibility**: Supports various artistic styles and aesthetics
235+
- **Duration Control**: Generate videos from a few seconds to longer sequences
236+
237+
## Summary
238+
239+
In this lesson, we explored the latest Azure OpenAI models for visual content generation:
240+
241+
1. **gpt-image-1 for Image Generation**:
242+
- Set up Azure OpenAI client for the latest image model
243+
- Create sophisticated prompts with enhanced understanding
244+
- Generate high-quality images with improved consistency
245+
246+
2. **Sora for Video Generation**:
247+
- Create video generation jobs using REST API
248+
- Monitor job progress and handle asynchronous processing
249+
- Download and save generated videos
250+
- Use AzureSoraSDK for simplified integration
251+
252+
These new models represent significant advances in AI-generated visual content, offering enhanced quality, better prompt understanding, and new creative possibilities for your applications.
253+
254+
## Additional resources
255+
256+
- [Microsoft Learn: How to use Azure OpenAI image generation models](https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/dall-e?tabs=gpt-image-1)
257+
- [OpenAI-dotnet image generation](https://github.com/openai/openai-dotnet?tab=readme-ov-file#how-to-generate-images)
258+
- [Microsoft Learn: Sora video generation (preview)](https://learn.microsoft.com/en-us/azure/ai-services/openai/concepts/video-generation)
259+
- [Azure Sora SDK documentation](https://github.com/DrHazemAli/AzureSoraSDK/tree/main)
260+
- [AzureSoraSDK Official Repository](https://github.com/DrHazemAli/AzureSoraSDK)
261+
262+
## Up next
263+
264+
You've learned how to use the latest Azure OpenAI models for image and video generation in your .NET applications. These capabilities open up new possibilities for creating dynamic, engaging visual content.
265+
266+
👉 [Return to the main lesson overview](./readme.md) to explore more generative AI techniques.

03-CoreGenerativeAITechniques/readme.md

Lines changed: 3 additions & 1 deletion
@@ -12,6 +12,7 @@ In this lesson you'll learn some practical skills for building AI-enabled .NET a
1212
- 👀 Vision-based AI approaches
1313
- 🔊 Audio creation and transcription
1414
- 🖼️ Image generation with DALL-E
15+
- 🎬 Image and video generation with new models (gpt-image-1 and sora)
1516
- 🧩 Agents & assistants
1617
- 💻 Running models locally with AI Toolkit and Docker
1718

@@ -20,9 +21,10 @@ For this lesson, we will subdivide the content into the following sections:
2021
- [Chat, LLM completions, and function calling](./01-lm-completions-functions.md)
2122
- [Retrieval-Augmented Generation (RAG)](./02-retrieval-augmented-generation.md)
2223
- [Vision and audio AI applications](./03-vision-audio.md)
23-
- [Image Generation with Azure OpenAI](./05-ImageGenerationOpenAI.md)
2424
- [Agents](04-agents.md)
25+
- [Image Generation with Azure OpenAI](./05-ImageGenerationOpenAI.md)
2526
- [Running models locally with AI Toolkit, Docker, and Foundry Local](./06-LocalModelRunners.md)
27+
- [Image and Video Generation with New Azure OpenAI Models](./07-ImageVideoGenerationNewModels.md)
2628

2729
Starting with Language Model completions and Chat applications and function implementations with language models in .NET.
2830
