34 changes: 20 additions & 14 deletions md/03.FineTuning/FineTuning_Scenarios.md

![FineTuning with MS Services](../../imgs/03/intro/FinetuningwithMS.png)

This section provides an overview of fine-tuning scenarios in Microsoft Foundry and Azure environments, including deployment models, infrastructure layers, and commonly used optimization techniques.

**Platform**
This includes managed services such as Microsoft Foundry (formerly Azure AI Foundry) and Azure Machine Learning, which provide model management, orchestration, experiment tracking, and deployment workflows.

**Infrastructure**
Fine-tuning requires scalable compute resources. In Azure environments, this typically includes GPU-based virtual machines and CPU resources for lightweight workloads, along with scalable storage for datasets and checkpoints.

**Tools & Framework**
Fine-tuning workflows commonly rely on frameworks and optimization libraries such as Hugging Face Transformers, DeepSpeed, and PEFT (Parameter-Efficient Fine-Tuning).

The fine-tuning process with Microsoft technologies spans platform services, compute infrastructure, and training frameworks. By understanding how these components work together, developers can efficiently adapt foundation models to specific tasks and production scenarios.

## Model as a Service

Fine-tune models through a hosted, serverless service, without the need to create and manage compute.

![MaaS Fine Tuning](../../imgs/03/intro/MaaSfinetune.png)

Serverless fine-tuning is now available for Phi-3, Phi-3.5, and Phi-4 model families, enabling developers to quickly and easily customize the models for cloud and edge scenarios without having to arrange for compute.

## Model as a Platform

Users manage their own compute in order to fine-tune their models.

[Fine Tuning Sample](https://github.com/Azure/azureml-examples/blob/main/sdk/python/foundation-models/system/finetune/chat-completion/chat-completion.ipynb)
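In this flow you bring your own compute, typically an Azure Machine Learning GPU cluster, and submit a training job against it. The fragment below is a hypothetical Azure ML command-job spec in the style used by the linked sample; every name (script, environment, compute target) is a placeholder, not a verified configuration.

```yaml
# Hypothetical Azure ML command-job spec; all names below are placeholders.
$schema: https://azuremlschemas.azureedge.net/latest/commandJob.schema.json
display_name: phi3-finetune
code: ./src                               # folder containing your training script
command: python finetune.py --model_name microsoft/Phi-3-mini-4k-instruct
environment: azureml:my-pytorch-env@latest  # a curated or custom GPU environment
compute: azureml:gpu-cluster                # your own managed compute target
```

Submitting such a spec (for example with `az ml job create`) runs the fine-tuning script on the compute you manage, in contrast to the serverless Model-as-a-Service option above.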

## Fine-Tuning Techniques Comparison

|Scenario|LoRA|QLoRA|PEFT|DeepSpeed|ZeRO|DoRA|
|---|---|---|---|---|---|---|
|Adapting pre-trained LLMs to specific tasks or domains|Yes|Yes|Yes|Yes|Yes|Yes|
|Fine-tuning for NLP tasks such as text classification, named entity recognition, and machine translation|Yes|Yes|Yes|Yes|Yes|Yes|
|Fine-tuning for QA tasks|Yes|Yes|Yes|Yes|Yes|Yes|
|Fine-tuning for generating human-like responses in chatbots|Yes|Yes|Yes|Yes|Yes|Yes|
|Fine-tuning for generating music, art, or other forms of creativity|Yes|Yes|Yes|Yes|Yes|Yes|
|Reducing computational and financial costs|Yes|Yes|Yes|Yes|Yes|Yes|
|Reducing memory usage|Yes|Yes|Yes|Yes|Yes|Yes|
|Using fewer parameters for efficient fine-tuning|Yes|Yes|Yes|No|No|Yes|
|Memory-efficient form of data parallelism that gives access to the aggregate GPU memory of all the GPU devices available|No|No|No|Yes|Yes|No|

> [!NOTE]
> LoRA, QLoRA, PEFT, and DoRA are parameter-efficient fine-tuning methods, whereas DeepSpeed and ZeRO focus on distributed training and memory optimization.
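To make the "fewer parameters" rows concrete: for a single `d × d` weight matrix, full fine-tuning updates all `d²` entries, while a LoRA adapter trains two low-rank factors totalling `2·d·r` parameters. A back-of-the-envelope sketch with illustrative dimensions:

```python
# Full fine-tuning updates every entry of a d x d weight matrix, while LoRA
# trains two low-rank factors A (r x d) and B (d x r), i.e. 2 * d * r
# parameters. The dimensions below are illustrative.
d = 4096   # hidden size of a typical 7B-class transformer layer
r = 16     # LoRA rank

full_params = d * d          # parameters updated by full fine-tuning
lora_params = 2 * d * r      # parameters trained by a LoRA adapter

print(f"full: {full_params:,}, LoRA: {lora_params:,}, "
      f"ratio: {lora_params / full_params:.2%}")
# → full: 16,777,216, LoRA: 131,072, ratio: 0.78%
```

At rank 16 the adapter trains under 1% of the parameters of the full matrix, which is why the LoRA-family methods dominate the parameter-efficiency rows of the table while DeepSpeed and ZeRO instead address distributed memory.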

## Fine Tuning Performance Examples
