OdiaGenAI/FEDCOM

Framework for Easy Deployment of Compressed and Optimized Models (FEDCOM)

A Collaborative Initiative Between Odia Generative AI and the Norwegian BioAI Lab for Deep Learning Model Compression and Deployment.

In this repository, we compare two LLM compression techniques:

  1. bitsandbytes (4-bit NF4 quantization)
  2. AWQ (Activation-aware Weight Quantization)
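As a rough sketch of how the two techniques are typically invoked through Hugging Face `transformers` and AutoAWQ (the model IDs below are placeholders, and the exact arguments may differ between library versions):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# --- bitsandbytes: quantize to 4-bit NF4 on the fly at load time ---
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",             # NormalFloat4, as benchmarked below
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,        # also quantize the quantization constants
)
bnb_model = AutoModelForCausalLM.from_pretrained(
    "some-org/odia-7b",                    # placeholder model ID
    quantization_config=bnb_config,
    device_map="auto",
)

# --- AWQ: load a checkpoint that was quantized ahead of time ---
from awq import AutoAWQForCausalLM

awq_model = AutoAWQForCausalLM.from_quantized(
    "some-org/odia-7b-awq",                # placeholder pre-quantized checkpoint
    fuse_layers=True,                      # fused kernels drive AWQ's inference speedup
)
```

Note the workflow difference: bitsandbytes quantizes the weights as they are loaded, while AWQ requires a one-time calibration and quantization pass before the quantized checkpoint can be deployed.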

FEDCOM LLM Quantization Report

Objective: Evaluate 4-bit and 8-bit quantization techniques for on-device Odia LLM deployment.

| Technique          | VRAM Required (7B Model) | Speed (Tokens/sec) | Accuracy Drop (Est.) | Best For          |
|--------------------|--------------------------|--------------------|----------------------|-------------------|
| Baseline (FP16)    | ~14.5 GB                 | 25 t/s             | 0%                   | Cloud servers     |
| bitsandbytes (NF4) | ~5.2 GB                  | 18 t/s             | Minimal              | Fast prototyping  |
| AWQ (INT4)         | ~4.8 GB                  | 35 t/s             | Negligible           | Edge deployment   |
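The VRAM column is close to a simple back-of-envelope estimate for the weights alone; the gap to the measured numbers comes from the KV cache, activations, and quantization metadata:

```python
def estimated_weight_vram_gb(n_params: float, bits_per_weight: float) -> float:
    """Rough VRAM needed for the model weights alone, in GB."""
    return n_params * bits_per_weight / 8 / 1e9

print(estimated_weight_vram_gb(7e9, 16))  # 14.0 -> measured ~14.5 GB (FP16)
print(estimated_weight_vram_gb(7e9, 4))   # 3.5  -> measured ~4.8-5.2 GB (4-bit)
```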

The Jupyter notebook containing all of these tests, along with a comparison of the results of each technique, is available here.
