A Collaborative Initiative Between Odia Generative AI and the Norwegian BioAI Lab for Deep Learning Model Compression and Deployment.
In this repository, we compare two LLM compression techniques (illustrative loading sketches for each appear below):
- bitsandbytes (NF4 quantization)
- AWQ (Activation-aware Weight Quantization)
Objective: Evaluate 4-bit and 8-bit quantization techniques for on-device Odia LLM deployment.
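As a minimal sketch of the bitsandbytes path, the snippet below loads a causal LM in 4-bit NF4 through the Hugging Face Transformers integration. The model ID is a placeholder, and the compute dtype and double-quantization settings are common defaults, not necessarily the exact configuration used in the notebook.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Placeholder model ID; substitute the actual Odia LLM checkpoint.
model_id = "your-org/odia-llm-7b"

# 4-bit NF4 quantization config; double quantization saves a bit more VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # place layers across available GPUs/CPU automatically
)
```

Quantization here happens on the fly at load time, which is why bitsandbytes suits fast prototyping: no separate calibration or conversion step is needed.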
| Technique | VRAM Required (7B Model) | Speed (Tokens/sec) | Accuracy Drop (Est.) | Best For |
|---|---|---|---|---|
| Baseline (FP16) | ~14.5 GB | 25 | 0% | Cloud Servers |
| bitsandbytes (NF4) | ~5.2 GB | 18 | Minimal | Fast Prototyping |
| AWQ (INT4) | ~4.8 GB | 35 | Negligible | Edge Deployment |
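For the AWQ path, a sketch along the following lines (using the AutoAWQ library) quantizes a model to INT4 and then reloads the quantized checkpoint for inference. The model paths are placeholders, and the quantization config shown uses AutoAWQ's commonly documented defaults rather than the notebook's exact settings.

```python
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

# Placeholder paths; substitute the actual Odia LLM checkpoint.
model_path = "your-org/odia-llm-7b"
quant_path = "odia-llm-7b-awq"

# Typical AutoAWQ 4-bit config (group size and kernel version are common
# defaults, not necessarily what the notebook uses).
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoAWQForCausalLM.from_pretrained(model_path)

# Calibrate against activations and quantize the weights, then save.
model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)

# Later, load the INT4 checkpoint with fused kernels for fast inference.
model = AutoAWQForCausalLM.from_quantized(quant_path, fuse_layers=True)
```

Unlike bitsandbytes, AWQ requires this one-time calibration and conversion step, but the resulting fused INT4 kernels are what make it the faster option for edge deployment in the comparison above.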
You can find the Jupyter Notebook with all of these tests and a comparison of the results of each technique here.