This guide provides instructions to install Python 3.11, set up two Python projects (llama-stack-server and llama-stack-workshop) with virtual environments, install Ollama, pull the required LLaMA models, and configure the projects. You can either run the installer script or follow the step-by-step instructions.
- A system with a terminal (Linux, macOS, or Windows with WSL).
- Internet access to download packages and models.
- Administrative privileges for installing software.
- A Tavily API key (for the workshop project).
- Step 1 - Clone the repo
```bash
git clone https://github.com/devninja-in/llama-stack-workshop.git
```
- Step 2 - Go to the project directory, make the installer script executable, and run it:
```bash
cd llama-stack-workshop
chmod +x installer.sh
./installer.sh
```
- Linux (Ubuntu/Debian-based):
```bash
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install -y python3.11 python3.11-venv python3.11-dev
```
- macOS (using Homebrew)
```bash
brew install python@3.11
```
- Windows
- Download the Python 3.11 installer from python.org.
- Run the installer, ensuring to check "Add Python 3.11 to PATH."
- Verify installation:
```bash
python3.11 --version
```
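If you script the setup yourself, it can help to fail fast when Python 3.11 is missing before any virtual environments are created. A minimal sketch (the `PY` variable name is just for illustration):

```shell
# Fail fast if Python 3.11 is not on PATH before creating any virtual
# environments (the interpreter name may differ on Windows).
PY="$(command -v python3.11 || true)"
if [ -n "$PY" ]; then
  "$PY" --version
else
  echo "python3.11 not found on PATH; install it first" >&2
fi
```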
- Download and install Ollama:
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
- Verify installation:
```bash
ollama --version
```
Run the following commands to pull the required models:
```bash
ollama pull llama3.2:3b-instruct-fp16
# ollama pull meta-llama/Llama-Guard-3-8B
```
Start the Ollama service:
```bash
ollama serve
```
Note: Run this in a separate terminal, as it needs to keep running.
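Before moving on, you can confirm the Ollama daemon is actually serving requests. This sketch probes Ollama's REST API on its default port (11434); `/api/tags` lists locally pulled models:

```shell
# Probe Ollama's HTTP API on the default port; /api/tags lists pulled models.
if curl -fsS http://localhost:11434/api/tags >/dev/null 2>&1; then
  echo "ollama: reachable"
else
  echo "ollama: not reachable (is 'ollama serve' running?)"
fi
```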
- Create and navigate to the project directory:
```bash
mkdir llama-stack-server
cd llama-stack-server
```
- Create a virtual environment:
```bash
python3.11 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
- Install llama-stack:
```bash
pip install llama-stack==0.2.8
```
- Build the LLaMA stack with the Ollama template:
```bash
INFERENCE_MODEL="llama3.2:3b-instruct-fp16" llama stack build --template ollama --image-type venv
```
- Run the LLaMA stack server:
```bash
export INFERENCE_MODEL="llama3.2:3b-instruct-fp16"
export SAFETY_MODEL="llama-guard3:8b"
llama stack run .venv/lib/python3.11/site-packages/llama_stack/templates/ollama/run.yaml --image-type venv
```
Note: Keep this running in a terminal.
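Once the server is up, a quick probe from another terminal confirms it is listening on its default port (8321, matching the `LLAMA_STACK_SERVER` URL used later). The `/v1/health` path is an assumption based on recent llama-stack releases; even if your version differs, the probe still shows whether anything is listening on the port:

```shell
# Probe the llama-stack server on its default port (8321).
if curl -fsS http://localhost:8321/v1/health >/dev/null 2>&1; then
  echo "llama-stack: reachable"
else
  echo "llama-stack: not reachable (is 'llama stack run' still running?)"
fi
```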
- Create and navigate to the project directory:
```bash
mkdir llama-stack-workshop
cd llama-stack-workshop
```
- Create a virtual environment:
```bash
python3.11 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
```
- Create a `requirements.txt` file:
```bash
echo -e "llama-stack-client\nstreamlit\ndotenv\nrequests\ntavily" > requirements.txt
```
- Install the dependencies:
```bash
pip install -r requirements.txt
```
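After installing, you can verify that each distribution named in `requirements.txt` actually landed in the virtual environment. A sketch using `pip show` (run it with the `.venv` activated; the package list mirrors the `echo` above):

```shell
# Check that each distribution from requirements.txt is installed in the
# active environment; prints "ok" or "missing" per package.
for pkg in llama-stack-client streamlit dotenv requests tavily; do
  if pip show "$pkg" >/dev/null 2>&1; then
    echo "$pkg: ok"
  else
    echo "$pkg: missing"
  fi
done
```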
- Create a `.env` file:
```bash
echo -e "LLAMA_STACK_SERVER=http://localhost:8321\nINFERENCE_MODEL_ID=llama3.2:3b-instruct-fp16\nSHIELD_ID=meta-llama/Llama-Guard-3-8B\nEMBEDDING_MODEL_ID=all-MiniLM-L6-v2\nTAVILY_SEARCH_API_KEY=<Add Key>" > .env
```
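To catch typos early, you can check that the generated `.env` contains every key the workshop reads. A minimal sketch that recreates the file in a scratch directory and greps for each expected key (the key list mirrors the `echo` above; `printf` is used so the check works in any POSIX shell):

```shell
# Recreate the .env in a scratch directory and verify every expected key.
tmp="$(mktemp -d)"
printf '%s\n' \
  "LLAMA_STACK_SERVER=http://localhost:8321" \
  "INFERENCE_MODEL_ID=llama3.2:3b-instruct-fp16" \
  "SHIELD_ID=meta-llama/Llama-Guard-3-8B" \
  "EMBEDDING_MODEL_ID=all-MiniLM-L6-v2" \
  "TAVILY_SEARCH_API_KEY=<Add Key>" > "$tmp/.env"
missing=0
for key in LLAMA_STACK_SERVER INFERENCE_MODEL_ID SHIELD_ID EMBEDDING_MODEL_ID TAVILY_SEARCH_API_KEY; do
  grep -q "^$key=" "$tmp/.env" || { echo "missing: $key"; missing=1; }
done
[ "$missing" -eq 0 ] && echo ".env: all keys present"
rm -rf "$tmp"
```

Point the same loop at your real `.env` to re-check after editing in the Tavily key.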
- Replace `<Add Key>` in the `.env` file with your Tavily API key.
- Ensure the Ollama service is running (`ollama serve`).
- Ensure the `llama-stack-server` is running (from Step 5.4).
- The `llama-stack-workshop` project is now ready for development with the specified dependencies and configuration.
- Keep the Ollama service and `llama-stack-server` running in separate terminals.
- Obtain a Tavily API key from Tavily and update the `.env` file.
- Ensure Python 3.11 is used for all virtual environments to avoid compatibility issues.
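The checklist above can be folded into one sanity-check script for the two long-running services this guide starts. Ports 11434 and 8321 are the defaults assumed throughout; the `/v1/health` path is an assumption based on recent llama-stack releases:

```shell
# One-shot status check for the two services this guide keeps running.
check() {
  # $1 = label, $2 = URL to probe
  if curl -fsS "$2" >/dev/null 2>&1; then
    echo "$1: up"
  else
    echo "$1: DOWN"
  fi
}
check "ollama (11434)"      http://localhost:11434/api/tags
check "llama-stack (8321)"  http://localhost:8321/v1/health
```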