This module provides a wrapper for ChatGroq to control the rate of streaming token output.
Ensure that langchain-core, langchain-groq (which provides the ChatGroq import used below), and their related dependencies are installed:
pip install langchain-core langchain-groq

Import SlowGroq and wrap your ChatGroq instance:
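from langchain_groq import ChatGroq
# SlowGroq must also be imported from this module (the import path depends on where the module lives in your project)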
# Initialize your Groq chat model
groq_model = ChatGroq(model="llama-3.1-8b-instant")
# Wrap with throttling (e.g., 5 tokens per second)
slow_groq = SlowGroq(groq_model, tokens_per_second=5)
# Stream messages slowly (`messages` is the list of chat messages to send)
for chunk in slow_groq.stream(messages):
    print(chunk.text, end="", flush=True)

Parameters:
groq_model: An instance of ChatGroq.
tokens_per_second: Optional float specifying the desired token rate.
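
To illustrate the idea behind rate-limited streaming, here is a minimal, standalone sketch of a throttling generator. It is not SlowGroq's actual implementation: the function name `throttle`, its `sleep` and `clock` parameters, and the simplification that each streamed chunk counts as one token are all assumptions made for this example.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def throttle(
    chunks: Iterable[T],
    tokens_per_second: Optional[float] = None,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Iterator[T]:
    """Yield chunks no faster than tokens_per_second.

    Hypothetical helper: treats every chunk as a single token.
    A rate of None (or a non-positive rate) disables throttling.
    """
    if tokens_per_second is None or tokens_per_second <= 0:
        yield from chunks
        return

    interval = 1.0 / tokens_per_second  # minimum seconds between chunks
    next_time = clock()                 # earliest time the next chunk may go out
    for chunk in chunks:
        delay = next_time - clock()
        if delay > 0:
            sleep(delay)                # wait until the next slot opens
        yield chunk
        # Schedule the following chunk one interval after this one.
        next_time = max(next_time, clock()) + interval


if __name__ == "__main__":
    # Stand-in for a model stream: plain strings instead of message chunks.
    for word in throttle(["Hello", " ", "world", "\n"], tokens_per_second=5):
        print(word, end="", flush=True)
```

Passing `sleep` and `clock` as parameters is a design choice for this sketch only: it lets the pacing logic be checked with a fake clock instead of real waiting.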