@@ -1,7 +1,8 @@
---
title: Evaluation of Rag with Ragas
description: Use RAGAS to evaluate your RAG pipelines traced with Langfuse to measure the quality of your retrieval and synthesis.
category: Evaluation
source: "⚠️ This file is auto-generated from cookbook/evaluation_of_rag_with_ragas.ipynb. Do not edit this file directly — update the .ipynb file and regenerate with `bash scripts/update_cookbook_docs.sh`."
title: "Evaluation of RAG pipelines with Ragas"
description: "Use RAGAS to evaluate your RAG pipelines traced with Langfuse to measure the quality of your retrieval and synthesis."
category: "Evaluation"
---

# Evaluation of RAG pipelines with Ragas
@@ -52,16 +53,6 @@ fiqa_eval = load_dataset("explodinggradients/fiqa", "ragas_eval")['baseline']
fiqa_eval
```




Dataset({
features: ['question', 'ground_truths', 'answer', 'contexts'],
num_rows: 30
})



## The Metrics
We are going to measure the following aspects of a RAG system. These metrics come from the Ragas library:

@@ -143,14 +134,6 @@ row = fiqa_eval[0]
row['question'], row['answer']
```




('How to deposit a cheque issued to an associate in my business into my business account?',
'\nThe best way to deposit a cheque issued to an associate in your business into your business account is to open a business account with the bank. You will need a state-issued "dba" certificate from the county clerk\'s office as well as an Employer ID Number (EIN) issued by the IRS. Once you have opened the business account, you can have the associate sign the back of the cheque and deposit it into the business account.')



Now let's initialize the Langfuse client SDK to instrument your app.
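Before initializing the client, a quick check that the credentials are actually set can save you from a confusing authentication failure. The following is a minimal sketch, assuming the credentials come from the `LANGFUSE_PUBLIC_KEY` and `LANGFUSE_SECRET_KEY` environment variables. The helper `missing_langfuse_env` is hypothetical and not part of the Langfuse SDK.

```python
import os

# Assumption: credentials are supplied through these environment variables.
REQUIRED_VARS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY")


def missing_langfuse_env(environ=None):
    """Return the required Langfuse variables that are unset or empty (hypothetical helper)."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_langfuse_env()
if missing:
    print("Set these variables before creating the client:", ", ".join(missing))
```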


@@ -169,9 +152,6 @@ else:
print("Authentication failed. Please check your credentials and host.")
```

Langfuse client is authenticated and ready!


Here we define a utility function that scores a trace with the metrics you chose.
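The sketch below shows the general shape of such a utility. Plain callables stand in for the Ragas metric objects that the cookbook actually uses. Both `score_with_metrics` and the toy `toy_context_overlap` metric are hypothetical and are not Ragas APIs, and the sample row is made up. The function loops over the chosen metrics, prints which one is running in the style of the `calculating ...` log lines in the notebook output, and returns a dict mapping each metric name to its score.

```python
from typing import Callable, Dict, Mapping

# Hypothetical stand-in: each metric is a plain callable row -> float.
MetricFn = Callable[[Mapping[str, object]], float]


def score_with_metrics(row: Mapping[str, object], metrics: Dict[str, MetricFn]) -> Dict[str, float]:
    """Run every chosen metric on one question/contexts/answer row and collect scores by name."""
    scores = {}
    for name, metric in metrics.items():
        print(f"calculating {name}")
        scores[name] = float(metric(row))
    return scores


def toy_context_overlap(row):
    """Toy metric (NOT a Ragas metric): share of answer words that also appear in the contexts."""
    answer_words = set(str(row["answer"]).lower().split())
    context_words = set(" ".join(row["contexts"]).lower().split())
    return len(answer_words & context_words) / len(answer_words) if answer_words else 0.0


# Made-up example row.
row = {
    "question": "Do I need a new EIN?",
    "contexts": ["You do not need a new EIN"],
    "answer": "no you do not need a new EIN",
}
print(score_with_metrics(row, {"toy_context_overlap": toy_context_overlap}))
```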


@@ -237,21 +217,6 @@ print("RAGAS Scores:", ragas_scores)
ragas_scores
```

calculating faithfulness
calculating answer_relevancy
calculating llm_context_precision_without_reference
RAGAS Scores: {'faithfulness': 0.8, 'answer_relevancy': np.float64(0.9825100521118072), 'llm_context_precision_without_reference': 0.9999999999}





{'faithfulness': 0.8,
'answer_relevancy': np.float64(0.9825100521118072),
'llm_context_precision_without_reference': 0.9999999999}
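The scores above mix built-in Python floats with `np.float64` values. The following is a minimal sketch, assuming you want plain, rounded floats (one record per metric) before attaching them to a trace. `to_score_payloads` and its `trace_id`/`name`/`value` keys are hypothetical, so map them onto whichever Langfuse scoring call your SDK version provides.

```python
from typing import Dict, List


def to_score_payloads(trace_id: str, scores: Dict[str, float], ndigits: int = 4) -> List[dict]:
    """Turn {metric_name: value} into one plain-float record per metric (hypothetical shape)."""
    payloads = []
    for name, value in scores.items():
        # float() also converts numpy scalars such as np.float64 into a built-in float
        payloads.append({"trace_id": trace_id, "name": name, "value": round(float(value), ndigits)})
    return payloads


ragas_scores = {
    "faithfulness": 0.8,
    "answer_relevancy": 0.9825100521118072,
    "llm_context_precision_without_reference": 0.9999999999,
}
for payload in to_score_payloads("example-trace-id", ragas_scores):
    print(payload)
```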



Once the scores are computed, you can add them to the trace in Langfuse:


@@ -340,13 +305,6 @@ traces_sample = sample(traces, NUM_TRACES_TO_SAMPLE)
len(traces_sample)
```




3
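`random.sample` raises `ValueError` when asked for more items than the population contains. On a quiet day with fewer traces than `NUM_TRACES_TO_SAMPLE`, a bare `sample(traces, NUM_TRACES_TO_SAMPLE)` call would therefore fail. Below is a minimal sketch of a guarded, reproducible sample. The `sample_traces` name and its `seed` parameter are illustrative assumptions, not part of the cookbook.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def sample_traces(traces: Sequence[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Sample up to k traces without replacement; never asks for more than are available."""
    rng = random.Random(seed)  # a seeded generator makes the sample reproducible across runs
    return rng.sample(list(traces), min(k, len(traces)))


trace_ids = ["trace-a", "trace-b", "trace-c", "trace-d", "trace-e"]
print(len(sample_traces(trace_ids, 3, seed=42)))  # 3
print(len(sample_traces(trace_ids[:2], 3)))       # 2, instead of a ValueError
```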



Now let's build a batch and score it. Ragas uses a Hugging Face `Dataset` object to build the dataset and run the evaluation. If you run this on your own production data, use the right keys to extract the question, contexts, and answer from the trace.
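The following is a minimal sketch of that extraction step. It assumes each trace has already been reduced to a plain dict with hypothetical `id`, `input`, `contexts`, and `output` fields, so adjust those keys to your own trace structure. The column names `question`/`contexts`/`answer`/`trace_id` follow the wording above, so check them against the schema your Ragas version expects. The result is a dict of equal-length lists, which is the column-oriented shape `Dataset.from_dict` accepts.

```python
from typing import Dict, List, Mapping, Sequence


def build_evaluation_batch(traces: Sequence[Mapping]) -> Dict[str, List]:
    """Collect question/contexts/answer/trace_id columns from simplified trace dicts (hypothetical keys)."""
    batch: Dict[str, List] = {"question": [], "contexts": [], "answer": [], "trace_id": []}
    for trace in traces:
        batch["question"].append(trace["input"])
        batch["contexts"].append(list(trace["contexts"]))
        batch["answer"].append(trace["output"])
        batch["trace_id"].append(trace["id"])
    return batch


# Made-up placeholder traces.
traces = [
    {"id": "trace-1", "input": "Have plenty of cash flow but bad credit",
     "contexts": ["context passage 1", "context passage 2"], "output": "generated answer"},
    {"id": "trace-2", "input": "Privacy preferences on creditworthiness data",
     "contexts": ["context passage 3"], "output": "another answer"},
]
evaluation_batch = build_evaluation_batch(traces)
print(evaluation_batch["trace_id"])
# With `datasets` installed: ds = Dataset.from_dict(evaluation_batch)
```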


@@ -386,24 +344,13 @@ ds = Dataset.from_dict(evaluation_batch)
r = evaluate(ds, metrics=[Faithfulness(), ResponseRelevancy()])
```


Evaluating: 0%| | 0/6 [00:00<?, ?it/s]


And that's it! You can now see the scores for the traces sampled from the chosen time period.


```python
r
```




{'faithfulness': 0.5516, 'answer_relevancy': 0.9294}



You can also push the scores back into Langfuse or use the exported pandas dataframe to run further analysis.


@@ -419,66 +366,6 @@ df.head()



<div>
<style scoped>
.dataframe tbody tr th:only-of-type {
vertical-align: middle;
}

.dataframe tbody tr th {
vertical-align: top;
}

.dataframe thead th {
text-align: right;
}
</style>
<table border="1" class="dataframe">
<thead>
<tr style="text-align: right;">
<th></th>
<th>user_input</th>
<th>retrieved_contexts</th>
<th>response</th>
<th>faithfulness</th>
<th>answer_relevancy</th>
<th>trace_id</th>
</tr>
</thead>
<tbody>
<tr>
<th>0</th>
<td>Do I need a new EIN since I am hiring employee...</td>
<td>[You don't need to notify the IRS of new membe...</td>
<td>\nNo, you do not need a new EIN since you are ...</td>
<td>0.750000</td>
<td>0.992491</td>
<td>9a96d48d96d45b1bb6d28d48b7cc93d4</td>
</tr>
<tr>
<th>1</th>
<td>Privacy preferences on creditworthiness data</td>
<td>[See the first item in the list: For our every...</td>
<td>\nThe best answer to this question is that you...</td>
<td>0.571429</td>
<td>0.875799</td>
<td>18e23692aa5b2b245c176574e247a236</td>
</tr>
<tr>
<th>2</th>
<td>Have plenty of cash flow but bad credit</td>
<td>[This is probably a good time to note that cre...</td>
<td>\nIf you have plenty of cash flow but bad cred...</td>
<td>0.333333</td>
<td>0.919893</td>
<td>877d64dc4355743e2d2f1b2607d9ec14</td>
</tr>
</tbody>
</table>
</div>
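As a quick check on the numbers above, the aggregate that `evaluate` returned (`{'faithfulness': 0.5516, 'answer_relevancy': 0.9294}`) matches the plain mean of the per-trace rows in the table. Below is a minimal stdlib sketch of that kind of follow-up analysis, with the three rows copied from the table. The 0.5 threshold for flagging low faithfulness is an arbitrary illustration, not a recommended cutoff.

```python
from statistics import mean

# Per-trace scores copied from the dataframe above.
rows = [
    {"trace_id": "9a96d48d96d45b1bb6d28d48b7cc93d4", "faithfulness": 0.750000, "answer_relevancy": 0.992491},
    {"trace_id": "18e23692aa5b2b245c176574e247a236", "faithfulness": 0.571429, "answer_relevancy": 0.875799},
    {"trace_id": "877d64dc4355743e2d2f1b2607d9ec14", "faithfulness": 0.333333, "answer_relevancy": 0.919893},
]

# Average each metric across traces and round to four places, like the summary above.
summary = {m: round(mean(r[m] for r in rows), 4) for m in ("faithfulness", "answer_relevancy")}
print(summary)  # {'faithfulness': 0.5516, 'answer_relevancy': 0.9294}

# Flag traces whose faithfulness falls below an arbitrary threshold for manual review.
low_faithfulness = [r["trace_id"] for r in rows if r["faithfulness"] < 0.5]
print(low_faithfulness)  # ['877d64dc4355743e2d2f1b2607d9ec14']
```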




```python
for _, row in df.iterrows():
@@ -1,7 +1,8 @@
---
title: Evaluation with Langchain
description: Cookbook that demonstrates how to run Langchain evaluations on data in Langfuse.
category: Evaluation
source: "⚠️ This file is auto-generated from cookbook/evaluation_with_langchain.ipynb. Do not edit this file directly — update the .ipynb file and regenerate with `bash scripts/update_cookbook_docs.sh`."
title: "Run Langchain Evaluations on data in Langfuse"
description: "Cookbook that demonstrates how to run Langchain evaluations on data in Langfuse."
category: "Evaluation"
---

# Run Langchain Evaluations on data in Langfuse
@@ -75,7 +76,9 @@ else:
print("Authentication failed. Please check your credentials and host.")
```

Langfuse client is authenticated and ready!
```
Langfuse client is authenticated and ready!
```


### Fetching data
@@ -114,7 +117,9 @@ generations[0].id



'adb5ba6beab14984ab89006ee09e9cd6'
```
'adb5ba6beab14984ab89006ee09e9cd6'
```


