Small benchmark test results

This small benchmark was designed to test the robustness of the double-layered RAG approach implemented in qdurllm as a retrieval technique.

RAG workflow

The RAG workflow goes like this:

First test

The benchmark is based on the content of 4 web pages:

The content of these URLs was chunked and uploaded to Qdrant collections. Smaller portions of each chunk (covering 10-25% of its text) were then used as queries, and the retrieved results were compared with the original full text.
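The evaluation loop described here can be sketched roughly as follows. This is a minimal illustration, not the benchmark script itself: `sample_query` and `retrieval_accuracy` are hypothetical names, the query is assumed to be a contiguous character slice of the chunk, and `retrieve` stands in for whatever search function is under test.

```python
import random
from typing import Callable, Sequence


def sample_query(chunk: str, rng: random.Random,
                 lo: float = 0.10, hi: float = 0.25) -> str:
    """Return a contiguous slice covering roughly lo..hi of a non-empty chunk."""
    fraction = rng.uniform(lo, hi)
    length = max(1, int(len(chunk) * fraction))
    start = rng.randint(0, len(chunk) - length)
    return chunk[start:start + length]


def retrieval_accuracy(chunks: Sequence[str],
                       retrieve: Callable[[str], str],
                       seed: int = 0) -> float:
    """Fraction of chunks whose sampled query retrieves the chunk itself."""
    rng = random.Random(seed)
    # Assumption: a retrieval counts as correct only on an exact match
    # between the returned text and the original full chunk.
    correct = sum(retrieve(sample_query(chunk, rng)) == chunk for chunk in chunks)
    return correct / len(chunks)
```

Any retriever that maps a query string to a chunk of text can be plugged in as `retrieve`, so the same loop can score a single encoder and a double-layered setup alike.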

First results

The correct/total retrievals ratio for All-MiniLM-L6-v2 alone is 81.54%, whereas the ratio for the previously described double-layered All-MiniLM-L6-v2 + sentence-t5-base approach goes up to 93.85%, equalling that of sentence-t5-base alone. A double-layered approach with the roles of the two encoders switched yields a correct/total retrievals ratio of 84.62%.

The advantage of this technique is that it does not require all the chunks of text to be encoded as 768-dimensional vectors (as would happen if we adopted sentence-t5-base alone); instead, that encoding is done dynamically at each query. It also improves on the performance of All-MiniLM-L6-v2 alone by a little more than 12 percentage points (from 81.54% to 93.85%).
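A two-stage retrieval of this kind might look like the sketch below. It is an illustration under stated assumptions rather than qdurllm's actual code: the index is a plain list instead of a Qdrant collection, the encoders are passed in as generic callables (in the benchmark they would be All-MiniLM-L6-v2 and sentence-t5-base), and `double_layered_retrieve`, `top_k` and the shortlist size `k` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Encoder = Callable[[str], Vector]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity, returning 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: Vector, items: List[Tuple[str, Vector]],
          k: int) -> List[Tuple[str, Vector]]:
    """Return the k (text, vector) pairs most similar to query_vec."""
    return sorted(items, key=lambda tv: cosine(query_vec, tv[1]), reverse=True)[:k]


def double_layered_retrieve(query: str,
                            index: List[Tuple[str, Vector]],
                            first_encoder: Encoder,
                            second_encoder: Encoder,
                            k: int = 10) -> str:
    """Shortlist with the first encoder, then re-rank with the second one."""
    # Layer 1: search the stored vectors, which were built with first_encoder.
    shortlist = top_k(first_encoder(query), index, k)
    # Layer 2: encode only the shortlisted chunks on the fly with
    # second_encoder, so the full corpus never needs the larger vectors.
    reencoded = [(text, second_encoder(text)) for text, _ in shortlist]
    best_text, _ = top_k(second_encoder(query), reencoded, 1)[0]
    return best_text
```

Because the second encoder only ever sees the query and the `k` shortlisted chunks, the stored index stays at the first encoder's dimensionality; the price is the extra encoding work at every query.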

The disadvantage is the execution time: on a Windows 10 laptop with 8 GB of RAM and 12 cores, double-layered RAG takes an average of 8.39 s, against 0.23 s for sentence-t5-base alone.
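To put the trade-off in numbers, the figures reported above can be combined as in this short calculation. It only restates the reported values; the variable names are illustrative.

```python
# Figures reported in the first test.
acc_minilm = 81.54   # All-MiniLM-L6-v2 alone, % of correct retrievals
acc_double = 93.85   # double-layered approach, % of correct retrievals
time_double = 8.39   # average time in seconds, double-layered RAG
time_t5 = 0.23       # average time in seconds, sentence-t5-base alone

gain_points = acc_double - acc_minilm            # absolute gain, percentage points
gain_relative = 100 * gain_points / acc_minilm   # gain relative to the baseline, %
slowdown = time_double / time_t5                 # how many times slower

print(f"gain: {gain_points:.2f} points ({gain_relative:.1f}% relative)")
print(f"slowdown: {slowdown:.1f}x")
```

In other words, the accuracy gain of about 12 percentage points comes with an average execution time more than 36 times longer than sentence-t5-base alone on this machine.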

Second test

The second benchmark is based on a dataset available on HuggingFace. It is a Q&A dataset consisting of a set of 358 answers (used as the content to retrieve) and questions (used as retrieval queries).

Second results

Code availability

The benchmark test code is available here for the first test and here for the second one.


If you happen to have time and powerful hardware, you can run larger tests using the script referenced above: it would be great!