Vector store-backed retriever

A vector store retriever is a retriever that uses a vector store to retrieve documents. It is a lightweight wrapper around the vector store class to make it conform to the retriever interface. It uses the search methods implemented by a vector store, like similarity search and MMR, to query the texts in the vector store.

Once you construct a vector store, it's very easy to construct a retriever. Let's walk through an example.

from langchain_community.document_loaders import TextLoader

loader = TextLoader("../../state_of_the_union.txt")

from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
embeddings = OpenAIEmbeddings()
db = FAISS.from_documents(texts, embeddings)

retriever = db.as_retriever()

docs = retriever.invoke("what did he say about ketanji brown jackson")

Maximum marginal relevance retrieval

By default, the vector store retriever uses similarity search. If the underlying vector store supports maximum marginal relevance search, you can specify that as the search type.

retriever = db.as_retriever(search_type="mmr")

docs = retriever.invoke("what did he say about ketanji brown jackson")

Similarity score threshold retrieval

You can also set a retrieval method that sets a similarity score threshold and only returns documents with a score above that threshold.

retriever = db.as_retriever(
    search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5}
)

docs = retriever.invoke("what did he say about ketanji brown jackson")

Specifying top k

You can also specify search kwargs like k to use when doing retrieval.

retriever = db.as_retriever(search_kwargs={"k": 1})

docs = retriever.invoke("what did he say about ketanji brown jackson")
len(docs)

Vector store-backed retriever

Maximum marginal relevance retrieval​

Similarity score threshold retrieval​

Specifying top k​

Help us out by providing feedback on this documentation page:

Maximum marginal relevance retrieval

Similarity score threshold retrieval

Specifying top k