Chroma is an open-source vector database for storing embeddings (numerical vector representations of data such as text) and retrieving the items most similar to a query. In this guide, we will create a Chroma collection, add documents to it, and query it with natural-language text.
Prerequisites
- Python: Ensure you have Python installed on your system.
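- Chroma: Install Chroma using pip:
pip install chromadb
- Embedding Model: Choose an embedding model to generate embeddings for your documents. This guide uses the SentenceTransformer model all-MiniLM-L6-v2, which requires the sentence-transformers package:
pip install sentence-transformers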
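An embedding model maps each text to a fixed-length vector so that texts with related meaning end up close together. The sketch below uses small hand-made vectors (assumed values for illustration, not real model output; all-MiniLM-L6-v2 actually produces 384-dimensional vectors) to show how that closeness can be measured with cosine similarity.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional "embeddings", chosen by hand for illustration only
paris = [0.9, 0.1, 0.0]
berlin = [0.7, 0.3, 0.1]
banana = [0.0, 0.1, 0.9]

print(round(cosine_similarity(paris, berlin), 3))  # two capital cities: high similarity
print(round(cosine_similarity(paris, banana), 3))  # unrelated topic: low similarity
```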
Creating a Chroma Collection
Import Necessary Libraries:
```python
import chromadb
from sentence_transformers import SentenceTransformer
```
Create a Chroma Client:
```python
# In-memory (ephemeral) client: data is lost when the process exits
client = chromadb.Client()
```
Create a Collection:
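```python
from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

# Model used below to embed documents and queries explicitly
embedding_model = SentenceTransformer("all-MiniLM-L6-v2")

collection = client.create_collection(
    name="my_collection",
    # Chroma expects an EmbeddingFunction wrapper, not a raw SentenceTransformer model
    embedding_function=SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2"),
)
```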
Ingesting Documents
Prepare Documents: Load your documents into a list of strings.
Generate Embeddings: Encode each document with the same model you will use for queries, so that document and query vectors are comparable.
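Preparing documents can be as simple as reading every .txt file in a folder. The helper below is a hypothetical sketch (the name load_text_documents and the one-file-per-document layout are assumptions). It also derives an ID from each file name, since Chroma requires a unique ID for every item added to a collection.

```python
import tempfile
from pathlib import Path


def load_text_documents(directory: str) -> tuple[list[str], list[str]]:
    """Hypothetical helper: read each .txt file in `directory` as one document.

    Returns (ids, documents); each ID is the file name without its extension.
    """
    ids: list[str] = []
    documents: list[str] = []
    # Sort so IDs and documents come back in a stable order
    for path in sorted(Path(directory).glob("*.txt")):
        ids.append(path.stem)
        documents.append(path.read_text(encoding="utf-8"))
    return ids, documents


# Demo with two throwaway files in a temporary folder
demo_dir = Path(tempfile.mkdtemp())
(demo_dir / "france.txt").write_text("Paris is the capital of France.", encoding="utf-8")
(demo_dir / "germany.txt").write_text("Berlin is the capital of Germany.", encoding="utf-8")

ids, documents = load_text_documents(str(demo_dir))
print(ids)
```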
Add Documents to the Collection:
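```python
# Encode all documents; .tolist() converts the NumPy array into plain lists
embeddings = embedding_model.encode(documents).tolist()

collection.add(
    ids=[f"doc{i}" for i in range(len(documents))],  # Chroma requires a unique ID per item
    documents=documents,
    embeddings=embeddings,
)
```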
Querying the Collection
Create a Query:
```python
query_text = "What is the capital of France?"
```
Generate Query Embedding:
```python
# Embed the query with the same model used for the documents
query_embedding = embedding_model.encode([query_text]).tolist()
```
Perform Query:
```python
results = collection.query(
    query_embeddings=query_embedding,
    n_results=5,
)
```
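Conceptually, query returns the n_results stored vectors closest to each query vector. Chroma finds them with an approximate HNSW index and, by default, measures closeness with squared Euclidean (L2) distance. The exact brute-force version below is only a mental model under those assumptions, not Chroma's implementation; the function name nearest and the toy vectors are hypothetical.

```python
def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance, the metric of Chroma's default "l2" space."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(
    query: list[float], vectors: dict[str, list[float]], n_results: int
) -> list[tuple[str, float]]:
    """Hypothetical exact k-nearest-neighbour search: score every vector, keep the n best."""
    scored = [(item_id, squared_l2(query, vec)) for item_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1])  # smallest distance first
    return scored[:n_results]


stored = {
    "doc0": [0.9, 0.1, 0.0],
    "doc1": [0.1, 0.9, 0.0],
    "doc2": [0.0, 0.1, 0.9],
}
top = nearest([1.0, 0.0, 0.0], stored, n_results=2)
print(top)
```

An index such as HNSW avoids scoring every stored vector, which is what makes queries fast on large collections at the cost of occasionally missing an exact neighbour.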
Accessing Results
```python
# Each result field holds one list per query; [0] selects the matches for our single query
for document in results["documents"][0]:
    print(document)
```