Creating and Querying a Chroma Vector Database with Documents

Chroma is an open-source vector database that stores documents alongside their embeddings (numerical vectors) and retrieves them by similarity. This guide shows how to create a Chroma collection, add documents to it, and query it with a text question.

Prerequisites

  • Python: Ensure you have Python installed on your system.
  • Chroma: Install Chroma using pip: pip install chromadb
  • Embedding Model: Choose a suitable embedding model, such as a SentenceTransformer model (pip install sentence-transformers), to generate embeddings for your documents.

Creating a Chroma Collection

Import Necessary Libraries:

Python

import chromadb
from sentence_transformers import SentenceTransformer

Create a Chroma Client:

Python

client = chromadb.Client()

Create a Collection:

Python

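from chromadb.utils import embedding_functions

# Chroma expects an embedding-function wrapper, not a raw SentenceTransformer object
collection = client.create_collection(
    name="my_collection",
    embedding_function=embedding_functions.SentenceTransformerEmbeddingFunction(model_name="all-MiniLM-L6-v2"),
)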

Ingesting Documents

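Prepare Documents: Load your documents into a list of strings; the code below calls this list documents.

Generate Embeddings: Use the embedding model to generate one embedding per document, for example embedding_model = SentenceTransformer("all-MiniLM-L6-v2") followed by embeddings = embedding_model.encode(documents).tolist().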

Add Documents to the Collection:

Python

collection.add(
    ids=[f"doc-{i}" for i in range(len(documents))],  # Chroma requires one unique ID per document
    documents=documents,
    embeddings=embeddings
)
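
Chroma's collection.add expects a unique ID string for every document, passed as ids. The following is a minimal sketch of one way to derive stable IDs from the document text using only the standard library. The helper name make_ids and the "doc-" prefix are illustrative assumptions and not part of the Chroma API.

```python
import hashlib

def make_ids(documents, prefix="doc-"):
    """Return one deterministic ID per document, derived from its text.

    Hypothetical helper, not part of the Chroma API. Duplicate texts get a
    numeric suffix so that every ID in the returned list is unique.
    """
    ids = []
    seen = {}  # digest -> number of times this text has appeared so far
    for text in documents:
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:12]
        count = seen.get(digest, 0)
        seen[digest] = count + 1
        ids.append(f"{prefix}{digest}" if count == 0 else f"{prefix}{digest}-{count}")
    return ids

# Example documents, made up for illustration
documents = [
    "Paris is the capital of France.",
    "Berlin is the capital of Germany.",
    "Paris is the capital of France.",
]
ids = make_ids(documents)
```

Because the IDs depend only on the text, re-running the ingestion step on the same documents produces the same IDs, which makes it easier to spot documents that were already added.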

Querying the Collection

Create a Query:

Python

query_text = "What is the capital of France?"

Generate Query Embedding:

Python

# embedding_model is the same SentenceTransformer instance used to embed the documents
query_embedding = embedding_model.encode([query_text]).tolist()

Perform Query:

Python

results = collection.query(
    query_embeddings=query_embedding,
    n_results=5
)
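
Conceptually, a query ranks the stored vectors by their distance to the query vector and returns the n_results closest ones. The sketch below shows that idea with a brute-force scan in plain Python, assuming squared Euclidean distance. Chroma itself relies on an approximate nearest-neighbor index rather than a full scan, and the function names and toy vectors here are made up for illustration.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def brute_force_query(stored, query_vector, n_results=5):
    """Return the n_results (distance, document) pairs closest to query_vector.

    `stored` is a list of (document, vector) pairs. Illustrative only: a real
    vector database uses an index instead of scanning every vector.
    """
    scored = [(squared_l2(vector, query_vector), doc) for doc, vector in stored]
    scored.sort(key=lambda pair: pair[0])  # smallest distance first
    return scored[:n_results]

# Toy 2-dimensional "embeddings", made up for illustration
stored = [
    ("Paris is the capital of France.", [0.9, 0.1]),
    ("Berlin is the capital of Germany.", [0.7, 0.3]),
    ("Bananas are rich in potassium.", [0.0, 1.0]),
]
top = brute_force_query(stored, [1.0, 0.0], n_results=2)
for distance, doc in top:
    print(f"{distance:.2f}  {doc}")
```

A smaller distance means a closer match, so the first entry in the returned list is the best hit.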

Accessing Results

Python

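# query() returns a dict of parallel lists, with one inner list per query
for doc, distance in zip(results["documents"][0], results["distances"][0]):
    print(f"{distance:.4f}  {doc}")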
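
In recent Chroma versions, query() returns a dictionary of parallel lists with one inner list per query, under keys such as ids, documents, and distances. Below is a small sketch of a helper that flattens that structure into rows. The helper name iter_hits is hypothetical, and the sample dictionary is hand-written to mimic the shape; it is not real query output.

```python
def iter_hits(results, query_index=0):
    """Yield (id, document, distance) tuples for one query in a result dict."""
    ids = results["ids"][query_index]
    docs = results["documents"][query_index]
    distances = results["distances"][query_index]
    yield from zip(ids, docs, distances)

# Hand-written sample that mimics the result shape; not real query output
sample_results = {
    "ids": [["doc-0", "doc-1"]],
    "documents": [[
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
    ]],
    "distances": [[0.42, 1.07]],
}
hits = list(iter_hits(sample_results))
for doc_id, doc, distance in hits:
    print(f"{doc_id}  {distance:.2f}  {doc}")
```

When several query embeddings are passed at once, query_index selects which query's hits to read.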