LangChain provides a wrapper for Pinecone, a popular vector database, that simplifies storing vectors and performing similarity searches. This guide shows how to create an index, upsert vectors, and run similarity searches using the wrapper together with the Pinecone Python SDK.
Prerequisites
- Pinecone: Ensure you have a Pinecone account and API key.
- LangChain: Install LangChain using pip:
pip install langchain
- Pinecone Python SDK: Install a 2.x release of the Pinecone Python SDK, since the pinecone.init call used in this guide was removed in version 3:
pip install "pinecone-client<3"
Creating a Pinecone Index
Import Necessary Libraries:
Python
from langchain.vectorstores import Pinecone
import pinecone
Initialize Pinecone:
Python
pinecone.init(
    api_key="YOUR_API_KEY",
    environment="us-west1-gcp"  # Replace with your desired environment
)
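Hard-coding an API key is risky in shared code. Below is a minimal sketch of reading the settings from environment variables instead. The variable names PINECONE_API_KEY and PINECONE_ENVIRONMENT and the helper load_pinecone_settings are assumptions made for illustration, not something the Pinecone SDK requires.

```python
import os


def load_pinecone_settings(env=None):
    """Return (api_key, environment) read from environment variables.

    The variable names are illustrative assumptions; use whatever names
    your deployment defines.
    """
    env = os.environ if env is None else env
    api_key = env.get("PINECONE_API_KEY")
    if not api_key:
        raise RuntimeError("Set PINECONE_API_KEY before initializing Pinecone")
    # Fall back to the environment used in this guide if none is configured.
    environment = env.get("PINECONE_ENVIRONMENT", "us-west1-gcp")
    return api_key, environment
```

The returned pair could then be passed on as pinecone.init(api_key=api_key, environment=environment).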
Create a Pinecone Index:
Python
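index_name = "my-index"  # Pinecone index names allow lowercase letters, digits, and hyphens only
embedding_dimension = 512  # Adjust based on your embeddings
metric = "cosine"  # Choose a suitable metric
# The index itself is created through the Pinecone SDK, not the LangChain wrapper
if index_name not in pinecone.list_indexes():
    pinecone.create_index(index_name, dimension=embedding_dimension, metric=metric)
# Wrap the index; embedding_model is assumed to be a LangChain Embeddings instance
collection = Pinecone.from_existing_index(
    index_name=index_name,
    embedding=embedding_model
)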
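Pinecone's index metrics are cosine, dotproduct, and euclidean. To make the choice more concrete, here is a plain-Python sketch of how the three behave on the same vectors. It illustrates the math only and is not Pinecone's implementation; the function names and sample vectors are made up for the example.

```python
import math


def dot(a, b):
    """Dot product: grows with both alignment and vector length."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Angle-based similarity in [-1, 1]; ignores vector length."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def euclidean_distance(a, b):
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 1.0]  # orthogonal to a

print(cosine_similarity(a, b))   # 1.0
print(cosine_similarity(a, c))   # 0.0
print(dot(a, b))                 # 2.0
print(euclidean_distance(a, b))  # 1.0
```

Cosine treats a and b as identical because they point the same way, while the dot product and Euclidean distance both react to the difference in length. If your embeddings are not normalized, that difference may matter when picking a metric.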
Upserting Vectors
Prepare Vectors: Create a list of vectors to upsert, each with a unique ID and exactly as many values as the index dimension.
Upsert Vectors:
Python
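# The LangChain wrapper has no method for precomputed vectors (its add_texts embeds raw text),
# so upsert through the Pinecone SDK index as (id, values, metadata) tuples
index = pinecone.Index(index_name)
vectors = [
    # Values are truncated here; supply the full vector for each entry.
    # The wrapper reads each document's text from metadata["text"] by default.
    ("doc1", [0.1, 0.2, ...], {"id": "doc1", "text": "..."}),
    ("doc2", [0.3, 0.4, ...], {"id": "doc2", "text": "..."})
]
index.upsert(vectors=vectors)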
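Pinecone rejects vectors whose length differs from the index dimension, and large uploads are usually sent in smaller batches. The sketch below checks dimensions and groups vectors before any upsert call. The helper name validate_and_batch and the default batch size of 100 are assumptions for illustration; check the current Pinecone limits for your plan.

```python
def validate_and_batch(vectors, dimension, batch_size=100):
    """Check each (id, values) pair and yield lists of at most batch_size items."""
    batch = []
    for vector_id, values in vectors:
        if len(values) != dimension:
            raise ValueError(
                f"{vector_id}: expected {dimension} values, got {len(values)}"
            )
        batch.append((vector_id, values))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly smaller, batch.
        yield batch


# Made-up sample data: five 3-dimensional vectors, batched two at a time.
sample_vectors = [(f"doc{i}", [0.0, 0.0, 0.0]) for i in range(5)]
batches = list(validate_and_batch(sample_vectors, dimension=3, batch_size=2))
print([len(b) for b in batches])  # [2, 2, 1]
```

Each yielded batch could then be passed to one upsert call, so a malformed vector is caught locally before anything is sent over the network.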
Performing Similarity Search
Create a Query Embedding:
Python
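query_text = "What is the capital of France?"
# Assumes embedding_model is a LangChain Embeddings instance;
# embed_query returns a single vector as a list of floats
query_embedding = embedding_model.embed_query(query_text)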
Search for Similar Items:
Python
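# Search by vector and get (Document, score) pairs back.
# If your LangChain version lacks this method, call
# collection.similarity_search_with_score(query_text, k=5) instead.
results = collection.similarity_search_by_vector_with_score(
    query_embedding,
    k=5
)
Accessing Results
The results are a list of (document, score) pairs, where each document carries its text and metadata.
Python
for doc, score in results:
    print(doc.metadata["id"])  # present only if "id" was stored in the vector's metadata
    print(doc.page_content)  # Document objects carry text, not the stored embedding
    print(score)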
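Search results often need a cutoff so that weak matches are dropped. Here is a minimal sketch of filtering (document, score) pairs by a minimum score. FakeDocument is a stand-in so the example runs without LangChain or a network connection, and the sample texts, scores, and the filter_by_score helper are all made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FakeDocument:
    """Stand-in for a LangChain Document so the sketch runs offline."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def filter_by_score(results, min_score):
    """Keep (document, score) pairs at or above min_score, best first."""
    kept = [(doc, score) for doc, score in results if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical search output, not real query results.
sample_results = [
    (FakeDocument("Berlin is the capital of Germany.", {"id": "doc2"}), 0.42),
    (FakeDocument("Paris is the capital of France.", {"id": "doc1"}), 0.91),
]
filtered = filter_by_score(sample_results, min_score=0.5)
for doc, score in filtered:
    print(doc.metadata["id"], score)  # doc1 0.91
```

This assumes a metric where a higher score means a closer match, as with cosine. For a Euclidean index, smaller values are closer, so the comparison and sort order would need to be reversed.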
LangChain’s Pinecone wrapper, combined with the Pinecone Python SDK, covers the workflow shown in this guide: creating an index, upserting vectors, and querying for the most similar items.