February 8, 2024 - 15 Min Read

How to build a question answering system in Python with a vector index and OpenAI

Building a Q&A engine is easy with Momento Vector Index and OpenAI APIs
by Pratik Agarwal
AI/ML

Note: We're presenting this tutorial with code examples in both Python and TypeScript (Node.js). This is the Python tutorial. Find the TypeScript version here!

In this step-by-step guide, we delve into building a question answering system from scratch, focusing on a specific topic: carrots. Central to our exploration is the concept of treating question answering as a retrieval process. This approach involves identifying source documents or specific sections within them that contain the answers to users' queries. By revealing the underlying process without the complexities introduced by external libraries, we aim to provide valuable insights into the fundamental workings of such systems.

Here's a quick overview of how we will get this done:

  • Initialize OpenAI and Momento clients.
  • Fetch and process (create chunks) carrot data from Wikipedia.
  • Generate embeddings for the text using OpenAI.
  • Store the embeddings in Momento Vector Index.
  • Search and respond to queries using the stored data.
  • Utilize OpenAI's chat completions for refined responses.

We also have a Google Colab set up for this blog, where you can execute queries while you're reading the blog!

Environment Setup

Before we start coding, we need to create an index in Momento to store our data and generate an API key to access Momento programmatically. You can do both in the Momento Console; follow this guide for details! The code below uses mvi-openai-demo as the index name, 1536 as the number of dimensions (more on this soon!), and cosine similarity as the similarity metric. Cosine similarity compares the orientation of vectors rather than their magnitude, which makes it well suited to a question answering system.
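To make the "orientation, not magnitude" point concrete, here is a minimal pure-Python sketch. The function name `cosine_similarity` and the toy vectors are ours for illustration only; they are not part of the Momento or OpenAI SDKs, and in the tutorial the index computes the metric for us.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy vectors (assumed values, purely illustrative)
base = [1.0, 2.0, 3.0]
scaled = [10.0, 20.0, 30.0]   # same direction, ten times the magnitude
different = [3.0, -1.0, 0.0]  # points in a different direction

print(cosine_similarity(base, scaled))     # very close to 1.0
print(cosine_similarity(base, different))  # much lower
```

Scaling a vector leaves its score against `base` essentially unchanged, while changing its direction lowers the score.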

We also need an OpenAI API key to generate embeddings of our data and search queries. 

Next, we have to install the necessary packages. For Python, you'll need openai, requests, and momento.


pip install momento openai requests

Step 1: Initializing clients

We begin by initializing our OpenAI and Momento clients. Here, we set up our development environment with the necessary packages and API keys. This step is crucial for establishing communication with OpenAI and Momento services. It lays the foundation for our Q&A engine.

Make sure you have the environment variables 'OPENAI_API_KEY' and 'MOMENTO_API_KEY' set before you run the code!
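If you'd like a clearer failure than a bare KeyError when a key is missing, a small guard can help. This is an optional sketch; `require_env` is a hypothetical helper name of ours, not something the tutorial code or either SDK provides.

```python
import os

def require_env(name: str) -> str:
    """Return the value of an environment variable, or raise a descriptive error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Environment variable {name} is not set")
    return value
```

For example, `openai.api_key = require_env("OPENAI_API_KEY")` could stand in for the direct `os.environ[...]` lookup in the code below.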


import openai
import requests
from momento import CredentialProvider, PreviewVectorIndexClient, VectorIndexConfigurations
from momento.config import VectorIndexConfiguration
from momento.requests.vector_index import Item
from momento.responses.vector_index import Search, UpsertItemBatch
import os

# Setting up the API keys and clients
openai.api_key = os.environ['OPENAI_API_KEY']
VECTOR_INDEX_CONFIGURATION: VectorIndexConfiguration = VectorIndexConfigurations.Default.latest()
VECTOR_AUTH_PROVIDER = CredentialProvider.from_environment_variable('MOMENTO_API_KEY')
mvi_client = PreviewVectorIndexClient(VECTOR_INDEX_CONFIGURATION, VECTOR_AUTH_PROVIDER)
index_name = 'mvi-openai-demo'

Step 2: Loading data from Wikipedia

We start by extracting data about carrots from Wikipedia. This step demonstrates how to handle external API calls and parse JSON responses. Go ahead and try this out locally for any Wikipedia page!


def get_wikipedia_extract(url: str) -> str:
    response = requests.get(url)
    data = response.json()
    pages = data['query']['pages']
    extract = next(iter(pages.values()))['extract']
    return extract

url = "https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Carrot&prop=extracts&explaintext"
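The function above assumes the request succeeds and the page exists. If you try other titles, a slightly more defensive parser may save some debugging. The sketch below works on the already-decoded JSON; `extract_from_payload` is a hypothetical helper of ours, and it relies on the MediaWiki behavior of marking unknown titles with a `missing` key.

```python
def extract_from_payload(data: dict) -> str:
    """Pull the plain-text extract out of a decoded MediaWiki `action=query` response."""
    pages = data.get("query", {}).get("pages", {})
    for page in pages.values():
        # Unknown titles come back as a page entry flagged "missing"
        if "missing" in page:
            raise ValueError(f"Wikipedia has no page titled {page.get('title')!r}")
        extract = page.get("extract")
        if extract:
            return extract
    raise ValueError("Response did not contain a text extract")
```

Inside `get_wikipedia_extract`, you could call `response.raise_for_status()` and then return `extract_from_payload(response.json())`.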

Now let’s run these snippets and view the length of our Carrot Wikipedia page along with a sample of its text:


extract_text = get_wikipedia_extract(url)
print('Total characters in carrot Wikipedia page: ' + str(len(extract_text)))
print('Sample text in carrot Wikipedia page:\n\n ' + extract_text[0:500])

Output:


Total characters in carrot Wikipedia page: 21534

The carrot (Daucus carota subsp. sativus) is a root vegetable, typically orange in color, though heirloom variants including purple, black, red, white, and yellow cultivars exist, all of which are domesticated forms of the wild carrot, Daucus carota, native to Europe and Southwestern Asia. The plant probably originated in Persia and was originally cultivated for its leaves and seeds. The most commonly eaten part of the plant is the taproot, although the stems and leaves are also eaten.


Step 3: Preprocessing data to create chunks

In building our Q&A engine, we approach question answering as a kind of retrieval: identifying which source documents (or parts of them) contain the answers to a user's query. This concept is fundamental to our process and influences how we handle our data.

To make our system effective, we preprocess the data into chunks. This is because, in a question-answering context, answers often reside in specific sections of a document rather than across the entire text. By splitting the data into manageable chunks, we're effectively creating smaller, searchable units that our system can scan to find relevant answers. This chunking process is a crucial step in transforming extensive text into a format conducive to semantic search and retrieval.

We've opted for a straightforward approach: splitting our text by character count. However, it's crucial to understand that the size and method of chunking can significantly impact the system's effectiveness. Chunks that are too large might dilute the relevance of search results, while chunks that are too small may miss critical context.

Alternative chunking methods may use tokenizers, such as tiktoken, to split the text along boundaries that align with the text embedding model. These methods may produce better results but require external libraries. For this demonstration, we opt for a simpler method.


def split_text_into_chunks(text: str, chunk_size: int = 600) -> list[str]:
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

chunks = split_text_into_chunks(extract_text)

Now we can view the total number of chunks that were created:


print('Total number of chunks created: ' + str(len(chunks)))
print('Total characters in each chunk: ' + str(len(chunks[0])))

Output:


Total number of chunks created: 36
Total characters in each chunk: 600
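One common way to reduce the risk of cutting an answer in half at a chunk boundary is to let neighboring chunks overlap. The sketch below is a variation on our splitter, not what the rest of this tutorial uses; the function name and the default `overlap` of 100 characters are assumptions chosen for illustration.

```python
def split_text_with_overlap(text: str, chunk_size: int = 600, overlap: int = 100) -> list[str]:
    """Split text into fixed-size chunks where each chunk repeats the tail of the previous one."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    # Stop once the remaining text is already covered by the previous chunk
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + chunk_size] for i in range(0, last_start, step)]

print(split_text_with_overlap("abcdefghij", chunk_size=4, overlap=2))
```

Overlap increases the number of chunks, and therefore the number of embeddings to generate and store, so it is a trade-off rather than a free improvement.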

Step 4: Generating embeddings with OpenAI

In our approach to building a Q&A engine, we've chosen to leverage the power of vector search, a state-of-the-art technique in semantic search. This method differs significantly from traditional keyword search approaches, like those used in Elasticsearch or Lucene. Vector search delves deeper into the intricacies of language, capturing concepts and meanings in a way that keyword search can't.

To facilitate vector search, our first task is to transform our textual data into a format that embodies this richer semantic understanding. We achieve this by generating embeddings using OpenAI's text-embedding-ada-002 model. This model is known for striking a balance between accuracy, cost, and speed, making it an ideal choice for generating text embeddings.


def generate_embeddings(chunks: list[str]):
    response = openai.embeddings.create(input=chunks, model="text-embedding-ada-002")
    return response.data

Recall that we selected 1536 as the dimensionality for our vector index. This decision was based on the fact that OpenAI, when generating embeddings for each chunk, produces these embeddings as floating point vectors with a length of 1536.


embeddings_response = generate_embeddings(chunks)
print('Length of each embedding: ' + str(len(embeddings_response[0].embedding)))
print('Sample embedding: ' + str(embeddings_response[0].embedding[0:10]))

Output:


Length of each embedding: 1536

Sample embedding: 0.008307404,-0.03437371,0.00043777542,-0.01768263,-0.010926112,-0.0056728064,-0.0025742147,-0.023453956,-0.021114917,-0.020148791
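Because the index was created with a fixed number of dimensions, it can be worth checking vector lengths before upserting. This is an optional sketch; `check_dimensions` and `EXPECTED_DIMENSIONS` are hypothetical names of ours, and 1536 simply mirrors the value we chose when creating the index.

```python
EXPECTED_DIMENSIONS = 1536  # assumed to match the index created in the console

def check_dimensions(vectors: list[list[float]], expected: int = EXPECTED_DIMENSIONS) -> None:
    """Raise if any vector's length differs from the index dimensionality."""
    for position, vector in enumerate(vectors):
        if len(vector) != expected:
            raise ValueError(
                f"Vector {position} has {len(vector)} dimensions, expected {expected}"
            )
```

With the response from above, this could be called as `check_dimensions([item.embedding for item in embeddings_response])`.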

Step 5: Storing data in Momento Vector Index

After generating embeddings, we store them in Momento's Vector Index. This involves creating items with IDs, vectors, and metadata, then upserting them to MVI. When storing data in the Momento Vector Index, it's important to use deterministic chunk IDs. This ensures that the same data isn't re-indexed repeatedly, which helps storage, retrieval efficiency, and response accuracy. Managing data storage effectively is key to maintaining a scalable and responsive Q&A system.


def upsert_to_mvi(embeddings: list, chunks: list[str]):
    # Keep the original text as metadata so search results can return it
    metadatas = [{"text": chunk} for chunk in chunks]
    # Deterministic, position-based IDs: re-running overwrites items instead of duplicating them
    ids = [f"chunk{i + 1}" for i in range(len(embeddings))]
    items = [
        Item(id=item_id, vector=embedding.embedding, metadata=metadata)
        for item_id, embedding, metadata in zip(ids, embeddings, metadatas)
    ]
    response = mvi_client.upsert_item_batch(index_name, items)
    if isinstance(response, UpsertItemBatch.Success):
        print("\n\nUpsert successful. Items have been stored.")
    elif isinstance(response, UpsertItemBatch.Error):
        print(response.message)
        raise response.inner_exception

upsert_to_mvi(embeddings_response, chunks)

Output:


Upsert successful. Items have been stored.
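The IDs in this tutorial are deterministic because they come from each chunk's position. Another deterministic option is to derive the ID from the chunk's content. The sketch below shows that idea; `content_based_chunk_id` and the `source` prefix are hypothetical and not part of the Momento SDK.

```python
import hashlib

def content_based_chunk_id(source: str, chunk: str) -> str:
    """Build a stable ID from a source label and a hash of the chunk text."""
    digest = hashlib.sha256(chunk.encode("utf-8")).hexdigest()
    # A 16-character prefix keeps IDs short; use the full digest if collisions are a concern
    return f"{source}-{digest[:16]}"

print(content_based_chunk_id("carrot", "The carrot is a root vegetable."))
```

The trade-off: if the source page changes, position-based IDs overwrite the old items, while content-based IDs would leave stale items behind unless you delete them.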

Step 6: Searching and responding to queries

This step highlights the core functionality of the Q&A engine: retrieving answers using Momento Vector Index. This process involves searching through the indexed data using text embeddings, a technique that helps us find the most relevant and contextually appropriate results.

When we indexed snippets of text in the previous steps, we first transformed these text snippets into vector representations using OpenAI's model. This transformation was key to preparing our data for efficient storage and retrieval in the Momento Vector Index.

Now, as we turn to the task of querying, it's crucial to apply a similar preprocessing step. The user's question, "What is a carrot?" in this instance, must also be converted into a vector. This enables us to perform a vector-to-vector search within our index.

The effectiveness of our search hinges on the consistency of preprocessing. The same embedding model and process used during indexing must be applied to the query. This ensures that the vector representation of the query aligns with the vectors stored in our index. If the two differ, the query and the stored vectors are not comparable and the search results are not meaningful.


def search_query(query_text: str) -> list[str]:
    query_vector = openai.embeddings.create(input=query_text, model="text-embedding-ada-002").data[0].embedding
    search_response = mvi_client.search(index_name, query_vector=query_vector, top_k=2, metadata_fields=["text"])
    if isinstance(search_response, Search.Success):
        return [hit.metadata['text'] for hit in search_response.hits]
    elif isinstance(search_response, Search.Error):
        print(f"Error while searching on index {index_name}: {search_response.message}")
        return []

Let’s start with a simple search for “What is a carrot?”:


query = "What is a carrot?"
texts = search_query(query)
if texts:
    print("\n=========================================\n")
    print("Embedding search results:\n\n" + "\n".join(texts))
    print("\n=========================================\n")

The output for this query looks like:


The carrot (Daucus carota subsp. sativus) is a root vegetable, typically orange in color, though heirloom variants including purple, black, red, white, and yellow cultivars exist, all of which are domesticated forms of the wild carrot, Daucus carota, native to Europe and Southwestern Asia. The plant probably originated in Persia and was originally cultivated for its leaves and seeds. The most commonly eaten part of the plant is the taproot, although the stems and leaves are also eaten. The domestic carrot has been selectively bred for its enlarged, more palatable, less woody-textured taproot.

The carrot is a biennial plant in the umbellifer family, Apiaceae. At birth, it grows a rosette of leaves while building up the enlarged taproot. Fast-growing cultivars mature within about three months (90 days) of sowing the seed, while slower-maturing cultivars need a month longer (120 days). The roots contain high quantities of alpha- and beta-carotene, lycopene, anthocyanins, lutein, and are a good source of vitamin A, vitamin K, and vitamin B6. Black carrots are one of the richest sources of anthocyanins (250-300 mg/100 g fresh root weight), and hence possesses high antioxidant ability

As you see, we indexed vectors in Momento Vector Index and stored the original text as metadata in the items. When asked the question “What is a carrot?”, we transformed the text into a vector, performed a vector search in MVI, and returned the original text stored in the metadata. Under the hood we did a vector-to-vector matching, yet from a user perspective it looks like a text-to-text search.
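To demystify the vector-to-vector matching, here is a toy, in-memory version of the same idea: score every stored vector against the query vector and return the text of the best matches. Everything in it (the function names, the three-dimensional vectors, the sample sentences) is made up for illustration, and a real vector index does not necessarily scan every item this way.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def brute_force_search(query_vector: list[float],
                       items: list[tuple[list[float], str]],
                       top_k: int = 2) -> list[str]:
    """Rank (vector, text) pairs by cosine similarity to the query and return the top texts."""
    scored = [(cosine(query_vector, vector), text) for vector, text in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:top_k]]

# Toy "index": tiny hand-written vectors standing in for real 1536-dimensional embeddings
toy_items = [
    ([0.9, 0.1, 0.0], "Carrots are root vegetables."),
    ([0.0, 0.2, 0.9], "Cricket is a bat-and-ball game."),
    ([0.8, 0.3, 0.1], "Carrot roots are rich in carotene."),
]

print(brute_force_search([1.0, 0.0, 0.0], toy_items))
```

The query vector points in nearly the same direction as the two carrot vectors, so their texts are returned and the unrelated sentence is left out.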

Step 7: Too verbose? Let’s use chat completions to enhance query responses

Until now, our approach has treated question answering primarily as a retrieval task. We've taken the user's query, performed a search, and presented snippets of information that could potentially contain the answer. This method, while effective in fetching relevant data, still places the onus on the user to sift through the results and extract the answer. It's akin to providing pages from a reference book without pinpointing the exact information sought.

To elevate the user experience from mere retrieval to direct answer generation, we introduce Large Language Models (LLMs) like OpenAI's GPT-3.5. LLMs have the ability to not just find but also synthesize information, offering concise and contextually relevant answers. This is a significant leap from delivering a page of search results to providing a clear, succinct response to the user's query.


def search_with_chat_completion(texts: list[str], query_text: str):
    text = "\n".join(texts)
    prompt = ("Given the following extracted parts about carrot, answer questions pertaining to"
              " carrot only from the provided text. If you don't know the answer, just say that "
              "you don't know. Don't try to make up an answer. Do not answer anything outside of the context given. "
              "Your job is to only answer about carrots, and only from the text below. If you don't know the answer, just "
              "say that you don't know. Here's the text:\n\n----------------\n\n")
    chat_response = openai.chat.completions.create(model="gpt-3.5-turbo", messages=[
        {"role": "system", "content": prompt + text},
        {"role": "user", "content": query_text}
    ])
    return chat_response.choices[0].message
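If you want to experiment with the prompt, it can help to separate building the messages from calling the API, so the prompt can be inspected or tested without a network call. The sketch below is one possible arrangement; `build_messages`, the numbered excerpts, and the shortened wording are our own illustrative choices, not the prompt used above.

```python
def build_messages(texts: list[str], query_text: str) -> list[dict[str, str]]:
    """Assemble chat messages: retrieved excerpts in the system prompt, the question as the user turn."""
    # Number the excerpts so they stay distinguishable inside the prompt
    context = "\n\n".join(f"[{n}] {text}" for n, text in enumerate(texts, start=1))
    system_prompt = (
        "Answer questions about carrots using only the numbered excerpts below. "
        "If the excerpts do not contain the answer, say that you don't know.\n\n"
        "----------------\n\n" + context
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": query_text},
    ]
```

The result can be passed as the `messages` argument of the same `openai.chat.completions.create` call shown above.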

And let’s use the same query “What is a carrot?” to compare the response.


chat_completion_resp = search_with_chat_completion(texts, query)
print("\n=========================================\n")
print("Chat completion search results:\n\n" + chat_completion_resp.content)
print("\n=========================================\n")

Output:


A carrot is a root vegetable that is typically orange in color, although there are also other colored variants such as purple, black, red, white, and yellow. 

Now let’s quickly compare the outputs of a more specific question such as "How fast do fast-growing cultivators mature in carrots?"


query = "how fast do fast-growing cultivators mature in carrots?"
texts = search_query(query)
if texts:
    print("\n=========================================\n")
    print("Embedding search results:\n\n" + texts[0])
    print("\n=========================================\n")

    chat_completion_resp = search_with_chat_completion(texts, query)
    print("\n=========================================\n")
    print("Chat completion search results:\n\n" + chat_completion_resp.content)
    print("\n=========================================\n")

Output:

Notice how brief and precise the chat completion response is compared to the raw semantic search results.


=========================================
Embedding search results:

The carrot is a biennial plant in the umbellifer family, Apiaceae. At birth, it grows a rosette of leaves while building up the enlarged taproot. Fast-growing cultivars mature within about three months (90 days) of sowing the seed, while slower-maturing cultivars need a month longer (120 days). The roots contain high quantities of alpha- and beta-carotene, lycopene, anthocyanins, lutein, and are a good source of vitamin A, vitamin K, and vitamin B6. Black carrots are one of the richest sources of anthocyanins (250-300 mg/100 g fresh root weight), and hence possesses high antioxidant ability.

=========================================


=========================================
Chat completion search results:

Fast-growing cultivars of carrots mature within about three months (90 days) of sowing the seed.

=========================================

Conclusion

In this guide, we built a question answering system from the ground up. The key idea behind our approach was to treat question answering as a retrieval problem. By using text embeddings and vector search, we get semantically rich search that captures meaning beyond what traditional keyword matching offers. Let's briefly recap the steps we took to get here:

  • Initializing Clients: Set up OpenAI and Momento clients, laying the groundwork for our system.
  • Fetching and Processing Data: Extracted and processed data from Wikipedia, preparing it for embedding generation. We learnt about the significance of creating chunks of data for efficient retrieval.
  • Generating Embeddings: Utilized OpenAI's text-embedding-ada-002 model to generate text embeddings, converting our corpus into a format suitable for semantic search. We learnt how the length of these embeddings determines the number of dimensions of a vector index.
  • Storing in MVI: Stored these embeddings in Momento's Vector Index, ensuring efficient retrieval. We learnt why deterministic item IDs matter: random IDs, such as UUIDs, would cause the same data to be re-indexed on every run.
  • Searching and Responding to Queries: Implemented a search functionality that leverages vector indexing for semantic search to find the most relevant responses. We performed a vector-to-vector search and used the text stored in the metadata of our items to display results to the user.
  • Enhancing Responses with Chat Completions: Added a layer of refinement using OpenAI's chat completions to generate concise answers. Here we saw a Large Language Model turn the retrieved passages into a brief, direct response that is contextually relevant, coherent, and presented in a user-friendly format.

Finally, while our hands-on approach offers a deep dive into the mechanics of building a Q&A engine, we recognize the complexity involved in such an endeavor. Frameworks like LangChain hide much of this complexity behind higher-level abstractions that simplify chaining embeddings from OpenAI or swapping out the vector store. LangChain is a popular choice among developers for building, modifying, and maintaining complex AI-driven applications.

Author
Pratik Agarwal

Pratik is a software engineer at Momento, specializing in distributed systems. With a rich background spanning roles at prominent teams like AWS DynamoDB and Marketplace, he has honed his expertise across the backend stack. Now at Momento, Pratik is on a mission to elevate the developer experience, rooted in his conviction that it's a cornerstone of serverless computing. Beyond the code, he is passionate about kickboxing and cricket, and loves delving into the strategic nuances of poker, always seeking the upper hand at the table.
