Note: We're presenting this tutorial with code examples in both TypeScript (Node.js) and Python. This is the Node.js tutorial. Find the Python version here!
In this step-by-step guide, we delve into building a question answering system from scratch, focusing on a specific topic: carrots. Central to our exploration is the concept of treating question answering as a retrieval process. This approach involves identifying source documents or specific sections within them that contain the answers to users' queries. By revealing the underlying process without the complexities introduced by external libraries, we aim to provide valuable insights into the fundamental workings of such systems.
Here's a quick overview of how we will get this done:
- Initialize OpenAI and Momento clients.
- Fetch carrot data from Wikipedia and process it into chunks.
- Generate embeddings for the text using OpenAI.
- Store the embeddings in Momento Vector Index.
- Search and respond to queries using the stored data.
- Utilize OpenAI's chat completions for refined responses.
Before we start coding, we need to create our index in Momento for storing data, and generate an API key to access Momento programmatically. You can do both on Momento Console and follow this guide for details! The code below uses mvi-openai-demo as the index name, 1536 for the number of dimensions (more on this soon!), and cosine similarity as the similarity metric. Cosine similarity cares more about the orientation of vectors than their magnitude (the word count, in this case), which is suitable for a question answering system.
We also need an OpenAI API key to generate embeddings of our data and search queries.
Next, we have to install the necessary packages. For TypeScript, we use @gomomento/sdk and openai.
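If you're starting from a fresh project, a typical install looks like this (npm shown here; yarn or pnpm work just as well):

```shell
npm install @gomomento/sdk openai
```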
Step 1: Initializing clients
We begin by initializing our OpenAI and Momento clients. Here, we set up our development environment with the necessary packages and API keys. This step is crucial for establishing communication with OpenAI and Momento services. It lays the foundation for our Q&A engine.
Make sure you have the environment variables 'OPENAI_API_KEY' and 'MOMENTO_API_KEY' set before you run the code!
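As a sketch, client initialization might look like the following. The class names (PreviewVectorIndexClient, VectorIndexConfigurations, CredentialProvider) come from the @gomomento/sdk package at the time of writing, and the OpenAI client comes from the openai v4 package; check the current SDK docs if these have moved.

```typescript
import OpenAI from 'openai';
import {
  CredentialProvider,
  PreviewVectorIndexClient,
  VectorIndexConfigurations,
} from '@gomomento/sdk';

// The OpenAI client reads OPENAI_API_KEY from the environment by default.
const openai = new OpenAI();

// The Momento Vector Index client reads its key from MOMENTO_API_KEY.
const mvi = new PreviewVectorIndexClient({
  configuration: VectorIndexConfigurations.Laptop.latest(),
  credentialProvider: CredentialProvider.fromEnvironmentVariable({
    environmentVariableName: 'MOMENTO_API_KEY',
  }),
});
```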
Step 2: Loading data from Wikipedia
We start by extracting data about carrots from Wikipedia. This step demonstrates how to handle external API calls and parse JSON responses. Go ahead and try this out locally for any Wikipedia page!
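One way to do this is to call the MediaWiki API and request the page's plain-text extract. The helper names below (wikipediaApiUrl, fetchWikipediaExtract) are our own; the query parameters are standard MediaWiki API options:

```typescript
// Build the MediaWiki API URL that returns a page's plain-text extract.
function wikipediaApiUrl(title: string): string {
  const params = new URLSearchParams({
    action: 'query',
    prop: 'extracts',
    explaintext: 'true',
    format: 'json',
    titles: title,
  });
  return `https://en.wikipedia.org/w/api.php?${params.toString()}`;
}

// Fetch the page and pull the extract text out of the JSON response.
// The response nests pages under query.pages, keyed by page ID.
async function fetchWikipediaExtract(title: string): Promise<string> {
  const response = await fetch(wikipediaApiUrl(title));
  const json = await response.json();
  const page = Object.values(json.query.pages)[0] as { extract: string };
  return page.extract;
}

// Usage:
// const text = await fetchWikipediaExtract('Carrot');
// console.log(`Page length: ${text.length}`);
// console.log(text.slice(0, 200));
```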
Now let’s run these snippets and view the length of our Carrot Wikipedia page along with a sample of the text.
Step 3: Preprocessing data to create chunks
In building our Q&A engine, we approach question answering as a kind of retrieval: identifying which source documents (or parts of them) contain the answers to a user's query. This concept is fundamental to our process and influences how we handle our data.
To make our system effective, we preprocess the data into chunks. This is because, in a question-answering context, answers often reside in specific sections of a document rather than across the entire text. By splitting the data into manageable chunks, we're effectively creating smaller, searchable units that our system can scan to find relevant answers. This chunking process is a crucial step in transforming extensive text into a format conducive to semantic search and retrieval.
We've opted for a straightforward approach to split our text by character count. However, it's crucial to understand that the size and method of chunking can significantly impact the system's effectiveness. Chunks that are too large might dilute the relevance of search results, while chunks that are too small may miss critical context.
Alternative chunking methods may use tokenizers such as tiktoken to split the text along boundaries that align with the text embedding model. These methods may produce better results, but require external libraries. For demonstration we opt for a simpler method.
Now we can view the total number of chunks that were created:
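A minimal character-count chunker can be as simple as the following sketch (the 600-character chunk size is an arbitrary choice for this demo, not a recommendation):

```typescript
// Split the text into fixed-size character windows.
// The final chunk may be shorter than chunkSize.
const CHUNK_SIZE = 600;

function chunkText(text: string, chunkSize: number = CHUNK_SIZE): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < text.length; i += chunkSize) {
    chunks.push(text.slice(i, i + chunkSize));
  }
  return chunks;
}

// Usage:
// const chunks = chunkText(wikipediaText);
// console.log(`Created ${chunks.length} chunks`);
```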
Step 4: Generating embeddings with OpenAI
In our approach to building a Q&A engine, we've chosen to leverage the power of vector search, a state-of-the-art technique in semantic search. This method differs significantly from traditional keyword search approaches, like those used in Elasticsearch or Lucene. Vector search delves deeper into the intricacies of language, capturing concepts and meanings in a way that keyword search can't.
To facilitate vector search, our first task is to transform our textual data into a format that embodies this richer semantic understanding. We achieve this by generating embeddings using OpenAI's text-embedding-ada-002 model. This model is known for striking a balance between accuracy, cost, and speed, making it an ideal choice for generating text embeddings.
Recall that we selected 1536 as the dimensionality for our vector index. This decision follows directly from the model: when OpenAI generates an embedding for a chunk, it produces a floating-point vector of length 1536.
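Generating the embeddings might look like this sketch, using the openai v4 client. The embeddings endpoint accepts a batch of inputs, so we can embed several chunks per request; embedChunks is our own helper name.

```typescript
import OpenAI from 'openai';

// text-embedding-ada-002 returns one 1536-dimensional vector per input.
async function embedChunks(chunks: string[]): Promise<number[][]> {
  const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
  const response = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input: chunks,
  });
  return response.data.map((d) => d.embedding);
}

// Usage:
// const embeddings = await embedChunks(chunks);
// console.log(embeddings[0].length); // 1536
```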
Step 5: Storing data in Momento Vector Index
After generating embeddings, we store them in Momento's Vector Index. This involves creating items with IDs, vectors, and metadata, then upserting them to MVI. When storing data in the Momento Vector Index, it's important to use deterministic chunk IDs. This ensures that the same data isn't re-indexed repeatedly, optimizing storage, retrieval efficiency, and response accuracy. Managing data storage effectively is key to maintaining a scalable and responsive Q&A system.
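One simple way to get deterministic IDs is to hash the chunk text itself: re-running the pipeline then upserts the same IDs instead of creating duplicates. The chunkId helper below is our own; the commented upsert is a sketch against the @gomomento/sdk upsertItemBatch call, assuming the client and arrays from the previous steps.

```typescript
import { createHash } from 'node:crypto';

// Deterministic ID: the same chunk text always hashes to the same ID,
// so re-indexing the same data overwrites rather than duplicates.
function chunkId(text: string): string {
  return createHash('sha256').update(text).digest('hex');
}

// Sketch of the upsert (mvi, chunks, and embeddings come from earlier steps):
//
// await mvi.upsertItemBatch('mvi-openai-demo', chunks.map((text, i) => ({
//   id: chunkId(text),
//   vector: embeddings[i],
//   metadata: { text },
// })));
```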
Step 6: Searching and responding to queries
This step highlights the core functionality of the Q&A engine: retrieving answers using Momento Vector Index. This process involves searching through the indexed data using text embeddings, a technique that ensures we find the most relevant and contextually appropriate results.
When we indexed snippets of text in the previous steps, we first transformed these text snippets into vector representations using OpenAI's model. This transformation was key to preparing our data for efficient storage and retrieval in the Momento Vector Index.
Now, as we turn to the task of querying, it's crucial to apply a similar preprocessing step. The user's question, "What is a carrot?" in this instance, must also be converted into a vector. This enables us to perform a vector-to-vector search within our index.
The effectiveness of our search hinges on the consistency of preprocessing. The same embedding model and process used during indexing must be applied to the query. This ensures that the vector representation of the query aligns with the vectors stored in our index, otherwise the approach would not work.
Let’s start with a simple search for “What is a carrot?”:
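A sketch of that search is below. The topK and metadataFields option names match the @gomomento/sdk at the time of writing; double-check them against the current docs. Note that the query is embedded with the same model we used at indexing time.

```typescript
import OpenAI from 'openai';
import {
  CredentialProvider,
  PreviewVectorIndexClient,
  VectorIndexConfigurations,
} from '@gomomento/sdk';

async function searchCarrots(query: string): Promise<void> {
  const openai = new OpenAI();
  const mvi = new PreviewVectorIndexClient({
    configuration: VectorIndexConfigurations.Laptop.latest(),
    credentialProvider: CredentialProvider.fromEnvironmentVariable({
      environmentVariableName: 'MOMENTO_API_KEY',
    }),
  });

  // Embed the query with the same model used to index the chunks.
  const embedding = await openai.embeddings.create({
    model: 'text-embedding-ada-002',
    input: query,
  });

  // Ask for the two closest chunks, returning the original text metadata.
  const response = await mvi.search('mvi-openai-demo', embedding.data[0].embedding, {
    topK: 2,
    metadataFields: ['text'],
  });
  console.log(JSON.stringify(response, null, 2));
}

// searchCarrots('What is a carrot?');
```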
The output for this query looks like:
As you see, we indexed vectors in Momento Vector Index and stored the original text as metadata in the items. When asked the question “What is a carrot?”, we transformed the text into a vector, performed a vector search in MVI, and returned the original text stored in the metadata. Under the hood we did a vector-to-vector matching, yet from a user perspective it looks like a text-to-text search.
Step 7: Too verbose? Let’s use chat completions to enhance query responses
Until now, our approach has treated question answering primarily as a retrieval task. We've taken the user's query, performed a search, and presented snippets of information that could potentially contain the answer. This method, while effective in fetching relevant data, still places the onus on the user to sift through the results and extract the answer. It's akin to providing pages from a reference book without pinpointing the exact information sought.
To elevate the user experience from mere retrieval to direct answer generation, we introduce Large Language Models (LLMs) like OpenAI's GPT-3.5. LLMs have the ability to not just find but also synthesize information, offering concise and contextually relevant answers. This is a significant leap from delivering a page of search results to providing a clear, succinct response to the user's query.
And let’s use the same query “What is a carrot?” to compare the response.
Now let’s quickly compare the outputs for a more specific question such as "How fast do fast-growing cultivars mature in carrots?"
Notice how brief and precise the chat completion response is compared to the raw semantic search results.
In this guide, we embarked on a journey to build a question answering system from the ground up. The key idea behind our approach was to treat question answering as a retrieval problem. By using text embeddings and vector search, we've brought in state-of-the-art, semantically rich search, surpassing traditional keyword-based approaches. Let's briefly recap the steps we took to get here:
- Initializing Clients: Set up OpenAI and Momento clients, laying the groundwork for our system.
- Fetching and Processing Data: Extracted and processed data from Wikipedia, preparing it for embedding generation. We learnt about the significance of creating chunks of data for efficient retrieval.
- Generating Embeddings: Utilized OpenAI's text-embedding-ada-002 model to generate text embeddings, converting our corpus into a format suitable for semantic search. We learnt how the length of these embeddings determines the number of dimensions of a vector index.
- Storing in MVI: Stored these embeddings in Momento's Vector Index, ensuring efficient retrieval. We learnt about a common pitfall of using UUID as an index’s item ID, which results in repeated re-indexing of the same data.
- Searching and Responding to Queries: Implemented a search functionality that leverages vector indexing for semantic search to find the most relevant responses. We perform a vector-to-vector search, and use the text stored in the metadata of our items to display to the user.
- Enhancing Responses with Chat Completions: Added a layer of refinement using OpenAI's chat completions to generate concise and accurate answers. Here we witnessed that Large Language Models not only improve the accuracy of the responses but also ensure they are contextually relevant, coherent, and presented in a user-friendly format.
Finally, while our hands-on approach offers a deep dive into the mechanics of building a Q&A engine, we recognize the complexity involved in such an endeavor. Frameworks like Langchain abstract much of this complexity, providing a higher-level abstraction that simplifies the process of chaining embeddings from OpenAI or swapping out the vector store. Langchain is a tool of choice for many developers, making it easier to build, modify, and maintain complex AI-driven applications.