Agent Knowledge Base
This guide provides detailed information about the Agent Knowledge Base. It covers the process of saving and processing documents, as well as retrieving relevant context for the AI. You will learn how documents and links are handled when creating or editing an agent, how the system processes and stores this information, and how it retrieves the most relevant context to answer user queries.
Creating/Editing Agent
When creating or editing an agent, after the user clicks on the 'Create' or 'Save' button, the documents and links inside the knowledge base that the user added go through the following process:
Documents
- For documents, the 'processDocuments' function is called, which extracts text using various libraries. The following document formats are permitted: PDF, DOCX, XLSX, XLS, and TXT. For each document, we get one large string containing the entire content of that document.
- After obtaining the large string of text, it is split into smaller chunks. The string is first turned into an array of words, and words are appended to the current chunk one at a time. When adding the next word would push the chunk past 1000 characters, the chunk is locked and added to an array of chunks. A new chunk is then started with that word, and the process repeats until no words remain.
- Then, for each chunk, an embedding is created using the OpenAI API and the model 'text-embedding-ada-002'. An embedding is a numerical representation of text that captures its semantic meaning for similarity comparison. In simpler terms, the 'generateEmbedding' function turns text into a list of numbers, helping the computer understand and find similar texts.
- Then comes the final step of saving the information. The text chunks, together with their embeddings and other important pieces of information, are saved into the Typesense database. Typesense is a fast, open-source search engine designed to quickly find relevant information. It supports typo tolerance, faceted search, and can handle text chunks and embeddings.
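The chunking step described above can be sketched roughly as follows. This is a minimal TypeScript illustration under stated assumptions, not the actual implementation: the function name `splitIntoChunks`, splitting on whitespace, joining words with a single space, and letting a single word longer than the limit form its own chunk are all assumptions made for the example.

```typescript
// Hypothetical helper: the name and the whitespace-based word split are assumptions.
function splitIntoChunks(text: string, maxChars: number = 1000): string[] {
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  let current = "";

  for (const word of words) {
    // What the chunk would look like if this word were appended.
    const candidate = current.length === 0 ? word : current + " " + word;
    if (candidate.length > maxChars && current.length > 0) {
      // The word does not fit: lock the current chunk and start a new one with this word.
      chunks.push(current);
      current = word;
    } else {
      current = candidate;
    }
  }

  // Keep the last, partially filled chunk.
  if (current.length > 0) chunks.push(current);
  return chunks;
}
```

Splitting on word boundaries keeps words intact, at the cost of chunks that are usually a little shorter than the limit.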
Links
- For links, the process is a little bit different. The user provides a link, and then using a service called Jina.ai, the text is scraped from the website and converted into markdown text. The same process is then applied: the entire string of markdown text is split into chunks, embeddings are generated for the chunks, and then the chunks and their embeddings are inserted into the Typesense database.
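The link flow can be outlined as below. This is only a sketch: `processLink`, `StoredChunk`, and `LinkPipelineDeps` are hypothetical names, and the scraper, chunker, embedder, and store are passed in as functions so that the example does not assume any particular Jina.ai, OpenAI, or Typesense client API.

```typescript
// All names here are hypothetical and exist only for illustration.
interface StoredChunk {
  source: string; // the link the text came from
  text: string;
  embedding: number[];
}

interface LinkPipelineDeps {
  scrapeToMarkdown: (url: string) => Promise<string>; // stands in for the Jina.ai call
  splitIntoChunks: (text: string) => string[];
  generateEmbedding: (text: string) => Promise<number[]>; // stands in for the OpenAI call
  saveChunks: (chunks: StoredChunk[]) => Promise<void>; // stands in for the Typesense insert
}

// Scrape -> chunk -> embed -> save, returning how many chunks were stored.
async function processLink(url: string, deps: LinkPipelineDeps): Promise<number> {
  const markdown = await deps.scrapeToMarkdown(url);
  const pieces = deps.splitIntoChunks(markdown);

  const records: StoredChunk[] = [];
  for (const text of pieces) {
    const embedding = await deps.generateEmbedding(text);
    records.push({ source: url, text, embedding });
  }

  await deps.saveChunks(records);
  return records.length;
}
```

Apart from the first step, this is the same sequence that documents go through, which is why only the text source differs between the two flows.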
Chatting with the Agent
- When chatting with the agent, you can provide documents just as you do during the creation or editing phase. The uploaded documents go through the same processing steps.
- You can also provide the agent with an image and ask questions about the image you provided.
Extracting relevant information from the data in Typesense
When extracting relevant information from the data in Typesense, the process involves the following steps:
- A function called 'AI_GET_CONTEXT' is called. It first creates an embedding from the message the user sent to the agent, using the OpenAI API and the model 'text-embedding-ada-002'. As described above, an embedding is a numerical representation of text that captures its semantic meaning, so it can be used for similarity comparison.
- After the embedding is created for the question, a function called 'searchSimilarDocuments' is called. This function finds the best context for the agent by comparing the embedding of the question with the embeddings of the stored chunks. It retrieves chunks with a vector distance less than or equal to 0.2, sorted by similarity. All relevant chunks are combined into one string, and this context is returned.
- If no relevant chunks are found, then all documents, either from the agent or from the chat (including both documents and links), are retrieved. This step returns a dictionary where the keys are filenames and the values are the text from the documents.
- For each of these documents, the OpenAI model 'gpt-4o-mini' is given the following prompt: 'Based on the following user query ${prompt}, try to construct an answer in a few sentences from the following document: ${document}.' In other words, the model is asked to find the answer to the user's question in each document separately. The results are then combined into one big string that contains 'In document ${filename}, I have found: ${result}' for each document, and this string is returned.
- This fallback is useful because, even when no chunk falls under the similarity threshold, some context is still obtained from the documents. It also makes it possible to answer questions like "What is in this document?", since the entire document is provided as context and the AI can base its answer on the information it contains.
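The two retrieval paths above can be sketched as follows. This is an illustrative outline rather than the real code: `ChunkHit`, `buildContextFromHits`, `buildFallbackContext`, and the `askModel` parameter are hypothetical names, the field names are not the actual Typesense schema, and joining pieces with a newline is an assumption. Only the 0.2 threshold and the two prompt templates come from the description.

```typescript
// Hypothetical shape of a search hit; field names are assumptions.
interface ChunkHit {
  text: string;
  vectorDistance: number; // smaller means more similar
}

const MAX_DISTANCE = 0.2; // threshold stated in the description

// Path 1: keep hits at or under the threshold, closest first, joined into one string.
// Returns null when nothing is close enough, which signals the fallback.
function buildContextFromHits(hits: ChunkHit[]): string | null {
  const relevant = hits
    .filter((h) => h.vectorDistance <= MAX_DISTANCE)
    .sort((a, b) => a.vectorDistance - b.vectorDistance);
  return relevant.length > 0 ? relevant.map((h) => h.text).join("\n") : null;
}

// Path 2: ask a model about each whole document and label each result by filename.
// `askModel` stands in for the 'gpt-4o-mini' call and is injected by the caller.
async function buildFallbackContext(
  prompt: string,
  documents: Record<string, string>,
  askModel: (instruction: string) => Promise<string>
): Promise<string> {
  const parts: string[] = [];
  for (const [filename, document] of Object.entries(documents)) {
    const instruction =
      `Based on the following user query ${prompt}, try to construct an answer ` +
      `in a few sentences from the following document: ${document}.`;
    const result = await askModel(instruction);
    parts.push(`In document ${filename}, I have found: ${result}`);
  }
  return parts.join("\n");
}
```

A caller would try `buildContextFromHits` first and only call `buildFallbackContext` when it returns null, since the fallback costs one model request per document.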
FAQ
Is there a different process for processing documents when the user adds them during agent creation/editing versus during a chat?
No, the process is the same. Whether the user adds the documents during the agent creation/editing phase or during a chat, the documents go through the same process: extracting text, splitting it into chunks, generating embeddings for the chunks, and inserting them into the Typesense database.
When is the context from the links refreshed?
When the user clicks on the refresh content button inside the knowledge base, all old entries are deleted, and the web is scraped anew. The same process as before is then repeated: scrape the content into markdown text, split the markdown text into chunks, generate embeddings for the chunks, and insert them into the Typesense database.
Where is the correct context inserted?
The context is appended to the end of the system message. The system message starts with the instructions the user provided, followed by: "Here is some relevant context for the user's question that I got from the documents." It then lists the results for each document: "In document ${document1}, I have found: ${answer based on document}."
After every iteration, meaning after every user question, the whole context is removed and created anew so that each answer is based on the most relevant context.
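As a rough illustration of this answer, the system message could be assembled like this. The function name `buildSystemMessage`, the blank-line separator, and the behavior when there is no context are assumptions for the sketch; only the introductory sentence is taken from the text.

```typescript
// Introductory sentence quoted in the answer above.
const CONTEXT_HEADER =
  "Here is some relevant context for the user's question that I got from the documents.";

// Hypothetical helper: always starts from the original instructions, so context
// from earlier questions is never carried over into the next system message.
function buildSystemMessage(instructions: string, context: string): string {
  // Assumption: with no context, the instructions are used unchanged.
  if (context.trim().length === 0) return instructions;
  return [instructions, CONTEXT_HEADER, context].join("\n\n");
}
```

Calling this once per user question, with freshly retrieved context each time, gives the "removed and created anew" behavior without any explicit cleanup step.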
What documents are searched for the best context?
The search covers the links and documents provided during agent creation or editing, as well as the documents provided inside the chat.
What does "Preparing your conversation, please wait..." mean?
It means that the system is preparing everything so that you can start the conversation.
What does "Extracting information from your documents, please wait..." mean?
It means that the documents are being processed: they are converted from their original format into text and then split into smaller chunks. Embeddings are created for these chunks, and the chunks are then inserted into the Typesense database. As this whole operation takes some time, the user is shown this informative message and a loading state.
What does "Finding the best context for your question, please wait..." mean?
When we are getting context, we search Typesense for the chunks whose embeddings have the smallest vector distance to the question, meaning the most similar ones. If none are found, we loop through all relevant documents and prompt the AI model for an answer based on the content of each document. After that, we return the relevant context. This process takes a little bit of time, which is why we provide the user with an informative message and a loading state.
Glossary
Jina.ai
Jina.ai is a service used to scrape text from websites and convert it into markdown text. It helps in extracting content from links provided by the user.
Typesense
Typesense is a fast, open-source search engine designed to quickly find relevant information. It supports typo tolerance, faceted search, and can handle text chunks and embeddings.
Chunk
A chunk is a smaller piece of text obtained by splitting a larger string of text. This is done to manage and process the text more efficiently.
Vectorization
Vectorization is the process of converting text into numerical vectors. These vectors represent the text in a way that can be used for various natural language processing tasks.
Embedding
An embedding is a numerical representation of text that captures its semantic meaning. It is used for tasks such as similarity comparison, where the meaning of the text needs to be understood by the computer.
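To make "similarity comparison" concrete, here is a small sketch of one common way to compare two embeddings, cosine distance. That the vector distance used in this system is a cosine distance is an assumption, and `cosineDistance` is a hypothetical helper, not a function from the codebase.

```typescript
// Cosine distance between two equal-length vectors:
// 0 means same direction, 1 means orthogonal, 2 means opposite direction.
function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length || a.length === 0) {
    throw new Error("Vectors must be non-empty and of equal length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine distance is undefined for a zero vector");
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Under this measure, two texts with similar meaning produce embeddings that point in nearly the same direction, so their distance is close to 0.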