
A Developer’s Guide to Getting Started with LlamaIndex


A Large Language Model represents a moment in history, forever frozen at the time it was built. An LLM that doesn’t include current or unreachable information is a problem that most serious applications need to address. Here, the term “unreachable” could mean private or domain-specific information. Without it, an LLM is more likely to hallucinate.

I’ve explained a bit about vector databases and mentioned Retrieval Augmented Generation (RAG) as a method to introduce novel data to an LLM without retraining it. LlamaIndex is a tool that focuses on the ‘R’ (for retrieval) to help enrich a prompt with your data.

Now before we continue, let’s just brutally summarise why we are doing what we are doing. A Generative Pre-trained Transformer (or GPT, closely associated with OpenAI’s brand) describes a cycle of input, transformation and output via matrix multiplications, where words (actually tokens of text, or sounds, or images) are converted into vectors with enough dimensions to hopefully express meaning. To make sure the context of the incoming text is computed, we pay attention to nearby words to move each vector closer to its contextual meaning (so, for example, a “black hole” is not just a dark hole) via more blocks of matrix multiplication.
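To make that concrete, here is a toy sketch of a single attention step, with made-up numbers and a tiny four-dimensional embedding. Real models use thousands of dimensions, learned weights and many stacked blocks, but the shape of the computation is the same:

import numpy as np

np.random.seed(0)
tokens = ["a", "black", "hole", "appeared"]
d = 4                                  # embedding size (tiny, for illustration)
X = np.random.rand(len(tokens), d)     # one vector per token

# Projection matrices (random stand-ins for learned weights)
Wq, Wk, Wv = np.random.rand(3, d, d)
Q, K, V = X @ Wq, X @ Wk, X @ Wv       # queries, keys and values

scores = Q @ K.T / np.sqrt(d)          # how strongly each token attends to the others
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # softmax
X_in_context = weights @ V             # "hole" now carries a little "black" with it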

The end product is hopefully a really great guess at the next word. But those guesses are only as good as the input text corpus. What if we want to ask ChatGPT about text that it hasn’t learned? We can’t shove large amounts of text into the query itself, because the context window is limited. Hence, we come to RAG.

Getting Started with LlamaIndex

Let’s get straight into LlamaIndex. Fortunately, there is a quickstart that promises results with “5 lines of code.”

Now, I’ve done several local LLM installations, but for this post I’ll tamely use my OpenAI key and burn some credits. I use Visual Studio Code when I want to run Python briefly, which will add a bit of flotsam to the post, but the same touch points will get covered however you choose to work.

On my Mac, I’ll just check up on my Homebrew install of Python3. So opening my Warp terminal, I’ll start with:

brew install python3


After Homebrew is done, I confirm what I have:
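python3 --version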

I then start VS Code in this otherwise empty folder. I installed the Python extension, then followed good practice and made a project-specific virtual environment from the command palette, using Python: Create Environment and choosing Venv. This ends by confirming that I’m using the Python I just installed.

OK, now I’d better go back to the LlamaIndex instructions and use pip to install the llama-index package as required, in my virtual environment within VS Code, using an active terminal (so not in Warp, I’m afraid):
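pip install llama-index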

I’ll need to tell the environment about my OpenAI key. Given the nature of the virtual environment running under an IDE, it is safest to stick this in the launch.json file that VS Code makes when it runs a project:

..
"configurations": [
  {
    "name": "Python Debugger: Current File",
    "type": "debugpy",
    "request": "launch",
    "program": "${file}",
    "console": "integratedTerminal",
    "env": {
      "OPENAI_API_KEY": "XXXX"
    }
  }
]
..


(You may need to create an OpenAI account of course. I suspect the ‘XXXX’ account is dry by now!)

Following the advice in the LlamaIndex starter tutorial, I downloaded a narrative screed from Paul Graham into a folder named data. It is just a lengthy autobiographical essay.
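If you want to follow along, the tutorial grabs the essay with a couple of terminal commands along these lines (the exact raw URL has moved around as the repository has been restructured, so check the current tutorial):

mkdir data
wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham_essay.txt'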

In VS Code, I created starter.py:

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# Read every document found in the ./data folder
documents = SimpleDirectoryReader("data").load_data()
# Embed the documents and build an in-memory vector index over them
index = VectorStoreIndex.from_documents(documents)

# Retrieve the relevant chunks and send them to the LLM with our question
query_engine = index.as_query_engine()
response = query_engine.query("What did the author do growing up?")
print(response)


The important bit is that the package llama_index has been resolved. Fortunately, none of this requires intimate knowledge of Python. You can see clearly that we will print the response to our query.

Here is the response:

To confirm that I did actually use OpenAI, here are my stats from my account activity:

So what is this code doing? It has embedded the new text into a vector store and built an index over it (hence the call to VectorStoreIndex). At query time, the most relevant chunks are retrieved and added to the context window as plain English, just before the call goes through to GPT-3.5. Hence the term “enrich” we saw earlier.
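Incidentally, you can run the retrieval step on its own, without calling the LLM at all, to inspect which chunks would be stuffed into the prompt. A minimal sketch using the retriever API, assuming the index built above (the similarity_top_k of 2 matches the “Top 2 nodes” in the log below):

# Retrieve without querying the LLM, to see what RAG would add to the prompt
retriever = index.as_retriever(similarity_top_k=2)
nodes = retriever.retrieve("What did the author do growing up?")

for n in nodes:
    # Each result carries a similarity score and the raw chunk text
    print(f"{n.score:.6f}", n.node.get_content()[:80])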

By adding two lines of logging code, I was able to see lots of dense REST calls, but also this useful tidbit from the llama_index package:

DEBUG:llama_index.core.indices.utils:> Top 2 nodes: 
> [Node 167d0eb4-7dba-4b93-85ec-3f5779b32daa] [Similarity score: 0.819982] 
"What I Worked On February 2021 Before college the two main things 
I worked on, outside of school..." 

> [Node ee847bc2-d56a-4c26-afd7-c4bee9a3d116] [Similarity score: 0.811733] 
"I remember taking the boys to the coast on a sunny day in 2015 and 
figuring out how to deal with ..."


This hints at what is going on under the covers.
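For reference, those two lines are probably something very close to the debug-logging setup suggested in the LlamaIndex docs:

import logging
import sys

# Route library debug output, including the retrieval scores above, to stdout
logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))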

Before we declare ourselves done, I’ll add another document to the data folder, one I’ve used before: Shakespeare’s sonnets. Of course, it is possible these are already known to the LLM. Rather more importantly, a bunch of poems do not make a meaningful narrative.

So we will run this extra query, with this additional purposely vague question:

.. 
response = query_engine.query("Who is Blessed?") 
print(response)


And the short response we get is:

Adonis is Blessed.


Interesting. Let’s grab the one area in the sonnets where Adonis is mentioned:

“Blessed are you whose worthiness gives scope,
Being had to triumph, being lacked to hope.

What is your substance, whereof are you made,
That millions of strange shadows on you tend?
Since every one, hath every one, one shade,
And you but one, can every shadow lend:
Describe Adonis and the counterfeit,
Is poorly imitated after you,
On Helen’s cheek all art of beauty set,
And you in Grecian tires are painted new:
Speak of the spring, and foison of the year,
The one doth shadow of your beauty show,
The other as your bounty doth appear,
And you in every blessed shape we know.”

This is confirmed by looking at the log nodes, like the ones we saw earlier:

DEBUG:llama_index.core.indices.utils:> Top 2 nodes: 
> [Node 38e29f53-3656-4b55-ab6b-08acf898f122] [Similarity score: 0.766188] 
"Blessed are you whose worthiness gives scope, Being had to triumph, 
being lacked to hope. What i..." 

> [Node 16d55fda-34ac-42cf-9b08-66d2c6944302] [Similarity score: 0.730936] 
"And other strains of woe, which now seem woe, Compared with loss of thee, 
will not seem so. Some..."


Most of this is Sonnet 53, and the term “blessed” does appear near “Adonis.” Of course, an LLM will always give you an answer, and it will sound definitive!

However, none of this is an issue for LlamaIndex, which has performed well enough. I’ve only taken the very first steps of building a pipeline, and LlamaIndex gives you plenty more ways to work with documents in this manner.
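For example, the same quickstart shows how to persist the index to disk, so the documents are not re-embedded (and re-billed) on every run; a minimal version looks like this:

from llama_index.core import StorageContext, load_index_from_storage

# First run: save the freshly built index next to the script
index.storage_context.persist(persist_dir="./storage")

# Later runs: load it back instead of rebuilding and re-embedding
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)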

While it is true that we still lack a comprehensive language to describe what is happening internally, using RAG via LlamaIndex is a solid way both to enhance an LLM with domain-specific information and to ensure that verifiable knowledge is processed. This all helps to reduce the chances of erroneous responses: the one problem that still dogs AI today.


