How To Set Up and Run a Local LLM With Ollama and Llama 3

David Eastman, The New Stack

I’ve posted about coming off the cloud, and now I’m looking at running an open source LLM locally on my MacBook. If this feels like part of some “cloud repatriation” project, it isn’t: I’m just interested in tools I can control to add to any potential workflow chain.

Assuming your machine can spare the disk space and memory, what are the arguments for doing this? Apart from not having to pay the running costs of someone else’s server, you can run queries on your private data without any security concerns.

For this, I’m using Ollama, “a tool that allows you to run open-source large language models (LLMs) locally on your machine.” It gives you access to a full library of open source models with different specializations, like bilingual models, compact models and code generation models. Ollama started out as a Mac-based tool, but a Windows version is now available in preview, and it can also be run via Docker.
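For the Docker route, the documented image is ollama/ollama; starting the server is something along these lines, mapping the default API port and keeping downloaded models in a named volume:

> docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama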

If you were looking for an LLM as part of a testing workflow, then this is where Ollama fits in:

[Image: A GenAI testing presentation from @patrickdubois (https://cdn.thenewstack.io/media/2024/02/0deb09f2-untitled-1024x499.png)]

For testing, local LLMs controlled from Ollama are nicely self-contained, but their quality and speed may suffer compared to the options you have on the cloud. Building a mock framework will result in much quicker tests, but setting these up — as the slide indicates — can be tedious.

Installing Ollama

I installed Ollama by downloading the app onto my MacBook. When I ran it, I was prompted to try llama3.2 (for now I’ll ignore the argument that this isn’t actually open source). Opening up my Warp terminal, I assumed I’d have to install the model first, but the run command took care of that.
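That step boils down to a single command, which pulls the model the first time it is used and then drops you into an interactive prompt:

> ollama run llama3.2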

Note that it plops you into chat mode, so you can test the model immediately; the response came back rapidly.

Inspecting Llama 3

Looking at the specs for the llama3.2 model, I see it defaults to the 3B parameter version, and my 16GB MacBook Pro M4 was quite comfortable running it. I made one quick test query:

This was quick, so the model is clearly alive. Well, when I say “alive” I don’t quite mean that, as the model is trapped temporally at the point it was built:

If you were wondering, the correct answer to the arithmetic problem is actually 1,223,834,880. Better models simply hand these problems off to calculator tools when they spot them. Paradoxically, the inability to do simple maths marks out the limits of the new AI. Remember, LLMs are not intelligent; they are just extremely good at extracting linguistic meaning from their training data. But you know this, of course.

The convenient console is nice, but I wanted to use the available API. Ollama sets itself up as a local server on port 11434. We can run a quick curl command to check that the API is responding. Here is a non-streaming (that is, not interactive) REST call via the terminal with a JSON payload:

> curl http://localhost:11434/api/generate -d '
{
 "model": "llama3.2",
 "prompt": "Why is the sky blue?",
 "stream": false
}'


The full response, which covered Rayleigh scattering, the wavelength of light and the angle of the sun, looked correct to me. It took 7 seconds, as recorded in the Warp terminal block.
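For reference, the non-streaming call returns a single JSON object, roughly of the shape below, with the generated text in the response field and timing statistics alongside (the values here are abbreviated placeholders):

{
  "model": "llama3.2",
  "created_at": "...",
  "response": "... the full answer text ...",
  "done": true,
  "context": [ ... ],
  "total_duration": ...,
  "eval_count": ...
}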

Using the Model

The common route to gain programmatic control would be to use Python, and maybe a Jupyter Notebook. But my tool of choice is C#, via the OllamaSharp bindings, which are conveniently available as a package on NuGet.

I’m not too keen on Visual Studio Code, but once you set up a C# console project with NuGet support, it is quick to get going.

Open VS Code from your terminal in your project directory:
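Assuming VS Code’s code command-line launcher is installed, that is:

> code .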

Start a new .NET project via the Command Palette, choose a Console App, and name your project:
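If you prefer the terminal, the dotnet CLI does the same job; the project name OllamaTest below is just a placeholder:

> dotnet new console -n OllamaTest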

Then add OllamaSharp as a NuGet package, again from the Command Palette.
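Or, if you prefer the terminal, the same NuGet reference can be added with the dotnet CLI:

> dotnet add package OllamaSharp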

Here is the code to contact Ollama with a query, written into Program.cs and generating a completion straight into the console:

using OllamaSharp;

// Point the client at the local Ollama server
var uri = new Uri("http://localhost:11434");
var ollama = new OllamaApiClient(uri);

// Select the model which should be used for further operations
ollama.SelectedModel = "llama3.2";

// Stream the completion back to the console as it is generated
await foreach (var stream in ollama.GenerateAsync("How are you today?"))
    Console.Write(stream.Response);


In a few seconds we get the response directly in the terminal tab:

That’s nice.

Further Tests of the Model

OK, so now we are ready to ask something a little bit more specific. I’ve seen people asking for categorized summaries of their bank accounts, but before I entrust it with that, let me try something more mundane. I’ll ask for a recipe based on the food in my fridge:

...
// The prompt: a list of ingredients, asking for recipe suggestions
string question =
@"I have the following ingredients in my fridge:
aubergine, milk, cheese, peppers.
What food could I cook with this and other basic ingredients?";

// Stream the answer to the console, as before
await foreach (var stream in ollama.GenerateAsync(question))
    Console.Write(stream.Response);


It took a few seconds to come back; for the record, the result is here:

With the ingredients you have (aubergine, milk, cheese, peppers), here are a few dish ideas that don’t require additional basic ingredients:

  1. Eggplant Fritters: Finely chop the aubergine, mix it with grated cheese, and shape into patties. Pan-fry until crispy and serve hot.
  2. Cheesy Aubergine Bake: Slice the aubergine, layer it in a baking dish with sliced peppers, and top with grated cheese. Drizzle with milk to create a creamy sauce. Bake in the oven until the cheese is melted and bubbly.
  3. Grilled Pepper and Aubergine Salad: Grill the peppers and aubergine slices until tender. Chop and mix with crumbled cheese, served with a side of creamy milk sauce (heat the milk with grated cheese until smooth).
  4. Aubergine and Cheese Omelet: Finely chop the aubergine and sauté it in a pan with some oil. Then, whip up an omelet with eggs and add chopped aubergine, sliced peppers, and grated cheese.

These ideas should inspire you to create a tasty dish using your available ingredients!

Given that we did not train the LLM, and didn’t add any recipe texts via retrieval-augmented generation (RAG) to improve the quality by supplementing the LLM’s internal representation, I think this answer is fine. It comprehended what “basic ingredients” meant, and each suggestion covers a different style of dish. It also intuited that not every one of my ingredients had to be used, and correctly figured that the distinctive ingredient was the aubergine.

I would certainly have the confidence to let this summarize a bank account with set categories, if that was a task I valued — we are running locally after all. While things are still in flux with open source LLMs, especially around the issues of training data and bias, the maturity of the solutions is clearly improving, giving reasonable hope for future capability under considered conditions.
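As a footnote, here is a minimal sketch of what that bank-statement prompt could look like, using the same OllamaSharp pattern as above; the transactions and categories are invented purely for illustration:

using OllamaSharp;

// Same local server and model as before
var ollama = new OllamaApiClient(new Uri("http://localhost:11434"));
ollama.SelectedModel = "llama3.2";

// Hypothetical statement lines and categories, just to illustrate the prompt shape
string question =
@"Categorize each transaction as one of: groceries, transport, eating out, other.
2025-02-01  TESCO STORES      -23.40
2025-02-02  TFL TRAVEL CHARGE  -8.10
2025-02-03  PRET A MANGER      -6.55
Give one line per transaction with its category, then a total per category.";

// Stream the categorized summary straight to the console
await foreach (var stream in ollama.GenerateAsync(question))
    Console.Write(stream.Response);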
