# Local LLMs
LlamaIndex.TS supports OpenAI and other remote LLM APIs. You can also run a local LLM on your machine!
## Using a local model via Ollama
The easiest way to run a local LLM is via the great work of our friends at Ollama, who provide a simple-to-use client that will download, install, and run a growing range of models for you.
### Install Ollama
They provide a one-click installer for Mac, Linux and Windows on their home page.
### Pick and run a model
Since we're going to be doing agentic work, we'll need a very capable model, but the largest models are hard to run on a laptop. We think Mixtral 8x7b is a good balance between power and resources, but Llama 3 is another great option. You can run Mixtral by running:
```bash
ollama run mixtral:8x7b
```
The first time you run this command, it will also download and install the model for you.
### Switch the LLM in your code
To switch the LLM in your code, you first need to install the package for the Ollama model provider:
```bash
npm i @llamaindex/ollama
```
Then, to tell LlamaIndex to use a local LLM, use the `Settings` object:
```typescript
import { Settings } from "llamaindex";
import { ollama } from "@llamaindex/ollama";

Settings.llm = ollama({
  model: "mixtral:8x7b",
});
```
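As a quick sanity check, you can send a single completion request through the configured LLM. This is a minimal sketch: it assumes the Ollama server is running locally with the model already pulled, and uses the LLM's `complete` method:

```typescript
import { Settings } from "llamaindex";
import { ollama } from "@llamaindex/ollama";

Settings.llm = ollama({
  model: "mixtral:8x7b",
});

// Ask the local model for a one-line reply; no remote API is involved.
// Assumes the Ollama server is running at its default local address.
const result = await Settings.llm.complete({
  prompt: "In one sentence, what is retrieval-augmented generation?",
});
console.log(result.text);
```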
## Use local embeddings
If you're doing retrieval-augmented generation, LlamaIndex.TS will by default also call out to OpenAI to index and embed your data. To be entirely local, you can use a local embedding model from Hugging Face like this:

First, install the Hugging Face model provider package:
```bash
npm i @llamaindex/huggingface
```
And then set the embedding model in your code:
```typescript
import { Settings } from "llamaindex";
import { HuggingFaceEmbedding } from "@llamaindex/huggingface";

Settings.embedModel = new HuggingFaceEmbedding({
  modelType: "BAAI/bge-small-en-v1.5",
  quantized: false,
});
```
The first time this runs, it will download the embedding model before using it.
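To confirm that embeddings are being generated locally, you can embed a short string and inspect the resulting vector. This is a minimal sketch; it assumes the embedding model's `getTextEmbedding` method:

```typescript
import { Settings } from "llamaindex";
import { HuggingFaceEmbedding } from "@llamaindex/huggingface";

Settings.embedModel = new HuggingFaceEmbedding({
  modelType: "BAAI/bge-small-en-v1.5",
  quantized: false,
});

// Embed a short string with the local model; no API call is made.
const vector = await Settings.embedModel.getTextEmbedding("hello world");
console.log(vector.length); // e.g. 384 dimensions for bge-small-en-v1.5
```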
## Try it out
With a local LLM and local embeddings in place, you can perform RAG as usual and everything will happen on your machine without calling an API:
```typescript
import fs from "node:fs/promises";
import { Document, VectorStoreIndex } from "llamaindex";

async function main() {
  // Load essay from abramov.txt in Node
  const path = "node_modules/llamaindex/examples/abramov.txt";
  const essay = await fs.readFile(path, "utf-8");

  // Create Document object with essay
  const document = new Document({ text: essay, id_: path });

  // Split text and create embeddings. Store them in a VectorStoreIndex
  const index = await VectorStoreIndex.fromDocuments([document]);

  // Query the index
  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query({
    query: "What did the author do in college?",
  });

  // Output response
  console.log(response.toString());
}

main().catch(console.error);
```
You can see the full example file.