# Local LLMs
LlamaIndex.TS supports OpenAI and other remote LLM APIs. You can also run a local LLM on your machine!
## Using a local model via Ollama
The easiest way to run a local LLM is via the great work of our friends at Ollama, who provide a simple-to-use client that will download, install, and run a growing range of models for you.
### Install Ollama
They provide a one-click installer for Mac, Linux and Windows on their home page.
### Pick and run a model
Since we're going to be doing agentic work, we'll need a very capable model, but the largest models are hard to run on a laptop. We think Mixtral 8x7b is a good balance between power and resources, but Llama 3 is another great option. You can run Mixtral by running:
```bash
ollama run mixtral:8x7b
```
The first time you run this command, it will also download and install the model for you.
### Switch the LLM in your code
To switch the LLM in your code, you first need to install the package for the Ollama model provider:
```bash
npm i @llamaindex/ollama
```
Then, to tell LlamaIndex to use a local LLM, use the `Settings` object:
```typescript
import { Settings } from "llamaindex";
import { ollama } from "@llamaindex/ollama";

Settings.llm = ollama({
  model: "mixtral:8x7b",
});
```
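As a quick sanity check, you can send a single completion request through the configured LLM. This is a minimal sketch: it assumes the Ollama server is running locally with the model already pulled, and uses the LLM's `complete` method:

```typescript
import { Settings } from "llamaindex";
import { ollama } from "@llamaindex/ollama";

Settings.llm = ollama({
  model: "mixtral:8x7b",
});

// Ask the local model for a one-line reply; no remote API is involved.
// Assumes the Ollama server is running at its default local address.
const result = await Settings.llm.complete({
  prompt: "In one sentence, what is retrieval-augmented generation?",
});
console.log(result.text);
```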
## Use local embeddings
If you're doing retrieval-augmented generation, LlamaIndex.TS will by default also call out to OpenAI to index and embed your data. To be entirely local, you can use a local embedding model from Hugging Face like this:

First, install the Hugging Face model provider package:
```bash
npm i @llamaindex/huggingface
```
And then set the embedding model in your code:
```typescript
import { Settings } from "llamaindex";
import { HuggingFaceEmbedding } from "@llamaindex/huggingface";

Settings.embedModel = new HuggingFaceEmbedding({
  modelType: "BAAI/bge-small-en-v1.5",
  quantized: false,
});
```
The first time this runs, it will download the embedding model before using it.
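To confirm that embeddings are being generated locally, you can embed a short string and inspect the resulting vector. This is a minimal sketch; it assumes the embedding model's `getTextEmbedding` method:

```typescript
import { Settings } from "llamaindex";
import { HuggingFaceEmbedding } from "@llamaindex/huggingface";

Settings.embedModel = new HuggingFaceEmbedding({
  modelType: "BAAI/bge-small-en-v1.5",
  quantized: false,
});

// Embed a short string with the local model; no API call is made.
const vector = await Settings.embedModel.getTextEmbedding("hello world");
console.log(vector.length); // e.g. 384 dimensions for bge-small-en-v1.5
```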
## Try it out
With a local LLM and local embeddings in place, you can perform RAG as usual and everything will happen on your machine without calling an API:
```typescript
import fs from "node:fs/promises";
import { Document, VectorStoreIndex } from "llamaindex";

async function main() {
  // Load essay from abramov.txt in Node
  const path = "node_modules/llamaindex/examples/abramov.txt";
  const essay = await fs.readFile(path, "utf-8");

  // Create Document object with essay
  const document = new Document({ text: essay, id_: path });

  // Split text and create embeddings. Store them in a VectorStoreIndex
  const index = await VectorStoreIndex.fromDocuments([document]);

  // Query the index
  const queryEngine = index.asQueryEngine();
  const response = await queryEngine.query({
    query: "What did the author do in college?",
  });

  // Output response
  console.log(response.toString());
}

main().catch(console.error);
```
You can see the full example file.