LocalAI is the free, Open Source OpenAI alternative. LocalAI acts as a drop-in replacement REST API compatible with the OpenAI API specifications for local inferencing. It lets you run LLMs, generate images, audio, and more, locally or on-prem with consumer-grade hardware, supporting multiple model families and architectures.

LocalAI is available as a container image and binary, compatible with container engines such as Docker and Podman, and deployable on Kubernetes. Container images are published on quay.io and Docker Hub. Binaries can be downloaded from GitHub.

Prerequisites

Before you begin, ensure you have a container engine installed if you are not using the binaries. Suitable options include Docker and Podman; for installation instructions, refer to their official documentation.

Running LocalAI with All-in-One (AIO) Images

Already have a model file? Skip to Run models manually or Run other models to use an already-configured model.

LocalAI’s All-in-One (AIO) images come pre-configured with a set of models and backends that exercise nearly all of LocalAI’s features. If you don’t need pre-configured models, you can use the standard images.

These images are available for both CPU and GPU environments, and are designed to be easy to use: they require no configuration.

Use the AIO images if you don’t want to configure the models yourself; if you want to run specific models, use the manual method instead.

The AIO images come pre-configured with the following features:

  • Text to Speech (TTS)
  • Speech to Text
  • Function calling
  • Large Language Models (LLM) for text generation
  • Image generation
  • Embedding server

Start the image with Docker:

  docker run -p 8080:8080 --name local-ai -ti localai/localai:latest-aio-cpu
  # For Nvidia GPUs:
  # docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-nvidia-cuda-11
  # docker run -p 8080:8080 --gpus all --name local-ai -ti localai/localai:latest-aio-gpu-nvidia-cuda-12
  
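After the container starts, you can verify that the API is up by querying the readiness endpoint (the same endpoint the Compose health check below uses):

  # Succeeds once the server is ready to accept requests
  curl http://localhost:8080/readyz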

Or with a docker-compose file:

  version: "3.9"
  services:
    api:
      image: localai/localai:latest-aio-cpu
      # For a specific version:
      # image: localai/localai:v2.17.1-aio-cpu
      # For Nvidia GPUs uncomment one of the following (cuda11 or cuda12):
      # image: localai/localai:v2.17.1-aio-gpu-nvidia-cuda-11
      # image: localai/localai:v2.17.1-aio-gpu-nvidia-cuda-12
      # image: localai/localai:latest-aio-gpu-nvidia-cuda-11
      # image: localai/localai:latest-aio-gpu-nvidia-cuda-12
      healthcheck:
        test: ["CMD", "curl", "-f", "http://localhost:8080/readyz"]
        interval: 1m
        timeout: 20m
        retries: 5
      ports:
        - 8080:8080
      environment:
        - DEBUG=true
        # ...
      volumes:
        - ./models:/build/models:cached
      # uncomment the following piece if running with Nvidia GPUs
      # deploy:
      #   resources:
      #     reservations:
      #       devices:
      #         - driver: nvidia
      #           count: 1
      #           capabilities: [gpu]
  
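Save the file as docker-compose.yaml, then bring the service up in the background:

  # Start the services defined in docker-compose.yaml
  docker compose up -d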

For a list of all the available container images, see Container images. To learn more about the All-in-One images, see All-in-one Images.

Running LocalAI from Binaries

LocalAI binaries are available for both Linux and macOS and can be executed directly from your command line. The binaries are continuously updated and hosted on our GitHub Releases page. This method also supports Windows users via the Windows Subsystem for Linux (WSL).

Use the following one-liner in your terminal to download and run LocalAI on Linux or macOS:

  curl -Lo local-ai "https://github.com/mudler/LocalAI/releases/download/v2.17.1/local-ai-$(uname -s)-$(uname -m)" && chmod +x local-ai && ./local-ai
  

Otherwise, here are the links to the binaries:

  • Linux: Download
  • macOS: Download
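However you obtain the binary, running it starts the server. To inspect the available runtime options (listening address, model paths, and so on), ask the binary for help:

  # Print the available flags
  ./local-ai --help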

Try it out

Connect to LocalAI. By default, the WebUI is accessible at http://localhost:8080. You can also use 3rd-party projects to interact with LocalAI as you would with OpenAI (see also Integrations).

You can also test the API endpoints using curl; see the examples below.
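
For example, you can list the models currently available on the server through the OpenAI-compatible models endpoint:

  # List the models the server currently knows about
  curl http://localhost:8080/v1/models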

Text Generation

Creates a model response for the given chat conversation. OpenAI documentation.

  curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "How are you doing?", "temperature": 0.1}] }' 
  
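As with the OpenAI API, you can also stream the response token by token by setting stream to true:

  # Tokens arrive incrementally as server-sent events
  curl http://localhost:8080/v1/chat/completions \
      -H "Content-Type: application/json" \
      -d '{ "model": "gpt-4", "messages": [{"role": "user", "content": "How are you doing?"}], "stream": true }'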

GPT Vision

Understand images.

  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-4-vision-preview",
        "messages": [
          {
            "role": "user", "content": [
              {"type": "text", "text": "What is in the image?"},
              {
                "type": "image_url",
                "image_url": {
                  "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                }
              }
            ]
          }
        ],
        "temperature": 0.9
      }'
  
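You can also send a local image as a base64 data URI instead of a public URL, mirroring the OpenAI vision format. A minimal sketch, assuming a local file named image.jpg:

  # Hypothetical local file; -w0 (no line wrapping) is GNU-specific, drop it on macOS
  IMAGE_B64=$(base64 -w0 image.jpg)
  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
        "model": "gpt-4-vision-preview",
        "messages": [
          {
            "role": "user", "content": [
              {"type": "text", "text": "What is in the image?"},
              {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,'"$IMAGE_B64"'"}}
            ]
          }
        ]
      }'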

Function calling

Have the model call functions you declare in the request.

  curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4",
    "messages": [
      {
        "role": "user",
        "content": "What is the weather like in Boston?"
      }
    ],
    "tools": [
      {
        "type": "function",
        "function": {
          "name": "get_current_weather",
          "description": "Get the current weather in a given location",
          "parameters": {
            "type": "object",
            "properties": {
              "location": {
                "type": "string",
                "description": "The city and state, e.g. San Francisco, CA"
              },
              "unit": {
                "type": "string",
                "enum": ["celsius", "fahrenheit"]
              }
            },
            "required": ["location"]
          }
        }
      }
    ],
    "tool_choice": "auto"
  }'
  
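The response follows the OpenAI tool-calling schema, so the selected function and its arguments can be read from the first choice. For example, with the request body above saved as request.json and jq installed:

  # request.json holds the JSON body from the example above
  curl -s http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d @request.json | jq '.choices[0].message.tool_calls[0].function'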

Image Generation

Creates an image given a prompt. OpenAI documentation.

  curl http://localhost:8080/v1/images/generations \
      -H "Content-Type: application/json" -d '{
          "prompt": "A cute baby sea otter",
          "size": "256x256"
        }'
  
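The generated image is referenced in the data array of the response, following the OpenAI schema. Assuming the server returns a URL rather than base64 data, you can extract it with jq:

  # -r prints the raw URL without JSON quoting
  curl -s http://localhost:8080/v1/images/generations \
      -H "Content-Type: application/json" -d '{
          "prompt": "A cute baby sea otter",
          "size": "256x256"
        }' | jq -r '.data[0].url'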

Text to speech

Generates audio from the input text. OpenAI documentation.

  curl http://localhost:8080/v1/audio/speech \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tts-1",
    "input": "The quick brown fox jumped over the lazy dog.",
    "voice": "alloy"
  }' \
  --output speech.mp3
  

Audio Transcription

Transcribes audio into the input language. OpenAI Documentation.

First, download a sample file to transcribe:

  wget --quiet --show-progress -O gb1.ogg https://upload.wikimedia.org/wikipedia/commons/1/1f/George_W_Bush_Columbia_FINAL.ogg 
  

Send the example audio file to the transcriptions endpoint:

  curl http://localhost:8080/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@$PWD/gb1.ogg" -F model="whisper-1"
  
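The endpoint responds with JSON carrying the transcription in a text field, as in the OpenAI schema; with jq installed you can print just the text:

  # Extract only the "text" field from the JSON response
  curl -s http://localhost:8080/v1/audio/transcriptions \
    -H "Content-Type: multipart/form-data" \
    -F file="@$PWD/gb1.ogg" -F model="whisper-1" | jq -r '.text'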

Embeddings Generation

Get a vector representation of a given input that can be easily consumed by machine learning models and algorithms. OpenAI Embeddings.

  curl http://localhost:8080/embeddings \
    -X POST -H "Content-Type: application/json" \
    -d '{ 
        "input": "Your text string goes here", 
        "model": "text-embedding-ada-002"
      }'
  
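As a quick sanity check, you can print the dimensionality of the returned vector, assuming the OpenAI-style data[0].embedding field in the response and jq installed:

  # Print the number of dimensions in the embedding vector
  curl -s http://localhost:8080/embeddings \
    -X POST -H "Content-Type: application/json" \
    -d '{ "input": "Your text string goes here", "model": "text-embedding-ada-002" }' \
    | jq '.data[0].embedding | length'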

What’s next?

There is much more to explore! Run any model from Hugging Face, generate video, and clone voices with LocalAI. Check out the features section for a full overview.
