Model compatibility table
Besides llama-based models, LocalAI is also compatible with other architectures. The table below lists all the compatible model families and the associated binding repositories.
LocalAI will attempt to automatically load models that are not explicitly configured for a specific backend. You can specify the backend to use by configuring a model with a YAML file, as in the sketch below. See the advanced section for more details.
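For example, a minimal model configuration pinning the llama.cpp backend could look like the following sketch. The file name, model name, and model file are placeholders, and the exact backend identifier and set of options depend on your LocalAI version:

```yaml
# my-model.yaml (hypothetical example placed in the models directory)
name: my-model           # the model name used in API requests
backend: llama           # a backend name from the table below (here: llama.cpp)
parameters:
  model: my-model.gguf   # the model file to load
```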
Backend and Bindings | Compatible models | Completion/Chat endpoint | Capability | Embeddings support | Token stream support | Acceleration |
---|---|---|---|---|---|---
llama.cpp | Vicuna, Alpaca, LLaMa, Falcon, Starcoder, GPT-2, and many others | yes | GPT and Functions | yes** | yes | CUDA, openCL, cuBLAS, Metal |
gpt4all-llama | Vicuna, Alpaca, LLaMa | yes | GPT | no | yes | N/A |
gpt4all-mpt | MPT | yes | GPT | no | yes | N/A |
gpt4all-j | GPT4ALL-J | yes | GPT | no | yes | N/A |
falcon-ggml (binding) | Falcon (*) | yes | GPT | no | no | N/A |
dolly (binding) | Dolly | yes | GPT | no | no | N/A |
gptj (binding) | GPTJ | yes | GPT | no | no | N/A |
mpt (binding) | MPT | yes | GPT | no | no | N/A |
replit (binding) | Replit | yes | GPT | no | no | N/A |
gptneox (binding) | GPT NeoX, RedPajama, StableLM | yes | GPT | no | no | N/A |
bloomz (binding) | Bloom | yes | GPT | no | no | N/A |
rwkv (binding) | rwkv | yes | GPT | no | yes | N/A |
bert (binding) | bert | no | Embeddings only | yes | no | N/A |
whisper | whisper | no | Audio | no | no | N/A |
stablediffusion (binding) | stablediffusion | no | Image | no | no | N/A |
langchain-huggingface | Any text generators available on HuggingFace through API | yes | GPT | no | no | N/A |
piper (binding) | Any piper onnx model | no | Text to voice | no | no | N/A |
sentencetransformers | BERT | no | Embeddings only | yes | no | N/A |
bark | bark | no | Audio generation | no | no | yes |
autogptq | GPTQ | yes | GPT | yes | no | N/A |
exllama | GPTQ | yes | GPT only | no | no | N/A |
diffusers | Stable Diffusion (SD), … | no | Image generation | no | no | N/A
vall-e-x | Vall-E | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
vllm | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
exllama2 | GPTQ | yes | GPT only | no | no | N/A |
transformers-musicgen | MusicGen | no | Audio generation | no | no | N/A
tinydream | stablediffusion | no | Image | no | no | N/A |
coqui | Coqui | no | Audio generation and Voice cloning | no | no | CPU/CUDA |
petals | Various GPTs and quantization formats | yes | GPT | no | no | CPU/CUDA |
transformers | Various GPTs and quantization formats | yes | GPT, embeddings | yes | yes**** | CPU/CUDA/XPU |
Note: any backend name listed above can be used in the `backend` field of the model configuration file (see the advanced section).
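As a second sketch, here is a configuration that pins an embeddings-capable backend. The model name is an assumption (any compatible sentence-transformers model should work), and the exact options depend on your LocalAI version:

```yaml
# embeddings.yaml (hypothetical example)
name: text-embeddings            # the model name used in API requests
backend: sentencetransformers    # an embeddings-capable backend from the table
embeddings: true                 # serve this model on the embeddings endpoint
parameters:
  model: all-MiniLM-L6-v2        # an assumed sentence-transformers model
```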
- * 7b ONLY
- ** doesn't seem to be accurate
- *** 7b and 40b with the `ggccv` format, for instance: https://huggingface.co/TheBloke/WizardLM-Uncensored-Falcon-40B-GGML
- **** Only for CUDA and OpenVINO CPU/XPU acceleration.