Use crane from 4D
crane-oai is a Rust based alternative to llama.cpp, optimised for LLM, VLM, VLA, TTS, OCR inference. Embeddings or reranker models are not supported.
The API is compatibile with Open AI.
| Class | API | Availability |
|---|---|---|
| Models | /v1/models |
✅ |
| Chat | /v1/chat/completions |
✅ |
| Images | /v1/images/generations |
|
| Moderations | /v1/moderations |
|
| Embeddings | /v1/embeddings |
|
| Files | /v1/files |