platform downloads

Use Text Embeddings Inference from 4D

Abstract

Text Embeddings Inference is a high-performance extraction framework and server that supports a wide variety of embedding models such as Nomic, BERT, CamemBERT, XLM-RoBERTa models with absolute positions; JinaBERT model with Alibi positions; Mistral, Alibaba GTE, Qwen2 models with Rope positions; MPNet, ModernBERT, Qwen3, and Gemma3.

TEI is the industry standard for serving embeddings in the Hugging Face ecosystem and is widely deployed in enterprise RAG stacks. It expects NVIDIA GPUs and aims to process thousands of concurrent requests per second.

Note: The component’s CLI is compiled with Metal support for Apple Silicon and CPU-only settings for Windows and Mac Intel.

For 4D developers looking for a local LLM engine to support semantic search, TEI might be an overkill. Unlike llama.cpp which expects weights in quantised custom GGUF format, TEI works with full precision weights. 4D memory consumption may jump up while loading the model before returning to regular levels.

Usage

Instantiate cs.TEI.TEI in your On Startup database method:

var $TEI : cs.TEI.TEI

If (False)
    $TEI:=cs.TEI.TEI.new()  //default
Else 
    var $homeFolder : 4D.Folder
    $homeFolder:=Folder(fk home folder).folder(".TEI")
    var $URL : Text
    var $port : Integer
    
    var $event : cs.event.event
    $event:=cs.event.event.new()
    /*
        Function onError($params : Object; $error : cs.event.error)
        Function onSuccess($params : Object; $models : cs.event.models)
        Function onData($request : 4D.HTTPRequest; $event : Object)
        Function onResponse($request : 4D.HTTPRequest; $event : Object)
        Function onTerminate($worker : 4D.SystemWorker; $params : Object)
    */
    
    $event.onError:=Formula(ALERT($2.message))
    $event.onSuccess:=Formula(ALERT($2.models.extract("name").join(",")+" loaded!"))
    $event.onData:=Formula(LOG EVENT(Into 4D debug message; This.file.fullName+":"+String((This.range.end/This.range.length)*100; "###.00%")))
    $event.onData:=Formula(MESSAGE(This.file.fullName+":"+String((This.range.end/This.range.length)*100; "###.00%")))
    $event.onResponse:=Formula(LOG EVENT(Into 4D debug message; This.file.fullName+":download complete"))
    $event.onResponse:=Formula(MESSAGE(This.file.fullName+":download complete"))
    $event.onTerminate:=Formula(LOG EVENT(Into 4D debug message; (["process"; $1.pid; "terminated!"].join(" "))))
    
    $port:=8080
    
    $model:=$homeFolder.folder("dangvantuan/sentence-camembert-base")
    $path:=""
    $URL:="dangvantuan/sentence-camembert-base"
    $embedding:=cs.event.huggingface.new($model; $URL; $path; "embedding")
    
    $options:={max_concurrent_requests: 512}
    var $huggingfaces : cs.event.huggingfaces
    $huggingfaces:=cs.event.huggingfaces.new([$embedding])
    
    $TEI:=cs.TEI.TEI.new($port; $huggingfaces; $homeFolder; $options; $event)
    
End if 

Unless the server is already running (in which case the costructor does nothing), the following procedure runs in the background:

The specified model is downloaded via HTTP
The text-embeddings-router program is started

Now you can test the server:

curl -X POST http://127.0.0.1:8080/v1/embeddings \
     -H "Content-Type: application/json" \
     -d '{"input":"The quick brown fox jumps over the lazy dog."}'

Or, use AI Kit:

var $AIClient : cs.AIKit.OpenAI
$AIClient:=cs.AIKit.OpenAI.new()
$AIClient.baseURL:="http://127.0.0.1:8080/v1"

var $text : Text
$text:="The quick brown fox jumps over the lazy dog."

var $responseEmbeddings : cs.AIKit.OpenAIEmbeddingsResult
$responseEmbeddings:=$AIClient.embeddings.create($text)

Finally to terminate the server:

var $llama : cs.TEI.TEI
$llama:=cs.TEI.TEI.new()
$llama.terminate()

AI Kit compatibility

The API is compatibile with Open AI.

Class	API	Availability
Models	`/v1/models`	✅
Chat	`/v1/chat/completions`
Images	`/v1/images/generations`
Moderations	`/v1/moderations`
Embeddings	`/v1/embeddings`	✅
Files	`/v1/files`