# Use TabbyML from 4D
TabbyML is a local inference server for “Fill-In-The-Middle” (FIM) code completion. It has specialised RAG logic for coding that generates the tokens most relevant to the current workspace.
You typically use a pair of models, one for conversation and another for coding. The generated code can be augmented by indexing the codebase (RAG). The conversation model is trained to understand semantic queries, and the coding model is trained to generate better code.
For instance, one may use the following combination:
- Qwen2.5-Coder-0.5B-Instruct as the conversation model (675.7 MB)
- StarCoder-1B as the coding model (1.32 GB)

The idea is to run the AI locally instead of using an online service such as Copilot, Claude Sonnet, or ChatGPT.
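To make FIM concrete: the client sends the code before and after the insertion point, and the model generates what goes in between. A minimal sketch of such a payload in 4D (the `language` and `segments` fields follow the completion API shown later; the code content is a hypothetical illustration):

```4d
// Hypothetical FIM payload: "prefix" is the code before the cursor,
// "suffix" is the code after it; the model fills in the middle
var $body : Object
$body:={}
$body.language:="python"
$body.segments:={prefix: "def fibonacci(n):\n"; suffix: "\nprint(fibonacci(10))"}
```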
Instantiate `cs.tabby.tabby` in your On Startup database method:
```4d
var $tabby : cs.tabby.tabby

If (False)
	$tabby:=cs.tabby.tabby.new()  //default
Else
	var $port : Integer
	$port:=8088
	var $device : Text
	Case of
		: (Is macOS) && (System info.processor="@apple@")
			$device:="metal"
		: (Is Windows)
			$device:="vulkan"
		Else
			$device:="cpu"
	End case
	var $TABBY_ROOT : 4D.Folder
	$TABBY_ROOT:=Folder(fk home folder).folder(".tabby")
	var $folder : 4D.Folder
	$folder:=Folder(Structure file; fk platform path).parent.parent.parent
	//repositories to index for RAG: can be file, github or gitlab
	var $repositories : Collection
	$repositories:=[]
	var $URL : Text
	var $name : Text
	$URL:="file://"+Convert path system to POSIX($folder.platformPath; *)
	$name:="local"
	$repositories.push({name: $name; URL: $URL})
	$URL:="https://github.com/miyako/tabby"
	$name:="github"
	$repositories.push({name: $name; URL: $URL})
	//paths excluded from the RAG index
	var $ignore : Collection
	$ignore:=["Libraries/"; "Data/"; "userPreferences.*"]
	var $event : cs.event.event
	$event:=cs.event.event.new()
	/*
	Function onError($params : Object; $error : cs.event.error)
	Function onSuccess($params : Object; $models : cs.event.models)
	Function onData($request : 4D.HTTPRequest; $event : Object)
	Function onResponse($request : 4D.HTTPRequest; $event : Object)
	Function onTerminate($worker : 4D.SystemWorker; $params : Object)
	*/
	$event.onError:=Formula(ALERT($2.message))
	$event.onSuccess:=Formula(ALERT($2.models.extract("name").join(",")+" loaded!"))
	$event.onData:=Formula(LOG EVENT(Into 4D debug message; "download:"+String((This.range.end/This.range.length)*100; "###.00%")))
	$event.onResponse:=Formula(LOG EVENT(Into 4D debug message; "download complete"))
	$event.onTerminate:=Formula(LOG EVENT(Into 4D debug message; (["process"; $1.pid; "terminated!"].join(" "))))
	$tabby:=cs.tabby.tabby.new($port; {\
		model: "StarCoder-1B"; \
		chat_model: "Qwen2.5-Coder-0.5B-Instruct"; \
		device: $device; \
		parallelism: 1; \
		root: $TABBY_ROOT; \
		repositories: $repositories; \
		ignore: $ignore}; $event)
End if
```
Unless the server is already running (in which case the constructor does nothing), the tabby program is started in the background.

Visit http://localhost:8080/ and follow the instructions to create your account. After creating your account, you can find the token for connecting to the server.
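Before issuing requests from 4D, you can check that the server is up. A minimal sketch using 4D.HTTPRequest (this assumes Tabby's /v1/health route):

```4d
// Ask the local Tabby server for its health status;
// a 200 response means it is ready to serve requests
var $health : 4D.HTTPRequest
$health:=4D.HTTPRequest.new("http://localhost:8080/v1/health")
$health.wait()  // block until the response arrives
If ($health.response#Null) && ($health.response.status=200)
	// server is ready
End if
```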
Now you can test the server:
```sh
curl http://localhost:8080/v1/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer auth_..." \
  -d '{
    "language": "python",
    "segments": {
      "prefix": "def fibonacci(n):\n \"\"\"Return the nth fibonacci number.\"\"\""
    },
    "max_tokens": 64,
    "temperature": 0.1
  }'
```

The response:

```json
{"id":"cmpl-f6f89dca-549a-4a7c-a129-8b976083dd9c","choices":[{"index":0,"text":"\n if n <= 1:\n return n\n return fibonacci(n - 1) + fibonacci(n - 2)\n"}],"mode":"standard"}
```
The newer chat endpoint:
```sh
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer auth_" \
  -d '{
    "model": "tabby",
    "messages": [
      {
        "role": "user",
        "content": "Write a Python function for fibonacci."
      }
    ],
    "max_tokens": 64,
    "temperature": 0.1
  }'
```

The response is streamed as server-sent events:

```
data: {"id":"chatcmpl-3ewdYlS51ra4RQ3LJATBPDOf1Qu5Hpqa","choices":[{"index":0,"delta":{"content":null,"function_call":null,"tool_calls":null,"role":"assistant","refusal":null},"finish_reason":null,"logprobs":null}],"created":1765334008,"model":"tabby","service_tier":null,"system_fingerprint":"b1-952a47f","object":"chat.completion.chunk","usage":null}

data: {"id":"chatcmpl-3ewdYlS51ra4RQ3LJATBPDOf1Qu5Hpqa","choices":[{"index":0,"delta":{"content":"Here","function_call":null,"tool_calls":null,"role":null,"refusal":null},"finish_reason":null,"logprobs":null}],"created":1765334008,"model":"tabby","service_tier":null,"system_fingerprint":"b1-952a47f","object":"chat.completion.chunk","usage":null}

data: {"id":"chatcmpl-3ewdYlS51ra4RQ3LJATBPDOf1Qu5Hpqa","choices":[{"index":0,"delta":{"content":"'s","function_call":null,"tool_calls":null,"role":null,"refusal":null},"finish_reason":null,"logprobs":null}],"created":1765334008,"model":"tabby","service_tier":null,"system_fingerprint":"b1-952a47f","object":"chat.completion.chunk","usage":null}

...
```
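From 4D, the stream can be consumed incrementally with an onData callback, mirroring the handler signatures listed in the startup example. A sketch (the token is again a placeholder; each chunk arrives as a `data: {...}` line):

```4d
// Stream a chat completion from the local Tabby server;
// onData fires for each chunk of the server-sent event stream
var $body : Object
$body:={}
$body.model:="tabby"
$body.messages:=[{role: "user"; content: "Write a Python function for fibonacci."}]
$body.max_tokens:=64
$body.temperature:=0.1

var $options : Object
$options:={}
$options.method:="POST"
$options.headers:={}
$options.headers["Content-Type"]:="application/json"
$options.headers["Authorization"]:="Bearer auth_..."  // placeholder
$options.body:=JSON Stringify($body)
$options.dataType:="text"  // receive the raw event stream as text
$options.onData:=Formula(LOG EVENT(Into 4D debug message; String($2.data)))  // log each chunk

var $request : 4D.HTTPRequest
$request:=4D.HTTPRequest.new("http://localhost:8080/v1/chat/completions"; $options)
$request.wait()
```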
Visit http://localhost:8080/, select a repository, and ask a specific technical question about the project, e.g.

> Explain the function escape
In VSCode, install the Tabby extension and connect it to the server using your endpoint URL and token.
Finally, to terminate the server:

```4d
var $tabby : cs.tabby.tabby
$tabby:=cs.tabby.tabby.new()
$tabby.terminate()
```
The API is compatible with OpenAI.
| Class | API | Availability |
|---|---|---|
| Models | /v1/models | |
| Chat | /v1/chat/completions | ✅ |
| Completion | /v1/completions | ✅ |
| Images | /v1/images/generations | |
| Moderations | /v1/moderations | |
| Embeddings | /v1/embeddings | |
| Files | /v1/files | |