OmertaAI Documentation
OmertaAI is a local AI assistant running entirely on your hardware. No cloud. No account. No data ever leaves your machine. This documentation covers installation, usage, and the local REST API.
Installation
OmertaAI is distributed as a single self-contained installer. The model weights are bundled inside — no separate download step, no internet connection required after installation.
Windows
Download the .exe installer and run it as administrator. OmertaAI installs to %LOCALAPPDATA%\OmertaAI.
macOS
Requires macOS 13 Ventura or later. Both Apple Silicon and Intel are supported.
Linux
Tested on Ubuntu 22.04+, Debian 12+, Fedora 38+, Arch. Available as AppImage or via the install script.
System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 16 GB | 32 GB or more |
| Storage | 30 GB free | 60 GB SSD |
| CPU | 8-core x86_64 / Apple M1 | Apple M2 Pro / AMD Ryzen 9 / Intel i9 |
| GPU (optional) | — | NVIDIA RTX 3080+ (CUDA 12) or Apple GPU |
| OS | Windows 10+, macOS 13+, Ubuntu 22.04+ | Latest stable |
Quick Start
After installation, OmertaAI starts a local server on localhost:11434 and opens the UI in your default browser.
Desktop UI
Open http://localhost:11434 in any browser, or use the native desktop app installed alongside the server. The UI supports multi-turn conversations, file uploads (processed locally), and conversation export to .md or .json.
Conversations are held in memory only. Closing the window or the server process discards the session permanently — by design. You can export any session before closing.
CLI Usage
Model Selection
OmertaAI ships with a quantized build of the Claude model architecture, which scores strongly on public benchmarks for coding, reasoning, writing, and instruction following. Three quantization levels are available:
| Model ID | Quantization | RAM Required | Best For |
|---|---|---|---|
| claude-local-fast | Q4 (4-bit) | 16 GB | Speed, interactive use |
| claude-local | Q5 (5-bit) | 24 GB | Balanced (default) |
| claude-local-max | Q8 (8-bit) | 48 GB | Maximum quality |
Local API Overview
OmertaAI exposes a local REST API on http://localhost:11434 that is fully compatible with the OpenAI Chat API format. Drop it into any tool that supports a custom OpenAI base URL — VS Code extensions, n8n, LangChain, your own scripts.
The server binds to localhost only by default; start it with --host 0.0.0.0 if you need to reach the API from other machines on your network.
Chat Completions
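A minimal non-streaming request using only the Python standard library. The endpoint path /v1/chat/completions and the response shape (choices[0].message.content) follow the OpenAI format the API advertises; the model ID comes from the table above. Treat this as a sketch, not the canonical client.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"

def build_request(messages, model="claude-local"):
    """Build an OpenAI-format chat completion payload."""
    return {"model": model, "messages": messages}

def chat(messages, model="claude-local"):
    """POST a chat completion to the local server and return the reply text."""
    payload = build_request(messages, model)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]

# Example (requires a running server):
# chat([{"role": "user", "content": "Say hello in one word."}])
```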
Streaming
Add "stream": true to the request body to receive server-sent events as the model generates tokens.
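A sketch of consuming the event stream, assuming the OpenAI streaming convention: each event is a line of the form data: {...}, token fragments arrive under choices[0].delta.content, and the stream ends with data: [DONE].

```python
import json

def iter_tokens(sse_lines):
    """Yield content tokens from OpenAI-style server-sent event lines."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separators and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]  # role-only deltas carry no content
```

Feed it the response body line by line (e.g. iterating over the urllib response object with each line decoded to str) after sending a request with "stream": true.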
Models Endpoint
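Listing the installed models, assuming the OpenAI-standard GET /v1/models response shape ({"data": [{"id": ...}, ...]}):

```python
import json
import urllib.request

def model_ids(body):
    """Extract model IDs from an OpenAI-style models response."""
    return [m["id"] for m in body.get("data", [])]

def list_models(base_url="http://localhost:11434/v1"):
    """Return the model IDs the local server advertises."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return model_ids(json.load(resp))

# Example (requires a running server):
# list_models()  # e.g. the IDs from the model table above
```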
Context & Memory
OmertaAI supports up to 200,000 tokens of context per session — enough for entire codebases, lengthy documents, or long multi-turn conversations. Context is held in RAM only.
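If a long conversation approaches the 200,000-token window, the client must drop or summarize old turns itself. A rough sketch using the common ~4-characters-per-token heuristic (an approximation, not the model's real tokenizer — leave headroom for the reply):

```python
def trim_history(messages, max_tokens=200_000, chars_per_token=4):
    """Drop oldest messages until the estimated token count fits the window.

    Token counts are estimated with a crude chars/4 heuristic; the real
    tokenizer will differ, so budget conservatively.
    """
    def estimate(msg):
        return max(1, len(msg["content"]) // chars_per_token)

    kept = list(messages)
    while kept and sum(estimate(m) for m in kept) > max_tokens:
        kept.pop(0)  # discard the oldest turn first
    return kept
```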
OmertaAI does not persist memory between sessions by default. If you want continuity, export a session and pass it as a system prompt on the next run:
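A sketch of that continuity pattern. It assumes the .json export is a list of {"role": ..., "content": ...} messages — the actual export schema may differ, so adjust the parsing to match what your export contains:

```python
import json
from pathlib import Path

def session_to_system_prompt(export_path):
    """Condense a previously exported session into one system message.

    Assumes the export is a JSON list of {"role": ..., "content": ...}
    messages; adapt if the real export schema differs.
    """
    turns = json.loads(Path(export_path).read_text(encoding="utf-8"))
    transcript = "\n".join(f"{t['role']}: {t['content']}" for t in turns)
    return {
        "role": "system",
        "content": "Context from a previous session:\n" + transcript,
    }

# Prepend the returned message to the next session's messages list.
```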
Integrations
Because the local API is OpenAI-compatible, OmertaAI works with any tool that supports a custom base URL.
| Tool | Configuration |
|---|---|
| VS Code (Continue) | Set baseUrl to http://localhost:11434/v1 |
| n8n | OpenAI node → Custom Base URL → localhost:11434/v1 |
| LangChain (Python) | ChatOpenAI(base_url="http://localhost:11434/v1") |
| Obsidian Copilot | Provider: OpenAI-compatible, URL: localhost:11434/v1 |
| Open WebUI | Add connection pointing to localhost:11434 |
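For the VS Code (Continue) row, an illustrative config fragment. The key names are an assumption — Continue versions have varied between baseUrl and apiBase for the endpoint field — so verify against Continue's own documentation before using:

```json
{
  "models": [
    {
      "title": "OmertaAI (local)",
      "provider": "openai",
      "model": "claude-local",
      "apiBase": "http://localhost:11434/v1",
      "apiKey": "unused"
    }
  ]
}
```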
Privacy — Technical Details
These are verifiable architectural facts about how OmertaAI is built, not policy statements:
The API server listens on 127.0.0.1 by default, so it is unreachable from other machines unless you explicitly change the bind address. You can verify this with netstat -an | grep 11434 or any packet inspector.
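If you prefer to check from code rather than netstat, a small stdlib sketch confirms whether anything accepts TCP connections at a given address and port:

```python
import socket

def accepts_connections(host, port, timeout=0.5):
    """Return True if a TCP listener accepts connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out

# accepts_connections("127.0.0.1", 11434)  -> True while OmertaAI is running
```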