OmertaAI Documentation
OmertaAI is a local AI assistant running entirely on your hardware. No cloud. No account. No data ever leaves your machine. This documentation covers installation, usage, and the local REST API.
Installation
OmertaAI is distributed as a single self-contained installer. The model weights are bundled inside — no separate download step, no internet connection required after installation.
Windows
Download the .exe installer and run it as administrator. OmertaAI installs to %LOCALAPPDATA%\OmertaAI.
macOS
Requires macOS 13 Ventura or later. Both Apple Silicon and Intel are supported.
Linux
Tested on Ubuntu 22.04+, Debian 12+, Fedora 38+, Arch. Available as AppImage or via the install script.
System Requirements
| Component | Minimum | Recommended |
|---|---|---|
| RAM | 16 GB | 32 GB or more |
| Storage | 30 GB free | 60 GB SSD |
| CPU | 8-core x86_64 / Apple M1 | Apple M2 Pro / AMD Ryzen 9 / Intel i9 |
| GPU (optional) | — | NVIDIA RTX 3080+ (CUDA 12) or Apple GPU |
| OS | Windows 10+, macOS 13+, Ubuntu 22.04+ | Latest stable |
Quick Start
After installation, OmertaAI starts a local server on localhost:11434 and opens the UI in your default browser.
Desktop UI
Open http://localhost:11434 in any browser, or use the native desktop app installed alongside the server. The UI supports multi-turn conversations, file uploads (processed locally), and conversation export to .md or .json.
Conversations are held in memory only. Closing the window or the server process discards the session permanently — by design. You can export any session before closing.
CLI Usage
Model Selection
OmertaAI ships with a quantized build of the Claude model architecture, which scores strongly on public benchmarks for coding, reasoning, writing, and instruction following. Three quantization levels are available:
| Model ID | Quantization | RAM Required | Best For |
|---|---|---|---|
| claude-local-fast | Q4 (4-bit) | 16 GB | Speed, interactive use |
| claude-local | Q5 (5-bit) | 24 GB | Balanced (default) |
| claude-local-max | Q8 (8-bit) | 48 GB | Maximum quality |
Local API Overview
OmertaAI exposes a local REST API on http://localhost:11434 that is fully compatible with the OpenAI Chat API format. Drop it into any tool that supports a custom OpenAI base URL — VS Code extensions, n8n, LangChain, your own scripts.
The server binds to localhost only by default; start it with --host 0.0.0.0 if you need to reach the API from other machines on your network.
Chat Completions
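A minimal non-streaming request using only the Python standard library. The endpoint path /v1/chat/completions and the response shape (choices[0].message.content) follow the OpenAI format the API advertises; the model ID comes from the table above. Treat this as a sketch, not the canonical client.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"

def build_request(messages, model="claude-local"):
    """Build an OpenAI-format chat completion payload."""
    return {"model": model, "messages": messages}

def chat(messages, model="claude-local"):
    """POST a chat completion to the local server and return the reply text."""
    payload = build_request(messages, model)
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style responses put the reply under choices[0].message.content
    return body["choices"][0]["message"]["content"]

# Example (requires a running server):
# chat([{"role": "user", "content": "Say hello in one word."}])
```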
Streaming
Add "stream": true to the request body to receive server-sent events as the model generates tokens.
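A sketch of consuming the event stream, assuming the OpenAI streaming convention: each event is a line of the form data: {...}, token fragments arrive under choices[0].delta.content, and the stream ends with data: [DONE].

```python
import json

def iter_tokens(sse_lines):
    """Yield content tokens from OpenAI-style server-sent event lines."""
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip blank separators and comments
        data = line[len("data: "):]
        if data == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(data)
        delta = chunk["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]  # role-only deltas carry no content
```

Feed it the response body line by line (e.g. iterating over the urllib response object with each line decoded to str) after sending a request with "stream": true.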
Models Endpoint
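Listing the installed models, assuming the OpenAI-standard GET /v1/models response shape ({"data": [{"id": ...}, ...]}):

```python
import json
import urllib.request

def model_ids(body):
    """Extract model IDs from an OpenAI-style models response."""
    return [m["id"] for m in body.get("data", [])]

def list_models(base_url="http://localhost:11434/v1"):
    """Return the model IDs the local server advertises."""
    with urllib.request.urlopen(f"{base_url}/models") as resp:
        return model_ids(json.load(resp))

# Example (requires a running server):
# list_models()  # e.g. the IDs from the model table above
```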
Context & Memory
OmertaAI supports up to 200,000 tokens of context per session — enough for entire codebases, lengthy documents, or long multi-turn conversations. Context is held in RAM only.
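If a long conversation approaches the 200,000-token window, the client must drop or summarize old turns itself. A rough sketch using the common ~4-characters-per-token heuristic (an approximation, not the model's real tokenizer — leave headroom for the reply):

```python
def trim_history(messages, max_tokens=200_000, chars_per_token=4):
    """Drop oldest messages until the estimated token count fits the window.

    Token counts are estimated with a crude chars/4 heuristic; the real
    tokenizer will differ, so budget conservatively.
    """
    def estimate(msg):
        return max(1, len(msg["content"]) // chars_per_token)

    kept = list(messages)
    while kept and sum(estimate(m) for m in kept) > max_tokens:
        kept.pop(0)  # discard the oldest turn first
    return kept
```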
OmertaAI does not persist memory between sessions by default. If you want continuity, export a session and pass it as a system prompt on the next run:
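A sketch of that continuity pattern. It assumes the .json export is a list of {"role": ..., "content": ...} messages — the actual export schema may differ, so adjust the parsing to match what your export contains:

```python
import json
from pathlib import Path

def session_to_system_prompt(export_path):
    """Condense a previously exported session into one system message.

    Assumes the export is a JSON list of {"role": ..., "content": ...}
    messages; adapt if the real export schema differs.
    """
    turns = json.loads(Path(export_path).read_text(encoding="utf-8"))
    transcript = "\n".join(f"{t['role']}: {t['content']}" for t in turns)
    return {
        "role": "system",
        "content": "Context from a previous session:\n" + transcript,
    }

# Prepend the returned message to the next session's messages list.
```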
Integrations
Because the local API is OpenAI-compatible, OmertaAI works with any tool that supports a custom base URL.
| Tool | Configuration |
|---|---|
| VS Code (Continue) | Set baseUrl to http://localhost:11434/v1 |
| n8n | OpenAI node → Custom Base URL → localhost:11434/v1 |
| LangChain (Python) | ChatOpenAI(base_url="http://localhost:11434/v1") |
| Obsidian Copilot | Provider: OpenAI-compatible, URL: localhost:11434/v1 |
| Open WebUI | Add connection pointing to localhost:11434 |
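For the VS Code (Continue) row, an illustrative config fragment. The key names are an assumption — Continue versions have varied between baseUrl and apiBase for the endpoint field — so verify against Continue's own documentation before using:

```json
{
  "models": [
    {
      "title": "OmertaAI (local)",
      "provider": "openai",
      "model": "claude-local",
      "apiBase": "http://localhost:11434/v1",
      "apiKey": "unused"
    }
  ]
}
```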
Privacy — Technical Details
These are verifiable architectural facts about how OmertaAI is built, not policy statements:
The API server listens on 127.0.0.1 by default, so it is unreachable from other machines unless you explicitly change the bind address. You can verify this with netstat -an | grep 11434 or any packet inspector.
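If you prefer to check from code rather than netstat, a small stdlib sketch confirms whether anything accepts TCP connections at a given address and port:

```python
import socket

def accepts_connections(host, port, timeout=0.5):
    """Return True if a TCP listener accepts connections at host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out

# accepts_connections("127.0.0.1", 11434)  -> True while OmertaAI is running
```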