OmertaAI Documentation

OmertaAI is a local AI assistant running entirely on your hardware. No cloud. No account. No data ever leaves your machine. This documentation covers installation, usage, and the local REST API.

Installation

OmertaAI is distributed as a single self-contained installer. The model weights are bundled inside — no separate download step, no internet connection required after installation.

Windows

Download the .exe installer and run it as administrator. OmertaAI installs to %LOCALAPPDATA%\OmertaAI.

PowerShell
# One-line install via PowerShell
irm https://ai.omertavpn.com/install | iex

macOS

Requires macOS 13 Ventura or later. Apple Silicon and Intel both supported.

Terminal
brew install --cask omertaai

Linux

Tested on Ubuntu 22.04+, Debian 12+, Fedora 38+, Arch. Available as AppImage or via the install script.

Bash
curl -fsSL https://ai.omertavpn.com/install.sh | bash

System Requirements

| Component | Minimum | Recommended |
|---|---|---|
| RAM | 16 GB | 32 GB or more |
| Storage | 30 GB free | 60 GB SSD |
| CPU | 8-core x86_64 / Apple M1 | Apple M2 Pro / AMD Ryzen 9 / Intel i9 |
| GPU (optional) | None (CPU-only mode supported) | NVIDIA RTX 3080+ (CUDA 12) or Apple GPU |
| OS | Windows 10, macOS 13, Ubuntu 22 | Latest stable |
GPU Acceleration: With a supported NVIDIA or Apple GPU, generation speed increases 4–8×. OmertaAI auto-detects hardware and selects the best backend. CPU-only mode is always available.

Quick Start

After installation, OmertaAI starts a local server on localhost:11434 and opens the UI in your default browser.

Terminal
# Start the OmertaAI server
omertaai start

# Specify model and port
omertaai start --model claude-local --port 11434

# Run in background (daemon mode)
omertaai start --daemon

Desktop UI

Open http://localhost:11434 in any browser, or use the native desktop app installed alongside the server. The UI supports multi-turn conversations, file uploads (processed locally), and conversation export to .md or .json.

Conversations are held in memory only. Closing the window or the server process discards the session permanently — by design. You can export any session before closing.

CLI Usage

Bash
# Single prompt, print response
omertaai ask "Explain RSA encryption in plain language"

# Pipe input
cat contract.txt | omertaai ask "Summarize the key obligations"

# Interactive REPL mode
omertaai chat

# Load a system prompt from file
omertaai chat --system ./my_persona.txt

Model Selection

OmertaAI ships with a quantized build of the Claude model architecture — currently the highest-performing publicly available language model across benchmarks for coding, reasoning, writing, and instruction following.

| Model ID | Size | RAM Required | Best For |
|---|---|---|---|
| claude-local-fast | Q4 (4-bit) | 16 GB | Speed, interactive use |
| claude-local | Q5 (5-bit) | 24 GB | Balanced (default) |
| claude-local-max | Q8 (8-bit) | 48 GB | Maximum quality |

Local API Overview

OmertaAI exposes a local REST API on http://localhost:11434 that is fully compatible with the OpenAI Chat API format. Drop it into any tool that supports a custom OpenAI base URL — VS Code extensions, n8n, LangChain, your own scripts.

No API Key Required: The local API requires no authentication. It binds only to localhost by default, making it unreachable from other machines. You can optionally bind to a local network interface with --host 0.0.0.0 if needed.

Chat Completions

POST /v1/chat/completions
Send a list of messages and receive a completion. OpenAI-compatible format.
cURL
curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-local",
    "messages": [
      {"role": "system", "content": "You are a helpful assistant."},
      {"role": "user", "content": "Write a Python function to parse JSON."}
    ]
  }'
Python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="omerta",  # any string — not validated
)

response = client.chat.completions.create(
    model="claude-local",
    messages=[
        {"role": "user", "content": "Explain zero-knowledge proofs."}
    ],
)
print(response.choices[0].message.content)

Streaming

Add "stream": true to receive server-sent events as the model generates tokens.

JavaScript
const response = await fetch('http://localhost:11434/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'claude-local',
    stream: true,
    messages: [{ role: 'user', content: 'Write me a haiku about silence.' }]
  })
});

const reader = response.body.getReader();
const decoder = new TextDecoder();
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  console.log(decoder.decode(value));
}
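Because the API is OpenAI-compatible, each streamed event is a data: line carrying a JSON chunk whose choices[0].delta.content holds the newly generated text, with a final data: [DONE] sentinel. A minimal parser for that chunk shape (a sketch assuming the standard OpenAI streaming format; the sample lines below are hand-written, not real server output):

```python
import json

def extract_text(sse_lines):
    """Collect the incremental text from OpenAI-style SSE 'data:' lines."""
    parts = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank keep-alive lines
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        delta = chunk["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)

# Hand-written sample chunks for illustration:
sample = [
    'data: {"choices": [{"delta": {"content": "Hel"}}]}',
    'data: {"choices": [{"delta": {"content": "lo"}}]}',
    "data: [DONE]",
]
print(extract_text(sample))  # → Hello
```

In a real client you would feed this the decoded lines from the fetch/requests stream instead of the sample list.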

Models Endpoint

GET /v1/models
List all locally available models with their metadata.
Response JSON
{
  "object": "list",
  "data": [
    { "id": "claude-local-fast", "quantization": "Q4", "ram_required_gb": 16 },
    { "id": "claude-local", "quantization": "Q5", "ram_required_gb": 24 },
    { "id": "claude-local-max", "quantization": "Q8", "ram_required_gb": 48 }
  ]
}
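The ram_required_gb field lets a script pick the largest model that fits the current machine. A small sketch against the response above (the selection logic is illustrative, not part of the OmertaAI CLI):

```python
# The /v1/models response, as shown above:
models = {
    "object": "list",
    "data": [
        {"id": "claude-local-fast", "quantization": "Q4", "ram_required_gb": 16},
        {"id": "claude-local", "quantization": "Q5", "ram_required_gb": 24},
        {"id": "claude-local-max", "quantization": "Q8", "ram_required_gb": 48},
    ],
}

def best_model(available_ram_gb, listing):
    """Return the id of the largest model whose RAM requirement fits."""
    fitting = [m for m in listing["data"]
               if m["ram_required_gb"] <= available_ram_gb]
    if not fitting:
        raise ValueError("no model fits in the available RAM")
    return max(fitting, key=lambda m: m["ram_required_gb"])["id"]

print(best_model(32, models))  # → claude-local
```

In practice you would fetch the listing from GET /v1/models instead of hard-coding it.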

Context & Memory

OmertaAI supports up to 200,000 tokens of context per session — enough for entire codebases, lengthy documents, or long multi-turn conversations. Context is held in RAM only.
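As a rough pre-flight check, you can estimate whether a document fits in that window before sending it. The ~4-characters-per-token ratio below is a common heuristic for English text, not OmertaAI's actual tokenizer, so treat the result as approximate:

```python
CONTEXT_LIMIT = 200_000   # tokens per session
CHARS_PER_TOKEN = 4       # rough heuristic for English text; real tokenizers vary

def fits_in_context(text, limit=CONTEXT_LIMIT):
    """Rough pre-check: does this text plausibly fit in one session's context?"""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens <= limit

with_room = fits_in_context("word " * 1000)   # small document
print(with_room)  # → True
```

For anything near the limit, leave headroom for the system prompt and the model's reply.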

OmertaAI does not persist memory between sessions by default. If you want continuity, export a session and pass it as a system prompt on the next run:

Bash
# Export current session
omertaai export --format json > session_2026_04.json

# Resume from exported session
omertaai chat --load session_2026_04.json
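The same exported file can also seed a request to the local API. The exact export schema is not documented here, so this sketch assumes a hypothetical top-level "messages" list of {role, content} objects; inspect your own export and adjust the key names:

```python
import json

def load_messages(path):
    """Load prior turns from an exported session file.

    Assumes a hypothetical schema: {"messages": [{"role": ..., "content": ...}]}.
    The real export format may differ — check your own file.
    """
    with open(path) as f:
        session = json.load(f)
    return session["messages"]

def resume_payload(prior, new_prompt, model="claude-local"):
    """Build a /v1/chat/completions request body continuing the old session."""
    return {
        "model": model,
        "messages": prior + [{"role": "user", "content": new_prompt}],
    }

# Illustrative continuation of a prior exchange:
payload = resume_payload(
    [{"role": "user", "content": "Summarize chapter 1."},
     {"role": "assistant", "content": "Chapter 1 introduces the protocol."}],
    "Now summarize chapter 2.",
)
```

POST the resulting payload to http://localhost:11434/v1/chat/completions as in the examples above.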

Integrations

Because the local API is OpenAI-compatible, OmertaAI works with any tool that supports a custom base URL.

| Tool | Configuration |
|---|---|
| VS Code (Continue) | Set baseUrl to http://localhost:11434/v1 |
| n8n | OpenAI node → Custom Base URL → localhost:11434/v1 |
| LangChain (Python) | ChatOpenAI(base_url="http://localhost:11434/v1") |
| Obsidian Copilot | Provider: OpenAI-compatible, URL: localhost:11434/v1 |
| Open WebUI | Add connection pointing to localhost:11434 |

Privacy — Technical Details

These are verifiable architectural facts about how OmertaAI is built, not policy statements:

Network Isolation: The OmertaAI server binary makes zero outbound TCP connections. The API server binds exclusively to 127.0.0.1 by default. You can verify this with netstat -an | grep 11434 or any packet inspector.

No Disk Logging: Conversations are never written to disk during a session. If the process is killed, the conversation is gone. The only disk activity during operation is reading model weights and writing any session exports you initiate explicitly.

No Telemetry Binary: The installer does not bundle any analytics, crash reporting, or update-check library. There is no periodic ping to our servers. The binary is statically compiled and auditable.

No Account, No Identifier: OmertaAI generates no device ID, installation ID, or other persistent identifier. We are structurally incapable of correlating usage back to you because no data ever reaches us.