FileGrind Documentation

Break files into pieces. Find connections across them.

Overview

FileGrind is a macOS application that breaks documents into structured pieces and helps you find connections across them. It runs AI models locally, stores everything on your Mac, and works offline once its models are downloaded.

Core Concepts

  • Chips - Structured pieces extracted from files (pages, sections, images)
  • Blocks - Collections of chips from one or many files
  • Capabilities - Operations you can perform on files and chips
  • Plugins - External binaries that add new capabilities

What it's for

FileGrind is useful when you have many documents and want to find connections between them: researchers with hundreds of papers, students with textbooks and notes, or anyone who needs to search across a collection of files regardless of format.

What it's not

FileGrind isn't a file manager or a document reader. It doesn't organize where your files are stored, and it doesn't replace your PDF reader or ebook app. When you want to read a document, FileGrind opens it in your default app.

Architecture

FileGrind runs as two processes: a Rust backend engine and a Swift frontend app. They communicate via gRPC.

The Engine

The backend engine handles:

  • Database - SQLite for listings, chips, collections, and metadata
  • Search - Tantivy-based full-text search with multiple query types
  • LLM - Local inference using llama.cpp
  • Embeddings - Vector generation using Candle
  • Plugins - Discovery and execution of capability plugins

The App

The Swift frontend provides:

  • Library browser - View and organize your documents
  • Search interface - Query across all your files
  • Details panel - View metadata and chips for a document
  • Plugin manager - Install and manage plugins

Data Storage

All data is stored locally in ~/Library/Application Support/FileGrind/. This includes the SQLite database, search indexes, extracted content, and downloaded models.

Chips

A chip is a structured piece of content extracted from a file. When you add a file to FileGrind, it gets "ground" into chips.

What becomes a chip

  • PDF pages - Each page becomes a chip with extracted text and images
  • EPUB sections - Chapters and sections become individual chips
  • Text blocks - Paragraphs or sections from plain text files
  • Images - Standalone images or those extracted from documents

Chip properties

Every chip carries metadata:

  • Source - Which file it came from
  • Position - Page number, section index, or offset
  • Content type - Text, image, table, or other
  • Embeddings - Vector representation for semantic search
  • Tags - User-defined or auto-generated labels
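
The properties above can be pictured as a small record. The sketch below is illustrative only: the class name `Chip` and every field name are assumptions made for this example, not FileGrind's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Chip:
    """Hypothetical in-memory view of a chip; all field names are assumptions."""
    source: str                                # file the chip came from
    position: int                              # page number, section index, or offset
    content_type: str                          # "text", "image", "table", or "other"
    embedding: Optional[List[float]] = None    # set once embeddings are generated
    tags: List[str] = field(default_factory=list)


page = Chip(source="papers/attention.pdf", position=3, content_type="text")
page.tags.append("transformers")
print(page.source, page.position, page.tags)
```

Keeping the embedding optional reflects the idea that a chip can exist before any model has processed it.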

Chip URN

Chips are identified by a capability URN that describes their type. See capns.org for details on URN format.

cap:type=page-content     # A page of text
cap:type=thumbnail        # A thumbnail image
cap:type=outline          # Document outline/TOC
cap:type=book-info        # Bibliographic metadata
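
A URN of this shape can be split into its tags with a few lines of code. This is a simplified sketch that handles only the `cap:key=value;key=value` form shown in this document; the function name `parse_cap_urn` is hypothetical, and the authoritative rules are the ones published at capns.org.

```python
def parse_cap_urn(urn: str) -> dict:
    """Split a cap URN into key/value tags (simplified; not the full CAPNS rules)."""
    prefix = "cap:"
    if not urn.startswith(prefix):
        raise ValueError(f"not a cap URN: {urn!r}")
    tags = {}
    for part in urn[len(prefix):].split(";"):
        if not part:
            continue  # tolerate a trailing ';'
        key, sep, value = part.partition("=")
        if not sep or not key or not value:
            raise ValueError(f"expected key=value, got {part!r}")
        tags[key] = value
    return tags


print(parse_cap_urn("cap:type=page-content"))
print(parse_cap_urn("cap:op=extract;format=pdf;target=text"))
```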

Blocks

A block is a collection of chips. You can group chips from a single file or gather related chips from across your entire library.

Creating blocks

  • Manual selection - Select chips and group them
  • Search results - Save a search as a block
  • Smart collections - Blocks that auto-update based on criteria
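
One way to think about a smart collection is as a saved filter that is re-evaluated whenever the block is read. The sketch below assumes that model; `SmartBlock`, `criteria`, and the dictionary-based chip records are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

ChipRecord = Dict[str, Any]  # stand-in for a real chip record


@dataclass
class SmartBlock:
    name: str
    criteria: Callable[[ChipRecord], bool]

    def chips(self, library: List[ChipRecord]) -> List[ChipRecord]:
        # Membership is recomputed on every call, so newly added chips
        # that satisfy the criteria show up without any manual step.
        return [chip for chip in library if self.criteria(chip)]


library = [
    {"source": "deep-learning.pdf", "tags": ["ml"]},
    {"source": "cookbook.epub", "tags": ["food"]},
]
ml_block = SmartBlock("Machine Learning", lambda chip: "ml" in chip["tags"])
before = len(ml_block.chips(library))

library.append({"source": "course-notes.txt", "tags": ["ml", "course"]})
after = len(ml_block.chips(library))
print(before, after)
```

A manually selected block, by contrast, would store a fixed list of chips rather than a predicate.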

Using blocks

Once you have a block:

  • Apply capabilities to all chips in the block
  • Export the block's content
  • Use it as context for AI operations
  • Link it to other blocks

Example

Block: "Machine Learning Fundamentals"
├── Deep Learning (PDF) - pages 1-50
├── Pattern Recognition (EPUB) - chapter 2
├── Course notes (TXT) - sections 1-3
└── 4 research papers - abstracts

Capabilities

A capability is an operation you can perform on files or chips, such as extracting text from a PDF, parsing an EPUB's structure, or generating embeddings.

How capabilities work

  1. Select - Choose files or chips as input
  2. Match - FileGrind finds a capability that handles the input type
  3. Execute - The capability runs, either in a plugin or as a built-in
  4. Output - Results are stored as new chips or metadata

Built-in capabilities

  • PDF extraction - Text, images, and structure from PDFs
  • EPUB parsing - Chapters, sections, and metadata from EPUBs
  • Text processing - Plain text and markdown files
  • Thumbnail generation - Visual previews of pages
  • Metadata extraction - Title, author, keywords from files
  • Embedding generation - Vector representations for search

CAPNS

Capabilities use the CAPNS naming system. Each capability has a URN like cap:op=extract;format=pdf;target=text. When you need an operation, FileGrind matches your request to available capabilities and picks the most specific one.
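
To make "most specific" concrete, here is a minimal sketch under two stated assumptions: a capability matches a request when every tag it declares appears in the request with the same value, and a capability with more declared tags counts as more specific. The real CAPNS matching rules may differ, and the function names are hypothetical.

```python
from typing import Dict, List, Optional

Tags = Dict[str, str]


def matches(capability: Tags, request: Tags) -> bool:
    # Assumed rule: every tag the capability declares must be present
    # in the request with an identical value.
    return all(request.get(key) == value for key, value in capability.items())


def pick_most_specific(capabilities: List[Tags], request: Tags) -> Optional[Tags]:
    candidates = [cap for cap in capabilities if matches(cap, request)]
    # Assumed tie-breaker: more declared tags means more specific.
    return max(candidates, key=len, default=None)


available = [
    {"op": "extract"},
    {"op": "extract", "format": "pdf"},
    {"op": "extract", "format": "epub"},
]
request = {"op": "extract", "format": "pdf", "target": "text"}
print(pick_most_specific(available, request))
```

Here the generic `op=extract` capability and the PDF-specific one both match, and the PDF-specific one wins because it declares more tags.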

Read the full capabilities guide →

Plugins

Plugins are external binaries that provide capabilities. They run in a sandboxed XPC service, separate from the main app.

Bundled plugins

  • pdfczar - PDF processing using PDFium
  • txtczar - Text and markdown processing

Installing plugins

Plugins are distributed as signed .pkg installers. Download from the Plugins tab, run the installer, and FileGrind detects the new plugin automatically.

Plugins install to /Library/Application Support/FileGrind/Plugins/.

Plugin discovery

When FileGrind starts, the XPC service:

  1. Scans plugin directories for executables
  2. Runs each with the manifest argument
  3. Parses the JSON manifest to get capabilities
  4. Registers capabilities with the router
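
The four steps above can be sketched as a short loop. This is an illustration, not the XPC service's code: the function names are hypothetical, and the manifest layout (a JSON object with a `capabilities` list of URNs) is an assumption, since the manifest schema is not described here.

```python
import json
import os
import subprocess
from typing import Callable, Dict


def run_manifest(executable: str) -> str:
    # Step 2: run the plugin with the "manifest" argument and capture stdout.
    result = subprocess.run(
        [executable, "manifest"],
        capture_output=True, text=True, check=True, timeout=10,
    )
    return result.stdout


def discover(plugin_dir: str, runner: Callable[[str], str] = run_manifest) -> Dict[str, str]:
    """Map each advertised capability URN to the plugin that provides it."""
    registry: Dict[str, str] = {}
    for name in sorted(os.listdir(plugin_dir)):
        path = os.path.join(plugin_dir, name)
        # Step 1: only executable files are treated as plugins.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            continue
        try:
            manifest = json.loads(runner(path))  # step 3: parse the JSON manifest
        except (OSError, subprocess.SubprocessError, json.JSONDecodeError):
            continue  # skip plugins that fail to produce a valid manifest
        if not isinstance(manifest, dict):
            continue
        # Step 4: register capabilities ("capabilities" key is an assumed field name).
        for urn in manifest.get("capabilities", []):
            registry[urn] = path
    return registry
```

The `runner` parameter exists so the loop can be exercised without real plugin binaries; calling `discover` with only a directory path runs each executable it finds.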

Plugin protocol

Plugins communicate via stdin/stdout:

  • Input - JSON on stdin with arguments and file paths
  • Output - JSON on stdout for structured data
  • Errors - Non-zero exit code, error message on stderr

echo '{"cap":"cap:extract;format=pdf","input":"/path/to/file.pdf"}' | ./pdfczar
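
From the plugin's side, the protocol amounts to "read JSON, do the work, write JSON or fail". The sketch below shows that shape under assumptions: the request fields `cap` and `input` come from the command above, while the response fields and the `handle` function are hypothetical. A real plugin would pass `sys.stdin.read()` to `handle`, print the result to stdout or the error to stderr, and exit with the returned code.

```python
import json
from typing import Tuple


def handle(raw_request: str) -> Tuple[int, str, str]:
    """Return (exit_code, stdout_text, stderr_text) for one request."""
    try:
        request = json.loads(raw_request)
        cap, path = request["cap"], request["input"]
    except (json.JSONDecodeError, KeyError, TypeError) as exc:
        # Errors: non-zero exit code, message destined for stderr.
        return 1, "", f"bad request: {exc}"
    # A real plugin would process the file here; this sketch only echoes
    # the request back. The response shape is an assumption.
    response = {"cap": cap, "input": path, "status": "ok"}
    return 0, json.dumps(response), ""


code, out, err = handle('{"cap":"cap:extract;format=pdf","input":"/path/to/file.pdf"}')
print(code, out)

bad_code, bad_out, bad_err = handle("not json")
print(bad_code, bad_err)
```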

Security

All distributed plugins must be:

  • Signed with a Developer ID certificate
  • Notarized by Apple
  • Signed by FileGrind's team ID (P336JK947M)

Building plugins

Want to add support for a new file type? See the plugin development guide for how to build, test, and publish plugins.

AI Models

FileGrind runs AI models locally on your Mac. No API keys, no cloud processing.

Model types

  • Embedding models - Generate vectors for semantic search
  • LLMs - Text analysis, summarization, classification
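
To illustrate how embedding vectors support semantic search, the sketch below ranks chips by cosine similarity to a query vector. Cosine similarity is a common choice, but the metric FileGrind actually uses is not stated here, so treat it as an assumption. The vectors are toy values and the chip identifiers are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query: List[float], chips: Dict[str, List[float]]) -> List[Tuple[str, float]]:
    # Score every chip against the query, best match first.
    scored = [(chip_id, cosine(query, vec)) for chip_id, vec in chips.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy 3-dimensional vectors; real embedding models produce hundreds of dimensions.
chip_vectors = {
    "page-1": [1.0, 0.0, 0.0],
    "page-2": [0.7, 0.7, 0.0],
    "page-3": [0.0, 0.0, 1.0],
}
results = rank([1.0, 0.1, 0.0], chip_vectors)
for chip_id, score in results:
    print(f"{chip_id}: {score:.2f}")
```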

Default models

  • all-MiniLM-L6-v2 - Sentence embeddings for search
  • Mistral-7B-Instruct - Document analysis

Model management

FileGrind uses modelczar for model downloads. Models come from HuggingFace and are cached locally.

Models are stored in ~/.cache/modelczar/.

Running models

LLMs run via llama.cpp. Embedding models run via Candle. Both can use Metal for GPU acceleration on Apple Silicon when available.

Supported File Types

FileGrind launches with support for common document formats. More are added through plugins.

Beta launch

  • PDF - Full text extraction, images, structure, thumbnails
  • EPUB - Chapters, sections, embedded images
  • TXT/MD - Plain text and markdown files
  • Images - PNG, JPG, GIF, WebP

Coming via plugins

  • 3D models (OBJ, FBX, GLTF)
  • Audio files (MP3, WAV, FLAC)
  • Video files (MP4, MOV)
  • Code files (syntax-aware parsing)
  • Spreadsheets (XLSX, CSV)

Privacy

Your files stay on your Mac. FileGrind is designed for local operation.

What stays local

  • All your files
  • All extracted content and chips
  • All search indexes
  • All AI processing
  • All metadata and tags

Network usage

FileGrind connects to the internet only for:

  • Downloading AI models (one-time, from HuggingFace)
  • Checking for app updates (optional)
  • Downloading plugins (optional)

Offline mode

Once models are downloaded, FileGrind works offline. Searching, grinding files, and running AI analysis require no network connection.