FileGrind Documentation
Break files into pieces. Find connections across them.
Overview
FileGrind is a macOS application that breaks documents into structured pieces and helps you find connections across them. It runs AI models locally, stores everything on your Mac, and works offline.
Core Concepts
- Chips - Structured pieces extracted from files (pages, sections, images)
- Blocks - Collections of chips from one or many files
- Capabilities - Operations you can perform on files and chips
- Plugins - External binaries that add new capabilities
What it's for
FileGrind is useful when you have many documents and want to find connections between them. Researchers with hundreds of papers. Students with textbooks and notes. Anyone who needs to search across a collection of files regardless of format.
What it's not
FileGrind isn't a file manager or a document reader. It doesn't organize where your files are stored, and it doesn't replace your PDF reader or ebook app. When you want to read a document, FileGrind opens it in your default app.
Architecture
FileGrind runs as two processes: a Rust backend engine and a Swift frontend app. They communicate via gRPC.
The Engine
The backend engine handles:
- Database - SQLite for listings, chips, collections, and metadata
- Search - Tantivy-based full-text search with multiple query types
- LLM - Local inference using llama.cpp
- Embeddings - Vector generation using Candle
- Plugins - Discovery and execution of capability plugins
The App
The Swift frontend provides:
- Library browser - View and organize your documents
- Search interface - Query across all your files
- Details panel - View metadata and chips for a document
- Plugin manager - Install and manage plugins
Data Storage
All data is stored locally in ~/Library/Application Support/FileGrind/.
This includes the SQLite database, search indexes, extracted content, and downloaded models.
Chips
A chip is a structured piece of content extracted from a file. When you add a file to FileGrind, it gets "ground" into chips.
What becomes a chip
- PDF pages - Each page becomes a chip with extracted text and images
- EPUB sections - Chapters and sections become individual chips
- Text blocks - Paragraphs or sections from plain text files
- Images - Standalone images or those extracted from documents
Chip properties
Every chip carries metadata (a rough record sketch follows this list):
- Source - Which file it came from
- Position - Page number, section index, or offset
- Content type - Text, image, table, or other
- Embeddings - Vector representation for semantic search
- Tags - User-defined or auto-generated labels
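Conceptually, a chip record bundles these fields together. The following is a minimal Rust sketch with hypothetical field and type names, not FileGrind's actual SQLite schema:

```rust
// Hypothetical sketch of a chip record; names are illustrative only.
struct Chip {
    id: u64,                     // primary key in the engine's database
    source_file: String,         // which file the chip came from
    urn: String,                 // e.g. "cap:type=page-content"
    position: Position,          // where in the file the content sits
    content_type: ContentType,   // text, image, table, or other
    text: Option<String>,        // extracted text, if any
    embedding: Option<Vec<f32>>, // vector used for semantic search
    tags: Vec<String>,           // user-defined or auto-generated labels
}

enum Position {
    Page(u32),    // PDF page number
    Section(u32), // EPUB section index
    Offset(u64),  // byte offset in a text file
}

enum ContentType {
    Text,
    Image,
    Table,
    Other,
}
```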
Chip URN
Chips are identified by a capability URN that describes their type. See capns.org for details on URN format.
```
cap:type=page-content   # A page of text
cap:type=thumbnail      # A thumbnail image
cap:type=outline        # Document outline/TOC
cap:type=book-info      # Bibliographic metadata
```
Blocks
A block is a collection of chips. You can group chips from a single file or gather related chips from across your entire library.
Creating blocks
- Manual selection - Select chips and group them
- Search results - Save a search as a block
- Smart collections - Blocks that auto-update based on criteria
Using blocks
Once you have a block:
- Apply capabilities to all chips in the block
- Export the block's content
- Use it as context for AI operations
- Link it to other blocks
Example
Block: "Machine Learning Fundamentals" ├── Deep Learning (PDF) - pages 1-50 ├── Pattern Recognition (EPUB) - chapter 2 ├── Course notes (TXT) - sections 1-3 └── 4 research papers - abstracts
Capabilities
A capability is an operation you can perform on files or chips. Extract text from a PDF. Parse an EPUB's structure. Generate embeddings.
How capabilities work
- Select - Choose files or chips as input
- Match - FileGrind finds a capability that handles the input type
- Execute - The capability runs (via a plugin or built-in)
- Output - Results are stored as new chips or metadata
Built-in capabilities
- PDF extraction - Text, images, and structure from PDFs
- EPUB parsing - Chapters, sections, and metadata from EPUBs
- Text processing - Plain text and markdown files
- Thumbnail generation - Visual previews of pages
- Metadata extraction - Title, author, keywords from files
- Embedding generation - Vector representations for search
CAPNS
Capabilities use the CAPNS naming system.
Each capability has a URN like cap:op=extract;format=pdf;target=text.
When you need an operation, FileGrind matches your request to available capabilities
and picks the most specific one.
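The exact matching rules live in the engine, but the idea can be sketched. Assuming, hypothetically, that a capability matches a request when it carries every attribute the request asks for, and that "most specific" means the largest number of attributes, selection might look like this:

```rust
use std::collections::HashMap;

/// Parse "cap:op=extract;format=pdf;target=text" into key/value attributes.
/// Illustrative only; the real CAPNS grammar is defined at capns.org.
fn parse_urn(urn: &str) -> HashMap<String, String> {
    urn.trim_start_matches("cap:")
        .split(';')
        .filter_map(|pair| pair.split_once('='))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

/// A capability satisfies a request when it has every attribute the request names.
fn satisfies(request: &HashMap<String, String>, capability: &HashMap<String, String>) -> bool {
    request.iter().all(|(k, v)| capability.get(k) == Some(v))
}

/// Pick the matching capability with the most attributes ("most specific").
fn pick_most_specific<'a>(request: &str, available: &[&'a str]) -> Option<&'a str> {
    let req = parse_urn(request);
    available
        .iter()
        .copied()
        .filter(|cap| satisfies(&req, &parse_urn(cap)))
        .max_by_key(|cap| parse_urn(cap).len())
}

fn main() {
    let available = [
        "cap:op=extract;format=pdf",
        "cap:op=extract;format=pdf;target=text",
    ];
    // Both capabilities match; the one with more attributes wins.
    let chosen = pick_most_specific("cap:op=extract;format=pdf", &available);
    println!("{chosen:?}"); // Some("cap:op=extract;format=pdf;target=text")
}
```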
Plugins
Plugins are external binaries that provide capabilities. They run in a sandboxed XPC service, separate from the main app.
Bundled plugins
- pdfczar - PDF processing using PDFium
- txtczar - Text and markdown processing
Installing plugins
Plugins are distributed as signed .pkg installers.
Download from the Plugins tab, run the installer, and FileGrind
detects the new plugin automatically.
Plugins install to /Library/Application Support/FileGrind/Plugins/.
Plugin discovery
When FileGrind starts, the XPC service:
- Scans plugin directories for executables
- Runs each with the manifest argument
- Parses the JSON manifest to get capabilities (a manifest-parsing sketch follows this list)
- Registers capabilities with the router
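The real manifest schema is defined by the plugin development guide; the shape below is a hypothetical illustration of the parsing step, assuming serde (with derive) and serde_json dependencies:

```rust
use serde::Deserialize;

// Hypothetical manifest shape; the actual schema may differ.
#[derive(Debug, Deserialize)]
struct Manifest {
    name: String,
    version: String,
    capabilities: Vec<String>, // CAPNS URNs the plugin provides
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // In FileGrind this JSON would come from running the plugin binary
    // with its manifest argument and capturing stdout.
    let json = r#"{
        "name": "pdfczar",
        "version": "1.0.0",
        "capabilities": ["cap:op=extract;format=pdf;target=text"]
    }"#;

    let manifest: Manifest = serde_json::from_str(json)?;
    for cap in &manifest.capabilities {
        println!("registering {cap}"); // hand off to the capability router
    }
    Ok(())
}
```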
Plugin protocol
Plugins communicate via stdin/stdout:
- Input - JSON on stdin with arguments and file paths
- Output - JSON on stdout for structured data
- Errors - Non-zero exit code, error message on stderr
echo '{"cap":"cap:extract;format=pdf","input":"/path/to/file.pdf"}' | ./pdfczar
Security
All distributed plugins must be:
- Signed with a Developer ID certificate
- Notarized by Apple
- Signed by FileGrind's team ID (P336JK947M)
Building plugins
Want to add support for a new file type? See the plugin development guide for how to build, test, and publish plugins.
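As a rough picture of what a plugin binary does, here is a minimal skeleton that follows the stdin/stdout protocol above. It is a sketch, not a template from the development guide: the response fields are hypothetical and it assumes a serde_json dependency.

```rust
use std::io::Read;

fn main() {
    // Read the JSON request from stdin.
    let mut input = String::new();
    if std::io::stdin().read_to_string(&mut input).is_err() {
        eprintln!("failed to read request from stdin");
        std::process::exit(1); // errors: non-zero exit, message on stderr
    }

    // Parse it; serde_json::Value avoids committing to a schema here.
    let request: serde_json::Value = match serde_json::from_str(&input) {
        Ok(v) => v,
        Err(e) => {
            eprintln!("invalid request JSON: {e}");
            std::process::exit(1);
        }
    };

    // A real plugin would dispatch on the requested capability and
    // process the file named in "input". Here we just echo a result.
    let response = serde_json::json!({
        "cap": request["cap"],
        "chips": [] // structured results would go here
    });
    println!("{response}"); // output: JSON on stdout
}
```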
Search
FileGrind search works across all your files, regardless of format. One query, one result set.
Search types
- Keyword - Traditional text matching
- Semantic - Find related content via embeddings
- Filtered - Limit by file type, date, tags, or metadata
How it works
FileGrind uses Tantivy for full-text search. When files are ground, their text content is indexed. Semantic search uses vector embeddings generated by the embedding model.
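Keyword queries go through the Tantivy index; on the semantic side, the core operation is comparing a query embedding against chip embeddings. A minimal sketch of that comparison, using cosine similarity over f32 vectors and toy data rather than real model output:

```rust
/// Cosine similarity between two embedding vectors.
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

fn main() {
    // Toy 4-dimensional embeddings; a real model such as all-MiniLM-L6-v2
    // produces 384-dimensional vectors.
    let query = [0.9_f32, 0.1, 0.0, 0.2];
    let chips = [
        ("Deep Learning (PDF), page 45", [0.8_f32, 0.2, 0.1, 0.1]),
        ("Course notes (TXT), section 3", [0.1_f32, 0.9, 0.3, 0.0]),
    ];

    // Rank chips by similarity to the query embedding.
    let mut ranked: Vec<_> = chips
        .iter()
        .map(|(name, emb)| (*name, cosine(&query, emb)))
        .collect();
    ranked.sort_by(|a, b| b.1.total_cmp(&a.1));
    for (name, score) in ranked {
        println!("{score:.3}  {name}");
    }
}
```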
Search results
Results are chips, not files. A search for "neural networks" might return:
```
Deep Learning (PDF) - pages 45, 67, 89-92
Pattern Recognition (EPUB) - chapter 6
Course notes (TXT) - section 3
Research Paper A (PDF) - page 2
```
Saved searches
Any search can be saved as a block. The block updates automatically when new matching content is added.
AI Models
FileGrind runs AI models locally on your Mac. No API keys, no cloud processing.
Model types
- Embedding models - Generate vectors for semantic search
- LLMs - Text analysis, summarization, classification
Default models
- all-MiniLM-L6-v2 - Sentence embeddings for search
- Mistral-7B-Instruct - Document analysis
Model management
FileGrind uses modelczar for model downloads.
Models come from HuggingFace and are cached locally.
Models are stored in ~/.cache/modelczar/.
Running models
LLMs run via llama.cpp. Embedding models run via Candle. Both use Metal acceleration on Apple Silicon when available.
Supported File Types
FileGrind launches with support for common document formats. More are added through plugins.
Beta launch
| Format | Support |
| --- | --- |
| PDF | Full text extraction, images, structure, thumbnails |
| EPUB | Chapters, sections, embedded images |
| TXT/MD | Plain text and markdown files |
| Images | PNG, JPG, GIF, WebP |
Coming via plugins
- 3D models (OBJ, FBX, GLTF)
- Audio files (MP3, WAV, FLAC)
- Video files (MP4, MOV)
- Code files (syntax-aware parsing)
- Spreadsheets (XLSX, CSV)
Privacy
Your files stay on your Mac. FileGrind is designed for local operation.
What stays local
- All your files
- All extracted content and chips
- All search indexes
- All AI processing
- All metadata and tags
Network usage
FileGrind connects to the internet only for:
- Downloading AI models (one-time, from HuggingFace)
- Checking for app updates (optional)
- Downloading plugins (optional)
Offline mode
Once models are downloaded, FileGrind works offline. Search, grind files, run AI analysis—no network required.