apple-fm-mcp is an MCP server that exposes Apple's on-device Foundation Models to any MCP client — Claude Code, Cursor, Windsurf, or anything that speaks the protocol. All inference runs locally on Apple Silicon with zero API cost and full privacy.
The project is open source at github.com/yihan2099/apple-fm-mcp.
Why I Built It
Apple Intelligence ships on-device language models with macOS 26, but using them requires Apple's native tools or direct SDK integration. There was no way for Claude Code or other MCP clients to call Apple's models as tools. I wanted to bridge that gap: wrap Apple's Foundation Models in a standard MCP server so any agent could use on-device inference for free, private generation without writing Apple-specific code.
What It Does
The server exposes eight MCP tools built on Apple's apple-fm-sdk:
- generate — One-shot text generation with optional system instructions.
- generate_structured — JSON output with schema validation. Handles Apple's non-standard schema requirements automatically.
- chat — Multi-turn conversations with named sessions and context persistence.
- tag_content — Content classification using Apple's optimized CONTENT_TAGGING model.
- check_model_availability — Graceful availability checking with human-readable error messages.
- list_sessions / clear_session / clear_all_sessions — Session management for multi-turn state.
Three prompt templates (summarize, extract_structured, classify) and three resources (device status, sessions, transcripts) round out the interface.
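For a sense of what a client actually sends, here is a minimal sketch of an MCP tools/call request for the generate tool. The JSON-RPC envelope follows the MCP protocol; the argument names (prompt, instructions) are assumptions for illustration and may not match the server's real parameter names.

```python
import json


def tool_call_request(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 message that asks an MCP server to run a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


# Hypothetical argument names; check the server's tool schema for the real ones.
message = tool_call_request(
    1,
    "generate",
    {"prompt": "Summarize this paragraph.", "instructions": "Be concise."},
)
```

In practice the MCP client builds this message for you; the sketch only shows the shape of the call that ends up at the server.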
Technical Decisions
Schema normalization was the hardest part. Apple's SDK requires JSON schema fields that are not part of the standard spec: additionalProperties: false, an x-order array, all properties marked required, a title field, and — the biggest gotcha — string properties must include an enum array. The _normalize_schema() function injects these automatically so callers can send standard JSON Schema and get valid results.
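A rough sketch of what that normalization can look like, under stated assumptions: this is not the project's actual _normalize_schema(), it only handles the top level of an object schema, and the default title "Output" is made up. How the real function deals with string properties that lack an enum is not covered here, so the sketch simply reports them back to the caller.

```python
from copy import deepcopy


def normalize_schema(schema: dict, title: str = "Output") -> tuple[dict, list[str]]:
    """Return (normalized_schema, names_of_string_properties_without_enum).

    Illustrative only: top-level properties, no nested objects or arrays.
    """
    out = deepcopy(schema)  # never mutate the caller's schema
    props = out.get("properties", {})

    out.setdefault("title", title)      # title is required
    out["additionalProperties"] = False  # must be explicitly false
    out["x-order"] = list(props)         # property order, as declared
    out["required"] = list(props)        # every property marked required

    # String properties need an enum array; flag the ones that lack it.
    missing_enum = [
        name for name, spec in props.items()
        if spec.get("type") == "string" and "enum" not in spec
    ]
    return out, missing_enum
```

Calling it with a standard schema leaves the input untouched and returns a copy with the extra fields filled in, plus the list of string properties that still need attention.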
Use-case-specific models matter. The GENERAL use case handles generation and chat. The CONTENT_TAGGING use case is optimized for classification but rejects the instructions parameter: passing system instructions causes it to error. Discovering this required live testing against the pre-release SDK, not reading documentation.
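One way to keep that constraint away from callers is to decide which arguments to pass before a session is ever created. The sketch below is an assumption about how that guard could look: the UseCase enum and session_kwargs() helper are stand-ins defined here, not names from apple-fm-sdk.

```python
from enum import Enum
from typing import Optional


class UseCase(Enum):
    """Stand-in for the SDK's use cases (hypothetical, for illustration)."""
    GENERAL = "general"
    CONTENT_TAGGING = "content_tagging"


def session_kwargs(use_case: UseCase, instructions: Optional[str] = None) -> dict:
    """Build session arguments, omitting instructions where they would error."""
    kwargs = {"use_case": use_case}
    # CONTENT_TAGGING errors when given instructions, so only pass them along
    # for use cases that accept them.
    if instructions and use_case is not UseCase.CONTENT_TAGGING:
        kwargs["instructions"] = instructions
    return kwargs
```

Dropping the instructions quietly is one design choice; raising a clear error to the caller would be an equally reasonable alternative.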
In-memory sessions are a deliberate trade-off. Apple's LanguageModelSession holds internal state that cannot be serialized, so conversations are lost on server restart. For a dev tool running locally, this is acceptable. Persisting sessions would require replaying message history on reconnection, which adds complexity without clear value.
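A minimal sketch of an in-memory store for named sessions, assuming a plain dictionary keyed by session name. The SessionStore class and its method names are hypothetical; the factory argument stands in for whatever creates a real LanguageModelSession.

```python
from typing import Any, Callable, Dict, List


class SessionStore:
    """Keeps named sessions in process memory; everything is lost on restart."""

    def __init__(self, factory: Callable[[], Any]):
        self._factory = factory            # creates a fresh session object
        self._sessions: Dict[str, Any] = {}

    def get(self, name: str) -> Any:
        """Return the named session, creating it on first use."""
        if name not in self._sessions:
            self._sessions[name] = self._factory()
        return self._sessions[name]

    def names(self) -> List[str]:
        """Sorted names of all live sessions."""
        return sorted(self._sessions)

    def clear(self, name: str) -> bool:
        """Drop one session; True if it existed."""
        return self._sessions.pop(name, None) is not None

    def clear_all(self) -> int:
        """Drop every session and return how many were removed."""
        count = len(self._sessions)
        self._sessions.clear()
        return count
```

The three management methods line up with the list_sessions, clear_session, and clear_all_sessions tools, though the real server may organize this differently.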
Availability as a check, not an exception. Functions return (bool, message) tuples with user-friendly messages for common failures: Apple Intelligence not enabled, model still downloading, device not eligible (requires Apple Silicon). This lets clients handle unavailability gracefully rather than crashing.
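The (bool, message) pattern can be sketched as below. The reason codes and message wording are assumptions made up for illustration; the real SDK reports unavailability in its own way, which the server would map onto something like this.

```python
from enum import Enum
from typing import Optional, Tuple


class Unavailable(Enum):
    """Hypothetical reason codes mirroring the failures listed above."""
    NOT_ENABLED = "not_enabled"
    MODEL_DOWNLOADING = "model_downloading"
    DEVICE_NOT_ELIGIBLE = "device_not_eligible"


MESSAGES = {
    Unavailable.NOT_ENABLED: "Apple Intelligence is not enabled. Turn it on in System Settings.",
    Unavailable.MODEL_DOWNLOADING: "The model is still downloading. Try again shortly.",
    Unavailable.DEVICE_NOT_ELIGIBLE: "This device is not eligible. Apple Silicon is required.",
}


def describe_availability(reason: Optional[Unavailable]) -> Tuple[bool, str]:
    """Turn an unavailability reason (or None) into an (ok, message) tuple."""
    if reason is None:
        return True, "Model is available."
    return False, MESSAGES.get(reason, "Model is unavailable for an unknown reason.")
```

A tool handler can then check the boolean and return the message as ordinary tool output instead of letting an exception propagate to the client.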
What I Learned
Building against a pre-release SDK means the documentation is incomplete. Class names, method signatures, and parameter behaviors all differed between the beta documentation and the actual API. I documented every constraint I discovered in SDK_API_REFERENCE.md: which schema fields Apple requires, which use cases accept instructions, and how availability checking actually works. That reference file may be more valuable than the server code itself.
The MCP wrapper pattern — take a proprietary SDK, normalize its quirks, expose it as standard tools — is broadly applicable. The hard part is never the MCP layer. It is understanding the underlying SDK well enough to abstract its constraints without leaking them to callers. FastMCP made the MCP side trivial; the Apple SDK side took 80% of the effort.