Agent skills for iOS, iPadOS, Swift, SwiftUI, and modern Apple framework development.
Guide for selecting, deploying, and optimizing on-device ML models. Covers Apple Foundation Models, Core ML, MLX Swift, and llama.cpp.
Use this decision tree to pick the right framework for your use case.
When to use Apple Foundation Models: Text generation, summarization, entity extraction, structured output, and short dialog on iOS 26+ / macOS 26+ devices with Apple Intelligence enabled. Zero setup: no API keys, no network, no model downloads.
Best for:
- @Generable types
- Tool protocol

Not suited for: Complex math, code generation, tasks that depend on factual accuracy, or apps targeting pre-iOS 26 devices.
When to use Core ML: Deploying custom-trained models (vision, NLP, audio) across all Apple platforms. Converting models from PyTorch, TensorFlow, or scikit-learn with coremltools.
When to use MLX Swift: Running specific open-source LLMs (Llama, Mistral, Qwen, Gemma) on Apple Silicon with maximum throughput. Research and prototyping.
Best for:
- mlx-community models (for example, mlx-community/Mistral-7B-Instruct-v0.3-4bit)

When to use llama.cpp: Cross-platform LLM inference using the GGUF model format. Production deployments needing broad device support.
| Scenario | Framework |
|---|---|
| Text generation, zero setup (iOS 26+) | Foundation Models |
| Structured output from on-device LLM | Foundation Models (@Generable) |
| Image classification, object detection | Core ML |
| Custom model from PyTorch/TensorFlow | Core ML + coremltools |
| Running specific open-source LLMs | MLX Swift or llama.cpp |
| Maximum throughput on Apple Silicon | MLX Swift |
| Cross-platform LLM inference | llama.cpp |
| OCR and text recognition | Vision framework |
| Sentiment analysis, NER, tokenization | Natural Language framework |
| Training custom classifiers on device | Create ML |
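The LLM-related rows of the table can also be read as a small decision function. The sketch below is one hedged way to encode them in Swift. The `Framework` enum, the `Requirements` fields, and the order of the checks are illustrative assumptions and not part of any Apple API.

```swift
// Illustrative only: the names and the check order are assumptions, not an Apple API.
enum Framework: String {
    case foundationModels = "Foundation Models"
    case coreML = "Core ML"
    case mlxSwift = "MLX Swift"
    case llamaCpp = "llama.cpp"
}

struct Requirements {
    var needsCustomTrainedModel = false      // a vision, audio, or NLP model you trained yourself
    var needsSpecificOpenSourceLLM = false   // a particular Llama, Mistral, Qwen, or Gemma checkpoint
    var needsCrossPlatform = false           // must also run outside Apple platforms
    var appleIntelligenceAvailable = false   // iOS 26+ / macOS 26+ with Apple Intelligence enabled
}

func recommendFramework(for requirements: Requirements) -> Framework {
    if requirements.needsCustomTrainedModel { return .coreML }
    if requirements.needsCrossPlatform { return .llamaCpp }
    if requirements.needsSpecificOpenSourceLLM { return .mlxSwift }
    // General text generation: prefer the zero-setup system model when it is present.
    return requirements.appleIntelligenceAvailable ? .foundationModels : .mlxSwift
}

var requirements = Requirements()
requirements.appleIntelligenceAvailable = true
print(recommendFramework(for: requirements).rawValue)
```

A real app would likely replace `appleIntelligenceAvailable` with a runtime availability check rather than a stored flag.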
Foundation Models provides an on-device language model optimized for Apple Silicon. It is available on devices that support Apple Intelligence (iOS 26+, macOS 26+).
- contextSize for the context window limit
- supportedLanguages for supported locales

Always check availability before using the model. Never crash on unavailability.
import FoundationModels

switch SystemLanguageModel.default.availability {
case .available:
    // Proceed with model usage
    break
case .unavailable(.appleIntelligenceNotEnabled):
    // Guide user to enable Apple Intelligence in Settings
    break
case .unavailable(.modelNotReady):
    // Model is downloading; show loading state
    break
case .unavailable(.deviceNotEligible):
    // Device cannot run Apple Intelligence; use fallback
    break
default:
    // Graceful fallback for any other reason
    break
}

// Basic session
let session = LanguageModelSession()

// Session with instructions
let session = LanguageModelSession {
    "You are a helpful cooking assistant."
}

// Session with tools
let session = LanguageModelSession(
    tools: [weatherTool, recipeTool]
) {
    "You are a helpful assistant with access to tools."
}

Key rules:
- A session handles one request at a time (check session.isResponding)
- Call session.prewarm() before user interaction for faster first response
- Restore a saved conversation with LanguageModelSession(model: model, tools: [], transcript: savedTranscript)

The @Generable macro creates compile-time schemas for type-safe output:
@Generable
struct Recipe {
    @Guide(description: "The recipe name")
    var name: String
    @Guide(description: "Cooking steps", .count(3))
    var steps: [String]
    @Guide(description: "Prep time in minutes", .range(1...120))
    var prepTime: Int
}

let response = try await session.respond(
    to: "Suggest a quick pasta recipe",
    generating: Recipe.self
)
print(response.content.name)

@Guide Constraints

| Constraint | Purpose |
|---|---|
| description: | Natural language hint for generation |
| .anyOf([values]) | Restrict to enumerated string values |
| .count(n) | Fixed array length |
| .range(min...max) | Numeric range |
| .minimum(n) / .maximum(n) | One-sided numeric bound |
| .minimumCount(n) / .maximumCount(n) | Array length bounds |
| .constant(value) | Always returns this value |
| .pattern(regex) | String format enforcement |
| .element(guide) | Guide applied to each array element |
Properties generate in declaration order. Place foundational data before dependent data for better results.
let stream = session.streamResponse(
    to: "Suggest a recipe",
    generating: Recipe.self
)
for try await snapshot in stream {
    // snapshot.content is Recipe.PartiallyGenerated (all properties optional)
    if let name = snapshot.content.name { updateNameLabel(name) }
}

struct WeatherTool: Tool {
    let name = "weather"
    let description = "Get current weather for a city."

    @Generable
    struct Arguments {
        @Guide(description: "The city name")
        var city: String
    }

    func call(arguments: Arguments) async throws -> String {
        let weather = try await fetchWeather(arguments.city)
        return weather.description
    }
}

Register tools at session creation. The model invokes them autonomously.
do {
    let response = try await session.respond(to: prompt)
} catch let error as LanguageModelSession.GenerationError {
    switch error {
    case .guardrailViolation(let context):
        // Content triggered safety filters
        break
    case .exceededContextWindowSize(let context):
        // Too many tokens; summarize and retry
        break
    case .concurrentRequests(let context):
        // Another request is in progress on this session
        break
    case .unsupportedLanguageOrLocale(let context):
        // Current locale not supported
        break
    case .unsupportedGuide(let context):
        // A @Guide constraint is not supported
        break
    case .assetsUnavailable(let context):
        // Model assets not available on device
        break
    case .refusal(let refusal, _):
        // Model refused; stream refusal.explanation for details
        break
    case .rateLimited(let context):
        // Too many requests; back off and retry
        break
    case .decodingFailure(let context):
        // Response could not be decoded into the expected type
        break
    default:
        break
    }
}

let options = GenerationOptions(
    sampling: .random(top: 40),
    temperature: 0.7,
    maximumResponseTokens: 512
)
let response = try await session.respond(to: prompt, options: options)

Sampling modes: .greedy, .random(top:seed:), .random(probabilityThreshold:seed:).
- Use tokenCount(for:) to monitor the context window budget
- [descriptive example]

Foundation Models supports specialized use cases via SystemLanguageModel.UseCase:
- .general -- Default for text generation, summarization, dialog
- .contentTagging -- Optimized for categorization and labeling tasks

Load fine-tuned adapters for specialized behavior (requires entitlement):
let adapter = try SystemLanguageModel.Adapter(name: "my-adapter")
try await adapter.compile()
let model = SystemLanguageModel(adapter: adapter, guardrails: .default)
let session = LanguageModelSession(model: model)

See references/foundation-models.md for the complete Foundation Models API reference.
Core ML is Apple's framework for deploying trained models. It automatically dispatches work to the most suitable compute unit (CPU, GPU, or Neural Engine).
| Extension | Format | When to Use |
|---|---|---|
| .mlpackage | Directory (mlprogram) | All new models (iOS 15+) |
| .mlmodel | Single file (neuralnetwork) | Legacy only (iOS 11-14) |
| .mlmodelc | Compiled | Pre-compiled for faster loading |
Always use mlprogram (.mlpackage) for new work.
import coremltools as ct
import torch

# PyTorch conversion (torch.jit.trace)
# `model` is your torch.nn.Module; `example_input` is a tensor matching the input shape below
model.eval()  # CRITICAL: always call eval() before tracing
traced = torch.jit.trace(model, example_input)
mlmodel = ct.convert(
    traced,
    inputs=[ct.TensorType(shape=(1, 3, 224, 224), name="image")],
    minimum_deployment_target=ct.target.iOS18,
    convert_to="mlprogram",
)
mlmodel.save("Model.mlpackage")

| Technique | Size Reduction | Accuracy Impact | Best Compute Unit |
|---|---|---|---|
| INT8 per-channel | ~4x | Low | CPU/GPU |
| INT4 per-block | ~8x | Medium | GPU |
| Palettization 4-bit | ~8x | Low-Medium | Neural Engine |
| W8A8 (weights+activations) | ~4x | Low | ANE (A17 Pro/M4+) |
| Pruning 75% | ~4x | Medium | CPU/ANE |
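The size-reduction column follows from simple arithmetic on bits per weight. The sketch below reproduces the ~4x and ~8x figures under the assumption that the baseline is 32-bit float weights. That baseline is my reading of the table, not something the source states, and the function names are hypothetical. The estimates cover weights only and ignore metadata, per-block scales, activations, and KV cache.

```swift
// Weights-only estimate: ignores metadata, per-block scales, activations, and KV cache.
func estimatedWeightBytes(parameterCount: Int, bitsPerWeight: Int) -> Int {
    parameterCount * bitsPerWeight / 8
}

// Size reduction from lowering precision, e.g. 32-bit floats down to 8-bit integers.
func quantizationReduction(fromBits: Int, toBits: Int) -> Double {
    Double(fromBits) / Double(toBits)
}

// Size reduction from pruning, assuming zeroed weights are stored sparsely.
func pruningReduction(sparsity: Double) -> Double {
    1.0 / (1.0 - sparsity)
}

let int8 = quantizationReduction(fromBits: 32, toBits: 8)   // assumed FP32 baseline
let int4 = quantizationReduction(fromBits: 32, toBits: 4)
let pruned = pruningReduction(sparsity: 0.75)
let sevenBAt4Bit = estimatedWeightBytes(parameterCount: 7_000_000_000, bitsPerWeight: 4)
print(int8, int4, pruned, sevenBAt4Bit)
```

If the starting model is already float16, the quantization ratios halve, so it is worth measuring the saved .mlpackage rather than relying on the estimate.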
let config = MLModelConfiguration()
config.computeUnits = .all
let model = try MLModel(contentsOf: modelURL, configuration: config)

// Async prediction (iOS 17+)
let output = try await model.prediction(from: input)

MLTensor is a Swift type for multidimensional array operations:

import CoreML

let tensor = MLTensor([1.0, 2.0, 3.0, 4.0])
let reshaped = tensor.reshaped(to: [2, 2])
let result = tensor.softmax()

See references/coreml-conversion.md for the full conversion pipeline and references/coreml-optimization.md for optimization techniques.
MLX Swift is Apple's ML framework for Swift. It delivers the highest sustained generation throughput on Apple Silicon by using the unified memory architecture.
import MLX
import MLXLLM
import MLXLMCommon

let config = ModelConfiguration(id: "mlx-community/Mistral-7B-Instruct-v0.3-4bit")
let model = try await LLMModelFactory.shared.loadContainer(configuration: config)

try await model.perform { context in
    let input = try await context.processor.prepare(
        input: UserInput(prompt: "Hello")
    )
    let stream = try generate(
        input: input,
        parameters: GenerateParameters(temperature: 0.0),
        context: context
    )
    for await part in stream {
        print(part.chunk ?? "", terminator: "")
    }
}

| Device | RAM | Recommended Model | RAM Usage |
|---|---|---|---|
| iPhone 12-14 | 4-6 GB | SmolLM2-135M or Qwen 2.5 0.5B | ~0.3 GB |
| iPhone 15 Pro+ | 8 GB | Gemma 3n E4B 4-bit | ~3.5 GB |
| Mac 8 GB | 8 GB | Llama 3.2 3B 4-bit | ~3 GB |
| Mac 16 GB+ | 16 GB+ | Mistral 7B 4-bit | ~6 GB |
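The table can be turned into a runtime lookup so an app picks a model that fits the device. The sketch below is a minimal illustration. The `DeviceClass` enum, the function name, and the 8 GB and 16 GB thresholds are assumptions derived from the rows above, and the returned strings are the table's labels rather than exact model identifiers.

```swift
import Foundation

// Hypothetical device split; the table lists iPhone and Mac rows separately.
enum DeviceClass {
    case iPhone
    case mac
}

// Thresholds are assumptions read off the table, not measured limits.
func recommendedModel(deviceClass: DeviceClass, ramGB: Double) -> String {
    switch deviceClass {
    case .iPhone:
        return ramGB >= 8 ? "Gemma 3n E4B 4-bit" : "SmolLM2-135M or Qwen 2.5 0.5B"
    case .mac:
        return ramGB >= 16 ? "Mistral 7B 4-bit" : "Llama 3.2 3B 4-bit"
    }
}

// Installed memory in GiB, as reported by Foundation.
let installedGB = Double(ProcessInfo.processInfo.physicalMemory) / 1_073_741_824
print(recommendedModel(deviceClass: .mac, ramGB: installedGB))
```

Some devices may report slightly less memory than their nominal capacity, so a production check would probably use thresholds a little below the round numbers.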
// Limit the MLX GPU buffer cache to 512 MB
MLX.GPU.set(cacheLimit: 512 * 1024 * 1024)

See references/mlx-swift.md for full MLX Swift patterns and llama.cpp integration.
When an app needs multiple AI backends (e.g., Foundation Models + MLX fallback):
func respond(to prompt: String) async throws -> String {
    if SystemLanguageModel.default.isAvailable {
        return try await foundationModelsRespond(prompt)
    } else if canLoadMLXModel() {
        return try await mlxRespond(prompt)
    } else {
        throw AIError.noBackendAvailable
    }
}

Serialize all model access through a coordinator actor to prevent contention:
actor ModelCoordinator {
    // Caveat: actors are reentrant. If `work` suspends, another call can start before it finishes.
    func withExclusiveAccess<T>(_ work: () async throws -> T) async rethrows -> T {
        try await work()
    }
}

- Call session.prewarm() for Foundation Models before user interaction
- Pre-compile Core ML models to .mlmodelc for faster loading
- perform() call

- Creating LanguageModelSession() without checking SystemLanguageModel.default.availability crashes on unsupported devices.
- Use tokenCount(for:) and summarize when needed.
- LanguageModelSession supports one request at a time. Check session.isResponding or serialize access.
- Call model.eval() before Core ML tracing. PyTorch models must be in eval mode before torch.jit.trace. Training-mode artifacts corrupt output.
- Use mlprogram (.mlpackage) for new Core ML models. The legacy neuralnetwork format is deprecated.
- scenePhase == .background.

- @Generable properties in logical generation order
- contextSize
- Sendable-conformant or @MainActor-isolated

- @Generable, tool calling, prompt design
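Swift actors are reentrant, so an actor method that simply awaits a closure does not by itself guarantee that one piece of model work finishes before the next begins. The sketch below shows one possible way to serialize async work by chaining each call onto the previous one. `SerialModelCoordinator` and its task-chaining approach are my own illustration, not something the source prescribes.

```swift
// Sketch: serialize async work by chaining each call onto the previous one.
actor SerialModelCoordinator {
    private var previous: Task<Void, Never>?

    func withExclusiveAccess<T: Sendable>(
        _ work: @escaping @Sendable () async throws -> T
    ) async throws -> T {
        let waitFor = previous
        let task = Task<T, Error> {
            _ = await waitFor?.value       // wait for the earlier call to finish
            return try await work()
        }
        // The next caller waits on this task, whether it succeeds or throws.
        previous = Task { _ = try? await task.value }
        return try await task.value
    }
}
```

Because there is no suspension point between reading and updating `previous`, calls run in the order they reach the actor. A failing call does not block later ones, since the chained task ignores the error.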