
Kuzco

Open-source Swift package to run LLMs locally on iOS & macOS


About Kuzco

Kuzco is a Swift package designed to integrate large language models (LLMs) directly into iOS, macOS, and Mac Catalyst applications, enabling on-device AI inference without reliance on network connectivity. Built on llama.cpp, it ensures privacy, speed, and reliability by executing all operations locally. Optimized for Apple Silicon and Intel Macs, Kuzco supports multiple LLM architectures, including LLaMA, Mistral, Qwen, Phi, Gemma, and others, with automatic architecture detection based on model filenames.
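As a rough illustration of what filename-based architecture detection can look like, here is a minimal sketch. The type and method names below are illustrative assumptions, not Kuzco's actual implementation:

```swift
import Foundation

// Hypothetical sketch of filename-based architecture detection —
// illustrative only, not Kuzco's real internals.
enum ModelArchitecture: String, CaseIterable {
    case llama, mistral, qwen, phi, gemma

    /// Scans the model filename for a known family name and falls back
    /// to `.llama` as a common default when nothing matches.
    static func detect(fromFilename filename: String) -> ModelArchitecture {
        let lowered = filename.lowercased()
        return allCases.first { lowered.contains($0.rawValue) } ?? .llama
    }
}

// e.g. "mistral-7b-instruct-q4_K_M.gguf" would map to .mistral
```

Detection of this kind is a heuristic: a model file with a nondescript name may need its architecture specified explicitly.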

The package offers a modern developer experience with async/await-friendly APIs, customizable prompts, and advanced configuration options for fine-tuning model performance. Developers can control various parameters, such as context length, batch size, GPU layers, sampling strategies, and repetition penalties, to tailor predictions to their needs. Kuzco also includes robust error handling, efficient memory management, and thread-safe operations, making it suitable for production-grade applications.
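The kinds of knobs described above can be pictured as a settings struct like the following. This is a hedged sketch to show typical value ranges; the property names are assumptions and do not necessarily match Kuzco's API:

```swift
// Hypothetical configuration sketch — illustrative names and defaults,
// not Kuzco's actual configuration type.
struct InferenceSettings {
    var contextLength: Int = 4096        // tokens of context the model attends to
    var batchSize: Int = 512             // prompt tokens processed per batch
    var gpuLayers: Int = 32              // layers offloaded to the GPU (Metal)
    var temperature: Double = 0.7        // sampling randomness (higher = more varied)
    var topP: Double = 0.95              // nucleus-sampling cutoff
    var repetitionPenalty: Double = 1.1  // discourages repeated tokens
}
```

In practice, larger context lengths and more GPU layers trade memory for capability and speed, so these values are usually tuned per device.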

Kuzco simplifies integration with a straightforward API, allowing developers to load models safely, stream generated responses token by token, and manage conversation contexts efficiently. It requires Swift 5.9+ and Xcode 15.0+, and deploys to iOS 15.0+ or macOS 12.0+. Additionally, the package supports smaller quantized models for memory-constrained environments, ensuring scalability across a range of devices.
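A load-then-stream flow with async/await might look like the sketch below. The names (`Kuzco.shared`, `ModelProfile`, `predict`) are assumptions standing in for whatever the package actually exposes; consult the repository for the real API:

```swift
// Hypothetical usage sketch — illustrative identifiers, not Kuzco's exact API.
func chat() async throws {
    // Load a local GGUF model from disk (path is a placeholder).
    let instance = try await Kuzco.shared.instance(
        for: ModelProfile(sourcePath: "/path/to/model.gguf"))

    // Stream the response token by token as it is generated on-device.
    let stream = try await instance.predict(prompt: "Explain on-device inference.")
    for try await token in stream {
        print(token, terminator: "")
    }
}
```

Streaming via an `AsyncSequence` lets the UI render partial output immediately, which matters for local inference where full responses can take several seconds.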

By leveraging Kuzco, developers can create high-performance, privacy-focused AI applications with minimal setup, while benefiting from its flexible configuration and extensive support for modern LLM architectures.
