Parallax by Gradient is an AI tool for building decentralized AI clusters, letting users run large language models (LLMs) across devices of varying specifications and locations. It is designed for local hosting of LLMs on personal devices and supports cross-platform operation. Its core features include pipeline-parallel model sharding, dynamic key-value cache management, continuous batching optimized for Mac, and high-performance dynamic request scheduling and routing.
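To make pipeline-parallel model sharding concrete, here is a minimal sketch in Python: a model's transformer layers are split into contiguous blocks, one block per peer, so each device runs only its slice of the pipeline. The even-split policy and the function name `shard_layers` are illustrative assumptions, not Parallax's actual scheduling algorithm.

```python
def shard_layers(num_layers, peers):
    """Assign contiguous blocks of layer indices to each peer (even split)."""
    base, extra = divmod(num_layers, len(peers))
    assignment, start = {}, 0
    for i, peer in enumerate(peers):
        # Earlier peers absorb the remainder when layers don't divide evenly.
        count = base + (1 if i < extra else 0)
        assignment[peer] = list(range(start, start + count))
        start += count
    return assignment

# Example: a 32-layer model spread over a Mac and two GPU machines.
plan = shard_layers(32, ["mac-mini", "gpu-node-1", "gpu-node-2"])
```

During inference, activations would flow peer to peer in pipeline order, which is why Parallax pairs this layout with dynamic request scheduling and routing.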
Parallax's architecture combines peer-to-peer communication through Lattica, a GPU backend powered by SGLang, and a Mac backend built on MLX LM. It is compatible with Python versions 3.11.0 through 3.14.0 and requires Ubuntu 24.04 for Blackwell GPUs. Installation varies by operating system: Windows has a dedicated application, Linux and macOS users can install from source, and Linux users with GPU devices can also install via Docker.
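Since the supported Python range is a common stumbling block when installing from source, a small pre-flight check like the following can be run first. The function is a hypothetical helper written for this article, not part of Parallax; it only encodes the 3.11.0–3.14.0 range stated above.

```python
import sys

def parallax_python_ok(version=None):
    """Return True if the interpreter falls in Parallax's supported range
    (Python 3.11.0 through 3.14.0, per the compatibility note above)."""
    if version is None:
        version = sys.version_info
    v = (version[0], version[1], version[2])
    return (3, 11, 0) <= v <= (3, 14, 0)

# Example: check the current interpreter before attempting installation.
print("compatible" if parallax_python_ok() else "unsupported Python version")
```

Checking this up front avoids a failed source build halfway through dependency resolution.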
The tool supports a range of prominent language models from providers such as DeepSeek, MiniMax AI, Z AI, Moonshot AI, Alibaba (Qwen), OpenAI, and Meta. Setting up Parallax involves launching a scheduler, configuring the cluster and model, and connecting distributed nodes to form an AI cluster. Users can then access a chat interface or interact via APIs, in both local and public network environments, which makes the tool suitable for development as well as production deployments.
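As a sketch of the API interaction described above, the following builds an HTTP chat request against a running cluster. The endpoint path, port, and OpenAI-style payload shape are assumptions (this request format is common among LLM servers but is not confirmed here); consult the Parallax documentation for the actual API.

```python
import json
import urllib.request

def build_chat_request(prompt, model="Qwen/Qwen3-0.6B",
                       base_url="http://localhost:3001"):
    """Construct a chat-completions request for a locally hosted cluster.
    Model name, port, and endpoint path are illustrative assumptions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# Example: build (but do not send) a request to the local scheduler node.
req = build_chat_request("Summarize pipeline parallelism in one sentence.")
```

Sending the request with `urllib.request.urlopen(req)` would return the model's reply once a cluster is up; pointing `base_url` at a public address covers the public-network case.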