Course Outline
Introduction to EXO and Local AI Clustering
- Overview of the EXO framework and the exo-explore ecosystem
- Comparing centralized cloud inference vs distributed local inference
- Architecture: libp2p device discovery, MLX backend, dashboard, and API layers
- Hardware requirements: Apple Silicon (M3 Ultra, M4 Pro/Max), Thunderbolt 5, shared storage
Installing EXO on macOS
- Setting up Xcode, the Metal Toolchain, and macOS prerequisites
- Installing uv, Node.js, Rust nightly toolchain
- Installing the pinned macmon fork for Apple Silicon monitoring
- Cloning the repository and building the dashboard with npm
- Running EXO from source and verifying the localhost:52415 dashboard
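
As a rough illustration of the verification step above, the following sketch checks whether anything is accepting TCP connections on the dashboard port. The port 52415 comes from the outline; the function name `dashboard_reachable` is hypothetical, and a successful connection only shows that a listener exists, not that the dashboard itself is healthy.

```python
import socket


def dashboard_reachable(host: str = "localhost", port: int = 52415,
                        timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name-resolution failures.
        return False
```

Calling `dashboard_reachable()` on the machine running EXO gives a quick yes/no answer before opening the dashboard in a browser.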
Installing EXO on Linux
- Installing dependencies via apt or Homebrew on Linux
- Configuring uv, Node.js 18+, and Rust nightly
- Building the dashboard and running EXO in CPU-only mode
- Directory layout: XDG Base Directory paths for config, data, cache, and logs
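
To make the directory-layout topic concrete, here is a minimal sketch of how XDG Base Directory paths are typically resolved. The environment variable names and fallbacks follow the XDG specification. The application directory name `exo` and the idea that logs live under the state directory are assumptions for illustration, not confirmed EXO behavior.

```python
import os
from pathlib import Path


def xdg_dirs(app: str = "exo", env=None) -> dict:
    """Resolve per-application XDG directories (the `app` name is an assumption)."""
    env = os.environ if env is None else env
    home = Path(env.get("HOME", str(Path.home())))
    # Variable name and spec-defined fallback for each directory kind.
    defaults = {
        "config": ("XDG_CONFIG_HOME", home / ".config"),
        "data": ("XDG_DATA_HOME", home / ".local" / "share"),
        "cache": ("XDG_CACHE_HOME", home / ".cache"),
        "state": ("XDG_STATE_HOME", home / ".local" / "state"),
    }
    resolved = {}
    for kind, (var, fallback) in defaults.items():
        value = env.get(var, "")
        # The spec says unset, empty, or relative values must be ignored.
        base = Path(value) if value and Path(value).is_absolute() else fallback
        resolved[kind] = base / app
    return resolved
```

The XDG specification defines no dedicated log directory, so applications commonly place logs under the state or cache directory; which one EXO uses should be confirmed against its documentation.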
Automatic Device Discovery and Cluster Formation
- Understanding libp2p-based auto-discovery across local networks
- Configuring custom namespaces with EXO_LIBP2P_NAMESPACE for cluster isolation
- Verifying node membership in the dashboard cluster view
- Handling discovery failures and network segmentation issues
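
When diagnosing discovery failures, a useful first check is whether all nodes actually share a subnet. The sketch below groups node addresses by network, on the assumption that local discovery mechanisms typically do not cross subnet boundaries. The node names, addresses, and `/24` prefix length are hypothetical.

```python
import ipaddress
from collections import defaultdict


def group_by_subnet(node_ips: dict[str, str], prefix_len: int = 24) -> dict[str, list[str]]:
    """Group node names by the subnet their IP address falls into."""
    groups = defaultdict(list)
    for name, ip in node_ips.items():
        # strict=False lets us pass a host address and get its enclosing network.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        groups[str(network)].append(name)
    return dict(groups)


# Hypothetical inventory: two nodes share a subnet, one is segmented off.
nodes = {
    "studio-1": "192.168.1.10",
    "studio-2": "192.168.1.11",
    "mini-1": "192.168.2.5",
}
subnets = group_by_subnet(nodes)
```

If the result contains more than one subnet, nodes in different groups may never see each other without routing or discovery changes, regardless of the `EXO_LIBP2P_NAMESPACE` setting.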
Enabling RDMA over Thunderbolt 5
- RDMA architecture and the 99 percent latency reduction claim
- Enabling RDMA in macOS Recovery mode with rdma_ctl
- Cable requirements and port topology constraints on Mac Studio
- Matching macOS versions across all cluster nodes
- Troubleshooting RDMA discovery and DHCP configuration
Deploying Frontier Models
- Using the dashboard to load and shard DeepSeek v3.1, Qwen3-235B, and Llama family models
- Previewing instance placements with the /instance/previews API endpoint
- Creating model instances with pipeline or tensor-parallel sharding
- Configuring custom model cards from the Hugging Face Hub
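
The idea behind pipeline sharding can be sketched as assigning contiguous blocks of layers to nodes in proportion to their available memory. This is an illustrative heuristic only, not EXO's actual placement algorithm; the function name, node names, and memory figures are hypothetical.

```python
def partition_layers(num_layers: int, memory_gb: dict[str, float]) -> dict[str, range]:
    """Split layers [0, num_layers) into contiguous ranges proportional to node memory."""
    total = sum(memory_gb.values())
    nodes = list(memory_gb)
    assignments = {}
    start = 0
    cumulative = 0.0
    for i, node in enumerate(nodes):
        cumulative += memory_gb[node]
        # The last node always ends at num_layers so that no layer is dropped by rounding.
        if i == len(nodes) - 1:
            end = num_layers
        else:
            end = round(num_layers * cumulative / total)
        assignments[node] = range(start, end)
        start = end
    return assignments


# Hypothetical cluster: one 512 GB node and two 256 GB nodes, 60-layer model.
plan = partition_layers(60, {"studio-a": 512, "studio-b": 256, "studio-c": 256})
```

In pipeline sharding each node holds whole layers and activations flow from node to node, whereas tensor-parallel sharding splits the weights inside each layer across nodes, which makes it more sensitive to interconnect latency.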
Monitoring and Troubleshooting
- Reading EXO logs and understanding distributed tracing
- Interpreting cluster health in the dashboard cluster view
- Diagnosing worker node failures and reconnection behavior
- Using EXO_TRACING_ENABLED for performance bottleneck analysis
Cluster Maintenance and Updates
- Updating EXO binaries and rebuilding the dashboard
- Migrating model caches and managing pre-downloaded models over NFS
- Gracefully removing nodes and rebalancing workloads
Requirements
- An understanding of networking fundamentals (IP, subnetting, firewalls)
- Experience with macOS or Linux command-line administration
- Familiarity with Python package management (pip/uv) and Node.js tooling
Audience
- System administrators
- DevOps engineers
- AI infrastructure architects responsible for on-premise LLM deployment
21 hours