New in llama.cpp: Model Management
llama.cpp server now ships with a router mode that lets you dynamically load, unload, and switch between multiple models without restarting the server.
Reminder: llama.cpp server is a lightweight, OpenAI-compatible HTTP server for running LLMs locally.
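Because the server speaks the OpenAI-compatible API, switching models from a client should come down to changing the `model` field on each request. Here is a minimal sketch of that idea; the port (8080, the server's default) and the model names are placeholder assumptions, not values from this post:

```python
import json

# Assumed local endpoint; llama.cpp server listens on port 8080 by default.
BASE_URL = "http://localhost:8080"

def chat_request(model: str, prompt: str) -> tuple[str, str]:
    """Build an OpenAI-style chat completion request for a specific model.

    With router mode, the expectation is that the server dispatches the
    request to the model named in the payload's "model" field.
    """
    payload = {
        "model": model,  # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
    }
    return f"{BASE_URL}/v1/chat/completions", json.dumps(payload)

# Switching models is just a different "model" value per request:
url, body_a = chat_request("qwen2.5-7b-instruct", "Summarize this file.")
url, body_b = chat_request("llama-3.2-3b-instruct", "Summarize this file.")
```

From the client's perspective nothing else changes: same endpoint, same payload shape, only the `model` field varies.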
This