If you ever plan on supporting local models, you might want to use the OpenAI API with a configurable base URL (i.e. the ability to point the client at a different server), since that lets it work with many different open-source LLM backends like Oobabooga's text-generation-webui, etc. (Koboldcpp has its own API, but Ollama has partial OpenAI API support.)
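
For example, a minimal sketch of what that looks like with the OpenAI Python client -- the base URL and model name here are assumptions (Ollama's local endpoint), swap in whatever your backend exposes:

```python
from openai import OpenAI

# Point the standard OpenAI client at a local OpenAI-compatible server
# instead of api.openai.com. The URL and model name are examples only.
client = OpenAI(
    base_url="http://localhost:11434/v1",  # e.g. Ollama's OpenAI-compatible endpoint
    api_key="not-needed",                  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="llama3",  # hypothetical local model name
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```

Because it's just a different base URL, the same code path covers the hosted OpenAI service and any local backend that speaks the OpenAI API.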