InferBox

Self-hosted GPU inference server with an OpenAI-compatible API. It serves embedding models, rerankers, LLMs, speech-to-text (STT), and image generation, and supports micro-batching and speculative decoding. It is also multi-GPU aware.

https://github.com/joseluissaorin/inferbox
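Because the server exposes an OpenAI-compatible API, a client should be able to call it with a plain OpenAI-style HTTP request. The following is a minimal sketch using only the Python standard library. It assumes the server listens at `http://localhost:8000/v1` and serves a model named `my-embedding-model`. Both values are hypothetical placeholders, not documented InferBox defaults, and whether InferBox checks the bearer token is also an assumption. The request body and response handling follow OpenAI's `/v1/embeddings` format.

```python
import json
import urllib.request

# Hypothetical values: the real host, port, and model names depend on your deployment.
BASE_URL = "http://localhost:8000/v1"
MODEL = "my-embedding-model"


def build_embeddings_request(texts, base_url=BASE_URL, model=MODEL, api_key="unused"):
    """Build (but do not send) an OpenAI-style POST /embeddings request."""
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # OpenAI-compatible servers typically accept a bearer token;
            # whether InferBox validates it is an assumption.
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def embed(texts):
    """Send the request and return one embedding vector per input text."""
    req = build_embeddings_request(texts)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI's response format lists embeddings under "data", each tagged with "index".
    items = sorted(body["data"], key=lambda d: d["index"])
    return [item["embedding"] for item in items]


if __name__ == "__main__":
    # Only builds the request, so this runs without a live server.
    req = build_embeddings_request(["hello world"])
    print(req.get_method(), req.full_url)
```

Keeping request construction separate from `embed()` lets you inspect or log the exact payload before pointing it at a running instance. Any OpenAI SDK that accepts a custom base URL should work the same way.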