José Luis Saorín Ferrer
InferBox: a self-hosted GPU inference server compatible with the OpenAI API. Serves embeddings, rerankers, LLMs, speech-to-text (STT), and image generation. Supports micro-batching and speculative decoding, and is multi-GPU aware.
https://github.com/joseluissaorin/inferbox
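
Because the server is described as compatible with the OpenAI API, a client would presumably talk to it the same way it talks to any OpenAI-style endpoint. The sketch below builds a standard `POST /v1/embeddings` request using only the Python standard library. The host, port, model name, and API key are placeholders assumed for illustration; none of them come from the InferBox documentation, so check the repository for the real values.

```python
import json
import urllib.request

# Assumption: placeholder address; the real host/port depend on your deployment.
BASE_URL = "http://localhost:8000/v1"


def build_embeddings_request(texts, model="example-embedding-model",
                             base_url=BASE_URL, api_key="not-needed"):
    """Build (but do not send) an OpenAI-style embeddings request.

    The model name and API key are hypothetical placeholders.
    """
    payload = {"model": model, "input": texts}
    return urllib.request.Request(
        url=f"{base_url}/embeddings",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def embed(texts, **kwargs):
    """Send the request and return one embedding vector per input text.

    Assumes the response follows the OpenAI embeddings schema:
    {"data": [{"embedding": [...], "index": 0}, ...]}.
    """
    req = build_embeddings_request(texts, **kwargs)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return [item["embedding"] for item in body["data"]]
```

With a server running, `embed(["hello world"])` would return a list containing one vector. Building the request separately from sending it makes the payload easy to inspect without a live GPU host.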