José Luis Saorín Ferrer
InferBox
Self-hosted GPU inference server compatible with the OpenAI API. Embeddings, rerankers, LLMs, STT, image generation. Micro-batching, speculative decoding, multi-GPU aware.
https://github.com/joseluissaorin/inferbox