Description
Production inference for custom and open-weight models. Models are containerised with BentoML, autoscaled behind Envoy, and tuned for cost and latency against observed metrics. Typical engagement: design and pilot deployment for a single model, including autoscaler tuning and SLO baselining.
