The expected maximum number of requests (invocations) per minute for the instance.
The expected model latency when the instance is serving its maximum number of invocations per minute.
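
As a rough illustration of how these two figures might be used together, the sketch below estimates how many instances a target load would need and how many requests are in flight on average at peak. The names `InstanceMetrics`, `instances_needed`, and `avg_in_flight` are hypothetical and not part of any API. The latency unit is assumed to be milliseconds, which the descriptions above do not state. The concurrency estimate applies Little's law and is only an approximation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceMetrics:
    """Hypothetical container for the two per-instance figures."""
    max_invocations_per_minute: int  # expected maximum requests per minute
    model_latency_ms: float          # expected latency at that load (unit assumed: ms)


def instances_needed(metrics: InstanceMetrics, target_rpm: int) -> int:
    """Smallest instance count whose combined maximum covers target_rpm."""
    if metrics.max_invocations_per_minute <= 0:
        raise ValueError("max_invocations_per_minute must be positive")
    # Always provision at least one instance, even for zero traffic.
    return max(1, math.ceil(target_rpm / metrics.max_invocations_per_minute))


def avg_in_flight(metrics: InstanceMetrics) -> float:
    """Little's law: average concurrent requests = arrival rate * latency."""
    requests_per_second = metrics.max_invocations_per_minute / 60
    latency_seconds = metrics.model_latency_ms / 1000
    return requests_per_second * latency_seconds


if __name__ == "__main__":
    # Example values are made up for illustration only.
    m = InstanceMetrics(max_invocations_per_minute=1200, model_latency_ms=50.0)
    print(instances_needed(m, target_rpm=3000))
    print(avg_in_flight(m))
```

Rounding up in `instances_needed` is deliberate: a fractional instance cannot be provisioned, so any remainder requires one more. In practice some headroom below the stated maximum is usually advisable, since latency is reported at the maximum load.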