maxResource
The maximum number of resources (such as epochs) that can be used by a training job launched by a hyperparameter tuning job. Once a job reaches the MaxResource value, it is stopped. If a value for MaxResource is not provided, and Hyperband is selected as the hyperparameter tuning strategy, HyperbandTraining attempts to infer MaxResource from the following keys (if present) in StaticHyperParameters:
epochs
numepochs
n-epochs
n_epochs
num_epochs
If HyperbandStrategyConfig is unable to infer a value for MaxResource, it generates a validation error. The maximum value is 20,000 epochs.
All metrics that correspond to an objective metric are used to derive early stopping decisions. For distributed training jobs, ensure that duplicate metrics are not printed in the logs across the individual nodes in a training job. If multiple nodes publish duplicate or incorrect metrics, an incorrect stopping decision may be made and the training job stopped prematurely.
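The inference and validation rules described above can be illustrated with a minimal client-side sketch. This is not the service's implementation: the function name infer_max_resource, the order in which the keys are checked, and the lower bound of 1 are assumptions made for illustration. Only the key names and the 20,000 limit come from the text above.

```python
from typing import Mapping, Optional

# Keys listed above. The lookup order used here is an assumption.
EPOCH_KEYS = ("epochs", "numepochs", "n-epochs", "n_epochs", "num_epochs")
MAX_RESOURCE_LIMIT = 20_000  # maximum value stated above


def infer_max_resource(
    static_hyperparameters: Mapping[str, str],
    max_resource: Optional[int] = None,
) -> int:
    """Return the explicit MaxResource, or infer it from an epoch-like key.

    Hypothetical helper: it mimics the documented behavior locally so a
    configuration can be checked before a tuning job is submitted.
    """
    if max_resource is None:
        for key in EPOCH_KEYS:
            if key in static_hyperparameters:
                # Static hyperparameter values are passed as strings.
                max_resource = int(static_hyperparameters[key])
                break

    if max_resource is None:
        # Mirrors the validation error described above.
        raise ValueError(
            "MaxResource was not provided and could not be inferred "
            f"from any of: {', '.join(EPOCH_KEYS)}"
        )

    # The lower bound of 1 is an assumption; the upper bound is documented.
    if not 1 <= max_resource <= MAX_RESOURCE_LIMIT:
        raise ValueError(
            f"MaxResource must be between 1 and {MAX_RESOURCE_LIMIT}, "
            f"got {max_resource}"
        )
    return max_resource
```

Running a check like this before submitting a job surfaces a missing or out-of-range value locally instead of as a validation error from the service. An explicitly supplied MaxResource takes precedence over any inferred key in this sketch.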