metricNames
The names of the metrics you want to use for your evaluation job.
For knowledge base evaluation jobs that evaluate retrieval only, valid values are "Builtin.ContextRelevance" and "Builtin.ContextCoverage".
For knowledge base evaluation jobs that evaluate retrieval with response generation, valid values are "Builtin.Correctness", "Builtin.Completeness", "Builtin.Helpfulness", "Builtin.LogicalCoherence", "Builtin.Faithfulness", "Builtin.Harmfulness", "Builtin.Stereotyping", and "Builtin.Refusal".
For automated model evaluation jobs, valid values are "Builtin.Accuracy", "Builtin.Robustness", and "Builtin.Toxicity". In model evaluation jobs that use an LLM as a judge, you can specify "Builtin.Correctness", "Builtin.Completeness", "Builtin.Faithfulness", "Builtin.Helpfulness", "Builtin.Coherence", "Builtin.Relevance", "Builtin.FollowingInstructions", and "Builtin.ProfessionalStyleAndTone". You can also specify the following responsible AI metrics, but only for model evaluation jobs that use an LLM as a judge: "Builtin.Harmfulness", "Builtin.Stereotyping", and "Builtin.Refusal".
For human-based model evaluation jobs, the list of strings must match the name parameter specified in HumanEvaluationCustomMetric.
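A sketch of that name-matching requirement, assuming the human evaluation configuration accepts customMetrics and datasetMetricConfigs members: the custom metric name, rating method, dataset details, and S3 URI are illustrative assumptions, and other required fields (such as the human workflow configuration) are omitted.

```python
# Sketch only: the custom metric name must appear identically in both
# the custom metric definition and the metricNames list.
custom_metric_name = "MyHelpfulness"  # hypothetical custom metric

human_evaluation_config = {
    "customMetrics": [
        {
            "name": custom_metric_name,
            "description": "Rates how helpful the model response is.",
            "ratingMethod": "ThumbsUpDown",  # assumption: one valid rating method
        }
    ],
    "datasetMetricConfigs": [
        {
            "dataset": {
                "name": "human-eval-prompts",
                "datasetLocation": {"s3Uri": "s3://amzn-s3-demo-bucket/human.jsonl"},
            },
            "metricNames": [custom_metric_name],  # must match the name defined above
        }
    ],
}
```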