Interface EvaluationDatasetMetricConfig.Builder

  • Method Details

    • taskType

      The type of task you want to evaluate for your evaluation job. This applies only to model evaluation jobs and is ignored for knowledge base evaluation jobs.

      Parameters:
      taskType - The type of task you want to evaluate for your evaluation job. This applies only to model evaluation jobs and is ignored for knowledge base evaluation jobs.
      Returns:
      Returns a reference to this object so that method calls can be chained together.
    • taskType

      The type of task you want to evaluate for your evaluation job. This applies only to model evaluation jobs and is ignored for knowledge base evaluation jobs.

      Parameters:
      taskType - The type of task you want to evaluate for your evaluation job. This applies only to model evaluation jobs and is ignored for knowledge base evaluation jobs.
      Returns:
      Returns a reference to this object so that method calls can be chained together.
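      As a rough sketch, the taskType overloads are typically chained in the fluent builder style. The EvaluationTaskType constant and the String value shown below are illustrative assumptions; check that enum for the values supported in your SDK version.

          import software.amazon.awssdk.services.bedrock.model.EvaluationDatasetMetricConfig;
          import software.amazon.awssdk.services.bedrock.model.EvaluationTaskType;

          EvaluationDatasetMetricConfig.Builder builder = EvaluationDatasetMetricConfig.builder()
                  // Enum overload; prefer this when the value is modeled in the SDK.
                  .taskType(EvaluationTaskType.SUMMARIZATION);   // illustrative constant

          // String overload, useful when a value is not yet modeled in this SDK version.
          builder.taskType("Summarization");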
    • dataset

      Specifies the prompt dataset.

      Parameters:
      dataset - Specifies the prompt dataset.
      Returns:
      Returns a reference to this object so that method calls can be chained together.
    • dataset

      Specifies the prompt dataset.

      This is a convenience method that creates an instance of the EvaluationDataset.Builder avoiding the need to create one manually via EvaluationDataset.builder().

      When the Consumer completes, SdkBuilder.build() is called immediately and its result is passed to dataset(EvaluationDataset).

      Parameters:
      dataset - a consumer that will call methods on EvaluationDataset.Builder
      Returns:
      Returns a reference to this object so that method calls can be chained together.
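      For example, a minimal sketch of the consumer-style overload. The dataset name, the S3 URI, and the EvaluationDataset.Builder setters used here (name, datasetLocation, s3Uri) are illustrative assumptions; consult EvaluationDataset.Builder for the exact fields.

          EvaluationDatasetMetricConfig.Builder builder = EvaluationDatasetMetricConfig.builder()
                  // Consumer overload: the SDK creates an EvaluationDataset.Builder, applies this
                  // lambda, calls build(), and passes the result to dataset(EvaluationDataset).
                  .dataset(d -> d.name("MyPromptDataset")                                              // placeholder name
                                 .datasetLocation(l -> l.s3Uri("s3://amzn-s3-demo-bucket/prompts.jsonl"))); // placeholder URI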
    • metricNames

      The names of the metrics you want to use for your evaluation job.

      For knowledge base evaluation jobs that evaluate retrieval only, valid values are "Builtin.ContextRelevance" and "Builtin.ContextCoverage".

      For knowledge base evaluation jobs that evaluate retrieval with response generation, valid values are "Builtin.Correctness", "Builtin.Completeness", "Builtin.Helpfulness", "Builtin.LogicalCoherence", "Builtin.Faithfulness", "Builtin.Harmfulness", "Builtin.Stereotyping", and "Builtin.Refusal".

      For automated model evaluation jobs, valid values are "Builtin.Accuracy", "Builtin.Robustness", and "Builtin.Toxicity". In model evaluation jobs that use an LLM as a judge, you can specify "Builtin.Correctness", "Builtin.Completeness", "Builtin.Faithfulness", "Builtin.Helpfulness", "Builtin.Coherence", "Builtin.Relevance", "Builtin.FollowingInstructions", and "Builtin.ProfessionalStyleAndTone". You can also specify the following responsible AI metrics, but only for model evaluation jobs that use an LLM as a judge: "Builtin.Harmfulness", "Builtin.Stereotyping", and "Builtin.Refusal".

      For human-based model evaluation jobs, the list of strings must match the name parameter specified in HumanEvaluationCustomMetric.

      Parameters:
      metricNames - The names of the metrics you want to use for your evaluation job.

      For knowledge base evaluation jobs that evaluate retrieval only, valid values are "Builtin.ContextRelevance" and "Builtin.ContextCoverage".

      For knowledge base evaluation jobs that evaluate retrieval with response generation, valid values are "Builtin.Correctness", "Builtin.Completeness", "Builtin.Helpfulness", "Builtin.LogicalCoherence", "Builtin.Faithfulness", "Builtin.Harmfulness", "Builtin.Stereotyping", and "Builtin.Refusal".

      For automated model evaluation jobs, valid values are "Builtin.Accuracy", "Builtin.Robustness", and "Builtin.Toxicity". In model evaluation jobs that use an LLM as a judge, you can specify "Builtin.Correctness", "Builtin.Completeness", "Builtin.Faithfulness", "Builtin.Helpfulness", "Builtin.Coherence", "Builtin.Relevance", "Builtin.FollowingInstructions", and "Builtin.ProfessionalStyleAndTone". You can also specify the following responsible AI metrics, but only for model evaluation jobs that use an LLM as a judge: "Builtin.Harmfulness", "Builtin.Stereotyping", and "Builtin.Refusal".

      For human-based model evaluation jobs, the list of strings must match the name parameter specified in HumanEvaluationCustomMetric.

      Returns:
      Returns a reference to this object so that method calls can be chained together.
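      As a hedged sketch, the collection overload for a knowledge base evaluation job that evaluates retrieval only might be used like this (the metric names are taken from the list above; the builder variable is assumed to exist from the earlier snippets):

          builder.metricNames(java.util.List.of(
                  "Builtin.ContextRelevance",
                  "Builtin.ContextCoverage"));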
    • metricNames

      EvaluationDatasetMetricConfig.Builder metricNames(String... metricNames)

      The names of the metrics you want to use for your evaluation job.

      For knowledge base evaluation jobs that evaluate retrieval only, valid values are "Builtin.ContextRelevance" and "Builtin.ContextCoverage".

      For knowledge base evaluation jobs that evaluate retrieval with response generation, valid values are "Builtin.Correctness", "Builtin.Completeness", "Builtin.Helpfulness", "Builtin.LogicalCoherence", "Builtin.Faithfulness", "Builtin.Harmfulness", "Builtin.Stereotyping", and "Builtin.Refusal".

      For automated model evaluation jobs, valid values are "Builtin.Accuracy", "Builtin.Robustness", and "Builtin.Toxicity". In model evaluation jobs that use an LLM as a judge, you can specify "Builtin.Correctness", "Builtin.Completeness", "Builtin.Faithfulness", "Builtin.Helpfulness", "Builtin.Coherence", "Builtin.Relevance", "Builtin.FollowingInstructions", and "Builtin.ProfessionalStyleAndTone". You can also specify the following responsible AI metrics, but only for model evaluation jobs that use an LLM as a judge: "Builtin.Harmfulness", "Builtin.Stereotyping", and "Builtin.Refusal".

      For human-based model evaluation jobs, the list of strings must match the name parameter specified in HumanEvaluationCustomMetric.

      Parameters:
      metricNames - The names of the metrics you want to use for your evaluation job.

      For knowledge base evaluation jobs that evaluate retrieval only, valid values are "Builtin.ContextRelevance" and "Builtin.ContextCoverage".

      For knowledge base evaluation jobs that evaluate retrieval with response generation, valid values are "Builtin.Correctness", "Builtin.Completeness", "Builtin.Helpfulness", "Builtin.LogicalCoherence", "Builtin.Faithfulness", "Builtin.Harmfulness", "Builtin.Stereotyping", and "Builtin.Refusal".

      For automated model evaluation jobs, valid values are "Builtin.Accuracy", "Builtin.Robustness", and "Builtin.Toxicity". In model evaluation jobs that use an LLM as a judge, you can specify "Builtin.Correctness", "Builtin.Completeness", "Builtin.Faithfulness", "Builtin.Helpfulness", "Builtin.Coherence", "Builtin.Relevance", "Builtin.FollowingInstructions", and "Builtin.ProfessionalStyleAndTone". You can also specify the following responsible AI metrics, but only for model evaluation jobs that use an LLM as a judge: "Builtin.Harmfulness", "Builtin.Stereotyping", and "Builtin.Refusal".

      For human-based model evaluation jobs, the list of strings must match the name parameter specified in HumanEvaluationCustomMetric.

      Returns:
      Returns a reference to this object so that method calls can be chained together.
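      Putting the pieces together, a minimal end-to-end sketch for an automated model evaluation config using the varargs overload. It assumes the same imports as the earlier snippets; the task type constant, dataset name, and S3 URI are illustrative assumptions rather than values from this page.

          EvaluationDatasetMetricConfig metricConfig = EvaluationDatasetMetricConfig.builder()
                  .taskType(EvaluationTaskType.QUESTION_AND_ANSWER)        // illustrative constant
                  .dataset(d -> d.name("MyPromptDataset")                  // placeholder dataset name
                                 .datasetLocation(l -> l.s3Uri("s3://amzn-s3-demo-bucket/prompts.jsonl")))
                  // Varargs overload: pass the metric names directly.
                  .metricNames("Builtin.Accuracy", "Builtin.Robustness", "Builtin.Toxicity")
                  .build();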