Class StartLoaderJobRequest

All Implemented Interfaces:
SdkPojo, ToCopyableBuilder<StartLoaderJobRequest.Builder,StartLoaderJobRequest>

@Generated("software.amazon.awssdk:codegen") public final class StartLoaderJobRequest extends NeptunedataRequest implements ToCopyableBuilder<StartLoaderJobRequest.Builder,StartLoaderJobRequest>
  • Method Details

    • source

      public final String source()

      The source parameter accepts an S3 URI that identifies a single file, multiple files, a folder, or multiple folders. Neptune loads every data file in any folder that is specified.

      The URI can be in any of the following formats.

      • s3://(bucket_name)/(object-key-name)

      • https://s3.amazonaws.com/(bucket_name)/(object-key-name)

      • https://s3.us-east-1.amazonaws.com/(bucket_name)/(object-key-name)

      The object-key-name element of the URI is equivalent to the prefix parameter in an S3 ListObjects API call. It identifies all the objects in the specified S3 bucket whose names begin with that prefix. That can be a single file or folder, or multiple files and/or folders.

      The specified folder or folders can contain multiple vertex files and multiple edge files.
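
      The URI forms and the prefix behaviour described above can be sketched in plain Java. This is only an illustration, not the SDK's or Neptune's own parsing code: the class SourceUriSketch, its methods, and the sample bucket and key names are all hypothetical.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the SDK.
final class SourceUriSketch {

    /** Splits a source URI into {bucket, keyPrefix}. */
    static String[] parse(String source) {
        URI uri = URI.create(source);
        String path = uri.getPath() == null ? "" : uri.getPath();
        if (path.startsWith("/")) {
            path = path.substring(1);
        }
        if ("s3".equals(uri.getScheme())) {
            // s3://bucket/key: the bucket is the URI authority.
            return new String[] {uri.getAuthority(), path};
        }
        // Path-style HTTPS URL: the first path segment is the bucket.
        int slash = path.indexOf('/');
        if (slash < 0) {
            return new String[] {path, ""};
        }
        return new String[] {path.substring(0, slash), path.substring(slash + 1)};
    }

    /** Returns the keys that a prefix match (as in ListObjects) would select. */
    static List<String> selectByPrefix(List<String> keys, String prefix) {
        List<String> selected = new ArrayList<>();
        for (String key : keys) {
            if (key.startsWith(prefix)) {
                selected.add(key);
            }
        }
        return selected;
    }
}
```

      With a prefix of data/vert, for instance, both a file named data/vertices.csv and everything under a folder data/vertex-extra/ would be selected, which is why a single source value can cover several files and folders.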

      Returns:
      The source parameter accepts an S3 URI that identifies a single file, multiple files, a folder, or multiple folders. Neptune loads every data file in any folder that is specified.

    • format

      public final Format format()

      The format of the data. For more information about data formats for the Neptune Loader command, see Load Data Formats.

      Allowed values: see the Format enum.

      If the service returns an enum value that is not available in the current SDK version, format will return Format.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available from formatAsString().
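
      The relationship between an enum getter and its AsString counterpart can be sketched as follows. This is a simplified model and not the generated SDK code: SketchFormat, its two constants, and their wire values are assumptions made for illustration.

```java
// Hypothetical model of an enum getter paired with a raw-string getter.
final class UnknownEnumSketch {

    /** Stand-in for a generated enum; constants and wire values are illustrative. */
    enum SketchFormat {
        CSV("csv"),
        TURTLE("turtle"),
        UNKNOWN_TO_SDK_VERSION(null);

        private final String value;

        SketchFormat(String value) {
            this.value = value;
        }

        /** Maps a raw service value to a constant, falling back to UNKNOWN_TO_SDK_VERSION. */
        static SketchFormat fromValue(String raw) {
            if (raw == null) {
                return null;
            }
            for (SketchFormat f : values()) {
                if (raw.equals(f.value)) {
                    return f;
                }
            }
            return UNKNOWN_TO_SDK_VERSION;
        }
    }

    private final String rawFormat;

    UnknownEnumSketch(String rawFormat) {
        this.rawFormat = rawFormat;
    }

    SketchFormat format() {
        return SketchFormat.fromValue(rawFormat);
    }

    String formatAsString() {
        return rawFormat;
    }
}
```

      In this model, a value the enum does not know still survives in formatAsString(), so a caller can log or forward it even when format() reports UNKNOWN_TO_SDK_VERSION.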

      Returns:
      The format of the data. For more information about data formats for the Neptune Loader command, see Load Data Formats.
    • formatAsString

      public final String formatAsString()

      The format of the data. For more information about data formats for the Neptune Loader command, see Load Data Formats.

      Allowed values: see the Format enum.

      If the service returns an enum value that is not available in the current SDK version, format will return Format.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available from formatAsString().

      Returns:
      The format of the data. For more information about data formats for the Neptune Loader command, see Load Data Formats.
    • s3BucketRegion

      public final S3BucketRegion s3BucketRegion()

      The Amazon Region of the S3 bucket. This must match the Amazon Region of the DB cluster.

      If the service returns an enum value that is not available in the current SDK version, s3BucketRegion will return S3BucketRegion.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available from s3BucketRegionAsString().

      Returns:
      The Amazon Region of the S3 bucket. This must match the Amazon Region of the DB cluster.
    • s3BucketRegionAsString

      public final String s3BucketRegionAsString()

      The Amazon Region of the S3 bucket. This must match the Amazon Region of the DB cluster.

      If the service returns an enum value that is not available in the current SDK version, s3BucketRegion will return S3BucketRegion.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available from s3BucketRegionAsString().

      Returns:
      The Amazon Region of the S3 bucket. This must match the Amazon Region of the DB cluster.
    • iamRoleArn

      public final String iamRoleArn()

      The Amazon Resource Name (ARN) for an IAM role to be assumed by the Neptune DB instance for access to the S3 bucket. The IAM role ARN provided here should be attached to the DB cluster (see Adding the IAM Role to an Amazon Neptune Cluster).

      Returns:
      The Amazon Resource Name (ARN) for an IAM role to be assumed by the Neptune DB instance for access to the S3 bucket. The IAM role ARN provided here should be attached to the DB cluster (see Adding the IAM Role to an Amazon Neptune Cluster).
    • mode

      public final Mode mode()

      The load job mode.

      Allowed values: RESUME, NEW, AUTO.

      Default value: AUTO.

      • RESUME   –   In RESUME mode, the loader looks for a previous load from this source, and if it finds one, resumes that load job. If no previous load job is found, the loader stops.

        The loader avoids reloading files that were successfully loaded in a previous job. It only tries to process failed files. If you dropped previously loaded data from your Neptune cluster, that data is not reloaded in this mode. If a previous load job loaded all files from the same source successfully, nothing is reloaded, and the loader returns success.

      • NEW   –   In NEW mode, the loader creates a new load request regardless of any previous loads. You can use this mode to reload all the data from a source after dropping previously loaded data from your Neptune cluster, or to load new data available at the same source.

      • AUTO   –   In AUTO mode, the loader looks for a previous load job from the same source, and if it finds one, resumes that job, just as in RESUME mode.

        If the loader doesn't find a previous load job from the same source, it loads all data from the source, just as in NEW mode.

      If the service returns an enum value that is not available in the current SDK version, mode will return Mode.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available from modeAsString().
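
      The three modes reduce to a small decision table. The sketch below restates that table in code; it is a model of the documented behaviour, not the loader's implementation, and the SketchMode and Action names are hypothetical.

```java
// Hypothetical model of how each mode reacts to a previous load from the same source.
final class LoadModeSketch {

    enum SketchMode { RESUME, NEW, AUTO }

    enum Action { RESUME_PREVIOUS, START_NEW, STOP }

    static Action decide(SketchMode mode, boolean previousLoadExists) {
        switch (mode) {
            case RESUME:
                // Resume if possible; otherwise the loader stops.
                return previousLoadExists ? Action.RESUME_PREVIOUS : Action.STOP;
            case NEW:
                // Always a fresh load request, regardless of history.
                return Action.START_NEW;
            case AUTO:
                // Behaves like RESUME when a previous load exists, like NEW otherwise.
                return previousLoadExists ? Action.RESUME_PREVIOUS : Action.START_NEW;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
    }
}
```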

      Returns:
      The load job mode.

      Allowed values: RESUME, NEW, AUTO.

      Default value: AUTO.
    • modeAsString

      public final String modeAsString()

      The load job mode.

      Allowed values: RESUME, NEW, AUTO.

      Default value: AUTO.

      • RESUME   –   In RESUME mode, the loader looks for a previous load from this source, and if it finds one, resumes that load job. If no previous load job is found, the loader stops.

        The loader avoids reloading files that were successfully loaded in a previous job. It only tries to process failed files. If you dropped previously loaded data from your Neptune cluster, that data is not reloaded in this mode. If a previous load job loaded all files from the same source successfully, nothing is reloaded, and the loader returns success.

      • NEW   –   In NEW mode, the loader creates a new load request regardless of any previous loads. You can use this mode to reload all the data from a source after dropping previously loaded data from your Neptune cluster, or to load new data available at the same source.

      • AUTO   –   In AUTO mode, the loader looks for a previous load job from the same source, and if it finds one, resumes that job, just as in RESUME mode.

        If the loader doesn't find a previous load job from the same source, it loads all data from the source, just as in NEW mode.

      If the service returns an enum value that is not available in the current SDK version, mode will return Mode.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available from modeAsString().

      Returns:
      The load job mode.

      Allowed values: RESUME, NEW, AUTO.

      Default value: AUTO.
    • failOnError

      public final Boolean failOnError()

      failOnError   –   A flag to toggle a complete stop on an error.

      Allowed values: "TRUE", "FALSE".

      Default value: "TRUE".

      When this parameter is set to "FALSE", the loader tries to load all the data in the location specified, skipping any entries with errors.

      When this parameter is set to "TRUE", the loader stops as soon as it encounters an error. Data loaded up to that point persists.
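
      The difference between the two settings can be illustrated with a toy loader. This is a sketch under a stated assumption: an entry counts as erroneous here simply when it is blank, which stands in for whatever errors the real loader detects. FailOnErrorSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy loader; a blank entry stands in for an entry with an error.
final class FailOnErrorSketch {

    static List<String> load(List<String> entries, boolean failOnError) {
        List<String> loaded = new ArrayList<>();
        for (String entry : entries) {
            boolean hasError = entry.trim().isEmpty();
            if (hasError) {
                if (failOnError) {
                    break; // stop at the first error; entries loaded so far persist
                }
                continue; // skip the bad entry and keep going
            }
            loaded.add(entry);
        }
        return loaded;
    }
}
```

      For the entries "a", "", "b", this model loads only "a" when failOnError is true and loads "a" and "b" when it is false.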

      Returns:
      failOnError   –   A flag to toggle a complete stop on an error.

      Allowed values: "TRUE", "FALSE".

      Default value: "TRUE".

    • parallelism

      public final Parallelism parallelism()

      The optional parallelism parameter can be set to reduce the number of threads used by the bulk load process.

      Allowed values:

      • LOW –   The number of threads used is the number of available vCPUs divided by 8.

      • MEDIUM –   The number of threads used is the number of available vCPUs divided by 2.

      • HIGH –   The number of threads used is the same as the number of available vCPUs.

      • OVERSUBSCRIBE –   The number of threads used is the number of available vCPUs multiplied by 2. If this value is used, the bulk loader takes up all available resources.

        This does not mean, however, that the OVERSUBSCRIBE setting results in 100% CPU utilization. Because the load operation is I/O bound, the highest CPU utilization to expect is in the 60% to 70% range.

      Default value: HIGH

      The parallelism setting can sometimes result in a deadlock between threads when loading openCypher data. When this happens, Neptune returns the LOAD_DATA_DEADLOCK error. You can generally fix the issue by setting parallelism to a lower setting and retrying the load command.

      If the service returns an enum value that is not available in the current SDK version, parallelism will return Parallelism.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available from parallelismAsString().
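
      As a worked example of the thread counts listed above, the following sketch computes the count for a given number of vCPUs and picks the next lower level to retry with after a LOAD_DATA_DEADLOCK error. It is illustrative only: the Level enum and method names are hypothetical, and flooring the result at one thread is an assumption, since the rounding behaviour is not stated here.

```java
// Hypothetical model of the documented thread-count rules.
final class ParallelismSketch {

    enum Level { LOW, MEDIUM, HIGH, OVERSUBSCRIBE }

    static int threads(Level level, int vcpus) {
        int n;
        switch (level) {
            case LOW:
                n = vcpus / 8;
                break;
            case MEDIUM:
                n = vcpus / 2;
                break;
            case HIGH:
                n = vcpus;
                break;
            case OVERSUBSCRIBE:
                n = vcpus * 2;
                break;
            default:
                throw new IllegalArgumentException("unknown level: " + level);
        }
        // Assumption: never report fewer than one thread.
        return Math.max(1, n);
    }

    /** Next lower level to retry with after a deadlock; LOW stays LOW. */
    static Level lower(Level level) {
        return level == Level.LOW ? Level.LOW : Level.values()[level.ordinal() - 1];
    }
}
```

      On a 16-vCPU instance this gives 2, 8, 16, and 32 threads for LOW, MEDIUM, HIGH, and OVERSUBSCRIBE respectively.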

      Returns:
      The optional parallelism parameter can be set to reduce the number of threads used by the bulk load process.

      Default value: HIGH
    • parallelismAsString

      public final String parallelismAsString()

      The optional parallelism parameter can be set to reduce the number of threads used by the bulk load process.

      Allowed values:

      • LOW –   The number of threads used is the number of available vCPUs divided by 8.

      • MEDIUM –   The number of threads used is the number of available vCPUs divided by 2.

      • HIGH –   The number of threads used is the same as the number of available vCPUs.

      • OVERSUBSCRIBE –   The number of threads used is the number of available vCPUs multiplied by 2. If this value is used, the bulk loader takes up all available resources.

        This does not mean, however, that the OVERSUBSCRIBE setting results in 100% CPU utilization. Because the load operation is I/O bound, the highest CPU utilization to expect is in the 60% to 70% range.

      Default value: HIGH

      The parallelism setting can sometimes result in a deadlock between threads when loading openCypher data. When this happens, Neptune returns the LOAD_DATA_DEADLOCK error. You can generally fix the issue by setting parallelism to a lower setting and retrying the load command.

      If the service returns an enum value that is not available in the current SDK version, parallelism will return Parallelism.UNKNOWN_TO_SDK_VERSION. The raw value returned by the service is available from parallelismAsString().

      Returns:
      The optional parallelism parameter can be set to reduce the number of threads used by the bulk load process.

      Default value: HIGH
    • hasParserConfiguration

      public final boolean hasParserConfiguration()
      For responses, this returns true if the service returned a value for the ParserConfiguration property. This DOES NOT check that the value is non-empty (for which, you should check the isEmpty() method on the property). This is useful because the SDK will never return a null collection or map, but you may need to differentiate between the service returning nothing (or null) and the service returning an empty collection or map. For requests, this returns true if a value for the property was specified in the request builder, and false if a value was not specified.
    • parserConfiguration

      public final Map<String,String> parserConfiguration()

      parserConfiguration   –   An optional object with additional parser configuration values. Each of the child parameters is also optional:

      • namedGraphUri   –   The default graph for all RDF formats when no graph is specified (for non-quads formats and NQUAD entries with no graph).

        The default is https://aws.amazon.com/neptune/vocab/v01/DefaultNamedGraph.

      • baseUri   –   The base URI for RDF/XML and Turtle formats.

        The default is https://aws.amazon.com/neptune/default.

      • allowEmptyStrings   –   Gremlin users need to be able to pass empty string values ("") as node and edge properties when loading CSV data. If allowEmptyStrings is set to false (the default), such empty strings are treated as nulls and are not loaded.

        If allowEmptyStrings is set to true, the loader treats empty strings as valid property values and loads them accordingly.

      Attempts to modify the collection returned by this method will result in an UnsupportedOperationException.

      This method will never return null. If you would like to know whether the service returned this field (so that you can differentiate between null and empty), you can use the hasParserConfiguration() method.
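
      The interplay between hasParserConfiguration() and parserConfiguration() can be modelled with a small class. This is a sketch of the documented contract (never null, unmodifiable, "has" distinguishes unset from empty), not the generated SDK code; ParserConfigSketch is a hypothetical name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of a map-valued property with a companion "has" method.
final class ParserConfigSketch {

    // null means the property was never specified.
    private final Map<String, String> parserConfiguration;

    ParserConfigSketch(Map<String, String> config) {
        this.parserConfiguration =
            config == null ? null : Collections.unmodifiableMap(new HashMap<>(config));
    }

    /** True if a value was specified, even an empty one. */
    boolean hasParserConfiguration() {
        return parserConfiguration != null;
    }

    /** Never returns null; the returned map rejects modification. */
    Map<String, String> parserConfiguration() {
        return parserConfiguration == null
            ? Collections.<String, String>emptyMap()
            : parserConfiguration;
    }
}
```

      In this model an unset property and an explicitly empty map both yield an empty result from parserConfiguration(), and only hasParserConfiguration() tells them apart.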

      Returns:
      parserConfiguration   –   An optional object with additional parser configuration values. Each of the child parameters is also optional.

    • updateSingleCardinalityProperties

      public final Boolean updateSingleCardinalityProperties()

      updateSingleCardinalityProperties is an optional parameter that controls how the bulk loader treats a new value for single-cardinality vertex or edge properties. This is not supported for loading openCypher data.

      Allowed values: "TRUE", "FALSE".

      Default value: "FALSE".

      By default, or when updateSingleCardinalityProperties is explicitly set to "FALSE", the loader treats a new value as an error, because it violates single cardinality.

      When updateSingleCardinalityProperties is set to "TRUE", on the other hand, the bulk loader replaces the existing value with the new one. If multiple edge or single-cardinality vertex property values are provided in the source file(s) being loaded, the final value at the end of the bulk load could be any one of those new values. The loader only guarantees that the existing value has been replaced by one of the new ones.
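
      The effect of the flag on one property can be sketched as below. This is a simplified model, not the loader's code: SingleCardinalitySketch is hypothetical, and treating an incoming value identical to the existing one as a no-op is an assumption.

```java
// Hypothetical model of applying a new value to a single-cardinality property.
final class SingleCardinalitySketch {

    static String applyNewValue(String existing, String incoming,
                                boolean updateSingleCardinalityProperties) {
        if (existing == null || existing.equals(incoming)) {
            // Nothing to replace (assumption: an identical value is not an error).
            return incoming;
        }
        if (!updateSingleCardinalityProperties) {
            throw new IllegalStateException("new value violates single cardinality");
        }
        // The new value replaces the existing one.
        return incoming;
    }
}
```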

      Returns:
      updateSingleCardinalityProperties is an optional parameter that controls how the bulk loader treats a new value for single-cardinality vertex or edge properties. This is not supported for loading openCypher data.

      Allowed values: "TRUE", "FALSE".

      Default value: "FALSE".

    • queueRequest

      public final Boolean queueRequest()

      This is an optional flag parameter that indicates whether the load request can be queued up or not.

      You don't have to wait for one load job to complete before issuing the next one, because Neptune can queue up as many as 64 jobs at a time, provided that their queueRequest parameters are all set to "TRUE". The queue order of the jobs will be first-in-first-out (FIFO).

      If the queueRequest parameter is omitted or set to "FALSE", the load request will fail if another load job is already running.

      Allowed values: "TRUE", "FALSE".

      Default value: "FALSE".
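
      The queueing rules can be modelled with a bounded FIFO queue. The sketch below is illustrative only: LoadQueueSketch is a hypothetical class, and two details are assumptions because they are not specified here, namely that the limit of 64 counts waiting jobs only and that a request beyond the limit is rejected.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of the load queue: one running job plus a FIFO of waiting jobs.
final class LoadQueueSketch {

    static final int MAX_QUEUED = 64;

    private final Deque<String> waiting = new ArrayDeque<>();
    private String running;

    /** Returns true if the job was started or queued, false if the request fails. */
    boolean submit(String jobId, boolean queueRequest) {
        if (running == null) {
            running = jobId;
            return true;
        }
        if (!queueRequest) {
            return false; // another job is running and queueing was not requested
        }
        if (waiting.size() >= MAX_QUEUED) {
            return false; // assumption: requests beyond the limit are rejected
        }
        waiting.addLast(jobId);
        return true;
    }

    /** Finishes the running job and starts the oldest waiting one, if any. */
    String finishCurrent() {
        running = waiting.pollFirst();
        return running;
    }
}
```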

      Returns:
      This is an optional flag parameter that indicates whether the load request can be queued up or not.

      Allowed values: "TRUE", "FALSE".

      Default value: "FALSE".

    • hasDependencies

      public final boolean hasDependencies()
      For responses, this returns true if the service returned a value for the Dependencies property. This DOES NOT check that the value is non-empty (for which, you should check the isEmpty() method on the property). This is useful because the SDK will never return a null collection or map, but you may need to differentiate between the service returning nothing (or null) and the service returning an empty collection or map. For requests, this returns true if a value for the property was specified in the request builder, and false if a value was not specified.
    • dependencies

      public final List<String> dependencies()

      This is an optional parameter that can make a queued load request contingent on the successful completion of one or more previous jobs in the queue.

      Neptune can queue up as many as 64 load requests at a time, if their queueRequest parameters are set to "TRUE". The dependencies parameter lets you make execution of such a queued request dependent on the successful completion of one or more specified previous requests in the queue.

      For example, if load Job-A and Job-B are independent of each other, but load Job-C needs Job-A and Job-B to be finished before it begins, proceed as follows:

      1. Submit Job-A and Job-B one after another in any order, and save their load IDs.

      2. Submit Job-C with the load IDs of the two jobs in its dependencies field.

      Because of the dependencies parameter, the bulk loader will not start Job-C until Job-A and Job-B have completed successfully. If either one of them fails, Job-C will not be executed, and its status will be set to LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED.

      You can set up multiple levels of dependency in this way, so that the failure of one job will cause all requests that are directly or indirectly dependent on it to be cancelled.

      Attempts to modify the collection returned by this method will result in an UnsupportedOperationException.

      This method will never return null. If you would like to know whether the service returned this field (so that you can differentiate between null and empty), you can use the hasDependencies() method.
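
      The Job-A, Job-B, Job-C scenario and the cascading failure it describes can be simulated as follows. This is a sketch, not the bulk loader's scheduler: DependencySketch is hypothetical, COMPLETED and FAILED are placeholder status names, and jobs are assumed to run one at a time in submission order.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical simulation of queued jobs with dependencies.
final class DependencySketch {

    enum Status { COMPLETED, FAILED, LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED }

    /**
     * @param jobs    job id mapped to the ids it depends on, iterated in submission order
     * @param failing ids of jobs that fail on their own when executed
     */
    static Map<String, Status> run(Map<String, List<String>> jobs, Set<String> failing) {
        Map<String, Status> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> job : jobs.entrySet()) {
            boolean satisfied = true;
            for (String dep : job.getValue()) {
                // A dependency counts only if it already completed successfully.
                if (result.get(dep) != Status.COMPLETED) {
                    satisfied = false;
                    break;
                }
            }
            Status status;
            if (!satisfied) {
                status = Status.LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED;
            } else if (failing.contains(job.getKey())) {
                status = Status.FAILED;
            } else {
                status = Status.COMPLETED;
            }
            result.put(job.getKey(), status);
        }
        return result;
    }
}
```

      If Job-B fails in this model, Job-C is marked LOAD_FAILED_BECAUSE_DEPENDENCY_NOT_SATISFIED, and so is any later job that depends on Job-C, which mirrors the multi-level behaviour described above.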

      Returns:
      This is an optional parameter that can make a queued load request contingent on the successful completion of one or more previous jobs in the queue.

    • userProvidedEdgeIds

      public final Boolean userProvidedEdgeIds()

      This parameter is required only when loading openCypher data that contains relationship IDs. It must be included and set to True when openCypher relationship IDs are explicitly provided in the load data (recommended).

      When userProvidedEdgeIds is absent or set to True, an :ID column must be present in every relationship file in the load.

      When userProvidedEdgeIds is present and set to False, relationship files in the load must not contain an :ID column. Instead, the Neptune loader automatically generates an ID for each relationship.

      It's useful to provide relationship IDs explicitly so that the loader can resume loading after errors in the CSV data have been fixed, without having to reload any relationships that have already been loaded. If relationship IDs have not been explicitly assigned, the loader cannot resume a failed load if any relationship file has had to be corrected, and must instead reload all the relationships.
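
      The rule relating userProvidedEdgeIds to the :ID column can be expressed as a header check. This is a minimal sketch, not the loader's validation code: EdgeIdSketch is hypothetical, and the header is modelled as a plain list of column names.

```java
import java.util.List;

// Hypothetical check of a relationship file header against the flag.
final class EdgeIdSketch {

    static boolean headerAccepted(List<String> header, Boolean userProvidedEdgeIds) {
        // Absent (null) is treated like true: an :ID column is expected.
        boolean idsExpected = userProvidedEdgeIds == null || userProvidedEdgeIds;
        boolean hasIdColumn = header.contains(":ID");
        return hasIdColumn == idsExpected;
    }
}
```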

      Returns:
      This parameter is required only when loading openCypher data that contains relationship IDs. It must be included and set to True when openCypher relationship IDs are explicitly provided in the load data (recommended).

    • toBuilder

      public StartLoaderJobRequest.Builder toBuilder()
      Description copied from interface: ToCopyableBuilder
      Take this object and create a builder that contains all of the current property values of this object.
      Specified by:
      toBuilder in interface ToCopyableBuilder<StartLoaderJobRequest.Builder,StartLoaderJobRequest>
      Specified by:
      toBuilder in class NeptunedataRequest
      Returns:
      a builder for type T
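
      The copy-and-modify pattern that toBuilder() enables can be sketched with a small immutable class. This is an illustration of the pattern only: CopyableSketch and its two properties are hypothetical stand-ins, not the generated request class.

```java
// Hypothetical immutable class with a copyable builder.
final class CopyableSketch {

    private final String source;
    private final Boolean queueRequest;

    private CopyableSketch(Builder b) {
        this.source = b.source;
        this.queueRequest = b.queueRequest;
    }

    String source() {
        return source;
    }

    Boolean queueRequest() {
        return queueRequest;
    }

    static Builder builder() {
        return new Builder();
    }

    /** Creates a builder pre-populated with this object's current values. */
    Builder toBuilder() {
        return new Builder().source(source).queueRequest(queueRequest);
    }

    static final class Builder {
        private String source;
        private Boolean queueRequest;

        Builder source(String source) {
            this.source = source;
            return this;
        }

        Builder queueRequest(Boolean queueRequest) {
            this.queueRequest = queueRequest;
            return this;
        }

        CopyableSketch build() {
            return new CopyableSketch(this);
        }
    }
}
```

      Calling toBuilder() on an existing object, changing one property, and calling build() yields a new object; the original is left unchanged.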
    • builder

      public static StartLoaderJobRequest.Builder builder()
    • serializableBuilderClass

      public static Class<? extends StartLoaderJobRequest.Builder> serializableBuilderClass()
    • hashCode

      public final int hashCode()
      Overrides:
      hashCode in class AwsRequest
    • equals

      public final boolean equals(Object obj)
      Overrides:
      equals in class AwsRequest
    • equalsBySdkFields

      public final boolean equalsBySdkFields(Object obj)
      Description copied from interface: SdkPojo
      Indicates whether some other object is "equal to" this one by SDK fields. An SDK field is a modeled, non-inherited field in an SdkPojo class, and is generated based on a service model.

      If an SdkPojo class does not have any inherited fields, equalsBySdkFields and equals are essentially the same.

      Specified by:
      equalsBySdkFields in interface SdkPojo
      Parameters:
      obj - the object to be compared with
      Returns:
      true if the other object is equal to this object by SDK fields, false otherwise.
    • toString

      public final String toString()
      Returns a string representation of this object. This is useful for testing and debugging. Sensitive data will be redacted from this string using a placeholder value.
      Overrides:
      toString in class Object
    • getValueForField

      public final <T> Optional<T> getValueForField(String fieldName, Class<T> clazz)
      Description copied from class: SdkRequest
      Used to retrieve the value of a field from any class that extends SdkRequest. The field name specified should match the member name from the corresponding service-2.json model specified in the codegen-resources folder for a given service. The class specifies what class to cast the returned value to. If the returned value is also a modeled class, the SdkRequest.getValueForField(String, Class) method will again be available.
      Overrides:
      getValueForField in class SdkRequest
      Parameters:
      fieldName - The name of the member to be retrieved.
      clazz - The class to cast the returned object to.
      Returns:
      Optional containing the cast return value
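
      The lookup-and-cast contract described above can be modelled with a map of field values. This is a sketch rather than the SDK's implementation: FieldLookupSketch is hypothetical, and the field names used with it are illustrative, since the real names come from the service model.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of looking up a field by name and casting it.
final class FieldLookupSketch {

    private final Map<String, Object> fields = new HashMap<>();

    FieldLookupSketch put(String name, Object value) {
        fields.put(name, value);
        return this;
    }

    <T> Optional<T> getValueForField(String fieldName, Class<T> clazz) {
        // Class.cast returns null for a missing field and throws
        // ClassCastException if the stored value has a different type.
        return Optional.ofNullable(clazz.cast(fields.get(fieldName)));
    }
}
```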
    • sdkFields

      public final List<SdkField<?>> sdkFields()
      Specified by:
      sdkFields in interface SdkPojo
      Returns:
      List of SdkField in this POJO. May be empty list but should never be null.
    • sdkFieldNameToField

      public final Map<String,SdkField<?>> sdkFieldNameToField()
      Specified by:
      sdkFieldNameToField in interface SdkPojo
      Returns:
      The mapping between the field name and its corresponding field.