blockType

The type of text item that's recognized. In operations for text detection, the following types are returned:

  • PAGE - Contains a list of the LINE Block objects that are detected on a document page.

  • WORD - A word detected on a document page. A word is one or more ISO basic Latin script characters that aren't separated by spaces.

  • LINE - A string of tab-delimited, contiguous words that are detected on a document page.

In text analysis operations, the following types are returned:

  • PAGE - Contains a list of child Block objects that are detected on a document page.

  • KEY_VALUE_SET - Stores the KEY and VALUE Block objects for linked text that's detected on a document page. Use the EntityType field to determine if a KEY_VALUE_SET object is a KEY Block object or a VALUE Block object.

  • WORD - A word that's detected on a document page. A word is one or more ISO basic Latin script characters that aren't separated by spaces.

  • LINE - A string of tab-delimited, contiguous words that are detected on a document page.

  • TABLE - A table that's detected on a document page. A table is grid-based information with two or more rows or columns, with a cell span of one row and one column each.

  • TABLE_TITLE - The title of a table. A title is typically a line of text above or below a table, or embedded as the first row of a table.

  • TABLE_FOOTER - The footer associated with a table. A footer is typically a line or lines of text below a table or embedded as the last row of a table.

  • CELL - A cell within a detected table. The cell is the parent of the block that contains the text in the cell.

  • MERGED_CELL - A cell in a table whose content spans more than one row or column. The Relationships array for this cell contain data from individual cells.

  • SELECTION_ELEMENT - A selection element such as an option button (radio button) or a check box that's detected on a document page. Use the value of SelectionStatus to determine the status of the selection element.

  • SIGNATURE - The location and confidence score of a signature detected on a document page. Can be returned as part of a Key-Value pair or a detected cell.

  • QUERY - A question asked during the call of AnalyzeDocument. Contains an alias and an ID that attaches it to its answer.

  • QUERY_RESULT - A response to a question asked during the call of analyze document. Comes with an alias and ID for ease of locating in a response. Also contains location and confidence score.

The following BlockTypes are only returned for Amazon Textract Layout.

  • LAYOUT_TITLE - The main title of the document.

  • LAYOUT_HEADER - Text located in the top margin of the document.

  • LAYOUT_FOOTER - Text located in the bottom margin of the document.

  • LAYOUT_SECTION_HEADER - The titles of sections within a document.

  • LAYOUT_PAGE_NUMBER - The page number of the documents.

  • LAYOUT_LIST - Any information grouped together in list form.

  • LAYOUT_FIGURE - Indicates the location of an image in a document.

  • LAYOUT_TABLE - Indicates the location of a table in the document.

  • LAYOUT_KEY_VALUE - Indicates the location of form key-values in a document.

  • LAYOUT_TEXT - Text that is present typically as a part of paragraphs in documents.