AWS Textract Teardown – Pros and Cons of using Amazon’s Textract in 2021


Pros and Cons of using AWS Textract

Pros:

Easy Setup with AWS Services: Setting up Textract alongside other AWS services is easier than wiring it up with other providers. For example, extracted document information can be stored in Amazon DynamoDB or S3 by configuring an add-on.
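As a rough illustration of how little glue is needed, here is a minimal sketch that runs text detection on a document already sitting in S3 and saves the detected lines to DynamoDB. The function name `store_document_lines` and the `document_key` attribute are assumptions made for this example; the Textract and DynamoDB clients are passed in so the sketch does not depend on any particular account setup.

```python
def store_document_lines(textract_client, table, bucket, key):
    """Detect text in an S3 object and save the detected lines to DynamoDB."""
    # detect_document_text is the synchronous Textract text detection call.
    response = textract_client.detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    # Keep only LINE blocks; PAGE and WORD blocks are ignored here.
    lines = [b["Text"] for b in response["Blocks"] if b["BlockType"] == "LINE"]
    # "document_key" is a hypothetical partition key name for this sketch.
    table.put_item(Item={"document_key": key, "lines": lines})
    return lines


# Example wiring with boto3 (bucket, key, and table names are placeholders):
# import boto3
# textract = boto3.client("textract")
# table = boto3.resource("dynamodb").Table("ExtractedDocuments")
# store_document_lines(textract, table, "my-bucket", "invoices/sample.png")
```

Passing the clients in as arguments also makes the function easy to exercise with stand-in objects before pointing it at real AWS resources.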

Secure: Amazon Textract conforms to the AWS shared responsibility model, which includes regulations and guidelines for data protection. AWS is responsible for protecting the global infrastructure that runs all the AWS services, which lowers the risk of our data being leaked or used by others. We remain responsible for how we configure access to our own documents.

Cons:

Inability to Extract Custom Fields: A given invoice can contain many data fields, such as Invoice ID, Due Date, and Transaction Date. These fields are common to most invoices. But if we want to extract a custom field from an invoice, say, a GST number or bank information, Textract does a poor job.
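One common workaround is to post-process the raw text ourselves. The sketch below scans the LINE blocks of a Textract-style response for strings that look like a GST number. The regular expression reflects an assumed GSTIN layout (two-digit state code, ten-character PAN, one entity character, the letter Z, one check character) and does not verify the check character; `find_gst_numbers` and the sample response are made up for illustration.

```python
import re

# Assumed GSTIN layout: 2-digit state code, 10-character PAN,
# 1 entity character, the letter Z, 1 check character (not validated here).
GSTIN_PATTERN = re.compile(r"\b\d{2}[A-Z]{5}\d{4}[A-Z][1-9A-Z]Z[0-9A-Z]\b")


def find_gst_numbers(response):
    """Return GSTIN-like strings found in the LINE blocks of a response."""
    matches = []
    for block in response.get("Blocks", []):
        if block.get("BlockType") != "LINE":
            continue  # skip WORD, PAGE, and other block types
        matches.extend(GSTIN_PATTERN.findall(block.get("Text", "").upper()))
    return matches


# Hand-written sample in the shape of a Textract response.
sample_response = {
    "Blocks": [
        {"BlockType": "LINE", "Text": "Invoice ID: INV-1042"},
        {"BlockType": "LINE", "Text": "GSTIN: 22AAAAA0000A1Z5"},
        {"BlockType": "WORD", "Text": "22AAAAA0000A1Z5"},
    ]
}
print(find_gst_numbers(sample_response))  # ['22AAAAA0000A1Z5']
```

This kind of rule has to be written and maintained per field, which is exactly the effort a custom-field feature would save.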

Integrations with upstream and downstream providers: Textract doesn’t integrate easily with other providers. For example, if we have to build an RPA pipeline with a third-party service, it would be difficult to find appropriate plugins that suit Textract.

Inability to define table headers: For table extraction tasks, Textract doesn’t allow you to define table headers. As a result, it is not easy to search for or find a particular column or table in a given document.
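In practice this means the header logic lives in our own code. The following sketch assumes the blocks belong to a single table and that its first row holds the column names, which is a guess that will not hold for every document. It rebuilds the grid from CELL blocks (using their `RowIndex`, `ColumnIndex`, and CHILD relationships) and returns one dictionary per data row. The function name `table_to_records` and the sample blocks are made up for illustration.

```python
def table_to_records(blocks):
    """Turn the CELL blocks of one table into dicts keyed by first-row text."""
    by_id = {b["Id"]: b for b in blocks}
    grid = {}
    for block in blocks:
        if block.get("BlockType") != "CELL":
            continue
        # A cell's text comes from the WORD blocks listed as its children.
        child_ids = [
            child_id
            for rel in block.get("Relationships", [])
            if rel["Type"] == "CHILD"
            for child_id in rel["Ids"]
        ]
        words = [
            by_id[i]["Text"]
            for i in child_ids
            if i in by_id and by_id[i].get("BlockType") == "WORD"
        ]
        grid[(block["RowIndex"], block["ColumnIndex"])] = " ".join(words)
    if not grid:
        return []
    n_rows = max(row for row, _ in grid)
    n_cols = max(col for _, col in grid)
    # Assumption: row 1 is the header row (indices are 1-based).
    headers = [grid.get((1, c), "") for c in range(1, n_cols + 1)]
    return [
        {headers[c - 1]: grid.get((r, c), "") for c in range(1, n_cols + 1)}
        for r in range(2, n_rows + 1)
    ]


def _cell(cell_id, row, col, word_id):
    """Build a minimal CELL block for the sample below."""
    return {
        "Id": cell_id, "BlockType": "CELL", "RowIndex": row, "ColumnIndex": col,
        "Relationships": [{"Type": "CHILD", "Ids": [word_id]}],
    }


sample_blocks = [
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Amount"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Paper"},
    {"Id": "w4", "BlockType": "WORD", "Text": "12.50"},
    _cell("c1", 1, 1, "w1"), _cell("c2", 1, 2, "w2"),
    _cell("c3", 2, 1, "w3"), _cell("c4", 2, 2, "w4"),
]
print(table_to_records(sample_blocks))  # [{'Item': 'Paper', 'Amount': '12.50'}]
```

Once rows are keyed by header text, finding a column is a dictionary lookup, but tables with merged or multi-row headers would need extra handling.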

No Fraud Checks: Modern OCRs are now able to tell whether a given document is original or fake by validating dates and finding pixelated regions. AWS Textract doesn’t come with this capability. Its only job is to pick out all the text from an uploaded document.

No Vertical Text Extraction: In some documents, invoice numbers or addresses are printed vertically. At present, Textract only supports horizontal text extraction with a slight in-plane rotation.

Language Limit: Amazon Textract supports text detection only in English, Spanish, German, French, Italian, and Portuguese. It also does not return the detected language in its output.

Everything’s Cloud: Any document processed with Textract goes into the cloud, and the service is only available in a few regions. Some companies might not want to send their documents to the cloud for reasons like confidentiality or legal requirements. Unfortunately, AWS Textract does not support any on-premise deployment for document processing.

Retraining: If accuracy is low on information extraction tasks for a set of documents, Textract doesn’t allow us to retrain the model on them. To work around this, we have to invest in a human review workflow, where an operator manually verifies and annotates wrongly extracted values, which is time-consuming.
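A simple way to limit the manual effort is to review only what the model is unsure about. The sketch below splits detected lines by the `Confidence` score that Textract attaches to each block (a value from 0 to 100). The threshold of 90 is an arbitrary choice for this example, and `split_by_confidence` and the sample blocks are made up for illustration.

```python
def split_by_confidence(blocks, threshold=90.0):
    """Split LINE blocks into auto-accepted text and text needing review."""
    accepted, needs_review = [], []
    for block in blocks:
        if block.get("BlockType") != "LINE":
            continue
        # A missing score is treated as 0 so the line is sent to review.
        confident = block.get("Confidence", 0.0) >= threshold
        (accepted if confident else needs_review).append(block["Text"])
    return accepted, needs_review


# Hand-written sample blocks with confidence scores.
sample_lines = [
    {"BlockType": "LINE", "Text": "Invoice ID: INV-1042", "Confidence": 99.1},
    {"BlockType": "LINE", "Text": "Due Date: 0l/02/2021", "Confidence": 71.4},
    {"BlockType": "WORD", "Text": "Invoice", "Confidence": 99.5},
]
accepted, needs_review = split_by_confidence(sample_lines)
print(accepted)      # ['Invoice ID: INV-1042']
print(needs_review)  # ['Due Date: 0l/02/2021']
```

This reduces the operator's queue, but the corrections still cannot be fed back into Textract to improve future extractions.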

Conclusion

We hope this review of AWS Textract has been useful as you consider different solutions for data extraction and text recognition from your documents. We’ll keep updating this post periodically to cover the latest changes.

Please add your thoughts and questions about using Amazon’s Textract solution in the comments section.

Source: Hero image from AWS website.

Source: https://nanonets.com/blog/aws-textract-teardown-pros-cons-review/
