Automatically extract printed text, handwriting and other data from scanned documents

Reduce manual labor costs in handling faxes, prior authorizations, EOBs, referrals, prescription authorization requests, superbills and more.

Use Amazon Textract Or EZHCRM

Healthcare is all about faxes, PDFs and scanned documents. Most healthcare businesses we work with do not even employ age-old OCR technology to automate any of these steps. Most companies extract data from scanned documents, such as PDFs, tables and forms, through manual data entry. This is slow, expensive and prone to errors.

Some “slightly” more advanced healthcare companies automate parts of this with simple OCR software. That software requires manual configuration, which has to be updated each time a form changes for it to remain a usable and viable solution.

Easily extract printed text, handwriting, and data from virtually any document

Yes, that’s the promise of EzHCRM (built on top of Amazon Textract) and we are all the better for it. In fact, our entire revenue cycle management team is built around it and uses it within EzHCRM. We can help your team do the same.

What is this EzHCRM?

EzHCRM makes it easy to use Amazon Textract. Amazon Textract is a fully managed machine learning service. You don’t have to hire expensive data scientists or IT engineers, or manage any servers. Your medical billing teams and contact center teams can get a lot done on their own. Amazon Textract automatically extracts printed text, handwriting, and other data from scanned documents.

Sure, if you’re already used to OCR, you know that this is not new. However, Amazon Textract goes beyond simple optical character recognition (OCR) to identify, understand, and extract data from forms and tables. Tesseract doesn’t do that for you, does it?

What do I use EZHCRM for?

Simply put, you give EZHCRM your faxes and scanned documents, and it unlocks all the text data in those documents, ready for you to copy/paste wherever you want, be it your EMR or your email.

Right now, you keep hiring more staff just to handle more faxes, prior authorizations, EOBs, referrals, prescription authorization requests and superbills, because each one of these is handled through a manual process.

So, the more you grow, the more your headcount grows with you… to a point where it doesn’t actually make sense to grow.

To overcome these manual processes, EZHCRM uses machine learning to instantly read and process any type of document, accurately extracting printed text, handwriting, forms, tables and other data without the need for any manual effort or custom code.

With EZHCRM you can quickly automate manual document activities, enabling you to process millions of document pages in hours. Once the information is captured, you can take action on it within your business applications to initiate next steps.

Additionally, you can create smart search indexes, or add in human reviews with EZHCRM (with Amazon Augmented AI) to review nuanced or sensitive data.

How do I reduce fax handling costs with EZHCRM?

Use EZHCRM for receiving + reading your faxes and making sense of them. You already scan most faxes to your computers or manually key in data from faxes into your computer and then, to keep records, attach the scanned fax to your EMR patient record. 

Or, if you’re slightly more advanced, you have a cloud-based fax server (EZHCRM) and no longer deal with physical fax machines. This means that you’re already one step ahead of the rest. Each such cloud fax server gives you the fax as a PDF. Make no mistake – this PDF is not a real PDF. It’s actually an image inside the PDF, not a searchable PDF.

You just make sure that all your faxes are being fed to EZHCRM and it takes care of extracting all the text and handwritten notes from them.

This includes your medical clearance notes, your medical referrals, your handwritten medical referrals, medical records requests, prescription refill requests and so on.

All of it.

How do I reduce costs for charge posting with EZHCRM?

Use EZHCRM for superbills. Most of your charge posting should already be happening directly from your practice management system. However, like most of us in revenue cycle management, you probably want to keep control over charge posting and don’t want to automate anything beyond the very simple cases of coding (ICD/CPT).

We want to double/triple check, scrub the claims, ensure that the chances of denials are next to zero, and only then submit the claims, to reduce the chances of reworking the claims downstream.

To reduce your costs, you can:

  • Use Change Healthcare claims APIs (available on AWS Marketplace)
  • Use AWS Lambda
  • Use HIPAA-secure EMR data dips
  • Use DynamoDB or RDS
  • Use Amazon Textract for superbills

Or, of course, you could simply use EzHCRM.

How do I reduce costs for payment posting with EZHCRM?

The payment posting process is broken, and you have to live with that. And you have to reconcile it.

Along with each remittance advice (RA) you also get an explanation of benefits (EOB). This gives you a treasure trove of information – what got paid, what didn’t get paid, what needs more information, what got paid partially, what is the patient’s responsibility and so on.

So, this is where EZHCRM can help you with the reconciliation. You can achieve this by using a slew of Amazon cloud services along with Amazon Textract to extract text data from remittance advice and EOB documents, like so:

  • Use Amazon Textract for EOBs and RAs
  • Use AWS Lambda
  • Use HIPAA secure EMR data dips
  • Use DynamoDB or RDS

Or, of course, you could simply use EzHCRM.

How about sensitive information? Is it stored in the cloud?

If you are a healthcare customer, EZHCRM (built on Amazon Textract) is HIPAA eligible, so you can rest easy. EZHCRM (and Amazon Textract) is good to go with AWS System and Organization Controls (SOC), FedRAMP, HIPAA, ISO/IEC 27001:2013 for security management controls, ISO/IEC 27017:2015 for cloud-specific controls, ISO/IEC 27018:2014 for personal data protection, ISO 9001:2015 for quality management systems, and others. Read more on the AWS website.

Can EZHCRM allow human intervention?


We integrate EZHCRM with Amazon A2I (Amazon Augmented AI). EZHCRM + Amazon A2I works side by side with Amazon Textract and lets a human review the machine learning predictions that Amazon Textract makes on your medical insurance claims, intake forms, prescriptions and many other healthcare documents.

This way, the valuable information locked inside faxes can be extracted quickly and with precision. You can use Amazon A2I and Amazon Textract to process documents, extract the data and have a human review the critical data. All this is built into EZHCRM.

How about supporting multiple languages?

Yup – Amazon is world class in that (as you already know). Currently, printed text is supported in English, Spanish, Italian, Portuguese, French and German; handwriting is supported for English only.

Don’t see your language? Amazon keeps adding language support regularly and so does EZHCRM.

Automating Medical Prior Authorizations

Healthcare prior authorizations eat up a lot of our time. While prior authorizations are not needed for all specialties, many of our healthcare providers (orthopedic surgeons, ophthalmology practitioners, dental providers etc) do require a LOT of insurance prior authorizations to be processed.

Our goal is to pass on cost savings to our healthcare customers. The more we leverage AWS and combine our healthcare business domain expertise, the more cost savings and healthcare workflow optimizations we discover.

In case you didn’t know, here’s how prior authorizations work in healthcare:

  1. A patient is seen for a “reason” or “chief complaint”. 
  2. The patient is diagnosed with “something” by the doctor (provider and doctor are used interchangeably). This diagnosis translates to an ICD code.
  3. Based on the diagnosis, a certain procedure (can be a surgery as well) needs to be performed by the doctor. This needs to be administered to the patient. This procedure translates to a CPT code.
  4. Since this procedure might not be the “usual” kind of procedure, the doctor wants to know that they would get paid for performing the procedure / CPT code.
  5. The doctor’s office (medical practice) needs to submit a prior authorization request to the insurance company of the patient (payer) using specific forms. This submission also includes documentation to establish the medical necessity of the CPT in question.
  6. Some insurance payers have a website to submit the prior authorization requests. Some payers only accept it via fax.
  7. Some procedures do not need a prior authorization. Some procedures always require a prior authorization. However, the decision tree of which procedure / CPT needs a prior authorization vs not also changes from time to time and is entirely up to the insurance company / payer. So, it is in the best interests of most medical practices to submit prior authorization requests for everything – to cover all bases.
  8. Of course, as one can imagine, this increases the load of processing PA requests on both the payer and the provider sides… and the cycle continues.
  9. The payer responds with a decision within a couple of days – approved or denied. Sometimes, the payer (understandably so, because they too have medical professionals on their own staff) requests further documentation from the doctor’s office.
  10. If it is denied, the doctor’s office tries to appeal the decision. If the payer wants more documentation, the doctor’s office gathers all the documentation and resends the fax or resubmits the prior authorization application on the website. 
  11. Then, they wait for the requisite number of days or call the payer to get an answer / decision.
  12. Based on the prior authorization decision of the payer, the next steps are determined. If the prior authorization was denied, then the doctor has a choice to recommend another procedure (or surgery). The doctor might feel that the recommended procedure is the only possible way to address the “most responsible diagnosis” and might still want to proceed with their recommended procedure. In this case, the patient has to decide whether they want to pay out of pocket for this procedure or not.
  13. Sometimes, based on the delays involved in processing prior authorization requests, the medical office might need to reschedule patient appointments as well – in the best interests of the patient and the provider office.
  14. This is a laborious process – end to end. On top of this, a member (or members) of the medical billing / revenue cycle management team has to check the fax machine or the website for the payer’s decision. More often than not, this arrives via a fax.
  15. If there is an approval from the payer side, this usually includes the start and end dates of the prior authorization validity. It also includes a prior authorization number to be used in the associated claim (once the CPT is submitted along with the ICDs in a claim).
  16. So, the revenue cycle management team has to scan the fax (or the website PDF), attach it to the patient visit record in their EMR/EPM. They also have to add the prior authorization note to the patient visit record so that the medical coders and medical billing staff can use it in their claim submission (and later, if needed, with the payer reps to buttress their claim).

As you can see, this is labor intensive and therefore increases practice management costs significantly.

How can you save money on medical prior authorization requests?

If you are a provider, you cannot do anything on the payer side of things, of course. 

You can, however, save the time and money you spend on filling out (and refilling) prior authorization request forms.

You can also save the time and money you spend on monitoring the fax machine and updating your EMR with prior authorization details.

If you spend 15 mins per prior authorization form and process 20 prior authorizations per day – this means that you are spending 15*20 = 300 mins or 5 hrs per day processing prior authorizations alone.
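As a quick check of that arithmetic, here is a tiny Python sketch. The function name is hypothetical, the 15-minute and 20-per-day figures come from the paragraph above, and the 5-day work week is an assumption that is not in the text.

```python
def daily_pa_minutes(minutes_per_form, forms_per_day):
    """Staff minutes spent per day filling out prior authorization forms."""
    return minutes_per_form * forms_per_day


minutes_per_day = daily_pa_minutes(15, 20)   # figures from the text above
hours_per_day = minutes_per_day / 60
hours_per_week = hours_per_day * 5           # assumption: 5 working days per week

print(f"{minutes_per_day} mins = {hours_per_day:g} hrs per day")  # 300 mins = 5 hrs per day
print(f"{hours_per_week:g} hrs per week")                          # 25 hrs per week
```

Plug in your own numbers per form and per day to see what prior authorizations cost your practice.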

This is where you can save time and therefore money.

Take any example of a prior authorization form. You know very well that you have to open up your EMR or EPM in one browser window, manually copy / paste each field into this prior authorization form (even if you are doing this on the payer’s web portal). Then you have to save this form and fax it (or upload to the payer’s web portal).

That alone takes a good 10-15 mins to complete, per prior authorization request.

Here’s an example of an approved prior authorization form

Sometimes authorization is not required and you get a response back like below

The fax cover sheets are almost always the same.

The format of denials is the same.

The format of approvals is the same.

Unless you are using electronic prior authorization software, you are doing all these things manually, and wasting time and money.

How EZHCRM helps you with process automation

Just like MANY organizations are doing – you could be using robotic process automation (RPA) to automate labor-intensive workflows and back-office processes. RPA will allow you to cut down on the labor-intensive process of filling out the forms.

RPA is just a software bot. It takes the repetitive tasks that your practice management staff does and automates them. Simple.

This way, your practice management HUMANS can oversee the things that software does, ensure quality outcomes and increase their own productivity immensely. This way, you leverage human capital for their “human touch” and not to do “donkey work”.

In EzHCRM, we use AWS Step Functions. AWS Step Functions is a serverless function orchestrator and workflow automation tool. It’s pretty amazing what AWS Step Functions can orchestrate and automate for you.

The other thing you are going to need is Amazon Textract.

Combine the power of AWS Step Functions and Amazon Textract – and you can run your practice on steroids. Keep the same number of staff and see your patient volume grow. Repurpose your staff to more patient-facing functions, improving your practice’s reputation even more.

If you are using EzHCRM, you wouldn’t have to fill out any of the prior authorization forms at all. You would simply select the patient, and choose the form you want to fax.

If you were running an eligibility check beforehand, you wouldn’t have to do anything but use a form like this.

So, all you have to do with Amazon Textract is upload your documents to an Amazon S3 bucket, let AWS Lambda kick off the extraction of data using Amazon Textract, and then get notified via Amazon SNS about the success or failure of the job. Or you can simply use EzHCRM.
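If you do want to wire this up yourself, the Lambda side can be sketched roughly as below. This is a minimal sketch, not EzHCRM’s implementation: the function names and ARNs are hypothetical, and the Textract client is passed in as a parameter so the logic can be read and tried without AWS credentials. The one AWS call used, `start_document_analysis`, is the asynchronous Textract operation that accepts an S3 location and an SNS notification channel.

```python
import urllib.parse


def start_textract_job(textract, bucket, key, sns_topic_arn, role_arn):
    """Start an asynchronous Textract analysis and return its JobId."""
    response = textract.start_document_analysis(
        DocumentLocation={"S3Object": {"Bucket": bucket, "Name": key}},
        FeatureTypes=["FORMS", "TABLES"],
        # Textract publishes the SUCCEEDED / FAILED message to this SNS topic.
        NotificationChannel={"SNSTopicArn": sns_topic_arn, "RoleArn": role_arn},
    )
    return response["JobId"]


def handle_s3_event(event, textract, sns_topic_arn, role_arn):
    """Start one Textract job per object listed in an S3 event notification."""
    job_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications (spaces arrive as '+').
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        job_ids.append(
            start_textract_job(textract, bucket, key, sns_topic_arn, role_arn)
        )
    return job_ids


# In a deployed Lambda you would create the client with boto3, for example:
#   import boto3
#   textract = boto3.client("textract")
# and call handle_s3_event(event, textract, TOPIC_ARN, ROLE_ARN) from the handler.
```

Keeping the client as a parameter is a design choice for this sketch: it lets you exercise the event parsing with a stand-in object before touching a real AWS account.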

Here’s a wonderful example of this solution with invoices (instead of prior authorization request forms).

Just like in the blog above, the steps are (slightly extended):

  1. Faxes received are converted into PDFs (almost every fax server can do that these days) and uploaded to an Amazon S3 bucket.
  2. If you are unfortunately unable to get faxes and are still receiving PA responses via postal mail, those are also scanned and uploaded to the same Amazon S3 bucket.
  3. If you are processing prior authorizations via the payer’s web portal, those responses are also available to be saved to your desktop as a PDF. These are also uploaded to the same Amazon S3 bucket.
  4. So, basically, ALL prior authorization payer responses are scanned and loaded into an Amazon Simple Storage Service (S3) bucket.
  5. As soon as a PA response PDF is uploaded, an Amazon S3 trigger kicks off an AWS Lambda function.
  6. The only thing that this AWS Lambda function does is start an asynchronous Amazon Textract job to analyze the text and data of the scanned prior authorization response.
  7. Once the Amazon Textract job finishes, it publishes a completion notification message with a status of “SUCCEEDED” or “FAILED” to an Amazon Simple Notification Service (SNS) topic.
  8. In turn, Amazon SNS sends this SUCCEEDED / FAILED message to an Amazon Simple Queue Service (SQS) queue that is subscribed to the SNS topic.
  9. This message in the SQS queue, in turn, triggers another AWS Lambda function.
  10. This is not the same AWS Lambda function as in the prior step. This particular AWS Lambda function initiates an AWS Step Functions state machine. The AWS Step Functions state machine then processes the results of the Amazon Textract job.
  11. If the Amazon Textract job has completed successfully, the AWS Lambda function will save the document analysis into an Amazon S3 bucket.
  12. Once the above mentioned AWS Lambda function has loaded the document analysis to the designated Amazon S3 bucket, this triggers another Lambda function to process this.
  13. This AWS Lambda function then goes ahead and retrieves the text and data of the scanned prior authorization response file to find the response information (refer to the images above). 
  14. Keep in mind that each payer has a different response format for their prior authorization request decisions.
  15. This AWS Lambda function then writes the response information to an Amazon DynamoDB table. It also writes the status indicating whether the prior authorization request has been processed or not.
  16. If the prior authorization request processing has been successful, then the PA document needs to be archived. So, we attach another AWS Lambda function to the Amazon DynamoDB table insert operation. If the Amazon DynamoDB item contains the approval / rejection information, this AWS Lambda function is invoked.
  17. This other Lambda function archives the processed payer response to your prior authorization request into another Amazon S3 bucket and gets it out of the way.
  18. If by chance the payer response was NOT positive, you still have a chance to appeal it (as you should anyway). In this case, the Amazon DynamoDB row does not contain the APPROVED record. For any record that doesn’t contain APPROVED – a message is published to an Amazon SNS topic requesting that the prior authorization response be reviewed by your staff.
  19. Once your staff reviews the response, the next step is to send an appeal fax / document and wait for the response. Then, the whole process as mentioned above starts over again.
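The storing and routing steps near the end of the list above (write the response information, then flag anything that is not APPROVED for staff review) can be sketched as follows. This is an illustration under assumptions: the payer names, form labels and record fields are all hypothetical, and the actual DynamoDB write and SNS publish are left out so the sketch runs without AWS.

```python
# Hypothetical per-payer label maps: each payer words its response differently,
# so each one maps its own form labels onto the same canonical field names.
PAYER_FIELD_MAPS = {
    "payer_a": {"status": "Status", "auth_number": "Authorization Number"},
    "payer_b": {"status": "Determination", "auth_number": "Auth #"},
}


def build_pa_record(document_key, payer, fields):
    """Build the item to store for one payer response, plus a review flag."""
    labels = PAYER_FIELD_MAPS[payer]
    status = fields.get(labels["status"], "").strip().upper() or "UNKNOWN"
    return {
        "document_key": document_key,
        "payer": payer,
        "status": status,
        "auth_number": fields.get(labels["auth_number"], "").strip(),
        "processed": True,
        # Anything that is not an approval goes to staff for review or appeal.
        "needs_review": status != "APPROVED",
    }


approved = build_pa_record(
    "pa/1001.pdf", "payer_a",
    {"Status": "Approved", "Authorization Number": "A-1001"},
)
denied = build_pa_record("pa/1002.pdf", "payer_b", {"Determination": "Denied"})
print(approved["needs_review"], denied["needs_review"])  # False True
```

In a real pipeline the returned record would be written to the DynamoDB table, and records with `needs_review` set would trigger the SNS message to your staff.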

Automating physician referral faxes

If you’ve never been a referral coordinator, go ahead and thank them for what they do each day. It’s not that they are performing brain surgery each day, but the laborious duplication of work they have to endure is just mind-boggling. There’s no real reason for it and they don’t enjoy the grunt work they do either. They’d rather be doing the real work of patient care coordination.

Our clinical contact center teams have run many surveys, and the data backs this up. Referral coordinators would rather spend time coordinating patient care than on duplicate data entry.

Let’s take a quick look at a day in the life of a referral coordinator:

  1. Referrals arrive from various sources – faxes from EMRs, patients walking in with referrals written on referral pads, patients calling in with referrals, referring physicians calling in their patients, referrals from third party referral websites (e.g. P2P, par8o, referwell, referralmd, Werq etc), referrals sent directly from EMR to EMR (via DirectTrust) etc.
  2. The referral coordinator has to enter all the patients AND the referral sender information into a single place. This could be a notepad, a spreadsheet, a CRM or their heads.
  3. They have to enter all this information into an EMR (their EMR).
  4. Then they have to call each patient to make appointments for those patients. Only about 25% of patients will pick up the phone, so the other 75% of patients will have to be called back the next day, while new referrals come in and add to the workload even more.
  5. When the referral coordinator connects with the patient and is able to get the patient an appointment, they have to update the referral sender that their patient has an appointment. This means either faxing back the referral sender or updating the third party referral websites… individually.
  6. At this point, it is ideal and in the best interest of the patient for the referral coordinator to have a copy of the patient’s CCDA, i.e. the patient’s medical history, social history, medication history, surgical history etc. This way, when the patient comes in, your technician doesn’t have to waste 20 mins on patient intake / working up the patient. Unfortunately, this means reaching out to each PCP office via fax, getting the patient to sign the consent form to release records, getting the CCDA faxed over (since the two EMRs don’t talk to each other), scanning the CCDA in AND entering the history manually, which is what you need to do to save the technician’s time.
  7. This is also the time when the insurance eligibility and prior authorization aspects all need to be in place… else you will end up with a denial. If the eligibility doesn’t work, you need to call the patient and reschedule the appointment until the insurance challenge is taken care of. If the patient’s visit requires prior authorization, you need to get prior authorization done before the patient visit as well… else reschedule the appointment.
  8. Then, the fun part of “reminding the patient” comes up, and of course, only 25% of patients really pick up the call. So, you’re only ever sure of 25% of your daily schedule. Then, the patient can become a no-show or cancel their appointment. Either way, you will have to update the referral sender. Why? Because they have made future appointments and care plans for the patient based on this visit with your provider. They need to make changes and adjustments based on the outcomes of your appointment as well. Told you it’s fun. So, you have to go to each referral website or send faxes to each referral sender etc and update them on the referral they sent you.
  9. Then the patient does actually show up. Now, you have to chase the doctor to finish the visit note. Of course you have some help here from the billing department as they need the visit notes to submit the claims as well, but some billing departments submit claims without the visit notes being finalized… in those cases, you’re on your own! Once the visit notes are finalized, you need to fax them over to the referral sender immediately because they need them to close the referral loop on their end. They depend on these visit notes to prove to their payers that they did send the patients for the necessary consults. The payers need to prove to NCQA that their members (aka patients) are getting the necessary preventive and acute care they need and deserve to maintain certain levels of population health.
  10. Then you have to go close this referral loop via fax or on the third party referral websites to ensure that your ratings remain “high” so that you keep getting more referrals and your referring physician partners stay happy.

Quite a painful process, isn’t it? See all the idiotic data entry one needs to do here?

Handling referral faxes with EzHCRM

EzHCRM handles all of this very easily for you. If you are not using EzHCRM, you can do the same with Amazon Textract.

The Textract console itself can help you do this manually, and you don’t need IT folks to do it.

Simply upload the referral fax you just received.

It will take a little bit of time and then you will be rewarded with the output – you will find the raw text, the forms tab, the tables and you will even have a “Human review” tab as well (see below).

Click on the “Download Results” button to download the results and then you can expand that zip file to get the following files

  • apiResponse.json
  • keyValues.csv
  • rawText.txt
  • Table-1.csv

You will, in all probability, not care for the apiResponse.json as that’s pretty geeky. But you can use the other files to simply copy/paste information from those files into your EMR. A LOT of time saved from manually typing things and (potentially) making errors!

E.g. in the sample above, we get the following key/value pairs in the key values CSV file:

  • From Company: Best Medical Care, PC. (Hillside) 
  • From Facility: Best Medical Care, PC. (Hillside) 
  • Subject: Patient Document 
  • From Name: Alam, Noreen 
  • Fax Number: 718-206-2022 
  • Number of Page(s): 2

Plus who the referral was sent to as well. This alone saves us about 15-20 mins per referral document, and that’s when we don’t receive full CCDAs as a fax. Imagine receiving full patient CCDAs via fax as well: think how much time it takes to transfer that data, and how much time you can really save with Amazon Textract!
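If copy/paste from keyValues.csv gets tedious, a few lines of Python can load the pairs into a dictionary. This is a minimal sketch: it assumes the CSV has a header row with columns named `Key` and `Value`, which you should check against your own download (both names are parameters here), and the sample rows are taken from the list above.

```python
import csv
import io


def load_key_values(csv_file, key_column="Key", value_column="Value"):
    """Read key/value rows from an open CSV file into a dict.

    The column names are assumptions; adjust them to match your file's header.
    """
    pairs = {}
    for row in csv.DictReader(csv_file):
        # Form labels often end with a colon ("Fax Number:"); drop it.
        key = (row.get(key_column) or "").strip().rstrip(":").strip()
        if key:
            pairs[key] = (row.get(value_column) or "").strip()
    return pairs


# Stand-in for an opened keyValues.csv, using values from the sample above.
sample = io.StringIO(
    'Key,Value\n'
    'From Company:,"Best Medical Care, PC. (Hillside)"\n'
    'Subject:,Patient Document\n'
    'Number of Page(s):,2\n'
)
fields = load_key_values(sample)
print(fields["From Company"])       # Best Medical Care, PC. (Hillside)
print(fields["Number of Page(s)"])  # 2
```

With a real download you would open the file instead, for example `with open("keyValues.csv", newline="", encoding="utf-8") as f: fields = load_key_values(f)`.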

Robotic Process Automation of referrals using AWS Step Functions and Amazon Textract

This one is a bit more complicated and requires IT help. The general workflow comprises two steps.

  1. Using robotic process automation to login to the third party referral website with your own valid login credentials and downloading the referral(s) you received.
  2. Using Amazon Textract to extract information from the referral PDF you received.

For the RPA part, see this video below. We are not going into the details of how you can get this done yet (suffice to say that you can achieve this with AWS Step Functions). Basically, the RPA logs in to the website, finds new referrals you received, clicks on each referral, views the details of the referral, fetches the details, gets the attachments and saves them to your desktop. Pretty much what you do as well, anyway. So, it is just a “robot” version of you, making your life easier.

Robotic Process Automation of referrals using EzHCRM

Once the referral documents have been downloaded, as you can expect, you can use EzHCRM to extract the information as usual. This example referral contains a lot of patient history information, so it is a bit larger, but everything is still processed perfectly by EzHCRM.

The resulting zip file contains all the information you need to copy/paste/import.

Should you want to do this automatically without all the geeky headaches, reach out to us or use EzHCRM.

Automating Physician Referrals with EzHCRM

If you are with us thus far, why not take one step further and get even more grunt work done by the workhorses from Amazon? We automate all these with EzHCRM.

We use Amazon Textract combined with Amazon Comprehend Medical and Amazon Translate.

If you want to attempt this yourself, search for an amazing example created and hosted on GitHub aptly named “Textractor”. Download it, unzip it and simply run it (follow the README file there).

This example, although created for a proof of concept, can generate output in different formats including raw JSON, JSON for each page in the document, text, text in reading order, key/values exported as CSV, tables exported as CSV.

What’s amazing is that it can also generate insights or translate detected text by using Amazon Comprehend, Amazon Comprehend Medical and Amazon Translate. It takes advantage of the Textract response parser library to easily consume the JSON returned by Amazon Textract. Typically, parsing the JSON response from Amazon Textract is painful (it is a ginormous JSON file that you get as a response for each fax that you feed Amazon Textract).
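To give a feel for why that JSON is painful, here is a minimal hand-rolled sketch of pulling form key/value pairs out of a Textract response. It is not the response parser library’s code; the function names are hypothetical and the sample response is a tiny made-up stand-in. It relies on the documented block structure: `KEY_VALUE_SET` blocks tagged `KEY` point to their value block through a `VALUE` relationship, and both point to their `WORD` blocks through `CHILD` relationships.

```python
def _text_of(block, blocks_by_id):
    """Join the WORD children of a block; show selected checkboxes as 'X'."""
    parts = []
    for rel in block.get("Relationships", []):
        if rel["Type"] != "CHILD":
            continue
        for child_id in rel["Ids"]:
            child = blocks_by_id[child_id]
            if child["BlockType"] == "WORD":
                parts.append(child["Text"])
            elif (child["BlockType"] == "SELECTION_ELEMENT"
                  and child.get("SelectionStatus") == "SELECTED"):
                parts.append("X")
    return " ".join(parts)


def extract_key_values(response):
    """Return {key text: value text} for every form field in the response."""
    blocks_by_id = {b["Id"]: b for b in response["Blocks"]}
    pairs = {}
    for block in response["Blocks"]:
        is_key = (block["BlockType"] == "KEY_VALUE_SET"
                  and "KEY" in block.get("EntityTypes", []))
        if not is_key:
            continue
        value_text = ""
        for rel in block.get("Relationships", []):
            if rel["Type"] == "VALUE":
                value_text = " ".join(
                    _text_of(blocks_by_id[i], blocks_by_id) for i in rel["Ids"]
                )
        pairs[_text_of(block, blocks_by_id)] = value_text
    return pairs


# Tiny made-up response with a single form field.
sample = {"Blocks": [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Subject:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Patient"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Document"},
]}
print(extract_key_values(sample))  # {'Subject: Patient': 'Document'}
```

Even this small version has to chase IDs across three kinds of blocks, which is the work the parser library takes off your hands.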

Try it out. The results are AMAZING.