Character Accuracy

Every user wants 100% accurate data capture, but achieving it on low-quality images is much harder than it sounds. Without accuracy details for the recognized characters, the data-corrections team may have to spend far more time than expected. Our goal is to minimize that effort for the Data Entry or Quality Assurance team by providing the accuracy details alongside automated, accessible data cleaning.
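As a rough illustration of how accuracy details can cut review time, the sketch below builds a review queue from only the cells whose confidence falls below a threshold. The data layout (a table as rows of `(text, confidence)` pairs), the function name `flag_low_confidence`, and the 0.9 threshold are assumptions made for this example; they are not ExtractTable's actual output format or API.

```python
from typing import List, Tuple

# Hypothetical layout: table[row][col] -> (recognized text, confidence in 0..1).
Cell = Tuple[str, float]


def flag_low_confidence(
    table: List[List[Cell]], threshold: float = 0.9
) -> List[Tuple[int, int, str, float]]:
    """Return (row, col, text, confidence) for every cell below the threshold."""
    flagged = []
    for r, row in enumerate(table):
        for c, (text, conf) in enumerate(row):
            if conf < threshold:
                flagged.append((r, c, text, conf))
    return flagged


# Made-up sample: one cell was read with low confidence ("O" instead of "0").
sample_table = [
    [("Invoice", 0.99), ("Amount", 0.98)],
    [("INV-001", 0.97), ("1,2O0.50", 0.62)],
]

review_queue = flag_low_confidence(sample_table, threshold=0.9)
for row_idx, col_idx, text, conf in review_queue:
    print(f"row {row_idx}, col {col_idx}: {text!r} (confidence {conf:.2f})")
```

With a queue like this, a reviewer only needs to look at the flagged cells rather than re-read the whole table.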

Layout Detection

Plenty of services offer OCR, yet only a few dozen of them preserve the layout. Often, services that have SEO-ed themselves for "image to excel" put each text line in its own row (without column separation) in the Excel sheet or, much worse, just insert the image into the sheet. The high-priced data capture solutions (3x ExtractTable's price) preserve the layout only partially and come with strict input restrictions, such as a high-quality, non-skewed image with tables in the same location. Input files in a production environment, however, come in every possible variety, and the traditional services are unhelpful for them.

ExtractTable offers Image to Excel and PDF to Excel conversions without the user having to worry about skew, non-templated bordered or borderless table layouts, or whether a page holds one table or several. We preprocess the input to give better results than any other service.

Data Cleaning

As much as ExtractTable tries to extract the best table structure for the image-to-Excel output, outlying cases such as tables with tightly packed cells or low-quality images may sometimes result in merged rows or columns, or in date-format or decimal-separator issues. Such scenarios cannot be neglected. With those in mind, we have released a built-in functionality, "MakeCorrections", in the ExtractTable Python library to ease corrections on the output. The functionality helps to

  • ✔ Split Merged Rows
  • ✔ Split Merged Columns
  • ✔ Fix Decimal Format
  • ✔ Fix Date Format
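To give a feel for what corrections of this kind involve, here is a minimal standard-library sketch of three of them: splitting a merged column, normalizing a decimal separator, and normalizing a date format. It does not use or reproduce "MakeCorrections"; the function names, parameters, and sample row are all hypothetical and chosen only for illustration.

```python
import re
from datetime import datetime

_NUMERIC = re.compile(r"^-?\d+(\.\d+)?$")


def split_merged_column(row, index, sep=" "):
    """Split the cell at `index` into two cells at the first `sep`, if present."""
    parts = row[index].split(sep, 1)
    if len(parts) < 2:
        return list(row)
    return row[:index] + parts + row[index + 1:]


def fix_decimal(text, decimal_sep=",", thousands_sep="."):
    """Rewrite e.g. '1.234,56' as '1234.56'; leave non-numeric text untouched."""
    candidate = text.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    return candidate if _NUMERIC.match(candidate) else text


def fix_date(text, input_formats=("%d/%m/%Y", "%d-%m-%Y"), output_format="%Y-%m-%d"):
    """Re-emit a date in `output_format` if it matches one of `input_formats`."""
    for fmt in input_formats:
        try:
            return datetime.strptime(text.strip(), fmt).strftime(output_format)
        except ValueError:
            continue
    return text


# Made-up row where the first two columns were merged during extraction.
raw_row = ["Widget 4", "1.234,56", "31/12/2020"]

row = split_merged_column(raw_row, index=0)  # item and quantity become separate cells
row[2] = fix_decimal(row[2])                 # apply only to the amount column
row[3] = fix_date(row[3])                    # apply only to the date column
print(row)
```

Applying each fix to a chosen column, rather than to every cell, matters in a sketch like this: a date written with dots, for example, would otherwise be mistaken for a number with thousands separators.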

