OCR Software for scanned documents
Text recognition, document indexing and data capture technology integrated with ShakeSpeare® Software
Organizations very often decide to digitize their archives and scan large numbers of documents. Nowadays the standard approach is for a specialized company to digitize an organization's archives and ensure digital archiving of the documents and data, often with approval to destroy the original paper documents.
But what happens once an organization has reached the digital archive stage, and what else can OCR technology offer to help users and solutions search through vast amounts of data, process that data and manage it efficiently?
ShakeSpeare has been closely connected with OCR technology from its very beginnings: scanning documents into the ShakeSpeare® DMS and using the data from these documents in the M Engine have been standard practice since the first decade of the 21st century.
Driven by our own development work and by market requirements, OCR and data capture technology has seen many new developments. Some are already ready for industrial use, and many newer applications enable advanced machine learning, AI-based text recognition and the reading of legal documents, with the workflow engine automatically generating documents in reply.
We collaborate with two data capture and document processing companies in various markets, integrate with their technology or processing centers, and hold partnership agreements that give us access to the latest technology updates and further developments.
ShakeSpeare is often integrated with a standalone ABBYY solution: in 95% of cases we integrate with ABBYY FlexiCapture (version 10 and above) and ABBYY FineReader Server (version 12 and above). In Germany and the DACH region we are integrated with the invoice processing company GINI, which specializes in reading out data from all types of invoices found on the German market. GINI operates its own machine learning and data center in Munich, where it processes a growing number of invoices from Germany and neighboring countries. The quality of the extracted information is very high and allows a high degree of automation in invoice processing.
If you would like to know more about the use cases for text recognition (OCR), data capture and processing with ShakeSpeare and its integrations with OCR and data capture providers, contact us. We will be happy to advise you and present the technical possibilities for your use case.
ShakeSpeare® in Cloud is available from 149€ / Month and includes 10GB storage space as well as licenses for 10 users.
Text recognition has been around since the first scanners, and companies like ABBYY began developing the technology as early as the 1980s. The idea of extracting data from printed and physical documents is not new, and the value added by automatic processing (avoiding retyping, indexing documents for search, and automatically extracting and processing data) is immense. Hundreds of thousands of companies benefit from OCR technology today, whether by searching archives more efficiently, storing data so that it can be found, or simply saving time by copying and pasting data from scanned documents. An OCR (Optical Character Recognition) solution or module can be found in most software and scanning solutions. There are, of course, challenges: using the wrong technology for your use case may result in disappointment and much higher costs than initially anticipated.
There are several key points to be aware of when planning to implement OCR and data capture technology (with or without machine learning or an AI solution, as has often been promoted in the last two years). As experts in document automation, e-archiving, processing and data capture, we also need to understand some critical points well before undertaking an OCR/data capture project with you:
- What kind of documents do you want to process? Are they one-page documents or longer, handwritten, partially handwritten or printed only? Perhaps they already exist as PDFs that were created electronically but are protected against modification?
- Where do these documents come from, and can you control the scan quality? (For best results, 300 dpi black-and-white scans are preferred.)
- Can you control file sizes? Large files (usually produced by phone cameras) put a heavy load on the image-processing system, because OCR does not work in letters but in pixels.
- Do you need to digitize images as well, or only text? (This affects the technology required.)
- The processing power and infrastructure you need for your documents depend heavily on the quantity of documents to be processed and how fast processing must take place (seconds, minutes, hours or days); both also affect the technology cost.
- What document volumes do I expect, and how often: many at once or a steady flow?
- Do I only want to scan and index the files and use the index to search my archives, or do I also want to extract information from the documents and label them by category, classification, etc.?
- Do I want to do anything further with the extracted data, e.g. process it in the ShakeSpeare Workflow, generate new documents, forward data or make decisions based on it?
- What is my operational structure? Where the machine fails, a human is asked for input and validation. Who will do this, how often, and do they have the necessary expertise? Remember that machines learn from human input to correct their mistakes and improve their confidence.
- Which languages are the documents in, and are they widely used? The more widespread the language (and its alphabet and characters), the more likely OCR and data capture are to deliver excellent results.
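To illustrate why scan resolution, color depth and file size matter in the points above, the following Python sketch estimates the pixel dimensions of a page at a given resolution and the resulting uncompressed image size. It is a back-of-the-envelope calculation under stated assumptions (an A4 page of 210 × 297 mm, 1 bit per pixel for black and white, 24 bits per pixel for color, and a hypothetical 12-megapixel phone photo of 4000 × 3000 pixels); actual file sizes depend on the compression used.

```python
from typing import Tuple

MM_PER_INCH = 25.4


def page_pixels(width_mm: float, height_mm: float, dpi: int) -> Tuple[int, int]:
    """Pixel dimensions of a page scanned at the given resolution (dots per inch)."""
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px


def raw_size_mb(width_px: int, height_px: int, bits_per_pixel: int) -> float:
    """Uncompressed image size in megabytes (1 MB = 10**6 bytes)."""
    return width_px * height_px * bits_per_pixel / 8 / 1_000_000


# Assumption: an A4 page (210 x 297 mm) scanned at the recommended 300 dpi.
w, h = page_pixels(210, 297, 300)
print(f"A4 at 300 dpi: {w} x {h} px")  # A4 at 300 dpi: 2480 x 3508 px
print(f"black and white (1 bit/px): {raw_size_mb(w, h, 1):.1f} MB")  # black and white (1 bit/px): 1.1 MB
print(f"24-bit color: {raw_size_mb(w, h, 24):.1f} MB")  # 24-bit color: 26.1 MB

# Assumption: a hypothetical 12-megapixel phone photo (4000 x 3000 px) in 24-bit color.
print(f"12 MP phone photo: {raw_size_mb(4000, 3000, 24):.1f} MB")  # 12 MP phone photo: 36.0 MB
```

Under these assumptions, a color scan or phone photo carries roughly 24 to 33 times as much raw pixel data as a black-and-white 300 dpi scan of the same page, and that is the data the image-processing system has to work through.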
OCR software has progressed considerably, and in 2020 we saw great advances in machine learning and AI processing of documents, including in the legal and healthcare fields, which we consider among the most sensitive fields in which AI can be deployed to learn document extraction.
If you are looking to process your documents automatically, or your document processing is part of a larger organizational process that makes decisions or creates documents based on the extracted information, contact us. We have completed several successful projects with OCR and data capture in the legal field (mostly with structured documents, such as fines) as well as in finance.
Quick search in digital documents
With a scanner and the processing of the data by our professional OCR solution, the physical documents are digitized and can easily be found via the ShakeSpeare® Smart Search.
Digital document archive
With a searchable digital archive, the documents are stored securely and are retrievable at any time. The documents are provided in compressed PDF format that saves storage space and can be further processed digitally or physically.
Availability of information in digital form for automatic or manual processing
With the OCR solution, specific information such as addresses, deadlines and tables can be read out by the software and transferred to other data tables or software. Various applications and data tables can thus be filled in without manual retyping, with human validation needed only where the software is uncertain.
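As an illustration of reading out a deadline and passing it on as structured data, here is a minimal Python sketch that applies a regular expression to text that has already been recognized by OCR. It is not how ABBYY FlexiCapture, GINI or the ShakeSpeare® M Engine extract fields; the label words ("Deadline", "Due date"), the dd.mm.yyyy date format and the function name extract_deadline are assumptions made for illustration. Impossible dates, which OCR misreads can produce, are returned as None so that a person can validate the field.

```python
import re
from datetime import date
from typing import Optional

# Assumed layout: a label ("Deadline" or "Due date"), an optional colon,
# then a date in dd.mm.yyyy format, as is common on German-language documents.
DEADLINE_RE = re.compile(
    r"(?:deadline|due date)\s*:?\s*(\d{1,2})\.(\d{1,2})\.(\d{4})",
    re.IGNORECASE,
)


def extract_deadline(ocr_text: str) -> Optional[date]:
    """Return the first labelled deadline in the OCR text, or None if none is found."""
    match = DEADLINE_RE.search(ocr_text)
    if match is None:
        return None
    day, month, year = (int(group) for group in match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        # An impossible date (e.g. 31.02.2021) usually means a misread digit:
        # return None so that a person can validate the field instead.
        return None


# Example: fill a structured record from (hypothetical) recognized text.
ocr_text = "Invoice no. 4711\nDue date: 15.03.2021\nTotal: 120.00 EUR"
record = {"deadline": extract_deadline(ocr_text)}
print(record)  # {'deadline': datetime.date(2021, 3, 15)}
```

In a real setup, the extracted date would be written to the target data table or workflow, and a None result would route the document to a person for validation.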
Software for the automatic readout and data extraction from documents