[Image: Board of Trustees annual report, 1880]


Multi-Modal Approach for Improved Optical Character Recognition

Archival holdings contain backlogs of information trapped in handwritten papers and born-analog typed or printed documents, such as those produced on a typewriter or a printing press. Access to this material is therefore limited to physical viewing or to transcription into a digital format. Traditional Optical Character Recognition (OCR) tools often achieve low accuracy on such material, particularly when documents have complex layouts, varied fonts or handwriting, or are available only as low-quality scans. These tools typically extract text at the pixel level and rely on rule-based algorithms and pattern-matching techniques that are inflexible and ignore context.

In partnership with the Special Collections Research Center and University Archives, CAAI is developing a tool that takes a multi-modal approach, combining image analysis with natural language processing (NLP) to interpret text in the context of the surrounding content, with the aim of greatly improving accuracy. This method is particularly valuable for handwritten tabular records, such as ledgers, which suffer the highest error rates with traditional OCR. Computer vision can identify the relationships between text blocks, lines, and drawings to capture a document's information hierarchy, while the semantic understanding of Large Language Models (LLMs) supplies the context needed to correct recognition errors.
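To make the division of labor concrete, the following is a minimal, hypothetical sketch of how layout analysis and a context step might fit together for a ledger page. It is not CAAI's implementation. It assumes OCR has already produced words with bounding boxes, groups them into rows and columns with a simple coordinate heuristic (standing in for a computer-vision layout model), and then uses Python's `difflib.get_close_matches` against a small vocabulary as a stand-in for the LLM correction step. The `Word` class, the column edges, the vocabulary, and the sample data are all invented for illustration.

```python
from dataclasses import dataclass
from difflib import get_close_matches


@dataclass
class Word:
    """One OCR token with its top-left corner (x, y) and height, in pixels."""
    text: str
    x: float
    y: float
    h: float


def group_into_rows(words, tolerance=0.5):
    """Layout step (heuristic stand-in for a vision model): a word joins the
    current row when its vertical offset from the row's first (topmost) word
    is less than `tolerance` times the word's height."""
    rows = []
    for w in sorted(words, key=lambda w: (w.y, w.x)):
        if rows and abs(w.y - rows[-1][0].y) < tolerance * w.h:
            rows[-1].append(w)
        else:
            rows.append([w])
    # Read each row left to right.
    return [sorted(r, key=lambda w: w.x) for r in rows]


def assign_columns(row, column_edges):
    """Place each word in a column using hypothetical x-coordinates of the
    column boundaries, then join the words within each cell."""
    cells = [[] for _ in range(len(column_edges) + 1)]
    for w in row:
        idx = sum(w.x >= edge for edge in column_edges)
        cells[idx].append(w.text)
    return [" ".join(c) for c in cells]


def correct_cell(text, vocabulary):
    """Context step (fuzzy-matching stand-in for an LLM): snap a cell to the
    closest expected term, or keep the OCR text if nothing is close enough."""
    match = get_close_matches(text, vocabulary, n=1, cutoff=0.6)
    return match[0] if match else text


# Hypothetical OCR output for two ledger lines: date | item | amount.
words = [
    Word("Jan.", 10, 100, 20), Word("12", 60, 102, 20),
    Word("Coal", 200, 99, 20), Word("4.50", 400, 101, 20),
    Word("Jan.", 10, 140, 20), Word("15", 60, 141, 20),
    Word("Lumbr", 200, 139, 20), Word("12.00", 400, 142, 20),
]
column_edges = [150, 350]
vocabulary = ["Coal", "Lumber", "Oil"]

table = [assign_columns(r, column_edges) for r in group_into_rows(words)]
# table == [["Jan. 12", "Coal", "4.50"], ["Jan. 15", "Lumbr", "12.00"]]
cleaned = [[d, correct_cell(item, vocabulary), amt] for d, item, amt in table]
# cleaned == [["Jan. 12", "Coal", "4.50"], ["Jan. 15", "Lumber", "12.00"]]
```

The layout step is what keeps "Jan. 15" and "12.00" on the same ledger line even though their pixel positions differ slightly; the context step then repairs "Lumbr" only because the item column is expected to hold goods. A real system would replace both heuristics with learned models, but the separation of structure from meaning is the same.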

This effort is in progress.