Application of Deep Learning and Optical Character Recognition in Digitizing Financial Statements

  • Nguyễn Quang Học
  • Đặng Thiên Vũ
  • Trần Thị Minh Hiền
  • Lê Hoành Sử
Keywords: AI, deep learning, optical character recognition, financial statements, digitization.

Abstract

The digital revolution is fundamentally changing how we interact with data. Traditional methods of digitizing the tables in financial statements, such as manual data entry, are becoming obsolete because they no longer meet the cost and time requirements of reporting. To address this challenge, this paper proposes a method that uses PaddleOCR to automatically recognize tables in images extracted from financial reports. Our approach relies on the deep learning models and optical character recognition (OCR) technology built into this open-source tool. The process consists of detecting tables, detecting and recognizing text, predicting table structures, and finally reconstructing the tables as HTML and Excel files. In experiments comparing the output with the actual tables, our method achieves an average Tree-Edit-Distance-based Similarity (TEDS) score of 95% for regular tables with full borders and 83% for borderless tables. These promising results indicate that the tool is viable for digitizing documents containing tables and can thereby streamline data entry. The outcome also marks a significant step toward the broader goal of complete digitization through robotic process automation (RPA).

Published
2024-08-25
Section
ARTICLES