Introduction

Tetxtra Insights is an evolution of the previous version (Textra). We aim to build a tool for financial summary generation with a goal to keep inference with local and limited computing resources.

For an easy comprehension of our tool here is a infernce pipeline :

Inference_Pipeline

1.Input

The process begins with one or multiple invoices (Images,Pdf files), which can be either client invoices or payment invoices.

2.Enhancement

The first step is enhancing quality of the invoices. with python libraries like OpenCV, PILOW,… Preprocessing Images (For Scanned or Photographed Invoices) by :

  • Convert images to grayscale for simpler processing.

  • Denoise the image to remove background noise.

  • Binarize the image (convert to black and white) for better OCR (Optical Character Recognition) results.

Hint

  • You can find more details about this step in Tetxra documentation. here

3.Information Extraction

Extracts key fields from invoices for further analysis like :
  • Total amount (Tax included)

  • Total amount (Tax excluded)

  • VAT rate

  • Date of invoice

  • Invoice number

  • Supplier Contacts (email, phone, address …)

4.Categorization

A critical step in the invoice processing pipeline, where extracted information is analyzed and assigned to the correct accounting categories based on the Moroccan General Chart of Accounts. This ensures compliance with accounting standards and simplifies reporting.

In general three types of accounts are used :
  • Debit Account

  • Credit Account

  • VAT Account

5.Reporting to the journal book

What is journal book ?

A document that chronologically records all the financial transactions of a company during a given fiscal year.

It allows:

  • Recording all the company’s accounting transactions chronologically, both on the credit and debit sides.

  • Organizing the entries for the current fiscal year based on their nature (purchases, sales, treasury, etc.).

Journal_Book