Configuring infrastructure for parallel document processing

There are currently two approaches to processing several documents in parallel. Use one of the methods below to configure load balancing and scaling for your OCR Container.

Static load method

This method suits stable workloads, provided the client handles failed requests reliably. In the Helm chart deployment example, you specify the desired number of pods. Nginx-ingress handles load balancing. If a pod is busy, it returns a 503 error, and the client must retry the request. To determine the optimal number of pods, measure experimentally how long it takes to process a single document. Note that this method does not scale pods automatically.
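Because a busy pod answers with 503, the client is responsible for retrying. The following is a minimal client-side sketch of that retry loop with exponential backoff. It assumes a hypothetical `send` callable that submits one document to the OCR Container and returns the HTTP status code. The actual endpoint and request format are not covered here, and the attempt count and delays are illustrative values, not recommendations from the product documentation.

```python
import time
from typing import Callable


def send_with_retry(
    send: Callable[[], int],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call `send` until it returns a status other than 503 or attempts run out.

    `send` is a hypothetical wrapper around the HTTP request to the container.
    Returns the last status code received (503 if every attempt was rejected).
    """
    status = 503
    for attempt in range(max_attempts):
        status = send()
        if status != 503:
            # Accepted or failed for a reason other than "pod busy": stop retrying.
            return status
        if attempt < max_attempts - 1:
            # Pod was busy: wait 1s, 2s, 4s, ... (with base_delay=1.0) before retrying.
            sleep(base_delay * 2 ** attempt)
    return status
```

Usage note: in production you may want to add random jitter to the delay so that many clients rejected at the same moment do not retry in lockstep. The `sleep` parameter exists only so the delay can be replaced in tests.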

For example, suppose 10,000 one- or two-page documents are processed per day, which is at most 20,000 pages. Your measurements show that processing one page takes 6 seconds on average. A day has 86,400 seconds, so a single container can process 86,400 / 6 = 14,400 pages per day. The workload therefore needs 20,000 / 14,400 ≈ 1.39 containers. Rounded up, that means two container instances.
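The calculation above can be written as a small helper. This is a sketch that assumes one container processes one page at a time, continuously for 24 hours, with no headroom for peaks. The function names are illustrative.

```python
import math

SECONDS_PER_DAY = 86_400


def pages_per_container_per_day(seconds_per_page: float) -> float:
    """Pages one container can process per day if it works on pages back to back."""
    return SECONDS_PER_DAY / seconds_per_page


def required_pods(pages_per_day: int, seconds_per_page: float) -> int:
    """Smallest whole number of pods whose combined daily capacity covers the load."""
    capacity = pages_per_container_per_day(seconds_per_page)
    # Round up: 1.39 pods' worth of work still needs 2 pods.
    return math.ceil(pages_per_day / capacity)


if __name__ == "__main__":
    print(pages_per_container_per_day(6))  # 14400.0
    print(required_pods(20_000, 6))        # 2
```

Real traffic is rarely spread evenly over the day. If documents arrive in bursts, you may need more pods than this average-based estimate suggests.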

Once you have determined the optimal number of pods, set the number of replicas accordingly in the file you use to deploy the container.

Dynamic load method

This method requires a queue system such as RabbitMQ, Redis Streams, or a similar solution. In this approach, a sidecar container receives incoming requests and places them in the queue. It then takes requests from the queue and forwards each one to the main container as soon as that container is available. You also need to create a custom metric based on the queue and configure autoscaling with the Horizontal Pod Autoscaler. For detailed instructions, refer to the Kubernetes documentation: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/.
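To see how a queue metric drives scaling, you can approximate the Horizontal Pod Autoscaler's decision as follows. According to the Kubernetes documentation, the HPA computes desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). It leaves the replica count unchanged when that ratio is within a tolerance, which is 0.1 by default. This sketch assumes the custom metric is the queue length averaged per pod. The target of queued requests per pod is a hypothetical value you would choose. Stabilization windows and scaling policies are ignored.

```python
import math


def desired_replicas(
    current_replicas: int,
    queue_length: float,
    target_per_pod: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Approximate the HPA replica calculation for a single queue-length metric.

    target_per_pod is a hypothetical target of queued requests per pod.
    Stabilization windows and scaling policies are not modeled.
    """
    if current_replicas < 1:
        raise ValueError("current_replicas must be at least 1")
    # Average queue length per pod, compared with the target.
    ratio = (queue_length / current_replicas) / target_per_pod
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the HPA does not scale.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Respect the minReplicas / maxReplicas bounds of the autoscaler.
    return max(min_replicas, min(max_replicas, desired))
```

For example, with 2 pods, 30 queued requests, and a target of 5 requests per pod, each pod has 15 queued requests on average. The ratio is 3, so the sketch asks for 6 pods.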

2/19/2024 10:23:36 AM
