PDF parsing utilities

ABOUT


The portal recognition utility is one part of a larger solution that consists of a crawler, an OCR utility and a website. The crawler scrapes data from a source website on a regular basis and saves it to the crawler database. The OCR utility processes the files in the crawler database and converts them to text. The text is then uploaded to a separate website database. On that website, users can search for a keyword or a set of words and get all text documents that match the search terms.
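The data flow described above can be illustrated with a small in-memory sketch. It is not the project's code: the `Pipeline` class, the `fake_ocr` placeholder and the sample file names are all hypothetical, the two dictionaries stand in for the crawler and website databases, and the search is a naive word match rather than a real full-text index.

```python
from dataclasses import dataclass, field


@dataclass
class Pipeline:
    """Toy, in-memory stand-in for the crawler DB -> OCR -> website DB flow."""
    crawler_db: dict = field(default_factory=dict)  # file name -> raw bytes
    website_db: dict = field(default_factory=dict)  # file name -> extracted text

    def crawl(self, scraped):
        # Stage 1: the crawler saves scraped files into its own database.
        self.crawler_db.update(scraped)

    def run_ocr(self, ocr):
        # Stages 2-3: convert each stored file to text and upload the text
        # to the website database. `ocr` is any callable bytes -> str.
        for name, raw in self.crawler_db.items():
            self.website_db[name] = ocr(raw)

    def search(self, query):
        # Stage 4: return documents containing every word of the query
        # (naive case-insensitive match, not a real full-text index).
        words = query.lower().split()
        return sorted(
            name for name, text in self.website_db.items()
            if all(w in text.lower().split() for w in words)
        )


def fake_ocr(raw):
    # Hypothetical placeholder: real OCR would run on PDF scans.
    return raw.decode("utf-8")


pipeline = Pipeline()
pipeline.crawl({"a.pdf": b"annual tax report", "b.pdf": b"court decision on tax"})
pipeline.run_ocr(fake_ocr)
print(pipeline.search("tax report"))
```

With the sample data, the search for "tax report" matches only `a.pdf`, because `b.pdf` contains "tax" but not "report".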

 

 

PROPERTIES


Number of files: ~400,000
Total file size: ~700 GB
OCR data size: ~50 GB
Number of databases: 2
Number of tables: 20
Number of stored procedures: 34
Number of user-defined functions: 2

FEATURES


Using Tesseract OCR for PDF scans
Full-Text Search in Azure SQL Database
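As a rough sketch of how these two features might be driven from a script, the helpers below build the command lines for Poppler's `pdftoppm` and the `tesseract` CLI, and a search condition for the T-SQL `CONTAINS` predicate. The page does not say how the project invokes either tool, so the function names, the `dbo.Documents` table and its columns are assumptions made for illustration only.

```python
import subprocess


def pdf_to_png_command(pdf_path, out_prefix, dpi=300):
    # Poppler's pdftoppm renders each PDF page to <out_prefix>-<n>.png.
    return ["pdftoppm", "-r", str(dpi), "-png", pdf_path, out_prefix]


def ocr_command(image_path, lang="eng"):
    # "stdout" as the output base makes tesseract print the recognized text.
    return ["tesseract", image_path, "stdout", "-l", lang]


def ocr_image(image_path, lang="eng"):
    # Requires the tesseract binary on PATH; not executed in this sketch.
    result = subprocess.run(ocr_command(image_path, lang),
                            capture_output=True, text=True, check=True)
    return result.stdout


def contains_condition(query):
    # Build a CONTAINS search condition that requires every word,
    # e.g. '"tax" AND "report"'. Double quotes are dropped from user input.
    words = [w.replace('"', "") for w in query.split()]
    return " AND ".join(f'"{w}"' for w in words if w)


# Hypothetical table and column names; "?" is a pyodbc-style parameter marker.
SEARCH_SQL = (
    "SELECT doc_id, file_name FROM dbo.Documents "
    "WHERE CONTAINS(ocr_text, ?)"
)
```

With a pyodbc cursor, the query could be run as `cursor.execute(SEARCH_SQL, contains_condition("tax report"))`. An empty query produces an empty condition, which a caller would need to reject before executing the statement.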

 

EFFORT


100 man-hours

 

Services


Web Crawling
Data Collection
OCR conversion
XML feeds
Stored procedures
Data export
Data import
Jobs
Data classification
Reporting solutions
Database Backup
Database Replication
Data Merge
Database Maintenance

Contact us
Feel free to contact us.
Our work hours:
Monday-Friday 8am - 8pm

Our location:

Megapolis Office Center, Office 607,
Moskovskiy av. 179-B, Kharkiv, 61068, Ukraine