From ff6ac190e2e877b340c19eef57488b02d7d1376a Mon Sep 17 00:00:00 2001
From: mattmcw
Date: Tue, 26 Jul 2022 16:53:44 -0400
Subject: [PATCH] Updated readme

---
 README.md | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/README.md b/README.md
index 1af63db..0d10472 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,12 @@
 # NOTES ON OPTICAL PRINTER TECHNIQUE
 
 Reproduction on the guide written by Dennis Couzin.
+Loses some of the charm of the photocopied original floating around the internet, but this reproduction is done for the sake of readability/searchability of the text.
+
+Tesseract does a majority of the heavy lifting, making about a 85% transcription with minor changes needed to spelling and slightly more effort formatting it into markdown for rendering.
+Pre-processing using OpenCV and tuning tesseract for the typewritten font may produce even better text.
+
+Preserving alternate spellings not created in the OCR process.
 
 ### PDF Dependencies
 
@@ -16,6 +22,7 @@ bash compile.sh
 * OpenCV 2
 * Tesseract
 * PIL
+* PyMuPDF
 
 ```bash
 cd extract