diff --git a/README.md b/README.md index 1af63db..0d10472 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,12 @@ # NOTES ON OPTICAL PRINTER TECHNIQUE Reproduction on the guide written by Dennis Couzin. +Loses some of the charm of the photocopied original floating around the internet, but this reproduction is done for the sake of readability/searchability of the text. + +Tesseract does a majority of the heavy lifting, making about a 85% transcription with minor changes needed to spelling and slightly more effort formatting it into markdown for rendering. +Pre-processing using OpenCV and tuning tesseract for the typewritten font may produce even better text. + +Preserving alternate spellings not created in the OCR process. ### PDF Dependencies @@ -16,6 +22,7 @@ bash compile.sh * OpenCV 2 * Tesseract * PIL +* PyMuPDF ```bash cd extract