Reproduction of Dennis Couzin's 1983 work on optical printing. Derived from a scan in a PDF.

Matt McWilliams a8d65a9784 All remaining text added prior to an editorial pass, graphics re-created and index updated.		2022-08-02 16:15:46 -04:00
extract	Add extraction scripts and initial work on markdown	2022-07-26 13:38:38 -04:00
html	All remaining text added prior to an editorial pass, graphics re-created and index updated.	2022-08-02 16:15:46 -04:00
ocr	Add extraction scripts and initial work on markdown	2022-07-26 13:38:38 -04:00
original	Add original pdf	2022-07-26 13:38:54 -04:00
pdf	All remaining text added prior to an editorial pass, graphics re-created and index updated.	2022-08-02 16:15:46 -04:00
NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.md	All remaining text added prior to an editorial pass, graphics re-created and index updated.	2022-08-02 16:15:46 -04:00
README.md	Add start of HTML build process. Being used to make a pocket-sized version.	2022-07-31 20:25:10 -04:00
compile.sh	Add start of HTML build process. Being used to make a pocket-sized version.	2022-07-31 20:25:10 -04:00
style.css	Add extraction scripts and initial work on markdown	2022-07-26 13:38:38 -04:00

README.md

NOTES ON OPTICAL PRINTER TECHNIQUE

Reproduction of the guide written by Dennis Couzin. It loses some of the charm of the photocopied original floating around the internet, but this reproduction exists for the sake of readability and searchability of the text.

Tesseract does the majority of the heavy lifting, producing a transcription that is roughly 85% correct. The output needs minor spelling corrections and slightly more effort to format as markdown for rendering. Pre-processing with OpenCV and tuning Tesseract for the typewritten font may produce even better text.
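
The README does not say how the 85% figure was estimated. One rough way to put a number on OCR quality, sketched here as an assumption and not as the method used in this repo, is a character-level similarity ratio from Python's standard `difflib` module. The function name `transcription_accuracy` and the sample strings are hypothetical.

```python
from difflib import SequenceMatcher


def transcription_accuracy(ocr_text: str, corrected_text: str) -> float:
    """Return a 0..1 character-level similarity between OCR output and a corrected reference."""
    # autojunk=False stops the matcher from ignoring very frequent characters
    # (such as spaces) when the texts are long.
    matcher = SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    # Hypothetical sample strings, not taken from the actual OCR output.
    raw = "The optica1 printer is a carnera aimed at a projector."
    fixed = "The optical printer is a camera aimed at a projector."
    print(f"{transcription_accuracy(raw, fixed):.0%}")
```

The ratio is only a proxy: it rewards matching characters and does not distinguish a harmless spacing difference from a misread word, so it is best used to compare OCR settings against each other, not as an absolute score.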

Alternate spellings that appear in the original, as opposed to errors introduced by the OCR process, are preserved.

PDF + HTML Dependencies

pandoc
showdown

bash compile.sh

Text extraction dependencies

Python 3.7
OpenCV 2
Tesseract
PIL
PyMuPDF

cd extract
python3 pdf.py > ../ocr/pdf_output.txt
python3 ocr.py > ../ocr/tesseract_output.txt
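
The raw text written to `../ocr/` still needs layout cleanup before it reads as markdown. The following is a minimal sketch of that kind of cleanup, not code from this repo: the helper `tidy_ocr_text` is hypothetical, and it deliberately touches only line breaks and spacing so that the original spellings stay intact.

```python
import re


def tidy_ocr_text(text: str) -> str:
    """Apply conservative layout cleanup to raw OCR text without changing spelling."""
    # Rejoin words that were split across a line break with a trailing hyphen.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    # Turn single line breaks inside a paragraph into spaces; blank lines
    # (paragraph breaks) are left alone.
    text = re.sub(r"(?<!\n)\n(?!\n)", " ", text)
    # Collapse runs of spaces and tabs into one space.
    text = re.sub(r"[ \t]+", " ", text)
    return text.strip()


if __name__ == "__main__":
    # Hypothetical sample, not taken from the actual OCR output.
    sample = "The opti-\ncal printer\nis described here.\n\nNext   paragraph."
    print(tidy_ocr_text(sample))
```

One caveat with this approach: a genuinely hyphenated compound that happens to break at the end of a line would be joined into a single word, so the result still needs a manual read-through.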