Reproduction of Dennis Couzin's 1983 work on optical printing. Derived from a scan in a PDF.
extract Add extraction scripts and initial work on markdown 2022-07-26 13:38:38 -04:00
ocr Add extraction scripts and initial work on markdown 2022-07-26 13:38:38 -04:00
original Add original pdf 2022-07-26 13:38:54 -04:00
pdf Add hyperlinks to sections from the index page. 2022-07-31 19:58:24 -04:00
NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.md Add hyperlinks to sections from the index page. 2022-07-31 19:58:24 -04:00
README.md Updated readme 2022-07-26 16:53:44 -04:00
compile.sh Add introduction page--still not adding graphics, will recreate in SVG 2022-07-26 14:03:39 -04:00
style.css Add extraction scripts and initial work on markdown 2022-07-26 13:38:38 -04:00

NOTES ON OPTICAL PRINTER TECHNIQUE

Reproduction of the guide written by Dennis Couzin. It loses some of the charm of the photocopied original floating around the internet, but this reproduction exists to make the text more readable and searchable.

Tesseract does the majority of the heavy lifting, getting the transcription about 85% of the way there. The rest is minor spelling corrections and slightly more effort formatting the text into markdown for rendering. Pre-processing the page images with OpenCV and tuning Tesseract for the typewritten font may produce even better text.
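
The 85% figure above is an informal estimate. As a minimal sketch of one way to put a rough number on it, the snippet below compares raw OCR text against a hand-corrected version using character-level similarity from Python's standard library. The function name `transcription_similarity` and the sample strings are hypothetical and are not part of this repository's scripts.

```python
from difflib import SequenceMatcher


def transcription_similarity(ocr_text: str, corrected_text: str) -> float:
    """Return a 0..1 similarity ratio between raw OCR text and a corrected version."""
    # Collapse runs of whitespace so line-wrapping differences are not counted as errors.
    a = " ".join(ocr_text.split())
    b = " ".join(corrected_text.split())
    # autojunk=False keeps difflib from discarding frequent characters on long inputs.
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


if __name__ == "__main__":
    # Hypothetical sample: typical OCR confusions (0 for O, l for I).
    ocr = "NOTES ON 0PTICAL PRlNTER TECHNIQUE"
    fixed = "NOTES ON OPTICAL PRINTER TECHNIQUE"
    print(f"{transcription_similarity(ocr, fixed):.0%}")
```

This measures similarity rather than a true character error rate, so it is only a rough guide. On a whole book, `SequenceMatcher` can be slow, and comparing page by page is likely more practical.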

Alternate spellings that appear in the original, rather than being introduced by the OCR process, are preserved.

PDF Dependencies

  • pandoc

To build the PDF:

    bash compile.sh

Text extraction dependencies

  • Python 3.7
  • OpenCV 2
  • Tesseract
  • PIL
  • PyMuPDF

To run the extraction scripts from the extract directory:

    cd extract
    python3 pdf.py > ../ocr/pdf_output.txt
    python3 ocr.py > ../ocr/tesseract_output.txt