Reproduction of Dennis Couzin's 1983 work on optical printing. Derived from a scan in a PDF.
Matt McWilliams fbff95e21e Fix encoding errors in HTML build 2022-08-31 16:25:27 -04:00
extract Add extraction scripts and initial work on markdown 2022-07-26 13:38:38 -04:00
html Fix encoding errors in HTML build 2022-08-31 16:25:27 -04:00
img Added image_10 2022-08-19 14:31:27 -04:00
ocr Add extraction scripts and initial work on markdown 2022-07-26 13:38:38 -04:00
original Extract images to recreate with SVG 2022-08-03 15:59:16 -04:00
pdf Update page numbers on the index as the images have all been added. 2022-08-19 15:13:08 -04:00
tmpl Style using Computer Modern font used by default in tex 2022-08-29 16:56:52 -04:00
.gitignore Start handbook build process 2022-08-29 16:20:29 -04:00
NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.md Correct minor typos found during handbook creation 2022-08-29 16:56:32 -04:00
README.md No longer use showdown to build HTML, will still need to improve build process for that page, but are getting the tables correctly. Need to fix the underlining, but will use spans for now. 2022-08-02 16:27:53 -04:00
compile.sh Fix encoding errors in HTML build 2022-08-31 16:25:27 -04:00
compile_handbook.sh Correct minor typos found during handbook creation 2022-08-29 16:56:32 -04:00
style.css No longer use showdown to build HTML, will still need to improve build process for that page, but are getting the tables correctly. Need to fix the underlining, but will use spans for now. 2022-08-02 16:27:53 -04:00

README.md

NOTES ON OPTICAL PRINTER TECHNIQUE

Reproduction of the guide written by Dennis Couzin. It loses some of the charm of the photocopied original floating around the internet, but it was made for the sake of the text's readability and searchability.

Tesseract does most of the heavy lifting, producing roughly 85% of the transcription; what remains is minor spelling correction and somewhat more effort formatting the text as Markdown for rendering. Pre-processing the scans with OpenCV and tuning Tesseract for the typewritten font may produce even better text.
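The README only suggests pre-processing as a possible improvement, so the following is a sketch of one common step, not something the repository's scripts are known to do: binarization with Otsu's method, which picks the gray level that best separates dark type from light paper. In OpenCV this is a single call, `cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)`; the dependency-free version below shows the same computation on a flat list of 8-bit grayscale values. The function names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the Otsu threshold for a sequence of 8-bit grayscale values.

    Hypothetical helper: maximizes the between-class variance of the
    "background" (<= t) and "foreground" (> t) pixel groups.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    sum_bg = 0        # sum of gray levels at or below the candidate threshold
    weight_bg = 0     # number of pixels at or below the candidate threshold
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_bg += hist[t]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break     # every pixel is background; no split possible
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(pixels, threshold):
    """Map pixels above the threshold to white (255), the rest to black (0)."""
    return [255 if p > threshold else 0 for p in pixels]
```

On a real page the pixel list would come from the scanned image (for example via PIL or OpenCV), and the binarized image would then be handed to Tesseract.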

Alternate spellings that were not introduced by the OCR process have been preserved.

PDF + HTML Dependencies

  • pandoc
  • wkhtmltopdf
bash compile.sh 
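The contents of `compile.sh` are not shown here, so this is a hypothetical Python sketch of a two-step build with these tools: pandoc converts the Markdown to standalone HTML that links `style.css`, then wkhtmltopdf renders that HTML to PDF. The output file names and the helpers `build_commands` and `run_build` are assumptions, not the repository's actual script.

```python
import shutil
import subprocess


def build_commands(markdown="NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.md",
                   html="notes.html", pdf="notes.pdf", css="style.css"):
    """Return the command lines for one build (output names are hypothetical)."""
    return [
        # Markdown -> standalone HTML page that links the stylesheet
        ["pandoc", markdown, "--standalone", "--css", css, "-o", html],
        # HTML -> PDF
        ["wkhtmltopdf", html, pdf],
    ]


def run_build(commands):
    """Run each command in order, failing early if a tool is missing."""
    for cmd in commands:
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} is not installed or not on PATH")
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Keeping command construction separate from execution makes it easy to print or inspect the exact pandoc and wkhtmltopdf invocations before running them with `run_build(build_commands())`.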

Text extraction dependencies

  • Python 3.7
  • OpenCV 2
  • Tesseract
  • PIL
  • PyMuPDF
cd extract
python3 pdf.py > ../ocr/pdf_output.txt
python3 ocr.py > ../ocr/tesseract_output.txt
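The two commands write separate text files. One possible follow-up, not described in this README, is to compare two transcriptions (the two output files, or an OCR output against the hand-corrected Markdown) to find lines worth checking by hand. The sketch below uses Python's `difflib`; the helper name `compare_transcriptions` is hypothetical, and the similarity ratio it returns is only a rough score, not necessarily how the 85% figure above was arrived at.

```python
import difflib


def compare_transcriptions(reference, candidate):
    """Return a 0-1 character similarity ratio and the changed lines.

    Changed lines are prefixed with "-" (only in reference) or "+"
    (only in candidate), as in a unified diff.
    """
    ratio = difflib.SequenceMatcher(None, reference, candidate).ratio()
    diff = list(difflib.unified_diff(
        reference.splitlines(), candidate.splitlines(), lineterm="", n=0))
    # Skip the two file-header lines and the "@@" hunk markers.
    changed = [line for line in diff[2:] if not line.startswith("@@")]
    return ratio, changed
```

For example, reading `ocr/pdf_output.txt` and `ocr/tesseract_output.txt` with `pathlib.Path(...).read_text()` and passing both strings in lists every line on which the two extractions disagree. On very long texts `SequenceMatcher` can be slow, so comparing page by page may be more practical.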