Reproduction of Dennis Couzin's 1983 work on optical printing. Derived from a scan in a PDF.

NOTES ON OPTICAL PRINTER TECHNIQUE

A reproduction of the guide written by Dennis Couzin. It loses some of the charm of the photocopied original that circulates on the internet, but it was made for the sake of readability and searchability of the text.

Tesseract does most of the heavy lifting, producing a transcription that is about 85% correct. The rest needed minor spelling fixes and somewhat more effort to format as Markdown for rendering. Pre-processing the scans with OpenCV and tuning Tesseract for the typewritten font may produce even better text.
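To check a figure like the "about 85%" above against your own run, one option is to compare the raw Tesseract output with the hand-corrected text. The sketch below is an assumption, not one of the repository's extraction scripts: it uses the standard-library `difflib.SequenceMatcher`, and the function name `ocr_accuracy` is hypothetical. Its `ratio()` is a similarity score (2·M/T, matching characters over the total characters in both strings), not a formal character error rate.

```python
import difflib


def ocr_accuracy(ocr_text: str, corrected_text: str) -> float:
    """Return a rough 0..1 similarity between raw OCR output and corrected text.

    Hypothetical helper: uses difflib.SequenceMatcher.ratio(), i.e. 2*M / T,
    where M is the number of matching characters and T is the total number of
    characters in both strings. This is a similarity score, not a formal
    character error rate.
    """
    # autojunk=False stops the matcher from discarding very common characters
    # (spaces, 'e', ...) as "junk" when comparing long, book-length texts.
    matcher = difflib.SequenceMatcher(None, ocr_text, corrected_text, autojunk=False)
    return matcher.ratio()


if __name__ == "__main__":
    raw = "Tbe optical printer"    # typical OCR slip: 'h' read as 'b'
    fixed = "The optical printer"
    print(f"{ocr_accuracy(raw, fixed):.1%}")  # prints 94.7%
```

If you point it at real files, for example `ocr/tesseract_output.txt` and `NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.md`, the Markdown markup will also count as differences, so treat the result as a rough indicator only.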

Alternate spellings that were not introduced by the OCR process are preserved.

PDF + HTML Dependencies

  • pandoc
  • wkhtmltopdf
bash compile.sh 

Text extraction dependencies

  • Python 3.7
  • OpenCV 2
  • Tesseract
  • PIL
  • PyMuPDF
cd extract
python3 pdf.py > ../ocr/pdf_output.txt
python3 ocr.py > ../ocr/tesseract_output.txt