Reproduction of Dennis Couzin's 1983 work on optical printing. Derived from a scan in a PDF.


mmcwilliams f37f1c4029 Add new flag to prevent header and footer from being printed		2024-09-02 22:47:37 -04:00
extract	Abandon pretenses of building a PDF/A version of the document and go with what works. Add a requirements.txt file for the extraction tool.	2022-09-08 09:08:53 -04:00
html	Additional patch provided by Tom Murphy.	2024-08-31 14:52:25 -04:00
img	Added image_10	2022-08-19 14:31:27 -04:00
ocr	Add extraction scripts and initial work on markdown	2022-07-26 13:38:38 -04:00
original	Extract images to recreate with SVG	2022-08-03 15:59:16 -04:00
pdf	Additional patch provided by Tom Murphy.	2024-08-31 14:52:25 -04:00
tmpl	Build the handbook. Missing zinemaker step, but will later work out a bash script that builds the pages and automatically stitches a pdf for printing.	2022-08-31 16:26:18 -04:00
.gitignore	Build the handbook. Missing zinemaker step, but will later work out a bash script that builds the pages and automatically stitches a pdf for printing.	2022-08-31 16:26:18 -04:00
NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.md	Additional patch provided by Tom Murphy.	2024-08-31 14:52:25 -04:00
README.md	No longer use showdown to build HTML, will still need to improve build process for that page, but are getting the tables correctly. Need to fix the underlining, but will use spans for now.	2022-08-02 16:27:53 -04:00
build_handbook.sh	Build the handbook. Missing zinemaker step, but will later work out a bash script that builds the pages and automatically stitches a pdf for printing.	2022-08-31 16:26:18 -04:00
compile.sh	Fix encoding errors in HTML build	2022-08-31 16:25:27 -04:00
compile_handbook.sh	Add new flag to prevent header and footer from being printed	2024-09-02 22:47:37 -04:00
style.css	No longer use showdown to build HTML, will still need to improve build process for that page, but are getting the tables correctly. Need to fix the underlining, but will use spans for now.	2022-08-02 16:27:53 -04:00


NOTES ON OPTICAL PRINTER TECHNIQUE

Reproduction of the guide written by Dennis Couzin. It loses some of the charm of the photocopied original floating around the internet, but this reproduction is done for the sake of readability and searchability of the text.

Tesseract does the majority of the heavy lifting, getting the transcription about 85% of the way there. The output needs minor spelling corrections and slightly more effort to format into markdown for rendering. Pre-processing with OpenCV and tuning Tesseract for the typewritten font may produce even better text.
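A minimal sketch of the pre-processing idea mentioned above, not the script in this repository. It assumes the `pytesseract` wrapper (the dependency list only names "Tesseract"), and the blur size and the `--psm 6` / `--oem 1` settings are illustrative guesses for a typewritten page rather than tuned values.

```python
def tesseract_config(psm: int = 6, oem: int = 1) -> str:
    """Build a Tesseract option string (engine mode and page segmentation mode)."""
    return f"--oem {oem} --psm {psm}"


def ocr_page(image_path: str, psm: int = 6) -> str:
    """Binarize one scanned page with OpenCV, then run Tesseract on it."""
    # Imported lazily so tesseract_config() works without the OCR stack installed.
    import cv2
    import pytesseract  # assumed wrapper around the Tesseract binary

    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    if gray is None:
        raise FileNotFoundError(image_path)
    # A light median blur removes photocopy speckle before thresholding.
    gray = cv2.medianBlur(gray, 3)
    # Otsu's method picks the black/white cutoff from the page histogram.
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return pytesseract.image_to_string(
        binary, lang="eng", config=tesseract_config(psm)
    )
```

Calling `ocr_page("page_01.png")` (a hypothetical file name) returns the recognized text for one page. Comparing the output with and without the blur and threshold steps is a quick way to check whether pre-processing actually helps on this scan.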

Alternate spellings that come from the original text, rather than from the OCR process, are preserved.

PDF + HTML Dependencies

pandoc
wkhtmltopdf

bash compile.sh
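For orientation, the two dependencies could be chained roughly as follows. This is a sketch under assumptions, not the contents of compile.sh: the output names `notes.html` and `notes.pdf` are hypothetical, and the real script may pass different options.

```python
import subprocess

SOURCE = "NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.md"  # file listed in this repository


def build_commands(source: str = SOURCE, css: str = "style.css", stem: str = "notes"):
    """Return the pandoc and wkhtmltopdf invocations as argument lists."""
    html = f"{stem}.html"  # hypothetical output name
    pdf = f"{stem}.pdf"    # hypothetical output name
    return [
        # Markdown -> standalone HTML page that links the stylesheet.
        ["pandoc", source, "--standalone", "--css", css, "-o", html],
        # HTML -> PDF through wkhtmltopdf.
        ["wkhtmltopdf", html, pdf],
    ]


def build() -> None:
    """Run both steps in order, stopping at the first failure."""
    for cmd in build_commands():
        subprocess.run(cmd, check=True)
```

Keeping the command construction separate from `build()` makes it possible to inspect or log the exact invocations without having pandoc or wkhtmltopdf installed.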

Text extraction dependencies

Python 3.7
OpenCV 2
Tesseract
PIL
PyMuPDF

cd extract
python3 pdf.py > ../ocr/pdf_output.txt
python3 ocr.py > ../ocr/tesseract_output.txt
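As an illustration of the first step, pulling an embedded text layer out of a PDF with PyMuPDF can look like the sketch below. It is an assumption about the general approach, not a copy of extract/pdf.py, and the form-feed page separator is a choice made here for the example.

```python
def join_pages(pages):
    """Join per-page text with form feeds so page boundaries survive in one file."""
    return "\f".join(text.strip() for text in pages)


def extract_text(pdf_path: str) -> str:
    """Return the text layer of every page in the PDF, in page order."""
    # PyMuPDF is imported lazily so join_pages() works without it installed.
    import fitz

    with fitz.open(pdf_path) as doc:
        return join_pages(page.get_text() for page in doc)
```

Printing `extract_text("scan.pdf")` (a hypothetical path) to stdout matches the redirect pattern used in the commands above. If the PDF holds only page images with no text layer, the result will be empty and the Tesseract step is the one that matters.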