Add extraction scripts and initial work on markdown

Matt McWilliams 2022-07-26 13:38:38 -04:00
commit 90dff1d7a4
8 changed files with 3719 additions and 0 deletions


@@ -0,0 +1,52 @@
---
title: NOTES ON OPTICAL PRINTER TECHNIQUE
author: Dennis Couzin
date: "March 1983"
...
\pagenumbering{gobble}
\newpage
\pagenumbering{arabic}
::: {.indexTable}
| | | | |
|-----|-----|-----|-----|
| Magnification | 1 | Fades in Original | 14 |
| Blowup & Reduction | 2 | Chart C: Neutral Density | |
| Blowup Sharpness | 2 | and Equivalent Shutter | |
| Printer Lenses | 3 | Angle | 15 |
| Optical Zoom | 3 | Image Superposition | 16 |
| Lens Aperture | 3 | Gamma & Bipack | 16 |
| Focusing | 4 | Incidentally | 16 |
| Focusing Aperture | 4 | Exposure Compensation | 18 |
| Focusing Precision | 4 | Special Originals | 18 |
| Focusing Target | 4 | Texturing | 18 |
| Depth of Field | 4 | Multi-Exposure | 19 |
| Bolex Prism | 4 | Multi-Pack | 19 |
| Bolex Groundglass | 4 | Natural Superposition | 19 |
| Defocus | 4 | Flashing | 19 |
| X-Y Adjustment | 4 | Contrast Adjustment | 19 |
| Exact 1:1 | 5 | Color Image Superposition | 20 |
| Aimframe | 5 | Weighted Double Exposures | 20 |
| Framelines | 6 | Dissolves | 21 |
| Emulsion Position | 7 | Effects Dissolves | 21 |
| Time | 8 | Fades from Negative | 21 |
| Fancy Freeze | 8 | Color Exposure | 22 |
| Fancy Slow | 8 | Testing | 22 |
| Diffusers | 8 | CC Pack Reduction | 25 |
| UV Filter | 9 | High Contrast Prints | 25 |
| IR Filter | 9 | Hicon Exposure | 26 |
| Green Filter | 9 | Contrast Building Steps | 26 |
| Filter Location | 9 | Hicon Speckle | 26 |
| Exposure | 9 | Tone Isolation | 27 |
| Exposure Adjusters | 9 | Logic of Mask Combination | 27 |
| Specifying Exposure | 11 | Image Spread and Bloom | 27 |
| Film Speed | 11 | Mask and Countermask | 28 |
| Right Exposure | 11 | Reversal/Negative Fitting | 28 |
| Generations | 12 | Feathered Masks | 29 |
| Bellows Formula | 13 | Image Marriage | 29 |
| Fades | 13 | Mask Blackness | 30 |
| Log Fade | 14 | Hicons from Color Originals | 30 |
| Bolex Variable Shutter | 14 | Hicon Processing | 30 |
| Linear Fade | 14 | Optical Printed Release Prints | 31 |
| Other Fades | 14 | Ritual and Art | 31 |
:::
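
The index table above sets two columns of (topic, page) pairs side by side, and a long entry such as "Chart C: Neutral Density and Equivalent Shutter Angle" wraps across three rows, with its page number only on the last one. A minimal sketch of flattening that layout back into a single reading-order list follows. The function name `parse_index` and the `SAMPLE` excerpt are assumptions made for illustration; nothing like this exists in the commit.

```python
from typing import List, Tuple


def parse_index(table: str) -> List[Tuple[str, int]]:
    """Flatten a two-up pipe table into (topic, page) pairs in reading order."""
    left, right = [], []
    pending = {0: [], 1: []}  # wrapped topic fragments waiting for a page number
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip malformed rows, the empty header row, and the dashed separator.
        if len(cells) != 4 or set("".join(cells)) <= {"-"}:
            continue
        for half, out in ((0, left), (1, right)):
            topic, page = cells[2 * half], cells[2 * half + 1]
            if topic:
                pending[half].append(topic)
            if page:  # a page number closes the (possibly wrapped) entry
                out.append((" ".join(pending[half]), int(page)))
                pending[half] = []
    return left + right


# Hypothetical excerpt copied from the first rows of the table.
SAMPLE = """
| | | | |
|-----|-----|-----|-----|
| Magnification | 1 | Fades in Original | 14 |
| Blowup & Reduction | 2 | Chart C: Neutral Density | |
| Blowup Sharpness | 2 | and Equivalent Shutter | |
| Printer Lenses | 3 | Angle | 15 |
"""

for topic, page in parse_index(SAMPLE):
    print(f"{page:>2}  {topic}")
```

Collecting fragments until a page number appears keeps the wrapped entry intact without hard-coding which rows wrap.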

24
README.md Normal file

@@ -0,0 +1,24 @@
# NOTES ON OPTICAL PRINTER TECHNIQUE
Reproduction of the guide written by Dennis Couzin.
### PDF Dependencies
* pandoc
```bash
bash compile.sh
```
### Text extraction dependencies
* Python3.7
* OpenCV 2
* Tesseract
* PIL
```bash
cd extract
python3 pdf.py > ../ocr/pdf_output.txt
python3 ocr.py > ../ocr/tesseract_output.txt
```
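
The two commands above produce two independent transcriptions of the same PDF, so the places where they disagree are good candidates for manual proofreading. A minimal sketch of such a comparison, using only the standard library, is below. The helper name `differing_lines` and the sample strings are assumptions for illustration and are not part of this repository.

```python
import difflib
from typing import List


def differing_lines(a_text: str, b_text: str) -> List[str]:
    """Unified diff of two transcriptions, ignoring blank lines and edge whitespace."""
    def normalize(text: str) -> List[str]:
        return [ln.strip() for ln in text.splitlines() if ln.strip()]

    return list(difflib.unified_diff(
        normalize(a_text), normalize(b_text),
        fromfile="pdf_output.txt", tofile="tesseract_output.txt",
        lineterm="", n=0))  # n=0: show only the lines that differ


# Hypothetical sample strings standing in for the two output files.
embedded = "Feathered Masks\nMask Blackness\n"
ocr = "Feathered Maska\n\nMask Blackness\n"

for line in differing_lines(embedded, ocr):
    print(line)
```

To run it on the real outputs, read `../ocr/pdf_output.txt` and `../ocr/tesseract_output.txt` into strings and pass them in place of the samples.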

5
compile.sh Normal file

@@ -0,0 +1,5 @@
#!/bin/bash
mkdir -p pdf
pandoc --css=noopt.css -o pdf/NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.pdf NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.md

20
extract/ocr.py Normal file

@@ -0,0 +1,20 @@
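# pip install pytesseract PyMuPDF Pillow opencv-python
import os
import fitz  # PyMuPDF
import pytesseract
import cv2
pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'
file = "../original/NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.pdf"
# pix.save() does not create missing directories, so make sure pages/ exists.
os.makedirs("pages", exist_ok=True)
pdf_file = fitz.open(file)
for page in pdf_file:
    # Render the page at 300 dpi and save it (page.number is zero-based).
    pix = page.get_pixmap(dpi=300)
    filePath = "pages/page-%i.png" % page.number
    pix.save(filePath)
    # Reload the saved PNG with OpenCV and run Tesseract on it.
    image = cv2.imread(filePath)
    # --psm 6: assume a single uniform block of text; --oem 3: default OCR engine.
    text = pytesseract.image_to_string(image, lang='eng', config='--psm 6 --oem 3')
    print(text)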

12
extract/pdf.py Normal file

@@ -0,0 +1,12 @@
# pip install PyMuPDF
import fitz  # PyMuPDF
file = "../original/NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.pdf"
pdf_file = fitz.open(file)
for page in pdf_file:
    # get_text() already returns a str, so print it directly. Encoding it to
    # UTF-8 and decoding with "unicode_escape" would garble non-ASCII characters.
    text = page.get_text()
    print(text)
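
One caveat when printing text extracted from a PDF: passing a string through `.encode("utf8")` and then `.decode("unicode_escape")` treats each UTF-8 byte as a separate Latin-1 character, so anything outside ASCII comes out garbled (and any backslash in the text is read as an escape sequence). The short sketch below shows the effect on a made-up sample string; the string and variable names are assumptions for illustration only.

```python
# Hypothetical sample containing two non-ASCII characters.
text = "1½ stops, café"

# Each UTF-8 byte is decoded as its own Latin-1 character.
mangled = text.encode("utf8").decode("unicode_escape")
print(mangled == text)  # False

# When no backslash escapes were involved, the damage can be undone
# by reversing the Latin-1 step.
repaired = mangled.encode("latin-1").decode("utf8")
print(repaired == text)  # True

# Plain ASCII passes through unchanged, which can hide the problem.
ascii_only = "Bolex Prism".encode("utf8").decode("unicode_escape")
print(ascii_only == "Bolex Prism")  # True
```

Because ASCII-only pages survive the round trip, this kind of damage tends to show up only on pages with fractions, accents, or typographic quotes.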

1908
ocr/pdf_output.txt Executable file

File diff suppressed because it is too large

1690
ocr/tesseract_output.txt Executable file

File diff suppressed because it is too large

8
style.css Normal file

@@ -0,0 +1,8 @@
.centered{
    margin: 0 auto;
    display: block;
}
.indexTable{
    color: red;
}