Add extraction scripts and initial work on markdown
commit 90dff1d7a4
@@ -0,0 +1,52 @@
---
title: NOTES ON OPTICAL PRINTER TECHNIQUE
author: Dennis Couzin
date: "March 1983"
...
\pagenumbering{gobble}
\newpage
\pagenumbering{arabic}
::: {.indexTable}

| | | | |
|-----|-----|-----|-----|
| Magnification | 1 | Fades in Original | 14 |
| Blowup & Reduction | 2 | Chart C: Neutral Density | |
| Blowup Sharpness | 2 | and Equivalent Shutter | |
| Printer Lenses | 3 | Angle | 15 |
| Optical Zoom | 3 | Image Superposition | 16 |
| Lens Aperture | 3 | Gamma & Bipack | 16 |
| Focusing | 4 | Incidentally | 16 |
| Focusing Aperture | 4 | Exposure Compensation | 18 |
| Focusing Precision | 4 | Special Originals | 18 |
| Focusing Target | 4 | Texturing | 18 |
| Depth of Field | 4 | Multi-Exposure | 19 |
| Bolex Prism | 4 | Multi-Pack | 19 |
| Bolex Groundglass | 4 | Natural Superposition | 19 |
| Defocus | 4 | Flashing | 19 |
| X-Y Adjustment | 4 | Contrast Adjustment | 19 |
| Exact 1:1 | 5 | Color Image Superposition | 20 |
| Aimframe | 5 | Weighted Double Exposures | 20 |
| Framelines | 6 | Dissolves | 21 |
| Emulsion Position | 7 | Effects Dissolves | 21 |
| Time | 8 | Fades from Negative | 21 |
| Fancy Freeze | 8 | Color Exposure | 22 |
| Fancy Slow | 8 | Testing | 22 |
| Diffusers | 8 | CC Pack Reduction | 25 |
| UV Filter | 9 | High Contrast Prints | 25 |
| IR Filter | 9 | Hicon Exposure | 26 |
| Green Filter | 9 | Contrast Building Steps | 26 |
| Filter Location | 9 | Hicon Speckle | 26 |
| Exposure | 9 | Tone Isolation | 27 |
| Exposure Adjusters | 9 | Logic of Mask Combination | 27 |
| Specifying Exposure | 11 | Image Spread and Bloom | 27 |
| Film Speed | 11 | Mask and Countermask | 28 |
| Right Exposure | 11 | Reversal/Negative Fitting | 28 |
| Generations | 12 | Feathered Masks | 29 |
| Bellows Formula | 13 | Image Marriage | 29 |
| Fades | 13 | Mask Blackness | 30 |
| Log Fade | 14 | Hicons from Color Originals | 30 |
| Bolex Variable Shutter | 14 | Hicon Processing | 30 |
| Linear Fade | 14 | Optical Printed Release Prints | 31 |
| Other Fades | 14 | Ritual and Art | 31 |
:::
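Because this index was transcribed from a scanned page, its page numbers can be checked mechanically: in a contents table, the pages in each column should never go down. The sketch below assumes rows in the four-cell pipe format used above; `page_regressions` is a hypothetical helper, not part of this commit.

```python
from typing import List, Tuple


def page_regressions(rows: List[str]) -> List[Tuple[str, int, int]]:
    """Return (topic, page, highest_page_so_far) for each index entry whose
    page number is lower than an earlier entry in the same column.

    Hypothetical helper; assumes rows shaped like "| Topic | 3 | Topic | 15 |".
    """
    highest = [0, 0]  # highest page seen so far in the left and right column
    problems = []
    for row in rows:
        cells = [c.strip() for c in row.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue  # not a four-cell table row
        for col in (0, 1):
            topic, page = cells[2 * col], cells[2 * col + 1]
            # Header, separator and continuation rows have no numeric page.
            if not page.isdigit():
                continue
            n = int(page)
            if n < highest[col]:
                problems.append((topic, n, highest[col]))
            highest[col] = max(highest[col], n)
    return problems
```

Given the table's lines, it returns each entry whose page is lower than the highest page already seen in its column; such an entry is a likely misread digit and worth checking against the original PDF. Continuation rows with an empty page cell are skipped.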
@@ -0,0 +1,24 @@
# NOTES ON OPTICAL PRINTER TECHNIQUE

A Markdown reproduction of the guide written by Dennis Couzin.

### PDF Dependencies

* pandoc
* a LaTeX distribution providing `pdflatex` (pandoc's default PDF engine)

```bash
bash compile.sh
```

### Text Extraction Dependencies

* Python 3.7
* OpenCV (the `opencv-python` package, imported as `cv2`)
* Tesseract
* Pillow (PIL)
* PyMuPDF (imported as `fitz`)
* pytesseract

```bash
cd extract
python3 pdf.py > ../ocr/pdf_output.txt
python3 ocr.py > ../ocr/tesseract_output.txt
```
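The two commands above write independent transcriptions of the same PDF, one from its embedded text layer and one from Tesseract. Comparing them line by line is a quick way to find passages where at least one of them is wrong. A minimal sketch using the standard library's `difflib`, assuming both outputs are plain text with comparable line breaks; `disagreements` is a hypothetical helper.

```python
import difflib
from typing import List


def disagreements(pdf_text: str, ocr_text: str) -> List[str]:
    """Return unified-diff lines showing where two transcriptions differ.

    Hypothetical helper; trailing whitespace is ignored because OCR output
    often carries stray spaces at line ends.
    """
    a = [line.rstrip() for line in pdf_text.splitlines()]
    b = [line.rstrip() for line in ocr_text.splitlines()]
    return list(
        difflib.unified_diff(
            a,
            b,
            fromfile="pdf_output.txt",
            tofile="tesseract_output.txt",
            lineterm="",
        )
    )
```

To compare the real outputs, read `../ocr/pdf_output.txt` and `../ocr/tesseract_output.txt` (for example with `open(path, encoding="utf-8").read()`) and print the returned lines.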
@@ -0,0 +1,5 @@
#!/bin/bash

mkdir -p pdf

# --css only takes effect when pandoc produces the PDF through an HTML-based
# engine (e.g. --pdf-engine=weasyprint); the default LaTeX engine ignores it.
pandoc --css=noopt.css -o pdf/NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.pdf NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.md
@@ -0,0 +1,20 @@
# pip install pytesseract PyMuPDF Pillow opencv-python

import os

import cv2
import fitz  # PyMuPDF
import pytesseract

# Path to the Tesseract binary; adjust if it is installed elsewhere.
pytesseract.pytesseract.tesseract_cmd = '/usr/bin/tesseract'
file = "../original/NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.pdf"

# Rendered page images are written here; create the directory if needed.
os.makedirs("pages", exist_ok=True)

pdf_file = fitz.open(file)

for page in pdf_file:
    # Render the page at 300 dpi and save it as a PNG.
    pix = page.get_pixmap(dpi=300)
    filePath = "pages/page-%i.png" % page.number
    pix.save(filePath)
    # Read the PNG back with OpenCV and OCR it. --psm 6 assumes a single
    # uniform block of text; --oem 3 lets Tesseract choose its engine.
    image = cv2.imread(filePath)
    text = pytesseract.image_to_string(image, lang='eng', config='--psm 6 --oem 3')
    print(text)
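The `page-%i.png` pattern produces names such as `page-2.png` and `page-10.png`, which sort lexicographically as `page-1.png`, `page-10.png`, `page-2.png`, so a plain listing of `pages/` does not follow page order. Zero-padding the number fixes that. A minimal sketch, assuming the document has fewer than 1,000 pages (the default width of 3); `page_image_path` is a hypothetical helper.

```python
import os


def page_image_path(page_number: int, directory: str = "pages", width: int = 3) -> str:
    """Hypothetical helper: zero-padded PNG path for a page, e.g. pages/page-007.png.

    Padding to a fixed width makes lexicographic order match page order
    as long as page numbers have at most `width` digits.
    """
    return os.path.join(directory, "page-%0*i.png" % (width, page_number))
```

Inside the rendering loop it would stand in for the path expression: `filePath = page_image_path(page.number)`.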
@@ -0,0 +1,12 @@
# pip install PyMuPDF

import fitz  # PyMuPDF

file = "../original/NOTES_ON_OPTICAL_PRINTER_TECHNIQUE.pdf"

pdf_file = fitz.open(file)

for page in pdf_file:
    # Print the page's embedded text layer; get_text() already returns str,
    # so no encode/decode round trip is needed.
    print(page.get_text())
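`page.get_text()` already returns a Python `str`, so encoding it to UTF-8 and decoding the bytes with the `unicode_escape` codec is not a neutral round trip. That codec reads every non-ASCII byte as a Latin-1 character and expands backslash sequences, so accented letters turn into mojibake and a literal backslash-n in the text becomes a line break. A small stdlib-only demonstration; `escape_roundtrip` is a hypothetical name for the expression.

```python
def escape_roundtrip(text: str) -> str:
    # Hypothetical name for the expression text.encode("utf8").decode("unicode_escape").
    return text.encode("utf8").decode("unicode_escape")


# ASCII text without backslashes survives unchanged.
print(escape_roundtrip("Dennis Couzin") == "Dennis Couzin")  # True
# The UTF-8 bytes of U+00E9 (e-acute), 0xC3 0xA9, come back as U+00C3 U+00A9.
print(escape_roundtrip("\u00e9") == "\u00c3\u00a9")  # True
# A literal backslash followed by "n" is turned into a real newline.
print(escape_roundtrip("a\\nb") == "a\nb")  # True
```

Printing `page.get_text()` directly keeps the extracted characters intact.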
File diff suppressed because it is too large
File diff suppressed because it is too large