Digital NK

Tag: Tesseract

Building an OCR Tool For North Korean Archival Data (Part 2)

Designing a pre-processing method to improve OCR results using Python and OpenCV for old North Korean print material. Creating a simple character segmentation algorithm using contouring and simple heuristics.  2. […]

Ben | September 15, 2017 | Computer Vision, OCR, OpenCV, Python, RG-242, Tesseract, US National Archives
