Using masked language modeling as a way to detect literary clichés: training BERT on North Korean language data and borrowing a pseudo-perplexity metric as a measure of […]
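The excerpt does not give the metric's definition. This is a minimal sketch that assumes the usual one: mask each token in turn, sum the masked LM's log-probabilities of the original tokens (the pseudo-log-likelihood), and take PPPL = exp(-(1/N) · Σ log P(w_t | sentence with w_t masked)). The `score_masked` callable stands in for a real BERT forward pass and is hypothetical.

```python
import math
from typing import Callable, Sequence

MASK = "[MASK]"

def pseudo_perplexity(
    tokens: Sequence[str],
    score_masked: Callable[[list, int, str], float],
) -> float:
    """Pseudo-perplexity of a token sequence under a masked LM.

    score_masked(masked_tokens, position, original_token) is a hypothetical
    scorer returning log P(original_token | masked_tokens), a natural-log
    probability for the hidden token at `position`.
    """
    if not tokens:
        raise ValueError("need at least one token")
    pll = 0.0  # pseudo-log-likelihood: sum of per-token log-probabilities
    for i, tok in enumerate(tokens):
        masked = list(tokens)
        masked[i] = MASK  # hide exactly one token per pass
        pll += score_masked(masked, i, tok)
    return math.exp(-pll / len(tokens))
```

With a toy scorer that gives every hidden token probability 0.5, the result is exp(-ln 0.5) = 2.0. Lower values mean the model recovers each token from its context more easily.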
Tag: Python
Machine Learning and the Bane of Romanization
An attempt to develop a quick-and-dirty method for automatically transliterating Korean into the McCune-Reischauer system using NLP, neural networks and character-level sequence-to-sequence models. 0. Introduction: I […]
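A character-level sequence-to-sequence model treats each Hangul syllable on the input side, and each romanized character on the output side, as one token. Below is a minimal sketch of the data-preparation step. The two (Hangul, McCune-Reischauer) pairs, the special tokens and the helper names are illustrative assumptions, not taken from the post.

```python
PAD, SOS, EOS = "<pad>", "<sos>", "<eos>"  # ids 0, 1, 2

def build_vocab(words):
    """Map every character seen in `words` to an integer id, after the special tokens."""
    chars = sorted({ch for w in words for ch in w})
    itos = [PAD, SOS, EOS] + chars
    return {ch: i for i, ch in enumerate(itos)}, itos

def encode(word, stoi, max_len):
    """Wrap a word in <sos>/<eos>, convert it to ids and right-pad it to max_len."""
    ids = [stoi[SOS]] + [stoi[ch] for ch in word] + [stoi[EOS]]
    if len(ids) > max_len:
        raise ValueError(f"{word!r} needs {len(ids)} slots, max_len is {max_len}")
    return ids + [stoi[PAD]] * (max_len - len(ids))

# Toy (Hangul, McCune-Reischauer) pairs; a usable model needs far more data.
pairs = [("조선", "Chosŏn"), ("평양", "P'yŏngyang")]
src_stoi, src_itos = build_vocab(h for h, _ in pairs)
tgt_stoi, tgt_itos = build_vocab(r for _, r in pairs)

src_len = max(len(h) for h, _ in pairs) + 2  # +2 for <sos> and <eos>
tgt_len = max(len(r) for _, r in pairs) + 2
X = [encode(h, src_stoi, src_len) for h, _ in pairs]
Y = [encode(r, tgt_stoi, tgt_len) for _, r in pairs]
```

Padded integer matrices such as `X` and `Y` are what an encoder-decoder (for example, an LSTM with teacher forcing) would be trained on. Working at the character level keeps the vocabulary small and lets the model learn syllable-to-letter correspondences directly.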
North and South Korea Through Word Embeddings
0. Introduction: I sometimes get asked if there are many differences between the languages spoken in North and South Korea, to which I usually answer “not that much”. Since the Sunshine […]
Gender Distribution in North Korean Posters with Convolutional Neural Networks
A brief analysis of gender distribution in visual representations of everyday life in North Korea using facial recognition algorithms and transfer learning applied to convolutional neural networks. 1. The Data: A […]
Building an OCR Tool For North Korean Archival Data (Part 2)
Designing a pre-processing method with Python and OpenCV to improve OCR results on old North Korean print material, and creating a simple character segmentation algorithm based on contour detection and simple heuristics. 2. […]
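Contour-based segmentation finds connected blobs of ink on a binarized page, takes their bounding boxes, and then applies heuristics to turn those boxes into characters. The sketch below illustrates that idea without OpenCV. It replaces `cv2.findContours` + `cv2.boundingRect` with a plain 8-connected component pass, and its two heuristics (drop tiny specks, merge boxes whose column ranges overlap) are illustrative assumptions, not the post's rules.

```python
from collections import deque

def components(img):
    """Bounding boxes (x0, y0, x1, y1) of 8-connected ink regions.

    `img` is a binarized page as a list of rows: 1 = ink, 0 = background.
    """
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if img[y][x] and not seen[y][x]:
                seen[y][x] = True
                queue = deque([(y, x)])
                x0, y0, x1, y1 = x, y, x, y
                while queue:  # breadth-first flood fill of one blob
                    cy, cx = queue.popleft()
                    x0, y0 = min(x0, cx), min(y0, cy)
                    x1, y1 = max(x1, cx), max(y1, cy)
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and img[ny][nx] and not seen[ny][nx]:
                                seen[ny][nx] = True
                                queue.append((ny, nx))
                boxes.append((x0, y0, x1, y1))
    return boxes

def segment(img, min_area=2):
    """Character boxes, left to right, after two simple (hypothetical) heuristics."""
    # Heuristic 1: drop specks too small to be strokes (print noise, dust).
    boxes = [b for b in components(img)
             if (b[2] - b[0] + 1) * (b[3] - b[1] + 1) >= min_area]
    boxes.sort(key=lambda b: b[0])
    # Heuristic 2: merge boxes whose column ranges overlap, so vertically
    # stacked parts of one syllable (e.g. the ㅇ above the ㅡ in 으) stay together.
    merged = []
    for b in boxes:
        if merged and b[0] <= merged[-1][2]:
            m = merged[-1]
            merged[-1] = (min(m[0], b[0]), min(m[1], b[1]),
                          max(m[2], b[2]), max(m[3], b[3]))
        else:
            merged.append(b)
    return merged
```

Each returned box could then be cropped and passed to the OCR engine one character at a time. Real scans would need thresholds tuned to the font size and print quality.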
Building an OCR Tool For North Korean Archival Data (Part 1)
With the goal of performing OCR and indexing the content of archival documents, this post explains how to build a simple Python web scraper to extract data from a public […]
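The source being scraped is cut off in the excerpt, so the page markup, the `example.org` URLs and the choice to collect PDF links below are all hypothetical. This sketch shows only the parsing step of such a scraper, using the standard library's `html.parser.HTMLParser` and `urllib.parse.urljoin`. Fetching the page itself could be done with `urllib.request.urlopen` or the `requests` package.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a href> links whose href ends with one of `suffixes`."""

    def __init__(self, base_url, suffixes=(".pdf",)):
        super().__init__()
        self.base_url = base_url
        self.suffixes = suffixes  # matched case-insensitively
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and href.lower().endswith(self.suffixes):
            # Resolve relative links against the page they were found on.
            self.links.append(urljoin(self.base_url, href))

# Hypothetical listing page; the real archive's URL and markup are not given.
page = """
<ul>
  <li><a href="/docs/issue-001.pdf">Issue 1</a></li>
  <li><a href="/about.html">About</a></li>
  <li><a href="https://example.org/docs/issue-002.PDF">Issue 2</a></li>
</ul>
"""
collector = LinkCollector("https://example.org/archive/")
collector.feed(page)
print(collector.links)
# ['https://example.org/docs/issue-001.pdf', 'https://example.org/docs/issue-002.PDF']
```

The collected URLs would then be downloaded and handed to the OCR and indexing steps.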