Using masked language modeling to detect literary clichés. Training BERT for use on North Korean language data. Borrowing a pseudo-perplexity metric to use as a measure of […]
Author: Ben
Porting North Korean Dictionaries with Rust
Reverse engineering North Korean dictionary software to export the data to a more accessible format. Reading into North Korean software protection schemes, encoding formats and database indexing. Dealing with OOP […]
Reverse Engineering a North Korean Sim City Game
Reverse engineering the North Korean version of a popular Sim City-like game using Ghidra and dnSpy to understand video game monetization strategies in the DPRK and the marketization of the […]
Machine Learning and the Bane of Romanization
An attempt to develop a quick-and-dirty method to automatically transliterate Korean using the McCune-Reischauer system with NLP, neural networks and character-level sequence-to-sequence models. 0. Introduction I […]
North and South Korea Through Word Embeddings
0. Introduction I sometimes get asked if there are many differences between the languages spoken in North and South Korea, to which I usually answer “not that much”. Since the Sunshine […]
Visualizing the Korean War: Data, Bombs and Propaganda
A quick look at a US Air Force dataset, some geographical data visualization using d3.js and an enquiry into North Korean data collection methods during the Korean War and their […]
Gender Distribution in North Korean Posters with Convolutional Neural Networks
A brief analysis of gender distribution in visual representations of everyday life in North Korea using facial recognition algorithms and transfer learning applied to convolutional neural networks. 1. The Data A […]
Building an OCR Tool For North Korean Archival Data (Part 2)
Designing a pre-processing method to improve OCR results using Python and OpenCV for old North Korean print material. Creating a simple character segmentation algorithm using contouring and simple heuristics. 2. […]
Building an OCR Tool For North Korean Archival Data (Part 1)
With the goal of performing OCR and indexing the content of archival documents, this post explains how to build a simple Python web scraper to extract data from a public […]