Analyzing Stylistic Variation across Different Political Regimes


1. Previous Work and Motivation

Automated authorship attribution has a long history, dating back to the early 20th century [12], and has been extensively studied and elaborated upon since. The problem of authorship identification rests on the assumption that there are stylistic features that can help distinguish the real author of a text from any other possible author. This set of stylistic features was recently defined as a linguistic fingerprint (or stylome), which can be measured, is largely unconscious, and is constant [17]. One of the oldest studies to propose an approach to this problem concerns the Federalist Papers: [13] attempts to determine the real author of a few of these papers whose authorship is disputed. This work remains iconic in the field, both for introducing a standard dataset and for proposing an effective method for distinguishing between the authors’ styles, based on the frequency of function words, that is still relevant to this day. Many other types of features have been proposed and successfully used in subsequent studies to determine the author of a text. These features generally contrast with the content words commonly used in text categorization by topic, and are said to be used unconsciously and to be harder for the author to control. Examples include grammatical structures [1], part-of-speech n-grams [7], lexical richness [16], and the more general feature of character n-grams [6, 5]. With applications that go beyond finding the real authors of controversial texts, ranging from plagiarism detection to forensics and security, stylometry has widened its scope to related subtopics such as author verification (verifying whether a text was written by a certain author) [8], author profiling (e.g. gender or age prediction), author diarization, and author masking (given a document, paraphrasing it so that its style no longer matches that of its original author).

In this work, we explore a slightly different issue, strongly related to the stylistic fingerprint of an author: whether an author preserves their stylome across different time periods and across political and cultural environments. More specifically, we want to see whether we can discriminate between texts written by the same author under a communist regime and texts written under a democratic one. […]

