Can You Tell an Author’s Identity By Looking at Punctuation Alone?
Research has found that an author’s use of punctuation can be extremely revealing.
Mental Floss
Lucas Reilly
In 2016, neuroscientist Adam J. Calhoun wondered what his favorite books would look like if he removed the words and left nothing but the punctuation. The result was a stunning, and surprisingly beautiful, visual stream of commas, question marks, semicolons, em-dashes, and periods.
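For readers who want to try this at home, here is a minimal sketch of the idea in Python. It is not Calhoun's own code; the function names and the set of marks kept are assumptions made for illustration.

```python
from collections import Counter

# Assumption: an illustrative set of marks, not the list used by Calhoun
# or by any study. "\u2014" is the em-dash.
MARKS = ".,;:!?\"()\u2014"


def deword(text):
    """Drop everything except the punctuation marks listed in MARKS."""
    return "".join(ch for ch in text if ch in MARKS)


def punctuation_frequencies(text):
    """Return each mark's share of all punctuation found in the text."""
    sequence = deword(text)
    if not sequence:
        return {}
    counts = Counter(sequence)
    return {mark: n / len(sequence) for mark, n in counts.items()}


sample = "Well, that was odd. Was it? Yes; very."
print(deword(sample))  # prints ,.?;.
print(punctuation_frequencies(sample))
```

Apostrophes and hyphens are left out on purpose in this sketch, since they often sit inside words rather than between them.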
Recently, Calhoun’s inquiry piqued the interest of researchers in the United Kingdom, who wondered if it was possible to identify an author from his or her punctuation alone.
For decades, linguists have been able to use the quirks of written texts to pinpoint the author. The process, called stylometric analysis or stylometry, has dozens of legal and academic applications, helping researchers authenticate anonymous works of literature and even nab criminals like the Unabomber. But it usually focuses on an author's word choices and grammar or the length of his or her sentences. Until now, punctuation has been largely ignored.
But according to a 2019 paper led by Alexandra N. M. Darmon of the Oxford Centre for Industrial and Applied Mathematics, an author’s use of punctuation can be extremely revealing. Darmon’s team assembled nearly 15,000 documents from 651 different authors and “de-worded” each text. “Is it possible to distinguish literary genres based on their punctuation sequences?” the researchers asked. “Do the punctuation styles of authors evolve over time?”
Apparently, yes. The researchers crafted mathematical formulas that could identify individual authors with 72 percent accuracy. Their ability to detect a specific genre, from horror to philosophy to detective fiction, clocked in at a 65 percent success rate.
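The article does not spell out the researchers' formulas, so the following is only a toy illustration of the general idea, not their method. It assumes a simple approach: build a frequency profile of punctuation marks for each known author, then attribute an unknown text to the author whose profile is closest. The function names, the set of marks, and the sample texts are all hypothetical.

```python
from collections import Counter

# Assumption: an illustrative set of marks, not the one used in the paper.
MARKS = ".,;:!?\"()"


def profile(text):
    """Return the share of each mark in MARKS among the text's punctuation."""
    counts = Counter(ch for ch in text if ch in MARKS)
    total = sum(counts.values())
    return [counts[m] / total if total else 0.0 for m in MARKS]


def distance(p, q):
    """Sum of absolute differences between two frequency profiles."""
    return sum(abs(a - b) for a, b in zip(p, q))


def guess_author(unknown_text, known_samples):
    """Pick the author whose sample has the closest punctuation profile."""
    target = profile(unknown_text)
    return min(
        known_samples,
        key=lambda author: distance(target, profile(known_samples[author])),
    )


# Hypothetical writing samples, invented for this example.
samples = {
    "comma_lover": "One, two, three, four, and five.",
    "questioner": "Why? How? When? Who knows!",
}
print(guess_author("Red, green, blue, and gold.", samples))  # prints comma_lover
```

A real system would need far more text per author and richer features than raw frequencies, such as the order in which marks follow one another.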
The results, published on the preprint server SocArXiv, also revealed how punctuation style has evolved. The researchers found that “the use of quotation marks and periods has increased over time (at least in our [sample]) but that the use of commas has decreased over time. Less noticeably, the use of semicolons has also decreased over time.”
You probably don’t need to develop a powerful algorithm to figure that last bit out—you just have to crack open something by Dickens.
Lucas Reilly proudly worked on mental_floss magazine for four years, where he served as a senior editor. For two years, he worked as a longform feature writer for the web. He's embedded with professional eclipse chasers in Nebraska, interviewed feudal lords in Britain, hunted for buried treasure in Virginia, and once profiled a man who had tried to turn into a goat. Chances are, you can find him at the library.
This post originally appeared on Mental Floss and was published January 21, 2019. This article is republished here with permission.