About

The idea

Every word has a history. Not just an etymology, a life. Words rise with the events that make them necessary, peak when the world cannot stop talking about them and fade when something else takes their place. That arc is visible in data, if you know where to look.

wordshift grew from a simple observation. The Google Books Ngram corpus contains over 8 million digitised books spanning more than two centuries of English publishing, and buried in that data is something remarkable: a record of what the English-speaking world was paying attention to, year by year, from 1800 to the present. Search any word and you are looking at a kind of cultural seismograph. The spikes and valleys are not random. They mean something.

The data

wordshift draws on the Google Books Ngram corpus, a dataset assembled by researchers at Harvard and Google. It was first described in a landmark 2010 paper in Science that introduced the field of culturomics, the quantitative study of human culture through language. The corpus tracks word and phrase frequency across millions of books published in English, giving wordshift a 222-year window into how the language has shifted in response to war, discovery, legislation, crisis and change.

Frequency values are smoothed using a 7-year rolling average to reduce year-to-year noise, and indexed so that each word’s historical peak equals 100. This makes it possible to compare words of vastly different absolute frequency on the same chart, to see “pandemic” and “cholera” side by side without one flattening the other.
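A minimal sketch of those two steps in Python. It assumes a centered 7-year window (three years either side of each point) that shrinks at the ends of the series. The section does not say whether wordshift centers its window or how it treats the first and last years, so both choices are assumptions, as are the function names `smooth` and `index_to_peak`.

```python
from typing import Sequence


def smooth(values: Sequence[float], window: int = 7) -> list[float]:
    """Centered rolling mean. Near the ends of the series the window shrinks
    to whatever years are available (an assumed edge policy)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        lo = max(0, i - half)       # first year inside the window
        hi = min(n, i + half + 1)   # one past the last year inside the window
        chunk = values[lo:hi]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def index_to_peak(values: Sequence[float]) -> list[float]:
    """Rescale a series so that its maximum becomes exactly 100."""
    peak = max(values)
    if peak <= 0:
        # A word that never appears has no peak to index against.
        return [0.0 for _ in values]
    return [100.0 * (v / peak) for v in values]


# Hypothetical raw yearly frequencies for one word (not real corpus data).
raw = [0.2, 0.3, 0.1, 2.5, 0.4, 0.3, 0.2, 0.2, 0.1]
curve = index_to_peak(smooth(raw))
print([round(v, 1) for v in curve])
```

In this sketch the index is taken after smoothing, so a one-year spike is averaged down before the peak is chosen. Shrinking the window at the edges is one choice among several; a trailing average, or dropping years without a full window, would also work, and each shifts the curve's endpoints slightly.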

The methodology

The turning points on each curve are not hand-picked. They are detected algorithmically using topographic prominence, the same measure used to rank mountain peaks by their significance relative to the surrounding terrain. Roughly, a peak's prominence is how far the curve must fall from it before it can rise to anything higher. A moment earns its dot by standing out from its surroundings, not simply by being high.
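The idea fits in a few lines of Python. This is a minimal illustration of one-dimensional prominence, not wordshift's actual detector: it considers only strict local maxima (flat-topped peaks are skipped), and the cutoff `min_prominence` is a hypothetical parameter, since the section does not say what threshold earns a dot. For each peak, it walks left and right until the curve rises above the peak or the series ends, takes the lowest point on each side, and measures the peak against the higher of those two lows.

```python
from typing import Sequence


def prominence(y: Sequence[float], i: int) -> float:
    """Topographic prominence of the peak at index i in a 1-D series."""
    h = y[i]
    # Lowest point between the peak and the first strictly higher point
    # (or the start of the series) on the left.
    left_min = h
    j = i - 1
    while j >= 0 and y[j] <= h:
        left_min = min(left_min, y[j])
        j -= 1
    # The same walk to the right.
    right_min = h
    j = i + 1
    while j < len(y) and y[j] <= h:
        right_min = min(right_min, y[j])
        j += 1
    # Measure against the higher of the two lows, i.e. the smaller descent.
    return h - max(left_min, right_min)


def prominent_peaks(y: Sequence[float], min_prominence: float) -> list[tuple[int, float]]:
    """Return (index, prominence) for strict local maxima at or above the cutoff."""
    peaks = []
    for i in range(1, len(y) - 1):
        if y[i - 1] < y[i] > y[i + 1]:
            p = prominence(y, i)
            if p >= min_prominence:
                peaks.append((i, p))
    return peaks


# A high shoulder (8) sitting next to a taller peak (9).
series = [0, 8, 7, 9, 0]
print(prominent_peaks(series, min_prominence=2))
```

On that toy series the 8 is the second-highest value, yet it rises only 1 above the dip separating it from the 9, so it is dropped while the 9 (prominence 9) is kept, printing `[(3, 9)]`. That is the "standing out, not simply being high" behaviour in miniature. For real use, SciPy's `scipy.signal.find_peaks(y, prominence=...)` computes the same measure and also handles plateaus.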

Each moment is annotated with historical context using Claude, Anthropic’s AI language model. The annotation draws on what was happening in English publishing around that specific year: the legislation, the discoveries, the crises and the cultural shifts that drove writers and institutions to reach for that word more often. Annotations are honest about uncertainty. When a pattern looks like a corpus artefact rather than genuine history, the annotation says so.

The corpus has real limitations. It over-represents certain publishers, genres and periods. Words that appear before they were coined usually reflect scanning errors or misdated books. wordshift tries to surface these caveats rather than hide them.

Explore

The best way to understand wordshift is to search something you are curious about. Start with a word you think you know and see what the data says.

Search wordshift →