Wolfram Language & System Documentation Center

WordFrequencyData

WordFrequencyData[word]

gives the frequency of word in typical published English text.

WordFrequencyData[{word₁,word₂,…}]

gives an association of frequencies of the word_i.

WordFrequencyData[word,"TimeSeries"]

gives a time series for the frequency of word in typical published English text.

WordFrequencyData[word,"TimeSeries",datespec]

gives a time series for dates specified by datespec.

WordFrequencyData[word,"prop"]

gives property prop of the word frequency.

Details and Options

WordFrequencyData[word₁|word₂|…] gives the total frequencies of all the word_i.
WordFrequencyData[word,"Total",datespec] gives the total frequency of word for the dates specified by datespec.
By default, WordFrequencyData uses the Google Books English n-gram public dataset.
Possible options include:

	IgnoreCase	False	whether to ignore case in word
	Language	"English"	what language of source corpus to use

In WordFrequencyData[word,"prop"], possible properties include:

	"Total"	give total frequencies over a date range
	"TimeSeries"	give a time series of frequencies
	"CaseVariants"	give results for all variants of upper and lower case
	"PartsOfSpeechVariants"	give results for all variants of parts of speech
	{prop₁,prop₂,…}	give results for combinations of properties

Possible date specifications include:

	All	use all available dates for the specified source corpus
	DateObject[…]	use DateObject
	year	use specific year
	{year_min,year_max}	use year range between year_min and year_max
	{{d₁,d₂,…}}	use explicit dates {d₁,d₂,…}
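
The date specification forms above can be sketched side by side. This is a minimal illustration, assuming each form is accepted by the "Total" property as tabulated; the word "war" and the years are arbitrary choices, and no output is shown because the values depend on the corpus:

```wolfram
(* A single year *)
WordFrequencyData["war", "Total", 1945]

(* A year range {year_min, year_max} *)
WordFrequencyData["war", "Total", {1939, 1945}]

(* A DateObject *)
WordFrequencyData["war", "Total", DateObject[{1945}]]

(* All available dates for the default source corpus *)
WordFrequencyData["war", "Total", All]
```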

Examples


Basic Examples (4)

Get the frequency of the word "dog" in typical published English text:

Wolfram Language code: WordFrequencyData["dog"]

Get the typical frequencies of several words:

Wolfram Language code: WordFrequencyData[{"dog", "cat"}]

Compute the ratio of the frequencies of the words "war" and "peace" in published text:

Wolfram Language code: WordFrequencyData["war"] / WordFrequencyData["peace"]

Plot the historical time series for the frequency of the word "economy":

Wolfram Language code: DateListPlot[WordFrequencyData["economy", "TimeSeries"]]

Scope (4)

Get the overall frequency of "atlas":

Wolfram Language code: WordFrequencyData["atlas"]

Find the frequency of multiple words at once:

Wolfram Language code: WordFrequencyData[{"perro", "gato"}, Language -> "Spanish"]

WordFrequencyData accepts TextElement inputs with a specified "GrammaticalUnit":

Wolfram Language code:

WordFrequencyData[{TextElement["burned", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Verb"]|>], TextElement["burned", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Adjective"]|>], TextElement["burnt", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Verb"]|>], TextElement["burnt", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Adjective"]|>]}, "Total", 2000, IgnoreCase -> True]//Sort//Reverse

Plot the historical time series for the frequency of the word "computer" since 1900:

Wolfram Language code: DateListPlot[WordFrequencyData["computer", "TimeSeries", {1900, Now}]]

Generalizations & Extensions (1)

When Alternatives is used as an input, the result is the total frequency for any of the alternatives:

Wolfram Language code: WordFrequencyData["today" | "yesterday"]

Alternatives may be used in combination with other properties, such as "TimeSeries":

Wolfram Language code: WordFrequencyData[{"tv" | "television", "phone" | "telephone"}, "TimeSeries", {1900, Now}, IgnoreCase -> True]

Wolfram Language code: %//DateListPlot

Options (6)

IgnoreCase (1)

With IgnoreCase -> True, the frequency of a word counts all of its uppercase and lowercase variants. The default value is False:

Wolfram Language code: WordFrequencyData["war", IgnoreCase -> True]

This value is usually greater than the default:

Wolfram Language code: WordFrequencyData["war"]

Language (5)

Find the frequency of a common Spanish word in a Spanish-language text corpus:

Wolfram Language code: WordFrequencyData["en", Language -> "Spanish"]

Spanish words may appear in corpora of other languages, but with a much lower frequency:

Wolfram Language code: WordFrequencyData["perro"] / WordFrequencyData["perro", Language -> "Spanish"]

A common word in French returns a high frequency value:

Wolfram Language code: WordFrequencyData["oui", Language -> "French"]

Popularity of the word "peace" in Spanish:

Wolfram Language code: WordFrequencyData["paz", Language -> "Spanish"]

The word "Sputnik" in Russian:

Wolfram Language code: WordFrequencyData["Спутник", Language -> Entity["Language", "Russian"]]

Get a time series of the word "Haus" in German between 1900 and now and plot the result:

Wolfram Language code: WordFrequencyData["Haus", "TimeSeries", {1900, Now}, Language -> "German", IgnoreCase -> True]

Wolfram Language code: %//DateListPlot

Properties & Relations (14)

"CaseVariants" (3)

A word can have many lower- and uppercase variants:

Wolfram Language code: WordFrequencyData["apple", "CaseVariants", 2000]

Getting the frequency of the word with IgnoreCase->True should be equivalent to getting the Total for the previous list:

Wolfram Language code: WordFrequencyData["apple", "Total", 2000, IgnoreCase -> True]

Wolfram Language code: Total[%%]

Get the most popular case variation of "DOS":

Wolfram Language code: Sort[WordFrequencyData["DOS", "CaseVariants"]]//Reverse
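
The sorted list above shows every variant. To keep only the single most frequent one, a possible sketch uses TakeLargest. It assumes that "CaseVariants" returns an association from each variant to a numeric frequency, as the use of Sort and Total in this section suggests:

```wolfram
(* Association of case variants of "DOS" and their frequencies *)
variants = WordFrequencyData["DOS", "CaseVariants"];

(* Keep only the entry with the largest frequency *)
TakeLargest[variants, 1]
```

TakeLargest returns an association with one key, so the variant itself can be extracted with First[Keys[...]].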

When asking for multiple words, the association will contain all variants of each word:

Wolfram Language code: WordFrequencyData[{"nascar", "alice"}, "CaseVariants"]

"PartOfSpeechVariants" (4)

Calculate the frequency of a word in a specific year for all part-of-speech variants:

Wolfram Language code: WordFrequencyData["apple", "PartOfSpeechVariants", 1991]

Show different usages of the word "nuke" in 1944:

Wolfram Language code: WordFrequencyData["nuke", "PartOfSpeechVariants", 1944]

Some words may return many part-of-speech variants:

Wolfram Language code: WordFrequencyData["burnt", "PartOfSpeechVariants"]

Combining this argument with "CaseVariants":

Wolfram Language code: WordFrequencyData["apple", {"PartOfSpeechVariants", "CaseVariants"}]

Combining with "CaseVariants" and "TimeSeries":

Wolfram Language code: WordFrequencyData["apple", {"PartOfSpeechVariants", "CaseVariants", "TimeSeries"}]//Short

"TimeSeries" (2)

Get the frequency of the word "war" throughout the twentieth century:

Wolfram Language code: warXX = WordFrequencyData["war", "TimeSeries", {1901, 2000}, IgnoreCase -> True]

This can be plotted directly using DateListPlot:

Wolfram Language code: DateListPlot[warXX]

Compare the usage of "peace" and "war" over time:

Wolfram Language code: warpeaceXX = WordFrequencyData[{"war", "peace"}, "TimeSeries", {1901, 2000}, IgnoreCase -> True]

Wolfram Language code: DateListPlot[warpeaceXX]

And compare their usage in another language too:

Wolfram Language code:

warpeaceRussianXX = WordFrequencyData[{"война", "мир"}, "TimeSeries", {1901, 2000}, IgnoreCase -> True, Language -> "Russian"]

Wolfram Language code: DateListPlot[warpeaceRussianXX]

Plot the ratio of the words "war" and "peace" for both languages:

Wolfram Language code:

DateListPlot[<|"English" -> warpeaceXX["war"] / warpeaceXX["peace"], "Russian" -> warpeaceRussianXX["война"] / warpeaceRussianXX["мир"]|>]

"Total" (5)

"Total" is the default property:

Wolfram Language code: WordFrequencyData["war", "Total"] === WordFrequencyData["war"]

For a simple date range:

Wolfram Language code: WordFrequencyData["war", "Total", {1900, 1960}, IgnoreCase -> True]

DateObject expressions can be used in the date specification:

Wolfram Language code: WordFrequencyData["peace", "Total", {DateObject[{1900}], DateObject[{1960}]}]

The "Total" can be computed over a specific list of years:

Wolfram Language code: WordFrequencyData[{"war", "peace"}, "Total", {{1914}, {1939}, {1945}}, IgnoreCase -> True]

Infinity can be used to specify an unbounded range:

Wolfram Language code: WordFrequencyData[{"war", "peace"}, "Total", {1900, Infinity}, IgnoreCase -> True]

Possible Issues (1)

Words that are not included in the corpus return Missing["NotAvailable"]:

Wolfram Language code: WordFrequencyData[{"beratna", "sésata"}]
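
When a query mixes known and unknown words, the missing entries can be separated from the available ones before further computation. The following is a minimal sketch, assuming the result is an association in which unavailable words carry a Missing value as described above; the variable names are illustrative only:

```wolfram
(* Query a mix of words; words absent from the corpus are expected to yield Missing["NotAvailable"] *)
freqs = WordFrequencyData[{"dog", "beratna", "sésata"}];

(* Keep only the words for which a frequency is available *)
known = DeleteMissing[freqs]

(* List the words that were not found *)
unknown = Keys[Select[freqs, MissingQ]]
```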

Neat Examples (11)

Popularity of the word "dog" and its translations in different languages:

Wolfram Language code:

TextGrid[Table[With[{word = First@WordTranslation["dog", "English" -> lang]}, {lang, word, WordFrequencyData[word, Language -> lang]}], {lang, {"English", "Spanish", "German", "Russian", "French", "Italian"}}]]

The words "gold" versus "oil" over time:

Wolfram Language code: DateListPlot[WordFrequencyData[{"gold", "oil"}, "TimeSeries", {1920, 2000}, IgnoreCase -> True]]

Frequency of terms for telephone and television over time:

Wolfram Language code:

DateListPlot[WordFrequencyData[{"television", "tv", "phone", "telephone"}, "TimeSeries", {1900, 2000}, IgnoreCase -> True]]

Joining synonyms:

Wolfram Language code: tvphone = WordFrequencyData[{"television", "tv", "phone", "telephone"}, "TimeSeries", {1900, 2000}, IgnoreCase -> True];

Wolfram Language code: DateListPlot[<|"tv" -> (tvphone["television"] + tvphone["tv"]), "phone" -> (tvphone["phone"] + tvphone["telephone"])|>]

Common diseases:

Wolfram Language code:

DateListPlot[WordFrequencyData[{"HIV", "smallpox", "tuberculosis", "cholera"}, "TimeSeries", {DateObject[{1850}], DateObject[{2010}]}, IgnoreCase -> True], PlotRange -> All]

Sorting day names by popularity:

Wolfram Language code:

DateListPlot[SortBy[WordFrequencyData[{"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"}, "TimeSeries", {DateObject[{1800}], DateObject[{2000}]}, IgnoreCase -> True], Total]//Reverse, PlotRange -> All]

Some words have lost their old orthography:

Wolfram Language code: DateListPlot[WordFrequencyData[{"encyclopaedia", "encyclopædia"}, "TimeSeries", {1750, 1830}, IgnoreCase -> True ]]

Wolfram Language code:

DateListPlot[Divide@@WordFrequencyData[{"encyclopaedia", "encyclopædia"}, "TimeSeries", {1750, 1830}, IgnoreCase -> True ], PlotRange -> All]

The word "democracy" became more frequent in the twentieth century:

Wolfram Language code:

DateListPlot[WordFrequencyData[{"democracy", "monarchy"}, "TimeSeries", {1800, 2000}, IgnoreCase -> True], PlotRange -> All]

"Apple" with initial uppercase A became popular after 1980:

Wolfram Language code:

DateListPlot[SortBy[Select[WordFrequencyData["apple", {"TimeSeries", "CaseVariants"}, {1950, 2000}], Mean[#] > 10 ^ -7&], Mean]//Reverse, PlotRange -> All]

The relative frequency of part of speech variants may change over time. "Tackle" as a verb and as a noun is a good example:

Wolfram Language code:

DateListPlot[KeyMap[Rasterize, SortBy[Select[WordFrequencyData["tackle", {"POSVariants", "TimeSeries"}, {1900, 2000}, IgnoreCase -> True], Mean[#] > 10 ^ -6&], Mean]//Reverse], PlotRange -> All]

Wolfram Language code:

DateListPlot[SortBy[Select[WordFrequencyData["tackle", {"POSVariants", "TimeSeries"}, {1900, 2000}, IgnoreCase -> True], Mean[#] > 10 ^ -6&], Mean]//Reverse, PlotRange -> All]

Regularization of irregular verbs may explain the changes in the part of speech and orthography of some words, such as "burnt" versus "burned":

Wolfram Language code:

DateListPlot[KeyMap[Rasterize, WordFrequencyData[{TextElement["burned", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Verb"]|>], TextElement["burned", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Adjective"]|>], TextElement["burnt", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Verb"]|>], TextElement["burnt", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Adjective"]|>], "burned", "burnt"}, "TimeSeries", {1800, 2000}]], PlotRange -> All, ImageSize -> Medium]

Evolution of "ustedes" versus "vosotros" in Spanish:

Wolfram Language code:

DateListPlot[WordFrequencyData[{"vosotros", "ustedes"}, {"TimeSeries"}, {1850, Now}, Language -> "Spanish", IgnoreCase -> True ], PlotRange -> All]
