WordFrequencyData
WordFrequencyData[word]
gives the frequency of word in typical published English text.
WordFrequencyData[{word1,word2,…}]
gives an association of the frequencies of the words word1, word2, ….
WordFrequencyData[word,"TimeSeries"]
gives a time series for the frequency of word in typical published English text.
WordFrequencyData[word,"TimeSeries",datespec]
gives a time series for dates specified by datespec.
WordFrequencyData[word,"prop"]
gives property prop of the word frequency.
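As a rough sketch of working with the association returned for a list of words: the variable names freqs and known are illustrative only, and "beratna" is assumed to be absent from the corpus, as in the Possible Issues example on this page.

```wolfram
(* Request several words at once; the result is an association word -> frequency *)
freqs = WordFrequencyData[{"dog", "cat", "beratna"}];

(* Words not found in the corpus give Missing["NotAvailable"]; drop those entries *)
known = DeleteMissing[freqs];

(* Rank the remaining words from most to least frequent *)
Reverse[Sort[known]]
```

Dropping the missing entries first keeps the sort restricted to numeric frequencies.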
Details and Options
- WordFrequencyData[word1|word2|…] gives the total frequency of all the words word1, word2, ….
- WordFrequencyData[word,"Total",datespec] gives the total frequency of word for the dates specified by datespec.
- By default, WordFrequencyData uses the Google Books English n-gram public dataset.
- Possible options include:
    IgnoreCase    False        whether to ignore case in word
    Language      "English"    what language of source corpus to use
- In WordFrequencyData[word,"prop"], possible properties include:
    "Total"                    give total frequencies over a date range
    "TimeSeries"               give a time series of frequencies
    "CaseVariants"             give results for all variants of upper and lower case
    "PartsOfSpeechVariants"    give results for all variants of parts of speech
    {prop1,prop2,…}            give results for combinations of properties
- Possible date specifications include:
    All                  use all available dates for the specified source corpus
    DateObject[…]        use DateObject
    year                 use specific year
    {yearmin,yearmax}    use year range between yearmin and yearmax
    {{d1,d2,…}}          use explicit dates {d1,d2,…}
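The date specifications and options listed above can be combined in a single call. The following is a minimal sketch that uses only the forms described in these notes; the words and years are arbitrary choices made for illustration.

```wolfram
(* Total frequency of a word in one specific year *)
WordFrequencyData["computer", "Total", 1990]

(* Total frequency over a year range {yearmin, yearmax} *)
WordFrequencyData["computer", "Total", {1950, 2000}]

(* Combined total for either of two alternatives, ignoring case *)
WordFrequencyData["tv" | "television", "Total", {1950, 2000}, IgnoreCase -> True]

(* Time series restricted to the same range, taken from the Spanish corpus *)
WordFrequencyData["perro", "TimeSeries", {1950, 2000}, Language -> "Spanish"]
```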
Examples
Basic Examples (4)
Get the frequency of the word "dog" in typical English:
WordFrequencyData["dog"]
Get the typical frequencies of several words:
WordFrequencyData[{"dog", "cat"}]
Compute the ratio between the words "war" and "peace" in published text:
WordFrequencyData["war"] / WordFrequencyData["peace"]
Plot the historical time series for the frequency of the word "economy":
DateListPlot[WordFrequencyData["economy", "TimeSeries"]]
Scope (4)
Get the overall frequency of "atlas":
WordFrequencyData["atlas"]
Find the frequency of multiple words at once:
WordFrequencyData[{"perro", "gato"}, Language -> "Spanish"]
WordFrequencyData accepts as input TextElement with a specific "GrammaticalUnit":
WordFrequencyData[{TextElement["burned", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Verb"]|>], TextElement["burned", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Adjective"]|>], TextElement["burnt", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Verb"]|>], TextElement["burnt", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Adjective"]|>]}, "Total", 2000, IgnoreCase -> True]//Sort//Reverse
Plot the historical time series for the frequency of the word "computer" since 1900:
DateListPlot[WordFrequencyData["computer", "TimeSeries", {1900, Now}]]
Generalizations & Extensions (1)
When Alternatives is used as an input, the result is the total frequency for any of the alternatives:
WordFrequencyData["today" | "yesterday"]
Alternatives may be used in combination with other properties, such as "TimeSeries":
WordFrequencyData[{"tv" | "television", "phone" | "telephone"}, "TimeSeries", {1900, Now}, IgnoreCase -> True]
%//DateListPlot
Options (6)
IgnoreCase (1)
Get the frequency of a word, counting all of its lowercase and uppercase variants together. The default value of IgnoreCase is False:
WordFrequencyData["war", IgnoreCase -> True]
This value is usually greater than the one obtained with the default setting:
WordFrequencyData["war"]
Language (5)
Find the frequency of a common Spanish word in a Spanish-language text corpus:
WordFrequencyData["en", Language -> "Spanish"]
Spanish words may also appear in the corpora of other languages, but with a much lower frequency:
WordFrequencyData["perro"] / WordFrequencyData["perro", Language -> "Spanish"]
A common word in French returns a high frequency value:
WordFrequencyData["oui", Language -> "French"]
Popularity of the word "peace" in Spanish:
WordFrequencyData["paz", Language -> "Spanish"]
The word "Sputnik" in Russian:
WordFrequencyData["Спутник", Language -> Entity["Language", "Russian"]]
Get a time series of the word "Haus" in German between 1900 and now and plot the result:
WordFrequencyData["Haus", "TimeSeries", {1900, Now}, Language -> "German", IgnoreCase -> True]
%//DateListPlot
Properties & Relations (14)
"CaseVariants" (3)
A word can have many lower- and uppercase variants:
WordFrequencyData["apple", "CaseVariants", 2000]
Getting the frequency of the word with IgnoreCase->True should be equivalent to getting the Total for the previous list:
WordFrequencyData["apple", "Total", 2000, IgnoreCase -> True]
Total[%%]
Get the most popular case variation of "DOS":
Sort[WordFrequencyData["DOS", "CaseVariants"]]//Reverse
When asking for multiple words, the association will contain all variants of each word:
WordFrequencyData[{"nascar", "alice"}, "CaseVariants"]
"PartOfSpeechVariants" (4)
Calculate the frequency of a word in a specific year for all part-of-speech variants:
WordFrequencyData["apple", "PartOfSpeechVariants", 1991]
Show different usages of the word "nuke" in 1944:
WordFrequencyData["nuke", "PartOfSpeechVariants", 1944]
Some words may return many part-of-speech variants:
WordFrequencyData["burnt", "PartOfSpeechVariants"]
Combine this property with "CaseVariants":
WordFrequencyData["apple", {"PartOfSpeechVariants", "CaseVariants"}]
Combine it with both "CaseVariants" and "TimeSeries":
WordFrequencyData["apple", {"PartOfSpeechVariants", "CaseVariants", "TimeSeries"}]//Short
"TimeSeries" (2)
Get the frequency of the word "war" throughout the twentieth century:
warXX = WordFrequencyData["war", "TimeSeries", {1901, 2000}, IgnoreCase -> True]
This can be plotted directly using DateListPlot:
DateListPlot[warXX]
Compare the usage of "peace" and "war" over time:
warpeaceXX = WordFrequencyData[{"war", "peace"}, "TimeSeries", {1901, 2000}, IgnoreCase -> True]
DateListPlot[warpeaceXX]
And compare their usage in another language too:
warpeaceRussianXX = WordFrequencyData[{"война", "мир"}, "TimeSeries", {1901, 2000}, IgnoreCase -> True, Language -> "Russian"]
DateListPlot[warpeaceRussianXX]
Plot the ratio of the words "war" and "peace" for both languages:
DateListPlot[<|"English" -> warpeaceXX["war"] / warpeaceXX["peace"], "Russian" -> warpeaceRussianXX["война"] / warpeaceRussianXX["мир"]|>]
"Total" (5)
"Total" is the default property:
WordFrequencyData["war", "Total"] === WordFrequencyData["war"]
WordFrequencyData["war", "Total", {1900, 1960}, IgnoreCase -> True]
DateObject expressions can be used in the date specification:
WordFrequencyData["peace", "Total", {DateObject[{1900}], DateObject[{1960}]}]
The "Total" can be computed over a specific list of years:
WordFrequencyData[{"war", "peace"}, "Total", {{1914}, {1939}, {1945}}, IgnoreCase -> True]
Infinity can be used to specify an unbounded range:
WordFrequencyData[{"war", "peace"}, "Total", {1900, Infinity}, IgnoreCase -> True]
Possible Issues (1)
Words that are not included within the corpus will return Missing["NotAvailable"]:
WordFrequencyData[{"beratna", "sésata"}]
Neat Examples (11)
Popularity of the word "dog" and its translations in different languages:
TextGrid[Table[With[{word = First@WordTranslation["dog", "English" -> lang]}, {lang, word, WordFrequencyData[word, Language -> lang]}], {lang, {"English", "Spanish", "German", "Russian", "French", "Italian"}}]]
The words "gold" versus "oil" over time:
DateListPlot[WordFrequencyData[{"gold", "oil"}, "TimeSeries", {1920, 2000}, IgnoreCase -> True]]
Frequency of terms for telephone and television over time:
DateListPlot[WordFrequencyData[{"television", "tv", "phone", "telephone"}, "TimeSeries", {1900, 2000}, IgnoreCase -> True]]
tvphone = WordFrequencyData[{"television", "tv", "phone", "telephone"}, "TimeSeries", {1900, 2000}, IgnoreCase -> True];
DateListPlot[<|"tv" -> (tvphone["television"] + tvphone["tv"]), "phone" -> (tvphone["phone"] + tvphone["telephone"])|>]
DateListPlot[WordFrequencyData[{"HIV", "smallpox", "tuberculosis", "cholera"}, "TimeSeries", {DateObject[{1850}], DateObject[{2010}]}, IgnoreCase -> True], PlotRange -> All]
Sorting day names by popularity:
DateListPlot[SortBy[WordFrequencyData[{"Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"}, "TimeSeries", {DateObject[{1800}], DateObject[{2000}]}, IgnoreCase -> True], Total]//Reverse, PlotRange -> All]
Some words have lost their old orthography:
DateListPlot[WordFrequencyData[{"encyclopaedia", "encyclopædia"}, "TimeSeries", {1750, 1830}, IgnoreCase -> True ]]
DateListPlot[Divide@@WordFrequencyData[{"encyclopaedia", "encyclopædia"}, "TimeSeries", {1750, 1830}, IgnoreCase -> True ], PlotRange -> All]
The word "democracy" became more frequent in the twentieth century:
DateListPlot[WordFrequencyData[{"democracy", "monarchy"}, "TimeSeries", {1800, 2000}, IgnoreCase -> True], PlotRange -> All]
"Apple" with initial uppercase A became popular after 1980:
DateListPlot[SortBy[Select[WordFrequencyData["apple", {"TimeSeries", "CaseVariants"}, {1950, 2000}], Mean[#] > 10 ^ -7&], Mean]//Reverse, PlotRange -> All]
The relative frequency of part-of-speech variants may change over time. "Tackle" as a verb and as a noun is a good example:
DateListPlot[KeyMap[Rasterize, SortBy[Select[WordFrequencyData["tackle", {"POSVariants", "TimeSeries"}, {1900, 2000}, IgnoreCase -> True], Mean[#] > 10 ^ -6&], Mean]//Reverse], PlotRange -> All]
DateListPlot[SortBy[Select[WordFrequencyData["tackle", {"POSVariants", "TimeSeries"}, {1900, 2000}, IgnoreCase -> True], Mean[#] > 10 ^ -6&], Mean]//Reverse, PlotRange -> All]
Regularization of irregular verbs may explain the changes in the part of speech and orthography of some words, such as "burnt" versus "burned":
DateListPlot[KeyMap[Rasterize, WordFrequencyData[{TextElement["burned", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Verb"]|>], TextElement["burned", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Adjective"]|>], TextElement["burnt", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Verb"]|>], TextElement["burnt", <|"GrammaticalUnit" -> Entity["GrammaticalUnit", "Adjective"]|>], "burned", "burnt"}, "TimeSeries", {1800, 2000}]], PlotRange -> All, ImageSize -> Medium]
Evolution of "ustedes" versus "vosotros" in Spanish:
DateListPlot[WordFrequencyData[{"vosotros", "ustedes"}, {"TimeSeries"}, {1850, Now}, Language -> "Spanish", IgnoreCase -> True ], PlotRange -> All]
See Also
WordFrequency WordCloud PartOfSpeech DictionaryWordQ WordData WordDefinition LanguageData
Function Repository: LetterFrequencyData
Related Guides
Related Workflows
- Analyze the Text on a Webpage
History
Text
Wolfram Research (2016), WordFrequencyData, Wolfram Language function, https://reference.wolfram.com/language/ref/WordFrequencyData.html.