SpeakerMatchQ
SpeakerMatchQ[audio,ref]
gives True if the speaker features in audio match those of the reference ref, and False otherwise.
SpeakerMatchQ[{audio1,audio2,…},ref]
gives a list of results, one for each of the audioi.
SpeakerMatchQ[ref]
represents an operator form of SpeakerMatchQ that can be applied to an audio object.
Details and Options
- SpeakerMatchQ computes speaker features for audio and reference ref and returns True if the distance between speaker features is acceptable.
- The reference ref can be any of the following:
    ref            a single-reference Audio object
    ref1|ref2|…    several possible references, tried in order
- The following options can be given:
    AcceptanceThreshold    0.5      minimum probability to consider acceptable
    Masking                All      interval of interest
    RecognitionPrior       0.5      prior probability for a True result
    TargetDevice           "CPU"    the target device on which to compute
- Use the Masking option to specify the interval of interest in any of the audioi. Possible settings include:
    All                        uses the whole audio
    {t1,t2}                    uses the interval t1 to t2
    {{t11,t12},{t21,t22},…}    uses the interval ti1 to ti2 from audioi
- SpeakerMatchQ uses machine learning. Its methods, training sets and biases included therein may change and yield varied results in different versions of the Wolfram Language.
- SpeakerMatchQ may download resources that will be stored in your local object store at $LocalBase, and that can be listed using LocalObjects[] and removed using ResourceRemove.
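The feature-comparison idea described in the notes above can be illustrated with a minimal sketch. It is not the implementation of SpeakerMatchQ: the three feature vectors are made up for illustration, the helper matchSketchQ is hypothetical, and the decision rule (cosine distance at most 0.4, the cutoff used in the Properties & Relations example) is an assumption. The actual function works with probabilities controlled by AcceptanceThreshold and RecognitionPrior.

```wolfram
(* Hypothetical 4-dimensional "speaker feature" vectors; real speaker
   features are computed from audio and have many more dimensions. *)
refFeatures  = {0.9, 0.1, 0.3, 0.2};
sameSpeaker  = {0.8, 0.2, 0.3, 0.1};
otherSpeaker = {0.1, 0.9, 0.2, 0.7};

(* Assumed decision rule: accept when the cosine distance between the two
   feature vectors is at most a cutoff (0.4 by default). *)
matchSketchQ[u_, v_, cutoff_ : 0.4] := CosineDistance[u, v] <= cutoff

(* Distance is about 0.013, so this gives True *)
matchSketchQ[sameSpeaker, refFeatures]

(* Distance is about 0.66, so this gives False *)
matchSketchQ[otherSpeaker, refFeatures]

(* Several references, tried in order: accept if any one of them matches *)
AnyTrue[{otherSpeaker, refFeatures}, matchSketchQ[sameSpeaker, #] &]
```

Under these assumptions, lowering the cutoff makes the sketch stricter, in the same spirit as raising AcceptanceThreshold makes SpeakerMatchQ stricter.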
Examples
Basic Examples (2)
Check whether two recordings belong to the same speaker:
SpeakerMatchQ[ExampleData[{"Audio", "FemaleVoice"}], ExampleData[{"Audio", "MaleVoice"}]]
Compare the speaker in a recording and a time-stretched version of it:
SpeakerMatchQ[AudioTimeStretch[ExampleData[{"Audio", "FemaleVoice"}], 1.5], ExampleData[{"Audio", "FemaleVoice"}]]
Scope (3)
Test whether the speaker in a recording matches any of several references:
SpeakerMatchQ[\!\(\*AudioBox[""]\), ExampleData[{"Audio", "MaleVoice"}] | ExampleData[{"Audio", "FemaleVoice"}]]
Test whether any of the speakers from a list of recordings matches a reference:
list = {ExampleData[{"Audio", "MaleVoice"}], ExampleData[{"Audio", "FemaleVoice"}]};
SpeakerMatchQ[list, ExampleData[{"Audio", "MaleVoice"}]]
Use SpeakerMatchQ in operator form:
list = {ExampleData[{"Audio", "MaleVoice"}], ExampleData[{"Audio", "FemaleVoice"}]};
GroupBy[list, SpeakerMatchQ[ExampleData[{"Audio", "MaleVoice"}]]]
Options (4)
AcceptanceThreshold (1)
By default, 0.5 is used as the acceptance threshold:
a = ExampleData[{"Audio", "FemaleVoice"}];
SpeakerMatchQ[a, AudioPitchShift[a, .2]]
Specify the minimum probability to consider acceptable:
SpeakerMatchQ[a, AudioPitchShift[a, .2], AcceptanceThreshold -> .1]
Masking (2)
By default, the whole audio recording is compared, which may fail if it contains multiple speakers:
a = AudioJoin[{ExampleData[{"Audio", "FemaleVoice"}], ExampleData[{"Audio", "MaleVoice"}]}];
SpeakerMatchQ[a, ExampleData[{"Audio", "MaleVoice"}]]
Specify an interval of interest within the recording to compare against the reference:
SpeakerMatchQ[a, ExampleData[{"Audio", "MaleVoice"}], Masking -> {Quantity[4.3, "Seconds"], Quantity[6.7, "Seconds"]}]
Apply separate masking to each input audio in a list of recordings:
a = ExampleData[{"Audio", "FemaleVoice"}];
b = ExampleData[{"Audio", "MaleVoice"}];
list = AudioJoin /@ {{a, b}, {b, a}};
SpeakerMatchQ[list, ExampleData[{"Audio", "MaleVoice"}]]
SpeakerMatchQ[list, ExampleData[{"Audio", "MaleVoice"}], Masking -> {{Quantity[0, "Seconds"], Quantity[2.4, "Seconds"]}, {Quantity[4.3, "Seconds"], Quantity[6.7, "Seconds"]}}]
RecognitionPrior (1)
Specify the prior probability that the speaker in a recording matches a reference:
SpeakerMatchQ[ExampleData[{"Audio", "FemaleVoice"}], ExampleData[{"Audio", "MaleVoice"}], RecognitionPrior -> .5]
Use a higher prior probability:
SpeakerMatchQ[AudioAmplify[ExampleData[{"Audio", "MaleVoice"}], .9], ExampleData[{"Audio", "MaleVoice"}], RecognitionPrior -> .8]
Applications (3)
Compare the speaker in a recording and a time-stretched version of it:
a = ExampleData[{"Audio", "FemaleVoice"}];
list = Table[AudioTimeStretch[a, s], {s, 1, 5, .5}];
SpeakerMatchQ[list, a]
Compare the speaker in a recording and a pitch-shifted version of it:
a = ExampleData[{"Audio", "MaleVoice"}];
list = Table[AudioPitchShift[a, s, Method -> "Speech"], {s, 1, 2, .2}];
SpeakerMatchQ[list, a]
In the Spoken Digit Commands dataset, construct a speaker-match matrix for a subset of recordings:
testdata = ResourceData["Spoken Digit Commands", "TestData"];
Length[testdata]
RandomSample[testdata, 3]//Dataset
Select 10 random speakers for which the dataset has between 2 and 5 samples:
speakers = Keys@RandomSample[Select[Counts[testdata[[All, "SpeakerID"]]], 2 ≤ # ≤ 5&], 10];
Extract all recordings corresponding to these speakers and sort them by speaker ID:
testsubset = RandomSample[Select[testdata, MemberQ[speakers, #SpeakerID]&]];
testsubset = SortBy[testsubset, #SpeakerID&];
Compute and plot the matrix of matching speakers:
DistanceMatrix[testsubset[[All, "Input"]], DistanceFunction -> (Boole[SpeakerMatchQ[##]]&)]//MatrixPlot
Properties & Relations (1)
SpeakerMatchQ computes speaker features on its input recordings and compares these embeddings.
From the Spoken Digit Commands dataset, extract recordings from 10 random speakers who each have between 2 and 5 recordings:
testdata = ResourceData["Spoken Digit Commands", "TestData"];
speakers = Keys@RandomSample[Select[Counts[testdata[[All, "SpeakerID"]]], 2 ≤ # ≤ 5&], 10];
testsubset = RandomSample[Select[testdata, MemberQ[speakers, #SpeakerID]&]];
testsubset = SortBy[testsubset, #SpeakerID&];
Compute speaker features on each recording:
features = FeatureExtract[testsubset[[All, "Input"]], "SpeakerFeatureVector"];
Visualize a sample of the computed features:
RandomChoice[features]//ListPlot[#, Filling -> Axis]&
Compare the speaker features and plot their distance matrix:
distances = DistanceMatrix[features, DistanceFunction -> CosineDistance];
MatrixPlot[distances]
Compute a binary distance matrix showing whether the speaker features match:
DistanceMatrix[features, DistanceFunction -> (Boole[CosineDistance[##] ≤ .4]&)]//MatrixPlot
Compare with the result of SpeakerMatchQ; the two differ because no voice is detected in some of the recordings:
Quiet[DistanceMatrix[testsubset[[All, "Input"]], DistanceFunction -> (Boole[SpeakerMatchQ[##]]&)]]//MatrixPlot
Possible Issues (1)
SpeakerMatchQ finds voiced intervals first and fails if no voice is detected in any one of the inputs:
list = {ExampleData[{"Audio", "FemaleVoice"}, "Audio"], ExampleData[{"Audio", "IRStairway"}, "Audio"], ExampleData[{"Audio", "NoisyTalk"}, "Audio"]};
SpeakerMatchQ[list, ExampleData[{"Audio", "MaleVoice"}]]
Wolfram Research (2020), SpeakerMatchQ, Wolfram Language function, https://reference.wolfram.com/language/ref/SpeakerMatchQ.html.