Wolfram Language & System Documentation Center

SpeakerMatchQ

SpeakerMatchQ[audio,ref]

gives True if the speaker features in audio match those of the reference ref, and gives False otherwise.

SpeakerMatchQ[{audio₁,audio₂,…},ref]

gives a list of results, one for each audioᵢ.

SpeakerMatchQ[ref]

represents an operator form of SpeakerMatchQ that can be applied to an audio object.

Details and Options

SpeakerMatchQ computes speaker features for audio and reference ref and returns True if the distance between speaker features is acceptable.
The reference ref can be any of the following:

ref	a single-reference Audio object
ref₁|ref₂|…	several possible references, tried in order

The following options can be given:

AcceptanceThreshold	0.5	minimum probability to consider acceptable
Masking	All	interval of interest
RecognitionPrior	0.5	prior probability for a True result
TargetDevice	"CPU"	the target device on which to compute

Use the Masking option to specify the interval of interest in any of the audioᵢ. Possible settings include:

All	uses the whole audio
{t₁,t₂}	uses the interval t₁ to t₂
{{t₁₁,t₁₂},{t₂₁,t₂₂},…}	uses the interval tᵢ₁ to tᵢ₂ from audioᵢ

SpeakerMatchQ uses machine learning. Its methods, training sets and biases included therein may change and yield varied results in different versions of the Wolfram Language.
SpeakerMatchQ may download resources that will be stored in your local object store at $LocalBase, and that can be listed using LocalObjects[] and removed using ResourceRemove.

Examples


Basic Examples (2)

Check whether two recordings belong to the same speaker:

Wolfram Language code: SpeakerMatchQ[ExampleData[{"Audio", "FemaleVoice"}], ExampleData[{"Audio", "MaleVoice"}]]

Compare the speaker in a recording and a time-stretched version of it:

Wolfram Language code: SpeakerMatchQ[AudioTimeStretch[ExampleData[{"Audio", "FemaleVoice"}], 1.5], ExampleData[{"Audio", "FemaleVoice"}]]

Scope (3)

Test whether the speaker in a recording matches any of several references:

Wolfram Language code:

(* audio stands for the Audio object that was embedded in the original notebook; its content is not recoverable here *)
SpeakerMatchQ[audio, ExampleData[{"Audio", "MaleVoice"}] | ExampleData[{"Audio", "FemaleVoice"}]]

Test each recording in a list against a reference, giving one result per recording:

Wolfram Language code: list = {ExampleData[{"Audio", "MaleVoice"}], ExampleData[{"Audio", "FemaleVoice"}]};

Wolfram Language code: SpeakerMatchQ[list, ExampleData[{"Audio", "MaleVoice"}]]

Use SpeakerMatchQ in operator form:

Wolfram Language code: list = {ExampleData[{"Audio", "MaleVoice"}], ExampleData[{"Audio", "FemaleVoice"}]};

Wolfram Language code: GroupBy[list, SpeakerMatchQ[ExampleData[{"Audio", "MaleVoice"}]]]
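
Because the operator form is an ordinary one-argument function, it can also serve as a selection criterion. A minimal sketch, reusing the same example recordings, that keeps only the recordings judged to match the reference:

Wolfram Language code:

list = {ExampleData[{"Audio", "MaleVoice"}], ExampleData[{"Audio", "FemaleVoice"}]};
(* keep only the elements for which SpeakerMatchQ gives True *)
Select[list, SpeakerMatchQ[ExampleData[{"Audio", "MaleVoice"}]]]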

Options (4)

AcceptanceThreshold (1)

By default, 0.5 is used as the acceptance threshold:

Wolfram Language code: a = ExampleData[{"Audio", "FemaleVoice"}];

Wolfram Language code: SpeakerMatchQ[a, AudioPitchShift[a, .2]]

Specify the minimum probability to consider acceptable:

Wolfram Language code: SpeakerMatchQ[a, AudioPitchShift[a, .2], AcceptanceThreshold -> .1]
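
To see roughly where the decision flips for a given pair, the same comparison can be repeated over several thresholds. This is only a sketch: the threshold values and the name shifted are arbitrary choices made for illustration, and the results depend on the version of the underlying model:

Wolfram Language code:

a = ExampleData[{"Audio", "FemaleVoice"}];
shifted = AudioPitchShift[a, .2];
(* pair each illustrative threshold with the resulting True or False *)
Table[t -> SpeakerMatchQ[a, shifted, AcceptanceThreshold -> t], {t, {.1, .3, .5, .7, .9}}]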

Masking (2)

By default, the whole audio recording is compared, which may fail if it contains multiple speakers:

Wolfram Language code: a = AudioJoin[{ExampleData[{"Audio", "FemaleVoice"}], ExampleData[{"Audio", "MaleVoice"}]}];

Wolfram Language code: SpeakerMatchQ[a, ExampleData[{"Audio", "MaleVoice"}]]

Specify an interval of interest within the recording to compare against the reference:

Wolfram Language code: SpeakerMatchQ[a, ExampleData[{"Audio", "MaleVoice"}], Masking -> {Quantity[4.3, "Seconds"], Quantity[6.7, "Seconds"]}]

Apply separate masking to each input audio in a list of recordings:

Wolfram Language code:

a = ExampleData[{"Audio", "FemaleVoice"}];
b = ExampleData[{"Audio", "MaleVoice"}];
list = AudioJoin /@ {{a, b}, {b, a}};

Wolfram Language code: SpeakerMatchQ[list, ExampleData[{"Audio", "MaleVoice"}]]

Wolfram Language code:

SpeakerMatchQ[list, ExampleData[{"Audio", "MaleVoice"}], Masking -> {{Quantity[0, "Seconds"], Quantity[2.4, "Seconds"]}, {Quantity[4.3, "Seconds"], Quantity[6.7, "Seconds"]}}]

RecognitionPrior (1)

Specify the prior probability that the speaker in a recording matches a reference:

Wolfram Language code: SpeakerMatchQ[ExampleData[{"Audio", "FemaleVoice"}], ExampleData[{"Audio", "MaleVoice"}], RecognitionPrior -> .5]

Use a higher prior probability:

Wolfram Language code:

SpeakerMatchQ[AudioAmplify[ExampleData[{"Audio", "MaleVoice"}], .9], ExampleData[{"Audio", "MaleVoice"}], RecognitionPrior -> .8]
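
The TargetDevice option listed under Details and Options defaults to "CPU". Other machine learning functions in the Wolfram Language accept "GPU" for this option; assuming the same setting is supported here and that a compatible GPU is available, a call might look like the following sketch:

Wolfram Language code:

(* assumption: "GPU" is an accepted setting and a supported GPU is present *)
SpeakerMatchQ[ExampleData[{"Audio", "FemaleVoice"}], ExampleData[{"Audio", "MaleVoice"}], TargetDevice -> "GPU"]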

Applications (3)

Compare the speaker in a recording with several time-stretched versions of it:

Wolfram Language code: a = ExampleData[{"Audio", "FemaleVoice"}];

Wolfram Language code: list = Table[AudioTimeStretch[a, s], {s, 1, 5, .5}];

Wolfram Language code: SpeakerMatchQ[list, a]

Compare the speaker in a recording with several pitch-shifted versions of it:

Wolfram Language code: a = ExampleData[{"Audio", "MaleVoice"}];

Wolfram Language code:

list = Table[AudioPitchShift[a, s, Method -> "Speech"], {s, 1, 2, .2}];
SpeakerMatchQ[list, a]

In the Spoken Digit Commands dataset, construct a speaker-match matrix for a subset of recordings:

Wolfram Language code:

testdata = ResourceData["Spoken Digit Commands", "TestData"];
Length[testdata]

Wolfram Language code: RandomSample[testdata, 3]//Dataset

Select 10 random speakers for which the dataset has between 2 and 5 samples:

Wolfram Language code: speakers = Keys@RandomSample[Select[Counts[testdata[[All, "SpeakerID"]]], 2 ≤ # ≤ 5&], 10];

Extract all recordings corresponding to these speakers and sort them by speaker ID:

Wolfram Language code:

testsubset = RandomSample[Select[testdata, MemberQ[speakers, #SpeakerID]&]];
testsubset = SortBy[testsubset, #SpeakerID&];

Compute and plot the matrix of matching speakers:

Wolfram Language code: DistanceMatrix[testsubset[[All, "Input"]], DistanceFunction -> (Boole[SpeakerMatchQ[##]]&)]//MatrixPlot

Properties & Relations (1)

SpeakerMatchQ computes speaker features on its input recordings and compares these embeddings.

From the Spoken Digit Commands dataset, extract the recordings of 10 random speakers who each have between 2 and 5 recordings:

Wolfram Language code:

testdata = ResourceData["Spoken Digit Commands", "TestData"];
speakers = Keys@RandomSample[Select[Counts[testdata[[All, "SpeakerID"]]], 2 ≤ # ≤ 5&], 10];
testsubset = RandomSample[Select[testdata, MemberQ[speakers, #SpeakerID]&]];
testsubset = SortBy[testsubset, #SpeakerID&];

Compute speaker features on each recording:

Wolfram Language code: features = FeatureExtract[testsubset[[All, "Input"]], "SpeakerFeatureVector"];

Visualize a sample of the computed features:

Wolfram Language code: RandomChoice[features]//ListPlot[#, Filling -> Axis]&

Compare the speaker features and plot a distance matrix on them:

Wolfram Language code:

distances = DistanceMatrix[features, DistanceFunction -> CosineDistance];
MatrixPlot[distances]

Compute a binary distance matrix showing whether the speaker features match:

Wolfram Language code: DistanceMatrix[features, DistanceFunction -> (Boole[CosineDistance[##] ≤ .4]&)]//MatrixPlot

Compare with the result of SpeakerMatchQ; the differences arise because no voice is detected in some of the recordings:

Wolfram Language code: Quiet[DistanceMatrix[testsubset[[All, "Input"]], DistanceFunction -> (Boole[SpeakerMatchQ[##]]&)]]//MatrixPlot

Possible Issues (1)

SpeakerMatchQ finds voiced intervals first and fails if no voice is detected in any one of the inputs:

Wolfram Language code:

list = {ExampleData[{"Audio", "FemaleVoice"}, "Audio"], ExampleData[{"Audio", "IRStairway"}, "Audio"], ExampleData[{"Audio", "NoisyTalk"}, "Audio"]};

Wolfram Language code: SpeakerMatchQ[list, ExampleData[{"Audio", "MaleVoice"}]]
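
One way to keep a single unvoiced recording from affecting the rest is to test each recording separately and trap the failure. The sketch below assumes that the failure is reported through a message (as the use of Quiet in Properties & Relations suggests). The names ref and safeMatch and the placeholder Missing["NoVoiceDetected"] are arbitrary choices made here, not something SpeakerMatchQ itself returns:

Wolfram Language code:

list = {ExampleData[{"Audio", "FemaleVoice"}, "Audio"], ExampleData[{"Audio", "IRStairway"}, "Audio"], ExampleData[{"Audio", "NoisyTalk"}, "Audio"]};
ref = ExampleData[{"Audio", "MaleVoice"}];
(* Check gives the placeholder whenever evaluating SpeakerMatchQ generates a message; Quiet suppresses its printing *)
safeMatch = Quiet[Check[SpeakerMatchQ[#, ref], Missing["NoVoiceDetected"]]] &;
safeMatch /@ list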
