Wolfram Language & System Documentation Center

FeatureExtract

FeatureExtract[examples] extracts features for each example using a feature extractor trained on the examples given.

FeatureExtract[examples, spec] extracts features using the specified feature extractor method spec.

Details and Options

FeatureExtract is typically used to process raw data into usable features (e.g. for training a machine learning algorithm).
FeatureExtract can be used on many types of data, including numerical, textual, sounds, images, graphs, time series and combinations of these.
Possible values of examples are:

	example	a single example, processed without training (requires spec)
	{example₁,…}	a list of training examples
	Dataset[…]	a Dataset object
	Tabular[…]	a Tabular object

Each example_i can be a single data element, a list of data elements, an association of data elements, or a Dataset object.
Possible values for spec include:

	extractor	use the specified extractor method
	partextractor	apply the extractor to the specific example part
	{part₁extractor₁,…}	specify extractors for specific parts

In FeatureExtract[examples,{part₁ -> extractor₁,…}], the extractor_i are all applied separately to examples.
Possible feature extractor methods include:

	Automatic	automatic extraction
	Identity	give data unchanged
	"ConformedData"	conformed images, colors, dates, etc.
	"NumericVector"	numeric vector from any data
	"name"	a named extractor method
	f	applies function f to each example
	{extractor₁,extractor₂,…}	use a sequence of extractors in turn

Possible forms of part are:

	All	all parts of each example
	i	i-th part of each example
	{i₁,i₂,…}	parts i₁, i₂, … of each example
	"key"	part with the specified key in each example
	{"key₁","key₂",…}	parts with names "key_i" in each example

When explicitly specifying parts, any unmentioned parts are dropped when extracting features.

Extractors

FeatureExtract[examples] is equivalent to FeatureExtract[examples, Automatic], which is typically equivalent to FeatureExtract[examples, "NumericVector"].
The "NumericVector" method will typically convert examples to numeric vectors, impute missing data, and reduce the dimension using DimensionReduction.
Feature extractor methods specific to a single data type are applied only to data elements whose types they are compatible with. Other data elements are returned unchanged.
Not all specific feature extractors are available when only one example is provided. For example, vector processors will assume that a list is a list of scalar features.
The specific extractors are:
Numeric data:

	"DiscretizedVector"	discretized numerical data
	"DimensionReducedVector"	reduced-dimension numeric vectors
	"MissingImputed"	data with missing values imputed
	"StandardizedVector"	numeric data processed with Standardize

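The numeric extractors can be chained like any other extractor methods. The following is a minimal sketch on hypothetical toy data (the values and the variable names numericData and imputedStandardized are illustrative assumptions), in which missing values are imputed before the features are standardized:

```wolfram
(* Hypothetical toy data: four numeric examples, one value missing *)
numericData = {{1.4, 30.1}, {1.5, Missing[]}, {2.3, 27.4}, {5.4, 51.2}};

(* Impute the missing value first, then standardize each feature *)
imputedStandardized =
  FeatureExtract[numericData, {"MissingImputed", "StandardizedVector"}]
```

Placing "MissingImputed" first means that the later extractor in the chain receives complete numeric vectors.
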
Nominal data:

	"IndicatorVector"	nominal data "one-hot encoded" with indicator vectors
	"IntegerVector"	nominal data encoded with integers

Text:

	"LowerCasedText"	text with each character lowercase
	"SegmentedCharacters"	text segmented into characters
	"SegmentedWords"	text segmented into words
	"SentenceVector"	semantic vector from a text
	"TFIDF"	term frequency-inverse document frequency vector
	"WordVectors"	sequence of semantic vectors from a text (English only)

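Text extractors that need no numeric encoding can be used as preprocessing steps. As a small sketch, assuming two made-up strings (the names texts, lowered and segmented are illustrative only), the text is lowercased and then segmented into words:

```wolfram
(* Hypothetical example strings with mixed capitalization *)
texts = {"The Cat Is GREY", "My cat is fast"};

(* Lowercase every character of each text *)
lowered = FeatureExtract[texts, "LowerCasedText"]

(* Lowercase first, then split each text into a list of words *)
segmented = FeatureExtract[texts, {"LowerCasedText", "SegmentedWords"}]
```
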
Images:

	"FaceFeatures"	semantic vector from an image of a human face
	"ImageFeatures"	semantic vector from an image
	"PixelVector"	vector of pixel values from an image

Audio objects:

	"AudioFeatures"	sequence of semantic vectors from an audio object
	"AudioFeatureVector"	semantic vector from an audio object
	"LPC"	audio linear prediction coefficients
	"MelSpectrogram"	audio spectrogram with logarithmic frequency bins
	"MFCC"	sequence of audio mel-frequency cepstral coefficient vectors
	"SpeakerFeatures"	sequence of semantic speaker vectors
	"SpeakerFeatureVector"	semantic vector for a speaker
	"Spectrogram"	audio spectrogram

Video objects:

	"VideoFeatures"	sequence of semantic vectors from a video object
	"VideoFeatureVector"	semantic vector from a video object

Graphs:

	"GraphFeatures"	numeric vector summarizing graph properties

Molecules:

	"AtomPairs"	Boolean vector from pairs of atoms and the path lengths between them
	"MoleculeExtendedConnectivity"	Boolean vector from enumerated molecule subgraphs
	"MoleculeFeatures"	numeric vector summarizing molecule properties
	"MoleculeTopologicalFeatures"	Boolean vector from circular atom neighborhoods

Options

The following options can be given:

FeatureNames	Automatic	names to assign to elements of the example_i
FeatureTypes	Automatic	feature types to assume for elements of the example_i
RandomSeeding	1234	what seeding of pseudorandom generators should be done internally

Possible settings for RandomSeeding include:

	Automatic	automatically reseed every time the function is called
	Inherited	use externally seeded random numbers
	seed	use an explicit integer or string as a seed
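
As an illustration of an explicit seed, the sketch below runs the same extraction twice with the same RandomSeeding setting. The data values, the seed 42 and the names seedData, a and b are assumptions chosen for the example. With identical data, method and seed, the two results would be expected to agree:

```wolfram
(* Hypothetical toy data: four numeric examples with three features each *)
seedData = {{1.4, 30.1, 0.2}, {1.5, 46.3, 0.1}, {2.3, 27.4, 0.9}, {5.4, 51.2, 0.4}};

(* Run the same extraction twice with the same explicit seed *)
a = FeatureExtract[seedData, "DimensionReducedVector", RandomSeeding -> 42];
b = FeatureExtract[seedData, "DimensionReducedVector", RandomSeeding -> 42];

(* Compare the two results *)
a == b
```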

FeatureExtract[…] is equivalent to FeatureExtraction[…,"ExtractedFeatures"].

Examples


Basic Examples (4)

Extract features from a simple dataset:

Wolfram Language code: FeatureExtract[{{1.4, "A"}, {1.5, "A"}, {2.3, "B"}, {5.4, "B"}}]

Extract features from images:

Wolfram Language code: FeatureExtract[{[image], [image], [image], [image], [image]}]

Standardize numerical values using the "StandardizedVector" extractor method:

Wolfram Language code: FeatureExtract[{{1.4, 30.1}, {1.5, 46.3}, {2.3, 27.4}, {5.4, 51.2}}, "StandardizedVector"]

Extract TFIDF vectors on characters by chaining the extractor methods "SegmentedCharacters" and "TFIDF":

Wolfram Language code:

FeatureExtract[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, {"SegmentedCharacters", "TFIDF"}] // MatrixForm

Scope (26)

Input Shape (9)

Extract a list of features:

Wolfram Language code: FeatureExtract[{"A rose by any other name", "It was the worst of times"}]

Extract from a list of examples with multiple features:

Wolfram Language code:

FeatureExtract[{{"It was the best of times.", "Charles Dickens"}, {"A journey of a thousand miles begins with a single step.", "Laozi (attrib.)"}, {"To be or not to be, that is the question.", "William Shakespeare"}}]

Extract features on a mixed-type dataset:

Wolfram Language code:

FeatureExtract[{{"the cat is grey", [image]}, {"my cat is fast", [image]}, {"this dog is scary", [image]}, {"the big dog", [image]}}]

Extract features from a dataset formatted as a list of associations:

Wolfram Language code:

FeatureExtract[{<|"age" -> 32, "height" -> 160, "gender" -> "female"|>, 
	<|"height" -> 183, "age" -> 41, "gender" -> "female"|>, 
	<|"height" -> 123, "age" -> 30, "gender" -> "female"|>, 
	<|"height" -> 175, "age" -> 21, "gender" -> "male"|>, 
	<|"height" -> 150, "age" -> 11, "gender" -> "male"|>, 
	<|"age" -> 52, "height" -> 164, "gender" -> "female"|>}]

Extract features from data given as feature lists:

Wolfram Language code:

FeatureExtract[<|"age" -> {32, 41, 30, 21, 11, 52}, "height" -> {160, 183, 123, 175, 150, 164}, "gender" -> {"female", "female", "female", "male", "male", "female"}|>]

Extract features from a Tabular:

Wolfram Language code:

FeatureExtract[Tabular[Association["RawSchema" -> Association["ColumnProperties" -> 
     Association["age" -> Association["ElementType" -> "Integer64"], 
      "height" -> Association["ElementType" -> "Integer64"], 
      "gender" -> Association["ElementType" -> "String"]], "KeyColumns" -> None, 
    "Backend" -> "WolframKernel"], "Options" -> {}, 
  "BackendData" -> Association["ColumnData" -> DataStructure["ColumnTable", 
      {{TabularColumn[Association["Data" -> {{32, 41, 30, 21, 11, 52}, {}, None}, 
          "ElementType" -> "Integer64"]], TabularColumn[Association[
          "Data" -> {{160, 183, 123, 175, 150, 164}, {}, None}, "ElementType" -> "Integer64"]], 
        TabularColumn[Association["Data" -> {{3, {0, 6, 12, 18, 22, 26, 32}, 
             "femalefemalefemalemalemalefemale"}, {}, None}, "ElementType" -> "String"]]}}]]]]]

Extract features from a Dataset:

Wolfram Language code:

FeatureExtract[Dataset[{Association["age" -> 32, "height" -> 160, "gender" -> "female"], 
  Association["age" -> 41, "height" -> 183, "gender" -> "female"], 
  Association["age" -> 30, "height" -> 123, "gender" -> "female"], 
  Association["age" -> 21, "height" -> 175, "gender" -> "male"], 
  Association["age" -> 11, "height" -> 150, "gender" -> "male"], 
  Association["age" -> 52, "height" -> 164, "gender" -> "female"]}]]

Extract features from a dataset that contains missing values:

Wolfram Language code: FeatureExtract[{{1.4, Missing[], "A"}, {1.5, 50.2, "A"}, {Missing[], 42.3, "B"}, {5.4, 61.7, "B"}}]

Extract features from a single example using an extractor that requires no training:

Wolfram Language code: FeatureExtract["Some text.", "WordVectors"]//Shallow

Extractor Specifications (8)

Specify the feature extractor "SentenceVector" on a single textual feature:

Wolfram Language code: FeatureExtract["the cat is cute", "SentenceVector"]//Short

Extract features using the "SentenceVector" method followed by dimension reduction "DimensionReducedVector":

Wolfram Language code:

FeatureExtract[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, {"SentenceVector", "DimensionReducedVector"}]

Extract features on texts and images using the text-only "TFIDF" method:

Wolfram Language code:

FeatureExtract[{{"the cat is grey", [image]}, {"my cat is fast", [image]}, {"this dog is scary", [image]}, {"the big dog", [image]}}, "TFIDF"]

Features have only been extracted from the text part, since "TFIDF" does not apply to images.

Specify feature extraction on multiple features by position:

Wolfram Language code:

FeatureExtract[{{"Glucose", Molecule["Glucose"]}, {"Water", Molecule["Water"]}, {"Acetic Acid", Molecule["Acetic Acid"]}}, {1  -> { "SentenceVector", "DimensionReducedVector"}, 2  -> {"MoleculeFeatures", "DimensionReducedVector"}}]

Extract features with the "IndicatorVector" method on the second nominal variable only:

Wolfram Language code:

FeatureExtract[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, 2 -> "IndicatorVector"] // MatrixForm

Use the Identity extractor method to copy the first variable as well:

Wolfram Language code:

FeatureExtract[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, {2 -> "IndicatorVector", 1 -> Identity}]

A variable can be copied multiple times:

Wolfram Language code:

FeatureExtract[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, {2 -> "IndicatorVector", 1 -> Identity, 1 -> Identity}]

Specify the feature extraction on multiple features by key:

Wolfram Language code:

FeatureExtract[Tabular[Association["RawSchema" -> Association["ColumnProperties" -> 
     Association["Name" -> Association["ElementType" -> "String"], 
      "Molecule" -> Association["ElementType" -> "InertExpression"]], "KeyColumns" -> None, 
    "Backend" -> "WolframKernel"], "Options" -> {}, 
  "BackendData" -> Association["ColumnData" -> DataStructure["ColumnTable", 
      {{TabularColumn[Association["Data" -> {{3, {0, 7, 12, 23}, "GlucoseWaterAcetic Acid"}, {}, 
            None}, "ElementType" -> "String"]], TabularColumn[
         Association["Data" -> {{Molecule[{"O", "C", "C", "O", "C", "O", "C", "O", "C", "O", "C", 
               "O", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H", "H"}, 
              {Bond[{1, 2}, "Double"], Bond[{2, 3}, "Single"], Bond[{3, 4}, "Single"], Bond[{3, 5}, 
                "Single"], Bond[{5, 6}, "Single"], Bond[{5, 7}, "Single"], Bond[{7, 8}, "Single"], 
               Bond[{7, 9}, "Single"], Bond[{9, 10}, "Single"], Bond[{9, 11}, "Single"], Bond[
                {11, 12}, "Single"], Bond[{2, 13}, "Single"], Bond[{3, 14}, "Single"], Bond[
                {4, 15}, "Single"], Bond[{5, 16}, "Single"], Bond[{6, 17}, "Single"], Bond[{7, 18}, 
                "Single"], Bond[{8, 19}, "Single"], Bond[{9, 20}, "Single"], Bond[{10, 21}, 
                "Single"], Bond[{11, 22}, "Single"], Bond[{11, 23}, "Single"], Bond[{12, 24}, 
                "Single"]}, {StereochemistryElements -> {Association["StereoType" -> "Tetrahedral", 
                  "ChiralCenter" -> 3, "Direction" -> "Counterclockwise"], Association[
                  "StereoType" -> "Tetrahedral", "ChiralCenter" -> 5, "Direction" -> "Clockwise"], 
                 Association["StereoType" -> "Tetrahedral", "ChiralCenter" -> 7, "Direction" -> 
                   "Counterclockwise"], Association["StereoType" -> "Tetrahedral", 
                  "ChiralCenter" -> 9, "Direction" -> "Counterclockwise"]}}], 
             Molecule[{"O", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[{1, 3}, "Single"]}, {}], 
             Molecule[{"C", "C", "O", "O", "H", "H", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[
                {2, 3}, "Double"], Bond[{2, 4}, "Single"], Bond[{1, 5}, "Single"], Bond[{1, 6}, 
                "Single"], Bond[{1, 7}, "Single"], Bond[{4, 8}, "Single"]}, {}]}, {}, None}, 
          "ElementType" -> "InertExpression", "CachedOriginalExpression" -> 
           {Molecule[{"O", "C", "C", "O", "C", "O", "C", "O", "C", "O", "C", "O", "H", "H", "H", 
              "H", "H", "H", "H", "H", "H", "H", "H", "H"}, {Bond[{1, 2}, "Double"], 
              Bond[{2, 3}, "Single"], Bond[{3, 4}, "Single"], Bond[{3, 5}, "Single"], 
              Bond[{5, 6}, "Single"], Bond[{5, 7}, "Single"], Bond[{7, 8}, "Single"], 
              Bond[{7, 9}, "Single"], Bond[{9, 10}, "Single"], Bond[{9, 11}, "Single"], 
              Bond[{11, 12}, "Single"], Bond[{2, 13}, "Single"], Bond[{3, 14}, "Single"], 
              Bond[{4, 15}, "Single"], Bond[{5, 16}, "Single"], Bond[{6, 17}, "Single"], 
              Bond[{7, 18}, "Single"], Bond[{8, 19}, "Single"], Bond[{9, 20}, "Single"], 
              Bond[{10, 21}, "Single"], Bond[{11, 22}, "Single"], Bond[{11, 23}, "Single"], 
              Bond[{12, 24}, "Single"]}, {StereochemistryElements -> {Association["StereoType" -> 
                  "Tetrahedral", "ChiralCenter" -> 3, "Direction" -> "Counterclockwise"], 
                Association["StereoType" -> "Tetrahedral", "ChiralCenter" -> 5, "Direction" -> 
                  "Clockwise"], Association["StereoType" -> "Tetrahedral", "ChiralCenter" -> 7, 
                 "Direction" -> "Counterclockwise"], Association["StereoType" -> "Tetrahedral", 
                 "ChiralCenter" -> 9, "Direction" -> "Counterclockwise"]}}], 
            Molecule[{"O", "H", "H"}, {Bond[{1, 2}, "Single"], Bond[{1, 3}, "Single"]}, {}], 
            Molecule[{"C", "C", "O", "O", "H", "H", "H", "H"}, {Bond[{1, 2}, "Single"], 
              Bond[{2, 3}, "Double"], Bond[{2, 4}, "Single"], Bond[{1, 5}, "Single"], 
              Bond[{1, 6}, "Single"], Bond[{1, 7}, "Single"], Bond[{4, 8}, "Single"]}, {}]}]]}}]]]],   {"Name"  -> { "SentenceVector", "DimensionReducedVector"}, "Molecule"  -> { "MoleculeFeatures", "DimensionReducedVector"}}]

Generate a feature extractor using a custom function:

Wolfram Language code:

data = {DateObject[{2014, 5, 5}, TimeObject[{9, 53, 6.30158}, TimeZone -> -5.], TimeZone -> -5.], DateObject[{2000, 1, 1}, TimeObject[{0, 0, 0.}, TimeZone -> -5.], TimeZone -> -5.], DateObject[{2006, 12}], DateObject[{2007, 8, 23}], DateObject[{2016, 4, 4}, TimeObject[{15, 59, 18.2738}, TimeZone -> -4.], TimeZone -> -4.]};

Wolfram Language code: FeatureExtract[data, {AbsoluteTime[#], #["Year"]}&]

Chain the custom extractor with the "StandardizedVector" method:

Wolfram Language code: FeatureExtract[data, {{AbsoluteTime[#], #["Year"]}&, "StandardizedVector"}]

Conform data prior to processing:

Wolfram Language code: FeatureExtract[{[image], [image], [image], [image]}, {"ConformedData", "ImageFeatures", "DimensionReducedVector"}]

Feature Types (9)

Extract features on textual data:

Wolfram Language code: FeatureExtract[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}]

Compute term frequency-inverse document frequency vectors from texts:

Wolfram Language code: FeatureExtract[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, "TFIDF"] // MatrixForm

By default, texts will be segmented into words. This gives the same result:

Wolfram Language code:

FeatureExtract[{"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"}, {"SegmentedWords", "TFIDF"}] // MatrixForm

Extract features with the "IndicatorVector" method on nominal variables:

Wolfram Language code: FeatureExtract[{{"Yes", "A"}, {"No", "A"}, {"No", "B"}, {"Maybe", "B"}, {"No", "C"}}, "IndicatorVector"] // MatrixForm

Extract features from a list of DateObject instances:

Wolfram Language code:

FeatureExtract[{DateObject[{2014, 5, 5, 9, 53, 6.30158}, "Instant", "Gregorian", -6.], DateObject[{2000, 1, 1, 0, 0, 0.}, "Instant", "Gregorian", -6.], DateObject[{2006, 12}, "Month", "Gregorian", -6.], DateObject[{2007, 8, 23}, "Day", "Gregorian", -6.], DateObject[{2016, 4, 4, 15, 59, 18.2738}, "Instant", "Gregorian", -4.]}]

Train a feature extractor on a list of Graph instances:

Wolfram Language code: FeatureExtract[{[image], [image], [image], [image]}]

Train a feature extractor on a list of TimeSeries instances:

Wolfram Language code:

FeatureExtract[{TemporalData[TimeSeries, {{{0, 1, 0, 3, 0, 0, 0, 0, 2, 1, 0, 3, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 0, 
    0, 0, 0, 0, 0, 2, 0, 0, 36, 0, 6, 1, 2, 8, 6, 24, 20, 31, 68, 45, 140, 116, 65, 376, 322, 382, 
    516, 544, 767, 1133, 1788, 1360, 5886, 5412 ... tion[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, {DateFunction -> Automatic, 
   ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 23, 2, 1, 3, 5, 4, 13, 6, 11, 9, 20, 11, 6, 
    23, 14, 38, 50, 86, 66, 103, 37, 121, 70, 1 ...  {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 2, 0, 9, 0, 7, 5, 6, 7, 14, 99, 0, 11, 38, 
    121, 51, 249, 172, 228, 572, 331, 323, 307,  ... TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, 
    {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 0, 2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 9, 0, 4, 0, 3, 0, 8, 17, 14, 4, 27, 
    24, 33, 52, 54, 53, 61, 71, 57, 163, 182, 196 ...  {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2], TemporalData[TimeSeries, {{{0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 2, 6, 0, 4, 9, 12, 20, 11, 
    28, 9, 26, 68, 35, 46, 101, 92, 21, 48, 69 ...  {TemporalData`DateSpecification[{2020, 1, 23, 0, 0, 0.}, {2020, 10, 5, 0, 0, 0.}, {1, "Day"}]}, 
  1, {"Continuous", 1}, {"Discrete", 1}, 1, 
  {ResamplingMethod -> {"Interpolation", InterpolationOrder -> 1}, ValueDimensions -> 1}}, True, 
 12.2]}]

Extract features from Molecule data:

Wolfram Language code: FeatureExtract[{Molecule["Sucrose"], Molecule["Butane"], Molecule["Pentane"]}]

Extract features from a selection of Audio instances:

Wolfram Language code:

FeatureExtract[{Audio[Sound[Table[SoundNote[i, If[i == 12, 0.5, 0.1], "Violin"], {i, 0, 12}]]], Audio[Sound[Table[SoundNote[i, If[i == 12, 0.5, 0.1], "Cello"], {i, 0, 12}]]], Audio[Sound[Table[SoundNote[i, If[i == 12, 0.5, 0.1], "Trumpet"], {i, 0, 12}]]]}]

Extract features from a dataset that contains missing values:

Wolfram Language code: FeatureExtract[{{1.4, Missing[], "A"}, {1.5, 50.2, "A"}, {Missing[], 42.3, "B"}, {5.4, 61.7, "B"}}]

Options (2)

FeatureNames (1)

Use FeatureNames to name features, and refer to their names in part specifications:

Wolfram Language code:

FeatureExtract[{{2.3, "male"}, {4.8, Missing[]}, {Missing[], "female"}, {5.2, "female"}}, {"age" -> Identity, "gender" -> "IndicatorVector"}, FeatureNames -> {"age", "gender"}] // MatrixForm

FeatureTypes (1)

Extract features with the "IndicatorVector" method on a simple dataset:

Wolfram Language code: FeatureExtract[{{1, "A"}, {2, "A"}, {2, "B"}, {1, "B"}}, "IndicatorVector"]//MatrixForm

As the "IndicatorVector" method only acts on nominal features, the first feature has been assumed to be nominal.

Use FeatureTypes to enforce the interpretation of the first feature as numerical, and bypass the feature extractor:

Wolfram Language code:

FeatureExtract[{{1, "A"}, {2, "A"}, {2, "B"}, {1, "B"}}, "IndicatorVector", FeatureTypes -> <|1 -> "Numerical"|>]// MatrixForm

Applications (1)

Dataset Visualization (1)

Construct a dataset of dog images:

Wolfram Language code:

dataset = {[image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image], [image]};

Extract features from this dataset:

Wolfram Language code: features = FeatureExtract[dataset];

Reduce the dimension of the extracted vectors to 2:

Wolfram Language code: xy = DimensionReduce[features, 2, Method -> "TSNE"]

Visualize the images at their feature positions:

Wolfram Language code: ListPlot[List /@ xy, PlotMarkers -> (Image[#, ImageSize -> 40]& /@ dataset)]

A similar visualization can be directly obtained using FeatureSpacePlot:

Wolfram Language code: FeatureSpacePlot[dataset]

Properties & Relations (2)

Extracting features with no training examples is equivalent to using FeatureExtraction[None,...]:

Wolfram Language code: data = "This is a sentence"

Wolfram Language code: FeatureExtract[data, "SentenceVector"] == FeatureExtraction[None, "SentenceVector"][data]

FeatureExtract[…] is equivalent to FeatureExtraction[…,"ExtractedFeatures"]:

Wolfram Language code: data = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

Wolfram Language code: FeatureExtract[data, "TFIDF"] == FeatureExtraction[data, "TFIDF", "ExtractedFeatures"]
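
When the same extractor needs to be applied to data outside the training examples, FeatureExtraction can be used to obtain a reusable FeatureExtractorFunction. A minimal sketch, in which the new sentence and the names trainingTexts and fe are illustrative assumptions:

```wolfram
(* Training texts, as in the example above *)
trainingTexts = {"the cat is grey", "my cat is fast", "this dog is scary", "the big dog"};

(* Train the extractor once; the result is a FeatureExtractorFunction *)
fe = FeatureExtraction[trainingTexts, "TFIDF"];

(* Apply the trained extractor to a new, hypothetical example *)
fe["the grey dog"]
```

FeatureExtract trains and applies the extractor in one step, so it returns only the features of the examples it was given.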

Possible Issues (1)

Feature extraction with no training data will use inbuilt defaults where needed:

Wolfram Language code: FeatureExtract[[image], "ConformedData", FeatureTypes -> "Image"]//Information

Inputting the same data as training data may give a different result:

Wolfram Language code: FeatureExtract[{[image]}, "ConformedData", FeatureTypes -> "Image"]//Information
