ORC (.orc)
Background & Context
- Efficient, general-purpose, column-oriented data format.
- Developed by the Apache Software Foundation.
- ORC is an acronym for Optimized Row Columnar.
- Binary file format.
- Supports multiple compression methods.
Import & Export
- Import["file.orc"] imports an ORC file as a Tabular object.
- Import["file.orc",elem] imports the specified elements.
- Import["file.orc",{elem,subelem1,…}] imports subelements subelemi, useful for partial data import.
- The import format can be specified with Import["file","ORC"] or Import["file",{"ORC",elem,…}].
- Export["file.orc",expr] exports expr, such as a Tabular object, to the ORC file format.
- Supported expressions expr include:
    {v1,v2,…}                       a single column of data
    {{v11,v12,…},{v21,v22,…},…}     lists of rows of data
    array                           an array such as SparseArray, QuantityArray, etc.
    tseries                         a TimeSeries, EventSeries or a TemporalData object
    dataset                         a Dataset or a Tabular object
- See the following reference pages for full general information:
    Import, Export                      import from or export to a file
    CloudImport, CloudExport            import from or export to a cloud object
    ImportString, ExportString          import from or export to a string
    ImportByteArray, ExportByteArray    import from or export to a byte array
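The Import and Export forms listed above can be combined into a short round trip. The following is a minimal sketch rather than a definitive recipe: the sample data, the temporary file name and the variable names (tab, file, part, bytes, back) are all hypothetical. It assumes that a Tabular object can be built from a list of associations, and that ExportByteArray and ImportByteArray accept "ORC" as a format name in the same way Export and Import do.

```wolfram
(* Hypothetical sample data; the column names are illustrative only *)
tab = Tabular[{
    <|"Name" -> "a", "Value" -> 1|>,
    <|"Name" -> "b", "Value" -> 2|>,
    <|"Name" -> "c", "Value" -> 3|>}];

(* Write to a temporary file; the format is inferred from the .orc extension *)
file = Export[FileNameJoin[{$TemporaryDirectory, "orc-sketch.orc"}], tab];

(* Partial import using the {elem, subelem1, …} form:
   first two rows, "Name" column only *)
part = Import[file, {"Tabular", 1 ;; 2, {"Name"}}];

(* In-memory round trip; here the format is named explicitly *)
bytes = ExportByteArray[tab, "ORC"];
back = ImportByteArray[bytes, "ORC"];

(* Inspect what came back *)
{TabularQ[part], ByteArrayQ[bytes], TabularQ[back]}
```

Requesting only the rows and columns that are needed is the main reason to prefer the subelement form over importing the whole file and slicing afterwards.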
Import Elements
- General Import elements:
    "Elements"    list of elements and options available in this file
    "Summary"     summary of the file
    "Rules"       list of rules for all available elements
- Data representation elements:
    "Data"           two-dimensional array
    "Dataset"        table data as a Dataset
    "EventSeries"    table data as an EventSeries
    "Tabular"        a Tabular object
    "TimeSeries"     table data as a TimeSeries
- Import by default uses the "Tabular" element.
- Subelements for partial data import for the "Tabular" element can take row and column specifications in the form {"Tabular",rows,cols}, where rows and cols can be any of the following:
    n            nth row or column
    -n           counts from the end
    n;;m         from n through m
    n;;m;;s      from n through m with steps of s
    {n1,n2,…}    specific rows or columns ni
- Column specifications can also be any of the following:
    "col"            single column "col"
    {col1,col2,…}    list of column names coli
- Data descriptor elements:
    "ColumnLabels"    names of columns
    "ColumnTypes"     association with data type for each column
    "Schema"          TabularSchema object
- Metadata elements:
    "ColumnCount"        number of columns stored in file
    "Dimensions"         data dimensions
    "RowCount"           number of rows stored in file
    "MetaInformation"    metadata
Options
- General Import options:
    IncludeMetaInformation    All          metadata types to import
    "Schema"                  Automatic    schema used to construct Tabular object
    "TimeColumn"              Automatic    column to use for times in "EventSeries" and "TimeSeries" elements
- Possible settings for the "Schema" option include:
    schema                 a complete TabularSchema specification
    prop->val              a schema property and value (see reference page for TabularSchema)
    <|"prop1"->val1,…|>    an association of schema properties and values
- General Export options:
    "Compression"            None       compression method
    "CompressionStrategy"    "Speed"    compression strategy
- The following settings for "Compression" are supported:
    None        no compression
    "LZ4"       LZ4 compression
    "GZIP"      GZIP Hadoop compression
    "Snappy"    Snappy compression
    "ZSTD"      ZSTD compression
- The following settings for "CompressionStrategy" are supported:
    "Size"     optimize size of file
    "Speed"    optimize the speed of export
Examples
Basic Examples (3)
Import a Tabular object from an ORC file:
Import["ExampleData/USstates.orc"]
Import["ExampleData/USstates.orc", "Summary"]
Export a Tabular object to an ORC file:
tabular = Import["ExampleData/USstates.orc"];
Export["file.orc", tabular]
Scope (3)
Import (3)
Show all elements available in the file:
Import["ExampleData/USstates.orc", "Elements"]
By default, a Tabular object is returned:
Import["ExampleData/USstates.orc"]//TabularQ
Import["ExampleData/USstates.orc", "ColumnTypes"]
Import Elements (19)
"ColumnCount" (1)
"ColumnLabels" (1)
"ColumnTypes" (1)
"Data" (3)
Import["ExampleData/USstates.orc", "Data"]//Short
Import["ExampleData/USstates.orc", {"Data", 1 ;; 3}]
Import["ExampleData/USstates.orc", {"Data", All, {1, 3}}]//Short
Import only selected columns using column names:
Import["ExampleData/USstates.orc", {"Data", All, {"Name", "Area"}}]//Short
"Dataset" (3)
Get the data as a Dataset:
Import["ExampleData/USstates.orc", "Dataset"]
Import["ExampleData/USstates.orc", {"Dataset", 1 ;; 3}]
Import["ExampleData/USstates.orc", {"Dataset", All, {1, 3}}]
Import only selected columns using column names:
Import["ExampleData/USstates.orc", {"Dataset", All, {"Name", "Area"}}]
"Dimensions" (1)
"EventSeries" (1)
Export a Tabular object to an ORC file:
file = Export["file.orc", ResourceData["Sample Tabular Data: Sales Data"]]
Import an ORC file as an EventSeries:
Import[file, "EventSeries"]
Import a single row from an ORC file:
Import[file, {"EventSeries", 5}]
Import some specific rows from an ORC file:
Import[file, {"EventSeries", {1, 5, 7}}]
Import the first 10 rows of an ORC file:
Import[file, {"EventSeries", 1 ;; 10}]
Import only selected columns using column names:
Import[file, {"EventSeries", All, {"Product", "Date", "Quantity"}}]
"MetaInformation" (1)
"RowCount" (1)
"Schema" (1)
Get the TabularSchema object:
Import["ExampleData/USstates.orc", "Schema"]
"Summary" (1)
"Tabular" (3)
Get the data from a file as a Tabular object:
Import["ExampleData/USstates.orc", "Tabular"]
Import["ExampleData/USstates.orc", {"Tabular", 1 ;; 5}]
Import["ExampleData/USstates.orc", {"Tabular", All, {1, 3}}]
Import["ExampleData/USstates.orc", {"Tabular", All, {"Name", "Area"}}]
"TimeSeries" (1)
Export a Tabular object to an ORC file:
file = Export["file.orc", ResourceData["Sample Tabular Data: Sales Data"]]
Import an ORC file as a TimeSeries:
Import[file, "TimeSeries"]
Import a single row from an ORC file:
Import[file, {"TimeSeries", 5}]
Import some specific rows from an ORC file:
Import[file, {"TimeSeries", {1, 5, 7}}]
Import the first 10 rows of an ORC file:
Import[file, {"TimeSeries", 1 ;; 10}]
Import only selected columns using column names:
Import[file, {"TimeSeries", All, {"Product", "Date", "Quantity"}}]
Import Options (3)
IncludeMetaInformation (1)
By default, all metadata stored in a file is imported and embedded in the Tabular object:
tabular = Import["ExampleData/USstates.orc"];
tabular["Metadata"]
tabular = Import["ExampleData/USstates.orc", IncludeMetaInformation -> None];
tabular["Metadata"]
"Schema" (1)
Export a Tabular object to an ORC file:
file = Export["out.orc", Tabular[Association["RawSchema" -> Association["ColumnProperties" ->
Association["A" -> Association["ElementType" -> "String"],
"B" -> Association["ElementType" -> "String"]], "KeyColumns" -> None,
"Backend" -> "WolframKernel"], "BackendData" ->
Association["ColumnData" -> DataStructure["ColumnTable",
{{TabularColumn[Association["Data" -> {{0, {0, 11, 22, 33, 44, 55},
"Jan 03 2006Jan 04 2006Jan 05 2006Jan 06 2006Jan 09 2006"}, {}, None},
"ElementType" -> "String"]], TabularColumn[Association[
"Data" -> {{3, {0, 5, 10, 15, 20, 25}, "11.8212.0412.0911.8812.43"}, {}, None},
"ElementType" -> "String"]]}}]]]]];
By default, column labels and their types stored in a file are used when Tabular or Dataset objects are imported:
tabular = Import[file];
tabular["ColumnTypes"]
Use the "Schema" option to specify column labels and types:
tabular = Import[file, "Schema" -> {"ColumnKeys" -> {"Date", "Value"}, "ElementType" -> {"Date" -> "Date", "Value" -> "Real32"}}];
tabular["ColumnTypes"]
"TimeColumn" (1)
Export a Tabular object to an ORC file:
file = Export["file.orc", Tabular[Association["RawSchema" -> Association["ColumnProperties" ->
Association["Date" -> Association["ElementType" -> TypeSpecifier["Date"]["Integer32", "Day",
"Gregorian", None]], "Value" -> Association["ElementType" -> "Real32"]],
"KeyColumns" -> None, "Backend" -> "WolframKernel"], "Options" -> {},
"BackendData" -> Association["ColumnData" -> DataStructure["ColumnTable",
{{TabularColumn[Association["Data" -> {5, {{NumericArray[{13150, 13151, 13152, 13153, 13156},
"Integer32"], {}, None}}, None}, "ElementType" -> "Date"["Integer32", "Day",
"Gregorian", None]]], TabularColumn[Association[
"Data" -> {NumericArray[{11.819999694824219, 12.039999961853027, 12.09000015258789,
11.880000114440918, 12.430000305175781}, "Real32"], {}, None},
"ElementType" -> "Real32"]]}}]]]]];
By default, the time column is selected automatically for "TimeSeries" and "EventSeries" elements:
Import[file, "TimeSeries"]
Use the "TimeColumn" option to specify the time column:
Import[file, "TimeSeries", "TimeColumn" -> "Value"]
Export Options (4)
"Compression" (2)
Compression is disabled by default:
tabular = Import["ExampleData/USstates.orc"];
Export["out.orc", tabular]//FileSize
Compare supported compression methods:
tabular = Import["ExampleData/USstates.orc"];
AssociationMap[(FileSize@Export["out.orc", tabular, "Compression" -> #])&, {"LZ4", "GZIP", "Snappy", "ZSTD"}]
"CompressionStrategy" (2)
By default, "CompressionStrategy" is set to "Speed":
tabular = Import["ExampleData/USstates.orc"];
AssociationMap[(FileSize@Export["out.orc", tabular, "CompressionStrategy" -> "Speed", "Compression" -> #])&, {"LZ4", "GZIP", "Snappy", "ZSTD"}]
Use the "Size" compression strategy:
tabular = Import["ExampleData/USstates.orc"];
AssociationMap[(FileSize@Export["out.orc", tabular, "CompressionStrategy" -> "Size", "Compression" -> #])&, {"LZ4", "GZIP", "Snappy", "ZSTD"}]
History
Introduced in 2025 (14.2) | Updated in 2026 (15.0)