Definition of NRRD File Format
This document defines the NRRD file format. It has been updated to reflect the latest version of the format, supported by Teem version 1.9 and later.
Since this document aims to be a self-contained reference for the NRRD file format, some of the material here repeats ideas found elsewhere in the nrrd documentation. Besides defining the format, this document also seeks to supply some rationale. This page has been written with a view towards keeping it useful upon printing.
When saved, the filenames for NRRD files should end in ".nrrd". Detached headers, discussed in Section 3, should end in ".nhdr". Suggested suffixes for data files associated with detached headers (".raw", ".txt", etc.) are listed in the encoding part of Section 5.
The nrrdRead()/nrrdLoad() and nrrdWrite()/nrrdSave() functions of the nrrd library are intended to completely support the format as described here, but as yet there is no test suite.
The general format of a NRRD file (with attached header) is:
    NRRD000X
    <field>: <desc>
    <field>: <desc>
    # <comment>
    ...
    <field>: <desc>
    <key>:=<value>
    <key>:=<value>
    <key>:=<value>
    # <comment>

    <data><data><data><data><data><data>...
The very first line contains nothing but the NRRD "magic". The magic is what identifies the file as a NRRD file, for the benefit of readers that have to handle multiple file formats. For NRRD files, the first four characters are always "NRRD", and the remaining characters give information about the file format version. The "X" in "NRRD000X" identifies which version of the file format is being used. As new features have been added to the format, the X has been incremented to ensure that readers in old binaries don't try to parse newer headers. Briefly, the different magics are:
Each of the "<field>: <desc>" lines specifies information about one of the fields in the nrrd. Each of these lines is called a "field specification", or more loosely, a "field". Each field specification is contained in exactly one line. Each field specification may appear no more than once in the nrrd header. All the field specifications have the same structure: a string "<field>" identifying the field (called the field identifier), then a colon followed by a single space ": ", and then the information describing the field "<desc>" (called the field descriptor). All field identifiers are case insensitive. Many of the field descriptors are also case insensitive; the exceptions are the field descriptors containing strings: the content, labels, units, space units, and sample units fields. Whitespace (that is not part of the previous line's termination) is not allowed before a field identifier. Extra whitespace after the field descriptor and before the line termination is ignored.
Each of the "<key>:=<value>" lines specifies a key/value pair in the nrrd. These can appear in NRRD0002 (and higher version) files, but not NRRD0001 files. The key and value strings are delimited by the first ":=" to appear on the line: any spaces before or after ":=" are assumed part of the key or value, respectively. The length of the key string must be at least one character; the value string can be zero-length. Because both the key and value must appear on exactly one line, a minimal escaping scheme is required, which readers must interpret and writers must generate:
Comment lines start with a pound, "#", with no preceding whitespace. The comment string itself starts with the first character which is not a pound or a space (" "). Comment lines with a zero-length comment string should be ignored. Comment lines may be interspersed with field specifications in any order. This allows field specifications to be commented out and commented upon easily. Gracious NRRD readers should store all the comments seen in the header, but this is not a requirement. Comments are case sensitive.
The magic, field specifications, key/value pairs, and comments comprise the NRRD header. After the header, there is a single blank line containing zero characters. This separates the header from the data, which follows. The header, the blank line, and the data comprise the NRRD file. A single NRRD file can store the information and data for a single array. There is currently no facility for storing multiple arrays in a single NRRD file.
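The layout described above can be walked with a short loop. The following is a minimal Python sketch, not the nrrd library's implementation: the name `split_nrrd` and the returned dictionary layout are hypothetical, chosen for illustration. It assumes lines end in "\n" (optionally preceded by "\r"), decides between a field specification and a key/value pair by whichever of ": " or ":=" appears first on the line (a heuristic, not a rule from this document), and does not apply the key/value escaping scheme or check that each field appears only once.

```python
def split_nrrd(blob: bytes):
    """Split an attached-header NRRD file into header parts and data bytes."""
    header = {"magic": None, "fields": {}, "keyvalues": {}, "comments": []}
    pos = 0
    first = True
    while pos < len(blob):
        end = blob.find(b"\n", pos)
        if end == -1:
            # No further line termination: the header runs to the end of input.
            line, pos = blob[pos:], len(blob)
        else:
            line, pos = blob[pos:end], end + 1
        text = line.rstrip(b"\r").decode("ascii")
        if first:
            # The very first line holds nothing but the magic.
            if not text.startswith("NRRD"):
                raise ValueError("not a NRRD file")
            header["magic"] = text
            first = False
        elif text == "":
            break  # blank line: everything after it is data
        elif text.startswith("#"):
            # The comment starts at the first character that is not '#' or ' '.
            comment = text.lstrip("# ")
            if comment:  # zero-length comments are ignored
                header["comments"].append(comment)
        else:
            kv = text.find(":=")
            fs = text.find(": ")
            if kv != -1 and (fs == -1 or kv < fs):
                # Spaces around ":=" belong to the key or value.
                header["keyvalues"][text[:kv]] = text[kv + 2:]
            elif fs != -1:
                # Field identifiers are case insensitive; trailing
                # whitespace after the descriptor is ignored.
                header["fields"][text[:fs].lower()] = text[fs + 2:].rstrip()
            else:
                raise ValueError(f"unrecognized header line: {text!r}")
    return header, blob[pos:]
```

For a detached header with no blank line, the loop simply reaches the end of the input and the returned data is empty.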
The NRRD format is complicated by the fact that some fields are always necessary, while some fields are always optional, and some fields are necessary only some of the time. Whether or not a field is necessary depends on previous information in the header. This is why there is no standard template for all NRRD headers, or a context-free grammar for NRRD headers.
An important and necessary basic field specification is the one giving the dimension of the nrrd. The format is:
dimension: <int>
<int> can be any integer greater than 0. NRRD readers may not have the ability to represent absolutely any dimension, but they must be able to handle nrrds with dimension 16 or less, which is what the current nrrd implementation can do.
The number of samples along each axis is the only necessary per-axis specification. The format is:
sizes: <size[0]> <size[1]> ... <size[dim-1]>
<size[i]> is the number of samples along axis i, with axis ordering going from fastest to slowest. As with all the other per-axis field identifiers, sizes ends with "s" to emphasize the plurality of the field it specifies. The field identifiers do not change, however, for one-dimensional nrrds.
Basic specifications can appear both before and after the dimension field. Per-axis specifications can only appear after the dimension field specification. This simplifies the task of parsing per-axis specifications, since we know how many pieces of information need to be parsed (the same as the dimension), as well as avoiding attempts at cleverness in the form of guessing the dimension from the first per-axis specification. There is a similar field specification ordering constraint associated with the orientation information described in Section 4: the space in which the array conceptually "lives" has to be identified (or at least its dimension given) prior to parsing all the other orientation-related fields which have one piece of information per space coordinate. A final constraint on field specification ordering is that the "data file: LIST" form of the data file field must be the last field specification in the header. Within these constraints, the field specifications may appear in any order.
The issue of axis ordering is fundamental. In memory and on disk, there is a strict linear order of all the values in an array, so that each value has a single integer address. Conceptually, however, each value has one or more integer coordinates which identify its position within the array (as many coordinates as there are dimensions in the array). The "fastest" axis is the one associated with the coordinate which increments fastest as the samples are traversed in linear order. For example, the typical raster ordering of interleaved RGB 2-dimensional image data is actually a three-dimensional array. The fastest axis is the color axis (only three samples long), followed by the horizontal axis, with the vertical axis being the slowest. All the per-axis field specifications identify information for each axis, and the axis ordering is always (reading left to right) fastest to slowest. NRRD does not assume any names for the axes, such as "X", "Y", "Z" or "I", "J", "K": they are identified solely by their location in the ordering from fastest to slowest.
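The relationship between per-axis coordinates and the single linear address can be written down directly. This is a small Python sketch under the fastest-to-slowest convention described above; the function name `linear_index` is hypothetical and not part of the nrrd library.

```python
def linear_index(coords, sizes):
    """Map per-axis coordinates (fastest axis first) to the linear address."""
    if len(coords) != len(sizes):
        raise ValueError("need exactly one coordinate per axis")
    index = 0
    stride = 1  # number of samples stepped over per unit along the current axis
    for c, n in zip(coords, sizes):
        if not 0 <= c < n:
            raise IndexError("coordinate out of range")
        index += c * stride
        stride *= n  # the next (slower) axis steps over all faster axes
    return index
```

For the interleaved RGB example with sizes (3, 640, 480), moving one step along the color axis advances the address by 1, one step horizontally advances it by 3, and one step vertically advances it by 3*640 = 1920.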
Besides dimension, there are two other always-necessary basic specifications: the type and the encoding specifications. Their format is:
type: <type>
encoding: <encoding>
The possible values for type include the C identifiers you would probably use to identify a type: "int" means a 32-bit signed integer, "float" means 32-bit floating point, and so on. Useful variants like "uchar" (same as "unsigned char") are allowed. There is also the block type, which is used to represent some chunk of opaque memory, of user-specified size; see the type specification in Section 5 for all the details.
The encoding tells how the data (following the blank line after the header) is written out; "ascii" and "raw" are common values, but "hex" allows extraction of images from some PostScript files, and compression is also supported; see the encoding specification (Section 5). NRRD readers must be able to support raw and ASCII encoding; everything else is optional. See the general description of NRRD format for more information about how optional encodings are handled.
The field specifications described so far provide the means of writing a minimal NRRD header:
    NRRD0001
    # my first nrrd
    type: uchar
    dimension: 3
    sizes: 3 640 480
    encoding: raw
This is identical in meaning to the PPM header:
    P6
    # my first nrrd
    640 480
    255
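To make the equivalence concrete, here is a hedged Python sketch that produces both files from the same interleaved RGB bytes. The function names `minimal_nrrd` and `minimal_ppm` are hypothetical; the only difference between the two outputs is the header text, since the raw data is byte-for-byte the same.

```python
def _check(width, height, rgb):
    # Three uchar samples per pixel, color axis fastest.
    if len(rgb) != 3 * width * height:
        raise ValueError("expected 3 * width * height bytes of RGB data")

def minimal_nrrd(width, height, rgb: bytes) -> bytes:
    """Attached-header NRRD holding an interleaved 8-bit RGB image."""
    _check(width, height, rgb)
    header = (
        "NRRD0001\n"
        "# my first nrrd\n"
        "type: uchar\n"
        "dimension: 3\n"
        f"sizes: 3 {width} {height}\n"
        "encoding: raw\n"
        "\n"  # single blank line separating header from data
    )
    return header.encode("ascii") + rgb

def minimal_ppm(width, height, rgb: bytes) -> bytes:
    """Binary PPM (P6) holding the same image."""
    _check(width, height, rgb)
    header = f"P6\n# my first nrrd\n{width} {height}\n255\n"
    return header.encode("ascii") + rgb
```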
The field specifications described so far, and illustrated above, are the only ones which are always necessary. However, other field specifications become necessary as a function of other fields: if the type is "float", and the encoding is something other than "ascii", then the endianness of the data has to be recorded. The details of which fields require which other fields are spelled out in Section 5 and Section 6.
In order to implement this in NRRD, each of the optional fields must have a way of representing the idea of "don't know": a state distinct from knowing a specific default value, or knowing the value specified in the header. All of the optional fields can be initialized to "don't know", and only after "known" values are specified in an input header does the field become worthy of being saved in an output NRRD header. For optional fields with string values (content, labels, and units), the empty string ("") is the obvious choice for "don't know". For centers and kinds, the strings "???" and "none" (and the value used to represent them) mean "don't know". In contrast, the optional fields with integer values (line skip and byte skip) actually have a sensible "known" default value, namely zero.
But how does one represent "don't know" with optional floating point data? NRRD uses NaN, or Not-a-Number. NaN is a value that can be represented in the ubiquitous IEEE 754 floating point standard, as the result of doing undefined arithmetic operations, such as zero divided by zero. While it may seem overly cute or clever to use NaN as a flag for "don't know", this is in fact exactly in keeping with the purpose of NaN as described in the original 754 standard. Furthermore, as described in the documentation for the air library, it is possible to generate a NaN at compile time (so that it doesn't have to be produced as a result of doing an undefined arithmetic operation), and it is possible to quickly test if a given number is NaN. Even if operations involving NaN are not implemented in the floating point hardware, but in software emulation supplied by the operating system, they will never be the bottleneck in reading and writing a NRRD file. Because of NaN's important role as signifier of "don't know", the NRRD reader must be able to interpret the case-insensitive string "nan" as a NaN, even if this is not already the behavior of sscanf() on a given platform (it probably isn't). Writing a small wrapper function around sscanf() is a very small price to pay for the representational convenience of NaN. Section 2 gives the details for how NRRD readers and writers should handle ASCII encoding of the IEEE 754 special values.
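The wrapper idea can be illustrated in a few lines. This is a Python sketch with hypothetical names (`parse_nrrd_float`, `is_known`); Python's own float() already accepts "nan" in any letter case, so the explicit check below only makes the rule visible, and marks the place where a C reader would wrap sscanf(). It covers only the "nan" string discussed here, not the other special values handled in Section 2.

```python
import math

def parse_nrrd_float(token: str) -> float:
    """Parse one floating point token, reading "nan" case-insensitively."""
    if token.strip().lower() == "nan":
        return math.nan
    return float(token)

def is_known(value: float) -> bool:
    """An optional floating point field is "known" only if it is not NaN."""
    return not math.isnan(value)
```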
In Section 5 and Section 6, numeric field specification descriptions include a "Type", which identifies the minimum precision with which the information must be represented by the NRRD reader. In this context, "int" means a 32-bit signed integer, and "double" means a 64-bit floating point number. Field specifications with alternate equivalent forms are listed together (for example, "block size" is the same as "blocksize"). Equivalent field descriptors are listed together in the table enumerating the meanings of the various descriptors (for example, "uchar" is the same as "unsigned char").
Note that quotes are used to delimit the field descriptors in the explanation of their meaning; quotes are not part of the descriptor itself (except for the labels and various units specifications, in which the descriptors (strings) are delimited by quotes).
The difference between a quiet and signaling NaN is a detail of IEEE 754 which was left implementation-specific, so different platforms have different ways of distinguishing between quiet and signaling NaN, and some don't distinguish between them at all. The intent was that quiet NaNs represent an indeterminate value, as in 0/0, or inf/inf, meaning simply that arithmetic doesn't define a single value for the result. On the other hand, signaling NaNs represent an invalid value, to signal that a non-existent or uninitialized floating point value was accessed, or that the input parameters to a function were so botched that no valid output can be generated; the signaling NaN is supposed to signal "someone goofed". Based on the fact that different portions of 754 can be implemented in software, or hardware, or a combination of the two, there may be performance considerations between the two kinds of NaNs. But in any case, it's basically all moot, since unfortunately, there is no cross-platform standard API for the floating point exception handlers which can interact with signaling NaNs.
Given this, in the NRRD file format (and in the nrrd library), a NaN is a NaN is a NaN, with no difference between signaling and quiet, and no recognition of the integer value in the mantissa field of the NaN. If the signaling/quiet distinction mattered, then when writing raw floating point data, not only would endianness have to be recorded, but also the convention for representing quiet NaN, and if the data came from a platform that knows the difference between the two NaNs. Readers would have to possibly traverse the whole array after input to detect and switch NaN representations. Doing this checking is not practical or efficient, and the consequences of not doing it are either moot or non-existent. Thankfully, there are unique and fully specified bit patterns for positive and negative infinity.
NRRD writers should verify that their printf() function behaves in accordance with these rules.
There is one new field specification which is required in detached headers, using the "data file" identifier. This field specification can take one of three possible forms (the second and third were copied from the MetaImage format).
"datafile:" is also valid as the field identifier. The addition of this field is the only difference between attached headers and detached headers. The magic at the beginning of the header is the same, so there is currently no way to immediately detect if the header being parsed is attached or detached. Detached headers may end with the last field specification, or with a single blank line following the last field specification (in which case anything following the blank line is ignored). The meanings of the three forms of the field descriptor are:
- data file: <filename>
- data file: <format> <min> <max> <step> [<subdim>]
- data file: LIST [<subdim>]
When there are multiple separate datafiles (second and third forms above), the amount of data in each file has to be determined somehow. By default, each file is assumed to contain one slice along the slowest axis of the nrrd. That is, the dimension of data in each file is D-1, where D is the dimension of the full array (given by the dimension: field). A different datafile dimension (besides D-1) can be communicated with the optional <subdim> value. This value can be between 1 and D. When <subdim> is less than D-1 (for example, giving a 4-D volume one 2-D slice at a time), the number of data files can be determined by the product of one or more of the slowest axes. When <subdim> is equal to D, the data is assumed to be a set of equal-sized "slabs", based on cuts along the slowest axis. The number of "slabs" must divide into the number of samples along the slowest axis.
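The counting rule above can be sketched as follows. This is a Python illustration with hypothetical function names, not nrrd library code. It returns the number of data files implied by the axis sizes when <subdim> is less than D, and returns None when <subdim> equals D, since in that case the sizes alone do not fix the number of slabs.

```python
from math import prod

def expected_data_files(sizes, subdim=None):
    """Number of data files implied by the axis sizes (fastest axis first)."""
    dim = len(sizes)
    if subdim is None:
        subdim = dim - 1  # default: one slice along the slowest axis per file
    elif not 1 <= subdim <= dim:
        raise ValueError("subdim must be between 1 and the array dimension")
    if subdim == dim:
        return None  # slabs: the count comes from the file list, not the sizes
    # Each file holds the subdim fastest axes; the slower axes are spread over files.
    return prod(sizes[subdim:])

def valid_slab_count(sizes, n_files):
    """With subdim == D, the file count must divide the slowest axis size."""
    return n_files > 0 and sizes[-1] % n_files == 0
```

For a 4-D array with sizes (256, 256, 100, 8), the default gives 8 files, while a <subdim> of 2 (one 2-D slice per file) gives 100*8 = 800 files.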
Breaking the dataset into a header and one or more data files raises new concerns, namely that the header file can't know if the data file has been erased, renamed, or moved. NRRD provides no means to overcome these problems once they've been created. On the other hand, moving the header and data files together to a new place is a common operation, and is supported by the special semantics associated with the data filename:
When using multiple data files, the line skip and byte skip fields describe how the data is to be accessed in all the files (separately): within every file, line skipping and byte skipping are used to get at the data. The encoding and endian fields must hold for all data files.
Orientation information is defined by a combination of basic and per-axis field specifications, which accomplish four things:
When orientation information is defined by the fields below, some of the other per-axis fields may not be used. Specifically, on a per-axis basis, there is mutual exclusion between setting a space direction and using a non-NaN value for "mins", "maxs", and "spacings", or a non-empty string in the "units" field. However, there is no such exclusion with the "thicknesses", "centers", "kinds", and "labels" fields.
space: <space>
<space> | Space dimension | Description |
"right-anterior-superior" or "RAS" | 3 | For medical data, a patient-based right-handed coordinate frame, with ordered basis vectors pointing towards right, anterior, and superior, respectively. This space is used in the NIFTI-1 extension to the Analyze format. |
"left-anterior-superior" or "LAS" | 3 | For medical data, a patient-based left-handed coordinate frame, with ordered basis vectors pointing towards left, anterior, and superior, respectively. This space is used in the Analyze 7.5 format. |
"left-posterior-superior" or "LPS" | 3 | For medical data, a patient-based right-handed coordinate frame, with ordered basis vectors pointing towards left, posterior, and superior, respectively. This space is used in DICOM 3. |
"right-anterior-superior-time" or "RAST" | 4 | Like RAS, but with time along the fourth axis. |
"left-anterior-superior-time" or "LAST" | 4 | Like LAS, but with time along the fourth axis. |
"left-posterior-superior-time" or "LPST" | 4 | Like LPS, but with time along the fourth axis. |
"scanner-xyz" | 3 | For medical data, a scanner-based right-handed coordinate frame, used in ACR/NEMA 2.0 (pre-DICOM 3). If a patient lies parallel to the ground, face-up on the table, with their feet-to-head direction same as the front-to-back direction of the imaging equipment, the axes of this scanner-based coordinate frame and the (patient-based) left-posterior-superior frame coincide. |
"scanner-xyz-time" | 4 | Like scanner-xyz, but with time along the fourth axis. |
"3D-right-handed" | 3 | Any right-handed three-dimensional space |
"3D-left-handed" | 3 | Any left-handed three-dimensional space |
"3D-right-handed-time" | 4 | Like 3D-right-handed, but with time along the fourth axis. |
"3D-left-handed-time" | 4 | Like 3D-left-handed, but with time along the fourth axis. |
It is important to recognize that the identification of the space (or
rather its basis vectors) is not the same as identifying which
axes of the array are aligned with which space basis vectors. That
is, in some other formats, "RAS" implies something about axis
ordering: that the coordinates along the fastest axis increase in the
left to right direction, and that the second and third array
coordinates increase along the anterior and superior directions. The
NRRD format, in contrast, is careful to separate the issue of axis
ordering from the task of identifying the space in which the array is
oriented. It is possible to reorder the axes, and/or the samples
along an axis, while keeping the spatial locations of the samples
unchanged. None of this changes the identification (such as
right-anterior-superior) of the world space.
space dimension: <int>
space units: "<unit[0]>" "<unit[1]>" ... "<unit[dim-1]>"
space units: "mm" "mm" "mm"
space origin: <vector>
The format of the <vector> is as follows. The vector is delimited by "(" and ")", and the individual components are comma-separated. This is an example of a three-dimensional origin specification:
space origin: (0.0,1.0,0.3)
space directions: <vector[0]> <vector[1]> ... <vector[dim-1]>
space directions: none (1,0,0) (0,1,0) (0,0,3)
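A reader might parse these descriptors along the following lines. This is a Python sketch with hypothetical function names. Treating "none" as marking an axis that has no space direction is an assumption read off the example above, and tolerating spaces inside the parentheses is a leniency of this sketch rather than a documented rule.

```python
import re

def parse_vector(token: str):
    """Parse "(a,b,c)" into a tuple of floats."""
    token = token.strip()
    if not (token.startswith("(") and token.endswith(")")):
        raise ValueError(f"vector must be delimited by parentheses: {token!r}")
    return tuple(float(c) for c in token[1:-1].split(","))

def parse_space_directions(desc: str):
    """Parse a space directions descriptor into one entry per axis.

    Each entry is a tuple of floats, or None where the descriptor says "none".
    """
    tokens = re.findall(r"none|\([^()]*\)", desc, flags=re.IGNORECASE)
    return [None if t.lower() == "none" else parse_vector(t) for t in tokens]
```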
measurement frame: <vector[0]> <vector[1]> ... <vector[spaceDim-1]>
The measurement frame is a basic (per-array) field specification (not per-axis), which identifies a spaceDim-by-spaceDim matrix, where spaceDim is the dimension of world space (implied by space or given by space dimension). The matrix transforms (a column vector of) coordinates in the measurement frame to coordinates in world space. vector[i] gives column i of the measurement frame matrix. Just as the space directions field gives, one column at a time, the mapping from image space to world space coordinates, the measurement frame gives the mapping from measurement frame to world space coordinates, also one column at a time.
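Because the field lists the matrix one column at a time, applying it amounts to a weighted sum of the listed vectors. The following Python sketch shows that arithmetic; the name `apply_columns` is hypothetical.

```python
def apply_columns(columns, v):
    """Multiply the matrix whose columns are `columns` by the column vector `v`.

    columns[j] is column j of the matrix, as listed in the field descriptor;
    the result is sum over j of v[j] * columns[j].
    """
    n = len(columns)
    if len(v) != n or any(len(col) != n for col in columns):
        raise ValueError("expected a square matrix and a matching vector")
    return tuple(sum(columns[j][i] * v[j] for j in range(n)) for i in range(n))
```

With the columns (0,1,0), (-1,0,0), (0,0,1), the measurement frame coordinates (1,0,0) map to the first column, (0,1,0), in world space.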
Somewhat confusingly and unfortunately, there are currently no semantics defined which relate the measurement frame matrix to the kind of any axis (such as "3-vector"), even though it might seem natural. That is, the matrix defined by the "measurement frame" field is always a square matrix whose size is entirely determined by the dimension of world space, even if an axis identifies its kind as something which would seem to call for a different number of measurement frame dimensions. This is unfortunately not a clear-cut issue. Basically, it is not the job of the "measurement frame" field to reconcile an illogical combination of world space dimension and per-axis kind, which can arise with or without a measurement frame being defined. Even when there is no inconsistency, there is no graceful way of identifying which axes' kinds have coordinates which may be mapped to world space, since this is logically per-axis information, but having a per-axis measurement frame is certainly overkill. There is also the possibility that a measurement frame should be recorded for an image even though it is storing only scalar values (e.g., a sequence of diffusion-weighted MR images has a measurement frame for the coefficients of the diffusion-sensitizing gradient directions, and the measurement frame field is the logical place to store this information). Experience and time may clarify this situation.
dimension: <int>
type: <type>
<type> | Meaning | C type |
"signed char", "int8", "int8_t" | signed 1-byte integer | signed char |
"uchar", "unsigned char", "uint8", "uint8_t" | unsigned 1-byte integer | unsigned char |
"short", "short int", "signed short", "signed short int", "int16", "int16_t" | signed 2-byte integer | short |
"ushort", "unsigned short", "unsigned short int", "uint16", "uint16_t" | unsigned 2-byte integer | unsigned short |
"int", "signed int", "int32", "int32_t" | signed 4-byte integer | int |
"uint", "unsigned int", "uint32", "uint32_t" | unsigned 4-byte integer | unsigned int |
"longlong", "long long", "long long int", "signed long long", "signed long long int", "int64", "int64_t" | signed 8-byte integer | long long int |
"ulonglong", "unsigned long long", "unsigned long long int", "uint64", "uint64_t" | unsigned 8-byte integer | unsigned long long int |
"float" | 4-byte floating point | float |
"double" | 8-byte floating point | double |
"block" | An opaque chunk of memory with user-defined size (via the "block size:" specifier) |
The type descriptors used are valid type declarations in C, C99, Matlab, Microsoft-land, or some other program. Notice that "char" is not a NRRD type descriptor, to avoid potential confusion associated with the inherent signed/unsigned ambiguity of the "char" C type. If the platform has different C type names for the types described, there will have to be a disconnect between the type implied by the type descriptor, and the actual types used. In other words, the NRRD format requires a binding between the first two columns in the chart above. The third column is just what the current nrrd implementation uses on most supported platforms; this has proven surprisingly portable. In Windows, however, there is no "long long", so "__int64" is used instead.
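The binding between the first two columns of the chart can be captured in a lookup table. This Python sketch is illustrative only: the names `TYPE_INFO` and `lookup_type` are hypothetical, and the third element of each row is a format code for Python's struct module, standing in for the C type column.

```python
_TYPE_ROWS = [
    (("signed char", "int8", "int8_t"), 1, "b"),
    (("uchar", "unsigned char", "uint8", "uint8_t"), 1, "B"),
    (("short", "short int", "signed short", "signed short int",
      "int16", "int16_t"), 2, "h"),
    (("ushort", "unsigned short", "unsigned short int",
      "uint16", "uint16_t"), 2, "H"),
    (("int", "signed int", "int32", "int32_t"), 4, "i"),
    (("uint", "unsigned int", "uint32", "uint32_t"), 4, "I"),
    (("longlong", "long long", "long long int", "signed long long",
      "signed long long int", "int64", "int64_t"), 8, "q"),
    (("ulonglong", "unsigned long long", "unsigned long long int",
      "uint64", "uint64_t"), 8, "Q"),
    (("float",), 4, "f"),
    (("double",), 8, "d"),
]

# Map every equivalent descriptor to (size in bytes, struct format code).
TYPE_INFO = {alias: (nbytes, code)
             for aliases, nbytes, code in _TYPE_ROWS
             for alias in aliases}

def lookup_type(descriptor: str):
    """Return (bytes per sample, struct code) for a type descriptor."""
    key = " ".join(descriptor.lower().split())  # case and spacing insensitive
    if key == "block":
        raise ValueError("block size comes from the 'block size' field")
    try:
        return TYPE_INFO[key]
    except KeyError:
        # Note that plain "char" is deliberately absent from the table.
        raise ValueError(f"not a NRRD type descriptor: {descriptor!r}") from None
```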
As currently defined, NRRD is simply not portable to platforms on which all the types described above (second column) are not available via some C type declaration or another. We will eventually have many computers in which the minimum addressable unit is larger than 8 bits, in which case NRRD will either have to be expanded to allow types with unaddressable values (in which case a bit type might as well be added), or, some rules will have to be defined for converting a smaller type into an addressable type during data read. Having individually addressable data samples vastly simplifies the task of implementing array operations.
The block type is unlike the others. It is included for completeness in representation of the types available in the nrrd library, which uses this type to represent C structs or C++ objects: opaque chunks of memory that can be copied and permuted, but not interpreted as (or generated from) scalar values. The size of that chunk is given in the block size field specification. But block is not safe as a cross-platform general purpose type. Here are the special considerations:
block size: <int>
blocksize: <int>
encoding: <encoding>
<encoding> | Meaning | Standard detached suffix |
"raw" | The data appears on disk exactly the same as in memory, in terms of byte values and byte ordering. Produced by write() and fwrite(), suitable for read() or fread(). | ".raw" |
"txt", "text", "ascii" | Integral values are written/read as with printf()/sscanf(), and floating point values are used in a way consistent with Section 2. The individual values are separated by one or more whitespace characters (from the C string " \t\n\r\v\f"). No line terminations are required anywhere. Their presence is no different than any other kind of whitespace. | ".txt" |
"hex" | The data is raw, but written with two (case-insensitive) hexadecimal characters per byte. White space characters (as defined above) are ignored on reading. Writers should put a line termination after every 70 characters, and after the last line of numbers. | ".hex" |
"gz", "gzip" | The data is raw, but compressed with the gzip program. Implementation and specification is available from http://www.gzip.org/, but the nrrd library actually uses the zlib library available from http://www.gzip.org/zlib/. However, the compressed data must start with the gzip binary header, the same as is produced/read by the gzip/gunzip command-line tools. Compressed data starting with only the zlib binary header (from the underlying library) is not allowed. | ".raw.gz" |
"bz2", "bzip2" | The data is raw, but compressed with the bzip2 program. Analogous to the gzip encoding, the compressed data must start with the same binary header as produced by the command-line bzip2 program, to ensure inter-operability with it. Implementation and information is available from http://sources.redhat.com/bzip2/. | ".raw.bz2" |
The formatting for hex is mostly the same as the ASCIIHexDecode and the ASCIIHexEncode filters of PostScript, but they are not identical: PostScript allows multiple filters (data can be run-length encoded as well as hex-encoded), null ('\0') characters count as whitespace, and the end of the data is explicitly indicated by a ">". However, in combination with the line skip specifier, it is usually possible to extract 8-bit image data from PostScript files, assuming you understand enough PostScript to determine the image dimensions.
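The hex encoding rules in the table (two hexadecimal characters per byte, whitespace ignored on reading, a line termination after every 70 characters and after the last line) can be sketched as below. The function names are hypothetical, and this Python sketch covers only NRRD's hex encoding, not the PostScript filters.

```python
def hex_encode(data: bytes, width: int = 70) -> str:
    """Write bytes as hex digits, ending a line after every `width` characters."""
    digits = data.hex()
    lines = [digits[i:i + width] for i in range(0, len(digits), width)]
    # Every line, including the last, gets a line termination.
    return "".join(line + "\n" for line in lines)

def hex_decode(text: str) -> bytes:
    """Read hex digits (either letter case), ignoring all whitespace."""
    return bytes.fromhex("".join(text.split()))
```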
The "standard detached suffix" is the filename suffix that should be used by NRRD writers producing a separate data file in conjunction with a detached header. This is most important for the compression encodings, since the stand-alone programs expect certain suffixes when decompressing (".gz" and ".bz2" for gzip and bzip2, respectively), and these suffixes are stripped after decompression. Because the result of decompressing compressed data from a NRRD is always raw (as opposed to compressed ascii text), the suffix for the detached file includes ".raw" as well. NRRD readers, however, should not care about the filename suffix of a detached data file.
Data file contents remaining after all data has been read should be ignored. This sanctions the strategy of using a detached nrrd header to refer to some smaller chunk of data in a separate larger data file. Data before the region of interest can be passed over with line skip and/or byte skip.
See the byte skip specification for information about how compression encoding changes its meaning.
There is complete orthogonality between the encoding of the data, and
whether the header is attached or detached. The header is never
compressed; it is necessarily straight ASCII text.
endian: <endian>
<endian> | Meaning | Who |
"little" | Most significant bytes are at higher addresses ("little end first") | Intel and compatible |
"big" | Most significant bytes are at lower addresses ("big end first") | Everyone else |
The convention with NRRD files is that non-ascii data
should reflect the byte ordering of the current platform.
There is no preference for one endian or the other in NRRD files, and
NRRD writers should never have to worry about fixing endianness, only
recording it when necessary. Fixing endianness is the responsibility
of the NRRD reader. This way, NRRD readers and writers used within
one platform never pay the overhead of fixing endianness. That
overhead should only be incurred when going between platforms with
different endiannesses.
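This division of labor might look as follows in practice. The sketch is in Python with hypothetical function names, using struct format codes to stand for the sample type: the writer only records the byte order of the current platform, and the reader decodes according to whatever the header recorded.

```python
import struct
import sys

def endian_field() -> str:
    """Writer side: record the byte order of the current platform."""
    return f"endian: {sys.byteorder}"  # sys.byteorder is "little" or "big"

def decode_raw(raw: bytes, code: str, file_endian: str):
    """Reader side: decode raw samples using the byte order from the header.

    `code` is a struct format character such as "H" (unsigned 2-byte integer).
    """
    prefix = {"little": "<", "big": ">"}[file_endian.lower()]
    size = struct.calcsize(prefix + code)
    count = len(raw) // size
    return struct.unpack(f"{prefix}{count}{code}", raw[:count * size])
```

When the recorded byte order matches the platform, decoding involves no swapping; the cost is only paid when the file came from a platform with the other endianness.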
content: <string>
This field is intended as the place to store a very concise textual description of the information in the array, similar to the second line (the "title") of a VTK file format header. The nrrd library, for instance, uses content to store a textual representation of a summary of the operations applied to a nrrd. If nrrdSlice() slices a nrrd with content "engine" along axis 0 at position 50, then the content of the result will be "slice(engine,0,50)".
min: <min>
max: <max>
old min: <min>
oldmin: <min>
For example, if a floating point nrrd with values ranging from 0.0 to 1.0 is quantized to 8 bits, old min will be 0.0. This is not the middle of the range of values that were all mapped to the lowest output integer, but the lowest of those values.
Infinite values are not valid; "nan" means "don't know".
old max: <max>
oldmax: <max>
data file: <filename>
datafile: <filename>
data file: <format> <min> <max> <step> [<subdim>]
datafile: <format> <min> <max> <step> [<subdim>]
data file: LIST [<subdim>]
datafile: LIST [<subdim>]
This is always optional, but it is the only means of distinguishing
an attached from a detached NRRD header. When it is present, it is
interpreted according to Section 3, and the
header is considered finished at the EOF, or at the blank line
following the last field, whichever comes first. If the header ends
with a blank line, any data after the blank line is ignored. If
this field is not present, the data is assumed to be in the same file
as the header, following the blank line marking the end of the header.
line skip: <skip>
lineskip: <skip>
When used in combination with byte skip, the line skipping is done before the byte skipping. The meaning of line skip is not affected by the encoding field.
byte skip: <skip>
byteskip: <skip>
If skip is greater than or equal to zero, it tells how many bytes to skip in a data file in order to get to the beginning of the data. By definition, the bytes are skipped according to the action of fgetc(). When used in combination with line skip, the byte skipping is done after the line skipping. When this field does not appear, skip is taken to be zero. As an idiom copied from the MetaImage file format, the value of skip can be -1. This is valid only with raw encoding. The action of this byte skip is to fseek() backwards from the end of the data file, to the beginning of the data. The distance to seek is calculated from the nrrd type and axis sizes. This is a useful trick for getting at binary data in other formats with unknown (or variable) length binary headers, such as DICOM, TIFF, and BMP, but only if the data is uncompressed, and only if the end of the data is contiguous with the end of the file (which can fail to be the case in DICOM and TIFF).
If skip is -1, the action of lineskip is entirely moot.
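For raw encoding, the ordering of the two skips and the special value -1 can be sketched as follows. The names raw_data_size and locate_raw_data are hypothetical, and the number of bytes per sample is passed in directly rather than derived from the type field.

```python
import io
import math

def raw_data_size(sizes, bytes_per_sample):
    """Total number of data bytes implied by the axis sizes and sample type."""
    return math.prod(sizes) * bytes_per_sample

def locate_raw_data(f, line_skip, byte_skip, data_bytes):
    """Position binary file object f at the start of raw-encoded data."""
    if byte_skip == -1:
        # Seek backwards from the end of the file; line skip is moot.
        f.seek(-data_bytes, io.SEEK_END)
        return
    if byte_skip < -1:
        raise ValueError("byte skip must be -1 or non-negative")
    for _ in range(line_skip):  # lines are skipped first
        f.readline()
    f.read(byte_skip)           # then bytes are read and discarded
```

With byte skip -1 the length of any preceding header never needs to be known, which is what makes the trick useful for formats with variable-length binary headers.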
The interpretation of byte skip depends on whether or not the encoding used is a form of compression. The only compressions currently supported are gzip and bzip2. In uncompressed encodings, the byte skipping is done just like the line skipping: within the data file, so as to locate the beginning of the data, and prior to the decoding of any data. In compressed encodings, however, the line skipping is done first, and then the decompression begins. The byte skipping is done within the stream of decompressed data.
The decision to skip bytes but not lines in the decompressed stream
is motivated by the conceptual difference between ASCII and
binary headers. One reason to write headers in ASCII is to make them
human readable, so they probably shouldn't be compressed to begin
with. Also, ASCII headers (such as in PNM images) often allow
multiple lines of optional comments, so the number of lines to skip
has to be determined on a per-file basis by looking at the
(uncompressed) file, at which point the data might as well be written
out as a NRRD file. In contrast, binary headers are very often fixed
length, and not human readable, which means that when the header and
data are compressed together, the beginning of the data can be easily
found via a byte skip offset. This also applies to large
datasets written by FORTRAN programs, for which even "raw" data can be
preceded by a four-byte representation of the data length.
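The order of operations for a compressed encoding can be sketched with Python's standard gzip module. The function name open_gzip_data is hypothetical; the sketch skips lines in the file as stored, then starts decompression, then discards bytes from the decompressed stream.

```python
import gzip
import io

def open_gzip_data(f, line_skip, byte_skip):
    """Return a stream positioned at the start of gzip-encoded data.

    f is a binary file object positioned at the start of the data file.
    """
    if byte_skip < 0:
        raise ValueError("negative byte skip is valid only with raw encoding")
    for _ in range(line_skip):
        f.readline()                        # skipped in the stored file
    stream = gzip.GzipFile(fileobj=f, mode="rb")
    stream.read(byte_skip)                  # skipped in the decompressed stream
    return stream
```

For example, a fixed-length binary header that was compressed together with the data can be stepped over by setting byte skip to the header length.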
number: <string>
sample units: <string> sampleunits: <string>
sizes: <size[0]> <size[1]> ... <size[dim-1]>
spacings: <space[0]> <space[1]> ... <space[dim-1]>
Because there must be one spacing for each axis, spacings must be given for axes which don't logically have a spatial component, such as the RGB axis of color image data, which is usually axis 0. Rather than invent a value (such as 1.0) for sample spacing where no value is sensible, a spacing value of "nan" should be used instead. In addition, "nan" can represent the fact that spacing information would be sensible here, but simply isn't known. Of course, if spacings are NaN for every axis, the field probably shouldn't be in the header.
The meaning and interpretation of the spacings field is
independent of the centers, axis mins and axis
maxs fields, even though mutually incompatible settings are
possible.
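A minimal sketch of reading a spacings descriptor, with "nan" marking axes that have no meaningful or known spacing, might look like this. The function names are hypothetical and the dimension is assumed to be known from the dimension field.

```python
import math

def parse_spacings(descriptor, dimension):
    """Parse the spacings descriptor into one float per axis."""
    values = [float(token) for token in descriptor.split()]
    if len(values) != dimension:
        raise ValueError("there must be exactly one spacing per axis")
    return values

def axes_with_known_spacing(spacings):
    """Indices of axes whose spacing is not NaN."""
    return [axis for axis, s in enumerate(spacings) if not math.isnan(s)]
```

For an RGB image with the color components on axis 0, a descriptor such as "nan 0.5 0.5" leaves axis 0 without a spacing and gives the two spatial axes a spacing of 0.5.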
thicknesses: <thickness[0]> <thickness[1]> ... <thickness[dim-1]>
It is likely the case that only one axis has this information
associated with it, so "nan" should be used for the other
axes. The compelling reason to make thickness a per-axis field is
that if it were a basic field, one would need a separate basic field
to identify which axis is the slice axis, and this information would
become invalid if the array axes were permuted.
axis mins: <min[0]> <min[1]> ... <min[dim-1]> axismins: <min[0]> <min[1]> ... <min[dim-1]>
Infinite values are not valid as axis mins. Any non-infinite values, including zero, are valid. As with spacings information, the use of "nan" as an axis min value is probably preferable to inventing one where no value is meaningful or known.
Presence of the axis mins field does not require presence of
the axis maxs field, although it is often useful for these to
appear together. However, using the axis mins field alone
can emulate the ORIGIN field of the VTK file format header.
axis maxs: <max[0]> <max[1]> ... <max[dim-1]> axismaxs: <max[0]> <max[1]> ... <max[dim-1]>
The settings of axis mins and axis maxs would seem
to imply a value for spacings, but this also depends on the
values of centers. It is possible to save mutually incompatible settings of
these fields in a NRRD header, but it is not the job
of the NRRD reader to ensure their consistency, only to check that the
individual values in isolation are sensible (for instance, an axis max
can't be infinite).
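To show how the spacing implied by axis mins and axis maxs depends on centering, here is a sketch under a common convention: with cell centering the range is divided among all samples, and with node centering among the intervals between samples. This convention is an assumption of the example, not something stated in this section, and the function names are hypothetical.

```python
import math

def is_valid_axis_bound(value):
    """Individual check on an axis min or max: NaN is fine, infinity is not."""
    return not math.isinf(value)

def implied_spacing(axis_min, axis_max, size, center):
    """Spacing implied by an axis range, assuming the convention above."""
    if center == "cell":
        return (axis_max - axis_min) / size
    if center == "node":
        if size < 2:
            raise ValueError("node centering needs at least two samples")
        return (axis_max - axis_min) / (size - 1)
    return math.nan  # centering unknown, so no spacing can be inferred
```

The same min, max, and size therefore give different spacings under the two centerings, which is why a reader checks each value in isolation rather than trying to reconcile them.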
centers: <center[0]> <center[1]> ... <center[dim-1]> centerings: <center[0]> <center[1]> ... <center[dim-1]>
<cent[i]> | Meaning | Examples |
"cell" | The location of the sample is centered in the interior of the grid element. | Histograms, scatterplots, images for mip-maps, images in contexts in which a pixel can be correctly thought of as "a little square", volumes as a grid of cuberilles, in which the logical element is a cube with a single value at the center. |
"node" | The location of the sample is at the boundary between grid elements. | Volumes as a grid of "voxels", in which the logical element is a cube with a value at each of its eight corners. |
"???" or "none" | Centering information for this axis is either meaningless or unknown | Any non-spatial axis, such as a short axis for vector or tensor components, preceding all the spatial axes. |
labels: "<label[0]>" "<label[1]>" ... "<label[dim-1]>"
As shown above, each label is delimited by double quotes. Within each label, double quotes may be included by escaping them (\"), but no other form of escaping is supported. For axes with no labels, use a quoted empty string ("").
There is no fixed limit on how long the line containing the
labels field can be.
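The quoting rule for labels can be sketched as a small parser. The function name parse_labels is hypothetical; the only escape it recognizes is \" for a literal double quote, and a backslash followed by anything else is kept as written.

```python
def parse_labels(descriptor):
    """Split a labels descriptor into a list of label strings."""
    labels = []
    i, n = 0, len(descriptor)
    while i < n:
        if descriptor[i].isspace():
            i += 1
            continue
        if descriptor[i] != '"':
            raise ValueError("each label must start with a double quote")
        i += 1
        chars = []
        while True:
            if i >= n:
                raise ValueError("unterminated label")
            if descriptor[i] == "\\" and i + 1 < n and descriptor[i + 1] == '"':
                chars.append('"')   # escaped quote inside a label
                i += 2
            elif descriptor[i] == '"':
                i += 1              # closing quote
                break
            else:
                chars.append(descriptor[i])
                i += 1
        labels.append("".join(chars))
    return labels
```

An axis with no label shows up as an empty string in the result, matching the quoted empty string in the header.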
units: "<unit[0]>" "<unit[1]>" ... "<unit[dim-1]>"
kinds: <kind[0]> <kind[1]> ... <kind[dim-1]>
The possible values for <kind[i]> are as follows. Prior to the release of Teem 1.9, numerous kinds were added; future kinds will be added with more skepticism and caution.
As NRRD files can represent sampled functions, images, fields, or maps of various types, one has to keep in mind the difference between axes which represent the domain of the function (or, an independent variable), versus the range of the function (or, a dependent variable). The only kinds below which represent a domain are "domain", "space", and "time".
<kind[i]> | Required axis size | Meaning |
"domain" | (none) | The samples along this axis are positioned along some domain, as opposed to being the components of coordinates of non-scalar quantities. Basically: "it makes sense to resample or blur along this axis". |
"space" | (none) | Like domain, but this is a spatial domain |
"time" | (none) | Like domain, but this is a temporal domain |
"list" | (none) | The samples along this axis are coordinates or coefficients of non-scalar quantities, such as a 20-dimensional vector. Basically: "it makes no sense to resample or blur along this axis". |
"point" | (none) | Specialized form of list: samples along this axis are coordinates of a point. |
"vector" | (none) | Specialized form of list: samples along this axis are coefficients of a vector, likely a contravariant vector. |
"covariant-vector" | (none) | Specialized form of list: samples along this axis are coefficients of a covariant vector, such as a gradient of a scalar field. |
"normal" | (none) | Specialized form of list: samples along this axis are coefficients of a nominally unit-length covariant vector (though no efforts are made at the level of the NRRD file reader/writer to verify or assert this). |
"stub" | 1 | The single sample on this axis is just a stub. |
"scalar" | 1 | The single sample on this axis is explicitly indicated as a scalar value. |
"complex" | 2 | The two samples along this axis are the 2 components
of a complex value: real imag |
"2-vector" | 2 | Any 2-vector |
"3-color" | 3 | A 3-vector whose components are color values |
"RGB-color" | 3 | Specialized form of "3-color": samples are red, green, and blue values, in that order |
"HSV-color" | 3 | Specialized form of "3-color": samples are hue, saturation, and value values, in that order. |
"XYZ-color" | 3 | Specialized form of "3-color": samples are the X, Y, and Z coefficients (in that order) of the CIE XYZ colorspace |
"4-color" | 4 | Any 4-vector of colors (ordering of R, G, B, and A are not imposed) |
"RGBA-color" | 4 | Specialized form of "4-color": the values are red, green, blue, and alpha, in that order. |
"3-vector" | 3 | Any 3-vector |
"3-gradient" | 3 | A 3-vector which is known to be covariant |
"3-normal" | 3 | A (covariant) 3-vector which is assumed to have unit L2 norm |
"4-vector" | 4 | Any 4-vector |
"quaternion" | 4 | The (w,x,y,z) coefficients of a quaternion, in that order, with no normalization assumed, where w is the real coefficient, and x,y,z are imaginary. |
"2D-symmetric-matrix" | 3 | Unique components of 2D symmetric matrix, in order: Mxx Mxy Myy |
"2D-masked-symmetric-matrix" | 4 | Unique components of 2D symmetric matrix, preceeded by a mask value
to indicate validity of the matrix: mask Mxx Mxy Myy |
"2D-matrix" | 4 | Components of a 2D matrix: Mxx Mxy Myx Myy |
"2D-masked-matrix" | 4 | Mask value with components of 2D matrix: mask Mxx Mxy Myx Myy |
"3D-symmetric-matrix" | 6 | Unique components of a 3D symmetric matrix: Mxx Mxy Mxz Myy Myz Mzz |
"3D-masked-symmetric-matrix" | 7 | Mask value with unique components of 3D symmetric matrix: mask Mxx Mxy Mxz Myy Myz Mzz |
"3D-matrix" | 9 | Components of 3D matrix: Mxx Mxy Mxz Myx Myy Myz Mzx Mzy Mzz |
"3D-masked-matrix" | 10 | Mask value with components of 3D matrix: mask Mxx Mxy Mxz Myx Myy Myz Mzx Mzy Mzz |
"???", "none" | (none) | Kind information for this axis is unknown or not representable |
One issue that arises with storing vectors and matrices in NRRDs is identifying the coordinate frame in which their coefficients are measured. The way to do this in NRRD is to use the measurement frame field in combination with the space directions field, which relates the measurement frame to the orientation of the image as a whole. Without the measurement frame field, there are no semantics for what the measurement frame is (it isn't safe to assume that it's the same as the image orientation).
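Returning to the table of kinds, the "Required axis size" column lends itself to a simple consistency check. The sketch below covers only a subset of the rows, treats kinds as case insensitive, and uses hypothetical names (REQUIRED_SIZE, check_kind_sizes, domain_axes); it is not how the nrrd library performs this check.

```python
# Required axis sizes for a subset of the kinds in the table above.
REQUIRED_SIZE = {
    "stub": 1, "scalar": 1, "complex": 2, "2-vector": 2,
    "3-color": 3, "rgb-color": 3, "3-vector": 3,
    "4-color": 4, "rgba-color": 4, "quaternion": 4,
    "3d-symmetric-matrix": 6, "3d-masked-symmetric-matrix": 7,
    "3d-matrix": 9,
}

# Kinds that describe a domain (independent variable) axis.
DOMAIN_KINDS = {"domain", "space", "time"}

def check_kind_sizes(kinds, sizes):
    """Return a list of messages for axes whose size contradicts their kind."""
    problems = []
    for axis, (kind, size) in enumerate(zip(kinds, sizes)):
        required = REQUIRED_SIZE.get(kind.lower())
        if required is not None and size != required:
            problems.append(
                f"axis {axis}: kind {kind} requires size {required}, got {size}"
            )
    return problems

def domain_axes(kinds):
    """Indices of axes along which resampling or blurring makes sense."""
    return [axis for axis, kind in enumerate(kinds) if kind.lower() in DOMAIN_KINDS]
```

Kinds with no required size, and kinds outside the subset listed here, are simply not checked by this sketch.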