VTK File Formats
The file format used by ParaView and VTK is quite simple, but its many subformats can be confusing. Common to all formats:
- A right-handed Cartesian coordinate system is used.
- Both coordinate points and cells (= the area between the points) are defined in the file, so data values can be assigned either to points or to cells.
The data values can be integer or floating point numbers (C variable types: int32 or float), of any dimension (scalars, vectors, tensors).
ASCII floating point numbers are written with a decimal point (not comma!); the scientific notation (like "3.14E-3") can also be used. Binary values are written in "C style", i.e. without the additional data that Fortran prepends/appends.
An arbitrary number of datasets can be assigned to the points or the cells, but since a file can contain only one coordinate grid, each dataset must have data values for every coordinate point or grid cell. Datasets on different grids must be written in different files.
This is not entirely true in the case of the new XML file types. Here each file may contain "one or more Piece elements, each describing a portion of the dataset". Each piece comes with its own definition of associated points and cells.
Animations are possible by giving the filenames consecutive numbers (at the end of the name, before the extension), e.g. "file001.vtk", "file002.vtk" and so on. These files can then be played as an animation.
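The numbering scheme can be produced with zero-padded counters. The following is a minimal sketch; the helper name series_names and its parameters are made up for this illustration, and three digits are assumed only because the example names above use them.

```python
def series_names(stem: str, count: int, digits: int = 3, ext: str = ".vtk") -> list[str]:
    """Build consecutive, zero-padded file names such as file001.vtk."""
    # The counter sits at the end of the name, directly before the extension.
    return [f"{stem}{i:0{digits}d}{ext}" for i in range(1, count + 1)]


names = series_names("file", 3)
print(names)
```

Zero padding keeps the files in the right order when they are sorted alphabetically.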
Old ("Legacy") and New (XML) Format
There are two generations of file types.
The old *.vtk file format uses ASCII header lines that separate the data sections. The data itself can be ASCII or binary.
The new XML-based formats (the extension depends on the grid type, e.g. *.vtp for polygonal data, *.vtu for unstructured grid) enclose the data sections in XML tags. The data can also be stored as ASCII or binary, but to get valid XML files, the binary data has to be base64-encoded. Additionally, the binary data can be compressed with zlib, so the file can be smaller than with the raw binary format even though the compressed data is base64-encoded.
Both formats have their advantages and disadvantages. For example the new XML format does not seem to have support for color lookup tables, but supports parallel file formats.
There are the following grid types:
- Unstructured grid: 2D or 3D grid; for every grid point all three coordinates and for each grid cell all constituent points and the cell shape are given. XML file extension: *.vtu
- Polygonal data: 2D grid; like the unstructured grid, but with flat polygons only and no polyhedra. Especially suited for maps (topography). XML file extension: *.vtp
- Rectilinear grid: 3D grid; the grid cells are cuboids, so only the steps along the coordinate axes have to be given, but not the individual point coordinates or the connectivity.
- Structured grid: 3D grid; here, all point coordinates are given, but the connectivity is omitted.
- Structured points: Like the rectilinear grid, but the points are equidistant, so only the origin and the spacing have to be given, not the point coordinates.
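To illustrate why origin and spacing are enough for structured points, the implied point coordinates can be reconstructed as in the sketch below. The function name structured_points is hypothetical, and the point order (x varying fastest, then y, then z) is stated here as an assumption about the usual VTK convention.

```python
def structured_points(dims, origin, spacing):
    """Reconstruct the implicit point coordinates of a structured-points grid."""
    nx, ny, nz = dims
    ox, oy, oz = origin
    dx, dy, dz = spacing
    # Assumed ordering: x runs fastest, then y, then z.
    return [
        (ox + i * dx, oy + j * dy, oz + k * dz)
        for k in range(nz)
        for j in range(ny)
        for i in range(nx)
    ]


pts = structured_points((3, 2, 2), (0.0, 0.0, 0.0), (0.5, 1.0, 2.0))
print(len(pts), pts[1], pts[-1])
```

A rectilinear grid works the same way, except that each axis carries its own list of coordinates instead of a constant spacing.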
Use the legacy ASCII format when you are writing a program that produces output for ParaView, because it is the simplest format. It also lets you separate the grid from the data values: each section is only preceded by a header, not enclosed in tags, so you can define your grid once and then append different datasets to it.
If you need a different file format, do the conversion using the VTK toolkit; e.g. write a Tcl script. You can then convert your simple ASCII *.vtk file into a full-featured binary compressed XML file in one single step.
Example Files in Detail
The file formats are documented in this file. However, this description is not always clear, so here are some example files with explanations.
Common To All Formats
All file formats have the following sections:
- Points: The coordinates of the grid points; each point has three coordinates, so there are 3n values for n points.
- Cells: The shapes of the cells, expressed by grid point indices. The indices are zero-based: if a cell is defined by the first, the sixth, the seventh and the tenth grid point, the numbers are 0, 5, 6, 9.
- Cell types: A type is assigned to each cell, so there are as many cell type values as cells.
- Point data: A value can be assigned to each grid point. This can be a scalar value or a vector.
- Cell data: A value (scalar, vector) can likewise be assigned to each cell.
ASCII vtk Format
Each section of the file starts with one line describing the type of section and the amount of data that follows. The data is written as ASCII numbers, separated by spaces or line breaks.
- POINTS 27 float: 27 grid points follow (each with 3 coordinates = 81 floating point values)
- CELLS 8 72: 8 cells follow, in total 72 values (for each cell: first the number of values that define the cell (here: 8) and then the 8 grid point indices)
- CELL_DATA 8: Data for 8 cells follow. Multiple data sections can follow; each one begins with a line defining the data type and the name of the dataset. SCALARS my_scalar_values float means that a dataset with scalar values and the name “my_scalar_values” follows.
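A small writer makes the section headers concrete. The sketch below builds a grid of 2 x 2 x 2 hexahedra, which yields exactly the POINTS 27, CELLS 8 72 and CELL_DATA 8 lines discussed above. Several things in it are not described in this text and should be read as assumptions taken from the legacy format as commonly documented: the four file header lines, the CELL_TYPES section with type 12 for hexahedra, the trailing component count in the SCALARS line, and the LOOKUP_TABLE default line. The function name is made up.

```python
def hexahedral_grid_vtk(n=2):
    """Return the text of a legacy ASCII .vtk file holding n*n*n hexahedra."""
    m = n + 1  # number of points along each axis

    def idx(i, j, k):
        # Zero-based point index, x running fastest.
        return i + m * (j + m * k)

    points = [(i, j, k) for k in range(m) for j in range(m) for i in range(m)]
    cells = [
        [idx(i, j, k), idx(i + 1, j, k), idx(i + 1, j + 1, k), idx(i, j + 1, k),
         idx(i, j, k + 1), idx(i + 1, j, k + 1), idx(i + 1, j + 1, k + 1), idx(i, j + 1, k + 1)]
        for k in range(n) for j in range(n) for i in range(n)
    ]
    lines = [
        "# vtk DataFile Version 3.0",  # assumed header lines (not covered above)
        "hexahedral example grid",
        "ASCII",
        "DATASET UNSTRUCTURED_GRID",
        f"POINTS {len(points)} float",
    ]
    lines += [f"{x} {y} {z}" for x, y, z in points]
    # Second number: total count of values, i.e. (1 + 8) per cell.
    lines.append(f"CELLS {len(cells)} {sum(len(c) + 1 for c in cells)}")
    lines += [" ".join(str(v) for v in [len(c)] + c) for c in cells]
    lines.append(f"CELL_TYPES {len(cells)}")
    lines += ["12"] * len(cells)  # 12 = hexahedron (assumed type code)
    lines.append(f"CELL_DATA {len(cells)}")
    lines.append("SCALARS my_scalar_values float 1")
    lines.append("LOOKUP_TABLE default")
    lines += [str(float(c)) for c in range(len(cells))]
    return "\n".join(lines) + "\n"


text = hexahedral_grid_vtk()
print(text[:200])
```

Because every section is introduced only by its header line, further SCALARS blocks can simply be appended to the returned text.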
Binary vtk Format
Similar to the ASCII vtk format, except that the data is written in binary. For example, a line with POINTS 27 float is followed by 324 bytes, because each of the 81 floating point values requires 4 bytes.
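The byte count can be checked with a short sketch. The helper name pack_points is hypothetical, and the big-endian byte order (">" in the format string) is an assumption about what readers of the legacy binary format expect; it is not stated in this text.

```python
import struct


def pack_points(coords):
    """Pack a flat list of coordinates as raw 4-byte floats, C style."""
    # ">" selects big-endian byte order (assumed for legacy binary files);
    # "f" is a 4-byte float, so no padding or record markers are added.
    return struct.pack(f">{len(coords)}f", *coords)


payload = pack_points([0.0] * 81)  # 27 points with 3 coordinates each
print(len(payload))
```

The payload would be written directly after the POINTS 27 float line, with the file opened in binary mode.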
ASCII vtu Format
This format is based on XML, so the data sections are not preceded by lines defining the content but enclosed in the corresponding XML tags. The data content is quite similar to the vtk format; the one exception is the definition of the cells. In the vtk format, each cell is defined by a line like 8 0 9 3 12 1 10 4 13; the first value gives the number of grid points that delimit the cell, and the following values are the indices of these grid points (because the first value is 8, 8 indices follow). In the vtu format, the leading count is omitted, so each entry looks like 0 9 3 12 1 10 4 13, and a separate data array holds the corresponding offset values, i.e. the position in the index list at which each cell ends; if each cell is made of 8 points, the offsets array looks like 8 16 24 32 40 ....
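The conversion from the vtk cell list to the two vtu arrays can be sketched as follows. The function name legacy_cells_to_vtu is made up for this illustration; the input is the flat list of numbers found in a CELLS section.

```python
def legacy_cells_to_vtu(flat):
    """Split a legacy CELLS list into vtu-style connectivity and offsets."""
    connectivity, offsets = [], []
    pos = 0
    while pos < len(flat):
        n = flat[pos]  # leading count of this cell
        connectivity.extend(flat[pos + 1 : pos + 1 + n])
        offsets.append(len(connectivity))  # end position of this cell
        pos += n + 1
    return connectivity, offsets


# One 8-point cell followed by one 4-point cell.
conn, offs = legacy_cells_to_vtu([8, 0, 9, 3, 12, 1, 10, 4, 13, 4, 0, 1, 2, 3])
print(conn, offs)
```

Cells with different point counts are handled naturally, since each offset is simply the running length of the connectivity list.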
Binary Inline vtu Format
The XML structure is the same as in the ASCII vtu format. The data, however, cannot be stored the same way as in the binary vtk format, because XML forbids special characters. So the binary data has to be base64-encoded (for sample code see here, or encode/decode online). Additionally, a header is prepended to the data; it is a 32 bit integer containing the data length (in bytes). This header is encoded separately. So in pseudocode, the data output would look like this (always without spaces or line breaks!):
int32 = length( data );
output = base64-encode( int32 ) + base64-encode( data );
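The pseudocode translates almost literally into Python. This is a minimal sketch: the function name encode_inline is hypothetical, and little-endian byte order is assumed for the header (it would have to match the byte order declared in the file).

```python
import base64
import struct


def encode_inline(data: bytes) -> bytes:
    """Encode one data array for a binary inline DataArray element."""
    # 32-bit length header, little-endian (assumed byte order).
    header = struct.pack("<i", len(data))
    # Header and data are base64-encoded separately, then concatenated.
    return base64.b64encode(header) + base64.b64encode(data)


payload = struct.pack("<3f", 1.0, 2.0, 3.0)  # 12 bytes of sample data
encoded = encode_inline(payload)
print(encoded.decode("ascii"))
```

Because the 4-byte header is encoded on its own, it always occupies the first 8 characters of the output, including two padding characters.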
Binary Appended vtu Format
All the data can also be appended to the data format definitions. In this case, the DataArray tags contain offsets into the base64-encoded data section. The data is encoded the same way as in the inline format, i.e. each entry is base64-encoded and carries an Int32 header containing the data length. Note that the appended data section begins with an underscore.
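The bookkeeping for the offsets can be sketched as below. The function name appended_section is made up, little-endian headers are assumed, and the offsets are counted in encoded characters starting directly after the underscore; that counting convention is an interpretation of the description above and should be checked against a file written by VTK itself.

```python
import base64
import struct


def appended_section(arrays):
    """Encode several arrays for an appended data section.

    Returns the offset of each array and the section content,
    which starts with an underscore.
    """
    offsets, chunks, pos = [], [], 0
    for data in arrays:
        chunk = base64.b64encode(struct.pack("<i", len(data))) + base64.b64encode(data)
        offsets.append(pos)  # assumed: counted from the character after "_"
        chunks.append(chunk)
        pos += len(chunk)
    return offsets, b"_" + b"".join(chunks)


offsets, section = appended_section([b"\x00" * 12, b"\x01" * 6])
print(offsets, section)
```

Each offset would then go into the offset attribute of the corresponding DataArray tag.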
Compressed Binary vtu Formats
(Update by Marcus 17.03.2010)
The "binary" vtu formats also allow the data to be compressed before base64 encoding. The compression can be applied to a complete set of data. However, it is also possible to split the data into blocks of equal size (the last block might be smaller) and compress those blocks individually.
The only compression method currently available is based on zlib. The use of compression must be indicated by adding a compressor attribute to the VTKFile element of the XML tree, e.g. compressor="vtkZLibDataCompressor".
As in the base64-encoded case, an additional header is required. Header and compressed blocks are to be encoded separately and written consecutively. The header consists of N+3 values of type Uint32/Int32, where N represents the number of blocks. The meaning of the individual values is as follows:
- [value 1]: number of blocks
- [value 2]: size of a block before compression
- [value 3]: size of the last block before compression, since this block might be smaller
- [values 4 to N+3]: size of the k-th block after compression, k = 1, ..., N
See Kitware Public Wiki for a full description. You can also check vtkXMLDataParser::ReadCompressionHeader() to get an idea :) Pseudocode for the case of encoding all data at once, i.e. one block, could look like this:
zdata = compress( data );
header = [ 1, length( data ), length( data ), length( zdata ) ];
output = base64-encode( header ) + base64-encode( zdata );
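A Python sketch of the same scheme, generalized to N blocks, might look like this. The function name encode_compressed is hypothetical. Assumed here and not confirmed by this text: the header values are little-endian unsigned 32-bit integers, and all compressed blocks are concatenated and base64-encoded as one stream after the separately encoded header. The input must not be empty.

```python
import base64
import struct
import zlib


def encode_compressed(data: bytes, block_size: int) -> bytes:
    """Compress data block-wise with zlib and prepend the N+3 value header."""
    blocks = [data[i : i + block_size] for i in range(0, len(data), block_size)]
    zblocks = [zlib.compress(block) for block in blocks]
    # [number of blocks, block size, size of last block, compressed sizes...]
    header = [len(blocks), block_size, len(blocks[-1])] + [len(z) for z in zblocks]
    raw_header = struct.pack(f"<{len(header)}I", *header)  # assumed byte order
    # Header and compressed data are base64-encoded separately.
    return base64.b64encode(raw_header) + base64.b64encode(b"".join(zblocks))


data = bytes(range(256)) * 4           # 1024 bytes of sample data
out = encode_compressed(data, 400)     # three blocks: 400, 400, 224 bytes
one = encode_compressed(data, len(data))  # the one-block case from the pseudocode
print(len(out), len(one))
```

Calling the function with block_size equal to the data length reproduces the one-block header of the pseudocode: 1, length( data ), length( data ), length( zdata ).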
Example code showing how to use zlib can be found here. Compilation and usage (convert from compressed to uncompressed base64-encoded data):
$ cc -o zpipe zpipe.c -lz
$ cat compressed_binary_vtu_data | base64-decode | zpipe -d | base64-encode
Parallel File Formats
One big advantage of VTK is the ability to split large datasets into several files and process them on a parallel computer. There, I/O is usually the bottleneck when several processors have to access a single large file. When the dataset is split up, every processor can read its data independently of the others.