---
title: 'XML annotation format'
linkTitle: 'XML annotation format'
weight: 22
---
When you download annotations from the Computer Vision Annotation Tool (CVAT),
you can choose one of several data formats. This document describes the XML annotation format.
Each format has a version of the form X.Y (e.g. 1.0). In general, the major version (X) is incremented when the data format
changes incompatibly, and the minor version (Y) is incremented when the data format is slightly modified
(e.g. one or several extra fields are added to the meta information).
This document describes the changes in every version of the XML annotation format.
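The versioning rule above can be sketched as a small check. This is only an illustration: the helper names `parse_version` and `is_compatible` are hypothetical, and treating "same major version" as the compatibility test is an assumption drawn from the description, not something CVAT provides.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Split an 'X.Y' version string into (major, minor) integers."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def is_compatible(file_version: str, reader_version: str) -> bool:
    """Assumed policy: a differing major version (X) means incompatible
    changes; a differing minor version (Y) only means small additions."""
    file_major, _ = parse_version(file_version)
    reader_major, _ = parse_version(reader_version)
    return file_major == reader_major


if __name__ == "__main__":
    print(is_compatible("1.0", "1.1"))
    print(is_compatible("2.0", "1.1"))
```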
## Version 1.1
There are currently two different formats, one for image tasks and one for video tasks.
Both formats share a common part, which is described below. Compared with the previous version, the `flipped` tag was added.
The `original_size` tag was also added for interpolation mode to specify the frame size.
In annotation mode, each image tag has `width` and `height` attributes for the same purpose.
For an explanation of `rle`, see [Run-length encoding](https://en.wikipedia.org/wiki/Run-length_encoding).
```xml
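<annotations>
  <version>1.1</version>
  <meta>
    <task>
      <id>Number: id of the task</id>
      <name>String: some task name</name>
      <size>Number: count of frames/images in the task</size>
      <mode>String: interpolation or annotation</mode>
      <overlap>Number: number of overlapped frames between segments</overlap>
      <bugtracker>String: URL of a page which describes the task</bugtracker>
      <flipped>Boolean: were images of the task flipped? (True/False)</flipped>
      <created>String: date when the task was created</created>
      <updated>String: date when the task was updated</updated>
      <segments>
        <segment>
          <id>Number: id of the segment</id>
          <start>Number: first frame</start>
          <stop>Number: last frame</stop>
          <url>String: URL (e.g. http://cvat.example.com/?id=213)</url>
        </segment>
      </segments>
      <owner>
        <username>String: the author of the task</username>
        <email>String: email of the author</email>
      </owner>
      <original_size>
        <width>Number: frame width</width>
        <height>Number: frame height</height>
      </original_size>
    </task>
    <dumped>String: date when the annotation was dumped</dumped>
  </meta>
  ...
</annotations>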
```
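As a rough illustration of reading the common part, the sketch below uses Python's standard `xml.etree.ElementTree`. The element paths (`meta/task`, `id`, `name`, `size`, `mode`) and the `SAMPLE` document are assumptions made for this example; only `flipped` and `original_size` are named explicitly in the text, so adjust the paths if your file differs. `read_task_meta` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; element names are assumed, not taken from a real dump.
SAMPLE = """<annotations>
  <version>1.1</version>
  <meta>
    <task>
      <id>5</id>
      <name>interpolation</name>
      <size>4620</size>
      <mode>interpolation</mode>
      <flipped>False</flipped>
      <original_size>
        <width>640</width>
        <height>480</height>
      </original_size>
    </task>
  </meta>
</annotations>"""


def read_task_meta(xml_text: str) -> dict:
    """Collect a few task fields from the meta section into a dict."""
    root = ET.fromstring(xml_text)
    task = root.find("meta/task")
    meta = {
        "version": root.findtext("version"),
        "id": int(task.findtext("id")),
        "name": task.findtext("name"),
        "size": int(task.findtext("size")),
        "mode": task.findtext("mode"),
        # The flag is written as the text True/False.
        "flipped": task.findtext("flipped") == "True",
    }
    # original_size is only expected in interpolation mode.
    frame = task.find("original_size")
    if frame is not None:
        meta["frame_size"] = (
            int(frame.findtext("width")),
            int(frame.findtext("height")),
        )
    return meta


if __name__ == "__main__":
    print(read_task_meta(SAMPLE))
```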
### Annotation
The data format for image tasks is described below.
Each image can contain many different objects, and each object can have multiple attributes.
If an annotation task is created with the `z_order` flag, each object has a `z_order` attribute, which is used
to draw overlapping objects correctly (an object with a larger `z_order` is closer to the camera).
In previous versions of the format, only the `box` shape was available.
Later releases added `mask`, `polygon`, `polyline`, `points`, `skeletons` and `tags`.
See below for more details:
```xml
...
String: the attribute value
...
String: the attribute value
...
String: the attribute value
...
String: the attribute value
...
String: the attribute value
...
String: the attribute value
...
String: the attribute value
...
String: the attribute value
...
...
...
```
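For readers unfamiliar with run-length encoding, here is a generic decoding sketch. It assumes the runs alternate between background (0) and foreground (1), start with background, and fill the mask row by row. Those layout details are assumptions for illustration and are not stated in this document, so check them against your own data. `decode_rle` is a hypothetical helper.

```python
from typing import List


def decode_rle(counts: List[int], width: int, height: int) -> List[List[int]]:
    """Expand run lengths into a height x width grid of 0/1 values.

    Assumed layout: runs alternate 0, 1, 0, 1, ... starting with 0,
    and pixels are stored in row-major order.
    """
    flat: List[int] = []
    value = 0
    for run in counts:
        flat.extend([value] * run)
        value = 1 - value  # switch between background and foreground
    if len(flat) != width * height:
        raise ValueError("run lengths do not match the mask size")
    return [flat[row * width:(row + 1) * width] for row in range(height)]


if __name__ == "__main__":
    for row in decode_rle([1, 2, 1, 2], width=3, height=2):
        print(row)
```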
Example:
```xml
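<annotations>
  <version>1.1</version>
  <meta>
    <task>
      <id>4</id>
      <name>segmentation</name>
      <size>27</size>
      <mode>annotation</mode>
      <overlap>0</overlap>
      <bugtracker></bugtracker>
      <flipped>False</flipped>
      <created>2018-09-25 11:34:24.617558+03:00</created>
      <updated>2018-09-25 11:38:27.301183+03:00</updated>
      <segments>
        <segment>
          <id>4</id>
          <start>0</start>
          <stop>26</stop>
          <url>http://localhost:8080/?id=4</url>
        </segment>
      </segments>
      <owner>
        <username>admin</username>
        <email></email>
      </owner>
    </task>
    <dumped>2018-09-25 11:38:28.799808+03:00</dumped>
  </meta>
</annotations>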
```
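The role of `z_order` when drawing can be sketched as follows. Objects are represented here as plain dictionaries, which is a hypothetical in-memory form chosen for this example and not part of the format. Drawing in ascending `z_order` means objects closer to the camera are painted last and end up on top.

```python
from typing import Dict, List


def drawing_order(objects: List[Dict]) -> List[Dict]:
    """Return objects sorted so that those farthest from the camera come first.

    Objects without a z_order are treated as z_order 0 (an assumption).
    sorted() is stable, so objects with equal z_order keep their file order.
    """
    return sorted(objects, key=lambda obj: obj.get("z_order", 0))


if __name__ == "__main__":
    shapes = [
        {"label": "car", "z_order": 2},
        {"label": "road", "z_order": 0},
        {"label": "person", "z_order": 1},
    ]
    for shape in drawing_order(shapes):
        print(shape["label"])
```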
### Interpolation
The data format for video tasks is described below.
The annotation contains tracks. Each track corresponds to an object that can appear on multiple frames.
The same object cannot appear on the same frame in multiple locations.
Each location of the object can have multiple attributes. Even if an attribute is immutable for the object, it is
cloned for each location (a known redundancy).
```xml
...
...
```
Example:
```xml
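<annotations>
  <version>1.1</version>
  <meta>
    <task>
      <id>5</id>
      <name>interpolation</name>
      <size>4620</size>
      <mode>interpolation</mode>
      <overlap>5</overlap>
      <bugtracker></bugtracker>
      <flipped>False</flipped>
      <created>2018-09-25 12:32:09.868194+03:00</created>
      <updated>2018-09-25 16:05:05.619841+03:00</updated>
      <segments>
        <segment>
          <id>5</id>
          <start>0</start>
          <stop>4619</stop>
          <url>http://localhost:8080/?id=5</url>
        </segment>
      </segments>
      <owner>
        <username>admin</username>
        <email></email>
      </owner>
      <original_size>
        <width>640</width>
        <height>480</height>
      </original_size>
    </task>
    <dumped>2018-09-25 16:05:07.134046+03:00</dumped>
  </meta>
</annotations>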
```
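The rule that an object cannot appear twice on the same frame can be checked with a short sketch. The input shape used here, a mapping from a track id to the list of frame numbers of its locations, is a hypothetical in-memory form chosen for this example, and `duplicated_frames` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def duplicated_frames(tracks: Dict[int, List[int]]) -> Dict[int, List[int]]:
    """Return, per track id, the frames on which the track appears more than once."""
    problems: Dict[int, List[int]] = {}
    for track_id, frames in tracks.items():
        repeated = sorted(f for f, n in Counter(frames).items() if n > 1)
        if repeated:
            problems[track_id] = repeated
    return problems


if __name__ == "__main__":
    # Track 1 has two locations on frame 3, which violates the rule.
    print(duplicated_frames({0: [0, 1, 2], 1: [3, 3, 4]}))
```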