+
+RFC: Common Trace Format Proposal (v1.6)
+
+Mathieu Desnoyers, EfficiOS Inc.
+
+The goal of the present document is to propose a trace format that suits the
+needs of the embedded, telecom, high-performance and kernel communities. It is
+based on the Common Trace Format Requirements (v1.4) document. It is designed to
+allow traces to be natively generated by the Linux kernel, Linux user-space
+applications written in C/C++, and hardware components.
+
+The latest version of this document can be found at:
+
+ git tree: git://git.efficios.com/ctf.git
+ gitweb: http://git.efficios.com/?p=ctf.git
+
+A reference implementation of a library to read and write this trace format is
+being developed within the BabelTrace project, a converter between trace
+formats. The development tree is available at:
+
+ git tree: git://git.efficios.com/babeltrace.git
+ gitweb: http://git.efficios.com/?p=babeltrace.git
+
+
+1. Preliminary definitions
+
+ - Event Trace: An ordered sequence of events.
+ - Event Stream: An ordered sequence of events, containing a subset of the
+ trace event types.
+ - Event Packet: A sequence of physically contiguous events within an event
+ stream.
+ - Event: This is the basic entry in a trace. (aka: a trace record).
+ - An event identifier (ID) relates to the class (a type) of event within
+ an event stream.
+ e.g. event: irq_entry.
+ - An event (or event record) relates to a specific instance of an event
+ class.
+ e.g. event: irq_entry, at time X, on CPU Y
+ - Source Architecture: Architecture writing the trace.
+ - Reader Architecture: Architecture reading the trace.
+
+
+2. High-level representation of a trace
+
+A trace is divided into multiple event streams. Each event stream contains a
+subset of the trace event types.
+
+The final output of the trace, after its generation and optional transport over
+the network, is expected to be either on permanent or temporary storage in a
+virtual file system. Because each event stream is appended to while a trace is
+being recorded, each is associated with a separate file for output. Therefore,
+a stored trace can be represented as a directory containing one file per stream.
+
+A metadata event stream contains information on trace event types. It describes:
+
+- Trace version.
+- Types available.
+- Per-stream event header description.
+- Per-stream event header selection.
+- Per-stream event context fields.
+- Per-event
+ - Event type to stream mapping.
+ - Event type to name mapping.
+ - Event type to ID mapping.
+ - Event fields description.
+
+
+3. Event stream
+
+An event stream is divided into contiguous event packets of variable size. An
+event packet can contain a certain amount of padding at the end. The stream
+header is repeated at the beginning of each event packet. The rationale for
+these design choices is explained in Appendix B. Stream Header Rationale.
+
+The event stream header will therefore be referred to as the "event packet
+header" throughout the rest of this document.
+
+
+4. Types
+
+4.1 Basic types
+
+A basic type is a scalar type, as described in this section.
+
+4.1.1 Type inheritance
+
+Type specifications can be inherited to allow deriving types from a
+type class. For example, see the uint32_t named type derived from the "integer"
+type class below ("Integers" section). Types have a precise binary
+representation in the trace. A type class has methods to read and write these
+types, but must be derived into a type to be usable in an event field.
+
+4.1.2 Alignment
+
+We define "byte-packed" types as aligned on the byte size, namely 8-bit.
+We define "bit-packed" types as following on the next bit, as defined by the
+"bitfields" section.
+
+All basic types, except bitfields, are either aligned on an architecture-defined
+specific alignment or byte-packed, depending on the architecture preference.
+Architectures providing fast unaligned writes byte-pack basic types to save
+space, aligning each type on byte boundaries (8-bit). Architectures with slow
+unaligned writes align types on specific alignment values. If no specific
+alignment is declared for a type or any of its parents, it is assumed to be
+bit-packed for bitfields and byte-packed for other types.
+
+Metadata attribute representation of a specific alignment:
+
+ align = value; /* value in bits */
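+
+As an illustration of the "align" attribute, a writer or reader can round the
+current bit offset up to the next multiple of the declared alignment before
+placing a field. The helper name below is hypothetical and not part of the
+format; this is only a sketch.
+
```c
#include <assert.h>
#include <stddef.h>

/*
 * Round a bit offset up to the next multiple of `align` (both in bits).
 * align = 1 corresponds to bit-packed, align = 8 to byte-packed.
 * Hypothetical helper, for illustration only.
 */
static size_t ctf_align_offset(size_t bit_offset, size_t align)
{
    assert(align > 0);
    return (bit_offset + align - 1) / align * align;
}
```
+
+For instance, a field declared with "align = 32" that would start at bit
+offset 33 is placed at bit offset 64 instead.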
+
+4.1.3 Byte order
+
+By default, the native endianness of the trace's source architecture is used.
+Byte order can be overridden for a basic type by specifying a "byte_order"
+attribute. A typical use case is to specify the network byte order (big endian:
+"be") to save data captured from the network into the trace without conversion.
+
+Metadata representation:
+
+ byte_order = native OR network OR be OR le; /* network and be are aliases */
+
+4.1.4 Size
+
+Type size, in bits, for integers and floats is that returned by "sizeof()" in C
+multiplied by CHAR_BIT.
+We require the size of "char" and "unsigned char" types (CHAR_BIT) to be fixed
+to 8 bits for cross-endianness compatibility.
+
+Metadata representation:
+
+ size = value; /* value in bits */
+
+4.1.5 Integers
+
+Signed integers are represented in two's complement. Integer alignment, size,
+signedness and byte ordering are defined in the metadata. Integers aligned on
+byte size (8-bit) and with length multiple of byte size (8-bit) correspond to
+the C99 standard integers. In addition, integers with alignment and/or size that
+are _not_ a multiple of the byte size are permitted; these correspond to the C99
+standard bitfields, with the added specification that the CTF integer bitfields
+have a fixed binary representation. An MIT-licensed reference implementation of
+the CTF portable bitfields is available at:
+
+ http://git.efficios.com/?p=babeltrace.git;a=blob;f=include/babeltrace/bitfield.h
+
+Binary representation of integers:
+
+- On little and big endian:
+ - Within a byte, high bits correspond to the integer's high bits, and low
+ bits correspond to its low bits.
+- On little endian:
+ - Integers spanning multiple bytes are placed from the least significant
+ byte to the most significant.
+ - Consecutive integers are placed from lower bits to higher bits (even within
+ a byte).
+- On big endian:
+ - Integers spanning multiple bytes are placed from the most significant byte
+ to the least significant.
+ - Consecutive integers are placed from higher bits to lower bits (even within
+ a byte).
+
+This binary representation is derived from the bitfield implementation in GCC
+for little and big endian. However, contrary to what GCC does, integers can
+cross unit boundaries (no padding is required). Padding can be explicitly
+added (see 4.1.6 GNU/C bitfields) to follow the GCC layout if needed.
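+
+The placement rules above can be sketched as two bit-by-bit readers, one per
+byte order. This is not the BabelTrace bitfield.h implementation; the function
+names are hypothetical, the loops favor clarity over speed, and the value is
+returned unsigned (no sign extension is performed).
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Little endian: stream bit n is bit (n % 8) of byte (n / 8), counted from
 * the low bit, and the first bit read is the least significant bit of the
 * integer.
 */
static uint64_t ctf_read_bits_le(const uint8_t *buf, size_t pos, unsigned len)
{
    uint64_t v = 0;

    assert(len >= 1 && len <= 64);
    for (unsigned i = 0; i < len; i++) {
        size_t bit = pos + i;
        uint64_t b = (buf[bit / 8] >> (bit % 8)) & 1u;

        v |= b << i;
    }
    return v;
}

/*
 * Big endian: stream bit n is counted from the high bit of byte (n / 8),
 * and the first bit read is the most significant bit of the integer.
 */
static uint64_t ctf_read_bits_be(const uint8_t *buf, size_t pos, unsigned len)
{
    uint64_t v = 0;

    assert(len >= 1 && len <= 64);
    for (unsigned i = 0; i < len; i++) {
        size_t bit = pos + i;
        uint64_t b = (buf[bit / 8] >> (7 - bit % 8)) & 1u;

        v = (v << 1) | b;
    }
    return v;
}
```
+
+Both readers accept any bit offset and length up to 64 bits, so an integer may
+cross byte and unit boundaries without padding.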
+
+Metadata representation:
+
+ integer {
+ signed = true OR false; /* default false */
+ byte_order = native OR network OR be OR le; /* default native */
+ size = value; /* value in bits, no default */
+ align = value; /* value in bits */
+ };
+
+Example of type inheritance (creation of a uint32_t named type):
+
+typedef integer {
+ size = 32;
+ signed = false;
+ align = 32;
+} uint32_t;
+
+Definition of a named 5-bit signed bitfield:
+
+typedef integer {
+ size = 5;
+ signed = true;
+ align = 1;
+} int5_t;
+
+4.1.6 GNU/C bitfields
+
+The GNU/C bitfields follow closely the integer representation, with a
+particularity on alignment: if a bitfield cannot fit in the current unit, the
+unit is padded and the bitfield starts at the following unit. The unit size is
+defined by the size of the type "unit_type".
+
+Metadata representation. Either:
+
+gcc_bitfield {
+ unit_type = integer {
+ ...
+ };
+ size = value;
+};
+
+Or a bitfield within a structure, as specified by the C standard:
+
+ unit_type name:size;
+
+As an example, the following structure declared in C compiled by GCC:
+
+struct example {
+ short a:12;
+ short b:5;
+};
+
+is equivalent to the following structure declaration, aligned on the largest
+element (short). The second bitfield would be aligned on the next unit boundary,
+because it would not fit in the current unit. The two declarations (C
+declaration above or CTF declaration with "type gcc_bitfield") are strictly
+equivalent.
+
+struct example {
+ gcc_bitfield {
+ unit_type = short;
+ size = 12;
+ } a;
+ gcc_bitfield {
+ unit_type = short;
+ size = 5;
+ } b;
+};
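+
+The unit padding rule of this section can be sketched as a small offset
+computation. The helper below is hypothetical; it assumes offsets and sizes are
+expressed in bits and that a bitfield is never larger than its unit.
+
```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the bit offset at which a GNU/C bitfield of `size` bits starts,
 * given the current bit offset `pos` and the unit size `unit_bits`.
 * If the field does not fit in what remains of the current unit, the unit
 * is padded and the field starts at the next unit boundary.
 */
static size_t gcc_bitfield_offset(size_t pos, size_t unit_bits, size_t size)
{
    size_t used;

    assert(unit_bits > 0 && size <= unit_bits);
    used = pos % unit_bits;
    if (used + size > unit_bits)
        pos += unit_bits - used; /* pad to the next unit boundary */
    return pos;
}
```
+
+With a 16-bit "short" unit, field "a" (12 bits) starts at bit 0, and field "b"
+(5 bits) does not fit in the 4 remaining bits, so it starts at bit 16.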
+
+4.1.7 Floating point
+
+The floating point values byte ordering is defined in the metadata.
+
+Floating point values follow the IEEE 754-2008 standard interchange formats.
+The description of a floating point type includes the exponent and mantissa
+size in bits. Some requirements are imposed on the floating point values:
+
+- FLT_RADIX must be 2.
+- mant_dig is the number of digits represented in the mantissa. It is specified
+ by the ISO C99 standard, section 5.2.4, as FLT_MANT_DIG, DBL_MANT_DIG and
+ LDBL_MANT_DIG as defined by <float.h>.
+- exp_dig is the number of digits represented in the exponent. Given that
+ mant_dig is one bit more than its actual size in bits (leading 1 is not
+ needed) and also given that the sign bit always takes one bit, exp_dig can be
+ specified as:
+
+ - sizeof(float) * CHAR_BIT - FLT_MANT_DIG
+ - sizeof(double) * CHAR_BIT - DBL_MANT_DIG
+ - sizeof(long double) * CHAR_BIT - LDBL_MANT_DIG
+
+Metadata representation:
+
+floating_point {
+ exp_dig = value;
+ mant_dig = value;
+ byte_order = native OR network OR be OR le;
+};
+
+Example of type inheritance:
+
+typedef floating_point {
+ exp_dig = 8; /* sizeof(float) * CHAR_BIT - FLT_MANT_DIG */
+ mant_dig = 24; /* FLT_MANT_DIG */
+ byte_order = native;
+} float;
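+
+The exp_dig formulas given above can be evaluated directly from <float.h> and
+<limits.h>. The function names below are hypothetical and only illustrate the
+computation. "long double" is left out because some ABIs pad its storage, in
+which case sizeof() overstates the number of meaningful bits.
+
```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* exp_dig for "float": total size in bits minus mant_dig. */
static int ctf_float_exp_dig(void)
{
    assert(FLT_RADIX == 2); /* required by this section */
    return (int)(sizeof(float) * CHAR_BIT) - FLT_MANT_DIG;
}

/* exp_dig for "double": total size in bits minus mant_dig. */
static int ctf_double_exp_dig(void)
{
    assert(FLT_RADIX == 2);
    return (int)(sizeof(double) * CHAR_BIT) - DBL_MANT_DIG;
}
```
+
+On a platform where "float" is the 32-bit IEEE 754 format with a 24-digit
+mantissa, the first function yields 32 - 24 = 8, matching the example above.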
+
+TODO: define NaN, +inf, -inf behavior.
+
+4.1.8 Enumerations
+
+Enumerations are a mapping between an integer type and a table of strings. The
+numerical representation of the enumeration follows the integer type specified
+by the metadata. The enumeration mapping table is detailed in the enumeration
+description within the metadata. Instead of being limited to simple
+"value -> string" mappings, the table maps inclusive value ranges to strings:
+"[ start_value ... end_value ] -> string". An enumeration from the C language
+can be represented in this format by having the same start_value and end_value
+for each element, which is in fact a range of size 1. Such a single-value range
+can be written with the "string = value" declaration, without repeating the
+start and end values. If the <integer_type> is omitted, the type chosen by the C
+compiler to hold the enumeration is used. The <integer_type> specifier can only
+be omitted for enumerations containing only simple "value -> string" mappings
+(compatible with C).
+
+enum <integer_type> name {
+ string = start_value1 ... end_value1,
+ "other string" = start_value2 ... end_value2,
+ yet_another_string, /* will be assigned to end_value2 + 1 */
+ "some other string" = value,
+ ...
+};
+
+If the values are omitted, the enumeration starts at 0 and increments by 1 for
+each entry:
+
+enum {
+ ZERO,
+ ONE,
+ TWO,
+ TEN = 10,
+ ELEVEN,
+};
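+
+A reader could resolve an enumeration value with a lookup over inclusive
+ranges, as sketched below. The structure, function and table names are
+hypothetical. Returning the first matching range is one possible choice for
+overlapping ranges, which this document leaves implementation defined.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ctf_enum_range {
    int64_t start;    /* inclusive */
    int64_t end;      /* inclusive */
    const char *name;
};

/* Demo table: two single values and one range. */
static const struct ctf_enum_range demo_enum[] = {
    { 0, 0, "ZERO" },
    { 1, 4, "LOW" },
    { 10, 10, "TEN" },
};

/* Return the string mapped to `value`, or NULL if no range contains it. */
static const char *ctf_enum_lookup(const struct ctf_enum_range *table,
                                   size_t n, int64_t value)
{
    assert(table != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (value >= table[i].start && value <= table[i].end)
            return table[i].name;
    }
    return NULL;
}
```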
+
+Overlapping ranges within a single enumeration are implementation defined.
+
+4.2 Compound types
+
+4.2.1 Structures
+
+Structures are aligned on the largest alignment required by basic types
+contained within the structure. (This follows the ISO/C standard for structures)
+
+Metadata representation of a named structure:
+
+struct name {
+ field_type field_name;
+ field_type field_name;
+ ...
+};
+
+Example:
+
+struct example {
+ integer { /* Nameless type */
+ size = 16;
+ signed = true;
+ align = 16;
+ } first_field_name;
+ uint64_t second_field_name; /* Named type declared in the metadata */
+};
+
+The fields are placed in a sequence next to each other. They each possess a
+field name, which is a unique identifier within the structure.
+
+A nameless structure can be declared as a field type:
+
+struct {
+ ...
+} field_name;
+
+4.2.2 Arrays
+
+Arrays are fixed-length. Their length is declared in the type declaration within
+the metadata. They contain an array of "inner type" elements, which can refer to
+any type not containing the type of the array being declared (no circular
+dependency). The length is the number of elements in an array.
+
+Metadata representation of a named array, either:
+
+typedef array {
+ length = value;
+ elem_type = type;
+} name;
+
+or:
+
+typedef elem_type name[length];
+
+E.g.:
+
+typedef array {
+ length = 10;
+ elem_type = uint32_t;
+} example;
+
+A nameless array can be declared as a field type, e.g.:
+
+array {
+ length = 5;
+ elem_type = uint8_t;
+} field_name;
+
+or
+
+uint8_t field_name[10];
+
+
+4.2.3 Sequences
+
+Sequences are dynamically-sized arrays. They start with an integer that
+specifies the length of the sequence, followed by an array of "inner type"
+elements. The length is the number of elements in the sequence.
+
+Metadata representation for a named sequence, either:
+
+typedef sequence {
+ length_type = type; /* integer class */
+ elem_type = type;
+} name;
+
+or:
+
+typedef elem_type name[length_type];
+
+A nameless sequence can be declared as a field type, e.g.:
+
+sequence {
+ length_type = int;
+ elem_type = long;
+} field_name;
+
+or
+
+long field_name[int];
+
+The length type follows the integer types specifications, and the sequence
+elements follow the "array" specifications.
+
+4.2.4 Strings
+
+Strings are arrays of bytes of variable size, terminated by a '\0' "NUL"
+character. Their encoding is described in the metadata. In the absence of an
+encoding attribute, the default encoding is UTF-8.
+
+Metadata representation of a named string type:
+
+typedef string {
+ encoding = UTF8 OR ASCII;
+} name;
+
+A nameless string type can be declared as a field type:
+
+string field_name; /* Use default UTF8 encoding */
+
+5. Event Packet Header
+
+The event packet header consists of two parts: the first is mandatory and has a
+fixed layout. The second part, the "event packet context", has its layout
+described in the metadata.
+
+- Aligned on page size. Fixed size. Fields either aligned or packed (depending
+ on the architecture preference).
+ No padding at the end of the event packet header. Native architecture byte
+ ordering.
+
+Fixed layout (event packet header):
+
+- Magic number (CTF magic number: 0xC1FC1FC1; its reverse endianness
+ representation is 0xC11FFCC1). It needs to have a non-symmetric bytewise
+ representation. It is used to distinguish between big and little endian
+ traces: the reader, knowing its own endianness, compares the magic number
+ against 0xC1FC1FC1 and against the reverse, 0xC11FFCC1. This magic number
+ specifies that we use the CTF metadata description language described in this
+ document. Different magic numbers should be used for other metadata
+ description languages.
+- Trace UUID, used to ensure the event packet matches the metadata used.
+ (note: we cannot use a metadata checksum because metadata can be appended to
+ while tracing is active)
+- Stream ID, used as reference to stream description in metadata.
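+
+The magic number comparison can be sketched as follows. The enumeration and
+function names are hypothetical; the sketch assumes the magic number occupies
+the first 4 bytes of the packet and loads it in the reader's native byte
+order.
+
```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define CTF_MAGIC          UINT32_C(0xC1FC1FC1)
#define CTF_MAGIC_REVERSED UINT32_C(0xC11FFCC1)

enum ctf_trace_order {
    CTF_ORDER_SAME_AS_READER, /* no byte swapping needed */
    CTF_ORDER_REVERSED,       /* trace endianness differs from the reader */
    CTF_ORDER_NOT_CTF,        /* unknown magic number */
};

/* Classify a packet by the first 4 bytes of its header. */
static enum ctf_trace_order ctf_check_magic(const uint8_t *packet)
{
    uint32_t magic;

    assert(packet != NULL);
    /* Native-order load; memcpy avoids any alignment assumption. */
    memcpy(&magic, packet, sizeof(magic));
    if (magic == CTF_MAGIC)
        return CTF_ORDER_SAME_AS_READER;
    if (magic == CTF_MAGIC_REVERSED)
        return CTF_ORDER_REVERSED;
    return CTF_ORDER_NOT_CTF;
}
```
+
+Because the two constants differ, a single comparison pair is enough to tell
+whether the remaining native-order fields must be byte-swapped.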
+
+Metadata-defined layout (event packet context):
+
+- Event packet content size (in bytes).
+- Event packet size (in bytes, includes padding).
+- Event packet content checksum (optional). Checksum excludes the event packet
+ header.
+- Per-stream event packet sequence count (to deal with UDP packet loss). The
+ number of significant sequence counter bits should also be present, so
+ wrap-arounds are dealt with correctly.
+- Timestamp at the beginning and timestamp at the end of the event packet.
+ Both timestamps are written in the packet header, but sampled respectively
+ while (or before) writing the first event and while (or after) writing the
+ last event in the packet. The inclusive range between these timestamps should
+ include all event timestamps assigned to events contained within the packet.
+- Events discarded count
+ - Snapshot of a per-stream free-running counter, counting the number of
+ events discarded that were supposed to be written in the stream prior to
+ the first event in the event packet.
+ * Note: producer-consumer buffer full condition should fill the current
+ event packet with padding so we know exactly where events have been
+ discarded.
+- Lossless compression scheme used for the event packet content. Applied
+ directly to raw data. New types of compression can be added in following
+ versions of the format.
+ 0: no compression scheme
+ 1: bzip2
+ 2: gzip
+ 3: xz
+- Cipher used for the event packet content. Applied after compression.
+ 0: no encryption
+ 1: AES
+- Checksum scheme used for the event packet content. Applied after encryption.
+ 0: no checksum
+ 1: md5
+ 2: sha1
+ 3: crc32
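+
+As an illustration of why the number of significant sequence counter bits is
+needed, the following sketch computes how many packets were lost between two
+consecutively received packets of a stream. The function name is hypothetical.
+A gap that is an exact multiple of 2^bits packets cannot be detected this way.
+
```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of packets missing between the previously received sequence count
 * `prev` and the current one `cur`, for a counter with `bits` significant
 * bits (1 to 32). The subtraction is done modulo 2^bits so that counter
 * wrap-around is handled.
 */
static uint32_t ctf_packets_lost(uint32_t prev, uint32_t cur, unsigned bits)
{
    uint32_t mask;

    assert(bits >= 1 && bits <= 32);
    mask = (bits == 32) ? UINT32_MAX : ((UINT32_C(1) << bits) - 1);
    return (cur - prev - 1) & mask;
}
```
+
+For an 8-bit counter, receiving count 1 right after count 254 means that
+packets 255 and 0 are missing, so the function returns 2.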
+
+5.1 Event Packet Header Fixed Layout Description
+
+struct event_packet_header {
+ uint32_t magic;
+ uint8_t trace_uuid[16];
+ uint32_t stream_id;
+};
+
+5.2 Event Packet Context Description
+
+The event packet context fields are declared within the stream declaration in
+the metadata. All these fields are optional except for "content_size" and
+"packet_size", which must be present in the context.
+
+An example event packet context type:
+
+struct event_packet_context {
+ uint64_t timestamp_begin;
+ uint64_t timestamp_end;
+ uint32_t checksum;
+ uint32_t stream_packet_count;
+ uint32_t events_discarded;
+ uint32_t cpu_id;
+ uint32_t/uint16_t content_size;
+ uint32_t/uint16_t packet_size;
+ uint8_t stream_packet_count_bits; /* Significant counter bits */
+ uint8_t compression_scheme;
+ uint8_t encryption_scheme;
+ uint8_t checksum_scheme;
+};
+
+6. Event Structure
+
+The overall structure of an event is:
+
+ - Event Header (as specified by the stream metadata)
+ - Extended Event Header (as specified by the event header)
+ - Event Context (as specified by the stream metadata)
+ - Event Payload (as specified by the event metadata)
+
+
+6.1 Event Header
+
+One major factor can vary between streams: the number of event IDs assigned to
+a stream. Luckily, this information tends to stay relatively constant (modulo
+event registration while the trace is being recorded), so we can specify
+different representations for streams containing few event IDs and streams
+containing many event IDs. We thus end up representing the event ID and
+timestamp as densely as possible in each case.
+
+We therefore provide two types of event headers. Type 1 accommodates streams
+with fewer than 32 event IDs. Type 2 accommodates streams with 32 or more event
+IDs.
+
+The "extended headers" are used in the rare occasions where the information
+cannot be represented in the ranges available in the event header. They are also
+used in the rare occasions where the data required for a field could not be
+collected: the flag corresponding to the missing field within the missing_fields
+array is then set to 1.
+
+Types uintX_t represent an X-bit unsigned integer.
+
+
+6.1.1 Type 1 - Few event IDs
+
+ - Aligned on 32-bit (or 8-bit if byte-packed, depending on the architecture
+ preference).
+ - Fixed size: 32 bits.
+ - Native architecture byte ordering.
+
+struct event_header_1 {
+ uint5_t id; /*
+ * id: range: 0 - 30.
+ * id 31 is reserved to indicate a following
+ * extended header.
+ */
+ uint27_t timestamp;
+};
+
+The end of a type 1 header is aligned on a 32-bit boundary (or packed).
+
+
+6.1.2 Extended Type 1 Event Header
+
+ - Follows struct event_header_1, which is aligned on 32-bit, so no need to
+ realign.
+ - Variable size (depends on the number of fields per event).
+ - Native architecture byte ordering.
+ - NR_FIELDS is the number of fields within the event.
+
+struct event_header_1_ext {
+ uint32_t id; /* 32-bit event IDs */
+ uint64_t timestamp; /* 64-bit timestamps */
+ uint1_t missing_fields[NR_FIELDS]; /* missing event fields bitmap */
+};
+
+
+6.1.3 Type 2 - Many event IDs
+
+ - Aligned on 32-bit (or 8-bit if byte-packed, depending on the architecture
+ preference).
+ - Fixed size: 48 bits.
+ - Native architecture byte ordering.
+
+struct event_header_2 {
+ uint32_t timestamp;
+ uint16_t id; /*
+ * id: range: 0 - 65534.
+ * id 65535 is reserved to indicate a following
+ * extended header.
+ */
+};
+
+The end of a type 2 header is aligned on a 16-bit boundary (or 8-bit if
+byte-packed).
+
+
+6.1.4 Extended Type 2 Event Header
+
+ - Follows struct event_header_2, whose end is aligned on a 16-bit boundary, so
+ we need to align on 64-bit integer architecture alignment (or 8-bit if
+ byte-packed).
+ - Variable size (depends on the number of fields per event).
+ - Native architecture byte ordering.
+ - NR_FIELDS is the number of fields within the event.
+
+struct event_header_2_ext {
+ uint64_t timestamp; /* 64-bit timestamps */
+ uint32_t id; /* 32-bit event IDs */
+ uint1_t missing_fields[NR_FIELDS]; /* missing event fields bitmap */
+};
+
+
+6.2 Event Context
+
+The event context contains information relative to the current event. The choice
+and meaning of this information are specified by the metadata "stream"
+information. For this trace format, the event context is usually empty, except
+when the metadata "stream" information specifies otherwise by declaring a
+non-empty structure for the event context. Examples of event context are saving
+the event payload size with each event, or saving the current PID with each
+event. These are declared within the stream declaration within the metadata.
+
+An example event context type:
+
+ struct event_context {
+ uint pid;
+ uint16_t payload_size;
+ };
+
+
+6.3 Event Payload
+
+An event payload contains fields specific to a given event type. The fields
+belonging to an event type are described in the event-specific metadata
+within a structure type.
+
+6.3.1 Padding
+
+No padding at the end of the event payload. This differs from the ISO/C standard
+for structures, but follows the CTF standard for structures. In a trace, even
+though it makes sense to align the beginning of a structure, it really makes no
+sense to add padding at the end of the structure, because structures are usually
+not followed by a structure of the same type.
+
+This can be done by adding an "end" marker field at the end of the C
+structures, and by using the offset of this field rather than using sizeof()
+when calculating the size of a structure (see Appendix "A. Helper macros").
+
+6.3.2 Alignment
+
+The event payload is aligned on the largest alignment required by types
+contained within the payload. (This follows the ISO/C standard for structures)
+
+
+
+7. Metadata
+
+The metadata is located in a stream named "metadata". It is made of "event
+packets", each of which starts with an event packet header containing, amongst
+other fields, the magic number and trace UUID. Events within the metadata
+stream have neither an event header nor an event context. Each event only
+contains a null-terminated "string" payload, which is a metadata description
+entry. The events are packed one next to another.
+
+The metadata can be parsed by reading through the metadata strings, skipping
+newlines and null-characters. Type names may contain spaces.
+
+trace {
+ major = value; /* Trace format version */
+ minor = value;
+ uuid = value; /* Trace UUID */
+ word_size = value;
+};
+
+stream {
+ id = stream_id;
+ event {
+ /* Type 1 - Few event IDs; Type 2 - Many event IDs. See section 6.1. */
+ header_type = event_header_1 OR event_header_2;
+ /*
+ * Extended event header type. Only present if specified in event header
+ * on a per-event basis.
+ */
+ header_type_ext = event_header_1_ext OR event_header_2_ext;
+ context_type = struct {
+ ...
+ };
+ };
+ packet {
+ context_type = struct {
+ ...
+ };
+ };
+};
+
+event {
+ name = eventname;
+ id = value; /* Numeric identifier within the stream */
+ stream = stream_id;
+ fields = struct {
+ ...
+ };
+};
+
+/* More detail on types in section 4. Types */
+
+/* Named types */
+typedef some existing type new_type;
+
+typedef type_class {
+ ...
+} new_type;
+
+struct name {
+ ...
+};
+
+enum name {
+ ...
+};
+
+/* Unnamed types, contained within compound type fields or type assignments. */
+struct {
+ ...
+};
+
+enum {
+ ...
+};
+
+array {
+ ...
+};
+
+sequence {
+ ...
+};
+
+A. Helper macros
+
+The two following macros keep track of the size of a GNU/C structure without
+padding at the end by placing HEADER_END as the last field. A one byte end field
+is used for C90 compatibility (C99 flexible arrays could be used here). Note
+that this does not affect the effective structure size, which should always be
+calculated with the header_sizeof() helper.
+
+#define HEADER_END char end_field
+#define header_sizeof(type) offsetof(typeof(type), end_field)
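+
+A possible use of these macros is sketched below. The structure and function
+names are hypothetical. The macros are repeated so that the example is
+self-contained, with typeof spelled __typeof__ (a GCC/Clang spelling that is
+also accepted in strict ISO modes).
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEADER_END          char end_field
#define header_sizeof(type) offsetof(__typeof__(type), end_field)

struct demo_record {
    uint32_t a;
    uint8_t b;
    HEADER_END; /* marks the end of the useful content */
};

/* Size of the record as written to the trace, without trailing padding. */
static size_t demo_record_size(void)
{
    assert(header_sizeof(struct demo_record) <= sizeof(struct demo_record));
    return header_sizeof(struct demo_record);
}
```
+
+Here the useful content is 5 bytes long (4 for "a", 1 for "b"), whereas
+sizeof() also counts the end field and any trailing padding added by the
+compiler.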
+
+
+B. Stream Header Rationale
+
+An event stream is divided into contiguous event packets of variable size. These
+subdivisions allow the trace analyzer to perform a fast binary search by time
+within the stream (typically requiring only the event packet headers to be
+indexed) without reading the whole stream. These subdivisions have a variable
+size to eliminate the need to transfer the event packet padding when partially
+filled event packets must be sent when streaming a trace for live
+viewing/analysis. An event packet can contain a certain amount of padding at the
+end. Dividing streams into event packets is also useful for network streaming
+over UDP and flight recorder mode tracing (a whole event packet can be swapped
+out of the buffer atomically for reading).
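+
+The binary search by time mentioned above can be sketched over an in-memory
+index built from the packet timestamps. The index structure and function name
+are hypothetical, and the sketch assumes index entries are stored in stream
+order with non-decreasing timestamps.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ctf_packet_index {
    uint64_t timestamp_begin;
    uint64_t timestamp_end;
};

/*
 * Return the index of the first packet whose timestamp_end is greater than
 * or equal to `ts`, or `n` if every packet ends before `ts`.
 */
static size_t ctf_find_packet(const struct ctf_packet_index *idx, size_t n,
                              uint64_t ts)
{
    size_t lo = 0, hi = n;

    assert(idx != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (idx[mid].timestamp_end < ts)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```
+
+Only the index is consulted during the search; the event data of a packet is
+read once the packet of interest has been located.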
+
+The stream header is repeated at the beginning of each event packet to allow
+flexibility in terms of:
+
+ - streaming support,
+ - allowing arbitrary buffers to be discarded without making the trace
+ unreadable,
+ - allowing UDP packet loss handling by either dealing with missing event
+ packets or asking for re-transmission,
+ - transparently supporting flight recorder mode,
+ - transparently supporting crash dump.
+
+The event stream header will therefore be referred to as the "event packet
+header" throughout the rest of this document.