+++ /dev/null
-
-RFC: Common Trace Format Proposal for Linux (v1.6)
-
-Mathieu Desnoyers, EfficiOS Inc.
-
-The goal of the present document is to propose a trace format that suits the
-needs of the embedded, telecom, high-performance and kernel communities. It is
-based on the Common Trace Format Requirements (v1.4) document. It is designed to
-allow tracing that is natively generated by the Linux kernel and Linux
-user-space applications written in C/C++.
-
-A reference implementation of a library to read and write this trace format is
-being implemented within the BabelTrace project, a converter between trace
-formats. The development tree is available at:
-
- git tree: git://git.efficios.com/babeltrace.git
- gitweb: http://git.efficios.com/?p=babeltrace.git
-
-
-1. Preliminary definitions
-
- - Event Trace: An ordered sequence of events.
- - Event Stream: An ordered sequence of events, containing a subset of the
- trace event types.
- - Event Packet: A sequence of physically contiguous events within an event
- stream.
- - Event: This is the basic entry in a trace. (aka: a trace record).
- - An event identifier (ID) relates to the class (a type) of event within
- an event stream.
- e.g. event: irq_entry.
- - An event (or event record) relates to a specific instance of an event
- class.
- e.g. event: irq_entry, at time X, on CPU Y
- - Source Architecture: Architecture writing the trace.
- - Reader Architecture: Architecture reading the trace.
-
-
-2. High-level representation of a trace
-
-A trace is divided into multiple event streams. Each event stream contains a
-subset of the trace event types.
-
-The final output of the trace, after its generation and optional transport over
-the network, is expected to be either on permanent or temporary storage in a
-virtual file system. Because each event stream is appended to while a trace is
-being recorded, each is associated with a separate file for output. Therefore,
-a stored trace can be represented as a directory containing one file per stream.
-
-A metadata event stream contains information on trace event types. It describes:
-
-- Trace version.
-- Types available.
-- Per-stream event header description.
-- Per-stream event header selection.
-- Per-stream event context fields.
-- Per-event
- - Event type to stream mapping.
- - Event type to name mapping.
- - Event type to ID mapping.
- - Event fields description.
-
-
-3. Event stream
-
-An event stream is divided into contiguous event packets of variable size. An
-event packet can contain a certain amount of padding at the end. The stream
-header is repeated at the beginning of each event packet. The rationale for the
-event stream design choices is explained in Appendix B. Stream Header Rationale.
-
-The event stream header will therefore be referred to as the "event packet
-header" throughout the rest of this document.
-
-
-4. Types
-
-4.1 Basic types
-
-A basic type is a scalar type, as described in this section.
-
-4.1.1 Type inheritance
-
-Type specifications can be inherited to allow deriving types from a
-type class. For example, see the uint32_t named type derived from the "integer"
-type class below ("Integers" section). Types have a precise binary
-representation in the trace. A type class has methods to read and write these
-types, but must be derived into a type to be usable in an event field.
-
-4.1.2 Alignment
-
-We define "byte-packed" types as aligned on the byte size, namely 8-bit.
-We define "bit-packed" types as following on the next bit, as defined by the
-"bitfields" section.
-
-All basic types, except bitfields, are either aligned on an architecture-defined
-specific alignment or byte-packed, depending on the architecture preference.
-Architectures providing fast unaligned writes byte-pack basic types to save
-space, aligning each type on byte boundaries (8-bit). Architectures with slow
-unaligned writes align types on specific alignment values. If no specific
-alignment is declared for a type or its parents, it is assumed to be bit-packed
-for bitfields and byte-packed for other types.
-
-Metadata attribute representation of a specific alignment:
-
- align = value; /* value in bits */
-
-4.1.3 Byte order
-
-By default, the native endianness of the source architecture is used. Byte order
-can be overridden for a basic type by specifying a "byte_order" attribute. A
-typical use case is to specify the network byte order (big endian: "be") to save
-data captured from the network into the trace without conversion.
-
-Metadata representation:
-
- byte_order = native OR network OR be OR le; /* network and be are aliases */
-
-4.1.4 Size
-
-Type size, in bits, for integers and floats is that returned by "sizeof()" in C
-multiplied by CHAR_BIT.
-We require the size of "char" and "unsigned char" types (CHAR_BIT) to be fixed
-to 8 bits for cross-endianness compatibility.
-
-Metadata representation:
-
- size = value; /* value in bits */
-
-4.1.5 Integers
-
-Signed integers are represented in two's complement. Integer alignment, size,
-signedness and byte ordering are defined in the metadata. Integers aligned on
-byte size (8-bit) and with a length that is a multiple of the byte size (8-bit)
-correspond to the C99 standard integers. In addition, integers with alignment
-and/or size that are _not_ a multiple of the byte size are permitted; these
-correspond to the C99 standard bitfields, with the added specification that the
-CTF integer bitfields have a fixed binary representation. An MIT-licensed
-reference implementation of the CTF portable bitfields is available at:
-
- http://git.efficios.com/?p=babeltrace.git;a=blob;f=include/babeltrace/bitfield.h
-
-Binary representation of integers:
-
-- On little and big endian:
- - Within a byte, the high bits correspond to the integer's high bits, and the
- low bits correspond to its low bits.
-- On little endian:
- - Integers spanning multiple bytes are placed from the least significant byte
- to the most significant.
- - Consecutive integers are placed from lower bits to higher bits (even within
- a byte).
-- On big endian:
- - Integers spanning multiple bytes are placed from the most significant byte
- to the least significant.
- - Consecutive integers are placed from higher bits to lower bits (even within
- a byte).
-
-This binary representation is derived from the bitfield implementation in GCC
-for little and big endian. However, contrary to what GCC does, integers can
-cross unit boundaries (no padding is required). Padding can be explicitly
-added (see 4.1.6 GNU/C bitfields) to follow the GCC layout if needed.
-
-Metadata representation:
-
- integer {
- signed = true OR false; /* default false */
- byte_order = native OR network OR be OR le; /* default native */
- size = value; /* value in bits, no default */
- align = value; /* value in bits */
- };
-
-Example of type inheritance (creation of a uint32_t named type):
-
-typedef integer {
- size = 32;
- signed = false;
- align = 32;
-} uint32_t;
-
-Definition of a named 5-bit signed bitfield:
-
-typedef integer {
- size = 5;
- signed = true;
- align = 1;
-} int5_t;
-
-4.1.6 GNU/C bitfields
-
-The GNU/C bitfields follow closely the integer representation, with a
-particularity on alignment: if a bitfield cannot fit in the current unit, the
-unit is padded and the bitfield starts at the following unit. The unit size is
-defined by the size of the type "unit_type".
-
-Metadata representation. Either:
-
-gcc_bitfield {
- unit_type = integer {
- ...
- };
- size = value;
-};
-
-Or, as a bitfield within a structure, as specified by the C standard:
-
- unit_type name:size;
-
-As an example, the following structure declared in C compiled by GCC:
-
-struct example {
- short a:12;
- short b:5;
-};
-
-is equivalent to the following structure declaration, aligned on the largest
-element (short). The second bitfield would be aligned on the next unit boundary,
-because it would not fit in the current unit. The two declarations (C
-declaration above or CTF declaration with "type gcc_bitfield") are strictly
-equivalent.
-
-struct example {
- gcc_bitfield {
- unit_type = short;
- size = 12;
- } a;
- gcc_bitfield {
- unit_type = short;
- size = 5;
- } b;
-};
-
-4.1.7 Floating point
-
-The floating point values byte ordering is defined in the metadata.
-
-Floating point values follow the IEEE 754-2008 standard interchange formats.
-The description of a floating point value includes the exponent and mantissa
-size in bits. Some requirements are imposed on the floating point values:
-
-- FLT_RADIX must be 2.
-- mant_dig is the number of digits represented in the mantissa. It is specified
- by the ISO C99 standard, section 5.2.4, as FLT_MANT_DIG, DBL_MANT_DIG and
- LDBL_MANT_DIG as defined by <float.h>.
-- exp_dig is the number of digits represented in the exponent. Given that
- mant_dig is one bit more than its actual size in bits (leading 1 is not
- needed) and also given that the sign bit always takes one bit, exp_dig can be
- specified as:
-
- - sizeof(float) * CHAR_BIT - FLT_MANT_DIG
- - sizeof(double) * CHAR_BIT - DBL_MANT_DIG
- - sizeof(long double) * CHAR_BIT - LDBL_MANT_DIG
-
-Metadata representation:
-
-floating_point {
- exp_dig = value;
- mant_dig = value;
- byte_order = native OR network OR be OR le;
-};
-
-Example of type inheritance:
-
-typedef floating_point {
- exp_dig = 8; /* sizeof(float) * CHAR_BIT - FLT_MANT_DIG */
- mant_dig = 24; /* FLT_MANT_DIG */
- byte_order = native;
-} float;
-
-TODO: define NaN, +inf, -inf behavior.
-
-4.1.8 Enumerations
-
-Enumerations are a mapping between an integer type and a table of strings. The
-numerical representation of the enumeration follows the integer type specified
-by the metadata. The enumeration mapping table is detailed in the enumeration
-description within the metadata. Instead of being limited to simple
-"value -> string" mappings, the table maps inclusive ranges of values to
-strings: "[ start_value ... end_value ] -> string". An enumeration from the C
-language can be represented in this format by having the same start_value and
-end_value for each element, which is in fact a range of size 1. Such a
-single-value range can be declared with "string = value", without repeating the
-start and end values. If the <integer_type> is omitted, the type chosen by the C
-compiler to hold the enumeration is used. The <integer_type> specifier can only
-be omitted for enumerations containing only simple "value -> string" mappings
-(compatible with C).
-
-enum <integer_type> name {
- string = start_value1 ... end_value1,
- "other string" = start_value2 ... end_value2,
- yet_another_string, /* will be assigned to end_value2 + 1 */
- "some other string" = value,
- ...
-};
-
-If the values are omitted, the enumeration starts at 0 and increments by 1 for
-each entry:
-
-enum {
- ZERO,
- ONE,
- TWO,
- TEN = 10,
- ELEVEN,
-};
-
-Overlapping ranges within a single enumeration are implementation defined.
-
-4.2 Compound types
-
-4.2.1 Structures
-
-Structures are aligned on the largest alignment required by basic types
-contained within the structure. (This follows the ISO/C standard for structures)
-
-Metadata representation of a named structure:
-
-struct name {
- field_type field_name;
- field_type field_name;
- ...
-};
-
-Example:
-
-struct example {
- integer { /* Nameless type */
- size = 16;
- signed = true;
- align = 16;
- } first_field_name;
- uint64_t second_field_name; /* Named type declared in the metadata */
-};
-
-The fields are placed in a sequence next to each other. They each possess a
-field name, which is a unique identifier within the structure.
-
-A nameless structure can be declared as a field type:
-
-struct {
- ...
-} field_name;
-
-4.2.2 Arrays
-
-Arrays are fixed-length. Their length is declared in the type declaration within
-the metadata. They contain an array of "inner type" elements, which can refer to
-any type not containing the type of the array being declared (no circular
-dependency). The length is the number of elements in an array.
-
-Metadata representation of a named array, either:
-
-typedef array {
- length = value;
- elem_type = type;
-} name;
-
-or:
-
-typedef elem_type name[length];
-
-E.g.:
-
-typedef array {
- length = 10;
- elem_type = uint32_t;
-} example;
-
-A nameless array can be declared as a field type, e.g.:
-
-array {
- length = 5;
- elem_type = uint8_t;
-} field_name;
-
-or
-
-uint8_t field_name[5];
-
-
-4.2.3 Sequences
-
-Sequences are dynamically-sized arrays. They start with an integer that
-specifies the length of the sequence, followed by an array of "inner type"
-elements. The length is the number of elements in the sequence.
-
-Metadata representation for a named sequence, either:
-
-typedef sequence {
- length_type = type; /* integer class */
- elem_type = type;
-} name;
-
-or:
-
-typedef elem_type name[length_type];
-
-A nameless sequence can be declared as a field type, e.g.:
-
-sequence {
- length_type = int;
- elem_type = long;
-} field_name;
-
-or
-
-long field_name[int];
-
-The length type follows the integer types specifications, and the sequence
-elements follow the "array" specifications.
-
-4.2.4 Strings
-
-Strings are arrays of bytes of variable size, terminated by a '\0' "NUL"
-character. Their encoding is described in the metadata. In the absence of an
-encoding attribute, the default encoding is UTF-8.
-
-Metadata representation of a named string type:
-
-typedef string {
- encoding = UTF8 OR ASCII;
-} name;
-
-A nameless string type can be declared as a field type:
-
-string field_name; /* Use default UTF8 encoding */
-
-5. Event Packet Header
-
-The event packet header consists of two parts: the first is mandatory and has a
-fixed layout. The second part, the "event packet context", has its layout
-described in the metadata.
-
-- Aligned on page size. Fixed size. Fields either aligned or packed (depending
- on the architecture preference).
- No padding at the end of the event packet header. Native architecture byte
- ordering.
-
-Fixed layout (event packet header):
-
-- Magic number (CTF magic numbers: 0xC1FC1FC1 and its reverse endianness
- representation: 0xC11FFCC1). It needs to have a non-symmetric bytewise
- representation. It is used to distinguish between big and little endian traces
- (this information is determined by knowing the endianness of the architecture
- reading the trace and comparing the magic number against its value and the
- reverse, 0xC11FFCC1). This magic number specifies that we use the CTF metadata
- description language described in this document. Different magic numbers
- should be used for other metadata description languages.
-- Trace UUID, used to ensure the event packet matches the metadata used.
- (note: we cannot use a metadata checksum because metadata can be appended to
- while tracing is active)
-- Stream ID, used as reference to stream description in metadata.
-
-Metadata-defined layout (event packet context):
-
-- Event packet content size (in bytes).
-- Event packet size (in bytes, includes padding).
-- Event packet content checksum (optional). Checksum excludes the event packet
- header.
-- Per-stream event packet sequence count (to deal with UDP packet loss). The
- number of significant sequence counter bits should also be present, so
- wrap-arounds are dealt with correctly.
-- Timestamp at the beginning and timestamp at the end of the event packet.
- Both timestamps are written in the packet header, but sampled respectively
- while (or before) writing the first event and while (or after) writing the
- last event in the packet. The inclusive range between these timestamps should
- include all event timestamps assigned to events contained within the packet.
-- Events discarded count
- - Snapshot of a per-stream free-running counter, counting the number of
- events discarded that were supposed to be written in the stream prior to
- the first event in the event packet.
- * Note: producer-consumer buffer full condition should fill the current
- event packet with padding so we know exactly where events have been
- discarded.
-- Lossless compression scheme used for the event packet content. Applied
- directly to raw data. New types of compression can be added in following
- versions of the format.
- 0: no compression scheme
- 1: bzip2
- 2: gzip
- 3: xz
-- Cipher used for the event packet content. Applied after compression.
- 0: no encryption
- 1: AES
-- Checksum scheme used for the event packet content. Applied after encryption.
- 0: no checksum
- 1: md5
- 2: sha1
- 3: crc32
-
-5.1 Event Packet Header Fixed Layout Description
-
-struct event_packet_header {
- uint32_t magic;
- uint8_t trace_uuid[16];
- uint32_t stream_id;
-};
-
-5.2 Event Packet Context Description
-
-Event packet context example. These are declared within the stream declaration
-in the metadata. All these fields are optional except for "content_size" and
-"packet_size", which must be present in the context.
-
-An example event packet context type:
-
-struct event_packet_context {
- uint64_t timestamp_begin;
- uint64_t timestamp_end;
- uint32_t checksum;
- uint32_t stream_packet_count;
- uint32_t events_discarded;
- uint32_t cpu_id;
- uint32_t/uint16_t content_size;
- uint32_t/uint16_t packet_size;
- uint8_t stream_packet_count_bits; /* Significant counter bits */
- uint8_t compression_scheme;
- uint8_t encryption_scheme;
- uint8_t checksum_scheme;
-};
-
-6. Event Structure
-
-The overall structure of an event is:
-
- - Event Header (as specified by the stream metadata)
- - Extended Event Header (as specified by the event header)
- - Event Context (as specified by the stream metadata)
- - Event Payload (as specified by the event metadata)
-
-
-6.1 Event Header
-
-One major factor can vary between streams: the number of event IDs assigned to
-a stream. Luckily, this information tends to stay relatively constant (modulo
-event registration while trace is being recorded), so we can specify different
-representations for streams containing few event IDs and streams containing
-many event IDs, so we end up representing the event ID and timestamp as densely
-as possible in each case.
-
-We therefore provide two types of event headers. Type 1 accommodates streams
-with fewer than 31 event IDs. Type 2 accommodates streams with 31 or more event
-IDs.
-
-The "extended headers" are used in the rare occasions where the information
-cannot be represented in the ranges available in the event header. They are also
-used in the rare occasions where the data required for a field could not be
-collected: the flag corresponding to the missing field within the missing_fields
-array is then set to 1.
-
-Types uintX_t represent an X-bit unsigned integer.
-
-
-6.1.1 Type 1 - Few event IDs
-
- - Aligned on 32-bit (or 8-bit if byte-packed, depending on the architecture
- preference).
- - Fixed size: 32 bits.
- - Native architecture byte ordering.
-
-struct event_header_1 {
- uint5_t id; /*
- * id: range: 0 - 30.
- * id 31 is reserved to indicate a following
- * extended header.
- */
- uint27_t timestamp;
-};
-
-The end of a type 1 header is aligned on a 32-bit boundary (or packed).
-
-
-6.1.2 Extended Type 1 Event Header
-
- - Follows struct event_header_1, which is aligned on 32-bit, so no need to
- realign.
- - Variable size (depends on the number of fields per event).
- - Native architecture byte ordering.
- - NR_FIELDS is the number of fields within the event.
-
-struct event_header_1_ext {
- uint32_t id; /* 32-bit event IDs */
- uint64_t timestamp; /* 64-bit timestamps */
- uint1_t missing_fields[NR_FIELDS]; /* missing event fields bitmap */
-};
-
-
-6.1.3 Type 2 - Many event IDs
-
- - Aligned on 32-bit (or 8-bit if byte-packed, depending on the architecture
- preference).
- - Fixed size: 48 bits.
- - Native architecture byte ordering.
-
-struct event_header_2 {
- uint32_t timestamp;
- uint16_t id; /*
- * id: range: 0 - 65534.
- * id 65535 is reserved to indicate a following
- * extended header.
- */
-};
-
-The end of a type 2 header is aligned on a 16-bit boundary (or 8-bit if
-byte-packed).
-
-
-6.1.4 Extended Type 2 Event Header
-
- - Follows struct event_header_2, whose end is aligned on a 16-bit boundary, so
- we need to align on 64-bit integer architecture alignment (or 8-bit if
- byte-packed).
- - Variable size (depends on the number of fields per event).
- - Native architecture byte ordering.
- - NR_FIELDS is the number of fields within the event.
-
-struct event_header_2_ext {
- uint64_t timestamp; /* 64-bit timestamps */
- uint32_t id; /* 32-bit event IDs */
- uint1_t missing_fields[NR_FIELDS]; /* missing event fields bitmap */
-};
-
-
-6.2 Event Context
-
-The event context contains information relative to the current event. The choice
-and meaning of this information are specified by the metadata "stream"
-information. For this trace format, the event context is usually empty, except
-when the metadata "stream" information specifies otherwise by declaring a
-non-empty structure for the event context. Examples of event context are the
-event payload size or the current PID, saved with each event. These are declared
-within the stream declaration within the metadata.
-
-An example event context type:
-
- struct event_context {
- uint pid;
- uint16_t payload_size;
- };
-
-
-6.3 Event Payload
-
-An event payload contains fields specific to a given event type. The fields
-belonging to an event type are described in the event-specific metadata
-within a structure type.
-
-6.3.1 Padding
-
-No padding at the end of the event payload. This differs from the ISO/C standard
-for structures, but follows the CTF standard for structures. In a trace, even
-though it makes sense to align the beginning of a structure, it really makes no
-sense to add padding at the end of the structure, because structures are usually
-not followed by a structure of the same type.
-
-This trick can be done by adding a zero-length "end" field at the end of the C
-structures, and by using the offset of this field rather than using sizeof()
-when calculating the size of a structure (see Appendix "A. Helper macros").
-
-6.3.2 Alignment
-
-The event payload is aligned on the largest alignment required by types
-contained within the payload. (This follows the ISO/C standard for structures)
-
-
-
-7. Metadata
-
-The metadata is located in a stream named "metadata". It is made of "event
-packets", each of which starts with an event packet header containing, amongst
-other fields, the magic number and trace UUID. Events within the metadata
-stream have neither an event header nor an event context. Each event only
-contains a null-terminated "string" payload, which is a metadata description
-entry. The events are packed one next to another.
-
-The metadata can be parsed by reading through the metadata strings, skipping
-newlines and null-characters. Type names may contain spaces.
-
-trace {
- major = value; /* Trace format version */
- minor = value;
- uuid = value; /* Trace UUID */
- word_size = value;
-};
-
-stream {
- id = stream_id;
- event {
- /* Type 1 - Few event IDs; Type 2 - Many event IDs. See section 6.1. */
- header_type = event_header_1 OR event_header_2;
- /*
- * Extended event header type. Only present if specified in event header
- * on a per-event basis.
- */
- header_type_ext = event_header_1_ext OR event_header_2_ext;
- context_type = struct {
- ...
- };
- };
- packet {
- context_type = struct {
- ...
- };
- };
-};
-
-event {
- name = eventname;
- id = value; /* Numeric identifier within the stream */
- stream = stream_id;
- fields = struct {
- ...
- };
-};
-
-/* More detail on types in section 4. Types */
-
-/* Named types */
-typedef some existing type new_type;
-
-typedef type_class {
- ...
-} new_type;
-
-struct name {
- ...
-};
-
-enum name {
- ...
-};
-
-/* Unnamed types, contained within compound type fields or type assignments. */
-struct {
- ...
-};
-
-enum {
- ...
-};
-
-array {
- ...
-};
-
-sequence {
- ...
-};
-
-A. Helper macros
-
-The two following macros keep track of the size of a GNU/C structure without
-padding at the end by placing HEADER_END as the last field. A one byte end field
-is used for C90 compatibility (C99 flexible arrays could be used here). Note
-that this does not affect the effective structure size, which should always be
-calculated with the header_sizeof() helper.
-
-#define HEADER_END char end_field
-#define header_sizeof(type) offsetof(typeof(type), end_field)
-
-
-B. Stream Header Rationale
-
-An event stream is divided into contiguous event packets of variable size. These
-subdivisions allow the trace analyzer to perform a fast binary search by time
-within the stream (typically requiring only the event packet headers to be
-indexed) without reading the whole stream. These subdivisions have a variable
-size to eliminate the need to transfer the event packet padding when partially
-filled event packets must be sent when streaming a trace for live
-viewing/analysis.
-An event packet can contain a certain amount of padding at the end. Dividing
-streams into event packets is also useful for network streaming over UDP and
-flight recorder mode tracing (a whole event packet can be swapped out of the
-buffer atomically for reading).
-
-The stream header is repeated at the beginning of each event packet to allow
-flexibility in terms of:
-
- - streaming support,
- - allowing arbitrary buffers to be discarded without making the trace
- unreadable,
- - allowing UDP packet loss handling, by either dealing with missing event
- packets or asking for re-transmission,
- - transparently supporting flight recorder mode,
- - transparently supporting crash dumps.
-
-The event stream header will therefore be referred to as the "event packet
-header" throughout the rest of this document.
--- /dev/null
+
+RFC: Common Trace Format Proposal (v1.6)
+
+Mathieu Desnoyers, EfficiOS Inc.
+
+The goal of the present document is to propose a trace format that suits the
+needs of the embedded, telecom, high-performance and kernel communities. It is
+based on the Common Trace Format Requirements (v1.4) document. It is designed to
+allow traces to be natively generated by the Linux kernel, Linux user-space
+applications written in C/C++, and hardware components.
+
+The latest version of this document can be found at:
+
+ git tree: git://git.efficios.com/ctf.git
+ gitweb: http://git.efficios.com/?p=ctf.git
+
+A reference implementation of a library to read and write this trace format is
+being implemented within the BabelTrace project, a converter between trace
+formats. The development tree is available at:
+
+ git tree: git://git.efficios.com/babeltrace.git
+ gitweb: http://git.efficios.com/?p=babeltrace.git
+
+
+1. Preliminary definitions
+
+ - Event Trace: An ordered sequence of events.
+ - Event Stream: An ordered sequence of events, containing a subset of the
+ trace event types.
+ - Event Packet: A sequence of physically contiguous events within an event
+ stream.
+ - Event: This is the basic entry in a trace. (aka: a trace record).
+ - An event identifier (ID) relates to the class (a type) of event within
+ an event stream.
+ e.g. event: irq_entry.
+ - An event (or event record) relates to a specific instance of an event
+ class.
+ e.g. event: irq_entry, at time X, on CPU Y
+ - Source Architecture: Architecture writing the trace.
+ - Reader Architecture: Architecture reading the trace.
+
+
+2. High-level representation of a trace
+
+A trace is divided into multiple event streams. Each event stream contains a
+subset of the trace event types.
+
+The final output of the trace, after its generation and optional transport over
+the network, is expected to be either on permanent or temporary storage in a
+virtual file system. Because each event stream is appended to while a trace is
+being recorded, each is associated with a separate file for output. Therefore,
+a stored trace can be represented as a directory containing one file per stream.
+
+A metadata event stream contains information on trace event types. It describes:
+
+- Trace version.
+- Types available.
+- Per-stream event header description.
+- Per-stream event header selection.
+- Per-stream event context fields.
+- Per-event
+ - Event type to stream mapping.
+ - Event type to name mapping.
+ - Event type to ID mapping.
+ - Event fields description.
+
+
+3. Event stream
+
+An event stream is divided into contiguous event packets of variable size. An
+event packet can contain a certain amount of padding at the end. The stream
+header is repeated at the beginning of each event packet. The rationale for the
+event stream design choices is explained in Appendix B. Stream Header Rationale.
+
+The event stream header will therefore be referred to as the "event packet
+header" throughout the rest of this document.
+
+
+4. Types
+
+4.1 Basic types
+
+A basic type is a scalar type, as described in this section.
+
+4.1.1 Type inheritance
+
+Type specifications can be inherited to allow deriving types from a
+type class. For example, see the uint32_t named type derived from the "integer"
+type class below ("Integers" section). Types have a precise binary
+representation in the trace. A type class has methods to read and write these
+types, but must be derived into a type to be usable in an event field.
+
+4.1.2 Alignment
+
+We define "byte-packed" types as aligned on the byte size, namely 8-bit.
+We define "bit-packed" types as following on the next bit, as defined by the
+"bitfields" section.
+
+All basic types, except bitfields, are either aligned on an architecture-defined
+specific alignment or byte-packed, depending on the architecture preference.
+Architectures providing fast unaligned writes byte-pack basic types to save
+space, aligning each type on byte boundaries (8-bit). Architectures with slow
+unaligned writes align types on specific alignment values. If no specific
+alignment is declared for a type or its parents, it is assumed to be bit-packed
+for bitfields and byte-packed for other types.
+
+Metadata attribute representation of a specific alignment:
+
+ align = value; /* value in bits */
+
+4.1.3 Byte order
+
+By default, the native endianness of the source architecture is used. Byte order
+can be overridden for a basic type by specifying a "byte_order" attribute. A
+typical use case is to specify the network byte order (big endian: "be") to save
+data captured from the network into the trace without conversion.
+
+Metadata representation:
+
+ byte_order = native OR network OR be OR le; /* network and be are aliases */
+
+4.1.4 Size
+
+Type size, in bits, for integers and floats is that returned by "sizeof()" in C
+multiplied by CHAR_BIT.
+We require the size of "char" and "unsigned char" types (CHAR_BIT) to be fixed
+to 8 bits for cross-endianness compatibility.
+
+Metadata representation:
+
+ size = value; /* value in bits */
+
+4.1.5 Integers
+
+Signed integers are represented in two's complement. Integer alignment, size,
+signedness and byte ordering are defined in the metadata. Integers aligned on
+byte size (8-bit) and with a length that is a multiple of the byte size (8-bit)
+correspond to the C99 standard integers. In addition, integers with alignment
+and/or size that are _not_ a multiple of the byte size are permitted; these
+correspond to the C99 standard bitfields, with the added specification that the
+CTF integer bitfields have a fixed binary representation. An MIT-licensed
+reference implementation of the CTF portable bitfields is available at:
+
+ http://git.efficios.com/?p=babeltrace.git;a=blob;f=include/babeltrace/bitfield.h
+
+Binary representation of integers:
+
+- On little and big endian:
+ - Within a byte, high bits correspond to an integer's high bits, and low bits
+ correspond to low bits.
+- On little endian:
+ - Integers spanning multiple bytes are placed from the least significant byte
+ to the most significant byte.
+ - Consecutive integers are placed from lower bits to higher bits (even within
+ a byte).
+- On big endian:
+ - Integers spanning multiple bytes are placed from the most significant byte
+ to the least significant byte.
+ - Consecutive integers are placed from higher bits to lower bits (even within
+ a byte).
+
+This binary representation is derived from the bitfield implementation in GCC
+for little and big endian. However, contrary to what GCC does, integers can
+cross unit boundaries (no padding is required). Padding can be explicitly
+added (see 4.1.6 GNU/C bitfields) to follow the GCC layout if needed.
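+
+The layout rules above can be sketched as two bit-by-bit readers. This is an
+illustrative, unoptimized sketch and not the reference bitfield.h
+implementation; the function names are hypothetical. Bit offsets count from
+the least significant bit of the first byte on little endian, and from the
+most significant bit of the first byte on big endian.
+
```c
#include <assert.h>
#include <stdint.h>

/* Little endian: the first stream bit is the value's least significant bit. */
static uint64_t ctf_read_bits_le(const uint8_t *buf, uint64_t start, unsigned len)
{
    uint64_t v = 0;

    assert(len <= 64);
    for (unsigned k = 0; k < len; k++) {
        uint64_t pos = start + k;
        uint64_t bit = (buf[pos / 8] >> (pos % 8)) & 1;

        v |= bit << k;
    }
    return v;
}

/* Big endian: the first stream bit is the value's most significant bit. */
static uint64_t ctf_read_bits_be(const uint8_t *buf, uint64_t start, unsigned len)
{
    uint64_t v = 0;

    assert(len <= 64);
    for (unsigned k = 0; k < len; k++) {
        uint64_t pos = start + k;
        uint64_t bit = (buf[pos / 8] >> (7 - pos % 8)) & 1;

        v = (v << 1) | bit;
    }
    return v;
}
```
+
+For instance, a 5-bit integer holding 19 followed by a 12-bit integer holding
+0xABC occupies the bytes 0x93 0x57 0x01 on little endian and 0x9D 0x5E 0x00 on
+big endian; the second integer crosses byte boundaries without padding.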
+
+Metadata representation:
+
+ integer {
+ signed = true OR false; /* default false */
+ byte_order = native OR network OR be OR le; /* default native */
+ size = value; /* value in bits, no default */
+ align = value; /* value in bits */
+ };
+
+Example of type inheritance (creation of a uint32_t named type):
+
+typedef integer {
+ size = 32;
+ signed = false;
+ align = 32;
+} uint32_t;
+
+Definition of a named 5-bit signed bitfield:
+
+typedef integer {
+ size = 5;
+ signed = true;
+ align = 1;
+} int5_t;
+
+4.1.6 GNU/C bitfields
+
+The GNU/C bitfields follow closely the integer representation, with a
+particularity on alignment: if a bitfield cannot fit in the current unit, the
+unit is padded and the bitfield starts at the following unit. The unit size is
+defined by the size of the type "unit_type".
+
+Metadata representation. Either:
+
+gcc_bitfield {
+ unit_type = integer {
+ ...
+ };
+ size = value;
+};
+
+Or a bitfield within a structure, as specified by the C standard:
+
+ unit_type name:size;
+
+As an example, the following structure declared in C compiled by GCC:
+
+struct example {
+ short a:12;
+ short b:5;
+};
+
+is equivalent to the following structure declaration, aligned on the largest
+element (short). The second bitfield would be aligned on the next unit boundary,
+because it would not fit in the current unit. The two declarations (C
+declaration above or CTF declaration with "type gcc_bitfield") are strictly
+equivalent.
+
+struct example {
+ gcc_bitfield {
+ unit_type = short;
+ size = 12;
+ } a;
+ gcc_bitfield {
+ unit_type = short;
+ size = 5;
+ } b;
+};
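+
+The placement rule of this section can be sketched as follows. The helper name
+gcc_bitfield_start is hypothetical, and the sketch assumes a bitfield is never
+larger than its unit.
+
```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the bit offset at which a bitfield of size_bits starts, given the
 * current bit offset and the unit size in bits. If the field does not fit in
 * what remains of the current unit, it starts at the following unit.
 */
static uint64_t gcc_bitfield_start(uint64_t offset_bits, unsigned unit_bits,
                                   unsigned size_bits)
{
    uint64_t used;

    assert(unit_bits != 0 && size_bits <= unit_bits);
    used = offset_bits % unit_bits;
    if (used + size_bits > unit_bits)
        return offset_bits + (unit_bits - used);
    return offset_bits;
}
```
+
+Assuming a 16-bit short, field "a" of the example starts at bit 0 and field
+"b" starts at bit 16, because 12 + 5 bits exceed the 16-bit unit.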
+
+4.1.7 Floating point
+
+The byte ordering of floating point values is defined in the metadata.
+
+Floating point values follow the IEEE 754-2008 standard interchange formats.
+The description of a floating point type includes the exponent and mantissa
+sizes in bits. Some requirements are imposed on the floating point values:
+
+- FLT_RADIX must be 2.
+- mant_dig is the number of digits represented in the mantissa. It is specified
+ by the ISO C99 standard, section 5.2.4, as FLT_MANT_DIG, DBL_MANT_DIG and
+ LDBL_MANT_DIG as defined by <float.h>.
+- exp_dig is the number of digits represented in the exponent. Given that
+ mant_dig is one bit more than its actual size in bits (leading 1 is not
+ needed) and also given that the sign bit always takes one bit, exp_dig can be
+ specified as:
+
+ - sizeof(float) * CHAR_BIT - FLT_MANT_DIG
+ - sizeof(double) * CHAR_BIT - DBL_MANT_DIG
+ - sizeof(long double) * CHAR_BIT - LDBL_MANT_DIG
+
+Metadata representation:
+
+floating_point {
+ exp_dig = value;
+ mant_dig = value;
+ byte_order = native OR network OR be OR le;
+};
+
+Example of type inheritance:
+
+typedef floating_point {
+ exp_dig = 8; /* sizeof(float) * CHAR_BIT - FLT_MANT_DIG */
+ mant_dig = 24; /* FLT_MANT_DIG */
+ byte_order = native;
+} float;
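+
+The exp_dig formula can be checked against <float.h> with a sketch such as the
+one below. The function names are hypothetical, and the expected values assume
+that float and double are the IEEE 754 binary32 and binary64 formats.
+
```c
#include <assert.h>
#include <float.h>
#include <limits.h>

/* exp_dig for "float": total size in bits minus FLT_MANT_DIG. */
static int ctf_float_exp_dig(void)
{
    assert(FLT_RADIX == 2);
    return (int)(sizeof(float) * CHAR_BIT) - FLT_MANT_DIG;
}

/* exp_dig for "double": total size in bits minus DBL_MANT_DIG. */
static int ctf_double_exp_dig(void)
{
    assert(FLT_RADIX == 2);
    return (int)(sizeof(double) * CHAR_BIT) - DBL_MANT_DIG;
}
```
+
+On ABIs where long double carries padding bytes (for example the 80-bit x87
+format stored in 16 bytes), the same formula overestimates the exponent size,
+so exp_dig would then have to be given explicitly.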
+
+TODO: define NaN, +inf, -inf behavior.
+
+4.1.8 Enumerations
+
+Enumerations are a mapping between an integer type and a table of strings. The
+numerical representation of the enumeration follows the integer type specified
+by the metadata. The enumeration mapping table is detailed in the enumeration
+description within the metadata. Instead of being limited to simple
+"value -> string" mappings, these enumerations map
+"[ start_value ... end_value ] -> string", that is, inclusive ranges of values
+to strings. An enumeration from the C language can be represented in this
+format by having the same start_value and end_value for each element, which is
+in fact a range of size 1. Such a single-value range can be declared without
+repeating the start and end values, using the "string = value" declaration. If
+the <integer_type> is omitted, the type chosen by the C compiler to hold the
+enumeration is used. The <integer_type> specifier can only be omitted for
+enumerations containing only simple "value -> string" mappings (compatible with
+C).
+
+enum <integer_type> name {
+ string = start_value1 ... end_value1,
+ "other string" = start_value2 ... end_value2,
+ yet_another_string, /* will be assigned to end_value2 + 1 */
+ "some other string" = value,
+ ...
+};
+
+If the values are omitted, the enumeration starts at 0 and increments by 1 for
+each entry:
+
+enum {
+ ZERO,
+ ONE,
+ TWO,
+ TEN = 10,
+ ELEVEN,
+};
+
+Overlapping ranges within a single enumeration are implementation defined.
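+
+A reader could resolve an enumeration value to its string with a linear scan
+over the range table, as in the sketch below. The structure, the function and
+the example table are hypothetical. Since overlapping ranges are
+implementation defined, this sketch simply returns the first matching entry.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct ctf_enum_range {     /* hypothetical in-memory form of one entry */
    int64_t start;          /* inclusive */
    int64_t end;            /* inclusive */
    const char *label;
};

/* Example table: two single values and one range. */
static const struct ctf_enum_range example_map[] = {
    { 0, 0, "ZERO" },
    { 1, 9, "LOW" },
    { 10, 10, "TEN" },
};

/* Return the label of the first range containing value, or NULL if none. */
static const char *ctf_enum_lookup(const struct ctf_enum_range *map, size_t n,
                                   int64_t value)
{
    for (size_t i = 0; i < n; i++)
        if (value >= map[i].start && value <= map[i].end)
            return map[i].label;
    return NULL;
}
```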
+
+4.2 Compound types
+
+4.2.1 Structures
+
+Structures are aligned on the largest alignment required by basic types
+contained within the structure. (This follows the ISO/C standard for structures)
+
+Metadata representation of a named structure:
+
+struct name {
+ field_type field_name;
+ field_type field_name;
+ ...
+};
+
+Example:
+
+struct example {
+ integer { /* Nameless type */
+ size = 16;
+ signed = true;
+ align = 16;
+ } first_field_name;
+ uint64_t second_field_name; /* Named type declared in the metadata */
+};
+
+The fields are placed in a sequence next to each other. They each possess a
+field name, which is a unique identifier within the structure.
+
+A nameless structure can be declared as a field type:
+
+struct {
+ ...
+} field_name;
+
+4.2.2 Arrays
+
+Arrays are fixed-length. Their length is declared in the type declaration within
+the metadata. They contain an array of "inner type" elements, which can refer to
+any type not containing the type of the array being declared (no circular
+dependency). The length is the number of elements in an array.
+
+Metadata representation of a named array, either:
+
+typedef array {
+ length = value;
+ elem_type = type;
+} name;
+
+or:
+
+typedef elem_type name[length];
+
+E.g.:
+
+typedef array {
+ length = 10;
+ elem_type = uint32_t;
+} example;
+
+A nameless array can be declared as a field type, e.g.:
+
+array {
+ length = 5;
+ elem_type = uint8_t;
+} field_name;
+
+or
+
+uint8_t field_name[5];
+
+
+4.2.3 Sequences
+
+Sequences are dynamically-sized arrays. They start with an integer that
+specifies the length of the sequence, followed by an array of "inner type"
+elements. The length is the number of elements in the sequence.
+
+Metadata representation for a named sequence, either:
+
+typedef sequence {
+ length_type = type; /* integer class */
+ elem_type = type;
+} name;
+
+or:
+
+typedef elem_type name[length_type];
+
+A nameless sequence can be declared as a field type, e.g.:
+
+sequence {
+ length_type = int;
+ elem_type = long;
+} field_name;
+
+or
+
+long field_name[int];
+
+The length type follows the integer types specifications, and the sequence
+elements follow the "array" specifications.
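+
+As a sketch of how a reader might decode a sequence, the function below handles
+one specific, assumed declaration: a byte-packed sequence whose length_type is
+an 8-bit unsigned integer and whose elem_type is a 16-bit little-endian
+unsigned integer. The function name and its error convention are hypothetical.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode a length-prefixed sequence (uint8 length, uint16 little-endian
 * elements). Returns the number of bytes consumed, or 0 if the buffer is too
 * short or out[] cannot hold all elements.
 */
static size_t ctf_read_seq_u8_u16le(const uint8_t *buf, size_t buf_len,
                                    uint16_t *out, size_t out_cap,
                                    size_t *nr_elems)
{
    size_t n, i;

    if (buf_len < 1)
        return 0;
    n = buf[0];             /* length prefix: number of elements */
    if (n > out_cap || buf_len < 1 + 2 * n)
        return 0;
    for (i = 0; i < n; i++)
        out[i] = (uint16_t)(buf[1 + 2 * i] | (buf[2 + 2 * i] << 8));
    *nr_elems = n;
    return 1 + 2 * n;
}
```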
+
+4.2.4 Strings
+
+Strings are arrays of bytes of variable size, terminated by a '\0' "NUL"
+character. Their encoding is described in the metadata. In the absence of
+encoding attribute information, the default encoding is UTF-8.
+
+Metadata representation of a named string type:
+
+typedef string {
+ encoding = UTF8 OR ASCII;
+} name;
+
+A nameless string type can be declared as a field type:
+
+string field_name; /* Use default UTF8 encoding */
+
+5. Event Packet Header
+
+The event packet header consists of two parts: the first is mandatory and has a
+fixed layout. The second part, the "event packet context", has its layout
+described in the metadata.
+
+- Aligned on page size. Fixed size. Fields either aligned or packed (depending
+ on the architecture preference).
+ No padding at the end of the event packet header. Native architecture byte
+ ordering.
+
+Fixed layout (event packet header):
+
+- Magic number (CTF magic number: 0xC1FC1FC1; its reverse endianness
+ representation is 0xC11FFCC1). It needs to have a non-symmetric bytewise
+ representation. It is used to distinguish between big and little endian
+ traces: the reader compares the magic number, loaded in its own native byte
+ order, against both this value and its reverse. This magic number specifies
+ that we use the CTF metadata description language described in this document.
+ Different magic numbers should be used for other metadata description
+ languages.
+- Trace UUID, used to ensure the event packet matches the metadata used.
+ (note: we cannot use a metadata checksum because metadata can be appended to
+ while tracing is active)
+- Stream ID, used as reference to stream description in metadata.
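+
+The endianness check on the magic number can be sketched as below. The enum,
+macro and function names are hypothetical; magic_as_read stands for the 32-bit
+value loaded in the native byte order of the reader architecture.
+
```c
#include <assert.h>
#include <stdint.h>

#define CTF_MAGIC           0xC1FC1FC1u
#define CTF_MAGIC_REVERSE   0xC11FFCC1u

enum ctf_magic_result {
    CTF_TRACE_NATIVE,       /* same byte order as the reader */
    CTF_TRACE_REVERSED,     /* opposite byte order: fields need swapping */
    CTF_TRACE_INVALID,      /* not a CTF event packet */
};

/* Reverse the byte order of a 32-bit value. */
static uint32_t ctf_bswap32(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

static enum ctf_magic_result ctf_check_magic(uint32_t magic_as_read)
{
    if (magic_as_read == CTF_MAGIC)
        return CTF_TRACE_NATIVE;
    if (ctf_bswap32(magic_as_read) == CTF_MAGIC)
        return CTF_TRACE_REVERSED;
    return CTF_TRACE_INVALID;
}
```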
+
+Metadata-defined layout (event packet context):
+
+- Event packet content size (in bytes).
+- Event packet size (in bytes, includes padding).
+- Event packet content checksum (optional). Checksum excludes the event packet
+ header.
+- Per-stream event packet sequence count (to deal with UDP packet loss). The
+ number of significant sequence counter bits should also be present, so
+ wrap-arounds are dealt with correctly.
+- Timestamp at the beginning and timestamp at the end of the event packet.
+ Both timestamps are written in the packet header, but sampled respectively
+ while (or before) writing the first event and while (or after) writing the
+ last event in the packet. The inclusive range between these timestamps should
+ include all event timestamps assigned to events contained within the packet.
+- Events discarded count
+ - Snapshot of a per-stream free-running counter, counting the number of
+ events discarded that were supposed to be written in the stream prior to
+ the first event in the event packet.
+ * Note: producer-consumer buffer full condition should fill the current
+ event packet with padding so we know exactly where events have been
+ discarded.
+- Lossless compression scheme used for the event packet content. Applied
+ directly to raw data. New types of compression can be added in following
+ versions of the format.
+ 0: no compression scheme
+ 1: bzip2
+ 2: gzip
+ 3: xz
+- Cipher used for the event packet content. Applied after compression.
+ 0: no encryption
+ 1: AES
+- Checksum scheme used for the event packet content. Applied after encryption.
+ 0: no checksum
+ 1: md5
+ 2: sha1
+ 3: crc32
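+
+One possible use of the sequence count and of its number of significant bits
+is to compute how many event packets were lost between two received packets,
+wrap-around included. The sketch below assumes the counter increments by one
+per packet; the function name is hypothetical.
+
```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of packets missing between two consecutively received packets,
 * given the count of the previous packet, the count of the current one and
 * the number of significant counter bits (1 to 32).
 */
static uint32_t ctf_packets_lost(uint32_t prev_count, uint32_t cur_count,
                                 unsigned significant_bits)
{
    uint32_t mask;

    assert(significant_bits >= 1 && significant_bits <= 32);
    mask = significant_bits == 32 ? UINT32_MAX
                                  : (((uint32_t)1 << significant_bits) - 1);
    return (cur_count - prev_count - 1) & mask;
}
```
+
+With an 8-bit counter, receiving count 1 right after count 254 means that the
+packets numbered 255 and 0 are missing.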
+
+5.1 Event Packet Header Fixed Layout Description
+
+struct event_packet_header {
+ uint32_t magic;
+ uint8_t trace_uuid[16];
+ uint32_t stream_id;
+};
+
+5.2 Event Packet Context Description
+
+The event packet context fields are declared within the stream declaration in
+the metadata. All these fields are optional except for "content_size" and
+"packet_size", which must be present in the context.
+
+An example event packet context type:
+
+struct event_packet_context {
+ uint64_t timestamp_begin;
+ uint64_t timestamp_end;
+ uint32_t checksum;
+ uint32_t stream_packet_count;
+ uint32_t events_discarded;
+ uint32_t cpu_id;
+ uint32_t/uint16_t content_size;
+ uint32_t/uint16_t packet_size;
+ uint8_t stream_packet_count_bits; /* Significant counter bits */
+ uint8_t compression_scheme;
+ uint8_t encryption_scheme;
+ uint8_t checksum_scheme;
+};
+
+6. Event Structure
+
+The overall structure of an event is:
+
+ - Event Header (as specified by the stream metadata)
+ - Extended Event Header (as specified by the event header)
+ - Event Context (as specified by the stream metadata)
+ - Event Payload (as specified by the event metadata)
+
+
+6.1 Event Header
+
+One major factor can vary between streams: the number of event IDs assigned to
+a stream. Luckily, this information tends to stay relatively constant (modulo
+event registration while trace is being recorded), so we can specify different
+representations for streams containing few event IDs and streams containing
+many event IDs, so we end up representing the event ID and timestamp as densely
+as possible in each case.
+
+We therefore provide two types of event headers. Type 1 accommodates streams
+with at most 31 event IDs (0 to 30). Type 2 accommodates streams with more than
+31 event IDs.
+
+The "extended headers" are used in the rare occasions where the information
+cannot be represented in the ranges available in the event header. They are also
+used in the rare occasions where the data required for a field could not be
+collected: the flag corresponding to the missing field within the missing_fields
+array is then set to 1.
+
+Types uintX_t represent an X-bit unsigned integer.
+
+
+6.1.1 Type 1 - Few event IDs
+
+ - Aligned on 32-bit (or 8-bit if byte-packed, depending on the architecture
+ preference).
+ - Fixed size: 32 bits.
+ - Native architecture byte ordering.
+
+struct event_header_1 {
+ uint5_t id; /*
+ * id: range: 0 - 30.
+ * id 31 is reserved to indicate a following
+ * extended header.
+ */
+ uint27_t timestamp;
+};
+
+The end of a type 1 header is aligned on a 32-bit boundary (or packed).
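+
+A type 1 header can be handled as a single 32-bit word. The sketch below
+assumes the little-endian layout of section 4.1.5, where the first field (id)
+occupies the 5 low bits of the word and the timestamp the 27 high bits; a
+big-endian trace would place the id in the high bits instead. All names are
+hypothetical.
+
```c
#include <assert.h>
#include <stdint.h>

#define CTF_HDR1_ID_BITS    5
#define CTF_HDR1_ID_MASK    0x1Fu
#define CTF_HDR1_TS_MASK    0x07FFFFFFu
#define CTF_HDR1_ID_EXT     31u     /* reserved: extended header follows */

/* Build the 32-bit header word from an id and a 27-bit timestamp. */
static uint32_t ctf_hdr1_pack(uint32_t id, uint32_t timestamp)
{
    return (id & CTF_HDR1_ID_MASK) |
           ((timestamp & CTF_HDR1_TS_MASK) << CTF_HDR1_ID_BITS);
}

static uint32_t ctf_hdr1_id(uint32_t word)
{
    return word & CTF_HDR1_ID_MASK;
}

static uint32_t ctf_hdr1_timestamp(uint32_t word)
{
    return word >> CTF_HDR1_ID_BITS;
}

/* Non-zero if the id field holds the reserved "extended header" value. */
static int ctf_hdr1_has_ext(uint32_t word)
{
    return ctf_hdr1_id(word) == CTF_HDR1_ID_EXT;
}
```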
+
+
+6.1.2 Extended Type 1 Event Header
+
+ - Follows struct event_header_1, which is aligned on 32-bit, so no need to
+ realign.
+ - Variable size (depends on the number of fields per event).
+ - Native architecture byte ordering.
+ - NR_FIELDS is the number of fields within the event.
+
+struct event_header_1_ext {
+ uint32_t id; /* 32-bit event IDs */
+ uint64_t timestamp; /* 64-bit timestamps */
+ uint1_t missing_fields[NR_FIELDS]; /* missing event fields bitmap */
+};
+
+
+6.1.3 Type 2 - Many event IDs
+
+ - Aligned on 32-bit (or 8-bit if byte-packed, depending on the architecture
+ preference).
+ - Fixed size: 48 bits.
+ - Native architecture byte ordering.
+
+struct event_header_2 {
+ uint32_t timestamp;
+ uint16_t id; /*
+ * id: range: 0 - 65534.
+ * id 65535 is reserved to indicate a following
+ * extended header.
+ */
+};
+
+The end of a type 2 header is aligned on a 16-bit boundary (or 8-bit if
+byte-packed).
+
+
+6.1.4 Extended Type 2 Event Header
+
+ - Follows struct event_header_2, whose end is aligned on a 16-bit boundary, so
+ we need to align on the 64-bit integer architecture alignment (or 8-bit if
+ byte-packed).
+ - Variable size (depends on the number of fields per event).
+ - Native architecture byte ordering.
+ - NR_FIELDS is the number of fields within the event.
+
+struct event_header_2_ext {
+ uint64_t timestamp; /* 64-bit timestamps */
+ uint32_t id; /* 32-bit event IDs */
+ uint1_t missing_fields[NR_FIELDS]; /* missing event fields bitmap */
+};
+
+
+6.2 Event Context
+
+The event context contains information relative to the current event. The choice
+and meaning of this information is specified by the metadata "stream"
+information. For this trace format, event context is usually empty, except when
+the metadata "stream" information specifies otherwise by declaring a non-empty
+structure for the event context. An example of event context is to save the
+event payload size with each event, or to save the current PID with each event.
+These are declared within the stream declaration within the metadata.
+
+An example event context type:
+
+ struct event_context {
+ uint pid;
+ uint16_t payload_size;
+ };
+
+
+6.3 Event Payload
+
+An event payload contains fields specific to a given event type. The fields
+belonging to an event type are described in the event-specific metadata
+within a structure type.
+
+6.3.1 Padding
+
+No padding at the end of the event payload. This differs from the ISO/C standard
+for structures, but follows the CTF standard for structures. In a trace, even
+though it makes sense to align the beginning of a structure, it really makes no
+sense to add padding at the end of the structure, because structures are usually
+not followed by a structure of the same type.
+
+This trick can be done by adding a zero-length "end" field at the end of the C
+structures, and by using the offset of this field rather than using sizeof()
+when calculating the size of a structure (see Appendix "A. Helper macros").
+
+6.3.2 Alignment
+
+The event payload is aligned on the largest alignment required by types
+contained within the payload. (This follows the ISO/C standard for structures)
+
+
+
+7. Metadata
+
+The metadata is located in a stream named "metadata". It is made of "event
+packets", each of which starts with an event packet header containing, amongst
+other fields, the magic number and trace UUID. Events within the metadata
+stream have neither an event header nor an event context. Each event only
+contains a null-terminated "string" payload, which is a metadata description
+entry. The events are packed one next to another.
+
+The metadata can be parsed by reading through the metadata strings, skipping
+newlines and null-characters. Type names may contain spaces.
+
+trace {
+ major = value; /* Trace format version */
+ minor = value;
+ uuid = value; /* Trace UUID */
+ word_size = value;
+};
+
+stream {
+ id = stream_id;
+ event {
+ /* Type 1 - Few event IDs; Type 2 - Many event IDs. See section 6.1. */
+ header_type = event_header_1 OR event_header_2;
+ /*
+ * Extended event header type. Only present if specified in event header
+ * on a per-event basis.
+ */
+ header_type_ext = event_header_1_ext OR event_header_2_ext;
+ context_type = struct {
+ ...
+ };
+ };
+ packet {
+ context_type = struct {
+ ...
+ };
+ };
+};
+
+event {
+ name = eventname;
+ id = value; /* Numeric identifier within the stream */
+ stream = stream_id;
+ fields = struct {
+ ...
+ };
+};
+
+/* More detail on types in section 4. Types */
+
+/* Named types */
+typedef some existing type new_type;
+
+typedef type_class {
+ ...
+} new_type;
+
+struct name {
+ ...
+};
+
+enum name {
+ ...
+};
+
+/* Unnamed types, contained within compound type fields or type assignments. */
+struct {
+ ...
+};
+
+enum {
+ ...
+};
+
+array {
+ ...
+};
+
+sequence {
+ ...
+};
+
+A. Helper macros
+
+The two following macros keep track of the size of a GNU/C structure without
+padding at the end by placing HEADER_END as the last field. A one byte end field
+is used for C90 compatibility (C99 flexible arrays could be used here). Note
+that this does not affect the effective structure size, which should always be
+calculated with the header_sizeof() helper.
+
+#define HEADER_END char end_field
+#define header_sizeof(type) offsetof(typeof(type), end_field)
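+
+The effect of the end field can be observed on a small structure. The sketch
+below uses a variant of header_sizeof() that takes a type name directly, which
+avoids the GNU typeof extension; struct example_header is hypothetical, and
+the stated sizes assume natural alignment of uint32_t and uint16_t.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HEADER_END              char end_field
/* Variant of the macro above, taking a type name instead of an expression. */
#define header_sizeof(type)     offsetof(type, end_field)

struct example_header {         /* hypothetical header */
    uint32_t a;
    uint16_t b;
    HEADER_END;                 /* marks the end of the useful data */
};
```
+
+Here header_sizeof(struct example_header) yields 6, whereas sizeof() also
+counts the end field and the trailing padding.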
+
+
+B. Stream Header Rationale
+
+An event stream is divided into contiguous event packets of variable size.
+These subdivisions allow the trace analyzer to perform a fast binary search by
+time within the stream (typically requiring only the event packet headers to be
+indexed) without reading the whole stream. These subdivisions have a variable
+size to eliminate the need to transfer the event packet padding when partially
+filled event packets must be sent when streaming a trace for live
+viewing/analysis. An event packet can contain a certain amount of padding at
+the end. Dividing streams into event packets is also useful for network
+streaming over UDP and flight recorder mode tracing (a whole event packet can
+be swapped out of the buffer atomically for reading).
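+
+The binary search by time can be sketched over an in-memory index built from
+the packet timestamps. The index structure and function name are hypothetical,
+and the sketch assumes the packets of a stream are sorted by time and do not
+overlap.
+
```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct ctf_packet_index_entry {     /* hypothetical index entry */
    uint64_t timestamp_begin;
    uint64_t timestamp_end;
    uint64_t offset;                /* packet offset within the stream file */
};

/*
 * Return the index of the first packet whose timestamp_end is at or after
 * ts, or n if every packet ends before ts.
 */
static size_t ctf_find_packet(const struct ctf_packet_index_entry *idx,
                              size_t n, uint64_t ts)
{
    size_t lo = 0, hi = n;

    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (idx[mid].timestamp_end < ts)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```
+
+Only the selected packet then needs to be read to locate the events around the
+requested time.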
+
+The stream header is repeated at the beginning of each event packet to allow
+flexibility in terms of:
+
+ - streaming support,
+ - allowing arbitrary buffers to be discarded without making the trace
+ unreadable,
+ - handling UDP packet loss, either by dealing with missing event packets or by
+ asking for retransmission,
+ - transparently supporting flight recorder mode,
+ - transparently supporting crash dump.
+
+The event stream header will therefore be referred to as the "event packet
+header" throughout the rest of this document.