From: Mathieu Desnoyers Date: Thu, 13 Jan 2011 23:48:09 +0000 (-0500) Subject: Rename proposal document title, add links X-Git-Tag: v1.8~95 X-Git-Url: http://git.efficios.com/?p=ctf.git;a=commitdiff_plain;h=cc089c3a15c506c905e594c2ac350f41b84706b0;ds=sidebyside Rename proposal document title, add links Signed-off-by: Mathieu Desnoyers --- diff --git a/common-trace-format-linux-proposal.txt b/common-trace-format-linux-proposal.txt deleted file mode 100644 index 2dfa4ab..0000000 --- a/common-trace-format-linux-proposal.txt +++ /dev/null @@ -1,794 +0,0 @@ - -RFC: Common Trace Format Proposal for Linux (v1.6) - -Mathieu Desnoyers, EfficiOS Inc. - -The goal of the present document is to propose a trace format that suits the -needs of the embedded, telecom, high-performance and kernel communities. It is -based on the Common Trace Format Requirements (v1.4) document. It is designed to -allow tracing that is natively generated by the Linux kernel and Linux -user-space applications written in C/C++. - -A reference implementation of a library to read and write this trace format is -being implemented within the BabelTrace project, a converter between trace -formats. The development tree is available at: - - git tree: git://git.efficios.com/babeltrace.git - gitweb: http://git.efficios.com/?p=babeltrace.git - - -1. Preliminary definitions - - - Event Trace: An ordered sequence of events. - - Event Stream: An ordered sequence of events, containing a subset of the - trace event types. - - Event Packet: A sequence of physically contiguous events within an event - stream. - - Event: This is the basic entry in a trace. (aka: a trace record). - - An event identifier (ID) relates to the class (a type) of event within - an event stream. - e.g. event: irq_entry. - - An event (or event record) relates to a specific instance of an event - class. - e.g. event: irq_entry, at time X, on CPU Y - - Source Architecture: Architecture writing the trace. 
- - Reader Architecture: Architecture reading the trace. - - -2. High-level representation of a trace - -A trace is divided into multiple event streams. Each event stream contains a -subset of the trace event types. - -The final output of the trace, after its generation and optional transport over -the network, is expected to be either on permanent or temporary storage in a -virtual file system. Because each event stream is appended to while a trace is -being recorded, each is associated with a separate file for output. Therefore, -a stored trace can be represented as a directory containing one file per stream. - -A metadata event stream contains information on trace event types. It describes: - -- Trace version. -- Types available. -- Per-stream event header description. -- Per-stream event header selection. -- Per-stream event context fields. -- Per-event - - Event type to stream mapping. - - Event type to name mapping. - - Event type to ID mapping. - - Event fields description. - - -3. Event stream - -An event stream is divided in contiguous event packets of variable size. These -subdivisions have a variable size. An event packet can contain a certain amount -of padding at the end. The rationale for the event stream design choices is -explained in Appendix B. Stream Header Rationale. - -An event stream is divided in contiguous event packets of variable size. These -subdivisions have a variable size. An event packet can contain a certain amount -of padding at the end. The stream header is repeated at the beginning of each -event packet. - -The event stream header will therefore be referred to as the "event packet -header" throughout the rest of this document. - - -4. Types - -4.1 Basic types - -A basic type is a scalar type, as described in this section. - -4.1.1 Type inheritance - -Type specifications can be inherited to allow deriving types from a -type class. 
For example, see the uint32_t named type derived from the "integer" -type class below ("Integers" section). Types have a precise binary -representation in the trace. A type class has methods to read and write these -types, but must be derived into a type to be usable in an event field. - -4.1.2 Alignment - -We define "byte-packed" types as aligned on the byte size, namely 8-bit. -We define "bit-packed" types as following on the next bit, as defined by the -"bitfields" section. - -All basic types, except bitfields, are either aligned on an architecture-defined -specific alignment or byte-packed, depending on the architecture preference. -Architectures providing fast unaligned write byte-packed basic types to save -space, aligning each type on byte boundaries (8-bit). Architectures with slow -unaligned writes align types on specific alignment values. If no specific -alignment is declared for a type nor its parents, it is assumed to be bit-packed -for bitfields and byte-packed for other types. - -Metadata attribute representation of a specific alignment: - - align = value; /* value in bits */ - -4.1.3 Byte order - -By default, the native endianness of the source architecture the trace is used. -Byte order can be overridden for a basic type by specifying a "byte_order" -attribute. Typical use-case is to specify the network byte order (big endian: -"be") to save data captured from the network into the trace without conversion. -If not specified, the byte order is native. - -Metadata representation: - - byte_order = native OR network OR be OR le; /* network and be are aliases */ - -4.1.4 Size - -Type size, in bits, for integers and floats is that returned by "sizeof()" in C -multiplied by CHAR_BIT. -We require the size of "char" and "unsigned char" types (CHAR_BIT) to be fixed -to 8 bits for cross-endianness compatibility. - -Metadata representation: - - size = value; (value is in bits) - -4.1.5 Integers - -Signed integers are represented in two-complement. 
Integer alignment, size, -signedness and byte ordering are defined in the metadata. Integers aligned on -byte size (8-bit) and with length multiple of byte size (8-bit) correspond to -the C99 standard integers. In addition, integers with alignment and/or size that -are _not_ a multiple of the byte size are permitted; these correspond to the C99 -standard bitfields, with the added specification that the CTF integer bitfields -have a fixed binary representation. A MIT-licensed reference implementation of -the CTF portable bitfields is available at: - - http://git.efficios.com/?p=babeltrace.git;a=blob;f=include/babeltrace/bitfield.h - -Binary representation of integers: - -- On little and big endian: - - Within a byte, high bits correspond to an integer high bits, and low bits - correspond to low bits. -- On little endian: - - Integer across multiple bytes are placed from the less significant to the - most significant. - - Consecutive integers are placed from lower bits to higher bits (even within - a byte). -- On big endian: - - Integer across multiple bytes are placed from the most significant to the - less significant. - - Consecutive integers are placed from higher bits to lower bits (even within - a byte). - -This binary representation is derived from the bitfield implementation in GCC -for little and big endian. However, contrary to what GCC does, integers can -cross units boundaries (no padding is required). Padding can be explicitely -added (see 4.1.6 GNU/C bitfields) to follow the GCC layout if needed. 
- -Metadata representation: - - integer { - signed = true OR false; /* default false */ - byte_order = native OR network OR be OR le; /* default native */ - size = value; /* value in bits, no default */ - align = value; /* value in bits */ - }; - -Example of type inheritance (creation of a uint32_t named type): - -typedef integer { - size = 32; - signed = false; - align = 32; -} uint32_t; - -Definition of a named 5-bit signed bitfield: - -typedef integer { - size = 5; - signed = true; - align = 1; -} int5_t; - -4.1.6 GNU/C bitfields - -The GNU/C bitfields follow closely the integer representation, with a -particularity on alignment: if a bitfield cannot fit in the current unit, the -unit is padded and the bitfield starts at the following unit. The unit size is -defined by the size of the type "unit_type". - -Metadata representation. Either: - -gcc_bitfield { - unit_type = integer { - ... - }; - size = value; -}; - -Or bitfield within structures as specified by the C standard - - unit_type name:size: - -As an example, the following structure declared in C compiled by GCC: - -struct example { - short a:12; - short b:5; -}; - -is equivalent to the following structure declaration, aligned on the largest -element (short). The second bitfield would be aligned on the next unit boundary, -because it would not fit in the current unit. The two declarations (C -declaration above or CTF declaration with "type gcc_bitfield") are strictly -equivalent. - -struct example { - gcc_bitfield { - unit_type = short; - size = 12; - } a; - gcc_bitfield { - unit_type = short; - size = 5; - } b; -}; - -4.1.7 Floating point - -The floating point values byte ordering is defined in the metadata. - -Floating point values follow the IEEE 754-2008 standard interchange formats. -Description of the floating point values include the exponent and mantissa size -in bits. Some requirements are imposed on the floating point values: - -- FLT_RADIX must be 2. 
-- mant_dig is the number of digits represented in the mantissa. It is specified - by the ISO C99 standard, section 5.2.4, as FLT_MANT_DIG, DBL_MANT_DIG and - LDBL_MANT_DIG as defined by <float.h>. -- exp_dig is the number of digits represented in the exponent. Given that - mant_dig is one bit more than its actual size in bits (leading 1 is not - needed) and also given that the sign bit always takes one bit, exp_dig can be - specified as: - - - sizeof(float) * CHAR_BIT - FLT_MANT_DIG - - sizeof(double) * CHAR_BIT - DBL_MANT_DIG - - sizeof(long double) * CHAR_BIT - LDBL_MANT_DIG - -Metadata representation: - -floating_point { - exp_dig = value; - mant_dig = value; - byte_order = native OR network OR be OR le; -}; - -Example of type inheritance: - -typedef floating_point { - exp_dig = 8; /* sizeof(float) * CHAR_BIT - FLT_MANT_DIG */ - mant_dig = 24; /* FLT_MANT_DIG */ - byte_order = native; -} float; - -TODO: define NaN, +inf, -inf behavior. - -4.1.8 Enumerations - -Enumerations are a mapping between an integer type and a table of strings. The -numerical representation of the enumeration follows the integer type specified -by the metadata. The enumeration mapping table is detailed in the enumeration -description within the metadata. The mapping table maps inclusive value ranges -(or single values) to strings. Instead of being limited to simple -"value -> string" mappings, these enumerations map -"[ start_value ... end_value ] -> string", which map inclusive ranges of -values to strings. An enumeration from the C language can be represented in -this format by having the same start_value and end_value for each element, which -is in fact a range of size 1. This single-value range is supported without -repeating the start and end values with the value = string declaration. If the -<integer_type> is omitted, the type chosen by the C compiler to hold the -enumeration is used.
The <integer_type> specifier can only be omitted for -enumerations containing only simple "value -> string" mappings (compatible with -C). - -enum name { - string = start_value1 ... end_value1, - "other string" = start_value2 ... end_value2, - yet_another_string, /* will be assigned to end_value2 + 1 */ - "some other string" = value, - ... -}; - -If the values are omitted, the enumeration starts at 0 and increment of 1 for -each entry: - -enum { - ZERO, - ONE, - TWO, - TEN = 10, - ELEVEN, -}; - -Overlapping ranges within a single enumeration are implementation defined. - -4.2 Compound types - -4.2.1 Structures - -Structures are aligned on the largest alignment required by basic types -contained within the structure. (This follows the ISO/C standard for structures) - -Metadata representation of a named structure: - -struct name { - field_type field_name; - field_type field_name; - ... -}; - -Example: - -struct example { - integer { /* Nameless type */ - size = 16; - signed = true; - align = 16; - } first_field_name; - uint64_t second_field_name; /* Named type declared in the metadata */ -}; - -The fields are placed in a sequence next to each other. They each possess a -field name, which is a unique identifier within the structure. - -A nameless structure can be declared as a field type: - -struct { - ... -} field_name; - -4.2.2 Arrays - -Arrays are fixed-length. Their length is declared in the type declaration within -the metadata. They contain an array of "inner type" elements, which can refer to -any type not containing the type of the array being declared (no circular -dependency). The length is the number of elements in an array.
- -Metadata representation of a named array, either: - -typedef array { - length = value; - elem_type = type; -} name; - -or: - -typedef elem_type name[length]; - -E.g.: - -typedef array { - length = 10; - elem_type = uint32_t; -} example; - -A nameless array can be declared as a field type, e.g.: - -array { - length = 5; - elem_type = uint8_t; -} field_name; - -or - -uint8_t field_name[10]; - - -4.2.3 Sequences - -Sequences are dynamically-sized arrays. They start with an integer that specify -the length of the sequence, followed by an array of "inner type" elements. -The length is the number of elements in the sequence. - -Metadata representation for a named sequence, either: - -typedef sequence { - length_type = type; /* integer class */ - elem_type = type; -} name; - -or: - -typedef elem_type name[length_type]; - -A nameless sequence can be declared as a field type, e.g.: - -sequence { - length_type = int; - elem_type = long; -} field_name; - -or - -long field_name[int]; - -The length type follows the integer types specifications, and the sequence -elements follow the "array" specifications. - -4.2.4 Strings - -Strings are an array of bytes of variable size and are terminated by a '\0' -"NULL" character. Their encoding is described in the metadata. In absence of -encoding attribute information, the default encoding is UTF-8. - -Metadata representation of a named string type: - -typedef string { - encoding = UTF8 OR ASCII; -} name; - -A nameless string type can be declared as a field type: - -string field_name; /* Use default UTF8 encoding */ - -5. Event Packet Header - -The event packet header consists of two part: one is mandatory and have a fixed -layout. The second part, the "event packet context", has its layout described in -the metadata. - -- Aligned on page size. Fixed size. Fields either aligned or packed (depending - on the architecture preference). - No padding at the end of the event packet header. Native architecture byte - ordering. 
- -Fixed layout (event packet header): - -- Magic number (CTF magic numbers: 0xC1FC1FC1 and its reverse endianness - representation: 0xC11FFCC1) It needs to have a non-symmetric bytewise - representation. Used to distinguish between big and little endian traces (this - information is determined by knowing the endianness of the architecture - reading the trace and comparing the magic number against its value and the - reverse, 0xC11FFCC1). This magic number specifies that we use the CTF metadata - description language described in this document. Different magic numbers - should be used for other metadata description languages. -- Trace UUID, used to ensure the event packet match the metadata used. - (note: we cannot use a metadata checksum because metadata can be appended to - while tracing is active) -- Stream ID, used as reference to stream description in metadata. - -Metadata-defined layout (event packet context): - -- Event packet content size (in bytes). -- Event packet size (in bytes, includes padding). -- Event packet content checksum (optional). Checksum excludes the event packet - header. -- Per-stream event packet sequence count (to deal with UDP packet loss). The - number of significant sequence counter bits should also be present, so - wrap-arounds are deal with correctly. -- Timestamp at the beginning and timestamp at the end of the event packet. - Both timestamps are written in the packet header, but sampled respectively - while (or before) writing the first event and while (or after) writing the - last event in the packet. The inclusive range between these timestamps should - include all event timestamps assigned to events contained within the packet. -- Events discarded count - - Snapshot of a per-stream free-running counter, counting the number of - events discarded that were supposed to be written in the stream prior to - the first event in the event packet. 
- * Note: producer-consumer buffer full condition should fill the current - event packet with padding so we know exactly where events have been - discarded. -- Lossless compression scheme used for the event packet content. Applied - directly to raw data. New types of compression can be added in following - versions of the format. - 0: no compression scheme - 1: bzip2 - 2: gzip - 3: xz -- Cypher used for the event packet content. Applied after compression. - 0: no encryption - 1: AES -- Checksum scheme used for the event packet content. Applied after encryption. - 0: no checksum - 1: md5 - 2: sha1 - 3: crc32 - -5.1 Event Packet Header Fixed Layout Description - -struct event_packet_header { - uint32_t magic; - uint8_t trace_uuid[16]; - uint32_t stream_id; -}; - -5.2 Event Packet Context Description - -Event packet context example. These are declared within the stream declaration -in the metadata. All these fields are optional except for "content_size" and -"packet_size", which must be present in the context. - -An example event packet context type: - -struct event_packet_context { - uint64_t timestamp_begin; - uint64_t timestamp_end; - uint32_t checksum; - uint32_t stream_packet_count; - uint32_t events_discarded; - uint32_t cpu_id; - uint32_t/uint16_t content_size; - uint32_t/uint16_t packet_size; - uint8_t stream_packet_count_bits; /* Significant counter bits */ - uint8_t compression_scheme; - uint8_t encryption_scheme; - uint8_t checksum; -}; - -6. Event Structure - -The overall structure of an event is: - - - Event Header (as specifed by the stream metadata) - - Extended Event Header (as specified by the event header) - - Event Context (as specified by the stream metadata) - - Event Payload (as specified by the event metadata) - - -6.1 Event Header - -One major factor can vary between streams: the number of event IDs assigned to -a stream. 
Luckily, this information tends to stay relatively constant (modulo -event registration while trace is being recorded), so we can specify different -representations for streams containing few event IDs and streams containing -many event IDs, so we end up representing the event ID and timestamp as densely -as possible in each case. - -We therefore provide two types of events headers. Type 1 accommodates streams -with less than 31 event IDs. Type 2 accommodates streams with 31 or more event -IDs. - -The "extended headers" are used in the rare occasions where the information -cannot be represented in the ranges available in the event header. They are also -used in the rare occasions where the data required for a field could not be -collected: the flag corresponding to the missing field within the missing_fields -array is then set to 1. - -Types uintX_t represent an X-bit unsigned integer. - - -6.1.1 Type 1 - Few event IDs - - - Aligned on 32-bit (or 8-bit if byte-packed, depending on the architecture - preference). - - Fixed size: 32 bits. - - Native architecture byte ordering. - -struct event_header_1 { - uint5_t id; /* - * id: range: 0 - 30. - * id 31 is reserved to indicate a following - * extended header. - */ - uint27_t timestamp; -}; - -The end of a type 1 header is aligned on a 32-bit boundary (or packed). - - -6.1.2 Extended Type 1 Event Header - - - Follows struct event_header_1, which is aligned on 32-bit, so no need to - realign. - - Variable size (depends on the number of fields per event). - - Native architecture byte ordering. - - NR_FIELDS is the number of fields within the event. - -struct event_header_1_ext { - uint32_t id; /* 32-bit event IDs */ - uint64_t timestamp; /* 64-bit timestamps */ - uint1_t missing_fields[NR_FIELDS]; /* missing event fields bitmap */ -}; - - -6.1.3 Type 2 - Many event IDs - - - Aligned on 32-bit (or 8-bit if byte-packed, depending on the architecture - preference). - - Fixed size: 48 bits. 
- - Native architecture byte ordering. - -struct event_header_2 { - uint32_t timestamp; - uint16_t id; /* - * id: range: 0 - 65534. - * id 65535 is reserved to indicate a following - * extended header. - */ -}; - -The end of a type 2 header is aligned on a 16-bit boundary (or 8-bit if -byte-packed). - - -6.1.4 Extended Type 2 Event Header - - - Follows struct event_header_2, which alignment end on a 16-bit boundary, so - we need to align on 64-bit integer architecture alignment (or 8-bit if - byte-packed). - - Variable size (depends on the number of fields per event). - - Native architecture byte ordering. - - NR_FIELDS is the number of fields within the event. - -struct event_header_2_ext { - uint64_t timestamp; /* 64-bit timestamps */ - uint32_t id; /* 32-bit event IDs */ - uint1_t missing_fields[NR_FIELDS]; /* missing event fields bitmap */ -}; - - -6.2 Event Context - -The event context contains information relative to the current event. The choice -and meaning of this information is specified by the metadata "stream" -information. For this trace format, event context is usually empty, except when -the metadata "stream" information specifies otherwise by declaring a non-empty -structure for the event context. An example of event context is to save the -event payload size with each event, or to save the current PID with each event. -These are declared within the stream declaration within the metadata. - -An example event context type: - - struct event_context { - uint pid; - uint16_t payload_size; - }; - - -6.3 Event Payload - -An event payload contains fields specific to a given event type. The fields -belonging to an event type are described in the event-specific metadata -within a structure type. - -6.3.1 Padding - -No padding at the end of the event payload. This differs from the ISO/C standard -for structures, but follows the CTF standard for structures. 
In a trace, even -though it makes sense to align the beginning of a structure, it really makes no -sense to add padding at the end of the structure, because structures are usually -not followed by a structure of the same type. - -This trick can be done by adding a zero-length "end" field at the end of the C -structures, and by using the offset of this field rather than using sizeof() -when calculating the size of a structure (see Appendix "A. Helper macros"). - -6.3.2 Alignment - -The event payload is aligned on the largest alignment required by types -contained within the payload. (This follows the ISO/C standard for structures) - - - -7. Metadata - -The meta-data is located in a stream named "metadata". It is made of "event -packets", which each start with an event packet header. The event type within -the metadata stream have no event header nor event context. Each event only -contains a null-terminated "string" payload, which is a metadata description -entry. The events are packed one next to another. Each event packet start with -an event packet header, which contains, amongst other fields, the magic number -and trace UUID. - -The metadata can be parsed by reading through the metadata strings, skipping -newlines and null-characters. Type names may contain spaces. - -trace { - major = value; /* Trace format version */ - minor = value; - uuid = value; /* Trace UUID */ - word_size = value; -}; - -stream { - id = stream_id; - event { - /* Type 1 - Few event IDs; Type 2 - Many event IDs. See section 6.1. */ - header_type = event_header_1 OR event_header_2; - /* - * Extended event header type. Only present if specified in event header - * on a per-event basis. - */ - header_type_ext = event_header_1_ext OR event_header_2_ext; - context_type = struct { - ... - }; - }; - packet { - context_type = struct { - ... - }; - }; -}; - -event { - name = eventname; - id = value; /* Numeric identifier within the stream */ - stream = stream_id; - fields = struct { - ... 
- }; -}; - -/* More detail on types in section 4. Types */ - -/* Named types */ -typedef some existing type new_type; - -typedef type_class { - ... -} new_type; - -struct name { - ... -}; - -enum name { - ... -}; - -/* Unnamed types, contained within compound type fields or type assignments. */ -struct { - ... -}; - -enum { - ... -}; - -array { - ... -}; - -sequence { - ... -}; - -A. Helper macros - -The two following macros keep track of the size of a GNU/C structure without -padding at the end by placing HEADER_END as the last field. A one byte end field -is used for C90 compatibility (C99 flexible arrays could be used here). Note -that this does not affect the effective structure size, which should always be -calculated with the header_sizeof() helper. - -#define HEADER_END char end_field -#define header_sizeof(type) offsetof(typeof(type), end_field) - - -B. Stream Header Rationale - -An event stream is divided in contiguous event packets of variable size. These -subdivisions allow the trace analyzer to perform a fast binary search by time -within the stream (typically requiring to index only the event packet headers) -without reading the whole stream. These subdivisions have a variable size to -eliminate the need to transfer the event packet padding when partially filled -event packets must be sent when streaming a trace for live viewing/analysis. -An event packet can contain a certain amount of padding at the end. Dividing -streams into event packets is also useful for network streaming over UDP and -flight recorder mode tracing (a whole event packet can be swapped out of the -buffer atomically for reading). - -The stream header is repeated at the beginning of each event packet to allow -flexibility in terms of: - - - streaming support, - - allowing arbitrary buffers to be discarded without making the trace - unreadable, - - allow UDP packet loss handling by either dealing with missing event packet - or asking for re-transmission. 
- - transparently support flight recorder mode, - - transparently support crash dump. - -The event stream header will therefore be referred to as the "event packet -header" throughout the rest of this document. diff --git a/common-trace-format-proposal.txt b/common-trace-format-proposal.txt new file mode 100644 index 0000000..97f67e7 --- /dev/null +++ b/common-trace-format-proposal.txt @@ -0,0 +1,799 @@ + +RFC: Common Trace Format Proposal (v1.6) + +Mathieu Desnoyers, EfficiOS Inc. + +The goal of the present document is to propose a trace format that suits the +needs of the embedded, telecom, high-performance and kernel communities. It is +based on the Common Trace Format Requirements (v1.4) document. It is designed to +allow traces to be natively generated by the Linux kernel, Linux user-space +applications written in C/C++, and hardware components. + +The latest version of this document can be found at: + + git tree: git://git.efficios.com/ctf.git + gitweb: http://git.efficios.com/?p=ctf.git + +A reference implementation of a library to read and write this trace format is +being implemented within the BabelTrace project, a converter between trace +formats. The development tree is available at: + + git tree: git://git.efficios.com/babeltrace.git + gitweb: http://git.efficios.com/?p=babeltrace.git + + +1. Preliminary definitions + + - Event Trace: An ordered sequence of events. + - Event Stream: An ordered sequence of events, containing a subset of the + trace event types. + - Event Packet: A sequence of physically contiguous events within an event + stream. + - Event: This is the basic entry in a trace. (aka: a trace record). + - An event identifier (ID) relates to the class (a type) of event within + an event stream. + e.g. event: irq_entry. + - An event (or event record) relates to a specific instance of an event + class. + e.g. event: irq_entry, at time X, on CPU Y + - Source Architecture: Architecture writing the trace. 
+ - Reader Architecture: Architecture reading the trace. + + +2. High-level representation of a trace + +A trace is divided into multiple event streams. Each event stream contains a +subset of the trace event types. + +The final output of the trace, after its generation and optional transport over +the network, is expected to be either on permanent or temporary storage in a +virtual file system. Because each event stream is appended to while a trace is +being recorded, each is associated with a separate file for output. Therefore, +a stored trace can be represented as a directory containing one file per stream. + +A metadata event stream contains information on trace event types. It describes: + +- Trace version. +- Types available. +- Per-stream event header description. +- Per-stream event header selection. +- Per-stream event context fields. +- Per-event + - Event type to stream mapping. + - Event type to name mapping. + - Event type to ID mapping. + - Event fields description. + + +3. Event stream + +An event stream is divided in contiguous event packets of variable size. These +subdivisions have a variable size. An event packet can contain a certain amount +of padding at the end. The rationale for the event stream design choices is +explained in Appendix B. Stream Header Rationale. + +An event stream is divided in contiguous event packets of variable size. These +subdivisions have a variable size. An event packet can contain a certain amount +of padding at the end. The stream header is repeated at the beginning of each +event packet. + +The event stream header will therefore be referred to as the "event packet +header" throughout the rest of this document. + + +4. Types + +4.1 Basic types + +A basic type is a scalar type, as described in this section. + +4.1.1 Type inheritance + +Type specifications can be inherited to allow deriving types from a +type class. 
For example, see the uint32_t named type derived from the "integer" +type class below ("Integers" section). Types have a precise binary +representation in the trace. A type class has methods to read and write these +types, but must be derived into a type to be usable in an event field. + +4.1.2 Alignment + +We define "byte-packed" types as aligned on the byte size, namely 8-bit. +We define "bit-packed" types as following on the next bit, as defined by the +"bitfields" section. + +All basic types, except bitfields, are either aligned on an architecture-defined +specific alignment or byte-packed, depending on the architecture preference. +Architectures providing fast unaligned write byte-packed basic types to save +space, aligning each type on byte boundaries (8-bit). Architectures with slow +unaligned writes align types on specific alignment values. If no specific +alignment is declared for a type nor its parents, it is assumed to be bit-packed +for bitfields and byte-packed for other types. + +Metadata attribute representation of a specific alignment: + + align = value; /* value in bits */ + +4.1.3 Byte order + +By default, the native endianness of the source architecture the trace is used. +Byte order can be overridden for a basic type by specifying a "byte_order" +attribute. Typical use-case is to specify the network byte order (big endian: +"be") to save data captured from the network into the trace without conversion. +If not specified, the byte order is native. + +Metadata representation: + + byte_order = native OR network OR be OR le; /* network and be are aliases */ + +4.1.4 Size + +Type size, in bits, for integers and floats is that returned by "sizeof()" in C +multiplied by CHAR_BIT. +We require the size of "char" and "unsigned char" types (CHAR_BIT) to be fixed +to 8 bits for cross-endianness compatibility. + +Metadata representation: + + size = value; (value is in bits) + +4.1.5 Integers + +Signed integers are represented in two-complement. 
Integer alignment, size, +signedness and byte ordering are defined in the metadata. Integers aligned on +byte size (8-bit) and with a length that is a multiple of the byte size (8-bit) +correspond to the C99 standard integers. In addition, integers with alignment +and/or size that are _not_ a multiple of the byte size are permitted; these +correspond to the C99 standard bitfields, with the added specification that the +CTF integer bitfields have a fixed binary representation. An MIT-licensed +reference implementation of the CTF portable bitfields is available at: + + http://git.efficios.com/?p=babeltrace.git;a=blob;f=include/babeltrace/bitfield.h + +Binary representation of integers: + +- On little and big endian: + - Within a byte, high bits correspond to an integer's high bits, and low bits + correspond to low bits. +- On little endian: + - Integers spanning multiple bytes are placed from the least significant to + the most significant. + - Consecutive integers are placed from lower bits to higher bits (even within + a byte). +- On big endian: + - Integers spanning multiple bytes are placed from the most significant to + the least significant. + - Consecutive integers are placed from higher bits to lower bits (even within + a byte). + +This binary representation is derived from the bitfield implementation in GCC +for little and big endian. However, contrary to what GCC does, integers can +cross unit boundaries (no padding is required). Padding can be explicitly +added (see 4.1.6 GNU/C bitfields) to follow the GCC layout if needed. 
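The bit placement rules above can be sketched in C as follows. This is only an illustration under stated assumptions: read_bits_le() and read_bits_be() are hypothetical helper names, not the BabelTrace bitfield.h interface, and they return unsigned values only (sign extension is left out).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read `len` bits (1 to 64) starting at bit offset
 * `off`, little-endian layout. Stream bit k is bit (k % 8) of byte (k / 8),
 * counted from the least significant bit, and the first stream bit is the
 * least significant bit of the integer.
 */
static uint64_t read_bits_le(const uint8_t *buf, size_t off, unsigned len)
{
    uint64_t v = 0;

    assert(len >= 1 && len <= 64);
    for (unsigned i = 0; i < len; i++) {
        size_t k = off + i;
        uint64_t bit = (buf[k / 8] >> (k % 8)) & 1u;

        v |= bit << i;
    }
    return v;
}

/*
 * Hypothetical helper: same read, big-endian layout. Stream bit k is bit
 * (7 - k % 8) of byte (k / 8), and the first stream bit is the most
 * significant bit of the integer.
 */
static uint64_t read_bits_be(const uint8_t *buf, size_t off, unsigned len)
{
    uint64_t v = 0;

    assert(len >= 1 && len <= 64);
    for (unsigned i = 0; i < len; i++) {
        size_t k = off + i;
        uint64_t bit = (buf[k / 8] >> (7 - k % 8)) & 1u;

        v = (v << 1) | bit;
    }
    return v;
}
```

Both helpers let an integer cross byte boundaries without padding, which is the point where the CTF layout departs from the GCC bitfield layout. A production reader would more likely work on whole words than bit by bit; the loop form is used here only to make the placement rule visible.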
+ +Metadata representation: + + integer { + signed = true OR false; /* default false */ + byte_order = native OR network OR be OR le; /* default native */ + size = value; /* value in bits, no default */ + align = value; /* value in bits */ + }; + +Example of type inheritance (creation of a uint32_t named type): + +typedef integer { + size = 32; + signed = false; + align = 32; +} uint32_t; + +Definition of a named 5-bit signed bitfield: + +typedef integer { + size = 5; + signed = true; + align = 1; +} int5_t; + +4.1.6 GNU/C bitfields + +The GNU/C bitfields follow the integer representation closely, with a +particularity on alignment: if a bitfield cannot fit in the current unit, the +unit is padded and the bitfield starts at the following unit. The unit size is +defined by the size of the type "unit_type". + +Metadata representation. Either: + +gcc_bitfield { + unit_type = integer { + ... + }; + size = value; +}; + +Or bitfields within structures, as specified by the C standard: + + unit_type name:size; + +As an example, the following structure declared in C compiled by GCC: + +struct example { + short a:12; + short b:5; +}; + +is equivalent to the following structure declaration, aligned on the largest +element (short). The second bitfield would be aligned on the next unit boundary, +because it would not fit in the current unit. The two declarations (C +declaration above or CTF declaration with "type gcc_bitfield") are strictly +equivalent. + +struct example { + gcc_bitfield { + unit_type = short; + size = 12; + } a; + gcc_bitfield { + unit_type = short; + size = 5; + } b; +}; + +4.1.7 Floating point + +The byte ordering of floating point values is defined in the metadata. + +Floating point values follow the IEEE 754-2008 standard interchange formats. +The description of floating point values includes the exponent and mantissa +sizes in bits. Some requirements are imposed on the floating point values: + +- FLT_RADIX must be 2. 
+- mant_dig is the number of digits represented in the mantissa. It is specified + by the ISO C99 standard, section 5.2.4, as FLT_MANT_DIG, DBL_MANT_DIG and + LDBL_MANT_DIG as defined by <float.h>. +- exp_dig is the number of digits represented in the exponent. Given that + mant_dig is one bit more than its actual size in bits (leading 1 is not + needed) and also given that the sign bit always takes one bit, exp_dig can be + specified as: + + - sizeof(float) * CHAR_BIT - FLT_MANT_DIG + - sizeof(double) * CHAR_BIT - DBL_MANT_DIG + - sizeof(long double) * CHAR_BIT - LDBL_MANT_DIG + +Metadata representation: + +floating_point { + exp_dig = value; + mant_dig = value; + byte_order = native OR network OR be OR le; +}; + +Example of type inheritance: + +typedef floating_point { + exp_dig = 8; /* sizeof(float) * CHAR_BIT - FLT_MANT_DIG */ + mant_dig = 24; /* FLT_MANT_DIG */ + byte_order = native; +} float; + +TODO: define NaN, +inf, -inf behavior. + +4.1.8 Enumerations + +Enumerations are a mapping between an integer type and a table of strings. The +numerical representation of the enumeration follows the integer type specified +by the metadata. The enumeration mapping table is detailed in the enumeration +description within the metadata. The mapping table maps inclusive value ranges +(or single values) to strings. Instead of being limited to simple +"value -> string" mappings, these enumerations map +"[ start_value ... end_value ] -> string", which map inclusive ranges of +values to strings. An enumeration from the C language can be represented in +this format by having the same start_value and end_value for each element, which +is in fact a range of size 1. This single-value range is supported without +repeating the start and end values with the value = string declaration. If the +<integer_type> is omitted, the type chosen by the C compiler to hold the +enumeration is used. 
The <integer_type> specifier can only be omitted for +enumerations containing only simple "value -> string" mappings (compatible with +C). + +enum name <integer_type> { + string = start_value1 ... end_value1, + "other string" = start_value2 ... end_value2, + yet_another_string, /* will be assigned to end_value2 + 1 */ + "some other string" = value, + ... +}; + +If the values are omitted, the enumeration starts at 0 and increments by 1 for +each entry: + +enum { + ZERO, + ONE, + TWO, + TEN = 10, + ELEVEN, +}; + +Overlapping ranges within a single enumeration are implementation-defined. + +4.2 Compound types + +4.2.1 Structures + +Structures are aligned on the largest alignment required by basic types +contained within the structure. (This follows the ISO/C standard for structures.) + +Metadata representation of a named structure: + +struct name { + field_type field_name; + field_type field_name; + ... +}; + +Example: + +struct example { + integer { /* Nameless type */ + size = 16; + signed = true; + align = 16; + } first_field_name; + uint64_t second_field_name; /* Named type declared in the metadata */ +}; + +The fields are placed in a sequence next to each other. They each possess a +field name, which is a unique identifier within the structure. + +A nameless structure can be declared as a field type: + +struct { + ... +} field_name; + +4.2.2 Arrays + +Arrays are fixed-length. Their length is declared in the type declaration within +the metadata. They contain an array of "inner type" elements, which can refer to +any type not containing the type of the array being declared (no circular +dependency). The length is the number of elements in an array. 
+ +Metadata representation of a named array, either: + +typedef array { + length = value; + elem_type = type; +} name; + +or: + +typedef elem_type name[length]; + +E.g.: + +typedef array { + length = 10; + elem_type = uint32_t; +} example; + +A nameless array can be declared as a field type, e.g.: + +array { + length = 5; + elem_type = uint8_t; +} field_name; + +or + +uint8_t field_name[10]; + + +4.2.3 Sequences + +Sequences are dynamically-sized arrays. They start with an integer that specifies +the length of the sequence, followed by an array of "inner type" elements. +The length is the number of elements in the sequence. + +Metadata representation for a named sequence, either: + +typedef sequence { + length_type = type; /* integer class */ + elem_type = type; +} name; + +or: + +typedef elem_type name[length_type]; + +A nameless sequence can be declared as a field type, e.g.: + +sequence { + length_type = int; + elem_type = long; +} field_name; + +or + +long field_name[int]; + +The length type follows the integer type specifications, and the sequence +elements follow the "array" specifications. + +4.2.4 Strings + +Strings are an array of bytes of variable size and are terminated by a '\0' +"NUL" character. Their encoding is described in the metadata. In the absence of +encoding attribute information, the default encoding is UTF-8. + +Metadata representation of a named string type: + +typedef string { + encoding = UTF8 OR ASCII; +} name; + +A nameless string type can be declared as a field type: + +string field_name; /* Use default UTF8 encoding */ + +5. Event Packet Header + +The event packet header consists of two parts: the first is mandatory and has a +fixed layout. The second part, the "event packet context", has its layout +described in the metadata. + +- Aligned on page size. Fixed size. Fields either aligned or packed (depending + on the architecture preference). + No padding at the end of the event packet header. Native architecture byte + ordering. 
+ +Fixed layout (event packet header): + +- Magic number (CTF magic number: 0xC1FC1FC1 and its reverse endianness + representation: 0xC11FFCC1). It needs to have a non-symmetric bytewise + representation. Used to distinguish between big and little endian traces (this + information is determined by knowing the endianness of the architecture + reading the trace and comparing the magic number against its value and the + reverse, 0xC11FFCC1). This magic number specifies that we use the CTF metadata + description language described in this document. Different magic numbers + should be used for other metadata description languages. +- Trace UUID, used to ensure the event packet matches the metadata used. + (note: we cannot use a metadata checksum because metadata can be appended to + while tracing is active) +- Stream ID, used as reference to stream description in metadata. + +Metadata-defined layout (event packet context): + +- Event packet content size (in bytes). +- Event packet size (in bytes, includes padding). +- Event packet content checksum (optional). Checksum excludes the event packet + header. +- Per-stream event packet sequence count (to deal with UDP packet loss). The + number of significant sequence counter bits should also be present, so + wrap-arounds are dealt with correctly. +- Timestamp at the beginning and timestamp at the end of the event packet. + Both timestamps are written in the packet header, but sampled respectively + while (or before) writing the first event and while (or after) writing the + last event in the packet. The inclusive range between these timestamps should + include all event timestamps assigned to events contained within the packet. +- Events discarded count + - Snapshot of a per-stream free-running counter, counting the number of + events discarded that were supposed to be written in the stream prior to + the first event in the event packet. 
+ * Note: a producer-consumer buffer-full condition should fill the current + event packet with padding so we know exactly where events have been + discarded. +- Lossless compression scheme used for the event packet content. Applied + directly to raw data. New types of compression can be added in following + versions of the format. + 0: no compression scheme + 1: bzip2 + 2: gzip + 3: xz +- Cipher used for the event packet content. Applied after compression. + 0: no encryption + 1: AES +- Checksum scheme used for the event packet content. Applied after encryption. + 0: no checksum + 1: md5 + 2: sha1 + 3: crc32 + +5.1 Event Packet Header Fixed Layout Description + +struct event_packet_header { + uint32_t magic; + uint8_t trace_uuid[16]; + uint32_t stream_id; +}; + +5.2 Event Packet Context Description + +The event packet context fields are declared within the stream declaration +in the metadata. All these fields are optional except for "content_size" and +"packet_size", which must be present in the context. + +An example event packet context type: + +struct event_packet_context { + uint64_t timestamp_begin; + uint64_t timestamp_end; + uint32_t checksum; + uint32_t stream_packet_count; + uint32_t events_discarded; + uint32_t cpu_id; + uint32_t/uint16_t content_size; + uint32_t/uint16_t packet_size; + uint8_t stream_packet_count_bits; /* Significant counter bits */ + uint8_t compression_scheme; + uint8_t encryption_scheme; + uint8_t checksum_scheme; +}; + +6. Event Structure + +The overall structure of an event is: + + - Event Header (as specified by the stream metadata) + - Extended Event Header (as specified by the event header) + - Event Context (as specified by the stream metadata) + - Event Payload (as specified by the event metadata) + + +6.1 Event Header + +One major factor can vary between streams: the number of event IDs assigned to +a stream. 
Luckily, this information tends to stay relatively constant (modulo +event registration while the trace is being recorded), so we can specify +different representations for streams containing few event IDs and streams +containing many event IDs, so that we end up representing the event ID and +timestamp as densely as possible in each case. + +We therefore provide two types of event headers. Type 1 accommodates streams +with fewer than 31 event IDs. Type 2 accommodates streams with 31 or more event +IDs. + +The "extended headers" are used on the rare occasions where the information +cannot be represented in the ranges available in the event header. They are also +used on the rare occasions where the data required for a field could not be +collected: the flag corresponding to the missing field within the missing_fields +array is then set to 1. + +Types uintX_t represent X-bit unsigned integers. + + +6.1.1 Type 1 - Few event IDs + + - Aligned on 32-bit (or 8-bit if byte-packed, depending on the architecture + preference). + - Fixed size: 32 bits. + - Native architecture byte ordering. + +struct event_header_1 { + uint5_t id; /* + * id: range: 0 - 30. + * id 31 is reserved to indicate a following + * extended header. + */ + uint27_t timestamp; +}; + +The end of a type 1 header is aligned on a 32-bit boundary (or packed). + + +6.1.2 Extended Type 1 Event Header + + - Follows struct event_header_1, which is aligned on 32-bit, so no need to + realign. + - Variable size (depends on the number of fields per event). + - Native architecture byte ordering. + - NR_FIELDS is the number of fields within the event. + +struct event_header_1_ext { + uint32_t id; /* 32-bit event IDs */ + uint64_t timestamp; /* 64-bit timestamps */ + uint1_t missing_fields[NR_FIELDS]; /* missing event fields bitmap */ +}; + + +6.1.3 Type 2 - Many event IDs + + - Aligned on 32-bit (or 8-bit if byte-packed, depending on the architecture + preference). + - Fixed size: 48 bits. 
+ - Native architecture byte ordering. + +struct event_header_2 { + uint32_t timestamp; + uint16_t id; /* + * id: range: 0 - 65534. + * id 65535 is reserved to indicate a following + * extended header. + */ +}; + +The end of a type 2 header is aligned on a 16-bit boundary (or 8-bit if +byte-packed). + + +6.1.4 Extended Type 2 Event Header + + - Follows struct event_header_2, which ends on a 16-bit boundary, so we need + to align on the architecture's 64-bit integer alignment (or 8-bit if + byte-packed). + - Variable size (depends on the number of fields per event). + - Native architecture byte ordering. + - NR_FIELDS is the number of fields within the event. + +struct event_header_2_ext { + uint64_t timestamp; /* 64-bit timestamps */ + uint32_t id; /* 32-bit event IDs */ + uint1_t missing_fields[NR_FIELDS]; /* missing event fields bitmap */ +}; + + +6.2 Event Context + +The event context contains information relative to the current event. The choice +and meaning of this information are specified by the metadata "stream" +information. For this trace format, event context is usually empty, except when +the metadata "stream" information specifies otherwise by declaring a non-empty +structure for the event context. Example uses of the event context are saving +the event payload size or the current PID with each event. +These are declared within the stream declaration within the metadata. + +An example event context type: + + struct event_context { + uint pid; + uint16_t payload_size; + }; + + +6.3 Event Payload + +An event payload contains fields specific to a given event type. The fields +belonging to an event type are described in the event-specific metadata +within a structure type. + +6.3.1 Padding + +There is no padding at the end of the event payload. This differs from the ISO/C +standard for structures, but follows the CTF standard for structures. 
In a trace, even +though it makes sense to align the beginning of a structure, it really makes no +sense to add padding at the end of the structure, because structures are usually +not followed by a structure of the same type. + +This trick can be done by adding a zero-length "end" field at the end of the C +structures, and by using the offset of this field rather than using sizeof() +when calculating the size of a structure (see Appendix "A. Helper macros"). + +6.3.2 Alignment + +The event payload is aligned on the largest alignment required by types +contained within the payload. (This follows the ISO/C standard for structures.) + + + +7. Metadata + +The metadata is located in a stream named "metadata". It is made of "event +packets", which each start with an event packet header. Events within the +metadata stream have neither an event header nor an event context. Each event +only contains a null-terminated "string" payload, which is a metadata description +entry. The events are packed one next to another. Each event packet starts with +an event packet header, which contains, amongst other fields, the magic number +and trace UUID. + +The metadata can be parsed by reading through the metadata strings, skipping +newlines and null-characters. Type names may contain spaces. + +trace { + major = value; /* Trace format version */ + minor = value; + uuid = value; /* Trace UUID */ + word_size = value; +}; + +stream { + id = stream_id; + event { + /* Type 1 - Few event IDs; Type 2 - Many event IDs. See section 6.1. */ + header_type = event_header_1 OR event_header_2; + /* + * Extended event header type. Only present if specified in event header + * on a per-event basis. + */ + header_type_ext = event_header_1_ext OR event_header_2_ext; + context_type = struct { + ... + }; + }; + packet { + context_type = struct { + ... + }; + }; +}; + +event { + name = eventname; + id = value; /* Numeric identifier within the stream */ + stream = stream_id; + fields = struct { + ... 
+ }; +}; + +/* More detail on types in section 4. Types */ + +/* Named types */ +typedef some existing type new_type; + +typedef type_class { + ... +} new_type; + +struct name { + ... +}; + +enum name { + ... +}; + +/* Unnamed types, contained within compound type fields or type assignments. */ +struct { + ... +}; + +enum { + ... +}; + +array { + ... +}; + +sequence { + ... +}; + +A. Helper macros + +The following two macros keep track of the size of a GNU/C structure without +padding at the end by placing HEADER_END as the last field. A one-byte end field +is used for C90 compatibility (C99 flexible arrays could be used here). Note +that this does not affect the effective structure size, which should always be +calculated with the header_sizeof() helper. + +#define HEADER_END char end_field +#define header_sizeof(type) offsetof(typeof(type), end_field) + + +B. Stream Header Rationale + +An event stream is divided into contiguous event packets of variable size. These +subdivisions allow the trace analyzer to perform a fast binary search by time +within the stream (typically requiring only the event packet headers to be +indexed) without reading the whole stream. These subdivisions have a variable +size to eliminate the need to transfer the event packet padding when partially +filled event packets must be sent when streaming a trace for live +viewing/analysis. +An event packet can contain a certain amount of padding at the end. Dividing +streams into event packets is also useful for network streaming over UDP and +flight recorder mode tracing (a whole event packet can be swapped out of the +buffer atomically for reading). + +The stream header is repeated at the beginning of each event packet to allow +flexibility in terms of: + + - streaming support, + - allowing arbitrary buffers to be discarded without making the trace + unreadable, + - allowing UDP packet loss handling by either dealing with missing event + packets or asking for re-transmission, 
+ - transparently supporting flight recorder mode, + - transparently supporting crash dumps. + +The event stream header will therefore be referred to as the "event packet +header" throughout the rest of this document.
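The binary search by time mentioned in this appendix could look like the following C sketch. It is an illustration under assumptions, not part of the proposal: struct packet_index_entry and find_packet() are hypothetical names, the index is assumed to have been built by reading only the timestamps from each event packet, and packets are assumed to be sorted by time without overlap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory index entry, one per event packet of a stream. */
struct packet_index_entry {
    uint64_t timestamp_begin;  /* sampled at or before the first event */
    uint64_t timestamp_end;    /* sampled at or after the last event */
    uint64_t offset;           /* byte offset of the packet in the stream file */
};

/*
 * Return the index of the first packet whose timestamp_end is >= ts,
 * or n if every packet ends before ts. Runs in O(log n) without touching
 * event data.
 */
static size_t find_packet(const struct packet_index_entry *idx, size_t n,
                          uint64_t ts)
{
    size_t lo = 0, hi = n;

    assert(idx != NULL || n == 0);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;

        if (idx[mid].timestamp_end < ts)
            lo = mid + 1;   /* packet ends too early, look further right */
        else
            hi = mid;       /* candidate, keep looking for an earlier one */
    }
    return lo;
}
```

A reader would then seek to idx[i].offset and decode events from that packet onward. Because each packet starts with its own header, a lost or discarded packet only removes one entry from such an index and leaves the rest of the stream searchable.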