RFC: Common Trace Format Requirements (v1.4)

Mathieu Desnoyers, EfficiOS Inc.

The goal of the present document is to gather the trace format requirements
from the embedded, telecom, high-performance and kernel communities. It consists
of an overview of the trace format, tracer and trace analyzer requirements to
consider for a Common Trace Format proposal.

This document includes requirements from:

Steven Rostedt <rostedt@goodmis.org>
Dominique Toupin <dominique.toupin@ericsson.com>
Aaron Spear <aaron_spear@mentor.com>
Philippe Maisonneuve <Philippe.Maisonneuve@windriver.com>
Felix Burton <Felix.Burton@windriver.com>
Andrew McDermott <Andrew.McDermott@windriver.com>
Frank Ch. Eigler <fche@redhat.com>
Michel Dagenais <michel.dagenais@polymtl.ca>
Stefan Hajnoczi <stefanha@gmail.com>
Multi-Core Association Tool Infrastructure Workgroup
  (http://www.multicore-association.org/workgroup/tiwg.php)
* Trace Format Requirements

These are requirements on the trace format per se. This section discusses the
layout of data in the trace, explaining the rationale behind the choices. The
rationale for the trace format choices may refer to the tracer and trace
analyzer requirements stated below. This section starts by presenting the common
trace model, and then specifies the requirements of an instance of this model
specifically tailored to efficient kernel- and user-space tracing requirements.
1) Architecture

This high-level model is meant to be an industry-wide, common model, fulfilling
the tracing requirements. It is meant to be application-, architecture-, and
language-agnostic.
1.1) Core model

- Event

An event is an information record contained within the trace.

  - Events must be in physical order within a section. Their physical position
    relative to other events within the section specifies their order relative
    to other events within the same section.
  - Event type (numeric identifier: maps to metadata)
    - Unique ID assigned within a section.
  - Event payload
    - Variable event size
    - Size limitations: maximum event size should be configurable.
    - Size information available through metadata.
    - Support various data alignment for architectures, standards, and
      languages:
      - Natural alignment of data for architectures with slow non-aligned
        writes.
      - Packed layout of headers for architectures with efficient non-aligned
        writes.
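As an illustration only, the difference between the two layouts can be sketched
in a few lines of Python. The field list (a 1-byte event id, an 8-byte
timestamp and a 4-byte value) and the helper name layout() are assumptions made
for this example; they are not part of the requirements.

```python
def layout(fields, packed):
    """Return (offsets, total_size) for a list of (name, size_in_bytes) fields.

    packed=True places fields back to back. packed=False aligns each field on
    a multiple of its own size (natural alignment) and pads the record so the
    following record starts aligned as well.
    """
    offsets = {}
    offset = 0
    max_align = 1
    for name, size in fields:
        align = 1 if packed else size
        # Round the current offset up to the next multiple of the alignment.
        offset = (offset + align - 1) // align * align
        offsets[name] = offset
        offset += size
        max_align = max(max_align, align)
    total = (offset + max_align - 1) // max_align * max_align
    return offsets, total


# Hypothetical event record used only for this example.
EVENT_FIELDS = [("id", 1), ("timestamp", 8), ("value", 4)]

packed_offsets, packed_size = layout(EVENT_FIELDS, packed=True)
natural_offsets, natural_size = layout(EVENT_FIELDS, packed=False)
```

With these assumed fields the packed record takes 13 bytes while the naturally
aligned one takes 24, which shows the space cost paid to avoid non-aligned
writes.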
- Section

A section within the trace can be thought of as analogous to an ELF section in
an ELF binary. It contains a sequence of physically contiguous event records.

  - Multi-level section identifier
    - e.g.: section name / CPU number
  - Contains a subset of event types

The parallel with ELF sections is used here to conceptually demonstrate the idea
of section, but the similarity stops there. A trace is peculiar in that we have
to continuously append to each section, and we ideally need no interaction
between sections. Therefore, for storage, recording all sections into a single
file is not recommended; a directory made of one file per section is better
suited.
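A minimal sketch of the "one file per section" storage idea follows. The file
naming scheme (section name and CPU number joined by an underscore) and the
helper names are assumptions for the example; the requirements do not impose
any naming.

```python
import os
import tempfile


def section_path(trace_dir, section_name, cpu):
    # Multi-level identifier (section name / CPU number) flattened into a
    # file name. The "<name>_<cpu>" scheme is a choice made for this sketch.
    return os.path.join(trace_dir, f"{section_name}_{cpu}")


def append_record(trace_dir, section_name, cpu, record):
    """Append raw event bytes to the file backing one section."""
    with open(section_path(trace_dir, section_name, cpu), "ab") as f:
        f.write(record)


def demo():
    with tempfile.TemporaryDirectory() as trace_dir:
        # Appends to different sections never touch the same file.
        append_record(trace_dir, "sched", 0, b"\x01")
        append_record(trace_dir, "sched", 1, b"\x02")
        append_record(trace_dir, "sched", 0, b"\x03")
        names = sorted(os.listdir(trace_dir))
        with open(section_path(trace_dir, "sched", 0), "rb") as f:
            cpu0 = f.read()
        return names, cpu0
```

Because each section is only ever appended to through its own file, writers for
different sections need no coordination with each other.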
- Metadata

Metadata describes the environment of the application. It defines the basic
types of the domains, as well as the mapping between each event and the types
of its fields. The metadata scope (what it describes) is a whole trace, which
consists of one or many sections.

The metadata can be either contained in the trace (better usability for telecom
scenarios) or added alongside the trace data by a separate module (for DSP
scenarios). Metadata checksumming (only for statically generated metadata)
and/or versioning can be used to ensure consistency between sections and
metadata in the latter case.

  - Trace version
    - Major number (increment breaks compatibility)
    - Minor number (increment keeps compatibility)
  - Describe the invariant properties of the environment where the trace was
    generated.
    - Contain unique domain identifier (kernel, process ID and timestamp,
      hypervisor)
    - Describes the runtime environment.
    - Report target bitness
    - Report target byte order
  - Data types (see section 1.2 Extensions below)
  - Architecture-agnostic (text-based)
  - Ought to be parsed with a regular grammar
  - Mapping to event types, e.g. (section, event) tuples, with:
      ( section identifier, event numerical identifier )
  - Description of event context fields (per section)
  - Can be streamed along with the trace as a trace section
  - Support dynamic addition of new event types while trace is active (required
    to support module/shared object loading and dynamic probes)
  - Metadata section should be efficient and reliable. Additional information
    could be kept in separate sections, outside of metadata.
  - Metadata description language not imposed by standard
  - Metadata format identifier placed at the beginning of the metadata.
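To make the versioning rule and the (section, event) mapping concrete, here is
a small sketch. It assumes versions are (major, minor) tuples and reads
"increment keeps compatibility" as "any minor number is readable as long as the
major number matches"; the event names and the helper names are hypothetical.

```python
def can_read(reader_version, trace_version):
    """Return True if a reader can parse a trace, under the assumed policy.

    Both versions are (major, minor) tuples. A different major number breaks
    compatibility; a different minor number does not.
    """
    return reader_version[0] == trace_version[0]


def register_event_type(table, section, event_id, name, fields):
    """Add an event type while the trace is active (dynamic addition)."""
    key = (section, event_id)
    if key in table:
        raise ValueError(f"event id {event_id} already used in {section!r}")
    table[key] = {"name": name, "fields": list(fields)}


# Metadata table keyed by (section identifier, event numerical identifier).
event_types = {}
register_event_type(event_types, "sched", 0, "sched_switch",
                    ["prev_pid", "next_pid"])
register_event_type(event_types, "syscall", 0, "sys_enter", ["syscall_id"])
```

Note that the same numerical identifier (0 here) can be reused in two sections,
since identifiers only need to be unique within a section.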
1.2) Extensions (optional capabilities)

- Event
  - Optional context (thread id, virtual cpu id, execution mode (irq/bh/thread),
    CPU/board/node id, event ordering identifier, timestamp,
    current hardware performance counter information, event
    size)
  - Optional ordering capability across sections:
    - Ordering identifier required for traces containing many event streams
    - Either timestamp-based or based on unique sequence numbers
  - Optional time-flow capability: per-event timestamps
  - It should be possible to have context information only in some event records
    within a section. E.g., timestamp written every few events.
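The cross-section ordering capability can be sketched as a merge of per-section
streams. This example assumes every event carries an ordering identifier
(timestamp or sequence number) and that each section is already in physical
order; the function name and the sample data are made up for illustration.

```python
import heapq


def merge_sections(sections):
    """Merge per-section event lists into one globally ordered list.

    sections maps a section name to a list of (ordering_id, payload) tuples,
    each list already sorted because events are in physical order within a
    section. Returns a list of (ordering_id, section_name, payload).
    """
    streams = [
        [(order, name, payload) for order, payload in events]
        for name, events in sections.items()
    ]
    # heapq.merge only needs each input to be sorted; it never re-sorts a
    # whole section, so the per-section physical order is preserved.
    return list(heapq.merge(*streams, key=lambda event: event[0]))


merged = merge_sections({
    "cpu0": [(1, "a"), (4, "d")],
    "cpu1": [(2, "b"), (3, "c")],
})
```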
- Section
  - Optional context applying to all events contained in that section
    (thread id, virtual cpu id, execution mode (irq/bh/thread), CPU/board/node
    id)
  - Support piece-wise compression
  - Support checksumming
- Metadata
  - Execution environment information
  - Data types available: integer, strings, arrays, sequence, floats,
    structures, maps (aka enumerations), bitfields, ...
    - Describe type alignment.
    - Describe type size.
    - Describe type signedness.
    - Other type examples:
      - gcc "vector" type. (packed data)
        http://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html
      - gcc complex type (e.g. complex short, float, double...)
      - gcc _Fract and _Accum http://gcc.gnu.org/wiki/FixedPointArithmetic
        http://gcc.gnu.org/onlinedocs/gcc/Fixed_002dPoint.html
  - Describes trace capabilities, for instance:
    - Event ordering across sections
    - Time flow information
      - In event header
      - Or possibly payload of pre-specified sections and/or events
    - Ability to perform event ordering across traces
  - Optional per-event "current state tracking" information.

    This per-event taxonomy allows automated creation of a state machine that
    keeps track of state updates within the taxonomy tree.

    Described in a file-system path-like taxonomy with an additional []
    operator which indicates a lookup by value, e.g.:

    * For events in the trace stream updating the current state only based on
      information known from the context (either derived from the per-section
      or per-event context information):

      E.g., associated with a scheduling change event:

      "cpu[section/cpu]/thread = field/next_pid"
        Updates the current value of the current section's cpu "thread"
        attribute (e.g. currently running thread).

      E.g., associated with a system call:

      "thread[cpu[section/cpu]/thread]/syscall[field/syscall_id]/id
         = field/syscall_id"

        Updates the state value of the current thread "syscall" attribute.

    * For events in the trace stream targeting a path that depends on other
      fields in that same event (would be common for full system state dump at
      trace start):

      E.g., associated with a thread listing event:
      "thread[field/pid]/pid = field/pid"

      E.g., associated with a thread memory maps listing event:
      "thread[field/pid]/mmap[field/address]/address = field/address"
      "thread[field/pid]/mmap[field/address]/end = field/end"
      "thread[field/pid]/mmap[field/address]/flags = field/flags"
      "thread[field/pid]/mmap[field/address]/pgoff = field/pgoff"
      "thread[field/pid]/mmap[field/address]/inode = field/inode"

    All per-event context information (e.g. repeating the current PID and CPU
    for each event) can be represented with this taxonomy, e.g., in the
    section description:

    "section/pid = field/pid"
    "section/cpu = field/cpu"
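One possible reading of the state-tracking rules above is sketched below as a
tiny interpreter. It is an assumption-laden illustration, not a specified
algorithm: "field/x" is taken to read a field of the current event, "section/x"
to read the section context, any other path to read the tracked state, and
[expr] to substitute the value of expr into the path. The function names and
the flat dictionary used to hold the state are choices made for this example.

```python
import re

# Matches a [...] lookup that contains no nested brackets (innermost first).
INNERMOST = re.compile(r"\[([^\[\]]+)\]")


def resolve(expr, state, section, fields):
    """Evaluate an expression to a value."""
    expr = expr.strip()
    if expr.startswith("field/"):
        return fields[expr[len("field/"):]]
    if expr.startswith("section/"):
        return section[expr[len("section/"):]]
    return state[canonical(expr, state, section, fields)]


def canonical(path, state, section, fields):
    """Replace every [expr] lookup by the value it resolves to."""
    while True:
        match = INNERMOST.search(path)
        if match is None:
            # Resolved lookups were marked with <...> so they are not matched
            # again; restore the bracket notation for the final state key.
            return path.replace("<", "[").replace(">", "]")
        value = resolve(match.group(1), state, section, fields)
        path = path[:match.start()] + "<" + str(value) + ">" + path[match.end():]


def apply_rule(rule, state, section, fields):
    """Apply one 'target = source' rule for the current event."""
    target, source = (part.strip() for part in rule.split("=", 1))
    value = resolve(source, state, section, fields)
    if target.startswith("section/"):
        section[target[len("section/"):]] = value
    else:
        state[canonical(target, state, section, fields)] = value


state, section = {}, {"cpu": 0}
apply_rule("cpu[section/cpu]/thread = field/next_pid",
           state, section, {"next_pid": 42})
apply_rule("thread[cpu[section/cpu]/thread]/syscall[field/syscall_id]/id"
           " = field/syscall_id", state, section, {"syscall_id": 3})
```

After the two rules run, the state holds the running thread of CPU 0 and the
system call entered by that thread, both derived purely from the rule strings.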
2) Linux-specific Model

  (Linux instance, specific to the reference implementation)

Instance of the model specifically tailored to the Linux kernel and C
programs/libraries requirements. Allows for either packed events, or events
aligned following the ISO C standard.

- Event
  - Payload
    - Initially support ISO C naturally aligned and packed type layouts.

- Each section represented as a trace stream (typically 1 trace stream per cpu
  per section) to allow the tracer to easily append to these sections.
  Identifier: section name / CPU ID
  Each section has a CPU ID identifier in its context information.
- Trace stream
  - Should have no hard-coded limit on size of a file generated by saving the
    trace stream (64 bit file position is fine)
  - Event lost count should be localized. It should apply to a limited time
    interval and to a tracefile, hence to a specific section, so the trace
    analyzer can provide basic information about what kind of events were lost
    and where they were lost in the trace.
  - A stream is divided into packets, each consisting of one or more event
    records.
    - Should be optionally compressible piece-wise (packet per packet).
    - Optional checksum on the packet content (except packet header), with a
      selection of checksum algorithms. Performed on a per-packet basis.
  - Packet headers should contain a sequence number to help UDP streaming
    reassembly.
  - Packet headers should be allowed to contain extra space reserved for
    encapsulation into a UDP packet without copy.
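The packet requirements above (sequence number, localized lost count, checksum
over the content only) can be illustrated with a hypothetical packet header.
The header layout, the choice of CRC32 as checksum and all function names are
assumptions made for this sketch; the requirements leave them open.

```python
import struct
import zlib

# Hypothetical header: sequence number (u64), content size (u64),
# events lost (u32), CRC32 of the content (u32). "<" selects a little-endian
# packed encoding with no padding.
HEADER = struct.Struct("<QQII")


def build_packet(seq, events_lost, content):
    # The checksum covers the packet content only, not the header.
    checksum = zlib.crc32(content)
    return HEADER.pack(seq, len(content), events_lost, checksum) + content


def parse_packet(packet):
    seq, size, lost, checksum = HEADER.unpack_from(packet)
    content = packet[HEADER.size:HEADER.size + size]
    if len(content) != size:
        raise ValueError("truncated packet")
    if zlib.crc32(content) != checksum:
        raise ValueError("checksum mismatch")
    return seq, lost, content


def reassemble(packets):
    """Reorder packets received out of order (e.g. over UDP).

    Returns the contents in sequence order and the list of missing sequence
    numbers between the first and last packet seen.
    """
    parsed = sorted((parse_packet(p) for p in packets), key=lambda p: p[0])
    seen = [seq for seq, _, _ in parsed]
    missing = []
    if seen:
        missing = sorted(set(range(seen[0], seen[-1] + 1)) - set(seen))
    return [content for _, _, content in parsed], missing
```

Because the lost count and sequence number live in each packet header, an
analyzer can tell both where in the stream events were dropped and which
packets never arrived.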
- Compact representation
  - Minimize the overhead in terms of disk/network/serial port/memory bandwidth.
  - A compact representation can keep more information in smaller buffers,
    thus needs less memory to keep the same amount of information around.
    Also useful to improve cache locality in flight recorder mode.

- Natural alignment of headers for architectures with slow non-aligned writes.

- Packed layout of headers for architectures with efficient non-aligned writes.

- Should have a 1 to 1 mapping between the memory buffers and the generated
  trace files: allows zero-copy with splice().

- Use target endianness

- Portable across different target (tracer)/host (analyzer) architectures

- It should be possible to generate metadata from descriptions written in header
  files (extraction with C preprocessor macros is one solution).
* Requirements on the Tracers

These are higher-level tracer requirements that seem appropriate to support some
of the trace format requirements stated above.

Enumerating these higher-level requirements influences the trace format in many
ways. For instance, a requirement for compactness leads to schemes where all
information repetition should be eliminated, hence the need for optional
per-section context information. Another example is the requirement for speed
and streaming, which leads to zero-copy implementations, which in turn imply
that the trace format should be written natively by the tracer. The tracer
requirements stated in this section are stated to ensure that the trace format
structure makes it possible for a tracer to cope with the requirements, not to
require that all tracers do so.
- Low-overhead
- Handle large trace throughput (multi-GB per minute)
- Scalable to high number of cores
  - Per-cpu memory buffers
  - Scalability and performance-aware synchronization

- Environments without filesystem
  - Need to buffer events in target RAM to send them in groups to a host for
    analysis
- Ability to tune the size of buffers and transmission medium to minimize the
  impact on the traced system.
- Streaming (live monitoring)
  - Through sockets (USB, network)
  - Through serial ports
  - There must be a related protocol for streaming this event data.

- Availability of flight recorder (synonym: overwrite) mode
  - Exclusive ownership of reader data.
  - Buffer size should be per group of events.

- Output trace to disk
- Trace buffers available in crash dump to allow post-mortem analysis
- Fine-grained timestamps

- Lockless (lock-free, ideally wait-free; aka starvation-free)

- Buffer introspection: event written, read and lost counts.

- Ability to iteratively narrow the level of details and traced time window
  following an initial high level "state" overview provided by an initial trace
  collecting everything.

- Support kernel module instrumentation

- Standard way(s) for a host to upload/access trace log data from a
  target/JTAG device/simulator/etc.

- Conditional tracing in kernel space.

- Compatibility with power management subsystem (trace collection shall not be a
  reason for waking up a device)

- Well defined and stable trace configuration and control API across kernel
  versions.

- Create and run more than one trace session in parallel
  - monitoring by system administrators
  - field engineers troubleshooting a specific problem
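As an illustration of the flight recorder (overwrite) mode and of the buffer
introspection counters listed above, here is a deliberately simple sketch. It
is single-threaded and makes no attempt at the lockless or per-cpu
requirements; the class name and its attributes are hypothetical.

```python
from collections import deque


class FlightRecorder:
    """Fixed-capacity buffer that overwrites the oldest events when full."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.events = deque()
        # Introspection counters: events written, read and lost.
        self.written = 0
        self.read = 0
        self.lost = 0

    def write(self, event):
        if len(self.events) == self.capacity:
            # Overwrite mode: drop the oldest event and account for it.
            self.events.popleft()
            self.lost += 1
        self.events.append(event)
        self.written += 1

    def read_all(self):
        # The reader takes exclusive ownership of what it extracts.
        out = list(self.events)
        self.events.clear()
        self.read += len(out)
        return out


recorder = FlightRecorder(capacity=3)
for event in range(5):
    recorder.write(event)
snapshot = recorder.read_all()
```

Only the most recent events survive, and the counters let a tool report how
much of the history was overwritten before the snapshot was taken.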
* Trace Analyzer Requirements

The trace analyzer requirements stated in this section are stated to ensure that
the trace format structure makes it possible for a trace analyzer to cope with
the requirements, not to require that all trace analyzers do so.

- Ability to cope with huge traces (> 10 GB)
- Should be possible to do a binary search on the file to find events by time
  at least. (combined with smart indexing/summary data perhaps)
- File format should be as dense as possible, but not at the expense of
  analysis performance (speed matters more than size since disks are getting
  cheaper)
- Must not be required to scan through all events in order to start
  analyzing (by time anyway)
- Support live viewing of trace streams
- Standard description of a trace event context.
  (PERI-XML calls it "Dimensions")
- Manage system-wide event scoping with the following hierarchy:
  (address space identifier, section name, event name)
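The binary search requirement can be sketched with a small index. This example
assumes the analyzer has, for each packet of a stream, its begin timestamp and
its file offset, kept in two parallel lists sorted by time; how such an index
is built or stored is not covered by the requirements, and the function name is
made up.

```python
import bisect


def find_packet(begin_timestamps, file_offsets, timestamp):
    """Return the file offset of the packet that may contain timestamp.

    begin_timestamps is sorted in increasing order and file_offsets[i] is the
    offset of the packet starting at begin_timestamps[i]. Returns None if the
    timestamp is before the first packet.
    """
    # Index of the last packet whose begin timestamp is <= timestamp.
    pos = bisect.bisect_right(begin_timestamps, timestamp) - 1
    if pos < 0:
        return None
    return file_offsets[pos]


begins = [100, 200, 300]
offsets = [0, 4096, 8192]
offset = find_packet(begins, offsets, 250)
```

The lookup touches only a logarithmic number of index entries, so analysis can
start at a given time without scanning all events.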