Observed issue
==============
The lttng-tools port builds regularly fail on the CI with the following
error:
```
05-28 16:13:52.864 11219 11219 E PLUGIN/SRC.CTF.LTTNG-LIVE next_stream_iterator_for_trace@lttng-live.c:1067 [lttng-live] Message's timestamp is less than lttng-live's message iterator's last returned timestamp: lttng-live-msg-iter-addr=0xd810220dd30, ts=
1590696831733594090, last-msg-ts=
1590696832514058899
05-28 16:13:52.864 11219 11219 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_msg_iter_next@lttng-live.c:1502 [lttng-live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
05-28 16:13:52.864 11219 11219 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:868 Component input port message iterator's "next" method failed: iter-addr=0xd810221c7c0, iter-upstream-comp-name="lttng-live", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
05-28 16:13:52.864 11219 11219 W LIB/GRAPH consume_graph_sink@graph.c:466 Component's "consume" method failed: status=ERROR, comp-addr=0xd81022373a0, comp-name="sink.text.details", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="details", comp-class-partial-descr="Print messages with details.", comp-class-is-frozen=0, comp-class-so-handle-addr=0xd8102233ca0, comp-class-so-handle-path="/tmp/debug_jgalar/build/usr/lib/babeltrace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0
05-28 16:13:52.865 11219 11219 E CLI cmd_run@babeltrace2.c:2530 Graph failed to complete successfully
ERROR: [Babeltrace CLI] (babeltrace2.c:2530)
Graph failed to complete successfully
CAUSED BY [libbabeltrace2] (graph.c:466)
Component's "consume" method failed: status=ERROR, comp-addr=0xd81022373a0,
comp-name="sink.text.details", comp-log-level=WARNING, comp-class-type=SINK,
comp-class-name="details", comp-class-partial-descr="Print messages with
details.", comp-class-is-frozen=0, comp-class-so-handle-addr=0xd8102233ca0,
comp-class-so-handle-path="/tmp/debug_jgalar/build/usr/lib/babeltrace2/plugins/babeltrace-plugin-text.so",
comp-input-port-count=1, comp-output-port-count=0
CAUSED BY [libbabeltrace2] (iterator.c:868)
Component input port message iterator's "next" method failed:
iter-addr=0xd810221c7c0, iter-upstream-comp-name="lttng-live",
iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE,
iter-upstream-comp-class-name="lttng-live",
iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon",
iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR
CAUSED BY [lttng-live: 'source.ctf.lttng-live'] (lttng-live.c:1502)
Error preparing the next batch of messages:
live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR
CAUSED BY [lttng-live: 'source.ctf.lttng-live'] (lttng-live.c:1067)
Message's timestamp is less than lttng-live's message iterator's last returned
timestamp: lttng-live-msg-iter-addr=0xd810220dd30, ts=
1590696831733594090,
last-msg-ts=
1590696832514058899
```
The test that ends up with this error performs a session `clear`
while tracing an application as part of a "live" session.
Cause
=====
Given the following trace,
| pkt seq:0 |<-------discarded packets------>| pkt seq:8 |
0 20 121 140
a CTF source is expected to introduce a "Discarded Packets" message
between packets 0 and 8. The begin and end timestamps of such a message
are synthesized from the timestamps of the last known packet (seq:0)
and the newly decoded packet (seq:8).
For instance, here the Discarded Packets message would have the bounds
[pkt0.end_ts, pkt8.begin_ts].
In the context of a live source, the tracer could report an inactivity
period during the interval of the "Discarded packets" message.
Those live protocol messages eventually translate into an "Iterator
Inactivity" message with a timestamp set at the end of the inactivity
period.
If the tracer reports an inactivity period that ends at a point
between pkt0 and pkt8 (resulting in an "Iterator Inactivity" message),
the live source will send a "Discarded Packets" message that starts
before the preceding "Iterator Inactivity" message, breaking the
monotonicity constraint of the graph.
This happens regularly when `clear` is used on an LTTng live session as
it effectively discards packets, something that would otherwise not
happen (before LTTng 2.12) as most users set their tracing channels in
"discard" mode (the default for that session type).
However, the same problem as described above applies to "Discarded
Events" messages and could occur with older LTTng versions.
Solution
========
"Discarded Events" and Discarded Packets" messages are intercepted
in the lttng-live source's "muxer" and are _adjusted_ so that
they honor the monotonicity guarantee of the graph.
In effect, whenever such a message is seen to have a begin timestamp
that is _before_ the last inactivity timestamp, its begin timestamp
is adjusted to the last inactivity timestamp's value.
From a correctness standpoint, this is okay. If the tracer notified the
relay daemon of an inactivity period on a stream, we can rely on the
fact that no data was produced during that time to affirm that no
packets where lost during that time either.
A test reproducing the "Discarded Packets" scenatio described in
this message is added in a follow-up patch.
Known drawbacks
===============
None.
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: Ia8f32ba6717b33203637bf8d5aa34ff2a78c3e7f
Reviewed-on: https://review.lttng.org/c/babeltrace/+/3616
Tested-by: jenkins <jenkins@lttng.org>
return live_status;
}
+static
+bool is_discarded_packet_or_event_message(const bt_message *msg)
+{
+ const enum bt_message_type msg_type = bt_message_get_type(msg);
+
+ return msg_type == BT_MESSAGE_TYPE_DISCARDED_EVENTS ||
+ msg_type == BT_MESSAGE_TYPE_DISCARDED_PACKETS;
+}
+
+static
+enum lttng_live_iterator_status adjust_discarded_packets_message(
+ bt_self_message_iterator *iter,
+ const bt_stream *stream,
+ const bt_message *msg_in, bt_message **msg_out,
+ uint64_t new_begin_ts)
+{
+ enum lttng_live_iterator_status status = LTTNG_LIVE_ITERATOR_STATUS_OK;
+ enum bt_property_availability availability;
+ const bt_clock_snapshot *clock_snapshot;
+ uint64_t end_ts;
+ uint64_t count;
+
+ clock_snapshot = bt_message_discarded_packets_borrow_end_default_clock_snapshot_const(msg_in);
+ end_ts = bt_clock_snapshot_get_value(clock_snapshot);
+
+ availability = bt_message_discarded_packets_get_count(msg_in, &count);
+ BT_ASSERT_DBG(availability == BT_PROPERTY_AVAILABILITY_AVAILABLE);
+
+ *msg_out = bt_message_discarded_packets_create_with_default_clock_snapshots(
+ iter, stream, new_begin_ts, end_ts);
+ if (!*msg_out) {
+ status = LTTNG_LIVE_ITERATOR_STATUS_NOMEM;
+ goto end;
+ }
+
+ bt_message_discarded_packets_set_count(*msg_out, count);
+end:
+ return status;
+}
+
+static
+enum lttng_live_iterator_status adjust_discarded_events_message(
+ bt_self_message_iterator *iter,
+ const bt_stream *stream,
+ const bt_message *msg_in, bt_message **msg_out,
+ uint64_t new_begin_ts)
+{
+ enum lttng_live_iterator_status status = LTTNG_LIVE_ITERATOR_STATUS_OK;
+ enum bt_property_availability availability;
+ const bt_clock_snapshot *clock_snapshot;
+ uint64_t end_ts;
+ uint64_t count;
+
+ clock_snapshot = bt_message_discarded_events_borrow_end_default_clock_snapshot_const(msg_in);
+ end_ts = bt_clock_snapshot_get_value(clock_snapshot);
+
+ availability = bt_message_discarded_events_get_count(msg_in, &count);
+ BT_ASSERT_DBG(availability == BT_PROPERTY_AVAILABILITY_AVAILABLE);
+
+ *msg_out = bt_message_discarded_events_create_with_default_clock_snapshots(
+ iter, stream, new_begin_ts, end_ts);
+ if (!*msg_out) {
+ status = LTTNG_LIVE_ITERATOR_STATUS_NOMEM;
+ goto end;
+ }
+
+ bt_message_discarded_events_set_count(*msg_out, count);
+end:
+ return status;
+}
+
static
enum lttng_live_iterator_status next_stream_iterator_for_trace(
struct lttng_live_msg_iter *lttng_live_msg_iter,
if (curr_msg_ts_ns >= lttng_live_msg_iter->last_msg_ts_ns) {
stream_iter->current_msg = msg;
stream_iter->current_msg_ts_ns = curr_msg_ts_ns;
+ } else if (stream_iter->last_inactivity_ts > curr_msg_ts_ns &&
+ is_discarded_packet_or_event_message(msg)) {
+ /*
+ * The CTF message iterator emits Discarded
+ * Packets and Events with synthesized begin and
+ * end timestamps from the bounds of the last
+ * known packet and the newly decoded packet
+ * header.
+ *
+ * The CTF message iterator is not aware of
+ * stream inactivity beacons. Hence, we have
+ * to adjust the begin timestamp of those types
+ * of messages if a stream signaled its
+ * inactivity up until _after_ the last known
+ * packet's begin timestamp.
+ *
+ * Otherwise, the monotonicity guarantee would
+ * not be preserved.
+ */
+ const enum bt_message_type msg_type =
+ bt_message_get_type(msg);
+ enum lttng_live_iterator_status adjust_status =
+ LTTNG_LIVE_ITERATOR_STATUS_OK;
+ bt_message *adjusted_message;
+
+ switch (msg_type) {
+ case BT_MESSAGE_TYPE_DISCARDED_EVENTS:
+ adjust_status = adjust_discarded_events_message(
+ lttng_live_msg_iter->self_msg_iter,
+ stream_iter->stream,
+ msg, &adjusted_message,
+ stream_iter->last_inactivity_ts);
+ break;
+ case BT_MESSAGE_TYPE_DISCARDED_PACKETS:
+ adjust_status = adjust_discarded_packets_message(
+ lttng_live_msg_iter->self_msg_iter,
+ stream_iter->stream,
+ msg, &adjusted_message,
+ stream_iter->last_inactivity_ts);
+ break;
+ default:
+ bt_common_abort();
+ }
+
+ if (adjust_status != LTTNG_LIVE_ITERATOR_STATUS_OK) {
+ stream_iter_status = adjust_status;
+ goto end;
+ }
+
+ BT_ASSERT_DBG(adjusted_message);
+ stream_iter->current_msg = adjusted_message;
+ stream_iter->current_msg_ts_ns =
+ stream_iter->last_inactivity_ts;
} else {
/*
* We received a message in the past. To ensure