From: Jérémie Galarneau Date: Thu, 4 Jun 2020 22:32:32 +0000 (-0400) Subject: Fix: source.ctf.lttng-live: muxing failure on clear X-Git-Tag: v2.0.4~18 X-Git-Url: http://git.efficios.com/?p=babeltrace.git;a=commitdiff_plain;h=141914ff5865087c6f69b400d187da9b62f586c3 Fix: source.ctf.lttng-live: muxing failure on clear Observed issue ============== The lttng-tools port builds regularly fail on the CI with the following error: ``` 05-28 16:13:52.864 11219 11219 E PLUGIN/SRC.CTF.LTTNG-LIVE next_stream_iterator_for_trace@lttng-live.c:1067 [lttng-live] Message's timestamp is less than lttng-live's message iterator's last returned timestamp: lttng-live-msg-iter-addr=0xd810220dd30, ts=1590696831733594090, last-msg-ts=1590696832514058899 05-28 16:13:52.864 11219 11219 E PLUGIN/SRC.CTF.LTTNG-LIVE lttng_live_msg_iter_next@lttng-live.c:1502 [lttng-live] Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR 05-28 16:13:52.864 11219 11219 W LIB/MSG-ITER bt_message_iterator_next@iterator.c:868 Component input port message iterator's "next" method failed: iter-addr=0xd810221c7c0, iter-upstream-comp-name="lttng-live", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR 05-28 16:13:52.864 11219 11219 W LIB/GRAPH consume_graph_sink@graph.c:466 Component's "consume" method failed: status=ERROR, comp-addr=0xd81022373a0, comp-name="sink.text.details", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="details", comp-class-partial-descr="Print messages with details.", comp-class-is-frozen=0, comp-class-so-handle-addr=0xd8102233ca0, comp-class-so-handle-path="/tmp/debug_jgalar/build/usr/lib/babeltrace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0 05-28 16:13:52.865 11219 11219 E CLI cmd_run@babeltrace2.c:2530 Graph failed to complete successfully ERROR: [Babeltrace CLI] (babeltrace2.c:2530) Graph failed to complete successfully CAUSED BY [libbabeltrace2] (graph.c:466) Component's "consume" method failed: status=ERROR, comp-addr=0xd81022373a0, comp-name="sink.text.details", comp-log-level=WARNING, comp-class-type=SINK, comp-class-name="details", comp-class-partial-descr="Print messages with details.", comp-class-is-frozen=0, comp-class-so-handle-addr=0xd8102233ca0, comp-class-so-handle-path="/tmp/debug_jgalar/build/usr/lib/babeltrace2/plugins/babeltrace-plugin-text.so", comp-input-port-count=1, comp-output-port-count=0 CAUSED BY [libbabeltrace2] (iterator.c:868) Component input port message iterator's "next" method failed: iter-addr=0xd810221c7c0, iter-upstream-comp-name="lttng-live", iter-upstream-comp-log-level=WARNING, iter-upstream-comp-class-type=SOURCE, iter-upstream-comp-class-name="lttng-live", iter-upstream-comp-class-partial-descr="Connect to an LTTng relay daemon", iter-upstream-port-type=OUTPUT, iter-upstream-port-name="out", status=ERROR CAUSED BY [lttng-live: 'source.ctf.lttng-live'] (lttng-live.c:1502) Error preparing the next batch of messages: live-iter-status=LTTNG_LIVE_ITERATOR_STATUS_ERROR CAUSED BY [lttng-live: 'source.ctf.lttng-live'] (lttng-live.c:1067) Message's timestamp is less than lttng-live's message iterator's last returned timestamp: lttng-live-msg-iter-addr=0xd810220dd30, ts=1590696831733594090, last-msg-ts=1590696832514058899 ``` The test that ends up with this error performs a session `clear` while tracing an application as part of a "live" session. Cause ===== Given the following trace, | pkt seq:0 |<-------discarded packets------>| pkt seq:8 | 0 20 121 140 a CTF source is expected to introduce a "Discarded Packets" message between packets 0 and 8. The begin and end timestamps of such a message are synthesized from the timestamps of the last known packet (seq:0) and the newly decoded packet (seq:8). For instance, here the Discarded Packets message would have the bounds [pkt0.end_ts, pkt8.begin_ts]. In the context of a live source, the tracer could report an inactivity period during the interval of the "Discarded packets" message. Those live protocol messages eventually translate into an "Iterator Inactivity" message with a timestamp set at the end of the inactivity period. If the tracer reports an inactivity period that ends at a point between pkt0 and pkt8 (resulting in an "Iterator Inactivity" message), the live source will send a "Discarded Packets" message that starts before the preceding "Iterator Inactivity" message, breaking the monotonicity constraint of the graph. This happens regularly when `clear` is used on an LTTng live session as it effectively discards packets, something that would otherwise not happen (before LTTng 2.12) as most users set their tracing channels in "discard" mode (the default for that session type). However, the same problem as described above applies to "Discarded Events" messages and could occur with older LTTng versions. Solution ======== "Discarded Events" and Discarded Packets" messages are intercepted in the lttng-live source's "muxer" and are _adjusted_ so that they honor the monotonicity guarantee of the graph. In effect, whenever such a message is seen to have a begin timestamp that is _before_ the last inactivity timestamp, its begin timestamp is adjusted to the last inactivity timestamp's value. From a correctness standpoint, this is okay. If the tracer notified the relay daemon of an inactivity period on a stream, we can rely on the fact that no data was produced during that time to affirm that no packets where lost during that time either. A test reproducing the "Discarded Packets" scenatio described in this message is added in a follow-up patch. Known drawbacks =============== None. Signed-off-by: Jérémie Galarneau Change-Id: Ia8f32ba6717b33203637bf8d5aa34ff2a78c3e7f Reviewed-on: https://review.lttng.org/c/babeltrace/+/3616 Tested-by: jenkins --- diff --git a/src/plugins/ctf/lttng-live/lttng-live.c b/src/plugins/ctf/lttng-live/lttng-live.c index 66e20b73..04464c10 100644 --- a/src/plugins/ctf/lttng-live/lttng-live.c +++ b/src/plugins/ctf/lttng-live/lttng-live.c @@ -1001,6 +1001,77 @@ end: return live_status; } +static +bool is_discarded_packet_or_event_message(const bt_message *msg) +{ + const enum bt_message_type msg_type = bt_message_get_type(msg); + + return msg_type == BT_MESSAGE_TYPE_DISCARDED_EVENTS || + msg_type == BT_MESSAGE_TYPE_DISCARDED_PACKETS; +} + +static +enum lttng_live_iterator_status adjust_discarded_packets_message( + bt_self_message_iterator *iter, + const bt_stream *stream, + const bt_message *msg_in, bt_message **msg_out, + uint64_t new_begin_ts) +{ + enum lttng_live_iterator_status status = LTTNG_LIVE_ITERATOR_STATUS_OK; + enum bt_property_availability availability; + const bt_clock_snapshot *clock_snapshot; + uint64_t end_ts; + uint64_t count; + + clock_snapshot = bt_message_discarded_packets_borrow_end_default_clock_snapshot_const(msg_in); + end_ts = bt_clock_snapshot_get_value(clock_snapshot); + + availability = bt_message_discarded_packets_get_count(msg_in, &count); + BT_ASSERT_DBG(availability == BT_PROPERTY_AVAILABILITY_AVAILABLE); + + *msg_out = bt_message_discarded_packets_create_with_default_clock_snapshots( + iter, stream, new_begin_ts, end_ts); + if (!*msg_out) { + status = LTTNG_LIVE_ITERATOR_STATUS_NOMEM; + goto end; + } + + bt_message_discarded_packets_set_count(*msg_out, count); +end: + return status; +} + +static +enum lttng_live_iterator_status adjust_discarded_events_message( + bt_self_message_iterator *iter, + const bt_stream *stream, + const bt_message *msg_in, bt_message **msg_out, + uint64_t new_begin_ts) +{ + enum lttng_live_iterator_status status = LTTNG_LIVE_ITERATOR_STATUS_OK; + enum bt_property_availability availability; + const bt_clock_snapshot *clock_snapshot; + uint64_t end_ts; + uint64_t count; + + clock_snapshot = bt_message_discarded_events_borrow_end_default_clock_snapshot_const(msg_in); + end_ts = bt_clock_snapshot_get_value(clock_snapshot); + + availability = bt_message_discarded_events_get_count(msg_in, &count); + BT_ASSERT_DBG(availability == BT_PROPERTY_AVAILABILITY_AVAILABLE); + + *msg_out = bt_message_discarded_events_create_with_default_clock_snapshots( + iter, stream, new_begin_ts, end_ts); + if (!*msg_out) { + status = LTTNG_LIVE_ITERATOR_STATUS_NOMEM; + goto end; + } + + bt_message_discarded_events_set_count(*msg_out, count); +end: + return status; +} + static enum lttng_live_iterator_status next_stream_iterator_for_trace( struct lttng_live_msg_iter *lttng_live_msg_iter, @@ -1072,6 +1143,59 @@ enum lttng_live_iterator_status next_stream_iterator_for_trace( if (curr_msg_ts_ns >= lttng_live_msg_iter->last_msg_ts_ns) { stream_iter->current_msg = msg; stream_iter->current_msg_ts_ns = curr_msg_ts_ns; + } else if (stream_iter->last_inactivity_ts > curr_msg_ts_ns && + is_discarded_packet_or_event_message(msg)) { + /* + * The CTF message iterator emits Discarded + * Packets and Events with synthesized begin and + * end timestamps from the bounds of the last + * known packet and the newly decoded packet + * header. + * + * The CTF message iterator is not aware of + * stream inactivity beacons. Hence, we have + * to adjust the begin timestamp of those types + * of messages if a stream signaled its + * inactivity up until _after_ the last known + * packet's begin timestamp. + * + * Otherwise, the monotonicity guarantee would + * not be preserved. + */ + const enum bt_message_type msg_type = + bt_message_get_type(msg); + enum lttng_live_iterator_status adjust_status = + LTTNG_LIVE_ITERATOR_STATUS_OK; + bt_message *adjusted_message; + + switch (msg_type) { + case BT_MESSAGE_TYPE_DISCARDED_EVENTS: + adjust_status = adjust_discarded_events_message( + lttng_live_msg_iter->self_msg_iter, + stream_iter->stream, + msg, &adjusted_message, + stream_iter->last_inactivity_ts); + break; + case BT_MESSAGE_TYPE_DISCARDED_PACKETS: + adjust_status = adjust_discarded_packets_message( + lttng_live_msg_iter->self_msg_iter, + stream_iter->stream, + msg, &adjusted_message, + stream_iter->last_inactivity_ts); + break; + default: + bt_common_abort(); + } + + if (adjust_status != LTTNG_LIVE_ITERATOR_STATUS_OK) { + stream_iter_status = adjust_status; + goto end; + } + + BT_ASSERT_DBG(adjusted_message); + stream_iter->current_msg = adjusted_message; + stream_iter->current_msg_ts_ns = + stream_iter->last_inactivity_ts; } else { /* * We received a message in the past. To ensure