Fix: extraneous empty/inactive flush on rotation out of a trace chunk
authorJérémie Galarneau <jeremie.galarneau@efficios.com>
Fri, 31 Jul 2020 15:20:40 +0000 (11:20 -0400)
committerJérémie Galarneau <jeremie.galarneau@efficios.com>
Mon, 3 Aug 2020 17:31:22 +0000 (13:31 -0400)
Observed issue
==============

A test (tests/regression/tools/tracefile-limits/test_tracefile_count)
occasionally fails on ppc64. The trace validation steps in the fails in
the case where the trace file count limit is set to 1.

Examining the resulting trace shows that the last packet of data
produced by the test application appears to be missing.

The test case enables a channel in "overwrite" mode. Normally, this
would guarantee that the last data produced will always be available in
the resulting trace.

Cause
=====

An empty/inactive flush is performed when rotating "out" of a non-null
trace chunk to ensure that the trace chunk contained at least one
packet.

Looking at the test's resulting trace and by following the consumerd
logs, we see that the test application runs on one CPU for most of its
lifetime. The stream file is repeatedly replaced to make room for the
latest data.

Eventually, the application is migrated to another CPU. A number of
packets are written to this new stream. The session is then stopped
which causes an active flush to occur to close the current packet of
all streams (see `ust_app_stop_trace_all`).

Then, when the session is destroyed, an empty/inactive flush is
performed to ensure that at least one packet was produced in the current
trace chunk [1].

At the moment of writing this empty packet, the consumer daemon sees
that there is not enough space left in the stream file to honour the
trace file size restriction. It thus overwrites the file resulting in
the loss of the last events to replace them with the empty "end of
chunk" packet that occupies a single page.

While the problem is not specific to PowerPC 64, it has a lot more
chances to occur there as pages are typically configured to be of 64kb
length. Due to current implementation limitations, empty packets have
a size of one page.

In other words, 4kb pages typically fit in the space left in the file,
causing the problem to not be easily reproducible on x64.

Note that while the file size limit is specified as "3 * PAGE_SIZE" in
the test, it is rounded-up to 512kb to accomodate at least one
sub-buffer.

Solution
========

[1] This empty/inactive flush is no longer necessary since f96af312b
as an "open packet" (which performs an empty/inactive flush) is
performed when a stream enters a non-null trace chunk. There is no
concern that a trace chunk will be left empty unless this initial flush
fails (see patch comments and work-around).

The empty flush that was performed for data streams is converted
into an active flush under most circumstances; the packet is simply
closed.

Known drawbacks
===============

None.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I5602b7ab8318374f75060489cf9c27af4e058805

src/common/consumer/consumer.c

index 8d9fa06478cddc7a83904e76664aa21550cf527c..2cd385545ab357faa7faeead6c770db877dc8e97 100644 (file)
@@ -3909,6 +3909,36 @@ unsigned long consumer_get_consume_start_pos(unsigned long consumed_pos,
        return start_pos;
 }
 
+/* Stream lock must be held by the caller. */
+static int sample_stream_positions(struct lttng_consumer_stream *stream,
+                           unsigned long *produced, unsigned long *consumed)
+{
+       int ret;
+
+       ASSERT_LOCKED(stream->lock);
+
+       ret = lttng_consumer_sample_snapshot_positions(stream);
+       if (ret < 0) {
+               ERR("Failed to sample snapshot positions");
+               goto end;
+       }
+
+       ret = lttng_consumer_get_produced_snapshot(stream, produced);
+       if (ret < 0) {
+               ERR("Failed to sample produced position");
+               goto end;
+       }
+
+       ret = lttng_consumer_get_consumed_snapshot(stream, consumed);
+       if (ret < 0) {
+               ERR("Failed to sample consumed position");
+               goto end;
+       }
+
+end:
+       return ret;
+}
+
 /*
  * Sample the rotate position for all the streams of a channel. If a stream
  * is already at the rotate position (produced == consumed), we flag it as
@@ -3971,19 +4001,97 @@ int lttng_consumer_rotate_channel(struct lttng_consumer_channel *channel,
                }
 
                /*
-                * Do not flush an empty packet when rotating from a NULL trace
+                * Do not flush a packet when rotating from a NULL trace
                 * chunk. The stream has no means to output data, and the prior
-                * rotation which rotated to NULL performed that side-effect already.
+                * rotation which rotated to NULL performed that side-effect
+                * already. No new data can be produced when a stream has no
+                * associated trace chunk (e.g. a stop followed by a rotate).
                 */
                if (stream->trace_chunk) {
+                       bool flush_active;
+
+                       if (stream->metadata_flag) {
+                               /*
+                                * Don't produce an empty metadata packet,
+                                * simply close the current one.
+                                *
+                                * Metadata is regenerated on every trace chunk
+                                * switch; there is no concern that no data was
+                                * produced.
+                                */
+                               flush_active = true;
+                       } else {
+                               /*
+                                * Only flush an empty packet if the "packet
+                                * open" could not be performed on transition
+                                * to a new trace chunk and no packets were
+                                * consumed within the chunk's lifetime.
+                                */
+                               if (stream->opened_packet_in_current_trace_chunk) {
+                                       flush_active = true;
+                               } else {
+                                       /*
+                                        * Stream could have been full at the
+                                        * time of rotation, but then have had
+                                        * no activity at all.
+                                        *
+                                        * It is important to flush a packet
+                                        * to prevent 0-length files from being
+                                        * produced as most viewers choke on
+                                        * them.
+                                        *
+                                        * Unfortunately viewers will not be
+                                        * able to know that tracing was active
+                                        * for this stream during this trace
+                                        * chunk's lifetime.
+                                        */
+                                       ret = sample_stream_positions(stream, &produced_pos, &consumed_pos);
+                                       if (ret) {
+                                               goto end_unlock_stream;
+                                       }
+
+                                       /*
+                                        * Don't flush an empty packet if data
+                                        * was produced; it will be consumed
+                                        * before the rotation completes.
+                                        */
+                                       flush_active = produced_pos != consumed_pos;
+                                       if (!flush_active) {
+                                               enum lttng_trace_chunk_status chunk_status;
+                                               const char *trace_chunk_name;
+                                               uint64_t trace_chunk_id;
+
+                                               chunk_status = lttng_trace_chunk_get_name(
+                                                               stream->trace_chunk,
+                                                               &trace_chunk_name,
+                                                               NULL);
+                                               if (chunk_status == LTTNG_TRACE_CHUNK_STATUS_NONE) {
+                                                       trace_chunk_name = "none";
+                                               }
+
+                                               /*
+                                                * Consumer trace chunks are
+                                                * never anonymous.
+                                                */
+                                               chunk_status = lttng_trace_chunk_get_id(
+                                                               stream->trace_chunk,
+                                                               &trace_chunk_id);
+                                               assert(chunk_status ==
+                                                               LTTNG_TRACE_CHUNK_STATUS_OK);
+
+                                               DBG("Unable to open packet for stream during trace chunk's lifetime. "
+                                                               "Flushing an empty packet to prevent an empty file from being created: "
+                                                               "stream id = %" PRIu64 ", trace chunk name = `%s`, trace chunk id = %" PRIu64,
+                                                               stream->key, trace_chunk_name, trace_chunk_id);
+                                       }
+                               }
+                       }
+
                        /*
-                        * For metadata stream, do an active flush, which does not
-                        * produce empty packets. For data streams, empty-flush;
-                        * ensures we have at least one packet in each stream per trace
-                        * chunk, even if no data was produced.
+                        * Close the current packet before sampling the
+                        * ring buffer positions.
                         */
-                       ret = consumer_stream_flush_buffer(
-                                       stream, stream->metadata_flag ? 1 : 0);
+                       ret = consumer_stream_flush_buffer(stream, flush_active);
                        if (ret < 0) {
                                ERR("Failed to flush stream %" PRIu64 " during channel rotation",
                                                stream->key);
This page took 0.030627 seconds and 5 git commands to generate.