From: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Date: Fri, 21 Sep 2018 22:16:05 +0000 (-0400)
Subject: Fix: rotation may never complete in per-PID buffering mode
X-Git-Url: http://git.efficios.com/?p=lttng-tools.git;a=commitdiff_plain;h=92816cc33a1add3c8276839bd6335e17423577dd;hp=92816cc33a1add3c8276839bd6335e17423577dd

Fix: rotation may never complete in per-PID buffering mode

Issue
-----

The current scheme to ensure that a rotation is completed
consists of the following, from the session daemon's perspective:

Iterate over all channels:
  - Ask the consumerd to sample the current "write" positions
  - Increment a count of channels being rotated

Wait for the consumer daemon to notify the session daemon every time
the "read" positions of all of a channel's streams have reached the
sampled "write" positions.

The idea behind this is to make sure that all the data that was
produced before a rotation was triggered has been consumed (i.e.
written to a local file system or streamed to the relay daemon)
before the rotation is marked as completed.
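The bookkeeping described above can be sketched as a simple counter. This is a simplified model, not the actual session daemon code: `struct old_rotation_state`, `old_rotation_start()` and `old_rotation_on_channel_done()` are hypothetical names, and the sampling requests and completion notifications exchanged with the consumer daemon are reduced to a comment and a function call.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified per-session state kept by the session
 * daemon under the old scheme. */
struct old_rotation_state {
	unsigned int channels_pending; /* channels not yet reported done */
};

/* For every channel the session daemon knows about, ask the consumerd
 * to sample its "write" positions and count the channel as pending. */
void old_rotation_start(struct old_rotation_state *state,
		size_t known_channel_count)
{
	state->channels_pending = 0;
	for (size_t i = 0; i < known_channel_count; i++) {
		/* (A request to sample channel i's positions is sent here.) */
		state->channels_pending++;
	}
}

/* Called each time the consumerd reports that all streams of a channel
 * have been read up to the sampled positions. Returns true once the
 * rotation is considered complete. */
bool old_rotation_on_channel_done(struct old_rotation_state *state)
{
	if (state->channels_pending > 0) {
		state->channels_pending--;
	}
	return state->channels_pending == 0;
}
```

Note that the count only covers the channels the session daemon iterated over; a channel it never saw can neither increment nor decrement it.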

However, this assumes that the session daemon is always aware of
all channels/streams that exist at the moment the rotation is
initiated. This is only true for the kernel domain.

In per-PID buffer mode, it is possible for an application, and its
buffers, to be torn down at any moment. Thus the following scenario
can happen:

- The application fills its buffers, causing the consumerd to fall
  behind
- The application exits, leaving its full buffers behind to be
  extracted by the consumer daemon
- The session daemon removes anything to do with the application from
  its internal structures, including its channels
- A rotation is initiated
- The positions of the application's buffers are never sampled as the
  session daemon does not see its channels when iterating over the
  session's channels
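The scenario can be modeled with two streams: one belonging to a live application whose data has been fully consumed, and one left behind by the exited application. The types and functions below are hypothetical, and positions are simplified to byte counts that no longer grow once the rotation starts. The sketch contrasts a completion check restricted to the streams known to the session daemon with the ground truth.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of a stream as seen by the consumer daemon. */
struct stream_model {
	uint64_t produced;      /* bytes written by the application */
	uint64_t consumed;      /* bytes extracted by the consumerd */
	bool known_to_sessiond; /* false once the app's channels are torn down */
};

/* Old-scheme view: only streams the session daemon still knows about
 * were sampled, so only they can hold back completion. */
bool old_scheme_reports_complete(const struct stream_model *streams,
		size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (!streams[i].known_to_sessiond) {
			continue; /* never sampled: invisible to the rotation */
		}
		if (streams[i].consumed < streams[i].produced) {
			return false;
		}
	}
	return true;
}

/* Ground truth: has all data produced before the rotation been consumed? */
bool all_data_consumed(const struct stream_model *streams, size_t count)
{
	for (size_t i = 0; i < count; i++) {
		if (streams[i].consumed < streams[i].produced) {
			return false;
		}
	}
	return true;
}
```

With a live stream at `produced = consumed = 4096` and an orphaned stream at `produced = 8192, consumed = 1024`, `old_scheme_reports_complete()` returns `true` while `all_data_consumed()` returns `false`.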

Several problems can then arise.

First, the rotation can be marked as "completed" while the consumerd
is still extracting the dead application's buffers, causing readers
to consume an incomplete/corrupted trace.

Second, if the session is being streamed to a relay daemon, it is
possible for the 'rename' command to be issued before the contents
of the buffers have been written, causing indexes to fail to be
flushed (as the relay daemon attempts to write them to a now-defunct
location).

Solution
--------

Eliminate the pipe between the session daemon and consumer daemon that
is used to signal that a rotation has completed, as the information it
conveys is unreliable.

The rotation thread now periodically asks the consumer daemon to check
for channels that have a pending rotation for a given session_id, or
that belong to the ongoing rotation's archive id.
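A minimal sketch of such a polling loop, assuming the query sent to the consumer daemon can be abstracted as a callback that returns whether any stream of the session still has a pending rotation. `wait_rotation_completion()`, `rotation_pending_query_fn`, the poll interval and the `max_polls` bound are all hypothetical; they are not the lttng-tools API.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical query: returns true while the consumer daemon still has
 * streams with a pending rotation for this session and archive id. */
typedef bool (*rotation_pending_query_fn)(uint64_t session_id,
		uint64_t archive_id, void *user_data);

/* Polls until the consumer reports no pending streams. Returns the
 * number of queries issued, or -1 if max_polls was exhausted. */
int wait_rotation_completion(uint64_t session_id, uint64_t archive_id,
		rotation_pending_query_fn query, void *user_data,
		long poll_interval_ms, int max_polls)
{
	const struct timespec delay = {
		.tv_sec = poll_interval_ms / 1000,
		.tv_nsec = (poll_interval_ms % 1000) * 1000000L,
	};

	for (int polls = 1; polls <= max_polls; polls++) {
		if (!query(session_id, archive_id, user_data)) {
			return polls; /* rotation completed */
		}
		/* Still pending: wait before asking again. */
		if (polls < max_polls) {
			nanosleep(&delay, NULL);
		}
	}
	return -1;
}
```

The interval is the main tuning knob: a shorter one detects completion sooner at the cost of more round trips to the consumer daemon.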

For every stream, the consumer daemon then proceeds as follows:
  - If the archive id during which it was created is greater than
    that of the ongoing rotation, we don't need to consider it
  - If its current position is greater than or equal to the sampled
    rotation position, we can consider its rotation 'done'
  - If it belongs to the pending rotation archive id and doesn't have
    a "target" position, it was unknown to the session daemon and the
    application associated with it is dead. We must wait for the
    stream to be flushed and torn down before assuming that the
    rotation was completed.
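The per-stream rules above can be expressed as a predicate. This is a sketch under simplifying assumptions: the field and function names are hypothetical, a stream's "current position" is taken to be its consumed ("read") position, and every stream not excluded by the first rule is assumed to belong to the ongoing rotation. A torn-down stream is modeled as no longer being part of the array that is checked.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a stream held by the consumer daemon. */
struct rotation_stream {
	uint64_t creation_archive_id; /* archive id when the stream was created */
	bool has_target_position;     /* true if a rotation position was sampled */
	uint64_t target_position;     /* sampled "write" position, if any */
	uint64_t consumed_position;   /* current "read" position */
};

/* Returns true if this stream still prevents the rotation identified by
 * ongoing_archive_id from being marked as completed. */
bool stream_blocks_rotation(const struct rotation_stream *stream,
		uint64_t ongoing_archive_id)
{
	/* Created after the rotation began: not part of this rotation. */
	if (stream->creation_archive_id > ongoing_archive_id) {
		return false;
	}
	/* Known stream: done once read up to the sampled position. */
	if (stream->has_target_position) {
		return stream->consumed_position < stream->target_position;
	}
	/* No target: unknown to the session daemon (its application is
	 * dead). It blocks completion until it is flushed and torn down. */
	return true;
}

/* The rotation is complete when no remaining stream blocks it. */
bool rotation_is_complete(const struct rotation_stream *streams,
		size_t count, uint64_t ongoing_archive_id)
{
	for (size_t i = 0; i < count; i++) {
		if (stream_blocks_rotation(&streams[i], ongoing_archive_id)) {
			return false;
		}
	}
	return true;
}
```

An orphaned stream (no target position) keeps `rotation_is_complete()` false until it is removed from the set, which is what makes a dead application's buffers count toward the rotation.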

Drawbacks
---------

This polling approach is somewhat inefficient and can cause rotations
to take longer to complete than necessary, since a completed rotation
is only detected on the next poll. This is especially noticeable in
high-latency networking conditions.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
---