Fix: relayd: live: data is missing between viewer attach and retry
authorJonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Thu, 6 May 2021 15:14:52 +0000 (11:14 -0400)
committerJérémie Galarneau <jeremie.galarneau@efficios.com>
Fri, 7 May 2021 17:37:57 +0000 (13:37 -0400)
commitc876d657fb08a91ca7c907b92f1b7604aee664ee
treef4483ad19ee35860ff6fbb414fbd2603c3cfeba6
parent5c45dcdaa5b352ba03074a5f472ad6988ea4fcda
Fix: relayd: live: data is missing between viewer attach and retry

Observed issue
==============

Data produced between the time a live viewer attach and a viewer
subsequent "polling" is missing from the point of view of the viewer.

Using the following reproducer:

  # A lttng-sessiond is already running
  lttng-relayd -b -vvv > relayd.log 2>&1
  lttng create live --live
  lttng enable-event -u "hello_world:*"
  lttng start

  date
  # Set the "polling" interval at ~30 seconds
  babeltrace2 --retry-duration=30000000 -i lttng-live net://localhost/host/$HOSTNAME/live &
  pid=$!

  # Make sure we are in the retry phase of bt2
  date
  sleep 2

  # Produce events
  # Note here that hello perform a sleep of 1 seconds before generating
  # 10 events to make sure lttng-ust had time to register. A total of
  # 100 events is generated.
  date
  for i in `seq 1 10`; do
   ./hello
  done

  # Here we expect to see 100 events outputted at some point (~18
  # seconds (30 - 2 - 10) from that time).
  # No events are received by babeltrace2.
  date
  sleep 40
  date

  # We retry the same exercise a second time.
  echo "Moving to phase 2"
  date
  for i in `seq 1 10`; do
   ./hello
  done

  # Here data is received correctly.
  date
  sleep 40
  date

  kill $pid
  wait

  lttng destroy live
  pkill lttng-relayd

Cause
=====

At the moment the viewer attach is done, no streams exist. On the
following viewer_get_new_streams (~30 seconds later), streams now exist
since they were allocated lazily following the first application
registration, but LTTNG_VIEWER_SEEK_LAST is used. This essentially
"discards" any events in the stream that happened between the attach
time and the viewer_get_new_stream execution.

Note that the same problem would happen if a new UID (in per-uid mode)
would happen to begin tracing in between viewer retry.

Solution
========

Always use `LTTNG_VIEWER_SEEK_BEGINNING` during
`viewer_get_new_streams`.

Known drawbacks
=========

This does not fix the per-pid problem where an app run and die in between
the viewer retries.

References
==========

https://lists.lttng.org/pipermail/lttng-dev/2021-May/029953.html

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Change-Id: I92fbe6511fb93e31d20922b4eef5c9a221e07716
src/bin/lttng-relayd/live.c
This page took 0.028733 seconds and 5 git commands to generate.