Backport: Fix: relayd streams can be leaked on connection error
author Jérémie Galarneau <jeremie.galarneau@efficios.com>
Wed, 21 Feb 2018 05:57:26 +0000 (00:57 -0500)
committer Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Wed, 27 Jun 2018 20:47:04 +0000 (16:47 -0400)
commit a8cad46163850205f0dc2607e361a63b4fab3f7a
tree 765450beb7ee9a738d4d11cf6e7896708ceba169
parent c1591e37fbac83d8d12519b3c15e9ba13934b25a

There are cases where a connection error can cause streams to be
leaked.

For instance, the control connection could receive an index and
close. Since a packet is in-flight, the stream corresponding to
that index will not close. However, nothing guarantees that
the data connection will be able to receive the packet's data.

If the protocol is respected, this is not a problem. However,
a buggy consumerd or network errors can cause the streams to
remain in the "data in-flight" state and never close.

To mitigate a case observed in the field where a consumerd
would be forcibly closed (network interface brought down) and
cause leaks on the relay daemon, the session is aborted whenever
the control or data connection encounters an error. Aborting
a session closes its streams even if data is still in-flight.
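
The difference between a graceful close and an abort can be sketched
as follows. This is a simplified model under stated assumptions, not
the relayd code: struct toy_stream, struct toy_session,
toy_stream_try_close() and toy_session_abort() are hypothetical names
introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the relayd objects. */
struct toy_stream {
	bool data_in_flight;	/* an index was received, its data was not */
	bool closed;		/* stream resources were released */
};

struct toy_session {
	struct toy_stream *streams;
	size_t stream_count;
	bool aborted;
};

/* Graceful close: refuses to release a stream that still expects data. */
bool toy_stream_try_close(struct toy_stream *stream)
{
	assert(stream);
	if (stream->data_in_flight) {
		return false;
	}
	stream->closed = true;
	return true;
}

/* Abort: releases every stream, even those with data in flight. */
void toy_session_abort(struct toy_session *session)
{
	size_t i;

	assert(session);
	session->aborted = true;
	for (i = 0; i < session->stream_count; i++) {
		session->streams[i].data_in_flight = false;
		session->streams[i].closed = true;
	}
}
```

In this model, a stream with pending data survives
toy_stream_try_close() but not toy_session_abort(), which is the
property the mitigation relies on.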

Currently, only the control connection holds ownership of
the session object. This can cause the following scenario to leak
streams:

1) Control connection receives an index
  - Stream is put in "in-flight data" mode
2) Control connection is closed/shutdown cleanly
  - try_stream_close refuses to close the stream as data is in-flight,
    but it puts the stream in "closed" mode. When the data is
    received, the stream will be closed as soon as possible.
3) Data connection closes cleanly or due to an error
  - The stream "closing" condition will never be re-evaluated.
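
The three steps above can be replayed on a toy state machine. The
sketch below is an assumption-laden model: struct flight_stream, its
fields and the flight_*() helpers are hypothetical, and
flight_try_stream_close() is only a simplified stand-in for the
try_stream_close mentioned in step 2.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stream state; field names are illustrative only. */
struct flight_stream {
	bool data_in_flight;	/* step 1: index seen, data still pending */
	bool close_requested;	/* step 2: stream put in "closed" mode */
	bool released;		/* stream was actually torn down */
};

/* Step 1: the control connection receives an index. */
void flight_on_index(struct flight_stream *stream)
{
	assert(stream);
	stream->data_in_flight = true;
}

/* Step 2: simplified stand-in for try_stream_close. */
void flight_try_stream_close(struct flight_stream *stream)
{
	assert(stream);
	stream->close_requested = true;
	if (!stream->data_in_flight) {
		stream->released = true;
	}
}

/* Data arrival is the only event that re-evaluates the close condition. */
void flight_on_data(struct flight_stream *stream)
{
	assert(stream);
	stream->data_in_flight = false;
	if (stream->close_requested) {
		stream->released = true;
	}
}

/*
 * Step 3, before this patch: the data connection owns no session, so
 * its closing cannot reach the stream and nothing is re-evaluated.
 */
void flight_on_data_connection_closed(struct flight_stream *stream)
{
	(void) stream;
}
```

Calling flight_on_index(), flight_try_stream_close() and then
flight_on_data_connection_closed() leaves the stream with
close_requested set but never released, which is the leak. Replacing
the last call with flight_on_data() releases it.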

Since the data connection has no ownership of the session, it can
never clean up the streams that are waiting for "in-flight" data to
arrive before closing.

This patch lazily associates the data connection to its session
so that the session can be aborted whenever an error happens on
either the data or control connection.
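
A minimal sketch of the lazy association, under stated assumptions:
struct lazy_session, struct lazy_conn and the lazy_conn_*() helpers
are hypothetical, and the binding is assumed to happen the first time
a data packet is matched to a stream of the session. The actual
relayd code may structure this differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified objects; not the relayd definitions. */
struct lazy_session {
	int refcount;		/* number of owners of the session */
	bool aborted;		/* once set, streams are force-closed */
};

struct lazy_conn {
	struct lazy_session *session;	/* NULL until first associated */
};

/*
 * Assumed to be called once a data packet has been matched to a
 * stream, and therefore to a session. Only the first call takes a
 * reference; later calls are no-ops.
 */
void lazy_conn_bind(struct lazy_conn *conn, struct lazy_session *session)
{
	assert(conn && session);
	if (!conn->session) {
		session->refcount++;
		conn->session = session;
	}
}

/* Error path shared by the control and data connections. */
void lazy_conn_on_error(struct lazy_conn *conn)
{
	assert(conn);
	if (conn->session) {
		conn->session->aborted = true;
	}
}

/* Drops the connection's reference, if it ever took one. */
void lazy_conn_release(struct lazy_conn *conn)
{
	assert(conn);
	if (conn->session) {
		conn->session->refcount--;
		conn->session = NULL;
	}
}
```

In this model, a connection that was never bound cannot abort
anything, since lazy_conn_on_error() has no session to act on.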

Note that this still leaves the relayd vulnerable to one leak.
If the control connection receives an index and closes cleanly
before the data connection was ever established with the consumer
daemon, no connection is left to abort the session and the streams
are leaked.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
18 files changed:
src/bin/lttng-relayd/Makefile.am
src/bin/lttng-relayd/cmd-2-1.c
src/bin/lttng-relayd/cmd-2-1.h
src/bin/lttng-relayd/cmd-2-2.c
src/bin/lttng-relayd/cmd-2-2.h
src/bin/lttng-relayd/cmd-2-4.c
src/bin/lttng-relayd/cmd-2-4.h
src/bin/lttng-relayd/cmd-generic.c [deleted file]
src/bin/lttng-relayd/cmd-generic.h [deleted file]
src/bin/lttng-relayd/cmd.h
src/bin/lttng-relayd/connection.c
src/bin/lttng-relayd/connection.h
src/bin/lttng-relayd/main.c
src/bin/lttng-relayd/session.c
src/bin/lttng-relayd/stream.c
src/common/defaults.h
src/common/sessiond-comm/inet.c
src/common/sessiond-comm/inet6.c