From 9d1103e6d180b0326ea55759b27ffc0391075e32 Mon Sep 17 00:00:00 2001
From: =?utf8?q?J=C3=A9r=C3=A9mie=20Galarneau?=
Date: Wed, 14 Feb 2018 16:13:51 -0500
Subject: [PATCH] Fix: consumer socket lock not held during snapshot record
MIME-Version: 1.0
Content-Type: text/plain; charset=utf8
Content-Transfer-Encoding: 8bit

This missing lock was identified while stress-testing the snapshot
tracing mode. The "post_mortem" test case would sometimes hang on a
push_metadata() call waiting for a status reply from the consumer
daemon.

This test exposed a race that occurs when an application is killed
and a snapshot is taken near-simultaneously. This causes the "apps
management" thread to issue a "push metadata" command to the
consumerd while the lttng client is issuing a snapshot record
command.

Since the snapshot record does not acquire the consumer socket lock,
the "push metadata" and "snapshot" commands end up interleaved on the
socket, which ultimately causes the "apps management" thread to wait
for a reply forever while holding the socket's lock.

This prevents the client, invoked by the test script, from completing
the "stop" operation on the session.

Signed-off-by: Jérémie Galarneau
---
 src/bin/lttng-sessiond/consumer.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/bin/lttng-sessiond/consumer.c b/src/bin/lttng-sessiond/consumer.c
index 251944606..126b01a44 100644
--- a/src/bin/lttng-sessiond/consumer.c
+++ b/src/bin/lttng-sessiond/consumer.c
@@ -1442,7 +1442,9 @@ int consumer_snapshot_channel(struct consumer_socket *socket, uint64_t key,
 	}
 
 	health_code_update();
+	pthread_mutex_lock(socket->lock);
 	ret = consumer_send_msg(socket, &msg);
+	pthread_mutex_unlock(socket->lock);
 	if (ret < 0) {
 		goto error;
 	}
-- 
2.34.1
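
Note: the locking pattern described in the commit message can be sketched
in isolation as below. This is a toy model and not lttng-tools code:
toy_socket, toy_send(), toy_recv(), toy_transaction() and run_demo() are
hypothetical names introduced for illustration only, and the "socket" is
an in-memory slot rather than a real connection to the consumer daemon.
The only point shown is that a request and its reply are treated as one
critical section, so two threads sharing the socket cannot read each
other's replies.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a command socket shared by several threads. */
struct toy_socket {
	pthread_mutex_t *lock;	/* pointer to the lock, as in socket->lock */
	uint64_t pending;	/* last request "written" to the wire */
};

/* Toy peer: the reply simply echoes the last request that was sent. */
static void toy_send(struct toy_socket *s, uint64_t id)
{
	s->pending = id;
}

static uint64_t toy_recv(struct toy_socket *s)
{
	return s->pending;
}

/* Send a request and read its reply while holding the socket lock. */
static int toy_transaction(struct toy_socket *s, uint64_t id)
{
	uint64_t reply;

	assert(s != NULL && s->lock != NULL);

	pthread_mutex_lock(s->lock);
	toy_send(s, id);
	reply = toy_recv(s);
	pthread_mutex_unlock(s->lock);

	/* A reply that belongs to another thread's request is an error. */
	return reply == id ? 0 : -1;
}

struct worker_args {
	struct toy_socket *sock;
	uint64_t id;
	int mismatches;
};

static void *worker(void *data)
{
	struct worker_args *args = data;
	int i;

	for (i = 0; i < 100000; i++) {
		if (toy_transaction(args->sock, args->id) < 0) {
			args->mismatches++;
		}
	}
	return NULL;
}

/*
 * Run two threads that compete for the same toy socket. Returns the
 * total number of mismatched replies, or -1 if a thread could not be
 * created.
 */
int run_demo(void)
{
	pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	struct toy_socket sock = { .lock = &lock, .pending = 0 };
	struct worker_args a = { .sock = &sock, .id = 1, .mismatches = 0 };
	struct worker_args b = { .sock = &sock, .id = 2, .mismatches = 0 };
	pthread_t ta, tb;

	if (pthread_create(&ta, NULL, worker, &a) != 0) {
		return -1;
	}
	if (pthread_create(&tb, NULL, worker, &b) != 0) {
		pthread_join(ta, NULL);
		return -1;
	}
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);

	return a.mismatches + b.mismatches;
}
```

Calling run_demo() from a small main() and building with -pthread should
report zero mismatches, since every send/receive pair runs under the
lock. Removing the lock/unlock pair from toy_transaction() makes the two
workers race on the shared slot, which is the toy equivalent of the
mixed-up commands described above.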