Fix: consumer socket lock not held during snapshot record
author Jérémie Galarneau <jeremie.galarneau@efficios.com>
Wed, 14 Feb 2018 21:13:51 +0000 (16:13 -0500)
committer Jérémie Galarneau <jeremie.galarneau@efficios.com>
Thu, 15 Feb 2018 20:14:03 +0000 (15:14 -0500)
This missing lock was identified while stress-testing the
snapshot tracing mode.

The "post_mortem" test case would sometimes hang on a
push_metadata() call waiting for a status reply from the
consumer daemon.

This test triggers a race by killing an application and
taking a snapshot near-simultaneously.

This causes the app management thread to issue a "push metadata"
command to the consumerd while the lttng client is issuing
a snapshot record command.

Since the snapshot record does not acquire the consumer socket lock,
the "push metadata" and "snapshot" commands end up interleaved on
the socket, which ultimately causes the "apps management" thread
to wait forever for a reply while holding the socket's lock.

This prevents the client, invoked by the test script, from
completing the "stop" operation on the session.
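
The locking discipline described above can be illustrated with a small
standalone sketch. It is not the sessiond code: demo_socket, demo_client,
demo_peer and demo_run are hypothetical names, the peer is a simple echo
thread standing in for the consumer daemon, and it assumes that a command
and its status reply must be exchanged as one unit on the shared socket.

```c
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for a consumer socket shared by several threads. */
struct demo_socket {
	int fd;
	pthread_mutex_t *lock;
};

struct demo_client {
	struct demo_socket *sock;
	uint64_t id;		/* Value sent as the "command". */
	int rounds;
	int mismatches;		/* Replies that did not match the command. */
};

/* Peer: echoes every 8-byte command back as its reply until EOF. */
static void *demo_peer(void *arg)
{
	int fd = *(int *) arg;
	uint64_t cmd;

	while (recv(fd, &cmd, sizeof(cmd), MSG_WAITALL) == (ssize_t) sizeof(cmd)) {
		if (send(fd, &cmd, sizeof(cmd), 0) != (ssize_t) sizeof(cmd)) {
			break;
		}
	}
	return NULL;
}

/* Client: holds the socket lock across the whole command/reply exchange. */
static void *demo_client_run(void *arg)
{
	struct demo_client *c = arg;
	int i;

	for (i = 0; i < c->rounds; i++) {
		uint64_t reply = 0;

		pthread_mutex_lock(c->sock->lock);
		if (send(c->sock->fd, &c->id, sizeof(c->id), 0) != (ssize_t) sizeof(c->id) ||
				recv(c->sock->fd, &reply, sizeof(reply), MSG_WAITALL) != (ssize_t) sizeof(reply) ||
				reply != c->id) {
			c->mismatches++;
		}
		pthread_mutex_unlock(c->sock->lock);
	}
	return NULL;
}

/* Runs two clients on one socket; returns the mismatch count, -1 on setup error. */
int demo_run(int rounds)
{
	int fds[2];
	pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
	pthread_t peer, ta, tb;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) < 0) {
		return -1;
	}

	struct demo_socket sock = { fds[0], &lock };
	struct demo_client a = { &sock, 1, rounds, 0 };
	struct demo_client b = { &sock, 2, rounds, 0 };

	pthread_create(&peer, NULL, demo_peer, &fds[1]);
	pthread_create(&ta, NULL, demo_client_run, &a);
	pthread_create(&tb, NULL, demo_client_run, &b);
	pthread_join(ta, NULL);
	pthread_join(tb, NULL);

	close(fds[0]);	/* EOF makes the peer thread exit. */
	pthread_join(peer, NULL);
	close(fds[1]);
	return a.mismatches + b.mismatches;
}
```

With the lock held across each exchange, every client reads the reply to
its own command. Removing the lock/unlock pair in demo_client_run() lets
one thread consume the other's reply, which is the kind of mix-up this
commit addresses.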

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
src/bin/lttng-sessiond/consumer.c

index 7c8f4e83093590e61ac4d4e565d8d0bc9ba7b5f1..cad1587d25bb3a85569be4c605f157a155981954 100644 (file)
@@ -1406,7 +1406,9 @@ int consumer_snapshot_channel(struct consumer_socket *socket, uint64_t key,
        }
 
        health_code_update();
+       pthread_mutex_lock(socket->lock);
        ret = consumer_send_msg(socket, &msg);
+       pthread_mutex_unlock(socket->lock);
        if (ret < 0) {
                goto error;
        }