lttng-tools.git
5 years agoRelayd: disallow-clear option parsing clear-base
Jonathan Rajotte [Mon, 11 Feb 2019 17:53:08 +0000 (12:53 -0500)] 
Relayd: disallow-clear option parsing

Parse LTTNG_RELAYD_DISALLOW_CLEAR env variable if present.
LTTNG_RELAYD_DISALLOW_CLEAR takes priority over the command line argument.
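
A minimal sketch of that priority rule, not the code from this commit. The
helper name `resolve_disallow_clear` and the accepted values ("0" and "1") are
assumptions made for illustration; only the variable name comes from the
message above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: `env_value` is the result of
 * getenv("LTTNG_RELAYD_DISALLOW_CLEAR") (NULL when unset) and `cli_value`
 * is the setting parsed from the command line.
 * Returns 0 on success, -1 when the environment value is not understood.
 */
static int resolve_disallow_clear(const char *env_value, bool cli_value,
		bool *disallow_clear)
{
	assert(disallow_clear);

	if (!env_value) {
		/* Variable absent: the command line setting applies. */
		*disallow_clear = cli_value;
		return 0;
	}

	/* Variable present: it overrides the command line setting. */
	if (strcmp(env_value, "1") == 0) {
		*disallow_clear = true;
	} else if (strcmp(env_value, "0") == 0) {
		*disallow_clear = false;
	} else {
		/* Assumed policy: reject anything other than "0" or "1". */
		return -1;
	}
	return 0;
}
```

A caller would pass `getenv("LTTNG_RELAYD_DISALLOW_CLEAR")` as the first
argument once the command line has been parsed.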

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoUpdate lttng_clear_session relevant error code return
Jonathan Rajotte [Mon, 11 Feb 2019 18:49:57 +0000 (13:49 -0500)] 
Update lttng_clear_session relevant error code return

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoUst-consumer: Add channel key to error msg on channel clear
Jonathan Rajotte [Mon, 11 Feb 2019 18:46:46 +0000 (13:46 -0500)] 
Ust-consumer: Add channel key to error msg on channel clear

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoDoc: lttng clear man page
Jonathan Rajotte [Mon, 11 Feb 2019 17:19:58 +0000 (12:19 -0500)] 
Doc: lttng clear man page

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoTest: mi for clear command
Jonathan Rajotte [Mon, 11 Feb 2019 16:45:27 +0000 (11:45 -0500)] 
Test: mi for clear command

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoFix: kernel metadata is generated on the fly for snapshot sessions
Jonathan Rajotte [Fri, 8 Feb 2019 02:30:11 +0000 (21:30 -0500)] 
Fix: kernel metadata is generated on the fly for snapshot sessions

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoTest: clear: take an additional snapshot after clear for per-pid
Jonathan Rajotte [Thu, 14 Feb 2019 19:33:51 +0000 (14:33 -0500)] 
Test: clear: take an additional snapshot after clear for per-pid

Use the before-exit sync points of gen-ust-event to prevent the app from
exiting and generate a single event to test that tracing functionality still
works.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoTest: lttng clear command for snapshot session
Jonathan Rajotte [Thu, 14 Feb 2019 02:33:35 +0000 (21:33 -0500)] 
Test: lttng clear command for snapshot session

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoGen-ust-events: add touch and wait sync points before exit.
Jonathan Rajotte [Thu, 14 Feb 2019 02:22:06 +0000 (21:22 -0500)] 
Gen-ust-events: add touch and wait sync points before exit.

Allow an app to signal (touch) that it is about to exit, then linger
until the wait file is created.

This is mostly useful for per-pid tracing where trace buffers are
cleaned on application teardown.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoGen-ust-events: add sync point before last event
Jonathan Rajotte [Thu, 14 Feb 2019 02:40:54 +0000 (21:40 -0500)] 
Gen-ust-events: add sync point before last event

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoGen-ust-events: use options instead of arguments
Jonathan Rajotte [Thu, 14 Feb 2019 02:37:56 +0000 (21:37 -0500)] 
Gen-ust-events: use options instead of arguments

Remove argument dependency and ease usage of feature individually.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoCLI: Implement lttng clear session command
Jonathan Rajotte [Mon, 11 Feb 2019 16:26:05 +0000 (11:26 -0500)] 
CLI: Implement lttng clear session command

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoLttng-ctl: Expose sessiond cmd_clear_session command
Jonathan Rajotte [Mon, 11 Feb 2019 16:24:38 +0000 (11:24 -0500)] 
Lttng-ctl: Expose sessiond cmd_clear_session command

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoSessiond: Implement cmd_clear_session
Jonathan Rajotte [Mon, 11 Feb 2019 16:18:22 +0000 (11:18 -0500)] 
Sessiond: Implement cmd_clear_session

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoConsumer: implement LTTNG_CONSUMER_CLEAR_CHANNEL
Jonathan Rajotte [Mon, 11 Feb 2019 15:24:35 +0000 (10:24 -0500)] 
Consumer: implement LTTNG_CONSUMER_CLEAR_CHANNEL

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoConsumer: Implement lttng_consumer_clear_channel
Jonathan Rajotte [Fri, 8 Feb 2019 22:05:10 +0000 (17:05 -0500)] 
Consumer: Implement lttng_consumer_clear_channel

This function is responsible for performing all actions needed to
clear a given channel.

It only supports the clear operation on unmonitored channels
(snapshot mode) for now.

To do so, flush and clear all the channel streams.

We use an active flush (consumer_flush_buffer(..., 1)) since we consider
the producer active at all times. There is no reason so far to check for the
quiescent state of the channel. This might need to be revisited.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoUst consumer: Expose userspace clear buffer operation
Jonathan Rajotte [Tue, 12 Feb 2019 15:57:41 +0000 (10:57 -0500)] 
Ust consumer: Expose userspace clear buffer operation

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoKernel-ctl: Expose kernel clear buffer operation
Jonathan Rajotte [Tue, 12 Feb 2019 15:57:18 +0000 (10:57 -0500)] 
Kernel-ctl: Expose kernel clear buffer operation

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoRefactor: rename lttng_consumer_rotate_channel to lttng_consumer_rotate_sample_channel
Jonathan Rajotte [Tue, 5 Feb 2019 21:58:11 +0000 (16:58 -0500)] 
Refactor: rename lttng_consumer_rotate_channel to lttng_consumer_rotate_sample_channel

lttng_consumer_rotate_channel does not perform a rotation; it samples
the channel. Rename it to reflect this.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoRefactor: lttng_ustctl_flush_buffer is a duplicate of lttng_ustconsumer_flush_buffer
Jonathan Rajotte [Tue, 5 Feb 2019 21:55:12 +0000 (16:55 -0500)] 
Refactor: lttng_ustctl_flush_buffer is a duplicate of lttng_ustconsumer_flush_buffer

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
5 years agoPrevent channel buffer allocation larger than memory
Francis Deslauriers [Sat, 17 Nov 2018 03:51:06 +0000 (22:51 -0500)] 
Prevent channel buffer allocation larger than memory

Background
==========
Until recently (before lttng-modules commit 1f0ab1e) it was possible to
trigger an Out-Of-Memory crash by creating a kernel channel buffer
larger than the currently usable memory on the system. The following
commands triggered the issue on my laptop:
  lttng create
  lttng enable-channel -k --subbuf-size=100G --num-subbuf=1 chan0

The lttng-modules commit 1f0ab1e adds a verification based on an
estimate to prevent this from happening. Since this kernel tracer sanity
check is based on an estimate, it would be safer to do a similar check on
the session daemon side.

Approach
========
Verify that there is enough memory available on the system to do all the
allocations needed to enable the channel. If the available memory is
insufficient for the buffer allocation, return an error to the user
without trying to allocate the buffers.

Use the `/proc/meminfo` procfile to get an estimate of the current size
of available memory (using `MemAvailable`). The `MemAvailable` field was
added in the Linux kernel 3.14, so if it's absent, fall back to verifying
that the requested buffer is smaller than the physical memory on the
system.
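
As an illustration only, reading a field such as `MemAvailable` could look
like the sketch below. The helper name `meminfo_field_bytes` is hypothetical,
and it parses a buffer holding the contents of `/proc/meminfo` rather than
opening the file itself, so it can be exercised on any system.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: find `field` (e.g. "MemAvailable") in the text of
 * /proc/meminfo and convert its value, reported in kB, to bytes.
 * Returns 0 on success, -1 if the field is absent or malformed.
 */
static int meminfo_field_bytes(const char *meminfo, const char *field,
		uint64_t *bytes)
{
	const char *line = meminfo;
	size_t field_len;

	assert(meminfo && field && bytes);
	field_len = strlen(field);

	while (line && *line) {
		if (strncmp(line, field, field_len) == 0 &&
				line[field_len] == ':') {
			unsigned long long kib;

			if (sscanf(line + field_len + 1, "%llu", &kib) != 1) {
				return -1;
			}
			*bytes = (uint64_t) kib * 1024;
			return 0;
		}
		/* Move to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line) {
			line++;
		}
	}
	return -1;	/* Field absent, e.g. on kernels older than 3.14. */
}
```

A return value of -1 for `MemAvailable` is the cue to use the fallback check.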

Compute the size of the requested buffers using the following equation:
  requested_memory = number_subbuffer * size_subbuffer * number_cpu
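
A sketch of that computation, under the assumption that all quantities are
held in 64-bit unsigned integers. The function names are hypothetical. The
multiplications are checked for overflow so that an absurdly large request is
rejected instead of wrapping around to a small value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Store a * b in *result; return false if the product overflows 64 bits. */
static bool mul_u64(uint64_t a, uint64_t b, uint64_t *result)
{
	assert(result);

	if (a != 0 && b > UINT64_MAX / a) {
		return false;
	}
	*result = a * b;
	return true;
}

/* Return true if the requested buffers fit in `available_memory` bytes. */
static bool channel_buffers_fit(uint64_t number_subbuffer,
		uint64_t size_subbuffer, uint64_t number_cpu,
		uint64_t available_memory)
{
	uint64_t requested_memory;

	if (!mul_u64(number_subbuffer, size_subbuffer, &requested_memory) ||
			!mul_u64(requested_memory, number_cpu,
				&requested_memory)) {
		/* More than 2^64 - 1 bytes: cannot possibly fit. */
		return false;
	}
	return requested_memory <= available_memory;
}
```

With one 100G sub-buffer per CPU on an 8-CPU machine, the request is 800 GiB
and is refused when only 16 GiB are available.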

The following error is returned to the command line user:
  lttng enable-channel -k --subbuf-size=100G --num-subbuf=1 chan0
  Error: Channel chan0: Not enough memory (session auto-20181121-161146)

Side effect
===========
This patch has the interesting side effect of alerting the user with an
error that buffer allocation has failed because of insufficient memory,
for both --kernel and --userspace channel creation.

Drawback
========
The fallback check on older kernels is imperfect and is only meant to prevent
obvious user errors.

Note
====
In the future, there might be a need for a way to deactivate this check
(by using an environment variable) if a case arises where
`/proc/meminfo` doesn't accurately reflect the state of memory for a
particular use case.

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
5 years agoFix: destroy called twice on quit pipe
Jérémie Galarneau [Fri, 14 Dec 2018 20:36:20 +0000 (15:36 -0500)] 
Fix: destroy called twice on quit pipe

A consumer management thread can be launched successfully and yet
still report an error encountered during its initialization. If
such an error occurs, the cleanup function is invoked explicitly
in the error path and will be called again when the last reference
to the thread is released.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoUse uuid_to_str() when formatting metadata
Jérémie Galarneau [Wed, 12 Dec 2018 20:11:29 +0000 (15:11 -0500)] 
Use uuid_to_str() when formatting metadata

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoAdd an internal uuid formatting utility
Jérémie Galarneau [Wed, 12 Dec 2018 20:10:36 +0000 (15:10 -0500)] 
Add an internal uuid formatting utility

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoRemove duplicate check for dlopen
Michael Jeanson [Thu, 20 Dec 2018 21:16:47 +0000 (16:16 -0500)] 
Remove duplicate check for dlopen

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoTests: take multiple snapshots in streaming mode
Jonathan Rajotte [Fri, 8 Feb 2019 01:25:41 +0000 (20:25 -0500)] 
Tests: take multiple snapshots in streaming mode

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: don't destroy the sockets if the snapshot was successful
Jonathan Rajotte [Fri, 8 Feb 2019 01:25:42 +0000 (20:25 -0500)] 
Fix: don't destroy the sockets if the snapshot was successful

A goto was missing to skip the error path, which destroyed the
relayd sockets even if a snapshot was successful. We want to keep them
open to reuse them for the next snapshots.

This is verbatim from the fix 1371fc1228461eb532118280e67ab3e9de015757

It is also the same fix.

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: run-as thread deadlock on itself in restart error path
Jonathan Rajotte [Wed, 16 Jan 2019 18:38:57 +0000 (13:38 -0500)] 
Fix: run-as thread deadlock on itself in restart error path

The deadlock was found using this backtrace:

Thread 5:
0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
1  0x00007efc6b650023 in __GI___pthread_mutex_lock (mutex=mutex@entry=0x55fc37128400 <worker_lock>) at ../nptl/pthread_mutex_lock.c:78
2  0x000055fc36efbe05 in run_as_destroy_worker () at runas.c:1233
3  0x000055fc36efc2e7 in run_as_restart_worker (worker=<optimized out>) at runas.c:998
4  run_as (cmd=cmd@entry=RUN_AS_UNLINK, data=data@entry=0x7efc5b7fa630, ret_value=ret_value@entry=0x7efc5b7fa510, uid=uid@entry=1000, gid=gid@entry=1000) at runas.c:1033
5  0x000055fc36efc9ce in run_as_unlink (path=path@entry=0x7efc5b7fb690 "/home/joraj/lttng-traces/auto-20190116-111518/20190116T111729-0500-33/kernel/index/channel0_3.idx", uid=uid@entry=1000, gid=gid@entry=1000) at runas.c:1120
6  0x000055fc36ef7feb in utils_unlink_stream_file (path_name=path_name@entry=0x7efc5b7fc7e0 "/home/joraj/lttng-traces/auto-20190116-111518/20190116T111729-0500-33/kernel/index", file_name=file_name@entry=0x7efc500085d4 "channel0_3", size=size@entry=0, count=count@entry=0, uid=uid@entry=1000, gid=gid@entry=1000, suffix=0x55fc36f19b26 ".idx") at utils.c:929
7  0x000055fc36f01d4e in lttng_index_file_create (path_name=path_name@entry=0x7efc500087a0 "/home/joraj/lttng-traces/auto-20190116-111518/20190116T111729-0500-33/kernel", stream_name=stream_name@entry=0x7efc500085d4 "channel0_3", uid=1000, gid=1000, size=0, count=0, major=1, minor=1) at index.c:79
8  0x000055fc36ed9475 in rotate_local_stream (ctx=<optimized out>, stream=0x7efc50008460) at consumer.c:4105
9  0x000055fc36ed98b5 in lttng_consumer_rotate_stream (ctx=ctx@entry=0x55fc37428d80, stream=stream@entry=0x7efc50008460, rotated=rotated@entry=0x7efc5b7fdb27) at consumer.c:4181
10 0x000055fc36ee354e in lttng_kconsumer_read_subbuffer (stream=stream@entry=0x7efc50008460, ctx=ctx@entry=0x55fc37428d80, rotated=rotated@entry=0x7efc5b7fdb27) at kernel-consumer.c:1740
11 0x000055fc36ed7a30 in lttng_consumer_read_subbuffer (stream=0x7efc50008460, ctx=0x55fc37428d80) at consumer.c:3383
12 0x000055fc36ed4b74 in consumer_thread_data_poll (data=0x55fc37428d80) at consumer.c:2751
13 0x00007efc6b64d6db in start_thread (arg=0x7efc5b7fe700) at pthread_create.c:463
14 0x00007efc6af6488f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The owner of the lock is itself:
  print worker_lock.__data.__owner
    $2 = 25725
  thread find 25725
    Thread 5 has target id 'Thread 0x7efc5b7fe700 (LWP 25725)'

The worker_lock is first taken in frame #4: run_as runas.c:1033

  pthread_mutex_lock(&worker_lock);
  if (use_clone()) {
...
    /*
     * If the worker thread crashed the errno is set to EIO. we log
     * the error and  start a new worker process.
     */
    if (ret == -1 && saved_errno == EIO) {
        DBG("Socket closed unexpectedly... "
        "Restarting the worker process");
->      ret = run_as_restart_worker(global_worker);
        if (ret == -1) {
          ERR("Failed to restart worker process.");
          goto err;
        }

Solution
========

Create run_as_restart_worker_no_lock which does not take the lock on
execution.
Use run_as_restart_worker_no_lock at the run_as error path call site.
Use run_as_restart_worker_no_lock inside run_as_restart_worker while
holding the worker lock to provide identical behaviour to other call sites.
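
The shape of that solution can be sketched as follows. The function and lock
names come from the message above; the worker structure, the function bodies
and `run_as_sketch` are placeholders, not the actual runas.c code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for the worker state; the real structure is not shown here. */
struct worker_sketch {
	int restart_count;
};

static pthread_mutex_t worker_lock = PTHREAD_MUTEX_INITIALIZER;
static struct worker_sketch global_worker;

/* Must be called with worker_lock held. */
static int run_as_restart_worker_no_lock(struct worker_sketch *worker)
{
	assert(worker);
	worker->restart_count++;	/* Placeholder for destroy + create. */
	return 0;
}

/* For call sites that do not already hold worker_lock. */
static int run_as_restart_worker(struct worker_sketch *worker)
{
	int ret;

	pthread_mutex_lock(&worker_lock);
	ret = run_as_restart_worker_no_lock(worker);
	pthread_mutex_unlock(&worker_lock);
	return ret;
}

/* Error path of run_as(): worker_lock is already held at this point. */
static int run_as_sketch(bool worker_crashed)
{
	int ret = 0;

	pthread_mutex_lock(&worker_lock);
	if (worker_crashed) {
		/* Calling run_as_restart_worker() here would self-deadlock. */
		ret = run_as_restart_worker_no_lock(&global_worker);
	}
	pthread_mutex_unlock(&worker_lock);
	return ret;
}
```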

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: session list lock must be held on session put operation
Jonathan Rajotte [Wed, 19 Dec 2018 18:47:23 +0000 (13:47 -0500)] 
Fix: session list lock must be held on session put operation

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoSupport minute and hour as time suffixes
Jonathan Rajotte [Fri, 14 Dec 2018 21:32:12 +0000 (16:32 -0500)] 
Support minute and hour as time suffixes

utils_parse_time_suffix now supports the following suffixes:

    "us" for microsecond,
    "ms" for millisecond,
    "s"  for second,
    "m"  for minute,
    "h"  for hour

This removes the use of "m" for milliseconds and "u" for microseconds.
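
A minimal sketch of a parser accepting these suffixes, not the actual
utils_parse_time_suffix implementation. The name `parse_time_us`, the
microsecond result unit and the treatment of a bare number as microseconds
are assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Parse "<number><suffix>" into microseconds. Returns 0 or -1. */
static int parse_time_us(const char *str, uint64_t *time_us)
{
	static const struct {
		const char *suffix;
		uint64_t multiplier_us;
	} units[] = {
		{ "", 1 },	/* Assumption: no suffix means microseconds. */
		{ "us", 1 },
		{ "ms", 1000 },
		{ "s", 1000000 },
		{ "m", UINT64_C(60) * 1000000 },
		{ "h", UINT64_C(3600) * 1000000 },
	};
	char *end;
	unsigned long long value;
	size_t i;

	assert(str && time_us);

	/* strtoull() would accept a sign or leading spaces; refuse them. */
	if (*str < '0' || *str > '9') {
		return -1;
	}
	errno = 0;
	value = strtoull(str, &end, 10);
	if (errno != 0) {
		return -1;
	}

	for (i = 0; i < sizeof(units) / sizeof(units[0]); i++) {
		if (strcmp(end, units[i].suffix) != 0) {
			continue;
		}
		if (value > UINT64_MAX / units[i].multiplier_us) {
			return -1;	/* Would overflow 64 bits. */
		}
		*time_us = (uint64_t) value * units[i].multiplier_us;
		return 0;
	}
	return -1;	/* Unknown suffix, e.g. the removed "u". */
}
```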

Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoTest fix: passing bool argument to va_start is undefined
Jérémie Galarneau [Sat, 2 Feb 2019 13:09:55 +0000 (08:09 -0500)] 
Test fix: passing bool argument to va_start is undefined

clang warns that "passing an object that undergoes default argument
promotion to 'va_start' has undefined behaviour [-Wvarargs]".

Since va_start's last argument has no known type, the boolean argument
is promoted to 'int', which is not guaranteed to have the same size
as 'bool'.
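
One way to avoid the warning is to declare the last named parameter with a
type that is unchanged by default argument promotion, such as `int`. The
function below is a hypothetical example of that pattern, not the test code
touched by this commit.

```c
#include <assert.h>
#include <stdarg.h>

/*
 * `negate` is the last named parameter, so it is the one handed to
 * va_start(). Declaring it as `int` rather than `bool` keeps its declared
 * type identical to its promoted type.
 */
static long sum_values(int count, int negate, ...)
{
	va_list args;
	long sum = 0;
	int i;

	assert(count >= 0);

	va_start(args, negate);
	for (i = 0; i < count; i++) {
		sum += va_arg(args, int);
	}
	va_end(args);

	return negate ? -sum : sum;
}
```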

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: missing rcu read locking in trigger "unregister all" command
Jérémie Galarneau [Wed, 23 Jan 2019 20:29:14 +0000 (15:29 -0500)] 
Fix: missing rcu read locking in trigger "unregister all" command

While the notification subsystem runs entirely within a single thread,
the iteration over the triggers hash table must be protected using
the RCU read-side lock since the RCU worker may resize the hash
table while the iteration is performed.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: create_kernel_session asserts on failure
Jérémie Galarneau [Fri, 18 Jan 2019 17:40:47 +0000 (12:40 -0500)] 
Fix: create_kernel_session asserts on failure

create_kernel_session() will call trace_kernel_destroy_session()
on failure to create a kernel session (e.g. modules failed to load).

This can be reproduced by enabling kernel events on a session after
the session daemon has failed to load the LTTng kernel modules.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: only free trace_path when it is dynamically allocated
Jérémie Galarneau [Mon, 14 Jan 2019 22:13:32 +0000 (17:13 -0500)] 
Fix: only free trace_path when it is dynamically allocated

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: wrong error check on kernel session creation
Jérémie Galarneau [Mon, 14 Jan 2019 22:09:42 +0000 (17:09 -0500)] 
Fix: wrong error check on kernel session creation

create_kernel_session() returns a positive lttng error code
on error and returns LTTNG_OK on success.
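
To illustrate why a `ret < 0` test misses such failures, here is a sketch
using a stand-in enumeration. The `SKETCH_*` names and their numeric values
are invented for the example and are not the values from the LTTng headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in codes: success and errors are all positive. */
enum sketch_error_code {
	SKETCH_OK = 10,
	SKETCH_ERR_KERN_SESS_FAIL = 11,
};

/* Placeholder for a function following the same return convention. */
static enum sketch_error_code create_kernel_session_sketch(bool modules_loaded)
{
	return modules_loaded ? SKETCH_OK : SKETCH_ERR_KERN_SESS_FAIL;
}

/* Returns 0 on success, -1 on failure. */
static int enable_kernel_tracing_sketch(bool modules_loaded)
{
	const int ret = create_kernel_session_sketch(modules_loaded);

	/* Both success and error codes are positive. */
	assert(ret > 0);

	/* A `ret < 0` test would never fire; compare against success. */
	if (ret != SKETCH_OK) {
		return -1;
	}
	return 0;
}
```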

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: don't put() thread on shutdown failure
Jérémie Galarneau [Mon, 14 Jan 2019 21:53:38 +0000 (16:53 -0500)] 
Fix: don't put() thread on shutdown failure

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: dereference on NULL pointer on allocation failure
Jérémie Galarneau [Mon, 14 Jan 2019 21:36:21 +0000 (16:36 -0500)] 
Fix: dereference on NULL pointer on allocation failure

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: leak of filter bytecode and expression on agent event re-enable
Jérémie Galarneau [Sat, 12 Jan 2019 19:53:56 +0000 (14:53 -0500)] 
Fix: leak of filter bytecode and expression on agent event re-enable

The agent subsystem does not properly assume the clean-up of an
event's filter bytecode and expression when a previously disabled
event is re-enabled.

This change ensures that the ownership of both the filter bytecode
and expression is assumed by the agent subsystem and discarded
when a matching event is found.

Steps to reproduce the leak:
$ lttng create
$ lttng enable-event --python allo --filter 'a[42] == 241'
$ lttng disable-event --python allo
$ lttng enable-event --python allo --filter 'a[42] == 241'

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoTest fix: python logging test spams its output
Jérémie Galarneau [Sat, 12 Jan 2019 19:21:24 +0000 (14:21 -0500)] 
Test fix: python logging test spams its output

A set -x/+x pair was erroneously committed as part of the
test_python_logging test script which causes the test to be
unnecessarily verbose.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: leak of lttng-consumerd global HTs in run-as worker
Jérémie Galarneau [Sat, 12 Jan 2019 19:17:58 +0000 (14:17 -0500)] 
Fix: leak of lttng-consumerd global HTs in run-as worker

All resources allocated by the consumerd before the launch
of the run-as worker process are leaked since the run-as process
is only fork()'ed (the original process image is preserved).

Moving the launch of the worker earlier in the initialization
of the consumerd works around this problem.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: leak of sessiond configuration on launch of run-as worker
Jérémie Galarneau [Fri, 11 Jan 2019 20:49:44 +0000 (15:49 -0500)] 
Fix: leak of sessiond configuration on launch of run-as worker

The run-as worker is spawned through fork() without using
exec*(). This means that any resource allocated by the session
daemon before the launch of the run-as worker will be leaked in
the run-as worker's process.

A callback is added to the run_as launch interface to allow users
a chance to clean-up after the fork occurs. This mechanism is
fragile as it may not always be easy (or possible) to track all
such resources in the future. This makes a strong argument for using a
new process image (through exec*()) and forgo any such problem at
some point.
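
A rough sketch of such a post-fork callback, assuming a hypothetical
`spawn_worker` function and callback type. It is not the run_as launch
interface itself, whose signature is not shown in this message.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical callback type: invoked in the child right after fork(). */
typedef int (*post_fork_cleanup_cb)(void *user_data);

/* Example callback: release a heap allocation inherited from the parent. */
static int free_inherited_buffer(void *user_data)
{
	free(user_data);
	return 0;
}

/*
 * Fork a worker and let it drop inherited resources before doing its job.
 * Returns the child's pid in the parent, -1 on fork() failure.
 */
static pid_t spawn_worker(post_fork_cleanup_cb cleanup, void *user_data)
{
	pid_t pid;

	assert(cleanup);

	pid = fork();
	if (pid == 0) {
		/* Child: same process image, parent allocations still exist. */
		const int ret = cleanup(user_data);

		/* A real worker would enter its command loop here. */
		_exit(ret == 0 ? EXIT_SUCCESS : EXIT_FAILURE);
	}
	return pid;
}
```

The parent keeps its own copy of the resource; only the child's inherited
copy is released by the callback.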

The lttng-consumerd suffers from a similar (and more severe) problem with its
own run-as worker. A fix addressing the consumerd's problem follows.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: leak of rundir config string
Jérémie Galarneau [Fri, 11 Jan 2019 20:10:08 +0000 (15:10 -0500)] 
Fix: leak of rundir config string

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: only synchronize application configuration on tracing start
Jérémie Galarneau [Thu, 10 Jan 2019 18:48:28 +0000 (13:48 -0500)] 
Fix: only synchronize application configuration on tracing start

The UST configuration of applications is currently replicated as it is
changed from the ltt_ust_{session, channel, event} data structures to
their ust_app_* equivalent as they are modified.

While this worked correctly for the most part, it caused a problem in
per-PID mode since the buffers would get allocated
(and files created, in applicable tracing modes) even though tracing
was never started during some applications' lifetime.

A previous fix attempt, 0498a00cb, addressed this problem but
introduced a regression that caused configurations to become
mismatched between the sessiond and applications in cases where a
tracing session was started, stopped, modified, and started again
within the lifetime of a given application.

This change introduces an explicit "synchronize" set of operations
that ensures that a session's channels and events configurations, as
known by the application(s), match those of the session daemon
whenever a session is started.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: run_command_wait() handle partial write
Mathieu Desnoyers [Thu, 13 Dec 2018 18:56:35 +0000 (13:56 -0500)] 
Fix: run_command_wait() handle partial write

Use lttng_write() to handle partial writes (writing less than the
requested amount of bytes) as well as ret = -1, errno = EINTR.
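
For readers unfamiliar with the pattern, a generic write loop of this kind is
sketched here. `write_all` is a hypothetical name and its error convention
(-1 on any failure) is an assumption; it is not the lttng_write() source.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Write exactly `count` bytes unless an error other than EINTR occurs. */
static ssize_t write_all(int fd, const void *buf, size_t count)
{
	size_t done = 0;

	assert(buf);

	while (done < count) {
		const ssize_t ret = write(fd, (const char *) buf + done,
				count - done);

		if (ret < 0) {
			if (errno == EINTR) {
				continue;	/* Interrupted: retry. */
			}
			return -1;
		}
		/* Partial write: advance and write the remainder. */
		done += (size_t) ret;
	}
	return (ssize_t) done;
}
```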

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: do not repurpose iterator while it is being used
Mathieu Desnoyers [Wed, 12 Dec 2018 22:24:11 +0000 (17:24 -0500)] 
Fix: do not repurpose iterator while it is being used

The hash table iteration uses an iterator that needs to stay valid for
the next loop. Using that same iterator variable in a nested lookup
in a different hash table leads to segmentation fault.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: handle_notification_thread_command: handle partial read
Mathieu Desnoyers [Wed, 12 Dec 2018 20:11:15 +0000 (15:11 -0500)] 
Fix: handle_notification_thread_command: handle partial read

Use lttng_read() to handle partial reads (returning less than the
requested amount of bytes) as well as ret = -1, errno == EINTR.
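
The read side has one extra case compared with writing: a return value of 0
means end of file, so the loop must stop even if fewer bytes than requested
were received. A sketch, with the hypothetical name `read_all` (this is not
the lttng_read() source):

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `count` bytes, stopping early only on EOF or a real error. */
static ssize_t read_all(int fd, void *buf, size_t count)
{
	size_t done = 0;

	assert(buf);

	while (done < count) {
		const ssize_t ret = read(fd, (char *) buf + done,
				count - done);

		if (ret < 0) {
			if (errno == EINTR) {
				continue;	/* Interrupted: retry. */
			}
			return -1;
		}
		if (ret == 0) {
			break;	/* EOF: return what was read so far. */
		}
		done += (size_t) ret;
	}
	return (ssize_t) done;
}
```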

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: notification thread: free session trigger list on error
Mathieu Desnoyers [Wed, 12 Dec 2018 20:11:14 +0000 (15:11 -0500)] 
Fix: notification thread: free session trigger list on error

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: notification thread: RCU-safe reclaim of hash table nodes
Mathieu Desnoyers [Wed, 12 Dec 2018 17:16:44 +0000 (12:16 -0500)] 
Fix: notification thread: RCU-safe reclaim of hash table nodes

Users of an rculfhash hash table created with the
"auto resize" flag must be aware that a worker thread can access the
hash table nodes as an RCU reader concurrently, and that this worker
thread can modify the hash table content, effectively adding and
removing "bucket" nodes, and changing the size of the hash table
index.

Therefore, even though only a single thread reads and updates the hash
table, a grace period is needed before reclaiming the memory holding
the rculfhash nodes.

Moreover, handle_notification_thread_command_add_channel() is missing an
RCU read-side lock around iteration on the triggers hash table. Failure
to hold this read-side lock could cause segmentation faults when
accessing hash table objects if a hash table resize is done by the
worker thread in parallel with iteration over the hash table.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: error logged on partial recvmsg() in MSG_DONTWAIT
Jérémie Galarneau [Tue, 18 Dec 2018 19:01:08 +0000 (14:01 -0500)] 
Fix: error logged on partial recvmsg() in MSG_DONTWAIT

The relay daemon logs a "Resource temporarily unavailable" error
message when lttcomm_recvmsg_inet_sock() is invoked and
no data is left to be consumed from the lttcomm_sock.

The "recvmsg" socket operation is called in a loop by the relay
daemon to consume the data being received in 64k chunks. If, on
one of those iterations, 0 bytes are available, recvmsg() will
return an error (-1, errno = EAGAIN). This should not be
logged in non-blocking mode.
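
The intended behaviour can be sketched as below. `recv_nonblocking` is a
hypothetical helper written for this example, not the lttcomm code; it only
shows EAGAIN/EWOULDBLOCK being treated as "no data yet" rather than as an
error worth logging.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Receive without blocking; stay silent when no data is available. */
static ssize_t recv_nonblocking(int sock, void *buf, size_t len)
{
	struct iovec iov = { .iov_base = buf, .iov_len = len };
	struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
	ssize_t ret;

	assert(buf);

	do {
		ret = recvmsg(sock, &msg, MSG_DONTWAIT);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
		/* Only genuine failures are reported. */
		perror("recvmsg");
	}
	return ret;
}
```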

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoPrint UTF-8 SI suffix only when allowed by the locale
Jérémie Galarneau [Fri, 14 Dec 2018 02:24:35 +0000 (21:24 -0500)] 
Print UTF-8 SI suffix only when allowed by the locale

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoCleanup: duplicate LDADD of libcommon for utils unit tests
Jérémie Galarneau [Thu, 13 Dec 2018 22:30:56 +0000 (17:30 -0500)] 
Cleanup: duplicate LDADD of libcommon for utils unit tests

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoMove time utils to their own time.c file
Jérémie Galarneau [Thu, 13 Dec 2018 22:06:29 +0000 (17:06 -0500)] 
Move time utils to their own time.c file

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: sessiond: don't allocate buffers and files for inactive sessions
Mathieu Desnoyers [Mon, 19 Nov 2018 21:13:58 +0000 (16:13 -0500)] 
Fix: sessiond: don't allocate buffers and files for inactive sessions

When tracing is inactive (before start/after stop), the current behavior
is to track all applications registered as UST data producers and
allocate buffers and files.

However, we guarantee that the trace is readable (invariant) after a
"stop" command has waited for data pending to complete. Unfortunately,
tracking additional applications (and adding their files) after tracing
is stopped (for each pid in per-pid buffers, for new uid in per-uid
buffers) does not respect this guarantee.

Fix this by *not* allocating channels, events, contexts when tracing
is inactive, but rather allocate those lazily just before tracing
starts.

One reason why this was not originally done was to ensure we could
have a fast start command. There are however other ways to achieve this
in the future that will respect the stop invariant guarantees.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoCleanup: ust start/stop trace
Mathieu Desnoyers [Mon, 19 Nov 2018 21:12:17 +0000 (16:12 -0500)] 
Cleanup: ust start/stop trace

Move setting/clearing the session "active" state from cmd.c to
cmd_start_trace()/cmd_stop_trace() for better encapsulation of
behavior.

Reduces the amount of code to maintain in the catch-all cmd.c.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: relayd: rotation pending off-by-one
Mathieu Desnoyers [Mon, 19 Nov 2018 21:09:28 +0000 (16:09 -0500)] 
Fix: relayd: rotation pending off-by-one

We need to compare with <= rather than < in the rotation pending
check on the relay daemon side, similarly to the check done in
the consumer daemon check_stream_rotation_pending().

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: tests: test_crash should start sessions
Mathieu Desnoyers [Fri, 16 Nov 2018 18:25:00 +0000 (13:25 -0500)] 
Fix: tests: test_crash should start sessions

test_crash expects side-effects of directory creation to happen while
tracing is still stopped. In preparation for changing that behavior,
ensure that tracing is started when those side-effects are expected.

Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: missing session reference release on kernel poll update
Jérémie Galarneau [Wed, 12 Dec 2018 03:42:19 +0000 (22:42 -0500)] 
Fix: missing session reference release on kernel poll update

The iteration performed on all sessions in update_kernel_poll() does
not release the reference taken on the sessions. This causes the
session(s) to be leaked and prevents the shutdown of the session
daemon as it waits for all sessions to be destroyed before completing
its teardown.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: set client socket permissions after launch of client thread
Jérémie Galarneau [Wed, 12 Dec 2018 03:27:20 +0000 (22:27 -0500)] 
Fix: set client socket permissions after launch of client thread

The client thread is now the owner of the client socket.  As the
client socket is now created by the client thread, the socket's
permissions must be set after the launch of the client thread.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: use assignment-suppression for unused sscanf arguments
Jérémie Galarneau [Wed, 12 Dec 2018 02:26:42 +0000 (21:26 -0500)] 
Fix: use assignment-suppression for unused sscanf arguments

This removes the conversion of elements parsed by sscanf() which
are not used anyhow and eliminates a warning on x86 builds
(%lu used on size_t).
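
As a reminder of how assignment suppression works, the `*` flag makes
sscanf() consume a field without storing it, so no argument (and no matching
type) is needed for it. The line format and the function name below are
invented for the example.

```c
#include <assert.h>
#include <stdio.h>

/* Extract only the third field of a hypothetical "name size count" line. */
static int parse_count(const char *line, unsigned int *count)
{
	assert(line && count);

	/* %*s and %*u consume their field without storing it anywhere. */
	return sscanf(line, "%*s %*u %u", count) == 1 ? 0 : -1;
}
```

Suppressed conversions are not counted in the return value, hence the
comparison with 1.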

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: report initialization error of app registration thread
Jérémie Galarneau [Tue, 11 Dec 2018 21:54:20 +0000 (16:54 -0500)] 
Fix: report initialization error of app registration thread

The health check tests use the testpoints() in the application
registration thread to force a pthread_exit() or simulate a
catastrophic error within the thread.

The testpoints were moved before the signal that the thread's
initialization was completed by recent changes. This caused the thread
to fail to complete its initialization, causing a deadlock of the
session daemon on launch.

This commit reports initialization errors through the
launch_application_registration_thread() function to the "main" thread
and shuts down the session daemon. It also moves the testpoints after
the thread's initialization to respect the test's intent.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: report initialization error of client thread
Jérémie Galarneau [Tue, 11 Dec 2018 21:54:01 +0000 (16:54 -0500)] 
Fix: report initialization error of client thread

The health check tests use the testpoints() in the client thread
to force a pthread_exit() or simulate a catastrophic error within
the client thread.

The testpoints were moved before the signal that the thread's
initialization was completed by recent changes. This caused the
thread to fail to complete its initialization, causing a deadlock
of the session daemon.

This commit reports initialization errors through the
launch_client_thread() function to the "main" thread and shuts down
the session daemon. It also moves the testpoints after the thread's
initialization to respect the test's intent.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoCleanup: consumer socket creation debug msg always prints fd:-1
Francis Deslauriers [Wed, 28 Nov 2018 23:04:06 +0000 (18:04 -0500)] 
Cleanup: consumer socket creation debug msg always prints fd:-1

Set the fd to -1 _after_ we print the debug message.

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoCleanup: remove unused label
Francis Deslauriers [Wed, 21 Nov 2018 00:16:18 +0000 (19:16 -0500)] 
Cleanup: remove unused label

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: may be used uninitialized warnings
Francis Deslauriers [Wed, 21 Nov 2018 00:15:46 +0000 (19:15 -0500)] 
Fix: may be used uninitialized warnings

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agouserspace-probe: tests: add testcase for unsupported instrumentation
Francis Deslauriers [Mon, 26 Nov 2018 16:41:51 +0000 (11:41 -0500)] 
userspace-probe: tests: add testcase for unsupported instrumentation

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agouserspace-probe: Print error on binary not found
Francis Deslauriers [Sat, 24 Nov 2018 00:20:45 +0000 (19:20 -0500)] 
userspace-probe: Print error on binary not found

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agouserspace-probe: Print error for unsupported instrumentation mode
Francis Deslauriers [Sat, 24 Nov 2018 00:03:21 +0000 (19:03 -0500)] 
userspace-probe: Print error for unsupported instrumentation mode

This patch adds an error message printed when the user tries to place a
userspace-probe using unsupported probe location descriptions.
Hopefully, this will help users understand why their command failed.

Here are examples of unsupported uses of an address location:
* "elf:/path/to/binary:0x400430"
* "elf:/path/to/binary:4194364"

Here are examples of unsupported uses of offset from symbol location:
* "elf:/path/to/binary:my_symbol+0x323"
* "elf:/path/to/binary:my_symbol+43"
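
The distinction between these forms can be sketched as a small classifier. This is only an illustration under assumptions: the enum, the classify_elf_location() function and its parsing rules are hypothetical and are not the parser used by the lttng client.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical classification of the last field of "elf:PATH:TARGET". */
enum location_kind {
	LOC_SYMBOL,        /* Plain symbol name. */
	LOC_ADDRESS,       /* Numeric address (unsupported form). */
	LOC_SYMBOL_OFFSET, /* symbol+offset (unsupported form). */
	LOC_INVALID,       /* Not an "elf:PATH:TARGET" description. */
};

static enum location_kind classify_elf_location(const char *desc)
{
	const char *target;
	char *end;

	assert(desc);
	if (strncmp(desc, "elf:", 4) != 0) {
		return LOC_INVALID;
	}
	/* The target is whatever follows the last ':' separator. */
	target = strrchr(desc, ':');
	if (target == desc + 3 || target[1] == '\0') {
		return LOC_INVALID;
	}
	target++;
	if (strchr(target, '+')) {
		return LOC_SYMBOL_OFFSET;
	}
	/* Base 0 accepts both decimal and 0x-prefixed hexadecimal. */
	(void) strtoull(target, &end, 0);
	if (end != target && *end == '\0') {
		return LOC_ADDRESS;
	}
	return LOC_SYMBOL;
}
```

With such a classification, a client could print a dedicated message for LOC_ADDRESS and LOC_SYMBOL_OFFSET instead of failing with a generic error.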

I expect users to try using the address locations and offset from symbol
locations for ELF instrumentation location because those methods are
available with the --probe option used to instrument the kernel.

Supporting those location descriptions in the future would require
the validation that the address or the offset from a symbol is indeed at
the boundary of an instruction.

Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: use sys/types.h for ssize_t on Cygwin
Michael Jeanson [Tue, 27 Nov 2018 19:24:01 +0000 (14:24 -0500)] 
Fix: use sys/types.h for ssize_t on Cygwin

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoAdd *.exe to gitignore for Cygwin
Michael Jeanson [Tue, 27 Nov 2018 19:24:27 +0000 (14:24 -0500)] 
Add *.exe to gitignore for Cygwin

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoRevert stubbing of runas functions
Michael Jeanson [Thu, 29 Nov 2018 21:22:07 +0000 (16:22 -0500)] 
Revert stubbing of runas functions

All the runas functions were stubbed on builds where the sessiond isn't
built which is the case for all platforms except Linux. This was done
because of 2 new commands that require elf.h which is not present on
MacOSX. However the other commands can be used by the relayd.

Revert this and only stub the relevant commands when "elf.h" is not
present on the system.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoRevert stubbing of unix socket functions
Michael Jeanson [Thu, 29 Nov 2018 21:22:06 +0000 (16:22 -0500)] 
Revert stubbing of unix socket functions

Instead of stubbing useful UNIX socket functions to work around
Linux-only credential passing, ifdef the relevant parts like it was
already done for other functions.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: warning 'fd' may be used uninitialized
Michael Jeanson [Thu, 29 Nov 2018 21:49:51 +0000 (16:49 -0500)] 
Fix: warning 'fd' may be used uninitialized

Initialize fd to invalid '-1' and remove unnecessary file_opened.

Signed-off-by: Michael Jeanson <mjeanson@efficios.com>
Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: worker structure is leaked in run_as process
Jérémie Galarneau [Thu, 6 Dec 2018 20:49:04 +0000 (15:49 -0500)] 
Fix: worker structure is leaked in run_as process

The run_as structure (handle) is allocated and initialized before
the fork() that spawns the run_as process. Currently, that structure
is only cleaned-up on the parent's end.

This fix performs the clean-up on the worker's side as well.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: ensure the ht clean pipe is empty before processing quit pipe
Jérémie Galarneau [Thu, 6 Dec 2018 20:38:14 +0000 (15:38 -0500)] 
Fix: ensure the ht clean pipe is empty before processing quit pipe

The ht-cleanup thread does not ensure that all data pending on its
ht_clean_pipe has been read (and processed) before processing the
"quit" event on the quit_pipe.

This causes a number of urcu hash tables to be leaked on exit of
the sessiond even though they have been pushed on the cleanup
pipe.
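
One way to obtain this ordering is sketched below, assuming a worker that polls a work pipe and a quit pipe. The function names and the one-byte work items are assumptions made for the example; the actual thread exchanges hash table pointers and its code may differ.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Consumes every item currently pending on work_fd without blocking. */
static int drain_pending(int work_fd)
{
	int processed = 0;

	for (;;) {
		struct pollfd pfd = { .fd = work_fd, .events = POLLIN };
		char item;

		if (poll(&pfd, 1, 0) <= 0 || !(pfd.revents & POLLIN)) {
			break;
		}
		if (read(work_fd, &item, 1) != 1) {
			break;
		}
		processed++; /* A real worker would destroy a hash table here. */
	}
	return processed;
}

/*
 * Processes work items until quit_fd becomes readable. Returns the number
 * of items processed, or -1 on a poll() error.
 */
static int run_until_quit(int work_fd, int quit_fd)
{
	int total = 0;

	assert(work_fd >= 0 && quit_fd >= 0);
	for (;;) {
		struct pollfd fds[2] = {
			{ .fd = work_fd, .events = POLLIN },
			{ .fd = quit_fd, .events = POLLIN },
		};

		if (poll(fds, 2, -1) < 0) {
			if (errno == EINTR) {
				continue;
			}
			return -1;
		}
		/* Pending work is always serviced before the quit event. */
		total += drain_pending(work_fd);
		if (fds[1].revents & (POLLIN | POLLHUP)) {
			/* Catch items queued while the first drain was running. */
			total += drain_pending(work_fd);
			return total;
		}
	}
}
```

The second drain before returning covers items pushed between the first drain and the inspection of the quit event.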

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoPerform the clean-up of application notify sockets in main thread
Jérémie Galarneau [Thu, 6 Dec 2018 19:47:34 +0000 (14:47 -0500)] 
Perform the clean-up of application notify sockets in main thread

The notify sockets of applications are owned by the "notify-apps"
thread. If an application exits during the course of the session
daemon's life, the notify thread will take care of cleaning its
associated notify socket.

However, there is no teardown/clean-up code to handle their
clean-up when the daemon is torn down.

This change adds a new step in the sessiond clean-up code that
closes all notify sockets present in the ust_app_by_notify_sock
hash table at the time of the sessiond cleanup.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoClean-up: remove redundant exit labels from sessiond initialization
Jérémie Galarneau [Thu, 6 Dec 2018 19:22:20 +0000 (14:22 -0500)] 
Clean-up: remove redundant exit labels from sessiond initialization

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoMake the launch of the application registration thread blocking
Jérémie Galarneau [Thu, 6 Dec 2018 17:03:48 +0000 (12:03 -0500)] 
Make the launch of the application registration thread blocking

Waiting for the application registration thread to be ready to
accept application connections ensures that the application
registration socket is created and listened to before the parent
is signalled.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoStop the application registration thread before orphaned threads
Jérémie Galarneau [Thu, 6 Dec 2018 16:09:32 +0000 (11:09 -0500)] 
Stop the application registration thread before orphaned threads

The application registration thread receives new connections from
applications and provides them to the dispatch thread. The dispatch
thread, in turn, forwards the command and notification sockets of
applications (liblttng-ust) to the application management and
application notification threads.

Not shutting down the application registration thread is problematic
since application connections will be accepted but not associated
to an "ust_app" structure as the following threads are no longer
present.

The remaining threads can then be safely torn down as part of the
orphaned threads.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoRename ust-thread to notify-apps
Jérémie Galarneau [Thu, 6 Dec 2018 14:39:02 +0000 (09:39 -0500)] 
Rename ust-thread to notify-apps

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoTeardown the notification thread after the sessiond clean-up
Jérémie Galarneau [Wed, 5 Dec 2018 20:00:09 +0000 (15:00 -0500)] 
Teardown the notification thread after the sessiond clean-up

The notification thread may receive commands issued through the
call_rcu thread during the destruction of some of the sessiond's
data structures.

This change tears down the notification thread after the clean-up
has occurred and after the issuance of an RCU barrier. This ensures
that all previously-queued call_rcu work has been performed and that
any ensuing notification thread commands have been queued in return.

It is safe, at that point, to queue a "quit" command in the
notification thread's command queue. The notification thread's
shutdown method will issue the command and wait for its completion
before returning.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch the consumer management thread using lttng_thread
Jérémie Galarneau [Wed, 5 Dec 2018 04:19:25 +0000 (23:19 -0500)] 
Launch the consumer management thread using lttng_thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoStop sessions before destroying on teardown of session daemon
Jérémie Galarneau [Tue, 4 Dec 2018 22:46:00 +0000 (17:46 -0500)] 
Stop sessions before destroying on teardown of session daemon

Stopping sessions ensures that trace data is no longer produced
for a session, which allows pending rotations to complete.

It also ensures that no data is produced beyond the last rename
of a rotated session's output folder.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoRemove the sessiond "ready" counter mechanism
Jérémie Galarneau [Tue, 4 Dec 2018 16:50:30 +0000 (11:50 -0500)] 
Remove the sessiond "ready" counter mechanism

This commit replaces the sessiond "ready" counter scheme with
the use of the lttng_thread util. The launch of the threads which
need to be active before the sessiond can signal its parents
(when launched in daemon mode) is now blocking. This means that
their associated "launch" functions wait until the threads mark
themselves as ready (through the use of a "ready" semaphore)
before returning and allowing the initialization of the sessiond
to continue.
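
The blocking launch described above can be sketched with a POSIX semaphore. This is a simplified illustration and not the lttng_thread code: struct worker_state, worker_main(), launch_worker() and the simulate_failure knob are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>

/* Hypothetical state shared by the launcher and the worker thread. */
struct worker_state {
	sem_t ready;           /* Posted once initialization is over. */
	bool init_failed;      /* Written by the worker before posting. */
	bool simulate_failure; /* Stands in for a real initialization error. */
};

static void *worker_main(void *data)
{
	struct worker_state *state = data;

	/* Creating and listening on sockets would happen here. */
	state->init_failed = state->simulate_failure;
	sem_post(&state->ready);
	/* On success, the thread's main loop would run here. */
	return NULL;
}

/*
 * Launches the worker and blocks until it reports the outcome of its
 * initialization. Returns 0 when the worker is ready and -1 otherwise;
 * on failure, the worker has already been joined.
 */
static int launch_worker(struct worker_state *state, pthread_t *thread)
{
	assert(state && thread);
	state->init_failed = false;
	if (sem_init(&state->ready, 0, 0) != 0) {
		return -1;
	}
	if (pthread_create(thread, NULL, worker_main, state) != 0) {
		sem_destroy(&state->ready);
		return -1;
	}
	/* Retry when the wait is interrupted by a signal. */
	while (sem_wait(&state->ready) != 0 && errno == EINTR) {
	}
	if (state->init_failed) {
		pthread_join(*thread, NULL);
		sem_destroy(&state->ready);
		return -1;
	}
	return 0;
}
```

Because the worker posts the semaphore on both the success and the error path, the launcher cannot deadlock on a failed initialization and can report the error to its caller.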

The threads which expose externally-visible resources (UNIX and
TCP sockets) which must be fully initialized before marking the
session daemon as ready are:
  - Health thread,
  - Agent thread,
  - Client thread.

Previously, the "load session" thread was part of this group.
However, it is no longer necessary to perform the loading of
session configurations in a dedicated thread. The main thread
performs that operation itself. It is safe to do so since it
is performed after the launch of the client thread. The client
thread has to be fully initialized as the session loading code
"impersonates" a client to initialize the loaded sessions.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLoad session configurations from lttng-sessiond's main thread
Jérémie Galarneau [Mon, 3 Dec 2018 00:04:45 +0000 (19:04 -0500)] 
Load session configurations from lttng-sessiond's main thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch the kernel management thread using lttng_thread
Jérémie Galarneau [Sun, 2 Dec 2018 22:06:45 +0000 (17:06 -0500)] 
Launch the kernel management thread using lttng_thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch agent management thread using lttng_thread
Jérémie Galarneau [Sun, 2 Dec 2018 21:20:16 +0000 (16:20 -0500)] 
Launch agent management thread using lttng_thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoMark lttng_pipe as const where possible
Jérémie Galarneau [Sat, 1 Dec 2018 01:04:08 +0000 (20:04 -0500)] 
Mark lttng_pipe as const where possible

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch the application notification thread using lttng_thread
Jérémie Galarneau [Tue, 4 Dec 2018 22:37:45 +0000 (17:37 -0500)] 
Launch the application notification thread using lttng_thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch the application management thread with lttng_thread
Jérémie Galarneau [Sat, 1 Dec 2018 00:25:06 +0000 (19:25 -0500)] 
Launch the application management thread with lttng_thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch application registration thread using lttng_thread
Jérémie Galarneau [Fri, 30 Nov 2018 21:33:38 +0000 (16:33 -0500)] 
Launch application registration thread using lttng_thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch the ust registration dispatch thread using lttng_thread
Jérémie Galarneau [Fri, 30 Nov 2018 20:57:04 +0000 (15:57 -0500)] 
Launch the ust registration dispatch thread using lttng_thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch the client management thread using lttng_thread
Jérémie Galarneau [Fri, 30 Nov 2018 19:29:32 +0000 (14:29 -0500)] 
Launch the client management thread using lttng_thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch the timer thread using lttng_thread
Jérémie Galarneau [Fri, 30 Nov 2018 17:12:12 +0000 (12:12 -0500)] 
Launch the timer thread using lttng_thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch the rotation thread using lttng_thread
Jérémie Galarneau [Wed, 28 Nov 2018 21:53:53 +0000 (16:53 -0500)] 
Launch the rotation thread using lttng_thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoFix: flush the rotation thread's job queue on exit
Jérémie Galarneau [Wed, 28 Nov 2018 21:12:25 +0000 (16:12 -0500)] 
Fix: flush the rotation thread's job queue on exit

The rotation thread's job queue can transitively hold references
to a number of ltt_session objects, which prevents the session
daemon from exiting as it waits for all sessions to have completed
their destruction.

This fix ensures that the job queue is flushed when activity
is observed on the quit pipe.

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoStop rotation pending check timer from the rotation thread
Jérémie Galarneau [Tue, 27 Nov 2018 23:17:16 +0000 (18:17 -0500)] 
Stop rotation pending check timer from the rotation thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch the notification thread using lttng_thread
Jérémie Galarneau [Tue, 27 Nov 2018 21:37:05 +0000 (16:37 -0500)] 
Launch the notification thread using lttng_thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch the health management thread using lttng_thread
Jérémie Galarneau [Tue, 27 Nov 2018 17:39:06 +0000 (12:39 -0500)] 
Launch the health management thread using lttng_thread

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoLaunch the ht-cleanup thread using lttng_thread util
Jérémie Galarneau [Mon, 26 Nov 2018 20:15:24 +0000 (15:15 -0500)] 
Launch the ht-cleanup thread using lttng_thread util

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
5 years agoAdd a thread utility class and thread list
Jérémie Galarneau [Mon, 26 Nov 2018 20:09:10 +0000 (15:09 -0500)] 
Add a thread utility class and thread list

As part of the re-work of the order of the teardown of the sessiond's
threads, this utility is introduced to track running threads and unify
the mechanisms through which they are launched and shut down.

This makes it easier to reason about (and track) the order in which
threads are launched and shut down.

The lttng_thread class allows threads to be implemented by
defining the following methods:
  - An entry point ("main" function),
  - A shutdown method,
  - A clean-up method.

Since the sessiond's threads use a variety of techniques to initiate
their teardown (through an explicit command in a queue, using the
global "quit" pipe, or through a futex + variable), it is more
practical to let them define a shutdown method which notifies the
thread to shut down than to impose a standard mechanism.

A clean-up method is meant to work around situations where the
ownership of data structures shared between the main thread and
a worker thread can be ambiguous (mostly in error paths).
The clean-up method is invoked when the last reference to a thread
is released.

While some threads need to be shut down in a particular order, most of
them can be shut down in bulk. The lttng_thread utility maintains a
global thread list which allows for a generic path through which
threads can be shut down using the lttng_thread_list_shutdown_orphans()
function.

The lttng_thread_shutdown() method, in turn, allows the user
(most likely the main thread) to explicitly tear down threads which
must be shut down in a specific order before issuing the bulk
lttng_thread_list_shutdown_orphans() call.

Note that lttng_thread objects are reference counted. The thread
list holds a reference to each thread until it is shut down. Hence,
it is safe to hold a reference to a thread, invoke its shutdown
method, and then invoke lttng_thread_list_shutdown_orphans().
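
The design described in this message can be approximated by the sketch below. Every name in it (the sketch_thread and demo identifiers) is hypothetical and the code is not the lttng_thread implementation. It also assumes that only the main thread calls the shutdown functions.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdbool.h>
#include <stdlib.h>

struct sketch_thread {
	int refcount;               /* Protected by list_lock. */
	pthread_t handle;
	void (*shutdown)(void *);   /* Notifies the thread that it must stop. */
	void (*cleanup)(void *);    /* Runs when the last reference is released. */
	void *data;
	struct sketch_thread *next; /* Link in the global thread list. */
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sketch_thread *thread_list;

/* Releases one reference; the last release runs the clean-up method. */
static void sketch_thread_put(struct sketch_thread *t)
{
	bool last;

	pthread_mutex_lock(&list_lock);
	last = (--t->refcount == 0);
	pthread_mutex_unlock(&list_lock);
	if (last) {
		t->cleanup(t->data);
		free(t);
	}
}

/* Starts entry(data) and returns a reference owned by the caller. */
static struct sketch_thread *sketch_thread_create(void *(*entry)(void *),
		void (*shutdown)(void *), void (*cleanup)(void *), void *data)
{
	struct sketch_thread *t = calloc(1, sizeof(*t));

	assert(entry && shutdown && cleanup);
	if (!t) {
		return NULL;
	}
	t->refcount = 2; /* One for the caller, one for the list. */
	t->shutdown = shutdown;
	t->cleanup = cleanup;
	t->data = data;
	if (pthread_create(&t->handle, NULL, entry, data) != 0) {
		free(t);
		return NULL;
	}
	pthread_mutex_lock(&list_lock);
	t->next = thread_list;
	thread_list = t;
	pthread_mutex_unlock(&list_lock);
	return t;
}

/* Stops and joins one thread, then drops the list's reference to it. */
static bool sketch_thread_shutdown(struct sketch_thread *t)
{
	struct sketch_thread **it;
	bool found = false;

	pthread_mutex_lock(&list_lock);
	for (it = &thread_list; *it && *it != t; it = &(*it)->next)
		;
	if (*it) {
		*it = t->next;
		found = true;
	}
	pthread_mutex_unlock(&list_lock);
	if (!found) {
		return false; /* Already shut down. */
	}
	t->shutdown(t->data);
	pthread_join(t->handle, NULL);
	sketch_thread_put(t);
	return true;
}

/* Shuts down every thread still present in the list. */
static void sketch_thread_list_shutdown_orphans(void)
{
	while (thread_list) {
		sketch_thread_shutdown(thread_list);
	}
}

/* Example worker: sleeps on a semaphore until asked to quit. */
struct demo { sem_t quit; int cleaned; };
static void *demo_main(void *d) { sem_wait(&((struct demo *) d)->quit); return NULL; }
static void demo_shutdown(void *d) { sem_post(&((struct demo *) d)->quit); }
static void demo_cleanup(void *d) { ((struct demo *) d)->cleaned++; }
```

A caller that needs a specific order invokes sketch_thread_shutdown() on the threads it cares about, then sketch_thread_list_shutdown_orphans() for the rest, and finally releases its own references with sketch_thread_put(), which is when each clean-up method runs.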

Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>