Fix: getgrnam is not MT-Safe, use getgrnam_r Running the test suite under a Yocto musl build resulted in musl coredump due to double freeing. We get the following backtraces: 0 a_crash () at ./arch/x86_64/atomic_arch.h:108 1 unmap_chunk (self=<optimized out>) at src/malloc/malloc.c:515 2 free (p=<optimized out>) at src/malloc/malloc.c:526 3 0x00007f46d9dc3849 in __getgrent_a (f=f@entry=0x7f46d9d1f7e0, gr=gr@entry=0x7f46d9e24460 <gr>, line=line@entry=0x7f46d9e26058 <line>, size=size@entry=0x7f46d92db550, mem=mem@entry=0x7f46d9e26050 <mem>, nmem=nmem@entry=0x7f46d92db558, res=0x7f46d92db548) at src/passwd/getgrent_a.c:45 4 0x00007f46d9dc2e6b in __getgr_a (name=0x487242 "tracing", gid=gid@entry=0, gr=gr@entry=0x7f46d9e24460 <gr>, buf=buf@entry=0x7f46d9e26058 <line>, size=size@entry=0x7f46d92db550, mem=mem@entry=0x7f46d9e26050 <mem>, nmem=0x7f46d92db558, res=0x7f46d92db548) at src/passwd/getgr_a.c:30 5 0x00007f46d9dc3733 in getgrnam (name=<optimized out>) at src/passwd/getgrent.c:37 6 0x0000000000460b29 in utils_get_group_id (name=<optimized out>) at ../../../lttng-tools-2.10.6/src/common/utils.c:1241 7 0x000000000044ee69 in thread_manage_health (data=<optimized out>) at ../../../../lttng-tools-2.10.6/src/bin/lttng-sessiond/main.c:4115 8 0x00007f46d9de1541 in start (p=<optimized out>) at src/thread/pthread_create.c:195 9 0x00007f46d9dee661 in __clone () at src/thread/x86_64/clone.s:22 From another run: 0 a_crash () at ./arch/x86_64/atomic_arch.h:108 1 unmap_chunk (self=<optimized out>) at src/malloc/malloc.c:515 2 free (p=<optimized out>) at src/malloc/malloc.c:526 3 0x00007f5abc210849 in __getgrent_a (f=f@entry=0x7f5abc2733e0, gr=gr@entry=0x7f5abc271460 <gr>, line=line@entry=0x7f5abc273058 <line>, size=size@entry=0x7f5abaef5510, mem=mem@entry=0x7f5abc273050 <mem>, nmem=nmem@entry=0x7f5abaef5518, res=0x7f5abaef5508) at src/passwd/getgrent_a.c:45 4 0x00007f5abc20fe6b in __getgr_a (name=0x487242 "tracing", gid=gid@entry=0, gr=gr@entry=0x7f5abc271460 <gr>, buf=buf@entry=0x7f5abc273058 <line>, size=size@entry=0x7f5abaef5510, mem=mem@entry=0x7f5abc273050 <mem>, nmem=0x7f5abaef5518, res=0x7f5abaef5508) at src/passwd/getgr_a.c:30 5 0x00007f5abc210733 in getgrnam (name=<optimized out>) at src/passwd/getgrent.c:37 6 0x0000000000460b29 in utils_get_group_id (name=<optimized out>) at ../../../lttng-tools-2.10.6/src/common/utils.c:1241 7 0x000000000042dee4 in notification_channel_socket_create () at ../../../../lttng-tools-2.10.6/src/bin/lttng-sessiond/notification-thread.c:238 8 init_thread_state (state=0x7f5abaef5560, handle=0x7f5abbf9be40) at ../../../../lttng-tools-2.10.6/src/bin/lttng-sessiond/notification-thread.c:375 9 thread_notification (data=0x7f5abbf9be40) at ../../../../lttng-tools-2.10.6/src/bin/lttng-sessiond/notification-thread.c:495 10 0x00007f5abc22e541 in start (p=<optimized out>) at src/thread/pthread_create.c:195 11 0x00007f5abc23b661 in __clone () at src/thread/x86_64/clone.s:22 The problem was easily reproducible (~6 crash on ~300 runs). A prototype fix using mutex around the getgrnam yielded no crash in over 1000 runs. This patch yielded the same results as the prototype fix. Unfortunately we cannot rely on a mutex in liblttng-ctl since we cannot enforce the locking for the application using the lib. Use getgrnam_r instead. The previous implementation of utils_get_group_id returned the gid of the root group (0) on error/not found. lttng_check_tracing_group needs to know if an error/not found occured, returning the root group is not enough. We now return the gid via the passed parameter. The caller is responsible for either defaulting to the root group or propagating the error. We also do not want to warn when used in liblttng-ctl context. We might want to move the warning elsewhere in the future. For now, pass a bool if we need to warn or not. Signed-off-by: Jonathan Rajotte <jonathan.rajotte-julien@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Generate session name and default output on sessiond's end The lttng client currently generates the default session name and output parameters. This has, over time, resulted in a number of problems. Notably, it is possible for scripts to create session too quickly using automatically-generated session names that would clash since the session's creation timestamp is the only variable part of a session "automatic" name. Hence, sessions created in the same second would clash and result in spurious session creation failures. More importantly, generating session names and outputs on the client end makes it impossible to reliably differentiate output locations that were automatically generated vs. those that were explicitly provided. This causes destinations to be "opaque" to the LTTng daemons as the subdir, session name, and session's creation timestamp are all "cooked" as part of the output destination path/subdir. Keeping these path components separate will make it easier to implement output path configurations that allow the grouping of session outputs by name, by host, etc. Since a session's creation time is used as part of its shm-path, an accessor to the session's creation time is added to the public API: lttng_session_get_creation_time(). This creation time attribute can be accessed when an lttng_session structure is created using the session listing API. Note that existing session creation functions are preserved to maintain the binary compatibility with existing liblttng-ctl users. The session creation functions are reimplemented on top of this newly-introduced API. The only function for which compatibility is dropped is the hidden _lttng_create_session_ext(). Overhaul of path separation --- Not generating paths on the client-end has uncovered a number of problems in the path handling of the session daemon, especially when a network output was used. A lot of code presumed that a network session would be created with a URL containing a sub-directory of the form "session_name-timestamp". While this is true for remote sessions created by the lttng client, a sub-directory is not required when liblttng-ctl is used directly. Hence, this commit ensures that session directories are split as base path, chunk directory, domain directory, application directory. A number of changes in this fix ensure that a session's base path contains everything up to the "session" path element _or_ up to the user-specified output directory. For example, creating a local session using default output settings, the session base output is: /home/user/lttng-traces/session-timestamp Creating a remote session using default output settings, the session base output path is: /hostname/session-timestamp/ Using custom output directories, whether locally or remotely, causes the session base path to be set to that custom output directory. For example, using a local output path of /tmp/my_path will result in a session base path of the form: /tmp/my_path Whereas creating a session with a network output of net://localhost/my_path will result in a session base path of the form: /hostname/my_path Another problematic element is the subdir of the kernel_session and ust_session consumer output which in different scenarios contained chunk names and arbitrary parts of the path hierarchy. The consumer output subdir has been renamed to 'domain_subdir' and now only ever contains: "kernel/", "ust/", or "". Finally, the chunk_path session attribute only contains the name of the current chunk directory being produced. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Fix: lttng_rotate_session does not handle socket close lttng_ctl_ask_sessiond may return 0 if the sessiond process is killed or if its client socket is closed unexpectedly. This causes lttng_rotate_session to assume a rotation command reply has been received, resulting in a NULL pointer dereference later on. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Fix: missing include can cause structures to not be packed A number of files declaring "packed" structures (using the LTTNG_PACKED macro) do not include common/macros.h, which defines this macro. This results in structures being used in their "unpacked" form, or under both packed and unpacked forms, depending on the other files included at the point of definition and use of these structures. It is unclear which of the users of these structures were actually affected by the bug. Most of these structures are used for IPC over a UNIX socket. In these cases, it is reasonable to assume that lttng-tools will be rebuilt completely to take this change into account. However, the structures declared in common/sessiond-comm/relayd.h are more worrying as they are part of the relay daemon's network protocol. Fortunately, adding the following directive to common/sessiond-comm/relayd.h confirms that the header is included transitively where those structures are used. > #ifndef LTTNG_PACKED > #error Not defined! > #endif Instances of this issue were found using the following script. for file in $(ag -l LTTNG_PACKED); do ag "#include \<common/macros\.h\>" -l ${file} > /dev/null if [ $? -ne 0 ]; then echo "Missing include in" $file fi done Running this script produces the following output (annotated): Missing include in include/lttng/channel-internal.h Missing include in include/lttng/condition/buffer-usage-internal.h Missing include in include/lttng/condition/session-consumed-size-internal.h Missing include in include/lttng/condition/session-rotation-internal.h Missing include in src/common/sessiond-comm/sessiond-comm.h Missing include in src/common/sessiond-comm/relayd.h Missing include in src/common/sessiond-comm/agent.h > LTTNG_PACKED mentioned in comments Missing include in src/common/optional.h > Unneeded. Missing include in src/common/macros.h > lttng-ust-abi.h defines its own version of LTTNG_PACKED > and is included by lttng-ust-ctl.h Missing include in src/bin/lttng-sessiond/lttng-ust-ctl.h Missing include in src/bin/lttng-sessiond/lttng-ust-abi.h Missing include in src/lib/lttng-ctl/filter/filter-bytecode.h > False positives (not source files) Missing include in packed.sh Missing include in ChangeLog Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Prevent channel buffer allocation larger than memory Background ========== Until recently (before lttng-modules commit 1f0ab1e) it was possible to trigger an Out-Of-Memory crash by creating a kernel channel buffer larger than the currently usable memory on the system. The following commands was triggering the issue on my laptop: lttng create lttng enable-channel -k --subbuf-size=100G --num-subbuf=1 chan0 The lttng-modules commit 1f0ab1e adds a verification based on an estimate to prevent this from happening. Since this kernel tracer sanity check is based on an estimate, it would safer to do a similar check on the session daemon side. Approach ======== Verify that there is enough memory available on the system to do all the allocations needed to enable the channel. If the available memory is insufficient for the buffer allocation, return an error to the user without trying to allocate the buffers. Use the `/proc/meminfo` procfile to get an estimate of the current size of available memory (using `MemAvailable`). The `MemAvailable` field was added in the Linux kernel 3.14, so if it's absent, fallback to verifying that the requested buffer is smaller than the physical memory on the system. Compute the size of the requested buffers using the following equation: requested_memory = number_subbuffer * size_subbuffer * number_cpu The following error is returned to the command line user: lttng enable-channel -k --subbuf-size=100G --num-subbuf=1 chan0 Error: Channel chan0: Not enough memory (session auto-20181121-161146) Side effect =========== This patch has the interesting side effect to alerting the user with an error that buffer allocation has failed because of memory availability in both --kernel and --userspace channel creation. Drawback ======== The fallback check on older kernels is imperfect and is only to prevent obvious user errors. Note ==== In the future, there might be a need for a way to deactivate this check (by using an environment variable) if a case arises where `/proc/meminfo` doesn't accurately reflect the state of memory for a particular use case. Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com>
Remove unnecessary check of output parameter It is not necessary to check for `_notification != NULL` as it is done at the beginning of the function. Moreover, it confuses Coverity which warns that `notification` will be leaked if the output parameter is NULL. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Allow get_next_notification to return when interrupted Applications (and scripts) which consume a given set of notifications indefinitely may fail to exit if a SIGTERM handler is registered. lttng_notification_channel_get_next_notification() blocks indefinitely on recvmsg() until a new notification is available. The wrapper that is used to do so automatically restarts the recvmsg() if it is interrupted, thus not allowing clients a change to cleanly exit. This change causes the notification channel to wait for a message to be available using select() before starting the actual reception of the data and return LTTNG_NOTIFICATION_CHANNEL_STATUS_INTERRUTPED if a signal occurs during the wait. If data is available, it is assumed that the message is well-formed and can promptly be received in its entirety. The goal of this change is to give a monitoring application a chance to leave the get_next_notification() function and check if it should exit. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Remove unused session current_archive_location accessor This function was replaced by lttng_rotation_handle_get_archive_location() which requires an lttng_rotation_handle to be used, making its use less error-prone. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Fix: possible null dereference on communication error lttng_ctl_ask_sessiond_fds_varlen() can return a positive error code and NULL buffers if the sessiond uses a command return code that is already negative. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Fix: use-after-free on error of lttng_event creation and copy Found by Coverity: >>> CID 1395219: Memory - illegal accesses (USE_AFTER_FREE) >>> Using freed pointer "new_event". Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Fix: userspace probe accessors are not const-correct The accessors of userspace probe locations and lookup methods should be const. This commit marks them as such and modifies the code using those functions to be, in turn, const-correct. Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>
Add lttng_event copy constructor Since the lttng_event structure now has an extended_ptr to store additional data, such as a userspace probe location, we can't simply memcpy the struct to duplicate it. This commit introduces a lttng_event_copy function that allocates a new lttng_event struct and copies the content of both the struct and its extended_ptr. Signed-off-by: Francis Deslauriers <francis.deslauriers@efficios.com> Signed-off-by: Jérémie Galarneau <jeremie.galarneau@efficios.com>