From: Jérémie Galarneau Date: Wed, 17 Jun 2020 16:59:24 +0000 (-0400) Subject: Fix: consumerd: user space metadata not regenerated X-Git-Url: http://git.efficios.com/?p=lttng-tools.git;a=commitdiff_plain;h=55954e07e828c0ec1c059a50996252a358f7dd23;hp=55954e07e828c0ec1c059a50996252a358f7dd23 Fix: consumerd: user space metadata not regenerated Observed Issue ============== The LTTng-IVC tests fail on the `regenerate metadata` tests which essentially: - Setups a user space session - Enables events - Traces an application - Stops tracing - Validates the trace - Truncates the metadata file (empties it) - Starts tracing - Regenerates the metadata - Stops the session - Validates the trace The last trace validation step fails on an empty file (locally) or a garbled file (remote). The in-tree tests did no catch any of this since they essentially don't test much. They verify that the command works (returns 0) but do not validate any of its effects. The issue was bisected down to: commit 6f9449c22eef59294cf1e1dc3610a5cbf14baec0 (HEAD) Author: Jérémie Galarneau Date: Sun May 10 18:00:26 2020 -0400 consumerd: refactor: split read_subbuf into sub-operations [...] Cause ===== The commit that introduced the issue refactored the sub-buffer consumption loop to eliminate code duplications between the user space and kernel consumer daemons. In doing so, it eleminated a metadata version check from the consumption path. The consumption of a metadata sub-buffer follows those relevant high-level steps: - `get` the sub-buffer - /!\ user space specific /!\ - if the `get` fails, attempt to flush the metadata cache's contents to the ring-buffer - populate `stream_subbuffer` properties (size, version, etc.) - check the sub-buffer's version against the last known metadata version (pre-consume step) - if they don't match, a metadata regeneration occurred: reset the metadata consumed position - consume (actual write/send) - `put` sub-buffer [...] As shown above, the user space consumer must manage the flushing of the metadata cache explicitly as opposed to the kernel domain for which the tracer performs the flushing implicitly through the `get` operation. When the user space consumer encounters a `get` failure, it checks if all the metadata cache was flushed (consumed position != cache size), and flushes any remaining contents. However, the metadata version could have changed and yielded an identical cache size: a regeneration without any new metadata will yield the same cache size. Since 6f9449c22, the metadata version check is only performed after a successful `get`. This means that after a regeneration, `get` never succeeds (there is seemingly nothing to consume), and the metadata version check is never performed. Therefore, the metadata stream is never put in the `reset` mode, effectively not regenerating the data. Note that producing new metadata (e.g. a newly registering app announcing new events) would work around the problem here. Solution ======== Add a metadata version check when failing to `get` a metadata sub-buffer. This is done in `commit_one_metadata_packet()` when the cache size is seen to be equal to the consumed position. When this occurs, `consumer_stream_metadata_set_version()`, a new consumer stream method, is invoked which sets the new metadata version, sets the `reset` flag, and discards any previously bucketized metadata. The metadata cache's consumed position is also reset, allowing the cache flush to take place. `metadata_stream_reset_cache()` is renamed to `metadata_stream_reset_cache_consumed_position()` since its name is misleading and since it is used as part of the fix. Know drawbacks ============== None. Change-Id: I3b933c8293f409f860074bd49bebd8d1248b6341 Signed-off-by: Jérémie Galarneau Reported-by: Jonathan Rajotte ---