From 55954e07e828c0ec1c059a50996252a358f7dd23 Mon Sep 17 00:00:00 2001 From: =?utf8?q?J=C3=A9r=C3=A9mie=20Galarneau?= Date: Wed, 17 Jun 2020 12:59:24 -0400 Subject: [PATCH] Fix: consumerd: user space metadata not regenerated MIME-Version: 1.0 Content-Type: text/plain; charset=utf8 Content-Transfer-Encoding: 8bit Observed Issue ============== The LTTng-IVC tests fail on the `regenerate metadata` tests which essentially: - Setups a user space session - Enables events - Traces an application - Stops tracing - Validates the trace - Truncates the metadata file (empties it) - Starts tracing - Regenerates the metadata - Stops the session - Validates the trace The last trace validation step fails on an empty file (locally) or a garbled file (remote). The in-tree tests did no catch any of this since they essentially don't test much. They verify that the command works (returns 0) but do not validate any of its effects. The issue was bisected down to: commit 6f9449c22eef59294cf1e1dc3610a5cbf14baec0 (HEAD) Author: Jérémie Galarneau Date: Sun May 10 18:00:26 2020 -0400 consumerd: refactor: split read_subbuf into sub-operations [...] Cause ===== The commit that introduced the issue refactored the sub-buffer consumption loop to eliminate code duplications between the user space and kernel consumer daemons. In doing so, it eleminated a metadata version check from the consumption path. The consumption of a metadata sub-buffer follows those relevant high-level steps: - `get` the sub-buffer - /!\ user space specific /!\ - if the `get` fails, attempt to flush the metadata cache's contents to the ring-buffer - populate `stream_subbuffer` properties (size, version, etc.) - check the sub-buffer's version against the last known metadata version (pre-consume step) - if they don't match, a metadata regeneration occurred: reset the metadata consumed position - consume (actual write/send) - `put` sub-buffer [...] As shown above, the user space consumer must manage the flushing of the metadata cache explicitly as opposed to the kernel domain for which the tracer performs the flushing implicitly through the `get` operation. When the user space consumer encounters a `get` failure, it checks if all the metadata cache was flushed (consumed position != cache size), and flushes any remaining contents. However, the metadata version could have changed and yielded an identical cache size: a regeneration without any new metadata will yield the same cache size. Since 6f9449c22, the metadata version check is only performed after a successful `get`. This means that after a regeneration, `get` never succeeds (there is seemingly nothing to consume), and the metadata version check is never performed. Therefore, the metadata stream is never put in the `reset` mode, effectively not regenerating the data. Note that producing new metadata (e.g. a newly registering app announcing new events) would work around the problem here. Solution ======== Add a metadata version check when failing to `get` a metadata sub-buffer. This is done in `commit_one_metadata_packet()` when the cache size is seen to be equal to the consumed position. When this occurs, `consumer_stream_metadata_set_version()`, a new consumer stream method, is invoked which sets the new metadata version, sets the `reset` flag, and discards any previously bucketized metadata. The metadata cache's consumed position is also reset, allowing the cache flush to take place. `metadata_stream_reset_cache()` is renamed to `metadata_stream_reset_cache_consumed_position()` since its name is misleading and since it is used as part of the fix. Know drawbacks ============== None. Change-Id: I3b933c8293f409f860074bd49bebd8d1248b6341 Signed-off-by: Jérémie Galarneau Reported-by: Jonathan Rajotte --- src/common/consumer/consumer-stream.c | 20 +++++++---- src/common/consumer/consumer-stream.h | 10 ++++++ src/common/ust-consumer/ust-consumer.c | 48 +++++++++++++++++++++----- 3 files changed, 63 insertions(+), 15 deletions(-) diff --git a/src/common/consumer/consumer-stream.c b/src/common/consumer/consumer-stream.c index deebb58fe..b262b54d5 100644 --- a/src/common/consumer/consumer-stream.c +++ b/src/common/consumer/consumer-stream.c @@ -393,12 +393,8 @@ int metadata_stream_check_version(struct lttng_consumer_stream *stream, } DBG("New metadata version detected"); - stream->metadata_version = subbuffer->info.metadata.version; - stream->reset_metadata_flag = 1; - - if (stream->metadata_bucket) { - metadata_bucket_reset(stream->metadata_bucket); - } + consumer_stream_metadata_set_version(stream, + subbuffer->info.metadata.version); if (stream->read_subbuffer_ops.reset_metadata) { stream->read_subbuffer_ops.reset_metadata(stream); @@ -1053,3 +1049,15 @@ int consumer_stream_enable_metadata_bucketization( end: return ret; } + +void consumer_stream_metadata_set_version( + struct lttng_consumer_stream *stream, uint64_t new_version) +{ + assert(new_version > stream->metadata_version); + stream->metadata_version = new_version; + stream->reset_metadata_flag = 1; + + if (stream->metadata_bucket) { + metadata_bucket_reset(stream->metadata_bucket); + } +} diff --git a/src/common/consumer/consumer-stream.h b/src/common/consumer/consumer-stream.h index cb1dafe39..50e402fe2 100644 --- a/src/common/consumer/consumer-stream.h +++ b/src/common/consumer/consumer-stream.h @@ -120,4 +120,14 @@ bool consumer_stream_is_deleted(struct lttng_consumer_stream *stream); int consumer_stream_enable_metadata_bucketization( struct lttng_consumer_stream *stream); +/* + * Set the version of a metadata stream (i.e. following a metadata + * regeneration). + * + * Changing the version of a metadata stream will cause any bucketized metadata + * to be discarded and will mark the metadata stream for future `reset`. + */ +void consumer_stream_metadata_set_version( + struct lttng_consumer_stream *stream, uint64_t new_version); + #endif /* LTTNG_CONSUMER_STREAM_H */ diff --git a/src/common/ust-consumer/ust-consumer.c b/src/common/ust-consumer/ust-consumer.c index 6d6690a32..6a561b4ca 100644 --- a/src/common/ust-consumer/ust-consumer.c +++ b/src/common/ust-consumer/ust-consumer.c @@ -2430,13 +2430,12 @@ int lttng_ustconsumer_close_wakeup_fd(struct lttng_consumer_stream *stream) } static -void metadata_stream_reset_cache(struct lttng_consumer_stream *stream) +void metadata_stream_reset_cache_consumed_position( + struct lttng_consumer_stream *stream) { DBG("Reset metadata cache of session %" PRIu64, stream->chan->session_id); stream->ust_metadata_pushed = 0; - stream->metadata_version = stream->chan->metadata_cache->version; - stream->reset_metadata_flag = 1; } /* @@ -2452,10 +2451,41 @@ int commit_one_metadata_packet(struct lttng_consumer_stream *stream) int ret; pthread_mutex_lock(&stream->chan->metadata_cache->lock); - if (stream->chan->metadata_cache->max_offset - == stream->ust_metadata_pushed) { - ret = 0; - goto end; + if (stream->chan->metadata_cache->max_offset == + stream->ust_metadata_pushed) { + /* + * In the context of a user space metadata channel, a + * change in version can be detected in two ways: + * 1) During the pre-consume of the `read_subbuffer` loop, + * 2) When populating the metadata ring buffer (i.e. here). + * + * This function is invoked when there is no metadata + * available in the ring-buffer. If all data was consumed + * up to the size of the metadata cache, there is no metadata + * to insert in the ring-buffer. + * + * However, the metadata version could still have changed (a + * regeneration without any new data will yield the same cache + * size). + * + * The cache's version is checked for a version change and the + * consumed position is reset if one occurred. + * + * This check is only necessary for the user space domain as + * it has to manage the cache explicitly. If this reset was not + * performed, no metadata would be consumed (and no reset would + * occur as part of the pre-consume) until the metadata size + * exceeded the cache size. + */ + if (stream->metadata_version != + stream->chan->metadata_cache->version) { + metadata_stream_reset_cache_consumed_position(stream); + consumer_stream_metadata_set_version(stream, + stream->chan->metadata_cache->version); + } else { + ret = 0; + goto end; + } } write_len = ustctl_write_one_packet_to_channel(stream->chan->uchan, @@ -2680,7 +2710,7 @@ static int extract_metadata_subbuffer_info(struct lttng_consumer_stream *stream, goto end; } - subbuf->info.metadata.version = stream->chan->metadata_cache->version; + subbuf->info.metadata.version = stream->metadata_version; end: return ret; @@ -2924,7 +2954,7 @@ static int lttng_ustconsumer_set_stream_ops( stream->read_subbuffer_ops.extract_subbuffer_info = extract_metadata_subbuffer_info; stream->read_subbuffer_ops.reset_metadata = - metadata_stream_reset_cache; + metadata_stream_reset_cache_consumed_position; if (stream->chan->is_live) { stream->read_subbuffer_ops.on_sleep = signal_metadata; ret = consumer_stream_enable_metadata_bucketization( -- 2.34.1