Feature proposal: Expose TSDB Stats as metrics?

Proposal

I would like to know whether a change exposing the TSDB stats as metrics served by Prometheus itself would be accepted:
https://prometheus.io/docs/prometheus/latest/querying/api/#tsdb-stats

I would like a view of my top TSDB consumers and to be able to alert when they increase.

We are thinking of implementing this, but we would first like to know whether it could be accepted into the Prometheus codebase, or whether we should go the “prometheus-tsdb-exporter” way instead.

I suspect one reason it might not be accepted into Prometheus itself is the risk of leaking label values every x hours (at block compaction) whenever the TSDB top-k entries change.
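
To make the concern concrete, here is a minimal sketch (Python, with made-up snapshot values) of how the set of exported label values can change between two reads of a top-k list. The `churned_labels` helper and both snapshots are hypothetical and not Prometheus code; each name it returns would correspond to a series that has to be dropped explicitly, otherwise it would linger.

```python
def churned_labels(previous, current):
    """Return the names present in the previous top-k list but not in the current one."""
    prev_names = {entry["name"] for entry in previous}
    curr_names = {entry["name"] for entry in current}
    return sorted(prev_names - curr_names)


# Hypothetical top-k snapshots taken before and after a compaction.
before = [{"name": "__name__", "value": 34492}, {"name": "file", "value": 2780}]
after = [{"name": "__name__", "value": 35010}, {"name": "pod_name", "value": 4410}]

print(churned_labels(before, after))  # ['file']
```

With a top-10 list the churn per change is small, but it accumulates unless stale entries are removed.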

I am thinking of exposing metrics such as these:

# Gauge
prometheus_tsdb_label_value_count{label_name="__name__"} 900
prometheus_tsdb_label_value_count{label_name="instance"} 24

# Gauge
prometheus_tsdb_memory_in_bytes{label_name="__name__"} 44331
prometheus_tsdb_memory_in_bytes{label_name="instance"} 697

etc...
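
As a rough sketch of the idea, assuming the `data` object of the TSDB stats response has already been decoded from JSON, the per-label entries could be turned into the gauges above along these lines. The `render_label_stats` and `escape_label_value` helpers are hypothetical, and the metric names are only the ones proposed in this post.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline for the text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_label_stats(data):
    """Render per-label TSDB stats as text exposition lines."""
    # Proposed metric name -> field of the TSDB stats "data" object.
    sources = {
        "prometheus_tsdb_label_value_count": "labelValueCountByLabelName",
        "prometheus_tsdb_memory_in_bytes": "memoryInBytesByLabelName",
    }
    lines = []
    for metric, field in sources.items():
        lines.append("# TYPE %s gauge" % metric)
        for entry in data.get(field, []):
            name = escape_label_value(entry["name"])
            lines.append('%s{label_name="%s"} %s' % (metric, name, entry["value"]))
    return lines


# Sample input using the values from the proposal above.
sample = {
    "labelValueCountByLabelName": [
        {"name": "__name__", "value": 900},
        {"name": "instance", "value": 24},
    ],
    "memoryInBytesByLabelName": [
        {"name": "__name__", "value": 44331},
        {"name": "instance", "value": 697},
    ],
}
rendered = render_label_stats(sample)
print("\n".join(rendered))
```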

WDYT?

1 possible answer on “Feature proposal: Expose TSDB Stats as metrics?”

  1. FYI, here is the configuration we created:

    metrics:
    - help: Head min time
      name: prometheus_status_tsdb_head_min_time_milliseconds
      path: '{ .data.headStats.minTime }'
    - help: Head max time
      name: prometheus_status_tsdb_head_max_time_milliseconds
      path: '{ .data.headStats.maxTime }'
    - help: Head series count
      name: prometheus_status_tsdb_head_series_count
      path: '{ .data.headStats.numSeries }'
    - help: Head label value pairs count
      name: prometheus_status_tsdb_head_label_value_pairs_count
      path: '{ .data.headStats.numLabelPairs }'
    - help: Head chunks count
      name: prometheus_status_tsdb_head_chunks_count
      path: '{ .data.headStats.chunkCount }'
    - help: Series count by metric (top 10)
      labels:
        metric: '{ .name }'
      name: prometheus_status_tsdb_series_by_metric
      path: '{ .data.seriesCountByMetricName[*] }'
      type: object
      values:
        count: '{ .value }'
    - help: Series count by label value pair (top 10)
      labels:
        pair: '{ .name }'
      name: prometheus_status_tsdb_series_by_label_value_pair
      path: '{ .data.seriesCountByLabelValuePair[*] }'
      type: object
      values:
        count: '{ .value }'
    - help: Label values count by label name (top 10)
      labels:
        label: '{ .name }'
      name: prometheus_status_tsdb_label_values
      path: '{ .data.labelValueCountByLabelName[*] }'
      type: object
      values:
        count: '{ .value }'
    - help: Label values size in memory in bytes (top 10)
      labels:
        label: '{ .name }'
      name: prometheus_status_tsdb_label_values_in_memory_size
      path: '{ .data.memoryInBytesByLabelName[*] }'
      type: object
      values:
        bytes: '{ .value }'

    Example of output:

    # HELP prometheus_status_tsdb_head_chunks_count Head chunks count
    # TYPE prometheus_status_tsdb_head_chunks_count untyped
    prometheus_status_tsdb_head_chunks_count 347284
    # HELP prometheus_status_tsdb_head_label_value_pairs_count Head label value pairs count
    # TYPE prometheus_status_tsdb_head_label_value_pairs_count untyped
    prometheus_status_tsdb_head_label_value_pairs_count 4512
    # HELP prometheus_status_tsdb_head_max_time_milliseconds Head max time
    # TYPE prometheus_status_tsdb_head_max_time_milliseconds untyped
    prometheus_status_tsdb_head_max_time_milliseconds 1.615802118368e+12
    # HELP prometheus_status_tsdb_head_min_time_milliseconds Head min time
    # TYPE prometheus_status_tsdb_head_min_time_milliseconds untyped
    prometheus_status_tsdb_head_min_time_milliseconds 1.615795200011e+12
    # HELP prometheus_status_tsdb_head_series_count Head series count
    # TYPE prometheus_status_tsdb_head_series_count untyped
    prometheus_status_tsdb_head_series_count 173933
    # HELP prometheus_status_tsdb_label_values_count Label values count by label name (top 10)
    # TYPE prometheus_status_tsdb_label_values_count untyped
    prometheus_status_tsdb_label_values_count{label="__name__"} 860
    prometheus_status_tsdb_label_values_count{label="test_project"} 112
    prometheus_status_tsdb_label_values_count{label="config"} 171
    prometheus_status_tsdb_label_values_count{label="dialer_name"} 147
    prometheus_status_tsdb_label_values_count{label="group"} 486
    prometheus_status_tsdb_label_values_count{label="instance"} 370
    prometheus_status_tsdb_label_values_count{label="le"} 155
    prometheus_status_tsdb_label_values_count{label="pod_name"} 162
    prometheus_status_tsdb_label_values_count{label="rule_group"} 333
    prometheus_status_tsdb_label_values_count{label="scrape_job"} 144
    # HELP prometheus_status_tsdb_label_values_in_memory_size_bytes Label values size in memory in bytes (top 10)
    # TYPE prometheus_status_tsdb_label_values_in_memory_size_bytes untyped
    prometheus_status_tsdb_label_values_in_memory_size_bytes{label="__name__"} 34492
    prometheus_status_tsdb_label_values_in_memory_size_bytes{label="config"} 2778
    prometheus_status_tsdb_label_values_in_memory_size_bytes{label="dialer_name"} 2359
    prometheus_status_tsdb_label_values_in_memory_size_bytes{label="external_labels"} 113288
    prometheus_status_tsdb_label_values_in_memory_size_bytes{label="file"} 2780
    prometheus_status_tsdb_label_values_in_memory_size_bytes{label="group"} 12637
    prometheus_status_tsdb_label_values_in_memory_size_bytes{label="instance"} 6908
    prometheus_status_tsdb_label_values_in_memory_size_bytes{label="pod_name"} 4410
    prometheus_status_tsdb_label_values_in_memory_size_bytes{label="rule_group"} 31985
    prometheus_status_tsdb_label_values_in_memory_size_bytes{label="scrape_job"} 2329
    # HELP prometheus_status_tsdb_series_by_label_value_pair_count Series count by label value pair (top 10)
    # TYPE prometheus_status_tsdb_series_by_label_value_pair_count untyped
    prometheus_status_tsdb_series_by_label_value_pair_count{metric="app_instance=main"} 19611
    prometheus_status_tsdb_series_by_label_value_pair_count{metric="app_name=prometheus"} 121355
    prometheus_status_tsdb_series_by_label_value_pair_count{metric="app_part_of=test"} 173631
    prometheus_status_tsdb_series_by_label_value_pair_count{metric="container_name=prometheus"} 84196
    prometheus_status_tsdb_series_by_label_value_pair_count{metric="container_name=thanos-sidecar"} 32208
    prometheus_status_tsdb_series_by_label_value_pair_count{metric="job=prometheuses"} 84196
    prometheus_status_tsdb_series_by_label_value_pair_count{metric="job=thanos-sidecar"} 32208
    prometheus_status_tsdb_series_by_label_value_pair_count{metric="namespace=test"} 38097
    prometheus_status_tsdb_series_by_label_value_pair_count{metric="namespace=test-external"} 107052
    prometheus_status_tsdb_series_by_label_value_pair_count{metric="namespace=test-qa"} 23134
    # HELP prometheus_status_tsdb_series_by_metric_count Series count by metric (top 10)
    # TYPE prometheus_status_tsdb_series_by_metric_count untyped
    prometheus_status_tsdb_series_by_metric_count{metric="grpc_server_handled_total"} 14025
    prometheus_status_tsdb_series_by_metric_count{metric="grpc_server_handling_seconds_bucket"} 8685
    prometheus_status_tsdb_series_by_metric_count{metric="prometheus_http_request_duration_seconds_bucket"} 7950
    prometheus_status_tsdb_series_by_metric_count{metric="prometheus_http_response_size_bytes_bucket"} 7155
    prometheus_status_tsdb_series_by_metric_count{metric="prometheus_rule_group_interval_seconds"} 3038
    prometheus_status_tsdb_series_by_metric_count{metric="prometheus_rule_group_iterations_missed_total"} 3038
    prometheus_status_tsdb_series_by_metric_count{metric="prometheus_rule_group_last_duration_seconds"} 3038
    prometheus_status_tsdb_series_by_metric_count{metric="prometheus_rule_group_last_evaluation_timestamp_seconds"} 3038
    prometheus_status_tsdb_series_by_metric_count{metric="prometheus_rule_group_rules"} 3038
    prometheus_status_tsdb_series_by_metric_count{metric="thanos_objstore_bucket_operation_duration_seconds_bucket"} 13755
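
For readers comparing the configuration with the output: for the `type: object` entries, the exported name appears to be the configured `name` joined with the key under `values` (for example `prometheus_status_tsdb_label_values` plus `count`). A small sketch of that naming, inferred only from the output above and not taken from the exporter's code; `expand_object_metric` is a hypothetical helper.

```python
def expand_object_metric(name, label_key, value_key, items):
    """Build exposition lines for one object-type entry, one line per matched item."""
    full_name = "%s_%s" % (name, value_key)
    return [
        '%s{%s="%s"} %s' % (full_name, label_key, item["name"], item["value"])
        for item in items
    ]


# Two of the entries shown in the example output.
items = [{"name": "__name__", "value": 860}, {"name": "instance", "value": 370}]
expanded = expand_object_metric("prometheus_status_tsdb_label_values", "label", "count", items)
print("\n".join(expanded))
```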