High Azure Network API calls by azure_sd_config causing API throttling

We are using azure_sd_config to discover VMs and VMSS. However, for around 400 VMs and VMSS, Prometheus is making 60k+ API calls, whereas Azure has a limit of 10k API calls. This is causing instability for other applications that make Azure API calls.

Can we optimize the Azure discovery codebase to reduce the number of calls?

System information:

  • We observed this issue in Prometheus 2.15.2 and then upgraded to 2.24.1, but we are still facing the same issue
  • Prometheus is running on Kubernetes 1.16

1 possible answer on “High Azure Network API calls by azure_sd_config causing API throttling”

  1. Hi @roidelapluie : Here is my config

    Thank you! Your configuration is correct, but there is a way to reduce the calls to the Azure API.

    Prometheus is able to reuse the same SD config across jobs, so a single set of API calls serves multiple jobs.

    The condition for this is that the SD configs are exactly the same. In your case, you should make the azure_sd_configs identical.

    That would require you to align your configurations and use relabel_configs to change the port. [Because the ports are currently different in the SD configs, Prometheus cannot reuse them (even if the rest is identical).]

    In your case, you will also need to split your configuration into multiple jobs (I assume this is already the case and you have provided a partial config):

    - job_name: prometheus
      static_configs:
       - targets:
         - localhost:9090
    - job_name: job1
      azure_sd_configs:
       - authentication_method: ManagedIdentity
         environment: AzurePublicCloud
         port: 80
         refresh_interval: 300s
         subscription_id: xxxxx-xxxxx-xxxxx-xxxx-xxxx
      relabel_configs:
      - source_labels: [__meta_azure_machine_tag_privateip]
        regex: (.+)
        replacement: ${1}:9090
        target_label: __address__
    - job_name: job2
      azure_sd_configs:
       - authentication_method: ManagedIdentity
         environment: AzurePublicCloud
         port: 80
         refresh_interval: 300s
         subscription_id: xxxxx-xxxxx-xxxxx-xxxx-xxxx
      relabel_configs:
        - source_labels: [__meta_azure_machine_private_ip]
          regex: (.+)
          replacement: ${1}:9100
          target_label: __address__
    

    That would divide the number of calls to the Azure API by 4.
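
    To illustrate the sharing rule described above, here is a toy model in Python. It is only a sketch, not Prometheus's actual code: the function name count_discovery_providers and the dictionary layout are made up for this example, and it simply assumes that SD configs that compare equal share one discovery provider (and therefore one set of API calls).

```python
import json


def count_discovery_providers(jobs):
    """Count distinct azure_sd_configs entries across jobs.

    Toy assumption: entries that are field-for-field identical share one
    provider; any differing field (for example `port`) creates a new one.
    """
    distinct = set()
    for job in jobs:
        for sd in job.get("azure_sd_configs", []):
            # Canonical serialization so key order does not matter.
            distinct.add(json.dumps(sd, sort_keys=True))
    return len(distinct)


# Fields shared by every job, taken from the config in this thread.
base = {
    "authentication_method": "ManagedIdentity",
    "environment": "AzurePublicCloud",
    "refresh_interval": "300s",
    "subscription_id": "xxxxx-xxxxx-xxxxx-xxxx-xxxx",
}

# Ports set in the SD config itself: the entries differ, nothing is shared.
different_ports = [
    {"job_name": "job1", "azure_sd_configs": [dict(base, port=9090)]},
    {"job_name": "job2", "azure_sd_configs": [dict(base, port=9100)]},
]

# Same port everywhere; the real port is set later via relabel_configs.
aligned = [
    {"job_name": "job1", "azure_sd_configs": [dict(base, port=80)]},
    {"job_name": "job2", "azure_sd_configs": [dict(base, port=80)]},
]

print(count_discovery_providers(different_ports))
print(count_discovery_providers(aligned))
```

    In this model the jobs with different ports need two providers, while the aligned jobs need only one, which is why moving the port into relabel_configs reduces the API calls.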