
NativeLink Configuration

This page documents the configuration options for NativeLink.

nativelink_config

cas_server

SchedulerRefName

Name of the scheduler. This type will be used when referencing a scheduler as a key in the `CasConfig::schedulers` map.

InstanceName

Used when the config references `instance_name` in the protocol.

HttpCompressionAlgorithm

"none"

Variants

  • none: No compression.

  • gzip: Zlib compression.

HttpCompressionConfig

{
send_compression_algorithm: null,
accepted_compression_algorithms: []
}

Note: Compressing data in the cloud rarely has a benefit, since most cloud providers have very high bandwidth backplanes. However, for clients not inside the data center, it might be a good idea to compress data to and from the cloud. This will however come at a high CPU and performance cost. If you are making remote execution share the same CAS/AC servers as client’s remote cache, you can create multiple services with different compression settings that are served on different ports. Then configure the non-cloud clients to use one port and cloud-clients to use another.

Fields

  • send_compression_algorithm (optional HttpCompressionAlgorithm): The compression algorithm that the server will use when sending responses to clients.

Default: `HttpCompressionAlgorithm::none`

  • accepted_compression_algorithms (list of HttpCompressionAlgorithm): The compression algorithms that the server will accept from clients. The server will broadcast the supported compression algorithms to clients and the client will choose which compression algorithm to use. Enabling this will likely save a lot of data transfer, but will consume a lot of CPU and add a lot of latency. See: https://github.com/tracemachina/nativelink/issues/109

Default: {no supported compression}
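
As a hedged sketch of the multi-port pattern described in the note above (all names and ports here are hypothetical), two servers could be configured with different compression settings:

"servers": [
  {
    // Hypothetical listener for clients inside the data center: no compression.
    "name": "internal",
    "listener": {
      "http": {
        "socket_address": "0.0.0.0:50051",
        "compression": {
          "accepted_compression_algorithms": []
        }
      }
    }
  },
  {
    // Hypothetical listener for remote clients: gzip offered and accepted.
    "name": "external",
    "listener": {
      "http": {
        "socket_address": "0.0.0.0:50052",
        "compression": {
          "send_compression_algorithm": "gzip",
          "accepted_compression_algorithms": ["gzip"]
        }
      }
    }
  }
]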

AcStoreConfig

{
ac_store: "example_string",
read_only: true
}

Fields

  • ac_store (StoreRefName): The store name referenced in the `stores` map in the main config. The store name referenced here may be reused multiple times.

  • read_only (bool): Whether the Action Cache store is read-only. If set to true, it is only possible to read from the Action Cache; writes will be rejected.

CasStoreConfig

{
cas_store: "example_string"
}

Fields

  • cas_store (StoreRefName): The store name referenced in the `stores` map in the main config. The store name referenced here may be reused multiple times.

CapabilitiesRemoteExecutionConfig

{
scheduler: "example_string"
}

Fields

  • scheduler (SchedulerRefName): Scheduler used to configure the capabilities of remote execution.

CapabilitiesConfig

{
remote_execution: null
}

Fields

  • remote_execution (optional CapabilitiesRemoteExecutionConfig): Configuration for remote execution capabilities. If not set the capabilities service will inform the client that remote execution is not supported.

ExecutionConfig

{
cas_store: "example_string",
scheduler: "example_string"
}

Fields

  • cas_store (StoreRefName): The store name referenced in the `stores` map in the main config. The store name referenced here may be reused multiple times. This value must be a CAS store reference.

  • scheduler (SchedulerRefName): The scheduler name referenced in the `schedulers` map in the main config.

ByteStreamConfig

{
cas_stores: [
{
example_string: "example_string"
}
],
max_bytes_per_stream: 42,
max_decoding_message_size: 42,
persist_stream_on_disconnect_timeout: 42
}

Fields

  • cas_stores (list of objects InstanceName: StoreRefName): The CAS store(s) this service serves, keyed by the instance_name used in the protocol; each value is a store name referenced in the `stores` map in the main config.

  • max_bytes_per_stream (usize): Maximum number of bytes to send on each grpc stream chunk.

Default: 64KiB

  • max_decoding_message_size (usize): Maximum number of bytes to decode on each grpc stream chunk. Default: 4 MiB

  • persist_stream_on_disconnect_timeout (usize): In the event a client disconnects while uploading a blob, we will hold the internal stream open for this many seconds before closing it. This allows clients that disconnect to reconnect and continue uploading the same blob.

Default: 10 (seconds)
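
A hedged example of a ByteStream service entry, assuming a hypothetical instance name `main` and store name `CAS_MAIN_STORE`, with the defaults written out explicitly:

"bytestream": {
  "cas_stores": {
    // instance_name -> CAS store reference.
    "main": "CAS_MAIN_STORE"
  },
  "max_bytes_per_stream": 65536,        // 64KiB (the default).
  "max_decoding_message_size": 4194304, // 4MiB (the default).
  "persist_stream_on_disconnect_timeout": 10
}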

WorkerApiConfig

{
scheduler: "example_string"
}

Fields

  • scheduler (SchedulerRefName): The scheduler name referenced in the `schedulers` map in the main config.

PrometheusConfig

{
path: "example_string"
}

Fields

  • path : Path to register prometheus metrics. If path is “/metrics”, and your domain is “example.com”, you can reach the endpoint with: http://example.com/metrics.

Default: “/metrics”

AdminConfig

{
path: "example_string"
}

Fields

  • path : Path to register the admin API. If path is “/admin”, and your domain is “example.com”, you can reach the endpoint with: http://example.com/admin.

Default: “/admin”

HealthConfig

{
path: "example_string"
}

Fields

  • path : Path to register the health status check. If path is “/status”, and your domain is “example.com”, you can reach the endpoint with: http://example.com/status.

Default: “/status”

BepConfig

{
store: "example_string"
}

Fields

  • store (StoreRefName): The store to publish build events to. The store name referenced in the `stores` map in the main config.

ServicesConfig

{
cas: null,
ac: null,
capabilities: null,
execution: null,
bytestream: null,
worker_api: null,
experimental_bep: null,
experimental_prometheus: null,
admin: null,
health: null
}

Fields

  • cas (optional list of objects InstanceName: CasStoreConfig): The Content Addressable Storage (CAS) backend config. The key is the instance_name used in the protocol and the value is the underlying CAS store config.

  • ac (optional list of objects InstanceName: AcStoreConfig): The Action Cache (AC) backend config. The key is the instance_name used in the protocol and the value is the underlying AC store config.

  • capabilities (optional list of objects InstanceName: CapabilitiesConfig): Capabilities service is required in order to use most of the bazel protocol. This service is used to provide the supported features and versions of this bazel GRPC service.

  • execution (optional list of objects InstanceName: ExecutionConfig): The remote execution service configuration. NOTE: This service is under development and is currently just a placeholder.

  • bytestream (optional ByteStreamConfig): This is the service used to stream data to and from the CAS. Bazel’s protocol strongly encourages users to use this streaming interface to interact with the CAS when the data is large.

  • worker_api (optional WorkerApiConfig): This is the service used for workers to connect and communicate through. NOTE: This service should be served on a different, non-public port. In other words, `worker_api` configuration should not have any other services that are served on the same port. Doing so is a security risk, as workers have a different permission set than a client that makes the remote execution/cache requests.

  • experimental_bep (optional BepConfig): Experimental - Build Event Protocol (BEP) configuration. This is the service that will consume build events from the client and publish them to a store for processing by an external service.

  • experimental_prometheus (optional PrometheusConfig): Experimental - Prometheus metrics configuration. Metrics are gathered as a singleton but may be served on multiple endpoints.

  • admin (optional AdminConfig): This is the service for any administrative tasks. It provides a REST API endpoint for administrative purposes.

  • health (optional HealthConfig): This is the service for health status check.
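
A hedged sketch of a ServicesConfig wiring the services above together, using hypothetical store and scheduler names (`CAS_MAIN_STORE`, `AC_MAIN_STORE`, `MAIN_SCHEDULER`) keyed by a hypothetical instance_name `main`:

"services": {
  "cas": {
    "main": { "cas_store": "CAS_MAIN_STORE" }
  },
  "ac": {
    "main": { "ac_store": "AC_MAIN_STORE" }
  },
  "capabilities": {
    "main": {
      "remote_execution": { "scheduler": "MAIN_SCHEDULER" }
    }
  },
  "execution": {
    "main": {
      "cas_store": "CAS_MAIN_STORE",
      "scheduler": "MAIN_SCHEDULER"
    }
  },
  "bytestream": {
    "cas_stores": { "main": "CAS_MAIN_STORE" }
  }
}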

TlsConfig

{
cert_file: "example_string",
key_file: "example_string",
client_ca_file: null,
client_crl_file: null
}

Fields

  • cert_file : Path to the certificate file.

  • key_file : Path to the private key file.

  • client_ca_file (optional ): Path to the certificate authority for mTLS, if client authentication is required for this endpoint.

  • client_crl_file (optional ): Path to the certificate revocation list for mTLS, if client authentication is required for this endpoint.

HttpServerConfig

{
http2_keep_alive_interval: null,
experimental_http2_max_pending_accept_reset_streams: null,
experimental_http2_initial_stream_window_size: null,
experimental_http2_initial_connection_window_size: null,
experimental_http2_adaptive_window: null,
experimental_http2_max_frame_size: null,
experimental_http2_max_concurrent_streams: null,
experimental_http2_keep_alive_timeout: null,
experimental_http2_max_send_buf_size: null,
experimental_http2_enable_connect_protocol: null,
experimental_http2_max_header_list_size: null
}

Advanced HTTP configurations. These generally should not be set. For documentation on what each of these does, see the hyper documentation: https://docs.rs/hyper/latest/hyper/server/conn/struct.Http.html

Note: All of these default to hyper’s default values unless otherwise specified.

Fields

  • http2_keep_alive_interval (optional u32): Interval to send keep-alive pings via HTTP2. Note: This is in seconds.

  • experimental_http2_max_pending_accept_reset_streams (optional u32): No description

  • experimental_http2_initial_stream_window_size (optional u32): No description

  • experimental_http2_initial_connection_window_size (optional u32): No description

  • experimental_http2_adaptive_window (optional bool): No description

  • experimental_http2_max_frame_size (optional u32): No description

  • experimental_http2_max_concurrent_streams (optional u32): No description

  • experimental_http2_keep_alive_timeout (optional u32): Note: This is in seconds.

  • experimental_http2_max_send_buf_size (optional u32): No description

  • experimental_http2_enable_connect_protocol (optional bool): No description

  • experimental_http2_max_header_list_size (optional u32): No description

ListenerConfig

{
http: {
socket_address: "example_string",
compression: {
send_compression_algorithm: null,
accepted_compression_algorithms: []
},
advanced_http: {
http2_keep_alive_interval: null,
experimental_http2_max_pending_accept_reset_streams: null,
experimental_http2_initial_stream_window_size: null,
experimental_http2_initial_connection_window_size: null,
experimental_http2_adaptive_window: null,
experimental_http2_max_frame_size: null,
experimental_http2_max_concurrent_streams: null,
experimental_http2_keep_alive_timeout: null,
experimental_http2_max_send_buf_size: null,
experimental_http2_enable_connect_protocol: null,
experimental_http2_max_header_list_size: null
},
tls: null
}
}

Variants

  • http (HttpListener): Listener for HTTP/HTTPS/HTTP2 sockets.

HttpListener

{
socket_address: "example_string",
compression: {
send_compression_algorithm: null,
accepted_compression_algorithms: []
},
advanced_http: {
http2_keep_alive_interval: null,
experimental_http2_max_pending_accept_reset_streams: null,
experimental_http2_initial_stream_window_size: null,
experimental_http2_initial_connection_window_size: null,
experimental_http2_adaptive_window: null,
experimental_http2_max_frame_size: null,
experimental_http2_max_concurrent_streams: null,
experimental_http2_keep_alive_timeout: null,
experimental_http2_max_send_buf_size: null,
experimental_http2_enable_connect_protocol: null,
experimental_http2_max_header_list_size: null
},
tls: null
}

Fields

  • socket_address : Address to listen on. Example: `127.0.0.1:8080` or `:8080` to listen to all IPs.

  • compression (HttpCompressionConfig): Data transport compression configuration to use for this service.

  • advanced_http (HttpServerConfig): Advanced Http server configuration.

  • tls (optional TlsConfig): TLS configuration for this server. If not set, the server will not use TLS.

Default: None
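
A hedged example of an HttpListener (shown under its `http` listener key) terminating TLS, with hypothetical certificate paths:

"http": {
  "socket_address": "0.0.0.0:443",
  "tls": {
    "cert_file": "/etc/nativelink/tls/server.crt",  // hypothetical path
    "key_file": "/etc/nativelink/tls/server.key"    // hypothetical path
  }
}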

ServerConfig

{
name: "example_string",
listener: {
http: {
socket_address: "example_string",
compression: {
send_compression_algorithm: null,
accepted_compression_algorithms: []
},
advanced_http: {
http2_keep_alive_interval: null,
experimental_http2_max_pending_accept_reset_streams: null,
experimental_http2_initial_stream_window_size: null,
experimental_http2_initial_connection_window_size: null,
experimental_http2_adaptive_window: null,
experimental_http2_max_frame_size: null,
experimental_http2_max_concurrent_streams: null,
experimental_http2_keep_alive_timeout: null,
experimental_http2_max_send_buf_size: null,
experimental_http2_enable_connect_protocol: null,
experimental_http2_max_header_list_size: null
},
tls: null
}
},
services: null
}

Fields

  • name : Name of the server. This is used to help identify the service for telemetry and logs.

Default: {index of server in config}

  • listener (ListenerConfig): The listener to attach to this server.

  • services (optional ServicesConfig): The services to attach to this server.

WorkerProperty

{
values: []
}

Variants

  • values (list of ): List of static values. Note: Generally there should only ever be 1 value, but if the platform property key is PropertyType::Priority it may have more than one value.

  • query_cmd : A dynamic configuration. The string will be executed as a command (not shell) and will be split by “\n” (new line character).

EndpointConfig

{
uri: "example_string",
timeout: null,
tls_config: null
}

Generic config for an endpoint and associated configs.

Fields

  • uri : URI of the endpoint.

  • timeout (optional f32): Timeout in seconds that a request should take. Default: 5 (seconds)

  • tls_config (optional ClientTlsConfig): The TLS configuration to use to connect to the endpoint.

UploadCacheResultsStrategy

"success_only"

Variants

  • success_only: Only upload action results with an exit code of 0.

  • never: Don’t upload any action results.

  • everything: Upload all action results that complete.

  • failures_only: Only upload action results that fail.

EnvironmentSource

{
property: "example_string"
}

Variants

  • property : The name of the platform property in the action to get the value from.

  • value : The raw value to set.

  • timeout_millis: The max amount of time in milliseconds the command is allowed to run (requested by the client).

  • side_channel_file: A special file path will be provided that can be used to communicate with the parent process about out-of-band information. This file will be read after the command has finished executing. Based on the contents of the file, the behavior of the result may be modified.

The format of the file contents should be JSON with the following schema:

{
  // If set the command will be considered a failure.
  // May be one of the following static strings:
  // "timeout": Will consider this task to be a timeout.
  "failure": "timeout",
}

All fields are optional, file does not need to be created and may be empty.

  • action_directory: A “root” directory for the action. This directory can be used to store temporary files that are not needed after the action has completed. This directory will be purged after the action has completed.

For example: If an action writes temporary data to a path but nativelink should clean up this path after the job has executed, you may create any directory under the path provided in this variable. A common pattern would be to use `entrypoint` to set a shell script that reads this variable, `mkdir $ENV_VAR_NAME/tmp` and `export TMPDIR=$ENV_VAR_NAME/tmp`. Another example might be to bind-mount the `/tmp` path in a container to this path in `entrypoint`.
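
A hedged sketch of how these sources might be mapped in a worker's `additional_environment` (see LocalWorkerConfig below). The environment variable names and the platform property name are hypothetical; the payload-free variants (`timeout_millis`, `side_channel_file`, `action_directory`) are omitted here since only the tagged forms are shown in this reference.

"additional_environment": {
  // Copies the action's "OSFamily" platform property into $OS_FAMILY.
  "OS_FAMILY": { "property": "OSFamily" },
  // Sets a fixed value.
  "GREETING": { "value": "hello" }
}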

UploadActionResultConfig

{
ac_store: null,
upload_ac_results_strategy: "success_only",
historical_results_store: null,
upload_historical_results_strategy: null,
success_message_template: "example_string",
failure_message_template: "example_string"
}

Fields

  • ac_store (optional StoreRefName): Underlying AC store that the worker will use to publish execution results into. Objects placed in this store should be reachable from the scheduler/client-cas after they have finished updating. Default: {No uploading is done}

  • upload_ac_results_strategy (UploadCacheResultsStrategy): In which situations the results should be published to the ac_store. If set to SuccessOnly, then only results with an exit code of 0 will be uploaded; if set to Everything, all completed results will be uploaded.

Default: UploadCacheResultsStrategy::SuccessOnly

  • historical_results_store (optional StoreRefName): Store to upload historical results to. This should be a CAS store if set.

Default: {CAS store of parent}

  • upload_historical_results_strategy (optional UploadCacheResultsStrategy): In which situations should the results be published to the historical CAS. The historical CAS is where failures are published. These messages conform to the CAS key-value lookup format and are always a `HistoricalExecuteResponse` serialized message.

Default: UploadCacheResultsStrategy::FailuresOnly

  • success_message_template : Template to use for the `ExecuteResponse.message` property. This message is attached to the response before it is sent to the client. The following special variables are supported:
    - {digest_function}: Digest function used to calculate the action digest.
    - {action_digest_hash}: Action digest hash.
    - {action_digest_size}: Action digest size.
    - {historical_results_hash}: HistoricalExecuteResponse digest hash.
    - {historical_results_size}: HistoricalExecuteResponse digest size.

A common use case of this is to provide a link to the web page that contains more useful information for the user.

An example that is fully compatible with `bb_browser` is: https://example.com/my-instance-name-here/blobs/{digest_function}/action/{action_digest_hash}-{action_digest_size}/

Default: "" (no message)

  • failure_message_template : Same as `success_message_template` but for failure case.

An example that is fully compatible with `bb_browser` is: https://example.com/my-instance-name-here/blobs/{digest_function}/historical_execute_response/{historical_results_hash}-{historical_results_size}/

Default: "" (no message)

LocalWorkerConfig

{
name: "example_string",
worker_api_endpoint: {
uri: "example_string",
timeout: null,
tls_config: null
},
max_action_timeout: 42,
timeout_handled_externally: true,
entrypoint: "example_string",
experimental_precondition_script: null,
cas_fast_slow_store: "example_string",
upload_action_result: {
ac_store: null,
upload_ac_results_strategy: "success_only",
historical_results_store: null,
upload_historical_results_strategy: null,
success_message_template: "example_string",
failure_message_template: "example_string"
},
work_directory: "example_string",
platform_properties: [
{
example_string: {
values: []
}
}
],
additional_environment: null
}

Fields

  • name : Name of the worker. This gives a more friendly name to a worker for logging and metric publishing. Default: {Index position in the workers list}

  • worker_api_endpoint (EndpointConfig): Endpoint which the worker will connect to the scheduler’s WorkerApiService.

  • max_action_timeout (usize): The maximum time an action is allowed to run. If a task requests for a timeout longer than this time limit, the task will be rejected. Value in seconds.

Default: 1200 (seconds / 20 mins)

  • timeout_handled_externally (bool): Whether the timeout is handled in `entrypoint` or another wrapper script. If set to true, NativeLink will not honor the timeout the action requested and will instead always force-kill the action after max_action_timeout has been reached. If set to false, the smaller of the action’s timeout and max_action_timeout will be used, and NativeLink will kill the action once it is reached.

The real timeout can be received via an environment variable set in: `EnvironmentSource::TimeoutMillis`.

Example on where this is useful: `entrypoint` launches the action inside a docker container, but the docker container may need to be downloaded. Thus the timer should not start until the docker container has started executing the action. In this case, the action will likely be wrapped in another program, like `timeout`, and timeouts will be propagated via `EnvironmentSource::SideChannelFile`.

Default: false (NativeLink fully handles timeouts)

  • entrypoint : The command to execute on every execution request. This will be parsed as a command + arguments (not shell). Example: “run.sh” and a job with command: “sleep 5” will result in a command like: “run.sh sleep 5”. Default: {Use the command from the job request}.

  • experimental_precondition_script (optional ): An optional script to run before every action is processed on the worker. The value should be the full path to the script to execute and will pause all actions on the worker if it returns an exit code other than 0. If not set, then the worker will never pause and will continue to accept jobs according to the scheduler configuration. This is useful, for example, if the worker should not take any more actions until there is enough resource available on the machine to handle them.

  • cas_fast_slow_store (StoreRefName): Underlying CAS store that the worker will use to download CAS artifacts. This store must be a `FastSlowStore`. The `fast` store must be a `FileSystemStore` because it will use hardlinks when building out the files instead of copying the files. The slow store must eventually resolve to the same store the scheduler/client uses to send job requests.

  • upload_action_result (UploadActionResultConfig): Configuration for uploading action results.

  • work_directory : The directory that jobs will be executed from. This directory will be fully managed by the worker service and will be purged on startup. This directory and the directory referenced in local_filesystem_store_ref’s stores::FilesystemStore::content_path must be on the same filesystem. Hardlinks will be used when placing files that are accessible to the jobs that are sourced from local_filesystem_store_ref’s content_path.

  • platform_properties (list of objects : WorkerProperty): Properties of this worker. This configuration will be sent to the scheduler and used to tell the scheduler to restrict what should be executed on this worker.

  • additional_environment (optional list of objects : EnvironmentSource): An optional mapping of environment variable names to set for the execution, in addition to those specified in the action itself. If set, each key will be set as an environment variable before executing the job, with the variable's value taken from the named platform property of the action being executed or from the fixed value.
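
A hedged sketch of a LocalWorkerConfig tying these fields together; the endpoint, store names, paths, and platform properties below are all hypothetical:

"workers": [
  {
    "local": {
      "worker_api_endpoint": {
        "uri": "grpc://127.0.0.1:50061"          // hypothetical WorkerApiService endpoint
      },
      "cas_fast_slow_store": "WORKER_FAST_SLOW", // hypothetical FastSlowStore reference
      "upload_action_result": {
        "ac_store": "AC_MAIN_STORE"              // hypothetical AC store reference
      },
      "work_directory": "/tmp/nativelink/work",
      "platform_properties": {
        "cpu_count": { "values": ["16"] },
        "cpu_arch": { "query_cmd": "uname -m" }
      }
    }
  }
]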

WorkerConfig

{
local: {
name: "example_string",
worker_api_endpoint: {
uri: "example_string",
timeout: null,
tls_config: null
},
max_action_timeout: 42,
timeout_handled_externally: true,
entrypoint: "example_string",
experimental_precondition_script: null,
cas_fast_slow_store: "example_string",
upload_action_result: {
ac_store: null,
upload_ac_results_strategy: "success_only",
historical_results_store: null,
upload_historical_results_strategy: null,
success_message_template: "example_string",
failure_message_template: "example_string"
},
work_directory: "example_string",
platform_properties: [
{
example_string: {
values: []
}
}
],
additional_environment: null
}
}

Variants

  • local (LocalWorkerConfig): A worker type that executes jobs locally on this machine.

GlobalConfig

{
max_open_files: 42,
idle_file_descriptor_timeout_millis: 42,
disable_metrics: true,
default_digest_hash_function: null,
default_digest_size_health_check: 42
}

Fields

  • max_open_files (usize): Maximum number of open files that can be opened at one time. This value is not strictly enforced, it is a best effort. Some internal libraries open files or read metadata from files, which does not obey this limit; however, the vast majority of cases will have this limit honored. As a rule of thumb this value should be less than half the value of `ulimit -n`. Network file descriptors are not counted in this limit, but they are counted in the kernel limit. It is a good idea to set a very large `ulimit -n`. Note: This value must be greater than 10.

Default: 512

  • idle_file_descriptor_timeout_millis (u64): If a file descriptor is idle for this many milliseconds, it will be closed. In the event a client or store takes a long time to send or receive data the file descriptor will be closed, and since `max_open_files` blocks new open_file requests until a slot opens up, it will allow new requests to be processed. If a read or write is attempted on a closed file descriptor, the file will be reopened and the operation will continue.

On services where worker(s) and scheduler(s) live in the same process, this also prevents deadlocks if a file->file copy is happening, but cannot open a new file descriptor because the limit has been reached.

Default: 1000 (1 second)

  • disable_metrics (bool): This flag can be used to prevent metrics from being collected at runtime. Metrics are still able to be collected, but this flag prevents metrics that are collected at runtime (performance metrics) from being tallied. The overhead of collecting metrics is very low, so this flag should only be used if there is a very good reason to disable metrics. This flag can be forcibly set using the `NATIVELINK_DISABLE_METRICS` environment variable. If the variable is set it will always disable metrics regardless of what this flag is set to.

Default: true (disabled) if no prometheus service enabled, false otherwise

  • default_digest_hash_function (optional ConfigDigestHashFunction): Default hash function to use while uploading blobs to the CAS when not set by client.

Default: ConfigDigestHashFunction::sha256

  • default_digest_size_health_check (usize): Default digest size to use for health check when running diagnostics checks. Health checks are expected to use this size for filling a buffer that is used for creation of digest.

Default: 1024*1024 (1MiB)
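
A hedged example of a GlobalConfig spelling out the defaults described above:

"global": {
  "max_open_files": 512,
  "idle_file_descriptor_timeout_millis": 1000,
  "disable_metrics": false,
  "default_digest_hash_function": "sha256",
  "default_digest_size_health_check": 1048576  // 1MiB
}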

CasConfig

{
stores: [
{
example_string: {
memory: {
eviction_policy: null
}
}
}
],
workers: null,
schedulers: null,
servers: [],
global: null
}

Fields

  • stores (list of objects StoreRefName: StoreConfig): List of stores available to use in this config. The keys can be used in other configs when needing to reference a store.

  • workers (optional list of WorkerConfig): Worker configurations used to execute jobs.

  • schedulers (optional list of objects SchedulerRefName: SchedulerConfig): List of schedulers available to use in this config. The keys can be used in other configs when needing to reference a scheduler.

  • servers (list of ServerConfig): Servers to setup for this process.

  • global (optional GlobalConfig): Any global configurations that apply to all modules live here.
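
A hedged, minimal end-to-end sketch of a CasConfig (written here as JSON maps keyed by name); the store names, instance name, and port are hypothetical:

{
  "stores": {
    "CAS_MAIN_STORE": {
      "memory": { "eviction_policy": { "max_bytes": 10000000 } }
    },
    "AC_MAIN_STORE": {
      "memory": { "eviction_policy": { "max_bytes": 10000000 } }
    }
  },
  "servers": [
    {
      "listener": {
        "http": { "socket_address": "0.0.0.0:50051" }
      },
      "services": {
        "cas": { "main": { "cas_store": "CAS_MAIN_STORE" } },
        "ac": { "main": { "ac_store": "AC_MAIN_STORE" } },
        "bytestream": { "cas_stores": { "main": "CAS_MAIN_STORE" } }
      }
    }
  ]
}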

schedulers

SchedulerConfig

{
simple: {
supported_platform_properties: null,
retain_completed_for_s: 42,
worker_timeout_s: 42,
max_job_retries: 42,
allocation_strategy: "least_recently_used",
experimental_backend: null
}
}

Variants

  • simple (SimpleScheduler): A scheduler that matches actions to workers directly.

  • grpc (GrpcScheduler): A scheduler that forwards requests to an upstream scheduler.

  • cache_lookup (CacheLookupScheduler): A scheduler that checks the action cache before forwarding the action to a nested scheduler.

  • property_modifier (PropertyModifierScheduler): A scheduler that modifies the platform properties of incoming actions before passing them to a nested scheduler.

PropertyType

"minimum"

When the scheduler matches tasks to workers that are capable of running the task, this value will be used to determine how the property is treated.

Variants

  • minimum: Requires the platform property to be a u64 and when the scheduler looks for appropriate worker nodes that are capable of executing the task, the task will not run on a node that has less than this value.

  • exact: Requires the platform property to be a string and when the scheduler looks for appropriate worker nodes that are capable of executing the task, the task will not run on a node that does not have this property set to the value with exact string match.

  • priority: Does not restrict on this value and instead will be passed to the worker as an informational piece. TODO(allada) In the future this will be used by the scheduler and worker to cause the scheduler to prefer certain workers over others, but not restrict them based on these values.

WorkerAllocationStrategy

"least_recently_used"

When a worker is being searched for to run a job, this will be used on how to choose which worker should run the job when multiple workers are able to run the task.

Variants

  • least_recently_used: Prefer workers that have been least recently used to run a job.

  • most_recently_used: Prefer workers that have been most recently used to run a job.

SimpleScheduler

{
supported_platform_properties: null,
retain_completed_for_s: 42,
worker_timeout_s: 42,
max_job_retries: 42,
allocation_strategy: "least_recently_used",
experimental_backend: null
}

Fields

  • supported_platform_properties (optional list of objects : PropertyType): A list of supported platform properties mapped to how these properties are used when the scheduler looks for worker nodes capable of running the task.

For example, a value of:

{ "cpu_count": "minimum", "cpu_arch": "exact" }

With a job that contains:

{ "cpu_count": "8", "cpu_arch": "arm" }

Will result in the scheduler filtering out any workers that do not have “cpu_arch” = “arm” and filter out any workers that have less than 8 cpu cores available.

The property names here must match the property keys provided by the worker nodes when they join the pool. In other words, the workers will publish their capabilities to the scheduler when they join the worker pool. If the worker fails to notify the scheduler of its (for example) “cpu_arch”, the scheduler will never send any jobs to it, if all jobs have the “cpu_arch” label. There is no special treatment of any platform property labels; matching is entirely driven by the worker configs and this config.

  • retain_completed_for_s (u32): The amount of time to retain completed actions in memory for in case a WaitExecution is called after the action has completed. Default: 60 (seconds)

  • worker_timeout_s (u64): Remove workers from pool once the worker has not responded in this amount of time in seconds. Default: 5 (seconds)

  • max_job_retries (usize): If a job returns an internal error or times out this many times when attempting to run on a worker the scheduler will return the last error to the client. Jobs will be retried and this configuration is to help prevent one rogue job from infinitely retrying and taking up a lot of resources when the task itself is the one causing the server to go into a bad state. Default: 3

  • allocation_strategy (WorkerAllocationStrategy): The strategy used to assign workers jobs.

  • experimental_backend (optional ExperimentalSimpleSchedulerBackend): The storage backend to use for the scheduler. Default: memory
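
A hedged example of a simple scheduler entry under the `schedulers` map, using a hypothetical scheduler name and the platform properties discussed above:

"schedulers": {
  "MAIN_SCHEDULER": {
    "simple": {
      "supported_platform_properties": {
        "cpu_count": "minimum",
        "cpu_arch": "exact",
        "priority": "priority"
      },
      "retain_completed_for_s": 60,
      "worker_timeout_s": 5,
      "max_job_retries": 3,
      "allocation_strategy": "least_recently_used"
    }
  }
}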

ExperimentalSimpleSchedulerBackend

"memory"

Variants

  • memory: Keep the scheduler state in memory.

  • redis (ExperimentalRedisSchedulerBackend): Keep the scheduler state in a Redis store.

ExperimentalRedisSchedulerBackend

{
redis_store: "example_string"
}

Fields

  • redis_store (StoreRefName): A reference to the redis store to use for the scheduler. Note: This MUST resolve to a RedisStore.

GrpcScheduler

{
endpoint: {
address: "example_string",
tls_config: null,
concurrency_limit: null
},
retry: {
max_retries: 42,
delay: 3.14,
jitter: 3.14,
retry_on_errors: null
},
max_concurrent_requests: 42,
connections_per_endpoint: 42
}

A scheduler that simply forwards requests to an upstream scheduler. This is useful to use when doing some kind of local action cache or CAS away from the main cluster of workers. In general, it’s more efficient to point the build at the main scheduler directly though.

Fields

  • endpoint (GrpcEndpoint): The upstream scheduler to forward requests to.

  • retry (Retry): Retry configuration to use when a network request fails.

  • max_concurrent_requests (usize): Limit the number of simultaneous upstream requests to this many. A value of zero is treated as unlimited. If the limit is reached the request is queued.

  • connections_per_endpoint (usize): The number of connections to make to each specified endpoint to balance the load over multiple TCP connections. Default 1.
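
A hedged sketch of a GrpcScheduler body forwarding to a hypothetical upstream scheduler address:

{
  "endpoint": {
    "address": "grpc://upstream-scheduler.internal:50051",  // hypothetical address
    "concurrency_limit": 512
  },
  "retry": {
    "max_retries": 3,
    "delay": 0.1,
    "jitter": 0.5
  },
  "connections_per_endpoint": 2
}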

CacheLookupScheduler

{
ac_store: "example_string",
scheduler: "Box"
}

Fields

  • ac_store (StoreRefName): The reference to the action cache store used to return cached actions from rather than running them again. To prevent unintended issues, this store should probably be a CompletenessCheckingStore.

  • scheduler (box of SchedulerConfig): The nested scheduler to use if cache lookup fails.

PlatformPropertyAddition

{
name: "example_string",
value: "example_string"
}

Fields

  • name : The name of the property to add.

  • value : The value to assign to the property.

PropertyModification

{
add: {
name: "example_string",
value: "example_string"
}
}

Variants

  • add (PlatformPropertyAddition): Add a property to the action properties.

  • remove : Remove a named property from the action.

PropertyModifierScheduler

{
modifications: [],
scheduler: "Box"
}

Fields

  • modifications (list of PropertyModification): A list of modifications to perform to incoming actions for the nested scheduler. These are performed in order and blindly, so removing a property that doesn’t exist is fine and overwriting an existing property is also fine. Adding properties that do not exist in the nested scheduler is not supported and will likely cause unexpected behaviour.

  • scheduler (box of SchedulerConfig): The nested scheduler to use after modifying the properties.
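
A hedged example of a PropertyModifierScheduler body; the property names and the nested scheduler are hypothetical:

{
  "modifications": [
    // Force every incoming action to carry this property.
    { "add": { "name": "cpu_arch", "value": "x86_64" } },
    // Drop a property the nested scheduler does not know about.
    { "remove": "container-image" }
  ],
  "scheduler": {
    "simple": {
      "supported_platform_properties": { "cpu_arch": "exact" }
    }
  }
}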

serde_utils

stores

StoreRefName

Name of the store. This type will be used when referencing a store as a key in the `CasConfig::stores` map.

ConfigDigestHashFunction

"sha256"

Variants

  • sha256: Use the SHA-256 digest function.

  • blake3: Use the BLAKE3 digest function.
StoreConfig

{
memory: {
eviction_policy: null
}
}

Variants

  • memory (MemoryStore): Memory store will store all data in a hashmap in memory.

**Example JSON Config:**

"memory": {
"eviction_policy": {
// 10mb.
"max_bytes": 10000000,
}
}
  • experimental_s3_store (S3Store): S3 store will use Amazon’s S3 service as a backend to store the files. This configuration can be used to share files across multiple instances.

This configuration will never delete files, so you are responsible for purging old files in other ways.

**Example JSON Config:**

"experimental_s3_store": {
"region": "eu-north-1",
"bucket": "crossplane-bucket-af79aeca9",
"key_prefix": "test-prefix-index/",
"retry": {
"max_retries": 6,
"delay": 0.3,
"jitter": 0.5
},
"multipart_max_concurrent_uploads": 10
}
  • verify (box of VerifyStore): Verify store is used to apply verifications to an underlying store implementation. It is strongly encouraged to validate as much data as you can before accepting data from a client, failing to do so may cause the data in the store to be populated with invalid data causing all kinds of problems.

The suggested configuration is to have the CAS validate the hash and size and the AC validate nothing.

**Example JSON Config:**

"verify": {
"memory": {
"eviction_policy": {
"max_bytes": 500000000 // 500mb.
}
},
"verify_size": true,
"hash_verification_function": "sha256"
}
  • completeness_checking (box of CompletenessCheckingStore): Completeness checking store verifies if the output files & folders exist in the CAS before forwarding the request to the underlying store. Note: This store should only be used on AC stores.

**Example JSON Config:**

"completeness_checking": {
"backend": {
"filesystem": {
"content_path": "~/.cache/nativelink/content_path-ac",
"temp_path": "~/.cache/nativelink/tmp_path-ac",
"eviction_policy": {
// 500mb.
"max_bytes": 500000000,
}
}
},
"cas_store": {
"ref_store": {
"name": "CAS_MAIN_STORE"
}
}
}
  • compression (box of CompressionStore): A compression store that will compress the data inbound and outbound. There will be a non-trivial cost to compress and decompress the data, but in many cases if the final store is a store that requires network transport and/or storage space is a concern it is often faster and more efficient to use this store before those stores.

**Example JSON Config:**

"compression": {
"compression_algorithm": {
"lz4": {}
},
"backend": {
"filesystem": {
"content_path": "/tmp/nativelink/data/content_path-cas",
"temp_path": "/tmp/nativelink/data/tmp_path-cas",
"eviction_policy": {
// 2gb.
"max_bytes": 2000000000,
}
}
}
}
  • dedup (box of DedupStore): A dedup store will take the inputs and run a rolling hash algorithm on them to slice the input into smaller parts then run a sha256 algorithm on the slice and if the object doesn’t already exist, upload the slice to the `content_store` using a new digest of just the slice. Once all parts exist, an Action-Cache-like digest will be built and uploaded to the `index_store` which will contain a reference to each chunk/digest of the uploaded file. Downloading a request will first grab the index from the `index_store`, and forward the download content of each chunk as if it were one file.

This store is exceptionally good when the following conditions are met:

  • Content is mostly the same (inserts, updates, deletes are ok)

  • Content is not compressed or encrypted

  • Uploading or downloading from `content_store` is the bottleneck

Note: This store pairs well when used with CompressionStore as the `content_store`, but never put DedupStore as the backend of CompressionStore as it will negate all the gains.

Note: When running `.has()` on this store, it will only check to see if the entry exists in the `index_store` and not check if the individual chunks exist in the `content_store`.

**Example JSON Config:**

"dedup": {
"index_store": {
"memory_store": {
"max_size": 1000000000, // 1GB
"eviction_policy": "LeastRecentlyUsed"
}
},
"content_store": {
"compression": {
"compression_algorithm": {
"lz4": {}
},
"backend": {
"fast_slow": {
"fast": {
"memory_store": {
"max_size": 500000000, // 500MB
"eviction_policy": "LeastRecentlyUsed"
}
},
"slow": {
"filesystem": {
"content_path": "/tmp/nativelink/data/content_path-content",
"temp_path": "/tmp/nativelink/data/tmp_path-content",
"eviction_policy": {
"max_bytes": 2000000000 // 2gb.
}
}
}
}
}
}
}
}
  • existence_cache (box of ExistenceCacheStore): Existence store will wrap around another store and cache calls to has so that subsequent has_with_results calls will be faster. This is useful for cases when you have a store that is slow to respond to has calls. Note: This store should only be used on CAS stores.

**Example JSON Config:**

"existence_cache": {
"backend": {
"memory": {
"eviction_policy": {
// 500mb.
"max_bytes": 500000000,
}
}
},
"cas_store": {
"ref_store": {
"name": "CAS_MAIN_STORE"
}
}
}
  • fast_slow (box of FastSlowStore): FastSlow store will first try to fetch the data from the `fast` store and then if it does not exist try the `slow` store. When the object does exist in the `slow` store, it will copy the data to the `fast` store while returning the data. This store should be thought of as a store that “buffers” the data to the `fast` store. On uploads it will mirror data to both `fast` and `slow` stores.

WARNING: If you need data to always exist in the `slow` store for something like remote execution, be careful because this store will never check to see if the objects exist in the `slow` store if they exist in the `fast` store (ie: it assumes that if an object exists in the `fast` store it will exist in the `slow` store).

**Example JSON Config:**

"fast_slow": {
"fast": {
"filesystem": {
"content_path": "/tmp/nativelink/data/content_path-index",
"temp_path": "/tmp/nativelink/data/tmp_path-index",
"eviction_policy": {
// 500mb.
"max_bytes": 500000000,
}
}
},
"slow": {
"filesystem": {
"content_path": "/tmp/nativelink/data/content_path-index",
"temp_path": "/tmp/nativelink/data/tmp_path-index",
"eviction_policy": {
// 500mb.
"max_bytes": 500000000,
}
}
}
}
  • shard (ShardStore): Shards the data to multiple stores. This is useful for cases when you want to distribute the load across multiple stores. The digest hash is used to determine which store to send the data to.

**Example JSON Config:**

"shard": {
"stores": [
"memory": {
"eviction_policy": {
// 10mb.
"max_bytes": 10000000
},
"weight": 1
}
]
}
  • filesystem (FilesystemStore): Stores the data on the filesystem. This store is designed for local persistent storage. Restarts of this program should restore the previous state, meaning anything uploaded will be persistent as long as the filesystem integrity holds. This store uses the filesystem’s `atime` (access time) to hold the last touched time of the file(s).

**Example JSON Config:**

"filesystem": {
"content_path": "/tmp/nativelink/data-worker-test/content_path-cas",
"temp_path": "/tmp/nativelink/data-worker-test/tmp_path-cas",
"eviction_policy": {
// 10gb.
"max_bytes": 10000000000,
}
}
  • ref_store (RefStore): Store used to reference a store in the root store manager. This is useful for cases when you want to share a store in different nested stores. Example, you may want to share the same memory store used for the action cache, but use a FastSlowStore and have the fast store also share the memory store for efficiency.

**Example JSON Config:**

"ref_store": {
"name": "FS_CONTENT_STORE"
}
  • size_partitioning (box of SizePartitioningStore): Uses the size field of the digest to separate which store to send the data. This is useful for cases when you’d like to put small objects in one store and large objects in another store. This should only be used if the size field is the real size of the content, in other words, don’t use on AC (Action Cache) stores. Any store where you can safely use VerifyStore.verify_size = true, this store should be safe to use (ie: CAS stores).

**Example JSON Config:**

"size_partitioning": {
"size": 134217728, // 128mib.
"lower_store": {
"memory": {
"eviction_policy": {
"max_bytes": "${NATIVELINK_CAS_MEMORY_CONTENT_LIMIT:-100000000}"
}
}
},
"upper_store": {
/// This store discards data larger than 128mib.
"noop": {}
}
}
  • grpc (GrpcStore): This store will pass-through calls to another GRPC store. This store is not designed to be used as a sub-store of another store, but it does satisfy the interface and will likely work.

One major GOTCHA is that some stores use a special function on this store to get the size of the underlying object, which is only reliable when this store is serving a CAS store, not an AC store. If this store is used directly and not as a child of another store, there are no side effects and it is the most efficient way to use it.

**Example JSON Config:**

"grpc": {
"instance_name": "main",
"endpoints": [
{"address": "grpc://${CAS_ENDPOINT:-127.0.0.1}:50051"}
],
"store_type": "ac"
}
  • redis_store (RedisStore): Stores data in any stores compatible with Redis APIs.

Pairs well with SizePartitioning and/or FastSlow stores. Ideal for accepting small object sizes as most Redis store services have a maximum file upload size of between 256MB and 512MB.

**Example JSON Config:**

"redis_store": {
"addresses": [
"redis://127.0.0.1:6379/",
]
}
  • noop: Noop store is a store that sends streams into the void and all data retrieval will return 404 (NotFound). This can be useful for cases where you may need to partition your data and part of your data needs to be discarded.

**Example JSON Config:**

"noop": {}

ShardConfig

{
store: {
memory: {
eviction_policy: null
}
},
weight: null
}

Configuration for an individual shard of the store.

Fields

  • store (StoreConfig): Store to shard the data to.

  • weight (optional u32): The weight of the store. This is used to determine how much data should be sent to the store. The actual percentage is the individual store’s weight divided by the sum of all the stores’ weights.

Default: 1
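
A hedged example of weighted shards; with the hypothetical weights below, the first shard receives 2 / (2 + 1) of the keyspace (about 67%) and the second 1 / (2 + 1) (about 33%):

"shard": {
  "stores": [
    {
      "store": { "memory": { "eviction_policy": { "max_bytes": 10000000 } } },
      "weight": 2
    },
    {
      "store": { "memory": { "eviction_policy": { "max_bytes": 10000000 } } },
      "weight": 1
    }
  ]
}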

ShardStore

{
stores: []
}

Fields

  • stores (list of ShardConfig): Stores to shard the data to.

SizePartitioningStore

{
size: 42,
lower_store: {
memory: {
eviction_policy: null
}
},
upper_store: {
memory: {
eviction_policy: null
}
}
}

Fields

  • size (u64): Size to partition the data on.

  • lower_store (StoreConfig): Store to send data when object is < (less than) size.

  • upper_store (StoreConfig): Store to send data when object is >= (greater than or equal to) size.

RefStore

{
name: "example_string"
}

Fields

  • name : Name of the store under the root “stores” config object.

FilesystemStore

{
content_path: "example_string",
temp_path: "example_string",
read_buffer_size: 42,
eviction_policy: null,
block_size: 42
}

Fields

  • content_path : Path on the system where to store the actual content. This is where the bulk of the data will be placed. On service bootup this folder will be scanned and all files will be added to the cache. In the event one of the files doesn’t match the criteria, the file will be deleted.

  • temp_path : A temporary location of where files that are being uploaded or deleted will be placed while the content cannot be guaranteed to be accurate. This location must be on the same block device as `content_path` so atomic moves can happen (ie: move without copy). All files in this folder will be deleted on every startup.

  • read_buffer_size (u32): Buffer size to use when reading files. Generally this should be left to the default value except for testing. Default: 32k.

  • eviction_policy (optional EvictionPolicy): Policy used to evict items out of the store. Failure to set this value will cause items to never be removed from the store causing infinite memory usage.

  • block_size (u64): The block size of the filesystem of the running machine. This value is used to determine an entry’s actual size on disk. For a 4KB block size filesystem, a 1B file actually consumes 4KB of disk space. Default: 4096

FastSlowStore

{
fast: {
memory: {
eviction_policy: null
}
},
slow: {
memory: {
eviction_policy: null
}
}
}

Fields

  • fast (StoreConfig): Fast store that will be attempted to be contacted before reaching out to the `slow` store.

  • slow (StoreConfig): If the object does not exist in the `fast` store it will try to get it from this store.

MemoryStore

{
eviction_policy: null
}

Fields

  • eviction_policy (optional EvictionPolicy): Policy used to evict items out of the store. Failure to set this value will cause items to never be removed from the store causing infinite memory usage.

DedupStore

{
index_store: {
memory: {
eviction_policy: null
}
},
content_store: {
memory: {
eviction_policy: null
}
},
min_size: 42,
normal_size: 42,
max_size: 42,
max_concurrent_fetch_per_get: 42
}

Fields

  • index_store (StoreConfig): Store used to store the index of each dedup slice. This store should generally be fast and small.

  • content_store (StoreConfig): The store where the individual chunks will be uploaded. This store should generally be the slower & larger store.

  • min_size (u32): Minimum size that a chunk will be when slicing up the content. Note: This setting can be increased to improve performance because it will actually not check this number of bytes when deciding where to partition the data.

Default: 65536 (64k)

  • normal_size (u32): A best-effort attempt will be made to keep the average size of the chunks to this number. It is not a guarantee, but a slight attempt will be made.

This value will also be about the threshold used to determine if we should even attempt to dedup the entry or just forward it directly to the content_store without an index. The actual value will be about `normal_size * 1.3` due to implementation details.

Default: 262144 (256k)

  • max_size (u32): Maximum size a chunk is allowed to be.

Default: 524288 (512k)

  • max_concurrent_fetch_per_get (u32): Due to implementation detail, we want to prefer to download the first chunks of the file so we can stream the content out and free up some of our buffers. This configuration will be used to restrict the number of concurrent chunk downloads at a time per `get()` request.

This setting will also affect how much memory might be used per `get()` request. Estimated worst case memory per `get()` request is: `max_concurrent_fetch_per_get * max_size`.

Default: 10

ExistenceCacheStore

{
backend: {
memory: {
eviction_policy: null
}
},
eviction_policy: null
}

Fields

  • backend (StoreConfig): The underlying store to wrap around. All content will first flow through self before forwarding to backend. In the event there is an error detected in self, the connection to the backend will be terminated, and early termination should always cause updates to fail on the backend.

  • eviction_policy (optional EvictionPolicy): Policy used to evict items out of the store. Failure to set this value will cause items to never be removed from the store causing infinite memory usage.

VerifyStore

{
backend: {
memory: {
eviction_policy: null
}
},
verify_size: true,
verify_hash: true
}

Fields

  • backend (StoreConfig): The underlying store to wrap around. All content will first flow through self before forwarding to backend. In the event there is an error detected in self, the connection to the backend will be terminated, and early termination should always cause updates to fail on the backend.

  • verify_size (bool): If set the store will verify the size of the data before accepting an upload of data.

This should be set to false for AC, but true for CAS stores.

  • verify_hash (bool): If set, the data will be hashed and the key verified to match the computed hash. The hash function is automatically determined based on the request and if not set will use the global default.

This should be set to false for AC, but true for CAS stores.

CompletenessCheckingStore

{
backend: {
memory: {
eviction_policy: null
}
},
cas_store: {
memory: {
eviction_policy: null
}
}
}

Fields

  • backend (StoreConfig): The underlying store that will have its results validated before sending to client.

  • cas_store (StoreConfig): When a request is made, the results are decoded and all output digests/files are verified to exist in this CAS store before returning success.

Lz4Config

{
block_size: 42,
max_decode_block_size: 42
}

Fields

  • block_size (u32): Size of the blocks to compress. Higher values require more ram, but might yield slightly better compression ratios.

Default: 65536 (64k).

  • max_decode_block_size (u32): Maximum size allowed to attempt to deserialize data into. This is needed because the block_size is embedded into the data, so a bad actor could upload an entry with an extremely large block_size and we’d allocate a large amount of memory when retrieving the data. To prevent this from happening, we allow you to specify the maximum that we’ll attempt to deserialize.

Default: value in `block_size`.

CompressionAlgorithm

{
lz4: {
block_size: 42,
max_decode_block_size: 42
}
}

Variants

  • lz4 (Lz4Config): LZ4 compression algorithm is extremely fast for compression and decompression, however does not perform very well in compression ratio. In most cases build artifacts are highly compressible, however lz4 is quite good at aborting early if the data is not deemed very compressible.

see: https://lz4.github.io/lz4/

CompressionStore

{
backend: {
memory: {
eviction_policy: null
}
},
compression_algorithm: {
lz4: {
block_size: 42,
max_decode_block_size: 42
}
}
}

Fields

  • backend (StoreConfig): The underlying store to wrap around. All content will first flow through self before forwarding to backend. In the event there is an error detected in self, the connection to the backend will be terminated, and early termination should always cause updates to fail on the backend.

  • compression_algorithm (CompressionAlgorithm): The compression algorithm to use.

EvictionPolicy

{
max_bytes: 42,
evict_bytes: 42,
max_seconds: 42,
max_count: 42
}

Eviction policy always works on LRU (Least Recently Used). Any time an entry is touched it updates the timestamp. Inserts and updates will execute the eviction policy removing any expired entries and/or the oldest entries until the store size becomes smaller than max_bytes.

Fields

  • max_bytes (usize): Maximum number of bytes before eviction takes place. Default: 0. Zero means never evict based on size.

  • evict_bytes (usize): When eviction starts based on hitting max_bytes, continue until max_bytes - evict_bytes is met to create a low watermark. This stops operations from thrashing when the store is close to the limit. Default: 0

  • max_seconds (u32): Maximum number of seconds for an entry to live since it was last accessed before it is evicted. Default: 0. Zero means never evict based on time.

  • max_count (u64): Maximum number of entries in the store before an eviction takes place. Default: 0. Zero means never evict based on count.
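
A hedged example of an eviction policy that starts evicting at roughly 10GB, evicts down to a 9GB low watermark, and also drops entries untouched for 30 days (all numbers hypothetical):

"eviction_policy": {
  "max_bytes": 10000000000,  // Start evicting once the store exceeds ~10GB.
  "evict_bytes": 1000000000, // Keep evicting until ~1GB has been freed (low watermark).
  "max_seconds": 2592000     // Also evict entries not accessed for 30 days.
}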

S3Store

{
region: "example_string",
bucket: "example_string",
key_prefix: null,
retry: {
max_retries: 42,
delay: 3.14,
jitter: 3.14,
retry_on_errors: null
},
consider_expired_after_s: 42,
max_retry_buffer_per_request: null,
multipart_max_concurrent_uploads: null,
insecure_allow_http: true,
disable_http2: true
}

Fields

  • region : S3 region. Usually us-east-1, us-west-2, af-south-1, etc.

  • bucket : Bucket name to use as the backend.

  • key_prefix (optional ): If you wish to prefix the location on s3. If None, no prefix will be used.

  • retry (Retry): Retry configuration to use when a network request fails.

  • consider_expired_after_s (u32): If the number of seconds since the `last_modified` time of the object is greater than this value, the object will not be considered “existing”. This allows for external tools to delete objects that have not been uploaded in a long time. If a client receives a NotFound the client should re-upload the object.

There should be sufficient buffer time between how long the expiration configuration of the external tool is and this value. Keeping items around for a few days is generally a good idea.

Default: 0. Zero means never consider an object expired.

  • max_retry_buffer_per_request (optional usize): The maximum buffer size to retain in case of a retryable error during upload. Setting this to zero will disable upload buffering; this means that in the event of a failure during upload, the entire upload will be aborted and the client will likely receive an error.

Default: 5MB.

  • multipart_max_concurrent_uploads (optional usize): Maximum number of concurrent UploadPart requests per MultipartUpload.

Default: 10.

  • insecure_allow_http (bool): Allow unencrypted HTTP connections. Only use this for local testing.

Default: false

  • disable_http2 (bool): Disable http/2 connections and only use http/1.1. Default client configuration will have http/1.1 and http/2 enabled for connection schemes. Http/2 should be disabled if environments have poor support or performance related to http/2. Safe to keep default unless underlying network environment or S3 API servers specify otherwise.

Default: false

StoreType

"cas"

Variants

  • cas: The store is content addressable storage.

  • ac: The store is an action cache.

ClientTlsConfig

{
ca_file: "example_string",
cert_file: null,
key_file: null
}

Fields

  • ca_file : Path to the certificate authority to use to validate the remote.

  • cert_file (optional ): Path to the certificate file for client authentication.

  • key_file (optional ): Path to the private key file for client authentication.

GrpcEndpoint

{
address: "example_string",
tls_config: null,
concurrency_limit: null
}

Fields

  • address : The endpoint address (i.e. grpc(s)://example.com:443).

  • tls_config (optional ClientTlsConfig): The TLS configuration to use to connect to the endpoint (if grpcs).

  • concurrency_limit (optional usize): The maximum concurrency to allow on this endpoint.

GrpcStore

{
instance_name: "example_string",
endpoints: [],
store_type: "cas",
retry: {
max_retries: 42,
delay: 3.14,
jitter: 3.14,
retry_on_errors: null
},
max_concurrent_requests: 42,
connections_per_endpoint: 42
}

Fields

  • instance_name : Instance name for GRPC calls. Proxy calls will have the instance_name changed to this.

  • endpoints (list of GrpcEndpoint): The endpoint of the grpc connection.

  • store_type (StoreType): The type of the upstream store, this ensures that the correct server calls are made.

  • retry (Retry): Retry configuration to use when a network request fails.

  • max_concurrent_requests (usize): Limit the number of simultaneous upstream requests to this many. A value of zero is treated as unlimited. If the limit is reached the request is queued.

  • connections_per_endpoint (usize): The number of connections to make to each specified endpoint to balance the load over multiple TCP connections. Default 1.

ErrorCode

"Cancelled"

The possible error codes that might occur on an upstream request.

Variants

  • Cancelled

  • Unknown

  • InvalidArgument

  • DeadlineExceeded

  • NotFound

  • AlreadyExists

  • PermissionDenied

  • ResourceExhausted

  • FailedPrecondition

  • Aborted

  • OutOfRange

  • Unimplemented

  • Internal

  • Unavailable

  • DataLoss

  • Unauthenticated

RedisStore

{
addresses: [],
response_timeout_s: 42,
connection_timeout_s: 42,
experimental_pub_sub_channel: null,
key_prefix: "example_string",
mode: "Cluster"
}

Fields

  • addresses (list of ): The hostname or IP address of the Redis server. Ex: [“redis://username:password@redis-server-url:6380/99”], where 99 represents the database ID and 6380 represents the port.

  • response_timeout_s (u64): The response timeout for the Redis connection in seconds.

Default: 10

  • connection_timeout_s (u64): The connection timeout for the Redis connection in seconds.

Default: 10

  • experimental_pub_sub_channel (optional ): An optional and experimental Redis channel to publish write events to.

If set, every time a write operation is made to a Redis node then an event will be published to a Redis channel with the given name. If unset, the writes will still be made, but the write events will not be published.

Default: (Empty String / No Channel)

  • key_prefix : An optional prefix to prepend to all keys in this store.

Setting this value can make it convenient to query or organize your data according to the shared prefix.

Default: (Empty String / No Prefix)

  • mode (RedisMode): Set the mode Redis is operating in.

Available options are “cluster” for [cluster mode](https://redis.io/docs/latest/operate/oss_and_stack/reference/cluster-spec/), “sentinel” for [sentinel mode](https://redis.io/docs/latest/operate/oss_and_stack/management/sentinel/), or “standard” if Redis is operating in neither cluster nor sentinel mode.

Default: standard
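
A hedged example of a RedisStore entry with a hypothetical key prefix, spelling out the defaults described above:

"redis_store": {
  "addresses": [
    "redis://127.0.0.1:6379/"
  ],
  "key_prefix": "nativelink:",  // hypothetical prefix
  "mode": "standard",
  "response_timeout_s": 10,
  "connection_timeout_s": 10
}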

RedisMode

"Cluster"

Variants

  • Cluster

  • Sentinel

  • Standard

Retry

{
max_retries: 42,
delay: 3.14,
jitter: 3.14,
retry_on_errors: null
}

Retry configuration. This configuration is exponential, and on each iteration a jitter, as a percentage of the calculated delay, is applied. For example:

Retry{
max_retries: 7,
delay: 0.1,
jitter: 0.5,
}

will result in:

  • Attempt 1: 0ms
  • Attempt 2: 75ms - 125ms
  • Attempt 3: 150ms - 250ms
  • Attempt 4: 300ms - 500ms
  • Attempt 5: 600ms - 1s
  • Attempt 6: 1.2s - 2s
  • Attempt 7: 2.4s - 4s
  • Attempt 8: 4.8s - 8s

Remember that the delays are additive, meaning a single request that exhausts all of the attempts above would have a total delay of 9.525s - 15.875s.

Fields

  • max_retries (usize): Maximum number of retries until retrying stops. Setting this to zero will always attempt 1 time, but not retry.

  • delay (f32): Delay in seconds for exponential back off.

  • jitter (f32): Amount of jitter to add as a percentage in decimal form. This will change the formula like:

random(
(2 ^ {attempt_number}) * {delay} * (1 - (jitter / 2)),
(2 ^ {attempt_number}) * {delay} * (1 + (jitter / 2)),
)
  • retry_on_errors (optional list of ErrorCode): A list of error codes to retry on. If this is not set, then the default error codes to retry on are used. These default codes are the most likely to be non-permanent: Unknown, Cancelled, DeadlineExceeded, ResourceExhausted, Aborted, Internal, Unavailable, DataLoss.
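
A hedged example of a retry block that overrides the default error codes (the numeric values mirror the worked example above):

"retry": {
  "max_retries": 7,
  "delay": 0.1,
  "jitter": 0.5,
  "retry_on_errors": ["Unavailable", "DeadlineExceeded", "ResourceExhausted"]
}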