interuss · the-glu · Jun 25, 2026
diff --git a/docs/architecture/.nav.yml b/docs/architecture/.nav.yml
@@ -2,3 +2,4 @@ flatten_single_child_sections: true
 nav:
   - "Introduction": index.md
   - "Sizing": sizing.md
+  - "Latency": latency.md
diff --git a/docs/architecture/latency.md b/docs/architecture/latency.md
@@ -0,0 +1,253 @@
+# Latency
+
+## Introduction
+
+The DSS keeps its data across all instances of a DSS pool using a single
+distributed database cluster (e.g. CockroachDB, Yugabyte).
+
+Because the cluster is distributed and strongly consistent, the physical
+distance between nodes directly affects how fast the DSS can answer. Latency is
+therefore not a detail: it drives the responsiveness and the throughput of the
+whole service.
+
+## Impact on performance
+
+### Time to answer a request
+
+When a call is made to the DSS, the core service queries the database. The
+database is not a single node: it is a cluster spread across DSS participants.
+To stay strongly consistent, a write must reach a quorum of nodes before it is
+acknowledged, and a read may also need to contact other nodes depending on where
+the data lives.
+
+Each of these internal hops adds to the round-trip latency between nodes. A single
+API call can trigger more than one database operation, so the latencies stack
+up. If the nodes are far apart, every consensus round pays that distance, and
+the time to answer grows accordingly.
+
+![](../assets/generated/latency_tta.png)
+/// caption
+A generic request. Each
+arrow adds cumulative latency
+///
+
+### Bandwidth
+
+Latency and bandwidth interact. With high latency, each operation (assuming no parallelism) takes longer
+to complete, so fewer operations finish per second, which lowers effective
+throughput even when raw bandwidth is available. Synchronization traffic between
+nodes also competes for that bandwidth.
+
+This matters for high-volume cases such as many subscriptions or large query
+results: the more data must be transferred and synchronized between distant
+nodes, the more latency limits how much can realistically be moved in a given
+time window. There is a practical ceiling on how much can be transferred before
+responsiveness degrades.
+
+![](../assets/latency_impact.png)
+/// caption
+**Simulated** throughput vs
+latency on a virtual link, practical throughput is even lower.
+///
+
+### Example of latency vs performance
+
+The following graph shows, in a controlled environment, the impact of latency on
+simple requests.
+
+Tests have been done with DSS version v0.22.0, CockroachDB, 3 USS with one node,
+on a virtual machine with ample resources.
+
+Latency is injected with
+`tc qdisc add dev eth0 root netem delay Xms 2ms distribution paretonormal`, with
+X ranging from 0 to 50, in steps of 5ms. No loss is applied, nor latency between
+a DSS and its datastore.
+
+Performance is measured by creating and deleting RID ISAs, without any
+subscriptions, as it performs simple queries and doesn't create congestion
+issues.
+
+Queries are done against all DSS, with 3 processes in parallel (for a total
+of 9) and at the same time to remove as many variations as possible. Notice
+there is still some expected variance, the goal here is to show global trends.
+
+![](../assets/latency_rid.png)
+/// caption
+Results of testing RID ISA calls with
+various latencies.
+///
+
+This is just an example of one call, but it shows that even with a simple
+operation, there is a limit on how fast queries can be processed, no matter
+optimizations performed.
+
+## Recommendations
+
+Keep nodes as close as possible, ideally within the same region, to minimize the
+round-trip cost of consensus and synchronization.
+
+At the same time, you should probably keep distinct areas, datacenters and
+providers for redundancy. A set of servers in a single rack will be very fast,
+but have very low redundancy, for example if the rack loses power. On the
+opposite side of the options, servers spread throughout the world will offer
+maximum redundancy, for example against country-level issues, but will be very
+slow. The goal is a balance: geographically close enough for low latency,
+diverse enough for redundancy.
+
+Think and plan for the failures you want to handle, as a pool, and then spread
+nodes across failure domains accordingly for resilience.
+
+## Typical scenarios examples
+
+This section shows sample deployments, their latencies and their results on a
+specific test to have as reference real-life examples.
+
+Notice that values here are only a snapshot when they have been run. DSS code
+has and will improve in the future, cloud providers network resources will
+evolve with time and even placement of your virtual machine could randomly
+impact latency and performances.
+
+All examples are run with version v0.22.0, one node per location, with virtual
+machines with ample resources to focus on showing latency variations.
+CockroachDB encryption was off to simplify tests.
+
+Locust instances are always located next to the DSS being tested, on the same
+machine. The test ran is
+`docker run -e AUTH_SPEC="DummyOAuth(http://172.17.0.1:8085/token,localhost)"  -p 8089:8089 -v .:/app/ interuss/monitoring-dev uv run locust -f loadtest/locust_files/ISA.py -H http://172.17.0.1 -u 10 --uss-base-url http://dss.localututm`.
+The test has been chosen to be light and to be able to run it across a wide
+number of configurations, with the possibility of adding one subscription to
+force 'synchronization' between nodes. Others, more complex tests (like
+FlightsInSubs which create one implicit subscription for each call) generate too
+many contentions with version v0.22.0 to be meaningful.
+
+### US-West
+
+Cluster has been deployed on the west coast, in two providers. Regions between
+providers are close, and the two machines in the same provider are in the same
+region, but different availability zone.
+
+Latency measured by CockroachDB is as follows:
+
+![](../assets/generated/latency_west.png)
+
+Results after 2 minutes are:
+
+| Node | q/s   | 50th latency | 95th latency |
+| ---- | ----- | ------------ | ------------ |
+| N1   | 19.35 | 2ms          | 11ms         |
+| N2   | 19.01 | 3ms          | 14ms         |
+| N3   | 18.2  | 17ms         | 97ms         |
+
+showing good performances in general.
+
+Notice primary CockroachDB node is probably located in 'Provider 1'. Other test
+runs showed swapped latency, but that is to be expected.
+
+With one subscription added in the database, forcing a synchronized update:
+
+| Node | q/s   | 50th latency | 95th latency | Failures/s |
+| ---- | ----- | ------------ | ------------ | ---------- |
+| N1   | 19.1  | 2ms          | 17ms         | 0          |
+| N2   | 18.81 | 3ms          | 14ms         | 0          |
+| N3   | 17.53 | 16ms         | 120ms        | 0          |
+
+While this shows a slight decrease in performance (approximately 2 percent for
+q/s and 50th latency), the extra, synchronized step needed is almost invisible.
+
+### Across US
+
+Cluster has been deployed across the US, in two providers: one node in the west,
+one node in the east and, in a different provider, in the center.
+
+Latency measured by CockroachDB is as follows:
+
+![](../assets/generated/latency_coast_to_coast.png)
+
+Results after 2 minutes are:
+
+| Node | q/s   | 50th latency | 95th latency |
+| ---- | ----- | ------------ | ------------ |
+| N1   | 17.42 | 2ms          | 120ms        |
+| N2   | 17.08 | 64ms         | 350ms        |
+| N3   | 17.95 | 35ms         | 170ms        |
+
+That's a 7% loss in performances, 700% increased latency for the 50th percentile
+and more than 1000% of increase for the 95% percentile.
+
+With one subscription added in the database, forcing a synchronized update:
+
+| Node | q/s  | 50th latency | 95th latency | Failures/s |
+| ---- | ---- | ------------ | ------------ | ---------- |
+| N1   | 17.6 | 2ms          | 92ms         | 0          |
+| N2   | 4.2  | 2200ms       | 2500ms       | 0          |
+| N3   | 12.3 | 54ms         | 100ms        | 0          |
+
+That's a 40% loss in performances, 24000% increased latency for the 50th
+percentile and more than 6000% of increase for the 95% percentile, compared to
+the same step in previous scenario.
+
+The impact is way more visible there, as one is below 5q/s and very high
+latency. CockroachDB leader still performs well.
+
+### Across Atlantic Ocean
+
+Cluster has been deployed across the US and in Europe, in two providers: one
+node in the west, one in Europe and, in a different provider, in the center.
+
+Latency measured by CockroachDB is as follows:
+
+![](../assets/generated/latency_europe.png)
+
+Results after 2 minutes are:
+
+| Node | q/s  | 50th latency | 95th latency |
+| ---- | ---- | ------------ | ------------ |
+| N1   | 18.4 | 2ms          | 180ms        |
+| N2   | 14.2 | 150ms        | 630ms        |
+| N3   | 18.4 | 34ms         | 240ms        |
+
+That's a 9% loss in performances, 1650% increased latency for the 50th
+percentile and more than 2000% of increase for the 95% percentile.
+
+With one subscription added in the database, forcing a synchronized update:
+
+| Node | q/s  | 50th latency | 95th latency | Failures/s |
+| ---- | ---- | ------------ | ------------ | ---------- |
+| N1   | 10.4 | 570ms        | 4600ms       | 0.15       |
+| N2   | 1.85 | 10000ms      | 10000ms      | 0.5        |
+| N3   | 5.45 | 1200ms       | 10000ms      | 0.18       |
+
+That's a 69% loss in performances, 122000% increased latency for the 50th
+percentile and more than 35000% of increase for the 95% percentile, compared to
+the same step in the first scenario. Compared to the non-synchronized case, we
+lost about 63% of queries per second.
+
+The DSS is almost unusable there, especially for locations without the
+CockroachDB leader, but even direct queries to it are affected. Most queries in
+Europe timeout.
+
+### Summary
+
+![](../assets/latency_summary.png)
+
+The DSS runs on a 3-node CockroachDB cluster, tested across 3 geographic setups.
+The further apart the nodes, the higher the inter-node latency (8.9ms -> 38.9ms
+-> 94ms), which directly hits performance. Without a synchronized step, the
+impact stays moderate: q/s drops only ~7% (across US) to ~9% (Atlantic). But
+once a synchronized operation is forced (adding a subscription), the gap
+explodes with distance. In US-West it's invisible, coast-to-coast q/s falls ~40%
+with 50th latencies around ~750ms, and across the Atlantic the system becomes
+nearly unusable: average q/s down to ~6 (−69%), latencies capped at 10000ms, and
+failures appearing only there (up to 0.5/s).
+
+This test was done on endpoints performing relatively simple queries. More
+complex ones, like SCD are even more affected.
+
+## Conclusion
+
+Repartition of nodes is a trade-off of latency against redundancy. Because the
+cluster is distributed and strongly consistent, node distance directly shapes
+both response time and effective throughput. Placing nodes close together keeps
+the service fast, while spreading them across failure domains keeps it
+resilient. Choosing where nodes live is mainly about finding the right balance
+between these two for the operational needs of the DSS pool.
diff --git a/docs/assets/generated/latency_coast_to_coast.png b/docs/assets/generated/latency_coast_to_coast.png
diff --git a/docs/assets/generated/latency_europe.png b/docs/assets/generated/latency_europe.png
diff --git a/docs/assets/generated/latency_tta.png b/docs/assets/generated/latency_tta.png
diff --git a/docs/assets/generated/latency_west.png b/docs/assets/generated/latency_west.png
diff --git a/docs/assets/latency_coast_to_coast.gv b/docs/assets/latency_coast_to_coast.gv
@@ -0,0 +1,14 @@
+digraph time_to_answer {
+    rankdir=LR;
+    node [shape=circle, colorscheme=paired8, width=2.0, fixedsize=true];
+    graph [dpi = 300];
+
+    N1 [label="N1\nProvider 1\nRegion W",color=1]
+    N2 [label="N2\nProvider 1\nRegion E",color=2]
+    N3 [label="N3\nProvider 2\nRegion C",color=3]
+
+    N1 -> N2 [label="~58.8ms",dir=both]
+    N1 -> N3 [label="~31.1ms",dir=both]
+    N2 -> N3 [label="~26.7ms",dir=both]
+
+}
diff --git a/docs/assets/latency_europe.gv b/docs/assets/latency_europe.gv
@@ -0,0 +1,14 @@
+digraph time_to_answer {
+    rankdir=LR;
+    node [shape=circle, colorscheme=paired8, width=2.0, fixedsize=true];
+    graph [dpi = 300];
+
+    N1 [label="N1\nProvider 1\nRegion W",color=1]
+    N2 [label="N2\nProvider 1\nRegion E",color=2]
+    N3 [label="N3\nProvider 2\nRegion C",color=3]
+
+    N1 -> N2 [label="~141.9ms",dir=both]
+    N1 -> N3 [label="~31.2ms",dir=both]
+    N2 -> N3 [label="~109ms",dir=both]
+
+}
diff --git a/docs/assets/latency_impact.png b/docs/assets/latency_impact.png
diff --git a/docs/assets/latency_rid.png b/docs/assets/latency_rid.png
diff --git a/docs/assets/latency_summary.png b/docs/assets/latency_summary.png
diff --git a/docs/assets/latency_tta.gv b/docs/assets/latency_tta.gv
@@ -0,0 +1,28 @@
+digraph time_to_answer {
+    rankdir=LR;
+    node [shape=box, colorscheme=paired8];
+    graph [dpi = 300];
+
+    client   [label="Client"];
+    core     [label="DSS"];
+
+    subgraph cluster_db {
+        label="Distributed DB cluster";
+        style=dashed;
+        leader   [label="Node\n(leaseholder)"];
+        peer1    [label="Peer node"];
+        peer2    [label="Peer node"];
+    }
+
+    client -> core    [label="API call"];
+    core   -> leader  [label="query"];
+
+    leader -> peer1   [label="consensus\nround-trip"];
+    leader -> peer2   [label="consensus\nround-trip"];
+    peer1  -> leader  [label="ack"];
+    peer2  -> leader  [label="ack (quorum)"];
+
+    leader -> core    [label="result"];
+    core   -> client  [label="response"];
+
+}
diff --git a/docs/assets/latency_west.gv b/docs/assets/latency_west.gv
@@ -0,0 +1,14 @@
+digraph time_to_answer {
+    rankdir=LR;
+    node [shape=circle, colorscheme=paired8, width=2.0, fixedsize=true];
+    graph [dpi = 300];
+
+    N1 [label="N1\nProvider 1\nRegion 1\nZone 1",color=1]
+    N2 [label="N2\nProvider 1\nRegion 1\nZone 2",color=2]
+    N3 [label="N3\nProvider 2\nRegion 1",color=3]
+
+    N1 -> N2 [label="~0.78ms",dir=both]
+    N1 -> N3 [label="~13.4ms",dir=both]
+    N2 -> N3 [label="~12.6ms",dir=both]
+
+}
diff --git a/docs/deployment_checklist.md b/docs/deployment_checklist.md
@@ -10,6 +10,7 @@ This checklist outlines the major decisions and steps required to deploy a non-l
     * This repository provides Terraform configurations for [Amazon Web Services (EKS)](infrastructure/aws.md) and [Google Cloud (GKE)](infrastructure/google.md) to deploy a Kubernetes cluster (the infrastructure into which the Services will be deployed).
     * This repository provides [Tanka](https://github.com/interuss/dss/blob/master/deploy/services/tanka/) files and [Helm Charts](https://github.com/interuss/dss/blob/master/deploy/services/helm-charts/dss) to be used to deploy Services into a Kubernetes cluster. Terraform will automatically generate these configurations if needed.
     * You may also choose to deploy manually or use custom configuration tools.
+    * Latency is an important factor to achieve good performances. Review the [latency documentation](architecture/latency.md) and plan with others participants of your DSS pool.
 * [ ] Prepare sufficient resources for the services.
     * In particular, review the [CockroachDB recommendations](https://www.cockroachlabs.com/docs/v24.1/recommended-production-settings#cloud-specific-recommendations) and [YugabyteDB recommendations](https://docs.yugabyte.com/stable/deploy/checklist/#public-clouds); the datastore will consume the majority of the resources.
     * Example sizing is also describled in [sizing](architecture/sizing.md).

diff --git a/mkdocs.yml b/mkdocs.yml
@@ -104,5 +104,8 @@ markdown_extensions:
 
   - attr_list  # Enable creation of custom anchors
 
+  - pymdownx.blocks.caption
+
+
 extra_javascript:
   - assets/checkbox.js