diff --git a/docs/architecture/.nav.yml b/docs/architecture/.nav.yml index baa69923d..5608de0d1 100644 --- a/docs/architecture/.nav.yml +++ b/docs/architecture/.nav.yml @@ -2,3 +2,5 @@ flatten_single_child_sections: true nav: - "Introduction": index.md - "Sizing": sizing.md + - "Latency": latency.md + - "Latency examples": latency-examples.md diff --git a/docs/architecture/latency-examples.md b/docs/architecture/latency-examples.md new file mode 100644 index 000000000..555d4c8cf --- /dev/null +++ b/docs/architecture/latency-examples.md @@ -0,0 +1,148 @@ +# Latency: Typical scenarios examples + +This section showcases sample deployments, their latencies, and their results +on a specific test to serve as real-life reference examples. The importance of +latency is described in [this section](https://www.google.com/search?q=latency.md). + +Notice that values here are only a snapshot when they have been run. DSS code +has and will improve in the future, cloud providers network resources will +evolve with time and even placement of your virtual machine could randomly +impact latency and performances. + +All examples are run with version v0.22.0, one node per location, with virtual +machines with ample resources to focus on showing latency variations. +CockroachDB encryption was off to simplify tests. + +Locust instances are always located next to the DSS being tested, on the same +machine. The test ran is: + +```docker run -e AUTH_SPEC="DummyOAuth(http://172.17.0.1:8085/token,localhost)" -p 8089:8089 -v .:/app/ interuss/monitoring-dev uv run locust -f loadtest/locust_files/ISA.py -H http://172.17.0.1 -u 10 --uss-base-url http://dss.localututm```. + +The test has been chosen to be light and to be able to run it across a wide +number of configurations, with the possibility of adding one subscription to +force 'synchronization' between nodes. Others, more complex tests (like +FlightsInSubs which create one implicit subscription for each call) generate too +many contentions with version v0.22.0 to be meaningful. + +## US-West + +Cluster has been deployed on the west coast, in two providers. Regions between +providers are close, and the two machines in the same provider are in the same +region, but different availability zone. + +Latency measured by CockroachDB is as follows: + +![](../assets/generated/latency_west.png) + +Results after 2 minutes are: + +| Node | q/s | 50th latency | 95th latency | +| ---- | ----- | ------------ | ------------ | +| N1 | 19.35 | 2ms | 11ms | +| N2 | 19.01 | 3ms | 14ms | +| N3 | 18.2 | 17ms | 97ms | + +showing good performances in general. + +Notice primary CockroachDB node is probably located in 'Provider 1'. Other test +runs showed swapped latency, but that is to be expected. + +With one subscription added in the database, forcing a synchronized update: + +| Node | q/s | 50th latency | 95th latency | Failures/s | +| ---- | ----- | ------------ | ------------ | ---------- | +| N1 | 19.1 | 2ms | 17ms | 0 | +| N2 | 18.81 | 3ms | 14ms | 0 | +| N3 | 17.53 | 16ms | 120ms | 0 | + +While this shows a slight decrease in performance (approximately 2 percent for +q/s and 50th latency), the extra, synchronized step needed is almost invisible. + +## Across US + +Cluster has been deployed across the US, in two providers: one node in the west, +one node in the east and, in a different provider, in the center. + +Latency measured by CockroachDB is as follows: + +![](../assets/generated/latency_coast_to_coast.png) + +Results after 2 minutes are: + +| Node | q/s | 50th latency | 95th latency | +| ---- | ----- | ------------ | ------------ | +| N1 | 17.42 | 2ms | 120ms | +| N2 | 17.08 | 64ms | 350ms | +| N3 | 17.95 | 35ms | 170ms | + +That's a 7% loss in performances, 700% increased latency for the 50th percentile +and more than 1000% of increase for the 95% percentile. + +With one subscription added in the database, forcing a synchronized update: + +| Node | q/s | 50th latency | 95th latency | Failures/s | +| ---- | ---- | ------------ | ------------ | ---------- | +| N1 | 17.6 | 2ms | 92ms | 0 | +| N2 | 4.2 | 2200ms | 2500ms | 0 | +| N3 | 12.3 | 54ms | 100ms | 0 | + +That's a 40% loss in performances, 24000% increased latency for the 50th +percentile and more than 6000% of increase for the 95% percentile, compared to +the same step in previous scenario. + +The impact is way more visible there, as one is below 5q/s and very high +latency. CockroachDB leader still performs well. + +## Across Atlantic Ocean + +Cluster has been deployed across the US and in Europe, in two providers: one +node in the west, one in Europe and, in a different provider, in the center. + +Latency measured by CockroachDB is as follows: + +![](../assets/generated/latency_europe.png) + +Results after 2 minutes are: + +| Node | q/s | 50th latency | 95th latency | +| ---- | ---- | ------------ | ------------ | +| N1 | 18.4 | 2ms | 180ms | +| N2 | 14.2 | 150ms | 630ms | +| N3 | 18.4 | 34ms | 240ms | + +That's a 9% loss in performances, 1650% increased latency for the 50th +percentile and more than 2000% of increase for the 95% percentile. + +With one subscription added in the database, forcing a synchronized update: + +| Node | q/s | 50th latency | 95th latency | Failures/s | +| ---- | ---- | ------------ | ------------ | ---------- | +| N1 | 10.4 | 570ms | 4600ms | 0.15 | +| N2 | 1.85 | 10000ms | 10000ms | 0.5 | +| N3 | 5.45 | 1200ms | 10000ms | 0.18 | + +That's a 69% loss in performances, 122000% increased latency for the 50th +percentile and more than 35000% of increase for the 95% percentile, compared to +the same step in the first scenario. Compared to the non-synchronized case, we +lost about 63% of queries per second. + +The DSS is almost unusable there, especially for locations without the +CockroachDB leader, but even direct queries to it are affected. Most queries in +Europe timeout. + +## Summary + +![](../assets/latency_summary.png) + +The DSS runs on a 3-node CockroachDB cluster, tested across 3 geographic setups. +The further apart the nodes, the higher the inter-node latency (8.9ms -> 38.9ms +-> 94ms), which directly hits performance. Without a synchronized step, the +impact stays moderate: q/s drops only ~7% (across US) to ~9% (Atlantic). But +once a synchronized operation is forced (adding a subscription), the gap +explodes with distance. In US-West it's invisible, coast-to-coast q/s falls ~40% +with 50th latencies around ~750ms, and across the Atlantic the system becomes +nearly unusable: average q/s down to ~6 (−69%), latencies capped at 10000ms, and +failures appearing only there (up to 0.5/s). + +This test was done on endpoints performing relatively simple queries. More +complex ones, like SCD are even more affected. diff --git a/docs/architecture/latency.md b/docs/architecture/latency.md new file mode 100644 index 000000000..00c1d7378 --- /dev/null +++ b/docs/architecture/latency.md @@ -0,0 +1,101 @@ +# Latency + +The DSS keeps its data across all instances of a DSS pool using a single +distributed database cluster (e.g. CockroachDB, Yugabyte). + +Because the cluster is distributed and strongly consistent, the physical +distance between nodes directly affects how fast the DSS can answer. Latency is +therefore not a detail: it drives the responsiveness and the throughput of the +whole service. + +## Impact on performance + +### Time to answer a request + +When a call is made to the DSS, the core service queries the database. The +database is not a single node: it is a cluster spread across DSS participants. +To stay strongly consistent, a write must reach a quorum of nodes before it is +acknowledged, and a read may also need to contact other nodes depending on where +the data lives. + +Each of these internal hops adds to the round-trip latency between nodes. A single +API call can trigger more than one database operation, so the latencies stack +up. If the nodes are far apart, every consensus round pays that distance, and +the time to answer grows accordingly. + +![](../assets/generated/latency_tta.png) +/// caption +A generic request. Each +arrow adds cumulative latency +/// + +### Bandwidth + +Latency and bandwidth interact. With high latency, each operation (assuming no parallelism) takes longer +to complete, so fewer operations finish per second, which lowers effective +throughput even when raw bandwidth is available. Synchronization traffic between +nodes also competes for that bandwidth. + +This matters for high-volume cases such as many subscriptions or large query +results: the more data must be transferred and synchronized between distant +nodes, the more latency limits how much can realistically be moved in a given +time window. There is a practical ceiling on how much can be transferred before +responsiveness degrades. + +![](../assets/latency_impact.png) +/// caption +**Simulated** throughput vs +latency on a virtual link, practical throughput is even lower. +/// + +### Example of latency vs performance + +The following graph shows, in a controlled environment, the impact of latency on +simple requests. + +Tests have been done with DSS version v0.22.0, CockroachDB, 3 USS with one node, +on a virtual machine with ample resources. + +Latency is injected with +`tc qdisc add dev eth0 root netem delay Xms 2ms distribution paretonormal` ([documentation](https://man7.org/linux/man-pages/man8/tc-netem.8.html)), with +X ranging from 0 to 50, in steps of 5ms. No loss is applied, nor latency between +a DSS and its datastore. + +Performance is measured by creating and deleting RID ISAs, without any +subscriptions, as it performs simple queries and doesn't create congestion +issues. + +Queries are done against all DSS, with 3 processes in parallel (for a total +of 9) and at the same time to remove as many variations as possible. Notice +there is still some expected variance, the goal here is to show global trends. + +![](../assets/latency_rid.png) +/// caption +Results of testing RID ISA calls with +various latencies. +/// + +This is just an example of one call, but it shows that even with a simple +operation, there is a limit on how fast queries can be processed, no matter +optimizations performed. + +## Recommendations + +Keep nodes as close as possible, ideally within the same region, to minimize the +round-trip cost of consensus and synchronization. + +At the same time, you should probably keep distinct areas, datacenters and +providers for redundancy. A set of servers in a single rack will be very fast, +but have very low redundancy, for example if the rack loses power. On the +opposite side of the options, servers spread throughout the world will offer +maximum redundancy, for example against country-level issues, but will be very +slow. The goal is a balance: geographically close enough for low latency, +diverse enough for redundancy. + +Think and plan for the failures you want to handle, as a pool, and then spread +nodes across failure domains accordingly for resilience. + +## Typical scenarios + +Examples of typical scenarios, along with their latencies and impacts on +performance, are shown in this [dedicated section](latency-examples.md). diff --git a/docs/assets/generated/latency_coast_to_coast.png b/docs/assets/generated/latency_coast_to_coast.png new file mode 100644 index 000000000..651b460cc Binary files /dev/null and b/docs/assets/generated/latency_coast_to_coast.png differ diff --git a/docs/assets/generated/latency_europe.png b/docs/assets/generated/latency_europe.png new file mode 100644 index 000000000..fc46d61fb Binary files /dev/null and b/docs/assets/generated/latency_europe.png differ diff --git a/docs/assets/generated/latency_tta.png b/docs/assets/generated/latency_tta.png new file mode 100644 index 000000000..54a4336c0 Binary files /dev/null and b/docs/assets/generated/latency_tta.png differ diff --git a/docs/assets/generated/latency_west.png b/docs/assets/generated/latency_west.png new file mode 100644 index 000000000..405c0348a Binary files /dev/null and b/docs/assets/generated/latency_west.png differ diff --git a/docs/assets/latency_coast_to_coast.gv b/docs/assets/latency_coast_to_coast.gv new file mode 100644 index 000000000..bda7efa8e --- /dev/null +++ b/docs/assets/latency_coast_to_coast.gv @@ -0,0 +1,14 @@ +digraph time_to_answer { + rankdir=LR; + node [shape=circle, colorscheme=paired8, width=2.0, fixedsize=true]; + graph [dpi = 300]; + + N1 [label="N1\nProvider 1\nRegion W",color=1] + N2 [label="N2\nProvider 1\nRegion E",color=2] + N3 [label="N3\nProvider 2\nRegion C",color=3] + + N1 -> N2 [label="~58.8ms",dir=both] + N1 -> N3 [label="~31.1ms",dir=both] + N2 -> N3 [label="~26.7ms",dir=both] + +} diff --git a/docs/assets/latency_europe.gv b/docs/assets/latency_europe.gv new file mode 100644 index 000000000..6e50e8f1b --- /dev/null +++ b/docs/assets/latency_europe.gv @@ -0,0 +1,14 @@ +digraph time_to_answer { + rankdir=LR; + node [shape=circle, colorscheme=paired8, width=2.0, fixedsize=true]; + graph [dpi = 300]; + + N1 [label="N1\nProvider 1\nRegion W",color=1] + N2 [label="N2\nProvider 1\nRegion E",color=2] + N3 [label="N3\nProvider 2\nRegion C",color=3] + + N1 -> N2 [label="~141.9ms",dir=both] + N1 -> N3 [label="~31.2ms",dir=both] + N2 -> N3 [label="~109ms",dir=both] + +} diff --git a/docs/assets/latency_impact.png b/docs/assets/latency_impact.png new file mode 100644 index 000000000..90c7328f2 Binary files /dev/null and b/docs/assets/latency_impact.png differ diff --git a/docs/assets/latency_rid.png b/docs/assets/latency_rid.png new file mode 100644 index 000000000..82fcce2db Binary files /dev/null and b/docs/assets/latency_rid.png differ diff --git a/docs/assets/latency_summary.png b/docs/assets/latency_summary.png new file mode 100644 index 000000000..505fb7aac Binary files /dev/null and b/docs/assets/latency_summary.png differ diff --git a/docs/assets/latency_tta.gv b/docs/assets/latency_tta.gv new file mode 100644 index 000000000..00a0ed2e7 --- /dev/null +++ b/docs/assets/latency_tta.gv @@ -0,0 +1,28 @@ +digraph time_to_answer { + rankdir=LR; + node [shape=box, colorscheme=paired8]; + graph [dpi = 300]; + + client [label="Client"]; + core [label="DSS"]; + + subgraph cluster_db { + label="Distributed DB cluster"; + style=dashed; + leader [label="Node\n(leaseholder)"]; + peer1 [label="Peer node"]; + peer2 [label="Peer node"]; + } + + client -> core [label="API call"]; + core -> leader [label="query"]; + + leader -> peer1 [label="consensus\nround-trip"]; + leader -> peer2 [label="consensus\nround-trip"]; + peer1 -> leader [label="ack"]; + peer2 -> leader [label="ack (quorum)"]; + + leader -> core [label="result"]; + core -> client [label="response"]; + +} diff --git a/docs/assets/latency_west.gv b/docs/assets/latency_west.gv new file mode 100644 index 000000000..0aa919c6f --- /dev/null +++ b/docs/assets/latency_west.gv @@ -0,0 +1,14 @@ +digraph time_to_answer { + rankdir=LR; + node [shape=circle, colorscheme=paired8, width=2.0, fixedsize=true]; + graph [dpi = 300]; + + N1 [label="N1\nProvider 1\nRegion 1\nZone 1",color=1] + N2 [label="N2\nProvider 1\nRegion 1\nZone 2",color=2] + N3 [label="N3\nProvider 2\nRegion 1",color=3] + + N1 -> N2 [label="~0.78ms",dir=both] + N1 -> N3 [label="~13.4ms",dir=both] + N2 -> N3 [label="~12.6ms",dir=both] + +} diff --git a/docs/deployment_checklist.md b/docs/deployment_checklist.md index b781cd136..03f13714d 100644 --- a/docs/deployment_checklist.md +++ b/docs/deployment_checklist.md @@ -10,6 +10,7 @@ This checklist outlines the major decisions and steps required to deploy a non-l * This repository provides Terraform configurations for [Amazon Web Services (EKS)](infrastructure/aws.md) and [Google Cloud (GKE)](infrastructure/google.md) to deploy a Kubernetes cluster (the infrastructure into which the Services will be deployed). * This repository provides [Tanka](https://github.com/interuss/dss/blob/master/deploy/services/tanka/) files and [Helm Charts](https://github.com/interuss/dss/blob/master/deploy/services/helm-charts/dss) to be used to deploy Services into a Kubernetes cluster. Terraform will automatically generate these configurations if needed. * You may also choose to deploy manually or use custom configuration tools. + * Latency is an important factor to achieve good performances. Review the [latency documentation](architecture/latency.md) and plan with others participants of your DSS pool. * [ ] Prepare sufficient resources for the services. * In particular, review the [CockroachDB recommendations](https://www.cockroachlabs.com/docs/v24.1/recommended-production-settings#cloud-specific-recommendations) and [YugabyteDB recommendations](https://docs.yugabyte.com/stable/deploy/checklist/#public-clouds); the datastore will consume the majority of the resources. * Example sizing is also describled in [sizing](architecture/sizing.md). diff --git a/mkdocs.yml b/mkdocs.yml index a044de754..e45deba43 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -104,5 +104,8 @@ markdown_extensions: - attr_list # Enable creation of custom anchors + - pymdownx.blocks.caption + + extra_javascript: - assets/checkbox.js