[2/N] [History Server Beta] [Feat] LRU cache #4796
Draft
JiangJiaWei1103 wants to merge 28 commits into
- Add package doc
- Move flag declarations into a var() block
- Add description strings to each flag
- Add section dividers for readability
- Group imports into stdlib / third-party / project
- Remove dead initialization of runtimeClassConfigPath (overwritten by flag default)
No behavior change.
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

- Use logrus.Fatalf(...) for consistent log formatting
- Add required-flag check on --runtime-class-name for fast fail
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>

- Replace manual sigChan + signal.Notify with signal.NotifyContext
- Add bridge goroutine that closes the legacy stop ch when serverCtx fires
No behavior change.
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
- Add a Supervisor as /enter_cluster broker
- Use in-process loaded set to record processed sessions
- Add a Pipeline that wraps the parse steps
- Remove eager processing on startup and ticker
Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
Alpha e2e tests pass. The following demonstrates the manual test: History Server Beta - Lazy Loading + LRU Cache
Will switch to "Ready for review" once #4795 ([1/N]) is merged.
…name Signed-off-by: JiangJiaWei1103 <waynechuang97@gmail.com>
071c280 to
135c954
Why are these changes needed?
Alpha Problems (Continued from [1/N])
- EventHandler in-memory maps without eviction
Beta Strategy
In this PR, we add a per-replica LRU cache bounded by the session snapshot count, so memory usage is bounded by O(#sessions x max_snapshot_size).
Note
This PR keeps the shared EventHandler populated alongside the LRU so reviewers can focus on bounded memory via LRU eviction, with behavior matching [1/N]. Overall cleanup of the shared maps will be done in the next PR.
Overview
System Architecture
Diff vs [1/N]:
- Supervisor's loaded set (binary present/absent) replaced by SnapshotLoader (bounded, evicting)
- Pipeline builds the SessionSnapshot from EventHandler's per-session view
Request Data Flow
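The load-or-build path that this PR describes (the Supervisor calling SnapshotLoader.Load, falling back to the Pipeline on a miss, then calling SnapshotLoader.Prime) might look roughly like the sketch below. The loader interface, the mapLoader stand-in, snapshotKey, and enterCluster are hypothetical names introduced for illustration; the singleflight deduplication and the Retry-After response mentioned in this PR are deliberately left out, and the sample cluster and session names are invented.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrSnapshotNotFound reuses the error name from the PR description.
var ErrSnapshotNotFound = errors.New("snapshot not found")

// Snapshot is a stand-in for the PR's SessionSnapshot.
type Snapshot struct{ Key string }

// loader is a hypothetical subset of what SnapshotLoader offers.
type loader interface {
	Load(key string) (*Snapshot, error)
	Prime(key string, snap *Snapshot)
}

// mapLoader is an unbounded stand-in; the PR's loader is a bounded LRU.
type mapLoader map[string]*Snapshot

func (m mapLoader) Load(key string) (*Snapshot, error) {
	if s, ok := m[key]; ok {
		return s, nil
	}
	return nil, ErrSnapshotNotFound
}

func (m mapLoader) Prime(key string, s *Snapshot) { m[key] = s }

// snapshotKey follows the {clusterNameID}/{sessionName} shape named in the takeaways.
func snapshotKey(clusterNameID, sessionName string) string {
	return clusterNameID + "/" + sessionName
}

// enterCluster returns the snapshot for key and whether it was a cache hit.
// On a miss it calls build (standing in for the Pipeline) and primes the
// loader so the next read for the same key needs no further IO.
func enterCluster(l loader, key string, build func(string) (*Snapshot, error)) (*Snapshot, bool, error) {
	snap, err := l.Load(key)
	if err == nil {
		return snap, true, nil
	}
	if !errors.Is(err, ErrSnapshotNotFound) {
		return nil, false, err
	}
	snap, err = build(key)
	if err != nil {
		return nil, false, err
	}
	l.Prime(key, snap)
	return snap, false, nil
}

func main() {
	builds := 0
	build := func(key string) (*Snapshot, error) {
		builds++ // count how often the expensive path runs
		return &Snapshot{Key: key}, nil
	}

	l := mapLoader{}
	key := snapshotKey("cluster-a", "session-1") // invented sample values

	_, hit1, _ := enterCluster(l, key, build)
	_, hit2, _ := enterCluster(l, key, build)
	fmt.Println(key, hit1, hit2, builds) // cluster-a/session-1 false true 1
}
```

The second call is served from the primed entry, so the build function runs only once for the key.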
Takeaways:
- {clusterNameID}/{sessionName} (matches singleflight key shape)
- Prime plants the freshly-built snapshot into the LRU so the immediate follow-up handler call is zero-IO
- Retry-After; frontend re-fires /enter_cluster
Change Summary
- SessionSnapshot (new), pkg/snapshot/snapshot.go
- SnapshotLoader (new), pkg/historyserver/cache.go: Load returns ErrSnapshotNotFound on miss; Prime inserts a freshly-built snapshot
- GetRawEventsByJobID (new method), pkg/eventserver/types/log_event.go: LogEvent view (vs. existing camelCase API view) for snapshot building
- Pipeline (modified), pkg/historyserver/pipeline.go: ProcessSession returns (SessionStatus, *SessionSnapshot, error); new helpers buildSnapshotFromHandler + groupTasksByID
- Supervisor (modified), pkg/historyserver/supervisor.go: loaded set, using SnapshotLoader.Load + SnapshotLoader.Prime
- pkg/historyserver/router.go: (/tasks, /actors, /jobs, /nodes, /events, /api/cluster_status) read from loader.Load
- pkg/historyserver/timeline.go: generateTimelineFromSnapshot + helpers (taskPrefix, getChromeTraceColor, extractActorIDFromTaskID)
- ServerHandler (modified), pkg/historyserver/server.go: *SnapshotLoader reference for handler reads
- cmd/historyserver/main.go: SnapshotLoader; new --snapshot-cache-size flag (default 100)
Related issue number
Part of #4709.
How to test
Note on pkg/eventserver failures. Running go test ./pkg/eventserver/ shows 5 pre-existing failures on master:
Checks