Incremental Parsing #2387
base: main
Changes from 19 commits
8e006a4
fc133cb
448776e
b877c4c
1857dad
984c3b1
e278045
68502d5
7ea0881
c7ad0bc
889bc91
22413ae
5b6ee14
af3d393
26bcdce
817f912
7928a59
d6ea352
4b9d1af
ea96a77
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -39,7 +39,9 @@ import { FlowrAnalyzerPlugin } from '../project/plugins/flowr-analyzer-plugin'; | |
| import { FlowrAnalyzerEnvironmentContext } from '../project/context/flowr-analyzer-environment-context'; | ||
| import { FlowrAnalyzerFunctionsContext } from '../project/context/flowr-analyzer-functions-context'; | ||
| import { FlowrAnalyzerMetaContext } from '../project/context/flowr-analyzer-meta-context'; | ||
| import { FlowrAnalyzerIncrementalAnalysisContext } from '../project/context/flowr-analyzer-incremental-analysis-context'; | ||
| import { FlowrConfig } from '../config'; | ||
| import { FlowrInlineTextFile } from '../project/context/flowr-file'; | ||
|
|
||
| async function analyzerQuickExample() { | ||
| const analyzer = await new FlowrAnalyzerBuilder() | ||
|
|
@@ -99,11 +101,12 @@ ${ | |
| 'How to add a new plugin': undefined, | ||
| }, | ||
| 'Context Information': { | ||
| 'Files Context': undefined, | ||
| 'Loading Order Context': undefined, | ||
| 'Dependencies Context': undefined, | ||
| 'Environment Context': undefined, | ||
| 'Meta Context': undefined, | ||
| 'Files Context': undefined, | ||
| 'Loading Order Context': undefined, | ||
| 'Dependencies Context': undefined, | ||
| 'Environment Context': undefined, | ||
| 'Meta Context': undefined, | ||
| 'Incremental Analysis Context': undefined, | ||
| }, | ||
| 'Caching': undefined | ||
| }) | ||
|
|
@@ -478,6 +481,50 @@ and the project namespace via | |
| ${ctx.linkM(FlowrAnalyzerMetaContext, 'getNamespace', { codeFont: true, realNameWrapper: 'i' })}. | ||
|
|
||
|
|
||
| ${section('Incremental Analysis Context', 3)} | ||
|
|
||
| The ${ctx.link(FlowrAnalyzerIncrementalAnalysisContext)} is a context that stores analysis information needed for making the next analysis run incremental by reusing the previous analysis results: | ||
|
|
||
| ${ctx.hierarchy(FlowrAnalyzerIncrementalAnalysisContext, { showImplSnippet: false })} | ||
|
|
||
| This context is not an analysis-result cache by itself. | ||
| Instead, it carries forward the minimal state needed by future incremental phases after an invalidation happened. | ||
| At the moment, it is used for incremental parsing with Tree-sitter, but it is intended to become the shared context for additional incremental analysis stages as well. | ||
|
|
||
| If the analyzer or context is reset, the incremental information is discarded via | ||
| ${ctx.linkM(FlowrAnalyzerIncrementalAnalysisContext, 'reset', { codeFont: true, realNameWrapper: 'i' })}. | ||
| In other words, this context only transports incremental handoff state between analysis runs. | ||
|
|
||
| ${section('Incremental Parsing', 4)} | ||
|
|
||
| Currently, the implemented use of this context is Tree-sitter's incremental parsing support. | ||
| When a file is represented by a mutable file provider such as ${ctx.link('FlowrInlineTextFile')} and its content is invalidated via | ||
| ${ctx.linkM(FlowrInlineTextFile, 'invalidate', { codeFont: true, realNameWrapper: 'i' })}, | ||
| the analyzer receives a file invalidation event. | ||
| At that point, the incremental context only records the file path together with the old source text. | ||
| No edit region is computed eagerly during invalidation. | ||
|
|
||
| After a successful parse-oriented analysis run, the analyzer cache stores the latest Tree-sitter parse trees in this context via | ||
| ${ctx.linkM(FlowrAnalyzerIncrementalAnalysisContext, 'storeOldParseResults', { codeFont: true, realNameWrapper: 'i' })}. | ||
| This gives the next parse run access to the last completed parse snapshot for each file path. | ||
|
|
||
| On the next parse run, Tree-sitter combines both pieces of information lazily: | ||
|
|
||
| * the previous parse tree obtained from | ||
| ${ctx.linkM(FlowrAnalyzerIncrementalAnalysisContext, 'getOldParseResultOf', { codeFont: true, realNameWrapper: 'i' })} | ||
| * the old source text obtained from | ||
| ${ctx.linkM(FlowrAnalyzerIncrementalAnalysisContext, 'getAndRemoveOldContentOf', { codeFont: true, realNameWrapper: 'i' })} | ||
|
|
||
| Using these together with the current file content, flowR computes a minimal ${ctx.link('Parser.Edit')} only when a new parse is actually requested. | ||
| If the file content did not change, the previous tree can be reused directly. | ||
| Otherwise, the edit is applied to the previous tree and Tree-sitter reparses incrementally instead of starting from scratch. | ||
| The stored old-content entry is removed as soon as it is read, so the recorded invalidation state lives only until the next parse that uses it. | ||
|
Member
Author
The last sentence is perhaps still a bit confusing grammatically; it could be rephrased a little 🙈
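The lazy edit computation described in the documentation hunk above can be illustrated with a minimal sketch. It assumes a plain common-prefix/common-suffix comparison of the old and new text; the names `TextEdit` and `computeMinimalEdit` are hypothetical and not taken from this PR. A real Tree-sitter edit additionally carries row/column positions, which are omitted here.

```typescript
// Hypothetical stand-in for the offset-based part of a Tree-sitter edit.
interface TextEdit {
	startIndex:  number; // first offset at which old and new text differ
	oldEndIndex: number; // exclusive end of the changed region in the old text
	newEndIndex: number; // exclusive end of the changed region in the new text
}

// Returns undefined if nothing changed, so the previous tree could be reused as-is.
function computeMinimalEdit(oldText: string, newText: string): TextEdit | undefined {
	if(oldText === newText) {
		return undefined;
	}
	// length of the common prefix
	const maxPrefix = Math.min(oldText.length, newText.length);
	let prefix = 0;
	while(prefix < maxPrefix && oldText[prefix] === newText[prefix]) {
		prefix++;
	}
	// length of the common suffix, never overlapping the prefix
	const maxSuffix = maxPrefix - prefix;
	let suffix = 0;
	while(suffix < maxSuffix
		&& oldText[oldText.length - 1 - suffix] === newText[newText.length - 1 - suffix]) {
		suffix++;
	}
	return {
		startIndex:  prefix,
		oldEndIndex: oldText.length - suffix,
		newEndIndex: newText.length - suffix
	};
}
```

For example, changing `x <- 1` to `x <- 10` on the first line of a two-line file yields a pure insertion of one character: `startIndex` and `oldEndIndex` coincide, and `newEndIndex` is one larger.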
||
|
|
||
| ${section('Incremental Dataflow', 4)} | ||
|
|
||
| This context is planned to also support future incremental dataflow graph computation. | ||
|
|
||
|
|
||
| ${section('Caching', 2)} | ||
|
|
||
| To speed up analyses, flowR provides a caching mechanism that stores intermediate results of the analysis. | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,5 +1,5 @@ | ||
| import type { KnownParser } from '../../r-bridge/parser'; | ||
| import { type CacheInvalidationEvent, CacheInvalidationEventType, FlowrCache } from './flowr-cache'; | ||
| import type { KnownParser, ParseStepOutput } from '../../r-bridge/parser'; | ||
| import { type InvalidationEvent, InvalidationEventType, FlowrCache } from './flowr-cache'; | ||
| import { | ||
| createDataflowPipeline, | ||
| type DEFAULT_DATAFLOW_PIPELINE, | ||
|
|
@@ -18,7 +18,7 @@ import type { FlowrAnalyzerContext } from '../context/flowr-analyzer-context'; | |
| import { FlowrAnalyzerControlFlowCache } from './flowr-analyzer-controlflow-cache'; | ||
| import type { CallGraph } from '../../dataflow/graph/call-graph'; | ||
| import { computeCallGraph } from '../../dataflow/graph/call-graph'; | ||
|
|
||
| import type { Tree } from 'web-tree-sitter'; | ||
| interface FlowrAnalyzerCacheOptions<Parser extends KnownParser> { | ||
| parser: Parser; | ||
| context: FlowrAnalyzerContext; | ||
|
|
@@ -56,30 +56,33 @@ export class FlowrAnalyzerCache<Parser extends KnownParser> extends FlowrCache<A | |
| }) as AnalyzerPipelineExecutor<Parser>; | ||
| this.controlFlowCache = new FlowrAnalyzerControlFlowCache(); | ||
| this.callGraphCache = undefined; | ||
| this.computeIfAbsent(true, () => this.pipeline?.getResults(true)); | ||
| } | ||
|
|
||
| public static create<Parser extends KnownParser>(data: FlowrAnalyzerCacheOptions<Parser>): FlowrAnalyzerCache<Parser> { | ||
| return new FlowrAnalyzerCache<Parser>(data); | ||
| } | ||
|
|
||
| public override receive(event: CacheInvalidationEvent): void { | ||
| public override receive(event: InvalidationEvent): void { | ||
| super.receive(event); | ||
| switch(event.type) { | ||
| case CacheInvalidationEventType.Full: | ||
| const type = event.type; | ||
| switch(type) { | ||
| case InvalidationEventType.Full: | ||
| case InvalidationEventType.FileInvalidate: | ||
|
Member
Author
Since we are not batching these yet
||
| this.initCacheProviders(); | ||
| break; | ||
| default: | ||
| assertUnreachable(event.type); | ||
| assertUnreachable(type); | ||
| } | ||
| } | ||
|
|
||
| private get(): AnalyzerCacheType<Parser> { | ||
| /* this will do a ref assignment, so indirect force */ | ||
| return this.computeIfAbsent(false, () => this.pipeline.getResults(true)); | ||
| return this.computeIfAbsent(false, () => this.pipeline?.getResults(true)); | ||
| } | ||
|
|
||
| public reset() { | ||
| this.receive({ type: CacheInvalidationEventType.Full }); | ||
| this.receive({ type: InvalidationEventType.Full }); | ||
| } | ||
|
|
||
| private async runTapeUntil<T>(force: boolean | undefined, until: () => T | undefined): Promise<T> { | ||
|
|
@@ -92,10 +95,26 @@ export class FlowrAnalyzerCache<Parser extends KnownParser> extends FlowrCache<A | |
| while((g = until()) === undefined && this.pipeline.hasNextStep()) { | ||
| await this.pipeline.nextStep(); | ||
| } | ||
|
|
||
| this.storeIncrementalSnapshotIfAvailable(); | ||
|
|
||
| guard(g !== undefined, 'Could not reach the desired pipeline step, invalid cache state(?)'); | ||
| return g; | ||
| } | ||
|
|
||
| private storeIncrementalSnapshotIfAvailable(): void { | ||
| if(this.args.parser.name !== 'tree-sitter') { | ||
| return; | ||
| } | ||
|
|
||
| const parse = this.peekParse(); | ||
| if(parse !== undefined) { | ||
| this.args.context.inc.storeOldParseResults( | ||
| parse as ParseStepOutput<Tree> // cast needed because of TypeScript's limited narrowing capabilities | ||
|
Member
Author
One could perhaps do a type check on it in line 106 rather than this, but this is okay for now.
||
| ); | ||
| } | ||
| } | ||
|
|
||
| /** | ||
| * Get the parse output for the request, parsing if necessary. | ||
| * @param force - Do not use the cache, instead force a new parse. | ||
|
|
@@ -112,7 +131,7 @@ export class FlowrAnalyzerCache<Parser extends KnownParser> extends FlowrCache<A | |
| * @see {@link FlowrAnalyzerCache#parse} - to get the parse output, parsing if necessary. | ||
| */ | ||
| public peekParse(): NonNullable<AnalyzerCacheType<Parser>['parse']> | undefined { | ||
| return this.get().parse; | ||
| return this.get()?.parse; | ||
| } | ||
|
|
||
| /** | ||
|
|
@@ -131,7 +150,7 @@ export class FlowrAnalyzerCache<Parser extends KnownParser> extends FlowrCache<A | |
| * @see {@link FlowrAnalyzerCache#normalize} - to get the normalized AST, normalizing if necessary. | ||
| */ | ||
| public peekNormalize(): NonNullable<AnalyzerCacheType<Parser>['normalize']> | undefined { | ||
| return this.get().normalize; | ||
| return this.get()?.normalize; | ||
| } | ||
|
|
||
| /** | ||
|
|
@@ -150,7 +169,7 @@ export class FlowrAnalyzerCache<Parser extends KnownParser> extends FlowrCache<A | |
| * @see {@link FlowrAnalyzerCache#dataflow} - to get the dataflow graph, computing if necessary. | ||
| */ | ||
| public peekDataflow(): NonNullable<AnalyzerCacheType<Parser>['dataflow']> | undefined { | ||
| return this.get().dataflow; | ||
| return this.get()?.dataflow; | ||
| } | ||
|
|
||
| /** | ||
|
|
||
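One possible reading of the review suggestion on the cast in `storeIncrementalSnapshotIfAvailable` is a user-defined type guard. The sketch below shows that pattern in isolation. All types and names (`TreeLike`, `ParseOutput`, `isTreeOutput`, `storeIfTree`) are simplified, hypothetical stand-ins and not the flowR or web-tree-sitter types.

```typescript
// Hypothetical, simplified stand-ins; not the real flowR types.
interface TreeLike { readonly rootType: string }
interface ParsedFile<T> { readonly filePath: string, readonly parsed: T }
interface ParseOutput<T> { readonly files: readonly ParsedFile<T>[] }

// Type guard: checks at runtime that every parsed entry looks like a tree,
// which lets the compiler narrow the output type without a cast.
function isTreeOutput(out: ParseOutput<unknown>): out is ParseOutput<TreeLike> {
	return out.files.every(f =>
		typeof f.parsed === 'object' && f.parsed !== null && 'rootType' in f.parsed
	);
}

// Stores the output only if it passed the guard; reports whether it was stored.
function storeIfTree(
	out: ParseOutput<unknown>,
	store: (trees: ParseOutput<TreeLike>) => void
): boolean {
	if(!isTreeOutput(out)) {
		return false;
	}
	store(out); // `out` is narrowed to ParseOutput<TreeLike> here
	return true;
}
```

Compared to the cast, the guard costs a runtime check per file, but a mismatch between parser name and output shape is skipped instead of being stored silently.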
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,37 +1,52 @@ | ||
| import { assertUnreachable } from '../../util/assert'; | ||
| import type { StringableContent } from '../context/flowr-file'; | ||
|
|
||
| export const enum CacheInvalidationEventType { | ||
| Full = 'full' | ||
| export const enum InvalidationEventType { | ||
| Full = 'full', | ||
| FileInvalidate = 'file-invalidate', | ||
|
Member
Author
At least a little documentation on how this can be triggered would be nice here.
||
| } | ||
| export type CacheInvalidationEvent = | ||
| { type: CacheInvalidationEventType.Full }; | ||
|
|
||
| export interface CacheInvalidationEventReceiver { | ||
| receive(event: CacheInvalidationEvent): void | ||
| export interface FileContentInvalidateEvent<Content extends StringableContent = StringableContent> { | ||
| readonly type: InvalidationEventType.FileInvalidate; | ||
| readonly oldContent: Content | undefined; | ||
|
Member
Author
Likewise here: who is responsible for populating oldContent?
||
| readonly filePath: string; | ||
| } | ||
|
|
||
| export type InvalidationEvent<Content extends StringableContent = StringableContent> = | ||
| { type: InvalidationEventType.Full } | ||
| | FileContentInvalidateEvent<Content>; | ||
|
|
||
|
|
||
| export type InvalidationEventHandler<Content extends StringableContent = StringableContent> = (event: InvalidationEvent<Content>) => void; | ||
|
|
||
| export interface InvalidationEventReceiver<Content extends StringableContent = StringableContent> { | ||
| receive: InvalidationEventHandler<Content> | ||
| } | ||
|
|
||
| /** | ||
| * Central class for caching analysis results in FlowR. | ||
| */ | ||
| export abstract class FlowrCache<Cache> implements CacheInvalidationEventReceiver { | ||
| export abstract class FlowrCache<Cache> implements InvalidationEventReceiver { | ||
| private value: Cache | undefined = undefined; | ||
| private dependents: CacheInvalidationEventReceiver[] = []; | ||
| private dependents: InvalidationEventReceiver[] = []; | ||
|
|
||
| public registerDependent(dependent: CacheInvalidationEventReceiver) { | ||
| public registerDependent(dependent: InvalidationEventReceiver) { | ||
| this.dependents.push(dependent); | ||
| } | ||
| public removeDependent(dependent: CacheInvalidationEventReceiver) { | ||
| public removeDependent(dependent: InvalidationEventReceiver) { | ||
| this.dependents = this.dependents.filter(d => d !== dependent); | ||
| } | ||
|
|
||
| receive(event: CacheInvalidationEvent): void { | ||
| receive(event: InvalidationEvent): void { | ||
| const type = event.type; | ||
| /* we will update this as soon as we support incremental update patterns */ | ||
| switch(event.type) { | ||
| case CacheInvalidationEventType.Full: | ||
| switch(type) { | ||
| case InvalidationEventType.Full: | ||
| case InvalidationEventType.FileInvalidate: | ||
| this.value = undefined; | ||
| break; | ||
| default: | ||
| assertUnreachable(event.type); | ||
| assertUnreachable(type); | ||
| } | ||
| /* in the future we want to defer this *after* the dataflow is re-computed, then all receivers can decide whether they need to update */ | ||
| this.notifyDependents(event); | ||
|
|
@@ -40,7 +55,7 @@ export abstract class FlowrCache<Cache> implements CacheInvalidationEventReceive | |
| /** | ||
| * Notify all dependents of a cache invalidation event. | ||
| */ | ||
| public notifyDependents(event: CacheInvalidationEvent) { | ||
| public notifyDependents(event: InvalidationEvent) { | ||
| for(const dependent of this.dependents) { | ||
| dependent.receive(event); | ||
| } | ||
|
|
||
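To make the event flow in this file easier to follow, here is a stand-alone sketch of how a file invalidation event could be emitted and consumed. It mirrors only the event shapes from the diff. `MutableTextFile` and `OldContentRecorder` are hypothetical names, and two points are assumptions of this sketch rather than confirmed flowR behavior: the file itself fills `oldContent`, and the receiver keeps the earliest old text when a file is invalidated several times.

```typescript
// Simplified stand-ins for the event shapes in the diff; not the flowR implementation.
type InvalidationEvent =
	| { readonly type: 'full' }
	| { readonly type: 'file-invalidate', readonly filePath: string, readonly oldContent: string | undefined };

interface Receiver { receive(event: InvalidationEvent): void }

function assertUnreachable(x: never): never {
	throw new Error(`unexpected value: ${String(x)}`);
}

// Hypothetical mutable file: captures its old content before overwriting it.
class MutableTextFile {
	private readonly receivers: Receiver[] = [];
	private readonly filePath: string;
	private content: string | undefined;

	constructor(filePath: string, content: string | undefined) {
		this.filePath = filePath;
		this.content = content;
	}

	public register(receiver: Receiver): void {
		this.receivers.push(receiver);
	}

	public invalidate(newContent: string): void {
		const oldContent = this.content;
		this.content = newContent;
		for(const r of this.receivers) {
			r.receive({ type: 'file-invalidate', filePath: this.filePath, oldContent });
		}
	}
}

// Records only the path and the old text; no edit is computed at this point.
class OldContentRecorder implements Receiver {
	private readonly oldContents = new Map<string, string>();

	public receive(event: InvalidationEvent): void {
		switch(event.type) {
			case 'full':
				this.oldContents.clear();
				break;
			case 'file-invalidate':
				// keep the earliest old text, assuming it matches the last parsed tree
				if(event.oldContent !== undefined && !this.oldContents.has(event.filePath)) {
					this.oldContents.set(event.filePath, event.oldContent);
				}
				break;
			default:
				assertUnreachable(event);
		}
	}

	// Consumes the entry, so the state survives only until it is used once.
	public getAndRemoveOldContentOf(filePath: string): string | undefined {
		const old = this.oldContents.get(filePath);
		this.oldContents.delete(filePath);
		return old;
	}
}
```

Registering an `OldContentRecorder` on a file and calling `invalidate` twice leaves the original text available exactly once through `getAndRemoveOldContentOf`; a second lookup returns `undefined`.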
You can drop the "currently" here; that is the long-term plan anyway :D