Skip to content

Guards all StepRegistry mutations to make them thread-safe.#287

Open
jensakejohansson wants to merge 5 commits into
getgauge:masterfrom
system-verification:feature/fix-false-positive-duplicate-step-msg
Open

Guards all StepRegistry mutations to make them thread-safe.#287
jensakejohansson wants to merge 5 commits into
getgauge:masterfrom
system-verification:feature/fix-false-positive-duplicate-step-msg

Conversation

@jensakejohansson
Copy link
Copy Markdown
Contributor

@jensakejohansson jensakejohansson commented Mar 31, 2026

In VSC Gauge shows false positives about 'Duplicate step implementation'.

I've have seen the issue for years across different projects & environments (macOS/Windows etc). It was a nightmare to debug - I added logging in many places - finally it occurred to me it's likely a race condition.

My understanding is that VSC sends gRPC messages for opened files and the these gets dispatched on separate threads. All requests share the same singleton StaticLoader and its StepRegistry - which has no synchronization.

ReloadSteps (called for Opened/Changed events) performs a two-step compound operation:

  1. RemoveSteps(filepath) - rebuild the registry dictionary without this file's steps
  2. LoadStepsFromText(...) - re-add the steps from the file content

These two steps were not atomic. When two CacheFileRequest messages are processed concurrently for different files unpredictable results may happen resulting in VSC reporting false "duplicate step implementation" errors for steps that only have a single implementation.

Fix: Lock in StaticLoader: Added a lock (_registryLock) around all public methods that mutate the registry (ReloadSteps, RemoveSteps, LoadStepsFromText). This makes the compound remove-then-add operation in ReloadSteps atomic — no other thread can read or modify the registry between the removal and re-addition of a file's steps.

It's hard to write a deterministic unit test due to the nature of the issue (at least I don't know how to do it). I've used the fix locally and it seems promising.

For the sake of user-friendless I also enriched the error message so VSC can show to the user where the duplicates are actually found. At the moment this message is not used by Gauge(!) - it's part of the proto contract, but ignored and not relayed to VSC. I will do a Gauge PR for that (getgauge/gauge#2837).

Comment on lines +36 to +43
@@ -37,7 +37,10 @@ public Task<StepValidateResponse> Process(int stream, StepValidateRequest reques
{
isValid = false;
errorType = StepValidateResponse.Types.ErrorType.DuplicateStepImplementation;
errorMessage = string.Format("Multiple step implementations found for : {0}", stepToValidate);
var implementations = _stepRegistry.MethodsFor(stepToValidate);
var locations = string.Join("\n", implementations.Select(m =>
$" {m.ClassName}.{m.Name} in {m.FileName}:{m.Span.StartLinePosition.Line + 1}"));
errorMessage = $"Step: {stepToValidate}\n{locations}";
Copy link
Copy Markdown
Contributor

@chadlwilson chadlwilson Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems there is a check-then-act possible race condition here since I don't think there is any mutex and I believe this validation could be happening while there are changes being made to to the steps in parallel? is that right?

If so, we might want to consider locking on the stepRegistry mutex directly, or change this to an atomic operation such as StepRegistry.GetMultipleImplementations(stepToValidate) per my other comment so we can lock here too?

Comment thread src/Loaders/StaticLoader.cs Outdated
Comment on lines +27 to +31
// Guards all StepRegistry mutations. gRPC dispatches CacheFileRequest messages concurrently on separate threads,
// all sharing this singleton. Without synchronization, concurrent ReloadSteps calls can interleave their
// RemoveSteps/AddStep operations — one thread's dictionary rebuild overwrites another's, leaving stale entries
// that cause false "duplicate step implementation" warnings in the IDE.
private readonly object _registryLock = new object();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be sufficient to lock on the registry object itself rather than using a discrete lock; so we can also lock from other places such as the StepValidationProcessor?

Comment thread src/Loaders/StaticLoader.cs Outdated
{
_stepRegistry.RemoveSteps(filepath);
LoadStepsFromText(content, filepath);
}
Copy link
Copy Markdown
Contributor

@chadlwilson chadlwilson Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest that we would be better to split this into two steps, so we can lock as little as possible, load all of the files in as atomicish/quick an operation as possible, and then push the responsibility of synchronization for the bulk operation to the StepRegistry itself rather than using external locking.

i.e (pseudocode)

var stepValues = LoadStepsFromText(content, filepath);
_stepRegistry.ReplaceSteps(filepath, stepValues);

And then

public void ReplaceSteps(string filepath filepath, Dictionary<string, GaugeMethod> stepValues)
{
    lock (_registry)
    {
        RemoveSteps(filepath);
        foreach (var (stepValue, gaugeMethod) in _stepValues)
        {
            AddStep(stepValue, GaugeMethod);
        }
    }
}

Then we can move all of the locks into the StepRegistry so it is self-contained, and we can potentially move it to use a ConcurrentDictionary internally where we can further reduce the locking necessary, improve iteration safety and fix the check-then-act issues the code currently has?

We may still have some issues around concurrent file changes while it is processing that are impossible to synchronise on due to external file system changes; but if possible I think it might be easier to reason about without the big external lock, and we can hope for eventual consistency to prevail.

The challenge with 'big locks' is that where you can't actually prevent something changing with the lock (e.g the external file content), in some cases it might make it worse when you can't interleave things, or if there is a huge number of changes made, and lots of events get queued up in quasi-indeterminate orders. This is just theoretical intuition from many years of dealing with concurrency like this though.

Am I missing something? WDYT?

@jensakejohansson
Copy link
Copy Markdown
Contributor Author

Thanks for your insightful feedback @chadlwilson!

I agree that locking on StepRegistry would be a better approach and less of a 'quick-fix'.

I've updated the PR to the best of my understanding. It became a rather big change. Main things that I tried to address here:

  1. All public StepRegistry methods now hold a lock to prevent concurrent mutation no matter what path is used to access the StepRegistry.

  2. New LookupStep(parsedStepText) method returns a StepLookupResult containing Exists, HasMultipleImplementations, and Methods in a single atomic snapshot hopefully eliminating race conditions in previous ContainsStep + HasMultipleImplementations + MethodsFor etc.

  3. New ReplaceSteps(filepath, entries) atomically removes all steps for a file and adds the new ones, replacing the non-atomic RemoveSteps + AddSteps sequence.

I'm personally not too worried about events getting queued up in this context. As far as I understand this is all very quick in-memory operations and the number of files/steps to process is most likely not that large.

What do you think?

@jensakejohansson
Copy link
Copy Markdown
Contributor Author

@chadlwilson Unclear to me why the FT-ubuntu tests failed. Can you please trigger the workflow again when you have time to re-check? I suspect some intermittent glitch unrelated to this change. Works on Windows both in workflow and locally.

@chadlwilson
Copy link
Copy Markdown
Contributor

chadlwilson commented Apr 3, 2026

I reran and they are OK.

I think the failure here is a regular one which I have not been able to track down. The functional tests are trying to find "Successfully generated html-report" after checking other things, but in some cases it's not there in the console logs - which either indicates a race condition in the design, or some kind of console flushing problem.

I really wish I could fix it, as it drives me nuts doing re-runs across the projects. The following is orthogonal to this PR, so ignore it unless you feel like exploring :-)


For posterity the raw gauge.log for the failing spec is

02-04-2026 06:41:43.346 [Gauge] [INFO] Initializing template from https://github.com/getgauge/template-dotnet/releases/latest/download/dotnet.zip
02-04-2026 06:41:43.347 [Gauge] [DEBUG] Download URL https://github.com/getgauge/template-dotnet/releases/latest/download/dotnet.zip
02-04-2026 06:41:43.347 [Gauge] [DEBUG] Downloading https://github.com/getgauge/template-dotnet/releases/latest/download/dotnet.zip
02-04-2026 06:41:43.513 [Gauge] [DEBUG] Downloaded to /tmp/gauge_temp1775112103346745253/dotnet.zip
02-04-2026 06:41:43.514 [Gauge] [DEBUG] Unzipped to => /tmp/gauge_temp1775112103346745253
02-04-2026 06:41:43.514 [Gauge] [INFO] Copying Gauge template dotnet to current directory ...
02-04-2026 06:41:43.515 [Gauge] [INFO] Successfully initialized the project. Run specifications with "gauge run specs/".
02-04-2026 06:41:43.547 [Gauge] [DEBUG] gauge run specs/
02-04-2026 06:41:43.547 [Gauge] [DEBUG] n : [4]
02-04-2026 06:41:43.547 [Gauge] [DEBUG] parallel : [true]
02-04-2026 06:41:43.547 [Gauge] [DEBUG] verbose : [true]
02-04-2026 06:41:43.547 [Gauge] [DEBUG] env will be loaded from default directory 'env'
02-04-2026 06:41:43.547 [Gauge] [DEBUG] 'gauge_environment' set to 'default'
02-04-2026 06:41:43.548 [Gauge] [DEBUG] Created gauge_screenshots_dir at /tmp/parallel_data_driven_spec16_793330509692225383/.gauge/screenshots
02-04-2026 06:41:43.548 [Gauge] [DEBUG] Plugin dotnet is already installed.
02-04-2026 06:41:43.548 [Gauge] [DEBUG] Plugin html-report is already installed.
02-04-2026 06:41:43.548 [Gauge] [DEBUG] Parsing started.
02-04-2026 06:41:43.548 [Gauge] [DEBUG] Started concepts parsing.
02-04-2026 06:41:43.548 [Gauge] [DEBUG] 0 concepts parsing completed.
02-04-2026 06:41:43.548 [Gauge] [DEBUG] Started specifications parsing.
02-04-2026 06:41:43.549 [Gauge] [DEBUG] 1 specifications parsing completed.
02-04-2026 06:41:43.549 [Gauge] [DEBUG] Parsing completed.
02-04-2026 06:41:43.676 [Gauge] [DEBUG] Checking updates...
02-04-2026 06:41:43.676 [Gauge] [DEBUG] Downloading https://downloads.gauge.org/plugin/dotnet?l=dotnet&p=dotnet,html-report,java,screenshot,xml-report&o=linux&a=amd64
02-04-2026 06:41:43.817 [Gauge] [DEBUG] Downloading https://downloads.gauge.org/plugin/html-report?l=dotnet&p=dotnet,html-report,java,screenshot,xml-report&o=linux&a=amd64
02-04-2026 06:41:43.874 [Gauge] [DEBUG] Downloading https://downloads.gauge.org/plugin/java?l=dotnet&p=dotnet,html-report,java,screenshot,xml-report&o=linux&a=amd64
02-04-2026 06:41:43.917 [Gauge] [DEBUG] Downloading https://downloads.gauge.org/plugin/screenshot?l=dotnet&p=dotnet,html-report,java,screenshot,xml-report&o=linux&a=amd64
02-04-2026 06:41:43.952 [Gauge] [DEBUG] Downloading https://downloads.gauge.org/plugin/xml-report?l=dotnet&p=dotnet,html-report,java,screenshot,xml-report&o=linux&a=amd64
02-04-2026 06:41:45.762 [dotnet] [DEBUG] Loading assembly from : /tmp/parallel_data_driven_spec16_793330509692225383/gauge_bin/dotnet-template.dll
02-04-2026 06:41:45.936 [dotnet] [DEBUG] Try load Gauge.CSharp.Lib in GaugeLoadContext
02-04-2026 06:41:45.950 [dotnet] [DEBUG] Try load dotnet-template in GaugeLoadContext
02-04-2026 06:41:45.950 [dotnet] [DEBUG] Try load System.Runtime in GaugeLoadContext
02-04-2026 06:41:45.950 [dotnet] [DEBUG] Try load System.Collections in GaugeLoadContext
02-04-2026 06:41:45.950 [dotnet] [DEBUG] Try load System.Console in GaugeLoadContext
02-04-2026 06:41:45.950 [dotnet] [DEBUG] Try load Shouldly in GaugeLoadContext
02-04-2026 06:41:45.950 [dotnet] [DEBUG] Try load System.Linq in GaugeLoadContext
02-04-2026 06:41:45.952 [dotnet] [DEBUG] Try load System.Runtime in GaugeLoadContext
02-04-2026 06:41:45.953 [dotnet] [DEBUG] Number of AssembliesReferencingGaugeLib : 1
02-04-2026 06:41:45.954 [dotnet] [DEBUG] Try load System.Runtime in GaugeLoadContext
02-04-2026 06:41:45.954 [dotnet] [DEBUG] Try load System.Collections in GaugeLoadContext
02-04-2026 06:41:45.955 [dotnet] [DEBUG] Building StepRegistry...
02-04-2026 06:41:45.957 [dotnet] [DEBUG] 5 Step implementations found. Adding to registry...
02-04-2026 06:41:45.959 [dotnet] [DEBUG] 'First step': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:45.960 [dotnet] [DEBUG] 'print {} {} {}': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:45.960 [dotnet] [DEBUG] 'Vowels in English language are {}.': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:45.960 [dotnet] [DEBUG] 'The word {} has {} vowels.': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:45.961 [dotnet] [DEBUG] 'Almost all words have vowels {}': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:45.961 [dotnet] [DEBUG] Building StepRegistry...
02-04-2026 06:41:45.961 [dotnet] [DEBUG] 5 Step implementations found. Adding to registry...
02-04-2026 06:41:45.961 [dotnet] [DEBUG] 'First step': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:45.961 [dotnet] [DEBUG] 'print {} {} {}': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:45.961 [dotnet] [DEBUG] 'Vowels in English language are {}.': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:45.961 [dotnet] [DEBUG] 'The word {} has {} vowels.': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:45.961 [dotnet] [DEBUG] 'Almost all words have vowels {}': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:46.006 [Gauge] [DEBUG] Attempting to connect to grpc server at port: 41055
02-04-2026 06:41:46.006 [Gauge] [DEBUG] Successfully made the connection with runner with port: 41055
02-04-2026 06:41:46.006 [Gauge] [DEBUG] Validation started.
02-04-2026 06:41:46.134 [Gauge] [DEBUG] Validation completed.
02-04-2026 06:41:46.134 [Gauge] [DEBUG] Run started
02-04-2026 06:41:46.134 [Gauge] [DEBUG] Starting Html Report plugin
02-04-2026 06:41:46.138 [html-report] [DEBUG] Setting MaxRecvMsgSize = 1024 MB
02-04-2026 06:41:46.139 [Gauge] [DEBUG] Attempting to connect to grpc server at port: 33413
02-04-2026 06:41:46.139 [Gauge] [DEBUG] Successfully made the connection with plugin with port: 33413
02-04-2026 06:41:46.139 [Gauge] [INFO] Executing in 3 parallel streams.
02-04-2026 06:41:46.510 [dotnet] [DEBUG] Loading assembly from : /tmp/parallel_data_driven_spec16_793330509692225383/gauge_bin/dotnet-template.dll
02-04-2026 06:41:46.515 [dotnet] [DEBUG] GAUGE_CUSTOM_BUILD_PATH is set, skipping static loading
02-04-2026 06:41:46.516 [dotnet] [DEBUG] Try load Gauge.CSharp.Lib in GaugeLoadContext
02-04-2026 06:41:46.525 [dotnet] [DEBUG] Loading assembly from : /tmp/parallel_data_driven_spec16_793330509692225383/gauge_bin/dotnet-template.dll
02-04-2026 06:41:46.530 [dotnet] [DEBUG] GAUGE_CUSTOM_BUILD_PATH is set, skipping static loading
02-04-2026 06:41:46.531 [dotnet] [DEBUG] Try load Gauge.CSharp.Lib in GaugeLoadContext
02-04-2026 06:41:46.543 [dotnet] [DEBUG] Try load dotnet-template in GaugeLoadContext
02-04-2026 06:41:46.543 [dotnet] [DEBUG] Try load System.Runtime in GaugeLoadContext
02-04-2026 06:41:46.543 [dotnet] [DEBUG] Try load System.Collections in GaugeLoadContext
02-04-2026 06:41:46.544 [dotnet] [DEBUG] Try load System.Console in GaugeLoadContext
02-04-2026 06:41:46.544 [dotnet] [DEBUG] Try load Shouldly in GaugeLoadContext
02-04-2026 06:41:46.544 [dotnet] [DEBUG] Try load System.Linq in GaugeLoadContext
02-04-2026 06:41:46.546 [dotnet] [DEBUG] Try load System.Runtime in GaugeLoadContext
02-04-2026 06:41:46.547 [dotnet] [DEBUG] Number of AssembliesReferencingGaugeLib : 1
02-04-2026 06:41:46.552 [dotnet] [DEBUG] Try load System.Runtime in GaugeLoadContext
02-04-2026 06:41:46.552 [dotnet] [DEBUG] Try load System.Collections in GaugeLoadContext
02-04-2026 06:41:46.554 [dotnet] [DEBUG] Building StepRegistry...
02-04-2026 06:41:46.558 [dotnet] [DEBUG] 5 Step implementations found. Adding to registry...
02-04-2026 06:41:46.560 [dotnet] [DEBUG] Try load dotnet-template in GaugeLoadContext
02-04-2026 06:41:46.560 [dotnet] [DEBUG] Try load System.Runtime in GaugeLoadContext
02-04-2026 06:41:46.560 [dotnet] [DEBUG] Try load System.Collections in GaugeLoadContext
02-04-2026 06:41:46.560 [dotnet] [DEBUG] Try load System.Console in GaugeLoadContext
02-04-2026 06:41:46.560 [dotnet] [DEBUG] Try load Shouldly in GaugeLoadContext
02-04-2026 06:41:46.561 [dotnet] [DEBUG] Try load System.Linq in GaugeLoadContext
02-04-2026 06:41:46.563 [dotnet] [DEBUG] Try load System.Runtime in GaugeLoadContext
02-04-2026 06:41:46.564 [dotnet] [DEBUG] Number of AssembliesReferencingGaugeLib : 1
02-04-2026 06:41:46.569 [dotnet] [DEBUG] Try load System.Runtime in GaugeLoadContext
02-04-2026 06:41:46.569 [dotnet] [DEBUG] 'First step': no implementation in StepRegistry, adding via reflection
02-04-2026 06:41:46.569 [dotnet] [DEBUG] Try load System.Collections in GaugeLoadContext
02-04-2026 06:41:46.570 [dotnet] [DEBUG] Building StepRegistry...
02-04-2026 06:41:46.572 [dotnet] [DEBUG] 'print {} {} {}': no implementation in StepRegistry, adding via reflection
02-04-2026 06:41:46.573 [dotnet] [DEBUG] 'Vowels in English language are {}.': no implementation in StepRegistry, adding via reflection
02-04-2026 06:41:46.573 [dotnet] [DEBUG] 5 Step implementations found. Adding to registry...
02-04-2026 06:41:46.573 [dotnet] [DEBUG] 'The word {} has {} vowels.': no implementation in StepRegistry, adding via reflection
02-04-2026 06:41:46.574 [dotnet] [DEBUG] 'Almost all words have vowels {}': no implementation in StepRegistry, adding via reflection
02-04-2026 06:41:46.574 [dotnet] [DEBUG] Building StepRegistry...
02-04-2026 06:41:46.574 [dotnet] [DEBUG] 5 Step implementations found. Adding to registry...
02-04-2026 06:41:46.574 [dotnet] [DEBUG] 'First step': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:46.575 [dotnet] [DEBUG] 'print {} {} {}': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:46.575 [dotnet] [DEBUG] 'Vowels in English language are {}.': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:46.575 [dotnet] [DEBUG] 'The word {} has {} vowels.': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:46.575 [dotnet] [DEBUG] 'Almost all words have vowels {}': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:46.586 [dotnet] [DEBUG] 'First step': no implementation in StepRegistry, adding via reflection
02-04-2026 06:41:46.588 [dotnet] [DEBUG] 'print {} {} {}': no implementation in StepRegistry, adding via reflection
02-04-2026 06:41:46.591 [dotnet] [DEBUG] 'Vowels in English language are {}.': no implementation in StepRegistry, adding via reflection
02-04-2026 06:41:46.591 [dotnet] [DEBUG] 'The word {} has {} vowels.': no implementation in StepRegistry, adding via reflection
02-04-2026 06:41:46.591 [dotnet] [DEBUG] 'Almost all words have vowels {}': no implementation in StepRegistry, adding via reflection
02-04-2026 06:41:46.591 [dotnet] [DEBUG] Building StepRegistry...
02-04-2026 06:41:46.591 [dotnet] [DEBUG] 5 Step implementations found. Adding to registry...
02-04-2026 06:41:46.591 [dotnet] [DEBUG] 'First step': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:46.591 [dotnet] [DEBUG] 'print {} {} {}': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:46.591 [dotnet] [DEBUG] 'Vowels in English language are {}.': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:46.591 [dotnet] [DEBUG] 'The word {} has {} vowels.': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:46.591 [dotnet] [DEBUG] 'Almost all words have vowels {}': implementation found in StepRegistry, setting reflected methodInfo
02-04-2026 06:41:46.653 [Gauge] [DEBUG] Attempting to connect to grpc server at port: 37081
02-04-2026 06:41:46.653 [Gauge] [DEBUG] Successfully made the connection with runner with port: 37081
02-04-2026 06:41:46.673 [Gauge] [DEBUG] Attempting to connect to grpc server at port: 33553
02-04-2026 06:41:46.673 [Gauge] [DEBUG] Successfully made the connection with runner with port: 33553
02-04-2026 06:41:46.673 [Gauge] [DEBUG] Initialising suite data store.
02-04-2026 06:41:46.673 [Gauge] [DEBUG] Initialising suite data store.
02-04-2026 06:41:46.673 [Gauge] [DEBUG] Initialising suite data store.
02-04-2026 06:41:46.680 [dotnet] [DEBUG] Try load System.Collections.Concurrent in GaugeLoadContext
02-04-2026 06:41:46.778 [dotnet] [DEBUG] Loaded Instance Manager of Type: Gauge.CSharp.Lib.DefaultClassInstanceManager
02-04-2026 06:41:46.790 [Gauge] [INFO] # data table driven parallel execution
02-04-2026 06:41:46.827 [dotnet] [DEBUG] Calling InvokeMethod to call method Void Name17094574403620260402064143()
02-04-2026 06:41:46.828 [dotnet] [DEBUG] Try load System.Threading in GaugeLoadContext
02-04-2026 06:41:46.836 [dotnet] [DEBUG] BeforeSpec Hook execution failed : Name17094583768120260402064143.Name17094574403620260402064143
02-04-2026 06:41:46.852 [dotnet] [DEBUG] Try load System.Diagnostics.Process in GaugeLoadContext
02-04-2026 06:41:46.856 [dotnet] [DEBUG] Try load System.Collections.Concurrent in GaugeLoadContext
02-04-2026 06:41:46.857 [dotnet] [ERROR] panic:
02-04-2026 06:41:46.857 [dotnet] [ERROR] empty display string
02-04-2026 06:41:46.857 [dotnet] [ERROR] goroutine
02-04-2026 06:41:46.858 [dotnet] [ERROR] 1 [running]:
02-04-2026 06:41:46.858 [dotnet] [ERROR] main.main()
02-04-2026 06:41:46.858 [dotnet] [ERROR] 	/home/runner/work/gauge_screenshot/gauge_screenshot/main.go
02-04-2026 06:41:46.858 [dotnet] [ERROR] :44 +0x167
02-04-2026 06:41:46.858 [dotnet] [DEBUG] Try load System.Collections.Concurrent in GaugeLoadContext
02-04-2026 06:41:46.868 [Gauge] [ERROR] Error Message: System error.
02-04-2026 06:41:46.868 [Gauge] [ERROR] Stacktrace: 
   at Name17094583768120260402064143.Name17094574403620260402064143() in /tmp/parallel_data_driven_spec16_793330509692225383/StepImplementation.cs:line 55
   at System.Reflection.MethodBaseInvoker.InterpretedInvoke_Method(Object obj, IntPtr* args)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
02-04-2026 06:41:46.869 [Gauge] [INFO] # data table driven parallel execution
02-04-2026 06:41:46.879 [dotnet] [DEBUG] Calling InvokeMethod to call method Void Name17094574403620260402064143()
02-04-2026 06:41:46.880 [dotnet] [DEBUG] BeforeSpec Hook execution failed : Name17094583768120260402064143.Name17094574403620260402064143
02-04-2026 06:41:46.885 [dotnet] [ERROR] panic:
02-04-2026 06:41:46.885 [dotnet] [ERROR] empty display string
02-04-2026 06:41:46.885 [dotnet] [ERROR] goroutine
02-04-2026 06:41:46.885 [dotnet] [ERROR] 1
02-04-2026 06:41:46.885 [dotnet] [ERROR] [
02-04-2026 06:41:46.885 [dotnet] [ERROR] running
02-04-2026 06:41:46.885 [dotnet] [ERROR] ]:
02-04-2026 06:41:46.885 [dotnet] [ERROR] main.main
02-04-2026 06:41:46.885 [dotnet] [ERROR] (
02-04-2026 06:41:46.885 [dotnet] [ERROR] )
02-04-2026 06:41:46.885 [dotnet] [ERROR] 	
02-04-2026 06:41:46.885 [dotnet] [ERROR] /home/runner/work/gauge_screenshot/gauge_screenshot/main.go
02-04-2026 06:41:46.885 [dotnet] [ERROR] :
02-04-2026 06:41:46.885 [dotnet] [ERROR] 44
02-04-2026 06:41:46.885 [dotnet] [ERROR] +
02-04-2026 06:41:46.885 [dotnet] [ERROR] 0x167
02-04-2026 06:41:46.888 [Gauge] [ERROR] Error Message: System error.
02-04-2026 06:41:46.888 [Gauge] [ERROR] Stacktrace: 
   at Name17094583768120260402064143.Name17094574403620260402064143() in /tmp/parallel_data_driven_spec16_793330509692225383/StepImplementation.cs:line 55
   at InvokeStub_Name17094583768120260402064143.Name17094574403620260402064143(Object, Object, IntPtr*)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
02-04-2026 06:41:46.888 [Gauge] [INFO] # data table driven parallel execution
02-04-2026 06:41:46.894 [dotnet] [DEBUG] Calling InvokeMethod to call method Void Name17094574403620260402064143()
02-04-2026 06:41:46.894 [dotnet] [DEBUG] BeforeSpec Hook execution failed : Name17094583768120260402064143.Name17094574403620260402064143
02-04-2026 06:41:46.898 [dotnet] [ERROR] panic:
02-04-2026 06:41:46.898 [dotnet] [ERROR] empty display string
02-04-2026 06:41:46.898 [dotnet] [ERROR] goroutine
02-04-2026 06:41:46.898 [dotnet] [ERROR] 1
02-04-2026 06:41:46.898 [dotnet] [ERROR] [
02-04-2026 06:41:46.898 [dotnet] [ERROR] running
02-04-2026 06:41:46.898 [dotnet] [ERROR] ]:
02-04-2026 06:41:46.898 [dotnet] [ERROR] main.main
02-04-2026 06:41:46.899 [dotnet] [ERROR] (
02-04-2026 06:41:46.899 [dotnet] [ERROR] )
02-04-2026 06:41:46.899 [dotnet] [ERROR] 	/home/runner/work/gauge_screenshot/gauge_screenshot/main.go:44 +0x167
02-04-2026 06:41:46.902 [Gauge] [ERROR] Error Message: System error.
02-04-2026 06:41:46.902 [Gauge] [ERROR] Stacktrace: 
   at Name17094583768120260402064143.Name17094574403620260402064143() in /tmp/parallel_data_driven_spec16_793330509692225383/StepImplementation.cs:line 55
   at InvokeStub_Name17094583768120260402064143.Name17094574403620260402064143(Object, Object, IntPtr*)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)
02-04-2026 06:41:46.909 [dotnet] [DEBUG] KillProcessrequest received
02-04-2026 06:41:46.965 [dotnet] [DEBUG] Loaded Instance Manager of Type: Gauge.CSharp.Lib.DefaultClassInstanceManager
02-04-2026 06:41:46.970 [dotnet] [DEBUG] Loaded Instance Manager of Type: Gauge.CSharp.Lib.DefaultClassInstanceManager
02-04-2026 06:41:46.984 [dotnet] [DEBUG] KillProcessrequest received
02-04-2026 06:41:46.986 [dotnet] [DEBUG] KillProcessrequest received
02-04-2026 06:41:46.988 [html-report] [DEBUG] Transformed SuiteResult to report structure
02-04-2026 06:41:47.005 [Gauge] [DEBUG] Sending kill message to Html Report plugin.
02-04-2026 06:41:47.006 [Gauge] [INFO] Specifications:	1 executed	0 passed	1 failed	0 skipped
02-04-2026 06:41:47.006 [Gauge] [INFO] Scenarios:	0 executed	0 passed	0 failed	0 skipped
02-04-2026 06:41:47.006 [Gauge] [INFO] 
Total time taken: 852ms

Can ignore the stuff like this, it's just caused by a best-efforts attempt to take screenshot on the (intentionally) failing spec:

        panic:
        empty display string
        goroutine
        1 [running]:
        main.main()
        	/home/runner/work/gauge_screenshot/gauge_screenshot/main.go
        :44 +0x167

Despite its terrible naming and lack of context, the error here is intentional, the test is testing handling of errors thrown in hooks during parallel execution:

02-04-2026 06:41:46.902 [Gauge] [ERROR] Error Message: System error.
02-04-2026 06:41:46.902 [Gauge] [ERROR] Stacktrace: 
   at Name17094583768120260402064143.Name17094574403620260402064143() in /tmp/parallel_data_driven_spec16_793330509692225383/StepImplementation.cs:line 55
   at InvokeStub_Name17094583768120260402064143.Name17094574403620260402064143(Object, Object, IntPtr*)
   at System.Reflection.MethodBaseInvoker.InvokeWithNoArgs(Object obj, BindingFlags invokeAttr)

The actual problem seems to be here

02-04-2026 06:41:46.988 [html-report] [DEBUG] Transformed SuiteResult to report structure
02-04-2026 06:41:47.005 [Gauge] [DEBUG] Sending kill message to Html Report plugin.

We should see the outcome of these two lines from html-report reflecting in gauge.log as part of processing the suite result message, before it kills plugins.

logger.Infof("Successfully generated html-report to => %s\n", filepath.Join(reportDir, "index.html"))
...
logger.Debugf("Done generating HTML report using theme from %s", t)

So I guess Gauge is either

  • killing the plugin before it's done (unlikely)
  • OR killing it before its stdout has been flushed
  • OR not flushing the captured stdout correctly to the gauge.log before discarding the plugin infra that handles this

@jensakejohansson
Copy link
Copy Markdown
Contributor Author

I have no real insight into the test implementations or any idea were to start searching, so I'll have to add this on the todo-list for another time 😅 although unreliable regression is a serious headache. There is already the slow FT tests on Windows and I also noted that if a log-line is over a certain length Gauge fails to handle it properly and logs incorrectly; it logs the line as a raw json string using incorrect log-level - some sort of fallback occurs. Just to name a couple of things. The more I dig, the more I find... but I'll try to contribute the best I can.

It's really a shame ThoughtWorks stopped to actively fund this project since the fundamental idea/design principles in my opinion were absolutely brilliant compared to the major 'competitors'.

Anyhow, what would be the next step here? Do we feel comfortable enough to merge this?

@sriv - you are usually knowledgeable and helpful too - if you have time, do you see any obvious 'red-flags' with this?

@chadlwilson
Copy link
Copy Markdown
Contributor

I haven't had time to take another proper look, but will get to it.

@sriv
Copy link
Copy Markdown
Member

sriv commented Apr 6, 2026

The functional tests are trying to find "Successfully generated html-report"

I think this flakiness comes from occassional failure to install html-report. I suspect there is some rate limit to the downloads from GH when there's too many concurrent requests. html-report plugin is not mandatory for gauge functionality, but is included as a default plugin, and I believe this message was included to listen on test execution completion. A hack, I admit!

Will think of. away to make this better. I think we subsequently added support for hooks which we should now consider using.

@chadlwilson
Copy link
Copy Markdown
Contributor

The functional tests are trying to find "Successfully generated html-report"

I think this flakiness comes from occassional failure to install html-report. I suspect there is some rate limit to the downloads from GH when there's too many concurrent requests. html-report plugin is not mandatory for gauge functionality, but is included as a default plugin, and I believe this message was included to listen on test execution completion. A hack, I admit!

Will think of. away to make this better. I think we subsequently added support for hooks which we should now consider using.

This could be an issue, but i can see html report starting here (just not finishing per the logs) so don't think it's this in the case ive looked at?

@jensakejohansson
Copy link
Copy Markdown
Contributor Author

Just a friendly reminder, if you've had time to look at this?

I've been running this version in my dev. env. since the PR was created without noticing any issues and have not seen any false duplicate step warnings since. Worth noting is that I don't run any parallel executions.

@sriv
Copy link
Copy Markdown
Member

sriv commented Apr 22, 2026

I haven't had the time to go through this in depth. But considering that you've been running this without problems, I will merge this off.

@sriv
Copy link
Copy Markdown
Member

sriv commented Apr 22, 2026

The branch needs to be rebased @jensakejohansson . Please, could you sync this up and we can merge this

@chadlwilson
Copy link
Copy Markdown
Contributor

Yeah, sorry, my PR review queue across projects got a bit out of control. Since you mentioned you don't run with parallel, I wanted to think through a bit more carefully about whether deadlock was possible, with all the locks - especially as it got a bit more involved than I expected.

Signed-off-by: jensakejohansson <jens.johansson@systemverification.com>
Signed-off-by: jensakejohansson <jens.johansson@systemverification.com>
…o an single atomic snapshot.

Signed-off-by: jensakejohansson <jens.johansson@systemverification.com>
….log

Signed-off-by: jensakejohansson <jens.johansson@systemverification.com>
Signed-off-by: jensakejohansson <jens.johansson@systemverification.com>
@jensakejohansson jensakejohansson force-pushed the feature/fix-false-positive-duplicate-step-msg branch from ae951d9 to f488037 Compare April 22, 2026 06:41
@jensakejohansson
Copy link
Copy Markdown
Contributor Author

The branch needs to be rebased @jensakejohansson . Please, could you sync this up and we can merge this

Done!

@jensakejohansson
Copy link
Copy Markdown
Contributor Author

My thoughts while doing this was that StepRegistry has no nested locks. Locking methods does not rely on external callbacks / awaits anything, but should be quick in-memory operations. Also, most locks occurs when editing files - that's when the original problem of not locking was evident - not while executing.

Regarding parallelism: From my understanding the procedure of "loading of steps" happens during runner startup and is not affected by parallelism. During actual execution it's the LookupStep-method that will be called intensively - once per step per parallel stream - and may theoretically cause performance problems since it's a hot-path that will be serialized.

However, it's a quick memory look-up and a small allocation so for real world scenarios I'm not worried. It would require heavy parallelism and steps that are executed very quickly to cause any noticeable 'delay' and in those situations I guess it still would not be noticeable in relation to other things of greater concern in the gauge/runner design.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

3 participants