Operator-autocomplete dropdown for advanced search (stacked on #2822) #2826

Draft
bendichter wants to merge 21 commits into dandi:master from bendichter:advanced-search-autocomplete

@bendichter

Member

Draft. Stacked on #2822.

GitHub-issues-style autocomplete for the dandiset list search box. Click in → dropdown of every available operator. Type → list narrows to operators whose names match the prefix at the cursor. Click / Enter / Tab → inserts name: at the cursor with the caret placed right after the colon, ready for the value.

| Behavior | Trigger |
|---|---|
| Open dropdown | Focus the input |
| Filter list | Typing (prefix-matched against the token under the cursor) |
| Select & insert name: | Click, Enter, or Tab |
| Move highlight | ↑ / ↓ |
| Dismiss | Esc, blur, or typing past a : |

The token-at-cursor uses the same whitespace-bounded definition the backend parser uses, so what the user sees in the dropdown matches what the parser will recognize. Once the user types :, autocomplete is suppressed — this PR doesn't try to suggest values, only operator keys.
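
A minimal sketch of the token-at-cursor and prefix-filter logic described above. The real helpers are TypeScript (`tokenAtCursor()` and `suggestionsFor()`); this version is written in Python only to illustrate the idea, and the `OPERATORS` list, the `Token` shape, and the `complete()` helper are assumptions rather than the PR's actual code. Quoted values containing spaces are not treated specially here.

```python
from dataclasses import dataclass

# Hypothetical excerpt of the operator catalog (the real one mirrors the backend allowlist).
OPERATORS = ["affiliation", "approach", "author", "contributor", "funder", "owner", "species"]


@dataclass(frozen=True)
class Token:
    text: str   # characters from the token start up to the caret
    start: int  # index of the token's first character
    end: int    # index one past the token's last character


def token_at_cursor(query: str, caret: int) -> Token:
    """Return the whitespace-bounded token that contains the caret."""
    start = caret
    while start > 0 and not query[start - 1].isspace():
        start -= 1
    end = caret
    while end < len(query) and not query[end].isspace():
        end += 1
    return Token(text=query[start:caret], start=start, end=end)


def suggestions_for(query: str, caret: int) -> list:
    """Operator names matching the prefix under the caret; nothing once a colon is typed."""
    prefix = token_at_cursor(query, caret).text
    if ":" in prefix:
        return []  # the user is typing a value, not an operator key
    return [name for name in OPERATORS if name.startswith(prefix)]


def complete(query: str, caret: int, name: str) -> tuple:
    """Replace the token under the caret with `name:`; return (new text, new caret)."""
    token = token_at_cursor(query, caret)
    inserted = f"{name}:"
    new_query = query[:token.start] + inserted + query[token.end:]
    return new_query, token.start + len(inserted)
```

With an empty input every operator is suggested; `suggestions_for("mouse aut", 9)` narrows to `author`, and `complete("mouse aut", 9, "author")` leaves the caret right after the colon.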

Files

  • web/src/components/advancedSearchOperators.ts (new) — canonical operator catalog (name, description, value example) plus tokenAtCursor() and suggestionsFor() helpers. Mirrors the backend allowlist; backend stays the source of truth (it validates and returns "Did you mean?" suggestions for unknown keys).
  • web/src/components/DandisetSearchField.vue — adds a v-menu anchored to the wrapping form, controlled manually via autocompleteOpen. Keyboard handlers on the input drive selection + completion. The existing ? help popover is unchanged; the two are complementary (popover = static cheat-sheet, autocomplete = interactive).

Test plan

  • npm run lint clean
  • vue-tsc --noEmit clean
  • Manual: click into the search box on the dandiset list; the dropdown appears with all operators. Typing aut narrows to author:. Pressing Enter inserts author: with the caret after the colon. Typing the value and submitting performs the search as before.

Known limits / future

  • No value autocomplete yet. Once the user types key:, the dropdown closes. The discovery-API idea @yarikoptic suggested in Add contributor + per-role operators to advanced search #2822 (?search=species:? returning candidate values) would unlock this — but that's a separate backend addition.
  • Operator catalog is duplicated between the frontend (advancedSearchOperators.ts) and backend (operators.py). The discovery API would also let the frontend fetch this dynamically. For now, the duplication is intentional: keeps the autocomplete fast (no network round-trip per keystroke) and the backend remains the validation source of truth.
  • No quote-balancing assist. Typing technique:" doesn't auto-insert the closing quote (could be a small future polish).

Once #2822 merges, this PR's base will collapse to master automatically.

bendichter and others added 20 commits May 13, 2026 15:52

Filters dandisets to those owned by a given user. The value is matched
case-insensitively against User.username OR User.email. The special form
`owner:me` resolves to the requesting user (consistent with the existing
?user=me query parameter) and returns 400 if the request is anonymous.

Implementation reuses the existing `get_owned_dandisets()` permission
helper. We pass `with_superuser=False` so `owner:admin` returns only
what admin explicitly owns — guardian's default would otherwise inflate
to the entire archive for any superuser.

Unknown users return zero results (not an error): a search for a
nonexistent owner is a valid 0-hit query.

Tests cover username/email lookup, case-insensitivity, unknown user,
`owner:me` for an authenticated user, anonymous `owner:me` → 400, the
superuser non-inflation guarantee, and combination with other operators.

OpenAPI help text and the frontend operator popover updated.

Real users encounter the dandiset list with owners shown by display
name (e.g. "Super User"), not by username. Searching that string was
returning 0 because the lookup only matched username/email.

Now matches case-insensitively against username, email, first_name,
last_name, OR "first_name last_name" — so owner:"Super User" works
the same as owner:ben.dichter@gmail.com.
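
A rough illustration of that matching rule over in-memory records, not the Django query the commit uses. The `User` dataclass and `matches_owner` helper are hypothetical, and whole-field equality is an assumption: the message only says the match is case-insensitive.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    username: str
    email: str
    first_name: str = ""
    last_name: str = ""


def matches_owner(user: User, value: str) -> bool:
    """True if `value` equals any of the user's lookup fields, ignoring case."""
    needle = value.strip().casefold()
    full_name = f"{user.first_name} {user.last_name}".strip()
    candidates = (user.username, user.email, user.first_name, user.last_name, full_name)
    return any(needle == c.casefold() for c in candidates if c)


users = [
    User("jdoe", "jane.doe@example.com", "Jane", "Doe"),
    User("rdoe", "rick.doe@example.com", "Rick", "Doe"),
]
# A shared last name matches several users, so all of them are collected.
matched = [u.username for u in users if matches_owner(u, "doe")]
```

Here `matched` contains both users, which is the case the union over owners has to handle.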

Multiple users may match (e.g. shared last name); we union dandisets
owned by any of them via a direct DandisetUserObjectPermission query.
Updated OpenAPI help text and the frontend popover example to
`owner:"Jane Doe"` so users discover the new shape.

Round-2 review feedback on dandi#2821:

- @yarikoptic flagged that owner:me silently shadows a real user named "Me".
  Fix: distinguish quoted vs unquoted at the parser level. Unquoted
  owner:me → magic alias for the requesting user. Quoted owner:"me" →
  literal lookup (matches a user whose first/last name is "Me"). Same
  pattern lets owner:"Me Someoneyou" reach the literal full-name match
  while keeping the convenient owner:me shortcut.

  Implementation: ParsedSearch.operators is now a list of `Operator`
  dataclasses (key, value, quoted) instead of bare tuples. Filters
  consume the new shape and the owner filter switches on the quoted
  flag.

- Replaced personal email (ben.dichter@gmail.com) in the full-name test
  fixture with a generic example user.

- Consolidated 10 small owner-tests into 3 denser ones that share setup
  per @yarikoptic's "make each test matter more" feedback. Coverage is
  unchanged (every documented lookup path is asserted; cross-key AND
  with another operator; multi-user union via shared last name; unknown
  user → 0; superuser non-inflation; owner:me magic; owner:"me"
  literal-escape; anonymous owner:me → 400). DB setup runs ~3x instead
  of ~10x.
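
To make the quoted/unquoted distinction concrete, here is a small sketch of a parser that records whether a value was quoted. The regex and the `owner_lookup_kind` helper are assumptions for illustration; only the `Operator(key, value, quoted)` shape comes from the commit message.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    key: str
    value: str
    quoted: bool


# key:"quoted value" or key:bare_value. Hypothetical pattern, not the parser's real regex.
_OPERATOR_RE = re.compile(r'(?P<key>[a-z_]+):(?:"(?P<quoted>[^"]*)"|(?P<bare>\S+))')


def parse_operators(query: str) -> list:
    """Extract operators from a query, remembering whether each value was quoted."""
    ops = []
    for m in _OPERATOR_RE.finditer(query):
        quoted = m.group("quoted") is not None
        value = m.group("quoted") if quoted else m.group("bare")
        ops.append(Operator(m.group("key"), value, quoted))
    return ops


def owner_lookup_kind(op: Operator) -> str:
    """Unquoted `me` is the alias; a quoted value is always a literal lookup."""
    if op.value == "me" and not op.quoted:
        return "requesting-user"
    return "literal"
```

Under these assumptions `owner:me` resolves to the requesting user, while `owner:"me"` and `owner:"Me Someoneyou"` stay literal lookups.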

Updated OpenAPI help text and the search popover to mention the
owner:me alias and the quoted-escape.

The unquoted owner:me → current-user shortcut required threading a
`quoted` flag through the parser and a `request_user` arg through the
filter dispatch — non-trivial machinery to support one alias.

Per dandi#2822 review discussion, removing it from this PR keeps the owner
operator focused on literal lookup-by-value (username / email / first /
last / "first last") and avoids the design debate about the right escape
mechanism for "I literally want a user named Me." The alias can come
back in a focused follow-up PR if/when there's appetite for it.

Concrete drops:
- owner:me magic + 400-on-anonymous in `_apply_owner_filter`
- `Operator.quoted` field on the parser dataclass
- `quoted` and `request_user` parameters on `_apply_owner_filter`
- `get_owned_dandisets` import (no longer used here)
- `test_advanced_search_owner_me_magic_and_literal_escape` test
- The two `owner-me-quoted` / `owner-me-unquoted` parser test cases
- "owner:me" mentions in OpenAPI help text and the popover entry

…okup

29 new operators total: catch-all `contributor:` plus one per dandi-schema
RoleType (`author`, `data_curator`, `funder`, `contact_person`, etc.).
Independent-operator semantics — `author:Doe funder:NIH` returns
dandisets where SOME contributor has Doe-as-Author AND SOME contributor
(possibly different) has NIH-as-Funder. Each role-specific operator
constrains a single contributor[] element to have BOTH the name match
AND the role.

Implementation:
- A single `_CONTRIBUTOR_ROLE_OPS` dict drives both the parser allowlist
  and the filter dispatch; adding a future role is one new entry.
- `_contributor_jsonpath()` builds a Postgres jsonb_path_exists predicate
  that ORs across `name`, `email`, AND `identifier` (so ORCID for Persons
  and ROR URL for Organizations both work, including bare-ID substring
  forms like `01cwqze88` matching the full ROR URL).
- All contributor operators in a single query AND on the same Version's
  metadata so a draft + published version with disjoint contributor lists
  never combine into a spurious match.

Why 29 separate operators rather than a `contributor: + role:` pair:
independent operators compose cleanly (cross-key AND falls out
naturally; no ambiguity about which role applies to which contributor
when there are multiple). Same precedent as Gmail's `from:`/`to:`/`cc:`.
The 28 role names come straight from `dandischema.RoleType`.
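
The independent-operator semantics can be sketched in plain Python over one version's metadata. This is an in-memory analogue, not the `jsonb_path_exists` predicate the commit builds; the three-entry table and the helper names are assumptions, and the role check uses a substring so that `Author` matches a stored `dcite:Author`.

```python
from typing import Optional

# Hypothetical excerpt: operator name -> role it restricts to (None = any role).
CONTRIBUTOR_ROLE_OPS = {"contributor": None, "author": "Author", "funder": "Funder"}


def contributor_matches(contributor: dict, value: str, role: Optional[str]) -> bool:
    """One contributor[] element must satisfy BOTH the text match and the role."""
    needle = value.lower()
    texts = [contributor.get(field) or "" for field in ("name", "email", "identifier")]
    if not any(needle in text.lower() for text in texts):
        return False
    roles = contributor.get("roleName", [])
    return role is None or any(role in stored for stored in roles)


def version_matches(metadata: dict, operators: list) -> bool:
    """Each operator must be satisfied by SOME contributor of this one version."""
    contributors = metadata.get("contributor", [])
    return all(
        any(contributor_matches(c, value, CONTRIBUTOR_ROLE_OPS[key]) for c in contributors)
        for key, value in operators
    )


metadata = {
    "contributor": [
        {"name": "Doe, Jane", "roleName": ["dcite:Author"]},
        {
            "name": "National Institutes of Health (NIH)",
            "identifier": "https://ror.org/01cwqze88",
            "roleName": ["dcite:Funder"],
        },
    ]
}
```

`author:Doe funder:NIH` matches this version because different contributors satisfy each operator, while `funder:Doe` does not: the only Doe entry lacks the Funder role.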

Test: one consolidated test covers catch-all + role-specific lookup,
case-insensitivity, identifier (ORCID + ROR + bare-ID substring),
role-substring matching `dcite:`-prefixed stored values, role + ORCID
composition (positive and negative), and independent cross-role AND.
Plus a separate test for the typo → 400-with-suggestion path.
Anonymous test fixtures use generic Doe placeholders, no real names.

OpenAPI help text and the search popover updated.

The previous commit treated `affiliation:` as a role-name match (looking
for `dcite:Affiliation` in `contributor[].roleName`), but real DANDI data
never uses that role; affiliations live in a separate nested field
`contributor[].affiliation[]`. The operator silently returned 0 hits
despite plenty of (e.g.) Stanford-affiliated contributors.

Fix: route `affiliation:` through a dedicated jsonpath that scans
`$.contributor[*].affiliation[*]` and matches against the affiliation's
`name` OR `identifier` (case-insensitive substring). So:

  affiliation:Stanford                    → matches Stanford University
  affiliation:"University College London"  → quoted multi-word
  affiliation:00f54p054                    → matches via ROR ID substring

Composes with role/contributor operators on the same Version, same as
the other contributor-style operators (independent-operator AND).
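
As a plain-Python analogue of scanning `$.contributor[*].affiliation[*]`, the sketch below walks the nested list and matches on `name` or `identifier`. The sample metadata and the `affiliation_matches` name are made up for illustration; the actual filter is a Postgres jsonpath, not Python.

```python
def affiliation_matches(metadata: dict, value: str) -> bool:
    """Case-insensitive substring match against any contributor's affiliations."""
    needle = value.lower()
    for contributor in metadata.get("contributor", []):
        # Contributors without an `affiliation` list are skipped by the default.
        for aff in contributor.get("affiliation", []):
            if any(needle in (aff.get(f) or "").lower() for f in ("name", "identifier")):
                return True
    return False


metadata = {
    "contributor": [
        {
            "name": "Doe, Jane",
            "affiliation": [
                {"name": "Stanford University", "identifier": "https://ror.org/00f54p054"}
            ],
        },
        # A contributor with its own identifier but no affiliation list.
        {"name": "Example Funder", "identifier": "https://ror.org/01cwqze88"},
    ]
}
```

Only nested affiliation entries are inspected, so a contributor's own `identifier` never satisfies `affiliation:` in this sketch.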

Also refactored `_apply_contributor_filters` to accept a list of
(where, params) pairs rather than (value, role) — cleaner since both
the role-based and affiliation operators now share the same dispatch.

Per review: `other:` would be a thin surface for "uncategorized
contributors" — not a useful filter — and `ethics_approval:` isn't a
contributor-style role users would search by. Removing them tightens
the operator vocabulary to the 25 substantive RoleType values + the
contributor catch-all + affiliation.

Two structural improvements, one product trim, and a new drift-guard
test, in response to the review on dandi#2822:

1. New `dandiapi/api/services/search/operators.py` (pure Python, no
   Django) holds every operator-vocabulary constant: DATE_OPS,
   ASSET_OPS, OWNER_OPS, AFFILIATION_OPS, CONTRIBUTOR_ROLE_OPS,
   FILE_TYPE_ALIASES, ASSET_NAME_PATH_OPS, AFFILIATION_JSONPATH.
   OPERATOR_KEYS is now the union of those tables — single source of
   truth, no more duplication between parser.py (allowlist) and
   filters.py (dispatch). Adding a new operator is one entry; the
   parser automatically knows about it.

2. Trim the role-restricting shortcuts from 25 to 9. After review
   discussion: most RoleType values aren't operators users actually
   reach for (`conceptualization:`, `methodology:`, `validation:`,
   `visualization:`, etc.). Kept the ones that map to common search
   intents:

     contributor (catch-all), author, contact_person, data_collector,
     data_curator, data_manager, maintainer, project_lead, funder,
     sponsor

   The catch-all `contributor:` still matches anyone in any role; only
   the role-restricting shortcuts are pruned. `project_lead:` is
   intentionally shorter than the schema name `ProjectLeader`.

3. Shrank the verbose docstrings on private filter helpers (the rationale
   stays in commit messages, not as documentation rot on internal API).

4. Added test_contributor_role_ops_match_actual_dandischema_roletype as
   a drift guard: every non-catch-all CONTRIBUTOR_ROLE_OPS value must be
   a real RoleType.name. Renames or removals on the schema side trip
   the test, forcing an explicit decision instead of silently changing
   public search syntax.
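
A sketch of the "one table per operator family, allowlist is their union" layout, plus the drift guard from item 4. The table contents are hypothetical placeholders and `RoleType` here is a three-member stand-in for `dandischema.RoleType`, so this shows the structure rather than the real vocabulary.

```python
from enum import Enum


class RoleType(Enum):
    """Stand-in for dandischema.RoleType; only a few members are listed."""
    Author = "dcite:Author"
    Funder = "dcite:Funder"
    ProjectLeader = "dcite:ProjectLeader"


# Hypothetical entries; the values are placeholders for whatever the dispatch needs.
DATE_OPS = {"published_after": "placeholder"}
OWNER_OPS = {"owner": "placeholder"}
AFFILIATION_OPS = {"affiliation": "placeholder"}
CONTRIBUTOR_ROLE_OPS = {
    "contributor": None,              # catch-all: any role
    "author": "Author",
    "funder": "Funder",
    "project_lead": "ProjectLeader",  # operator name differs from the schema name
}

# The parser allowlist is derived, so a new table entry is picked up automatically.
OPERATOR_KEYS = frozenset().union(DATE_OPS, OWNER_OPS, AFFILIATION_OPS, CONTRIBUTOR_ROLE_OPS)


def unknown_roles() -> set:
    """Drift guard: role values that are not RoleType member names."""
    valid = {member.name for member in RoleType}
    return {
        role for role in CONTRIBUTOR_ROLE_OPS.values()
        if role is not None and role not in valid
    }
```

A test asserting `unknown_roles() == set()` fails as soon as the schema renames or removes a role the operator table still points at.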

OpenAPI help text and the search popover updated to reflect the trimmed
list (`project_lead`, `data_collector`, `data_manager`, `sponsor` now
shown; the misleading "many more" tail removed).

- Variable renames: ds_baker_curator → ds_doe_curator,
  ds_baker_author_only → ds_doe_author_only (the test data was already
  Doe; only the variable names still carried the old name).
- One stale query string `AUTHOR:baker` updated to `AUTHOR:doe`.
- One fixture email field `'jane.doe.com'` (broken: no @) restored to
  `'jane.doe@example.com'` — leftover from the earlier perl rename
that stripped @example out.

Per dandi#2822 review discussion: the old semantics required all asset
operators to be satisfied by a SINGLE asset, which meant
`species:mouse species:rat` only matched dandisets with a multi-species
recording (rare). The natural user reading is "the dandiset has mouse
data AND has rat data" — those can be on different assets, and that's
the common case for comparative-species dandisets.

Implementation: each asset operator now builds an independent
AssetSearch subquery and the dandiset queryset is filtered with
`id__in=...` per operator. Django generates one subquery per operator
and ANDs them at the dandiset level.

Cross-key likewise: `species:mouse approach:electrophysiological` now
matches any dandiset that has SOME mouse asset AND SOME ephys asset,
not just dandisets with a mouse-ephys asset.
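
The new semantics amount to a set intersection of per-operator results. The sketch below uses a made-up in-memory fixture instead of `AssetSearch` subqueries; `ASSETS`, `dandisets_with`, and `search` are illustrative names only.

```python
# Hypothetical fixture: dandiset id -> per-asset metadata.
ASSETS = {
    1: [{"species": "mouse", "approach": "electrophysiological"}],
    2: [{"species": "mouse"}, {"species": "rat"}],
    3: [{"species": "mouse"}, {"approach": "electrophysiological"}],
}


def dandisets_with(key: str, value: str) -> set:
    """Analogue of one per-operator subquery: ids with at least one matching asset."""
    return {ds for ds, assets in ASSETS.items() if any(a.get(key) == value for a in assets)}


def search(operators: list) -> set:
    """AND the per-operator id sets at the dandiset level."""
    result = set(ASSETS)
    for key, value in operators:
        result &= dandisets_with(key, value)
    return result
```

Dandiset 2 matches `species:mouse species:rat` even though no single asset is both, and dandiset 3 matches `species:mouse approach:electrophysiological` with the two properties on different assets.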

Tests updated:
- `test_advanced_search_repeated_same_key_operator_combines_with_and`
  is now `..._combines_at_dandiset_level`, with a new fixture that has
  two separate assets (one mouse, one rat) to actually exercise the
  cross-asset case the old semantic excluded.
- `test_advanced_search_repeated_asset_operators_intersect` is now
  `test_advanced_search_asset_operators_combine_at_dandiset_level`,
  with a similar two-assets-split fixture that demonstrates the new
  inclusive behavior.

Contributor / affiliation semantics unchanged — those still AND on
the same Version's metadata (since contributors live per-version, not
per-asset). Within that single version, predicates can match different
contributor[] entries.

Postgres jsonpath quirk: `like_regex` requires its pattern to be a
STRING LITERAL inside the jsonpath text — not a `$variable`. The
contributor + affiliation builders I wrote tried to use the `vars`
argument of `jsonb_path_exists` for the regex pattern, which Postgres
rejects with `syntax error at or near "$val" of jsonpath input`.

(The asset operators avoid this by concatenating `to_jsonb(?::text)::text`
into the jsonpath at SQL execution time — the regex pattern ends up as
a properly-quoted JSON string literal in the path. The user value is
still bound as a parameter, never inlined into the SQL.)
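
A sketch of what the SQL-time concatenation might look like when built from Python. The column name `metadata`, the jsonpath, and the helper name are assumptions, and the fragment has not been checked against the real `filters.py`. The point is only that the jsonpath text is assembled inside SQL around `to_jsonb(%s::text)::text`, so the pattern arrives as a quoted string literal while the user value stays a bound parameter. Regex metacharacters in the value are not escaped in this sketch.

```python
def contributor_name_where(value: str) -> tuple:
    """Return a (where, params) pair for a case-insensitive name match."""
    where = (
        "jsonb_path_exists("
        "metadata, "
        "('$.contributor[*] ? (@.name like_regex ' "
        "|| to_jsonb(%s::text)::text "
        "|| ' flag \"i\")')::jsonpath"
        ")"
    )
    # The value is passed separately and never interpolated into the SQL text.
    return where, [value]


where, params = contributor_name_where("Doe")
```

For the value `Doe`, Postgres would evaluate the path `$.contributor[*] ? (@.name like_regex "Doe" flag "i")`, which is the literal-pattern form `like_regex` accepts.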

Refactor: applied the same SQL-time concatenation trick to the contributor
+ affiliation builders. Three new helpers — `_contributor_where`,
`_affiliation_where`, and a shared `_LIKE_REGEX_PATTERN` constant — replace
the old `_contributor_role_jsonpath` + `_build_jsonpath_where` pair that
relied on the broken `vars` mechanism. Removed the unused
`AFFILIATION_JSONPATH` constant from operators.py and dropped the
`json` import from filters.py since we no longer marshal `vars` objects.

Net behavior unchanged; the failing CI tests should pass now.

CI surfaced an assertion that AUTHOR:doe should match the same set as
author:doe. The old _TOKEN_RE / _BARE_OP_RE only accepted lowercase
operator keys, so uppercase tokens fell through to free text and
returned 0 results.

Accept either case in the regex and lowercase the captured key before
validation/dispatch. Matches user expectations (GitHub's search
operators are case-insensitive on the key side too).
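
A minimal sketch of that fix: accept either case in the key pattern, then lowercase before checking the allowlist. The regex and the two-entry allowlist are stand-ins, not the parser's real `_TOKEN_RE` or `OPERATOR_KEYS`, and unknown keys simply fall through to free text here, whereas the real parser returns a 400 with a suggestion.

```python
import re

# Hypothetical stand-in for the token regex: the key may be upper or lower case.
_KEY_VALUE_RE = re.compile(r"^(?P<key>[A-Za-z_]+):(?P<value>.+)$")
KNOWN_KEYS = {"author", "owner"}  # tiny excerpt for the example


def split_token(token: str):
    """Return (key, value) for a recognized operator, else None (free text)."""
    m = _KEY_VALUE_RE.match(token)
    if m is None:
        return None
    key = m.group("key").lower()  # normalize before validation/dispatch
    if key not in KNOWN_KEYS:
        return None
    return key, m.group("value")
```

With this normalization `AUTHOR:doe` and `author:doe` produce the same `("author", "doe")` pair.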

Co-authored-by: Isaac To <candleindark@users.noreply.github.com>

Per @candleindark's review: a contributor can be an Organization as well
as a Person, and the affiliation jsonpath (which traverses
`contributor[*].affiliation[*]`) should walk past Organizations
(which have no `affiliation` field of their own) without exploding.

Added Organization contributors to both `ds_stanford` and `ds_ucl`:
NIH as a Funder on ds_stanford and Wellcome Trust as a Funder on
ds_ucl. The new assertions confirm:

- `affiliation:Stanford` (and the other affiliation queries) keep
  working with mixed Person/Organization contributors.
- The Organization's own `identifier` is NOT matched by `affiliation:`
  (it's not an affiliation; the test pins this).
- Cross-key with `funder:NIH affiliation:Stanford` works — different
  contributor elements on the same Version.

Also: used `National Institutes of Health (NIH)` for the org name so
the `funder:NIH` substring test actually matches (the abbreviation
isn't part of the spelled-out form alone). Realistic — DANDI
contributors often use this parenthetical form.

GitHub-issues-style autocomplete: clicking into the search box opens a
dropdown listing all available operators. Typing narrows it to the
operators whose names match the prefix at the cursor. Selecting one
(click, Enter, or Tab) inserts `name:` at the cursor position with the
caret placed right after the colon, ready for the value.

Implementation:

- New `web/src/components/advancedSearchOperators.ts` module holds the
  canonical operator catalog (name, description, value example) and two
  helpers — `tokenAtCursor()` to find the token straddling the caret,
  and `suggestionsFor()` to filter the catalog by prefix. Mirrors the
  backend allowlist in `dandiapi/api/services/search/operators.py`;
  the backend stays the source of truth (it validates and returns
  "Did you mean?" suggestions for unknown keys).

- `DandisetSearchField.vue` adds a `v-menu` anchored to the wrapping
  form, controlled manually via `autocompleteOpen` so we can drive
  visibility from focus / input / cursor changes. Arrow-up/down moves
  the highlighted suggestion; Enter and Tab both complete; Esc
  dismisses; click selects. `mousedown.prevent` on list items keeps
  focus in the input across the click.

- The token-at-cursor logic uses the same whitespace-bounded token
  definition as the backend parser, so what the user sees in the
  dropdown matches what the parser will recognize. If the user has
  already typed a colon (i.e. they're typing the value), suggestions
  are suppressed — this PR doesn't try to autocomplete values yet
  (that's the discovery-API follow-up @yarikoptic suggested).

The existing `?` help popover is unchanged. The autocomplete and the
help popover are complementary: the popover is a static cheat-sheet
with examples; the autocomplete is interactive.

The autocomplete v-menu was rendering inline (because of `attach`), making
it a sibling of the results panel ("0 results found") that sits below the
search field — so the panel's stacking context could occlude it.

Fix: drop `attach` so Vuetify teleports the menu to the document body
(the default), escaping all local stacking contexts. To keep the
dropdown's width matching the search field, capture the form element's
`clientWidth` on mount and on window resize, and bind it to the menu's
`min-width`/`max-width`.

Previously, Enter selected the highlighted suggestion when the dropdown
was open. That made it impossible to actually search for free text that
happens to be a prefix of an operator name (e.g. typing `publ` and
hitting Enter would auto-complete to `published_after:` instead of
running the search).

Reserve Enter for "submit search" unconditionally. Tab and click are
still the explicit completion gestures — they're the universal
keyboard / mouse autocomplete idioms anyway.

@bendichter

Member Author
Screen.Recording.2026-05-13.at.6.44.19.PM.mov

@bendichter

Member Author

@yarikoptic

For species, approach, technique, standard, and file_type — when the
caret moves past the colon, the dropdown switches into value mode and
shows matching values. species values come from the existing
`/search/species` endpoint (debounced, sequence-checked); the others
use short static lists. Multi-word values are auto-quoted on insert so
the parser sees them as a single token.
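
The auto-quoting step could look roughly like this. It is a Python illustration of frontend behavior that is actually implemented in the Vue component; both helper names are hypothetical, and values that themselves contain a double quote are not handled.

```python
def format_value(value: str) -> str:
    """Wrap values containing whitespace in double quotes so they stay one token."""
    return f'"{value}"' if any(ch.isspace() for ch in value) else value


def insert_value(query: str, caret: int, value: str) -> tuple:
    """Replace the partial value between the last colon and the caret."""
    colon = query.rfind(":", 0, caret)
    if colon == -1:
        raise ValueError("caret is not inside a key:value token")
    formatted = format_value(value)
    new_query = query[: colon + 1] + formatted + query[caret:]
    return new_query, colon + 1 + len(formatted)
```

Picking `spike sorting technique` while the input reads `technique:spi` yields `technique:"spike sorting technique"` with the caret after the closing quote.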

Picking a key still inserts `name:` and leaves the caret right after
the colon — so the user immediately sees the relevant value list with
no extra keystroke.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@bendichter

Member Author

Update — value-mode autocomplete (5920f05)

The dropdown now switches into value mode when the caret moves past : for any operator with a known vocabulary:

| Operator | Source |
|---|---|
| species: | live fetch from /search/species?species=PREFIX (debounced 200 ms, sequence-checked so a slow earlier reply can't overwrite a faster later one) |
| approach: | static list (4 values from real metadata) |
| technique: | static list (~11 common measurement techniques) |
| standard: | static list (NWB, BIDS, NIfTI) |
| file_type: | static list (nwb, image, text, video) |

Multi-word suggestions get auto-quoted on insert (technique:"spike sorting technique"). All other operators (dates, contributors, owner, affiliation) close the dropdown after the colon — same as before.
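
The sequence check for the live `species:` fetch can be illustrated with a small asyncio sketch. The component does this in TypeScript; `LatestWins` and `fake_fetch` are invented for the example, the network call is simulated with sleeps, and debouncing is left out.

```python
import asyncio


class LatestWins:
    """Keep only the reply to the most recently issued request."""

    def __init__(self, fetch):
        self._fetch = fetch  # async callable: prefix -> list of suggestions
        self._seq = 0
        self.suggestions = []

    async def request(self, prefix: str) -> None:
        self._seq += 1
        my_seq = self._seq
        result = await self._fetch(prefix)
        if my_seq == self._seq:  # drop replies superseded by a newer request
            self.suggestions = result


async def fake_fetch(prefix: str) -> list:
    # Simulated endpoint: the earlier, shorter prefix answers more slowly.
    await asyncio.sleep(0.05 if prefix == "m" else 0.01)
    return [f"{prefix}-match"]


async def demo() -> list:
    box = LatestWins(fake_fetch)
    # "m" is requested first but its reply arrives last.
    await asyncio.gather(box.request("m"), box.request("mu"))
    return box.suggestions


final = asyncio.run(demo())
```

The late reply for `m` is discarded because its sequence number is stale, so `final` holds only the suggestions for `mu`.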

UX flow now reads end-to-end without the user ever typing past a colon by hand: pick species from the key dropdown → caret lands after : → value dropdown appears immediately with live matches → click → search.

@bendichter

Member Author
Screen.Recording.2026-05-13.at.6.47.01.PM.mov

@bendichter

Member Author

I know you are going to complain about a lot of this being hard-coded. I agree, but I just wanted to get the UX down first

@yarikoptic

Member

Thanks for taking it on. Let's finalize #2822 first to make this reviewable.
