Regex objects not supported

### Description
The docstring of `esm_datastore.search` includes an example that uses `re.compile(...)`. However, passing a compiled pattern no longer works; support seems to have broken in recent updates.

```python3
import re

import intake

cat = intake.open_esm_datastore('intake-esm/tutorial-catalogs/AWS-CMIP6.json')

# Get institutions that do not start with M
cat.search(institution_id=re.compile('^(?!M.*)'))
```
This fails with:
```python3
TypeError                                 Traceback (most recent call last)
Cell In[37], line 1
----> 1 cat2.search(institution_id=re.compile('^(?!M.*)')).df

File ~/miniforge3/envs/intesm-dev/lib/python3.14/site-packages/pydantic/_internal/_validate_call.py:40, in update_wrapper_attributes.<locals>.wrapper_function(*args, **kwargs)
     38 @functools.wraps(wrapped)
     39 def wrapper_function(*args, **kwargs):
---> 40     return wrapper(*args, **kwargs)

File ~/miniforge3/envs/intesm-dev/lib/python3.14/site-packages/pydantic/_internal/_validate_call.py:137, in ValidateCallWrapper.__call__(self, *args, **kwargs)
    134 if not self.__pydantic_complete__:
    135     self._create_validators()
--> 137 res = self.__pydantic_validator__.validate_python(pydantic_core.ArgsKwargs(args, kwargs))
    138 if self.__return_pydantic_validator__:
    139     return self.__return_pydantic_validator__(res)

File ~/Projets/intake-esm/intake_esm/core.py:462, in esm_datastore.search(self, require_all_on, **query)
    406 """Search for entries in the catalog.
    407 
    408 Parameters
   (...)    458 4    landCoverFrac
    459 """
    461 # step 1: Search in the base/main catalog
--> 462 esmcat_results = self.esmcat.search(require_all_on=require_all_on, query=query)
    464 # step 2: Search for entries required to derive variables in the derived catalogs
    465 # This requires a bit of a hack i.e. the user has to specify the variable in the query
    466 derivedcat_results = []

File ~/Projets/intake-esm/intake_esm/cat.py:443, in ESMCatalogModel.search(self, query, require_all_on)
    415 """
    416 Search for entries in the catalog.
    417 
   (...)    432 
    433 """
    435 _query = (
    436     query
    437     if isinstance(query, QueryModel)
   (...)    440     )
    441 )
--> 443 results = search(
    444     df=self.df, query=_query.query, columns_with_iterables=self.columns_with_iterables
    445 )
    446 if _query.require_all_on is not None and not results.empty:
    447     results = search_apply_require_all_on(
    448         df=results,
    449         query=_query.query,
    450         require_all_on=_query.require_all_on,
    451         columns_with_iterables=self.columns_with_iterables,
    452     )

File ~/Projets/intake-esm/intake_esm/_search.py:46, in search(df, query, columns_with_iterables)
     42 column_is_stringtype = isinstance(
     43     df[column].dtype, object | pd.core.arrays.string_.StringDtype
     44 )
     45 column_has_iterables = column in columns_with_iterables
---> 46 for value in values:
     47     if column_has_iterables:
     48         mask = df[column].str.contains(value, regex=False)

TypeError: 're.Pattern' object is not iterable
```
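
The last frame suggests that the compiled pattern reaches the `for value in values` loop without being wrapped in a list. The snippet below is a minimal sketch of that cause, independent of intake-esm; `as_value_list` is a hypothetical helper written for illustration, not a function from the library.

```python
import re


def as_value_list(value):
    """Hypothetical helper: wrap a scalar query value in a list."""
    # Strings and compiled patterns are single values, so they are not iterated.
    if isinstance(value, (str, re.Pattern)):
        return [value]
    try:
        return list(value)
    except TypeError:
        # Any other non-iterable scalar (e.g. an int) is wrapped as well.
        return [value]


pattern = re.compile(r"^(?!M.*)")

# Iterating a compiled pattern directly raises the TypeError seen in the traceback.
message = None
try:
    for _ in pattern:
        pass
except TypeError as err:
    message = str(err)

# Wrapping first gives the loop something it can iterate over.
values = as_value_list(pattern)
```

Wrapping alone would only address this first error; it says nothing about whether the downstream string matching accepts a compiled pattern.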

### Case with PyArrow
However, I don't think this is fixable. When opening a catalog from a CSV file, the resulting dataframe has string columns with a `large_string[pyarrow]` dtype. Pandas then delegates pattern matching to PyArrow, which doesn't accept `re.Pattern` objects.

```python3
cat = intake.open_esm_datastore('intake-esm/tests/sample-catalogs/cesm1-lens-netcdf.json')
cat.df.experiment.str.contains(re.compile('^C.*'))
```
This fails with `TypeError: expected bytes, re.Pattern found`.
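
A compiled pattern keeps its source text in the `.pattern` attribute, so one possible user-side workaround is to pass the plain string instead of the compiled object. This is only a sketch; `to_pattern_string` is a hypothetical name, and any flags given to `re.compile` are lost in the conversion.

```python
import re


def to_pattern_string(value):
    """Hypothetical helper: return the source string of a compiled pattern."""
    if isinstance(value, re.Pattern):
        # Flags such as re.IGNORECASE are dropped; this sketch ignores them.
        return value.pattern
    return value


compiled = re.compile("^C.*")
as_text = to_pattern_string(compiled)  # plain str, same pattern text
```

With that, `cat.df.experiment.str.contains(as_text)` receives a string rather than an `re.Pattern`. It does not help when the pattern text itself uses syntax that the PyArrow engine rejects.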

Moreover, PyArrow uses a different regex engine than Python: Google RE2. A major difference (to me at least) is that RE2 has no negative lookahead. For example, `^(?!CCCma.*)` (match strings not starting with "CCCma") is not a valid RE2 pattern.
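
A negative match can often be expressed without lookahead, either by matching the positive pattern and inverting the result, or (for a single-character prefix) with a negated character class. The sketch below checks with Python's `re` module that these give the same selection as the lookahead pattern; the institution names are made-up sample values, not taken from a real catalog.

```python
import re

# Illustrative sample values only.
institutions = ["MIROC", "MPI-M", "CCCma", "NCAR", "NOAA-GFDL"]

# Python-only pattern using a negative lookahead (rejected by RE2).
lookahead = re.compile(r"^(?!M.*)")
with_lookahead = [name for name in institutions if lookahead.search(name)]

# Lookahead-free alternative 1: match the prefix, then invert the result.
starts_with_m = re.compile(r"^M")
inverted = [name for name in institutions if not starts_with_m.search(name)]

# Lookahead-free alternative 2: negated character class.
# Unlike the lookahead version, this does not match empty strings.
not_m_class = re.compile(r"^[^M]")
char_class = [name for name in institutions if not_m_class.search(name)]
```

On a dataframe, the first alternative would be a boolean mask along the lines of `~cat.df["institution_id"].str.contains("^M")`, assuming the column has no missing values. The second alternative is a plain pattern string and uses only syntax that both engines accept.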

### How to fix
I think we could simply remove the `re.compile` example from the documentation and note somewhere that, because of the PyArrow usage, intake-esm's search function only officially supports the intersection of Python's and Google RE2's regex syntaxes.

### Version information: output of `intake_esm.show_versions()`

<details>

```python
INSTALLED VERSIONS
------------------

cftime: 1.6.5
dask: 2026.3.0
fastprogress: 1.1.3
fsspec: 2026.2.0
gcsfs: 2026.2.0
intake: 2.0.9
intake_esm: 2025.12.12.post7+g414c4cfc1
netCDF4: 1.7.4
pandas: 3.0.2
requests: 2.33.1
s3fs: 2026.2.0
xarray: 2026.4.0
zarr: 3.1.6
```

</details>

