All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
- Log much less from `falconeri_worker` by default, and make it configurable. This fixes an issue where the newer tracing code was causing the worker to log far too much.
- This version hard-coded a very low logging level. It was yanked because the low logging level would have made it impossible to debug falconeri issues discovered in the field, and because it was never fully released.
- Prevent key constraint error when retrying failed datums (Issue #33). But see Issue #36; we still don't do the right thing when output files are randomly named.
- Reduce odds of birthday paradox collision when naming jobs (Issue #35).
- Hard-code PostgreSQL version to prevent it from getting accidentally upgraded by Kubernetes.
- Use correct file name to upload release assets (again).
- Use correct file name to upload release assets.
- Attempted to fix binary builds on Linux (yet again).
- Attempted to fix binary builds on Linux (again).
- Attempted to fix binary builds on Linux. Not even trying on the Mac.
- Work around issue where `--field-selector` didn't find all running pods, resulting in accidental worker terminations.
- Fix `job_timeout` conversion to `ttlActiveSeconds` in the Kubernetes YAML.
This release adds a "babysitter" process inside each falconerid. We use this to monitor jobs and datums, and detect and/or recover from various types of errors. Updating an existing cluster should be fine, but it's likely to spend a minute or two detecting and marking problems with old jobs. So please exercise appropriate caution.
We plan to stabilize a falconeri 1.0 with approximately this feature set. It has been in production for years, and the babysitter was the last missing critical feature.
- If a worker pod disappears off the cluster while processing a datum, detect this and set the datum to `status = Status::Error`. This is handled automatically by a "babysitter" thread in `falconerid`.
- Add support for `datum_tries` in the pipeline JSON. Set this to 2, 3, etc., to automatically retry failed datums. This is also handled by the babysitter.
- Periodically check to see whether a job has finished without being correctly marked as such. This is mostly intended to clean up existing clusters.
- Periodically check to see whether a Kubernetes job has unexpectedly disappeared, and mark the corresponding `falconeri` job as having failed.
- Add trace spans for most low-level database access.
- We now correctly update `updated_at` on all tables that have it.
- Wrote some basic developer documentation to supplement the `justfile`s.
- Allow specifying `--falconerid-log-level` for `falconeri deploy`. This uses standard `RUST_LOG` syntax, as described in the CLI help.
- Cleaned up tracing output a bit.
- Switched to using `rustls` for HTTPS. Database connections still indirectly require OpenSSL thanks to `libpq`.
- Attempt to fix TravisCI binary releases.
- Don't show interactive progress bar when uploading outputs.
- Support `job_timeout` in pipeline schemas. This allows you to specify when an entire job should be stopped, even if it isn't done. Values include "300s", "2h", "2d", etc.
- Add much better tracing support when `RUST_LOG=trace` is passed.
- We updated most of our dependencies, including Rust libraries and our Docker base images. This shouldn't affect normal use.
- Set `ttlSecondsAfterFinished` to 1 day so that old jobs don't hang around forever on the backplane wasting storage.
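
As a rough illustration of the `job_timeout` values listed above ("300s", "2h", "2d"), the sketch below converts such strings to whole seconds, which is the unit the Kubernetes time-limit fields use. This is not falconeri's actual parser. The function name `timeout_to_seconds` and the unit table are hypothetical, and only the three suffixes shown in this changelog are assumed to exist.

```python
import re

# Hypothetical unit table: this changelog only shows the "s", "h", and "d"
# suffixes, so nothing else is assumed to be valid here.
_UNIT_SECONDS = {"s": 1, "h": 60 * 60, "d": 24 * 60 * 60}


def timeout_to_seconds(value: str) -> int:
    """Convert a duration such as "300s", "2h", or "2d" to whole seconds."""
    match = re.fullmatch(r"(\d+)([shd])", value.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    count, unit = match.groups()
    return int(count) * _UNIT_SECONDS[unit]


if __name__ == "__main__":
    for example in ("300s", "2h", "2d"):
        print(example, "->", timeout_to_seconds(example))
```

Under the same assumptions, `timeout_to_seconds("1d")` gives 86400, the number of seconds in the 1 day used for `ttlSecondsAfterFinished`.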