Elasticsearch and FSCrawler

Great-to-Haves — Handled

Elasticsearch with FSCrawler handled these well for my needs.

  • Low-touch / unattended ingestion
  • Sensible, configurable defaults (a settings sketch follows this list)
  • Helpful, accurate dry runs
  • Non-catastrophic re-runs (i.e. smart enough to minimize overwriting or duplicating existing entries)
  • Clever de-duping
  • Customizable / scriptable input and output handling
  • File metadata capture
  • Full-text indexing of file content
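
Several of these come down to the job’s _settings.yaml. A minimal sketch of such a config, not a canonical one: the source path, update rate, and excludes below are illustrative assumptions, and the node URL assumes a local dev instance.

mkdir -p ~/.fscrawler/job_name
cat > ~/.fscrawler/job_name/_settings.yaml <<'EOF'
name: "job_name"
fs:
  url: "/home/me/media"            # hypothetical directory to crawl
  update_rate: "15m"               # re-scan interval when looping
  excludes:
  - "*/~*"                         # skip editor temp files
  index_content: true              # extract and index full text
elasticsearch:
  nodes:
  - url: "http://127.0.0.1:9200"   # assumes a local dev node
EOF

With that in place, the job name on the command line picks up the matching directory under ~/.fscrawler.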

“Clever de-duping” is TBD, and starting with rsync or rclone helps there. Elasticsearch runs lean enough, and is straightforward enough to configure, for my local dev environment. Defaults are sensible, and rebuilding an index is a single pass:

fscrawler job_name --loop 1 --restart

Here --loop 1 runs one pass and exits, while --restart clears the job’s status file and re-scans from scratch. Together these help balance up-front config time against GIGO and “we’ll take care of it in post-production.”
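
Until the de-duping gets cleverer, a pre-pass over the source tree keeps byte-identical copies out of the index. A hedged sketch with hypothetical paths; --by-hash tells rclone dedupe to match file contents rather than duplicate names:

rsync -av --ignore-existing /mnt/usb/photos/ ~/media/photos/   # consolidate without clobbering
rclone dedupe --by-hash --dedupe-mode newest ~/media           # keep the newest of identical files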

Eyeballing the ingestion and indexing processes is fine for checking initial results, tweaking, and discovering more as index searches return results. Getting media consolidated and indexed locally was one set of goals met. Locating assets I needed for other work was another win, and Kibana surfaced more follow-up opportunities than log viewing alone did.
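
Searching is a plain Elasticsearch query against the index FSCrawler creates (named after the job by default). A sketch against the local node assumed above; the search term and source fields are just examples:

curl -s 'http://127.0.0.1:9200/job_name/_search?pretty' \
  -H 'Content-Type: application/json' \
  -d '{
    "query": { "match": { "content": "storyboard" } },
    "_source": ["file.filename", "file.extension", "meta.author"]
  }'

The same query body runs from Kibana’s Dev Tools console, which is where those follow-up opportunities tended to show up.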
