Great-to-Haves — Handled
Elasticsearch with FSCrawler handled these well for my needs.
- Low-touch / unattended ingestion
- Sensible, configurable defaults
- Helpful, accurate dry runs
- Non-catastrophic re-runs (i.e. smart enough to minimize overwriting or duplicating existing entries)
- Clever de-duping
- Customizable / scriptable input and output handling
- File metadata capture
- Full-Text indexing of file content
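As one example of the configurable defaults, a minimal FSCrawler job settings file covers most of the list above. This is a sketch with placeholder paths and intervals; check the FSCrawler docs for the exact keys your version supports.

```yaml
name: "job_name"
fs:
  url: "/path/to/media"       # root directory to crawl
  update_rate: "15m"          # re-scan interval for unattended ingestion
  indexed_chars: "100%"       # index full file content, not a truncated excerpt
elasticsearch:
  nodes:
    - url: "http://127.0.0.1:9200"
```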
“Clever de-duping” is still TBD, and staging files with rsync or rclone helps there. Elasticsearch runs lean enough, and is straightforward enough to configure, for my local dev environment. Defaults are sensible, and rebuilding an index is a matter of:
```shell
fscrawler job_name --loop 1 --restart
```
These help balance up-front config time against GIGO on one side and “we’ll take care of it in post-production” on the other.
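Staging with rsync also covers the “helpful dry runs” and “non-catastrophic re-runs” items: `--dry-run` reports what would change without touching anything, and `--ignore-existing` keeps re-runs from clobbering files already ingested. A self-contained sketch (the temp directories stand in for a real staging area and ingest root):

```shell
SRC=$(mktemp -d)   # stands in for a staging dir, e.g. incoming media
DST=$(mktemp -d)   # stands in for the ingest root FSCrawler watches
printf 'demo\n' > "$SRC/clip.txt"

# Dry run first: reports what would be copied, changes nothing
rsync -av --dry-run --ignore-existing "$SRC/" "$DST/"

# If the report looks right, run for real; --ignore-existing makes
# re-runs skip anything already in place instead of overwriting it
rsync -av --ignore-existing "$SRC/" "$DST/"
```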
Eyeballing the ingestion and indexing processes is fine for seeing initial results, tweaking, and discovering more as index searches yield more results. Getting media consolidated and indexed locally met one set of goals. Locating assets I needed for other work was another win, and Kibana surfaced more follow-up opportunities than log viewing alone.
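Since the de-duping is still TBD, one plausible direction is content hashing before ingest: group files by checksum and flag any group with more than one member. This is a hypothetical sketch, not part of the current pipeline; the function names are my own.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large media never loads fully into RAM."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under root by content hash; any group of >1 is a duplicate set."""
    groups: dict[str, list[Path]] = {}
    for p in root.rglob("*"):
        if p.is_file():
            groups.setdefault(sha256_of(p), []).append(p)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Hashing by content catches renamed copies that filename-based de-duping would miss, which matters when the same asset has been saved under different names across drives.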