About a week ago at work I was looking at a support email about a high-latency process that parses large volumes of email. During the ensuing investigation it dawned on me that we had no metrics around the process. I am not a fan of crapshoot guesses.
I proposed we spec a database (DB) table and code a change to an upstream process that feeds the troubled process, so we could start gathering some basic metrics: filename, hostname, size_bytes, and count_msgs.
I hastily began cobbling the idea together, and about a day and a half in it dawned on me in a quiet moment: the initial design was inept.
For starters, the DB table would only store data for one process, but the message data being parsed was part of a broader data pipeline. The makeup of the pipeline was a bit vague, like the developing wave below, but it has the hallmarks of an ETL system.

The solution would need a few tweaks:
- Three database tables (note: the host table already exists)
  - msg_master
    - file_id
    - file_name
    - file_dt
    - start_dt
    - end_dt
    - created_dt
    - updated_dt
  - msg_proc_metrics
    - proc_id
    - host_id
    - file_id
    - size_bytes
    - count_msgs
    - runtime
    - version
    - created_dt
  - msg_procs
    - proc_id
    - procname
- msg_master
- A directory where all processes in the ETL system write their metrics files.
- A cron-scheduled script to load the metrics files into the database.
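To make the table layout concrete, here is a rough sketch of the three tables using Python's built-in sqlite3 module. Only the table and column names come from the list above; the column types, keys, constraints, and the choice of SQLite are my assumptions for illustration, and the existing host table is deliberately left out.

```python
import sqlite3

# Table and column names follow the list above. Types, keys, and
# constraints are assumptions made for this sketch.
SCHEMA = """
CREATE TABLE IF NOT EXISTS msg_master (
    file_id    INTEGER PRIMARY KEY,
    file_name  TEXT NOT NULL,
    file_dt    TEXT,
    start_dt   TEXT,
    end_dt     TEXT,
    created_dt TEXT DEFAULT CURRENT_TIMESTAMP,
    updated_dt TEXT
);
CREATE TABLE IF NOT EXISTS msg_procs (
    proc_id  INTEGER PRIMARY KEY,
    procname TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS msg_proc_metrics (
    proc_id    INTEGER NOT NULL REFERENCES msg_procs(proc_id),
    host_id    INTEGER NOT NULL,  -- refers to the existing host table (not defined here)
    file_id    INTEGER NOT NULL REFERENCES msg_master(file_id),
    size_bytes INTEGER,
    count_msgs INTEGER,
    runtime    REAL,              -- unit assumed to be seconds
    version    TEXT,
    created_dt TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three metrics tables if they do not exist yet."""
    conn.executescript(SCHEMA)


# Quick check against an in-memory database.
conn = sqlite3.connect(":memory:")
create_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

Because msg_proc_metrics only carries IDs, any process in the pipeline can record a row without the table knowing anything process-specific.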
The new DB model lets any process store metrics, and it normalizes fairly well. Writing metrics to JSON files in a directory insulates the ETL processes from DB latency. I am a big fan of JSON because the data is more portable and readable, at the cost of size. That cost is acceptable here, since we're only dealing with ~25,000 records/day, which is ideal for JSON. Avro would be a good data serialization solution if the record counts were a few orders of magnitude larger.
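The file drop and the loader could look something like the sketch below. The function names write_metrics and load_metrics, the one-file-per-record layout, the uuid-based file names, and the demo values are all hypothetical. It also assumes each record already carries proc_id, host_id, and file_id, whereas a real loader would probably have to resolve names to IDs first.

```python
import json
import os
import sqlite3
import tempfile
import uuid
from pathlib import Path


def write_metrics(metrics_dir: Path, record: dict) -> Path:
    """ETL side: drop one metrics record as a JSON file, never touching the DB."""
    metrics_dir.mkdir(parents=True, exist_ok=True)
    final_path = metrics_dir / f"{uuid.uuid4().hex}.json"
    tmp_path = final_path.with_suffix(".tmp")
    tmp_path.write_text(json.dumps(record))
    # Rename into place so the loader never picks up a half-written file.
    os.replace(tmp_path, final_path)
    return final_path


def load_metrics(metrics_dir: Path, conn: sqlite3.Connection) -> int:
    """Cron side: insert every pending *.json file, then delete it."""
    loaded = 0
    for path in sorted(metrics_dir.glob("*.json")):
        record = json.loads(path.read_text())
        with conn:  # commit (or roll back) one file at a time
            conn.execute(
                "INSERT INTO msg_proc_metrics "
                "(proc_id, host_id, file_id, size_bytes, count_msgs, runtime, version) "
                "VALUES (:proc_id, :host_id, :file_id, :size_bytes, "
                ":count_msgs, :runtime, :version)",
                record,
            )
        path.unlink()  # only removed after a successful insert
        loaded += 1
    return loaded


# Demo with made-up values and a throwaway table holding just the needed columns.
with tempfile.TemporaryDirectory() as tmp:
    metrics_dir = Path(tmp) / "metrics"
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE msg_proc_metrics "
        "(proc_id, host_id, file_id, size_bytes, count_msgs, runtime, version)"
    )
    write_metrics(
        metrics_dir,
        {"proc_id": 1, "host_id": 7, "file_id": 42, "size_bytes": 2048,
         "count_msgs": 12, "runtime": 0.8, "version": "1.0"},
    )
    loaded = load_metrics(metrics_dir, conn)
    remaining = list(metrics_dir.glob("*.json"))
    rows = conn.execute(
        "SELECT size_bytes, count_msgs FROM msg_proc_metrics"
    ).fetchall()
```

If the database is slow or down, the ETL processes keep writing files and the loader simply catches up on its next run.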
With a better functional framework designed, the front-line troops fighting bit rot may be getting some better weapons soon:
- Monitoring
- Alerting
- Performance Analysis
- Trending
- Quantitative Analysis