The process is pretty informal, so there are no hard requirements. That said:
- Failures per day: let's say 0-100
- How long to retain records: no hard requirements? I guess a few months at least, since some failures are pretty rare
- What's the lifecycle of a failure? Scripts record it, team members investigate it and assign it a "root cause".
- Custom scripts:
(1) create ticket per failure
(2) create failure reports (to prioritize work - for example, if there were 50 failure reports with a root cause of 'github was down', the priority of "set up github mirror" would get bumped up)
(3) mass-update tickets (for example, if github.com is down, there will be a few dozen failed processes because of that)
(4) handle rules for automatic classification (again, if github.com is down, it'd be lovely if I could have a rule: "for the next 48 hours, every ticket which mentions github.com and 503 is auto-assigned to the 'github was down' root cause")
- SSO, audit, compliance: nice but not required
- JIRA problems: search sucks. "Find similar ticket" sucks. Rules are missing (or need admin). Even something as simple as "close those 20 tickets and link them all to ABC-1234" is impossible.
- Google Sheets: not enough automation. At least I can do "filter rows, copy-paste the 'root cause' field into all of them", and it is pretty fast, but multi-line outputs don't look good and there is no automation (we did not explore Apps Script, maybe we should have...)
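To make scripts (2) to (4) concrete, here is a minimal Python sketch of what I have in mind. Everything in it is an assumption for illustration: the `Failure` and `Rule` classes, the `classify` and `report` helpers, and the sample outputs are made up, and a real version would read and write tickets in whatever tool ends up storing them.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class Failure:
    # Hypothetical record; a real one would map to a ticket or a sheet row.
    id: int
    recorded_at: datetime
    output: str                       # raw, possibly multi-line, job output
    root_cause: Optional[str] = None  # None until a person or a rule sets it


@dataclass
class Rule:
    # Hypothetical time-limited rule: "for the next N hours, anything that
    # mentions all of these keywords gets this root cause".
    keywords: List[str]
    root_cause: str
    starts_at: datetime
    duration: timedelta

    def matches(self, failure: Failure) -> bool:
        in_window = self.starts_at <= failure.recorded_at < self.starts_at + self.duration
        return in_window and all(k in failure.output for k in self.keywords)


def classify(failures: List[Failure], rules: List[Rule]) -> int:
    """Mass-update: set the root cause on every unclassified failure that a
    rule matches (first matching rule wins). Returns the number updated."""
    updated = 0
    for failure in failures:
        if failure.root_cause is not None:
            continue  # never overwrite a classification someone already made
        for rule in rules:
            if rule.matches(failure):
                failure.root_cause = rule.root_cause
                updated += 1
                break
    return updated


def report(failures: List[Failure]) -> List[Tuple[str, int]]:
    """Failure report: root causes ordered by how often they occurred."""
    counts = Counter(f.root_cause or "unclassified" for f in failures)
    return counts.most_common()


# Made-up sample data: the third failure is outside the 48-hour window.
t0 = datetime(2024, 1, 1, 12, 0)
failures = [
    Failure(1, t0 + timedelta(hours=1), "git clone failed\ngithub.com returned HTTP 503"),
    Failure(2, t0 + timedelta(hours=2), "no space left on device"),
    Failure(3, t0 + timedelta(hours=72), "github.com returned HTTP 503"),
]
rule = Rule(["github.com", "503"], "github was down", t0, timedelta(hours=48))

updated = classify(failures, [rule])
print(updated, report(failures))
```

Plain substring matching is probably too crude for real logs, but it shows the shape: the rule carries its own expiry, and the report is just a count per root cause.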
And yeah, I am getting the feeling this would be a custom job. We have the resources in-house to do so, but I was hoping there was an existing product. Surely there are people out there who run batch-like jobs and want them to be reliable? Something like data conversion jobs, CI builds, training jobs, etc.
Perhaps it's a good fit for generative AI; I've heard it's pretty good at making websites (and security/availability is not an issue, as this will be an internal website not exposed to the internet). Or I may revisit Google Apps Script...
Thanks for your reply. I suggest looking at Airtable and _maybe_ Linear. Both have APIs and automations. You could likely get AI to rewrite your scripts.
If those don't work, you may have a business case for building it.
I'm a founder and dev looking for a good problem to solve. If the need could be proven (e.g. 10 people with decision power said they wanted it), I'd consider making it.