Saad
I build instruments that hold machines accountable: measured before they ship, logged while
they run, capped in what they spend.
The work is public:
- llm-reliability-evals — open evals for
how LLMs fail at ordinary work; five frontier models measured, every verdict carries its
evidence. The write-up.
- audit-event-mcp — hash-chained audit log for AI
agents; tamper-evident records of what an agent actually did.
- gvnr — budget and cost control for AI agents; hard
per-agent spend caps and rate limits.
- sovereignty-scan-mcp — EU AI Act vendor
sovereignty scanning.
- The audit — I run your agent through the eval
instrument and hand you its failure fingerprint. First three clients, $1,900 flat.