Saad

I build instruments that hold machines accountable: measured before they ship, logged while they run, capped in what they spend.

The work is public:

llm-reliability-evals — open evals for how LLMs fail at ordinary work; five frontier models measured, every verdict carries its evidence. The write-up.
audit-event-mcp — hash-chained audit log for AI agents; tamper-evident records of what an agent actually did.
gvnr — budget and cost control for AI agents; hard per-agent spend caps and rate limits.
sovereignty-scan-mcp — EU AI Act vendor sovereignty scanning.
The audit — I run your agent through the eval instrument and hand you its failure fingerprint. First three clients, $1,900 flat.