Rehab Bench: Clinical LLM Stress Tests
Deep dives

Benchmark notes.

Longer write-ups go here when a model needs more than a scorecard. They cover tool-use behavior, evidence traps, safety misses, and domain-level analysis.