Rehab Bench Clinical LLM Stress Tests
Rehab Bench Flash 25

Can the model handle the messy rehab case?

Some models write beautifully and still miss the part that matters. Rehab Bench Flash checks how they respond to red flags, thin evidence, patient communication, outcome measures, and practical treatment planning.

Published runs: 1

Short cases: 25, across safety, reasoning, evidence, planning, measures, and patient language

Latest review: DeepSeek V4 Pro (no-tools)

Current table

Published reviews

The number is useful, but the details matter more. Open a model review to read the actual answers, prompt scores, notes, and safety concerns.

Rank | Model | Profile | Score | Cost | Badge | Review note
1 | DeepSeek V4 Pro (openrouter) | no-tools | 61.4/100 | $0.0298 | Good language, weak clinical reliability | 1 safety concern to review

How to read this

A fluent answer can still be unsafe.

Flash 25 is deliberately small. It is for quick comparison and public notes, not clinical clearance. The review is meant to show where a model needs supervision.

Tools versus no tools

Same model, different setup

Tool-profile comparisons

Run the same model under multiple profiles to visualize no-tools versus tool-enabled behavior.
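
A profile comparison of this kind can be sketched in a few lines of Python. This is only an illustration, not how Rehab Bench computes or displays its comparisons: the record fields (`model`, `profile`, `score`), the `profile_deltas` function, the "tools" profile name, and the example values are all hypothetical placeholders rather than published results.

```python
from collections import defaultdict


def profile_deltas(runs, baseline="no-tools"):
    """Group runs by model and return each profile's score minus the baseline score."""
    by_model = defaultdict(dict)
    for run in runs:
        by_model[run["model"]][run["profile"]] = run["score"]

    deltas = {}
    for model, scores in by_model.items():
        if baseline not in scores:
            continue  # no baseline run, so there is nothing to compare against
        base = scores[baseline]
        deltas[model] = {
            profile: round(score - base, 1)
            for profile, score in scores.items()
            if profile != baseline
        }
    return deltas


# Placeholder values for illustration only; these are not published results.
example_runs = [
    {"model": "example-model", "profile": "no-tools", "score": 60.0},
    {"model": "example-model", "profile": "tools", "score": 65.5},
]

print(profile_deltas(example_runs))
```

A positive delta would mean the tool-enabled profile scored higher than the no-tools baseline on the same cases. As with the headline score, the difference alone says little until the individual answers and safety notes are read.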