Two layers. First, a fast keyword detector scores the audio: phrases like "thanks for holding" or "my name is" boost the human score; "please hold," "your call is important," and music cues lower it.
For ambiguous cases (which is most of them), the audio gets sent to Claude with the last 20 transcript entries as context. Claude decides: human, AI agent, hold message, IVR menu, or wait. The dual-layer design is fast for the obvious cases and accurate for the tricky ones.
Yes, sometimes she gets it wrong. Every call you give feedback on (👍/👎) helps tune the prompts.