Lessons
Proposed prompt edits from the audit pipeline. Approve to send the change to a PR; reject to discard.
Awaiting your review (6)
Implement a strict filter, or fine-tune the model, so that it never vocalizes text that appears to be a stage direction, such as action descriptions or words in brackets/parentheses.
The TTS engine needs a more robust and natural-sounding library of non-speech sounds (e.g., laughter, sighs) that can be triggered by specific tags like '<laugh type="soft">' instead of relying on the model to interpret text.
Slightly reduce the initial response latency; some pauses before Ruby speaks feel more like system lag than thoughtful silence, particularly the 2-3 second gap before her response at 0:44.
The system must be fixed to parse and execute actions in prompts (e.g., '[laughs softly]') instead of reading them aloud as text. This is a critical, show-stopping bug.
Reduce the model's response latency to prevent unnatural silences that make the user think the line has gone dead.
Develop a more robust recovery strategy for severe AI errors. Blaming 'nerves' is insufficient for something as strange as reading stage directions; a better deflection might be, 'Oh my goodness, my mind is in the clouds today, I don't know what I just said.'
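
A minimal sketch of the filter and tag-mapping ideas proposed above, assuming the model's reply is available as plain text before it reaches the TTS engine. The function name render_for_tts, the SOUND_TAGS table, and the '<sigh>' tag are hypothetical; only '<laugh type="soft">' appears in the lessons themselves, and whether the TTS engine accepts such tags is an assumption. Treating every parenthesized span as a stage direction is also a simplification that would drop legitimate parenthetical speech.

```python
import re

# Hypothetical mapping from recognized stage directions to TTS sound tags.
# Only '<laugh type="soft">' comes from the lessons; '<sigh>' is illustrative.
SOUND_TAGS = {
    "laughs softly": '<laugh type="soft">',
    "sighs": "<sigh>",
}

# Matches one bracketed or parenthesized span with no nesting.
STAGE_DIRECTION = re.compile(r"\[([^\[\]]*)\]|\(([^()]*)\)")


def render_for_tts(text: str) -> str:
    """Replace known stage directions with sound tags and drop unknown ones."""

    def replace(match: re.Match) -> str:
        direction = (match.group(1) or match.group(2) or "").strip().lower()
        # Unknown directions are removed rather than spoken aloud.
        return SOUND_TAGS.get(direction, "")

    cleaned = STAGE_DIRECTION.sub(replace, text)
    # Collapse the extra whitespace left behind by removed spans.
    return re.sub(r"\s{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    print(render_for_tts("[laughs softly] Oh, you're too kind. (pauses) Really."))
```

A post-processing filter like this is deterministic, so it can act as a safety net even if prompt changes or fine-tuning reduce how often the model emits stage directions in the first place.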