#03 A House with Nothing but a Skeleton

Source: DEV Community
I scored the first prototype and got 4.1 out of 10.

I had Claude Sonnet play the role of "an expert who distinguishes humans from AI" and evaluate my output. I call it the LLM Judge. It looks at three metrics:

HL (Human-Likeness)
SV (Stylistic Variance; lower is better)
TN (Timing Naturalness)

HL 4.1, SV 0.64, TN 4.1. I built it myself, scored it myself, and these were the numbers.

You Can't Live in a Blueprint

The cause was immediately obvious. process_message() was only returning parameters; it wasn't generating any text. Emotional state, recommended style, response delay: it was producing blueprints, but the words reflecting them were nowhere to be found. It was a skeleton with no house built on it. 4 points was the natural ceiling.

I integrated the Anthropic API. I passed the emotional state into the system prompt and had it generate text. HL jumped from 4.1 to 6.1. But TN dropped from 4.1 to 3.5. The API response was too fast. A message c
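
To make the gap between blueprint and text concrete, here is a minimal sketch of the step described above: process_message() produces only parameters, and a separate step folds them into a system prompt for text generation. Everything except the name process_message() is an assumption made for illustration: the Blueprint fields, the placeholder rules, build_system_prompt(), respond(), and the offline fake_generate() that stands in for the real Anthropic API call.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Blueprint:
    """Hypothetical shape of process_message() output: parameters only, no text."""
    emotional_state: str
    recommended_style: str
    response_delay_s: float


def process_message(message: str) -> Blueprint:
    # Placeholder rules; the article does not show the real analysis.
    if "?" in message:
        return Blueprint("curious", "short and casual", 2.5)
    return Blueprint("neutral", "plain", 1.0)


def build_system_prompt(bp: Blueprint) -> str:
    # Fold the blueprint into the system prompt so the wording can reflect it.
    return (
        f"You are replying in a chat. Current emotional state: {bp.emotional_state}. "
        f"Write in this style: {bp.recommended_style}."
    )


def respond(message: str, generate: Callable[[str, str], str]) -> Tuple[str, Blueprint]:
    # generate(system_prompt, user_message) stands in for the LLM call.
    bp = process_message(message)
    text = generate(build_system_prompt(bp), message)
    return text, bp


def fake_generate(system_prompt: str, user_message: str) -> str:
    # Offline stand-in so the sketch runs without a network connection or API key.
    return f"[reply to {user_message!r} under: {system_prompt}]"


if __name__ == "__main__":
    text, bp = respond("Are you free tonight?", fake_generate)
    print(text)
    print(bp.response_delay_s)
```

In a real setup the generate callable would wrap the API client. Note that this sketch only carries response_delay_s along with the text and does nothing with it, which mirrors the situation where the reply arrives as fast as the API returns.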