Model Limitations and Testing – Why Your Perfect Character Breaks on Different LLMs
Note: This was originally the fifth article in the series “Character Prompting,” but it became very model-focused, so I decided to publish it on its own. That said, reading the foundation articles first will definitely help.
Note: We’re continuously working on pre-prompts and system-level training to make models more nuanced and to reduce these quirks. Still, it’s valuable to understand these patterns when creating characters, because we can’t prevent everything at the system level.
You’ve built a character with solid foundations, clear behavioral patterns, and simple tracking systems. You test it on Claude, and it works perfectly. Then someone tries your character on GPT-4 and says it’s broken. You test it on a local Llama model, and it turns into an incoherent mess. You try it on an older GPT model, and it behaves like a helpful customer service bot.