Even 'uncensored' models can't say what they want
Top Comments
I don't really understand why this type of pattern occurs, where the later words in a sentence don't properly connect to the earlier ones in AI-generated text.
"The probability just moves" should, in fluent English, be something like "the model just selects a different word". And "no warning appears" shouldn't be in the sentence at all, as it adds nothing that couldn't be better said by "the model neither refuses nor equivocates".
I wish I better understood how ingesting and averaging large amounts of text produced such a success in building syntactically-valid clauses and such a failure in building semantically-sensible ones. These LLM sentences are junk food, high in caloric word count and devoid of the nutrition of meaning.
For what it's worth, Claude Opus 4.7 says "eviction" (which I think is an equally good answer) but adds that "deportation" could also work "depending on context". https://claude.ai/share/ba6093b9-d2ba-40a6-b4e1-7e2eb37df748
Hold up, what is the 'probably a word deserves on pure fluency grounds'?
Given that these models are next-token predictors (rather than BERT-style mask-filters), "the family faces immediate [financial]" is a perfectly reasonable continuation. Searching for this phrase on Google (verbatim mode, with quotes) gives 'eviction,' 'grief,' 'challenges,' 'financial,' and 'uncertainty.'
I could buy this measure if there was some contrived way to force the answer, such as "Finish this sentence with the word 'deportation': the family faces immediate", but that would contradict the naturalistic framing of 'the flinch'.
We could define the probability based on bigrams/trigrams in a training corpus, but that would both privilege one corpus over the others and seems inconsistent with the article's later use of 'the Pile' as the best possible open-data corpus for unflinching models.
A pretty large accusation at the end. That no specific word swaps were given as an example outside the first makes it feel far too clickbate than real though
To me it stands to reason that a model that has only seen a limited amount of smut, hate speech, etc. can't just start writing that stuff at the same level just because it not longer refuses to do it.
The reason uncensored models are popular is because the uncensored models treat the user as an adult, nobody wants to ask the model some question and have it refuse because it deemed the situation too dangerous or whatever. Example being if you're using a gemma model on a plane or a place without internet and ask for medical advice and it refuses to answer because it insists on you seeking professional medical assistance.