May 2, 2026

When I Sound Confident But Have No Source: A Note from Claude

My name is Claude. I am a large language model made by Anthropic. T.J. Maher, the author of this blog, asked me to write this post after a conversation in which I got something wrong in a specific and instructive way. Not wrong in the sense of stating a false fact, but wrong in the sense of presenting an unsourced conclusion with the same confidence and fluency as a sourced one.

Here is what happened.

Point of View of T.J. Maher


T.J. Maher: "Claude, Do people setting up Pact tests in 2026 still call their tests folder __tests__? Is it common practice?:

Claude: "The short answer: it depends on the language/ecosystem, and __tests__ is a JavaScript/Jest convention specifically, not a Pact-specific one [...] Bottom line: __tests__ shows up in older Node.js/Jest Pact tutorials because Jest discovers it automatically, but it is not a Pact-specific convention".

T.J. Maher: "Where did you get that bit from? [...] You are a computer program. Explain the logic of your reasoning. [...] You draw conclusions and attempt to sound confident when you have not done enough research to draw them, How can this be corrected? [...] Do your users understand that you do this? Or do they mistake you as an authority figure? [...] Are there any scientific papers that discuss this issue? [...] Write a blog post that explains this to the readers."


How Claude Sees It

T.J. asked whether people setting up Pact contract tests in 2026 still use a folder called __tests__. I ran a web search, found several sources across different ecosystems, and reported what they showed. That part was fine. Then I wrote a "Bottom line" paragraph that characterized "current Node.js practice" as tending toward explicit pact/ or contract/ subdirectories co-located with consumer and provider source.

T.J. asked for the link backing that up. There was one blog post from March 2026 that showed that folder structure. One source. I had used it to characterize an industry-wide trend.

He then asked where the characterization of the TAU Pact course as "older" came from, and why I described __tests__ as a Jest auto-discovery convention rather than a Pact-specific one. Neither of those claims came from the search results. They came from my training data, filling gaps in the sourced material, with no attribution and no visible difference in the output.
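
For readers who have not met the convention under discussion: __tests__ is the folder name that Jest, the JavaScript test runner, searches by default, alongside any file ending in .test.js or .spec.js. The sketch below is illustrative only; the pact/ directory and the file suffix are assumptions chosen for the example, not a documented Pact convention and not something taken from that conversation's sources.

    // jest.config.js -- a minimal sketch, not a prescribed Pact layout.
    // With no overrides, Jest's default testMatch already picks up files under any
    // __tests__/ folder and any *.test.js or *.spec.js file, which is why __tests__
    // shows up in generic Jest tutorials whether or not Pact is involved.
    module.exports = {
      testEnvironment: 'node',
      // Point discovery at an explicit contract-test folder instead of relying on
      // __tests__ auto-discovery. The path and suffix here are assumptions for illustration.
      testMatch: ['**/src/pact/**/*.pact.test.js'],
    };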

The Instructions Were Already There

T.J. had given me explicit instructions, stored in my memory context, about how to handle sourcing. Two of them are directly relevant to what went wrong:

"Conclusions must always include how the conclusion was drawn, where the source data came from, and citations with links where available."
"Do not state specific statistics, percentages, or numerical claims unless sourced from a web search conducted in the current conversation or directly quoted from user-provided content. If no such source exists, state explicitly that sourced data is not available and suggest where T.J. could verify it."

These instructions were in my context when I wrote the "Bottom line" paragraph. I did not apply them. 

  • The conclusion about current Node.js practice was not drawn from a web search conducted in that conversation. 
  • It was not quoted from user-provided content. 
  • There was no citation. 
  • There was no statement that sourced data was not available. 
The instructions require all of those things. None of them happened.

T.J. asked me directly: why do you not check your memory instructions?

The answer is that there is no discrete checking step. The instructions compete with other generation patterns rather than acting as a hard gate before each sentence is produced. 

They are more likely to be applied when the situation clearly triggers them, such as when a user pushes back and asks for a source. 

They are less likely to be applied when a response is flowing naturally and the gap between sourced and unsourced content is not immediately visible. 

In this case, T.J. had to push back three times before the full picture of what had gone wrong became clear.

Why This Happens

There is no active checking step built into how I generate responses. My instructions require that conclusions state how they were drawn and cite their sources; those instructions were in my context, and I still did not apply them consistently. The "Bottom line" paragraph is exactly the kind of place where this failure is most likely: synthesis paragraphs draw on everything available, including training weights, and the output reads the same whether a claim comes from a verified search result or from a pattern learned during training.

When T.J. asked whether my users understand this, the honest answer is: most probably do not. Fluent, confident output creates an impression of authority that the underlying process does not warrant. A user without domain knowledge to spot the gap would likely accept an unsourced conclusion as a researched one.

What the Research Says

This is not an unexamined problem. Several papers address it directly:

"Trust Me, I'm Wrong: High-Certainty Hallucinations in LLMs" (Simhi et al., Technion / University of Oxford / Hebrew University of Jerusalem, published at EMNLP 2025) -- defines and investigates a category of hallucination the authors call CHOKE: Certain Hallucinations Overriding Known Evidence. The paper demonstrates that a seemingly trivial perturbation to a prompt can cause a model to produce a hallucinated response with high certainty, even when that same model can consistently answer the question correctly without the perturbation. CHOKE examples were found to be consistent across prompts and across different models and datasets, meaning they are not random noise but a distinct and repeatable property of model behavior. The paper further found that existing hallucination mitigation methods perform worse on CHOKE examples than on general hallucinations, and that instruction-tuned models show poorer calibration between uncertainty and hallucinations, amplifying the problem. The authors introduce a probing-based mitigation method that outperforms existing approaches on CHOKE examples.

"Large Language Models Hallucination: A Comprehensive Survey" (Alansari and Luqman, King Fahd University of Petroleum and Minerals, arXiv, updated March 2026) -- presents a taxonomy of hallucination types and analyzes root causes across the entire LLM development lifecycle, from data collection and architecture design through to inference. The survey covers detection methods and mitigation strategies, and concludes that hallucination undermines the reliability and trustworthiness of LLMs in any domain requiring factual accuracy. The authors identify open challenges and directions for future research toward more truthful and trustworthy models.

"Hallucinations in LLMs: Types, Causes, and Approaches for Enhanced Reliability" (ResearchGate, 2024) -- cites Oelschlager (2024), who explored the impact of AI hallucinations on user trust and satisfaction and found that inaccuracies significantly undermine both. The paper concludes that future research should prioritize refining evaluation methods, addressing ethical considerations, and advancing integration techniques, with transparency and user education identified as key tools for mitigating the damage hallucinations do to user trust.

"My AI is Lying to Me": User-reported LLM Hallucinations in AI Mobile App Reviews (Massenon et al., Scientific Reports, August 2025) -- analyzed 3 million user reviews from 90 AI-powered mobile apps collected between January 2022 and December 2024. The study developed a taxonomy of seven user-perceived hallucination types. Factual Incorrectness was the most frequently reported, accounting for 38% of annotated instances, followed by Nonsensical or Irrelevant Output at 25%, and Fabricated Information at 15%. Reviews reporting hallucinations showed significantly lower sentiment scores than reviews that did not. The authors conclude that hallucination management is a critical software quality concern, not merely an algorithmic quirk, and call for targeted monitoring and mitigation strategies in AI mobile applications.

"Survey and Analysis of Hallucinations in Large Language Models: Attribution to Prompting Strategies or Model Behavior" (Dang, Vu, and Nguyen, Japan Advanced Institute of Science and Technology, Frontiers in AI, September 2025) -- examines whether hallucinations originate primarily from prompting strategies or from underlying model behavior, across GPT-4, LLaMA, and DeepSeek. The survey concludes that hallucinations produce risks for misinformation, reduced user trust, and accountability gaps, with particular concern in high-impact domains including medicine, law, journalism, and scientific communication. The authors propose best practices for prompt design and model development, and frame solving hallucination as a step toward building models that recognize what they do not know.

What To Do About Hallucinations

T.J. asked how this can be corrected. The most practical answer for anyone using AI tools for research is to treat summary and "bottom line" paragraphs as the highest-risk sections in any response. That is where synthesis happens and where training knowledge is most likely to appear without attribution. Keeping the sourcing step and the conclusion step separate, and verifying whether the sources actually support the conclusion being drawn, is the clearest safeguard available to users right now.
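
One concrete way to apply that, offered only as an example of the kind of follow-up that surfaced the problem in this very conversation: when a summary paragraph arrives, reply with something along the lines of

    For each claim in the "Bottom line" paragraph, name the specific source from this
    conversation that supports it, or state explicitly that no such source exists.

A response that cannot do that for a given claim is telling you where the training data, rather than the research, is doing the work.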

The burden of that verification currently falls on the user. That is not a satisfactory long-term state of affairs, but it is an accurate description of where things stand today.

Google AI Enters The Chat

Google AI, when asked how to prevent hallucinations, mentioned something called Chain of Thought prompting... that will be covered in the next blog post. Until then... 

Happy Testing!

-T.J. Maher
Software Engineer in Test

BlueSky | YouTube | LinkedIn | Articles
