Global adoption of artificial intelligence surpassed one billion users in 2026, cementing generative AI as a mainstream digital utility. Yet despite the explosive growth, a new industry report shows that reliability, not access, may now be the sector's biggest challenge.
A study released in April by Open Resource Applications reveals that while users increasingly rely on large language models (LLMs) for everyday tasks, most platforms still struggle with a persistent flaw: “hallucinations,” or the generation of incorrect or fabricated information.
The report highlights a notable shift in user behavior. Rather than relying on a single platform such as ChatGPT, users are diversifying across multiple AI tools, testing outputs and comparing performance.
Among leading models, Gemini 3 Pro (Preview) emerged as the top performer, ranking highest in four out of five common daily task categories.
Despite rapid improvements, the study identifies critical weaknesses in AI performance across routine use cases. Mathematical computation ranks as the least reliable task, with an average accuracy score of just 0.38 out of 1, meaning incorrect answers occur nearly two-thirds of the time.
Even top-performing systems such as GPT-5 mini struggle in this category.
Data analysis also presents challenges, achieving only 52 per cent accuracy on average. Researchers attribute this to the probabilistic nature of LLMs, which prioritise predicting plausible outputs rather than verifying factual correctness, especially when datasets are incomplete.
The report further underscores risks in high-stakes domains such as education and health. Tasks involving tutoring, fitness, and medical guidance all recorded accuracy levels of approximately 0.67, indicating a one-in-three likelihood of flawed output.
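The arithmetic behind these figures is straightforward: an accuracy score on a 0-to-1 scale implies a complementary failure rate. A minimal sketch, using the scores reported in the study (the helper function and its name are illustrative, not part of the report):

```python
# Illustrative only: converting the report's accuracy scores (0-1 scale)
# into approximate failure rates. Scores are taken from the article.

def failure_rate(accuracy: float) -> float:
    """Return the approximate share of outputs expected to be flawed."""
    return round(1.0 - accuracy, 2)

scores = {
    "mathematical computation": 0.38,  # "nearly two-thirds" incorrect
    "data analysis": 0.52,
    "tutoring / fitness / medical guidance": 0.67,  # "one-in-three" flawed
}

for task, acc in scores.items():
    print(f"{task}: ~{failure_rate(acc):.0%} of answers may be flawed")
```

This makes the article's framing concrete: a 0.38 accuracy score corresponds to roughly a 62 per cent error rate, and 0.67 to roughly 33 per cent.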
“Teaching is 100% about giving students correct information, and right now, most AIs cannot achieve that,” a spokesperson for Open Resource Applications said, pointing to the models’ difficulty handling incomplete or context-heavy queries.
Health-related inaccuracies pose even greater concerns. While AI tools can summarise online information, the study warns that a single unreliable source can cascade into misleading or potentially harmful advice.
The hallucination problem
One of the most persistent issues identified is AI’s tendency to fabricate answers when faced with uncertainty. For niche or poorly documented topics, models often generate plausible-sounding responses instead of acknowledging gaps in knowledge.
Accuracy for such “specific information” queries also averaged 0.67, reinforcing concerns about overconfidence in AI-generated outputs.
The findings arrive at a time when enterprises are rapidly integrating AI into workflows ranging from customer service to analytics. However, the report suggests that unchecked reliance could introduce operational and reputational risks.
“LLMs are a very useful tool, but users need to understand their primary function and limitations. They perform best in drafting, brainstorming, and creative applications—not as standalone authorities in technical or medical fields,” the report concludes.