AI’s Genius? 🤯 Truth or Hallucination? 🤔


Summary

A recent analysis by The New York Times, using the SimpleQA test of more than 4,000 questions, found that Gemini-powered AI Overviews achieved 90 percent accuracy. Initial tests with Gemini 2.5 showed 85 percent, rising to 91 percent after the Gemini 3 update. Even so, roughly one in ten answers was incorrect, as when the system misidentified the date Bob Marley’s former home became a museum. Google spokesperson Ned Adriance questioned the testing methodology, pointing to discrepancies in a question about Yo-Yo Ma’s induction into a classical music hall of fame. Google’s internal benchmarks, which measure factuality at between 60 and 80 percent, were conducted without web search. Ultimately, the study’s limitations and the potential for AI hallucination underscore the continued need for users to independently verify AI-generated responses.

INSIGHTS


THE RISE OF AI OVERVIEWS: AN ACCURACY PROBLEM
AI Overviews, Google’s Gemini-powered search feature, has faced criticism over its accuracy since its 2024 launch. Despite improvements, the system provides a correct answer only about 90% of the time, meaning roughly one in ten responses is wrong. At Google’s scale, that error rate translates into an enormous volume of misinformation circulating globally, and it highlights the challenge of relying on AI for information retrieval.

THE OUMI ANALYSIS: A RIGOROUS TEST
A new analysis conducted by The New York Times, in collaboration with the startup Oumi, aimed to quantify the accuracy of AI Overviews. Oumi utilized the SimpleQA evaluation, a benchmark consisting of over 4,000 questions with verifiable answers, to assess generative models like Gemini. Initially run with Gemini 2.5, the benchmark revealed an 85% accuracy rate. A subsequent rerun following the Gemini 3 update showed an impressive 91% accuracy. This analysis provides a concrete metric for evaluating AI Overviews' performance.
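In outline, a SimpleQA-style evaluation is just a scored loop over question/answer pairs. The sketch below is a toy illustration with invented questions and a stand-in `ask_model` function; the real benchmark uses thousands of vetted items and an LLM-based grader rather than naive substring matching.

```python
def ask_model(question: str) -> str:
    # Stand-in for a call to a generative model such as Gemini.
    canned = {
        "When did Jamaica gain independence?": "1962",
        "Who composed the Cello Suites?": "Bach",
    }
    return canned.get(question, "I don't know")

def evaluate(pairs: list[tuple[str, str]]) -> float:
    """Score accuracy over (question, reference_answer) pairs."""
    correct = sum(
        1 for q, ref in pairs
        if ref.lower() in ask_model(q).lower()  # naive substring grading
    )
    return correct / len(pairs)

benchmark = [
    ("When did Jamaica gain independence?", "1962"),
    ("Who composed the Cello Suites?", "Bach"),
    ("What year did the Berlin Wall fall?", "1989"),
]
print(f"accuracy: {evaluate(benchmark):.0%}")  # 2 of 3 correct -> 67%
```

The reported 85% and 91% figures are exactly this kind of ratio, computed over a far larger and more carefully graded question set.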

GEMINI 3 AND BEYOND: ACCURACY INCREASES, BUT…
The shift to the Gemini 3 update resulted in a notable increase in accuracy for AI Overviews, reaching 91% on the SimpleQA benchmark. However, this improvement doesn’t eliminate the problem. Extrapolating the miss rate to all Google searches suggests tens of millions of incorrect answers generated daily. The core issue remains that while the underlying models improve, the system’s reliance on faster, less precise models for typical search queries contributes to the ongoing inaccuracies.
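The extrapolation above is simple arithmetic. In the sketch below, the daily search volume and the share of searches that show an AI Overview are assumed round numbers (Google publishes neither); only the 91% accuracy figure comes from the SimpleQA rerun. Even so, a single-digit error rate scales into tens of millions of wrong answers a day.

```python
# Back-of-the-envelope extrapolation; volume and coverage are
# illustrative assumptions, not published figures.
daily_searches = 8_500_000_000  # assumed order of magnitude
overview_share = 0.05           # assumed fraction showing an AI Overview
error_rate = 1 - 0.91           # 9% miss rate after the Gemini 3 update

wrong_per_day = daily_searches * overview_share * error_rate
print(f"~{wrong_per_day / 1e6:.0f} million incorrect answers per day")
```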

CASE STUDIES: WHEN AI OVERVIEWS FAILS
Several examples illustrate the shortcomings of AI Overviews. When asked on what date Bob Marley’s former home became a museum, the AI confidently cited three pages, two of which contained no date at all, while the third gave a contradictory year. Similarly, when asked about Yo-Yo Ma’s induction into the Classical Music Hall of Fame, the AI asserted that the institution did not exist even while citing its website. These instances demonstrate the AI’s propensity to fabricate information or select incorrect details.

GOOGLE’S RESPONSE AND THE COMPLEXITIES OF AI EVALUATION
Google’s response to the Times’ report highlights the challenges in evaluating AI models. Spokesperson Ned Adriance criticized SimpleQA, arguing that it contains incorrect information and doesn’t reflect actual user searches. Google utilizes a similar test, SimpleQA Verified, employing a smaller, more vetted set of questions. Furthermore, the evaluation process itself is often subjective, with companies employing different methodologies to showcase their models’ capabilities. The non-deterministic nature of generative AI adds another layer of complexity, as models can provide correct answers one moment and miss them entirely upon re-querying.
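The non-determinism is easy to demonstrate in miniature. The toy sketch below simulates a model sampling at temperature above zero with a seeded random choice; real models sample over tokens, but the effect is the same: repeating the identical query can change the answer.

```python
import random

def sample_answer(prompt: str, rng: random.Random) -> str:
    # Stand-in for a generative model sampling at temperature > 0:
    # the answer pool is weighted toward the correct "1962".
    return rng.choice(["1962", "1962", "1963"])

rng = random.Random(0)
answers = [sample_answer("When did Jamaica gain independence?", rng)
           for _ in range(10)]
print(set(answers))  # repeated queries can yield more than one answer
```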

HALLUCINATIONS AND MULTI-MODEL APPROACHES
The assessment process is further complicated by the tendency of AI models, including those used by Oumi, to “hallucinate” – generate false information. Recognizing this, Google employs multiple AI models for different queries. While Gemini 3.1 Pro offers the best accuracy, its speed and cost constraints necessitate the use of faster Gemini Flash models for immediate search results. This multi-model approach contributes to the variability in accuracy observed.
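A multi-model setup can be pictured as a router that sends cheap, common queries to a fast model and harder ones to a slower, more accurate one. The model names and the heuristic below are assumptions for illustration, not Google’s actual routing logic.

```python
def route(query: str) -> str:
    # Crude heuristic: long or reasoning-style queries go to the
    # slower, more accurate model; everything else stays fast/cheap.
    needs_reasoning = (
        len(query.split()) > 12
        or any(w in query.lower() for w in ("why", "compare", "explain"))
    )
    return "gemini-pro" if needs_reasoning else "gemini-flash"

print(route("weather today"))                # fast model
print(route("explain why the sky is blue"))  # accurate model
```

Because different queries hit different models, two users asking similar questions can see answers of very different quality, which matches the variability the article describes.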

FACTUAL GROUNDING AND THE ROLE OF BLUE LINKS
Despite the increased accuracy of Gemini models, a fundamental issue persists: AI Overviews relies on external data, primarily the web’s vast store of information. Grounding the AI in that information makes it more accurate than the model operating on its own. But the truth often resides in the “blue links” – the sources Google cites – and AI Overviews encourages users to accept its summaries without checking those sources. This reliance on AI summaries over independent verification raises concerns about the spread of misinformation.
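“Grounding” here essentially means inserting retrieved source snippets into the prompt so the model answers from them rather than from its parametric memory alone. A minimal sketch of the prompt-building step, with an invented snippet and a hypothetical helper name:

```python
def build_grounded_prompt(query: str, snippets: list[str]) -> str:
    """Prepend retrieved web snippets (the "blue links") to the question."""
    lines = ["Answer using only these sources:"]
    lines += [f"- {s}" for s in snippets]
    lines += ["", f"Question: {query}"]
    return "\n".join(lines)

snippets = ["Jamaica gained independence on 6 August 1962."]
print(build_grounded_prompt("When did Jamaica gain independence?", snippets))
```

The catch the article points to is downstream of this step: the snippets are surfaced as citations, but the summary is what users actually read.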

DISCLAIMERS AND THE ACCEPTANCE OF ERROR
The Times’ analysis revealed discrepancies between the AI’s output and the information readily available. This underscores the inherent challenges in assessing AI factuality. Google acknowledges these mistakes, prominently displaying the disclaimer: “AI can make mistakes, so double-check responses.” This constant reminder reflects the current limitations of AI-powered search and the need for critical evaluation of its outputs.

This article is AI-synthesized from public sources and may not reflect original reporting.