This was my first experience of this kind.
Banks' fraud detection algorithms block billions of dollars in theft every year, at the cost of a few false alarms we clear up in seconds. For every false positive that blocks a legitimate purchase and irritates you, there are probably 99 fraud attempts your bank quietly stopped that you never hear about.
We tolerate imperfect digital systems because the benefits outweigh the irritations and the inconveniences. We already live with machines that guess and sometimes guess wrong, because the payoff is worth it.
Take weather forecasts, for instance. We trust a 24-hour forecast, take a 72-hour forecast with a grain of salt, and beyond that our confidence drops off sharply.
The travel times predicted by your car's navigation system (yes, a machine learning algorithm!) are often off by a few minutes on a daily commute, and perhaps by an hour or two if your trip spans thousands of miles through multiple cities with complex traffic congestion patterns and multiple alternative routes. Yet logistics companies trust these systems to plan delivery routes, save fuel, reduce their carbon footprint, and give us a better consumer experience with a more predictable time of arrival.
Gmail has long used machine learning to block most spam. In 2019, Google reported that its machine learning algorithms were blocking 99.9 percent of spam, phishing, and malware from reaching your inbox. Back then, 1.5 billion people and 5 million paying businesses were using Gmail daily. Google then added TensorFlow, its open-source machine learning framework, to the mix, blocking roughly 100 million additional messages per day.
We accept rare false positives and AI mistakes because the net benefit is huge and we whitelist when needed.
I could go on for another ten pages about the imperfect AI and machine learning algorithms embedded in dozens of systems and digital products we use and depend on regularly: from your Netflix recommendations to wildfire prediction; from AI-enhanced mammograms to epidemic propagation models.
Yet when it comes to generative AI, and solutions such as Gemini, ChatGPT, Copilot, and Claude, we push back. Why is that?
The real issue is designing around uncertainty
We've had decades to build trust in weather forecasts, and years to trust spam filters, through repeated experience and learning where they help and where they fail.
Generative AI is new, its mistakes are visible and sometimes spectacular (fake citations, confidently wrong answers), and we haven't yet developed the intuition for where it works versus where it breaks down.
The answer lies in designing robust checks and balances that incorporate the human element, i.e., keeping a human in the loop. This means building workflows where human oversight is integral to the process, ensuring that decisions made by AI or automated systems are subject to human review and intervention when necessary.
By adding these checkpoints, we can incorporate transparency, manage uncertainty, mitigate risks, and avoid the catastrophic failures that could result from unchecked automation.
Ultimately, this approach supports the responsible deployment of AI technologies, maintaining a balance between efficiency and accountability in complex, real-world environments.
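To make the idea concrete, here is a minimal sketch of such a checkpoint in Python. The names and the threshold (Draft, needs_human_review, 0.85) are illustrative assumptions, not a reference implementation; the point is simply that high-stakes or low-confidence outputs are routed to a person before they ship, while routine ones pass through.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.85  # assumption: tune per use case and risk tolerance

@dataclass
class Draft:
    content: str
    confidence: float  # model's self-reported or externally estimated confidence
    high_stakes: bool  # e.g., legal filing, loan decision, client-facing report

def needs_human_review(draft: Draft) -> bool:
    """Route a draft to a human whenever stakes are high or confidence is low."""
    return draft.high_stakes or draft.confidence < CONFIDENCE_THRESHOLD

def process(draft: Draft, human_review) -> str:
    """Checkpoint: AI output only ships after passing the gate or a human sign-off."""
    if needs_human_review(draft):
        return human_review(draft)   # human edits, approves, or rejects
    return draft.content             # low-risk, high-confidence output goes straight through

if __name__ == "__main__":
    routine = Draft("Weekly spam-filter stats summary...", confidence=0.97, high_stakes=False)
    risky = Draft("Proposed indemnification clause...", confidence=0.91, high_stakes=True)

    reviewer = lambda d: f"[approved by counsel] {d.content}"
    print(process(routine, reviewer))  # ships automatically
    print(process(risky, reviewer))    # stops at the human checkpoint
```

The gate deliberately checks both stakes and confidence: a confident model can still be confidently wrong, so risk, not just the model's own score, decides when a human steps in.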
Five places where generative AI works well
Yet there are still many areas where generative AI struggles and needs closer supervision.
In law, for example, generative AI can be extremely useful for drafting documents, summarizing collections of documents, organizing evidence, and comparing contracts to identify anomalies, missing clauses, or deviations from company standards. Courts have accepted supervised learning to sort documents for more than a decade, and AI can save lawyers hours of routine work and help teams find information faster.
But generative AI should not be trusted blindly to create arguments, cite case law, or file documents without review. A lawyer used ChatGPT to write a court brief and ended up citing fake cases that looked like real precedents. Generative AI can help you prepare, but it cannot stand in for professional legal judgment.
In finance, AI is well suited to explaining financial reports, preparing summaries, or drafting client communications. It can also help analysts test scenarios or spot patterns in historical data very effectively. Where it falls short is in making real financial decisions, like approving loans, valuing assets, or interpreting regulations, without human validation. Those depend on live data, context, and ethical considerations that current AI models can't reliably handle.
Yet.
Conclusion