This was my first experience of this kind.
Banks' fraud detection algorithms block billions of dollars in theft every year, at the cost of a few false alarms we clear up in seconds. For every false positive that blocks a legitimate purchase and irritates you, there are probably 99 fraud attempts your bank quietly stopped that you never hear about.
We tolerate imperfect digital systems because the benefits outweigh the irritations and the inconveniences. We already live with machines that guess and sometimes guess wrong, because the payoff is worth it.
Take weather forecasts, for instance. We trust a 24-hour forecast, take a 72-hour forecast with a grain of salt, and beyond that our confidence drops off sharply.
The travel times predicted by your car's navigation system (yes, a machine learning algorithm!) are often off by a few minutes on a daily commute, and perhaps by an hour or two if your trip spans thousands of miles through multiple cities with complex traffic congestion patterns and multiple alternative routes. Yet logistics companies trust these systems to plan delivery routes, save fuel, reduce their carbon footprint, and give us a better consumer experience with a more predictable time of arrival.
Gmail has long used machine learning to block most spam. In 2019, Google reported that its machine learning algorithms were blocking 99.9 percent of spam, phishing, and malware from reaching your inbox. Back then, 1.5 billion people and 5 million paying businesses were using Gmail daily. Google then added TensorFlow, its open-source machine learning framework, to the mix, blocking roughly 100 million additional messages per day.
We accept rare false positives and AI mistakes because the net benefit is huge and we whitelist when needed.
I could go on for another ten pages about the imperfect AI and machine learning algorithms embedded in dozens of systems and digital products we use and depend on regularly: from your Netflix recommendations to wildfire prediction; from AI-enhanced mammograms to epidemic propagation models.
Yet when it comes to generative AI, and solutions such as Gemini, ChatGPT, Copilot, and Claude, we push back. Why is that?
The real issue is designing around uncertainty
We've had decades to build trust in weather forecasts, and years to trust spam filters, through repeated experience and learning where they help and where they fail.
Generative AI is new, its mistakes are visible and sometimes spectacular (fake citations, confidently wrong answers), and we haven't yet developed the intuition for where it works versus where it breaks down.
The answer lies in designing robust checks and balances that incorporate the human element, i.e., keeping a human in the loop. This means building workflows where human oversight is integral to the process, ensuring that decisions made by AI or automated systems are subject to human review and intervention when necessary.
By adding these checkpoints, we can incorporate transparency, manage uncertainty, mitigate risks, and avoid the catastrophic failures that could result from unchecked automation.
Ultimately, this approach supports the responsible deployment of AI technologies, maintaining a balance between efficiency and accountability in complex, real-world environments.
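To make the idea concrete, here is a minimal sketch of such a checkpoint in Python. The names and the threshold (Draft, needs_human_review, 0.85) are illustrative assumptions, not a reference implementation; the point is simply that high-stakes or low-confidence outputs are routed to a person before they ship, while routine ones pass through.

```python
from dataclasses import dataclass

CONFIDENCE_THRESHOLD = 0.85  # assumption: tune per use case and risk tolerance

@dataclass
class Draft:
    content: str
    confidence: float  # model's self-reported or externally estimated confidence
    high_stakes: bool  # e.g., legal filing, loan decision, client-facing report

def needs_human_review(draft: Draft) -> bool:
    """Route a draft to a human whenever stakes are high or confidence is low."""
    return draft.high_stakes or draft.confidence < CONFIDENCE_THRESHOLD

def process(draft: Draft, human_review) -> str:
    """Checkpoint: AI output only ships after passing the gate or a human sign-off."""
    if needs_human_review(draft):
        return human_review(draft)   # human edits, approves, or rejects
    return draft.content             # low-risk, high-confidence output goes straight through

if __name__ == "__main__":
    routine = Draft("Weekly spam-filter stats summary...", confidence=0.97, high_stakes=False)
    risky = Draft("Proposed indemnification clause...", confidence=0.91, high_stakes=True)

    reviewer = lambda d: f"[approved by counsel] {d.content}"
    print(process(routine, reviewer))  # ships automatically
    print(process(risky, reviewer))    # stops at the human checkpoint
```

The gate deliberately checks both stakes and confidence: a confident model can still be confidently wrong, so risk, not just the model's own score, decides when a human steps in.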
Five places where generative AI works well
Yet there are still many areas where generative AI struggles and needs closer supervision.
In law, for example, generative AI can be extremely useful for drafting documents, summarizing collections of documents, organizing evidence, and comparing contracts to identify anomalies, missing clauses, or deviations from company standards. Courts have accepted supervised learning to sort documents for more than a decade, and AI can save lawyers hours of routine work and help teams find information faster.
But generative AI should not be trusted blindly to create arguments, cite case law, or file documents without review. A lawyer used ChatGPT to write a court brief and ended up citing fake cases that looked like real precedents. Generative AI can help you prepare, but it cannot stand in for professional legal judgment.
In finance, AI is well suited to explaining financial reports, preparing summaries, or drafting client communications. It can also help analysts test scenarios or spot patterns in historical data very effectively. Where it falls short is in making real financial decisions, like approving loans, valuing assets, or interpreting regulations, without human validation. Those depend on live data, context, and ethical considerations that current AI models can't reliably handle.
Yet.
Conclusion