It's sometimes hard to grasp the significance of the reasoning and logic updates that are starting to emerge in powerful models, like GPT-5. Here's a *very simple* example of how powerful these models are getting.
I took a recent NVIDIA earnings call transcript, 23 pages and roughly 7,800 words long. In the sentence "and gross margin will improve and return to the mid-70s", I changed "mid-70s" to "mid-60s".
For even a remotely tuned-in financial analyst, this would look out of place, because margins wouldn't "improve and return" to a figure lower than the one cited elsewhere in the same document. But probably 95% of people reading this transcript would not spot the modification, because it blends right into the surrounding 7,800 words.
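To make the planted inconsistency concrete, here's a toy Python sketch (the function names and the regex heuristic are mine, not anything the author ran) that flags a document quoting two different "mid-XXs" margin ranges. The catch is that this kind of keyword check only works because we already know exactly what to look for; the point of the experiment is that a reasoning model finds the contradiction without being told where to look.

```python
import re

def find_margin_mentions(text: str) -> list[str]:
    # Illustrative heuristic: collect every "mid-XXs" range quoted in the text.
    return re.findall(r"mid-(\d0)s", text)

def has_inconsistency(text: str) -> bool:
    # Flag the document if it quotes two different margin ranges.
    return len(set(find_margin_mentions(text))) > 1

# Condensed stand-in for the modified transcript (hypothetical snippet):
doc = (
    "...and gross margin will improve and return to the mid-60s... "
    "...we expect gross margins to be in the mid-70s later this year..."
)
print(has_inconsistency(doc))  # True: both '60' and '70' appear
```

A brittle pattern match like this has to be written per error type; the interesting shift is that general-purpose models can now surface such contradictions from an open-ended question.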
Using Box AI to test a variety of AI models, I then asked each one: "Are there any logical errors in this document? Please provide a one sentence answer."
GPT-4.1, GPT-4.1 mini, and a handful of other models that were state of the art just ~6 months ago generally reported that there were no logical errors in the document. To these models, the document presumably seems coherent and matches what they'd expect an earnings transcript to look like, so nothing stands out as worth attention - sort of a reverse hallucination.
GPT-5, on the other hand, quickly discovered the issue and responded with:
"Yes — the document contains an internal inconsistency about gross-margin guidance, at one point saying margins will “return to the mid-60s” and later saying they will be “in the mid-70s” later this year."
Amazingly, this happened with GPT-5, GPT-5 mini, and, remarkably, *even* GPT-5 nano. Bear in mind, GPT-5 nano's output tokens are priced at 1/20th of GPT-4.1's. So, more intelligent (at this use case) for 5% of the cost.
Now, while reviewing business documents for errors isn't a daily occurrence for every knowledge worker, these types of issues show up in many forms when dealing with large unstructured data sets, like financial documents, contracts, transcripts, and reports - whether that's finding a fact, spotting a logical fallacy, running a hypothetical, or applying sophisticated deductive reasoning.
And the ability to apply more logic and reasoning to enterprise data becomes especially critical when deploying AI Agents in the enterprise. So, it's amazing to see the advancements in this space right now, and this is going to open up a ton more use cases for businesses.