AI Corner - March

March 24th, 2026

Tracking the latest in AI – from major industry developments to the research and perspectives shaping our thinking at Sagard.

For math enthusiasts, we have some good news! Again. GPT-5.4 Pro recently achieved 38% accuracy on FrontierMath Tier-4, one of the hardest AI math benchmarks, which contains 50 research-level problems that can take mathematicians days or weeks to solve. This is a major leap from about 2% a year ago, when models like o3 represented the state of the art. Even strong open-source models such as Kimi K2.5 reach only around 4.2%, showing a large performance gap. At 38%, GPT-5.4 Pro effectively solved about 19 of the 50 extremely difficult problems. FrontierMath is designed to test deep multi-step reasoning across advanced mathematical fields, not simple pattern recognition. While the model still fails on over 60% of the problems, the jump from 2% to nearly 40% in roughly a year highlights how rapidly AI’s ability to tackle complex mathematics is improving.
Why should LLMs have all the fun? Inception Labs is an under-the-radar AI startup founded by professors from Stanford, Cornell, and UCLA, and they recently released Mercury 2, one of the first diffusion-based models for reasoning, language, and code. Unlike traditional autoregressive LLMs, Mercury 2 generates outputs through diffusion sampling, enabling significantly faster generation. The company claims the model is up to 10x faster than comparable models while being among the cheapest for its performance tier. Although it does not yet match frontier systems like Claude 4.6 or GPT-5.x-class models, it performs strongly relative to its cost and latency. This effectively reshapes the Pareto frontier for price-to-quality and latency-to-quality tradeoffs in AI models. If diffusion-based models like Mercury eventually reach frontier performance, they could dramatically disrupt the economics of large language models by making high-quality AI far cheaper and faster to run.

Exhausted man defeats AI model in world coding championship! At the 2025 AtCoder World Tour Finals in Tokyo, Polish programmer Przemysław “Psyho” Dębiak pulled off a dramatic win over an advanced OpenAI AI model after a brutal 10-hour coding marathon solving a complex optimization challenge. Running on almost no sleep after three days of competitions, Dębiak pushed himself to the limit and ultimately beat the AI, which still finished an impressive second place in what may be the first major programming championship where AI competed directly against elite human coders. After the victory he posted, “Humanity has prevailed (for now!)”, admitting he was “barely alive” from exhaustion. While the human win was symbolic, the close result highlights how quickly AI is improving at complex coding and strategic reasoning.

What is so surreal about the SaaSpocalypse? A few weeks ago, Anthropic published 11 open-source plugins for Claude Cowork on GitHub [LINK], primarily intended for knowledge workers. Previously, companies like Anthropic sold AI models via APIs, and SaaS companies built products on top of them. But the Claude plugins did something different; they automated complete workflows such as legal contract review, compliance checks, financial analysis, sales preparation, etc. These are exactly the core features many SaaS products sell! As a result, Goldman Sachs software basket had its worst day since 2025 April’s tariff selloff. Bloomberg put total damage at $285 billion across software and financial services. But here’s the thing; these plugins are ….. just prompts [Example]. Configurations. System instructions telling Claude how to approach legal documents. That’s it. No proprietary model fine-tuned on case law. No special legal reasoning engine. It’s Claude being Claude, but with a structured workflow wrapper. And this set of markdown files was reported to have caused a significant impact in the legal tech sector.