Claude 4.5 crushes CORE-Bench, Google fuses RNNs+Transformers, OpenAI forced to share logs, Codex improves bug-catching, Anthropic intros AI interviewer, and buys Neptune.
Excellent analysis! The breakdown of Claude Opus hitting 95% on CORE-Bench, recreating results from scratch, is truly impressive and insightful.