ChatGPT o3 Ranked the Most Useful AI Model for Scientists
Researchers at the Allen Institute for AI (Ai2) have introduced SciArena, a platform that lets scientists evaluate how useful AI models are for research. Only researchers with at least two published papers can take part, and they must first complete a one-hour briefing.
On SciArena, a scientist submits a question, and the system selects relevant scientific articles from the Semantic Scholar database. These materials are then provided to two randomly selected AI models, which generate detailed responses based on the data. The scientist reviews both answers and selects the better one—only then is the identity of the winning model revealed.
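The blind comparison described above can be sketched in a few lines of Python. This is only an illustration of the flow, not SciArena's actual code: the function `run_blind_comparison`, the `choose` callback, and the stub models are hypothetical names introduced here, and the retrieval of papers from Semantic Scholar is assumed to have already happened.

```python
import random
from typing import Callable, Dict, List, Optional

# Hypothetical stand-in: a "model" is any function that turns a question
# plus retrieved papers into an answer string.
Model = Callable[[str, List[str]], str]


def run_blind_comparison(
    question: str,
    papers: List[str],
    models: Dict[str, Model],
    choose: Callable[[str, str], str],
    rng: Optional[random.Random] = None,
) -> Dict[str, str]:
    """Run one blind head-to-head vote and return the revealed result."""
    rng = rng or random.Random()

    # Pick two distinct models at random; the voter never sees these names.
    name_a, name_b = rng.sample(sorted(models), 2)

    # Both models answer from the same question and the same papers.
    answer_a = models[name_a](question, papers)
    answer_b = models[name_b](question, papers)

    # The scientist sees only the anonymous answers and returns "A" or "B".
    pick = choose(answer_a, answer_b)
    if pick not in ("A", "B"):
        raise ValueError("choose() must return 'A' or 'B'")

    # Model identities are revealed only after the vote is cast.
    if pick == "A":
        return {"winner": name_a, "loser": name_b}
    return {"winner": name_b, "loser": name_a}
```

Keeping the model names out of the `choose` callback is what makes the vote blind: the preference can depend only on the text of the two answers.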
ChatGPT o3 currently leads with 1,172 points, followed by Claude Opus 4 (1,080), Gemini 2.5 Pro (1,063), DeepSeek R1-0528 (1,062), and ChatGPT o4-mini (1,054). ChatGPT o3 also ranks highest in all four major categories: engineering, healthcare, natural sciences, and humanities & social sciences.
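The article does not say how the points are calculated. Leaderboards built from head-to-head votes often use an Elo-style rating, so the sketch below assumes that convention (including the customary scale of 400) purely to show what the gaps between scores would mean. The function name `expected_win_probability` is hypothetical, and the scores are the ones quoted above.

```python
def expected_win_probability(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Chance that A is preferred over B under the standard Elo formula.

    Assumption: the leaderboard points behave like Elo ratings with the
    usual scale of 400. The article itself does not confirm this.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / scale))


# Scores as reported in the article.
scores = {
    "ChatGPT o3": 1172,
    "Claude Opus 4": 1080,
    "Gemini 2.5 Pro": 1063,
    "DeepSeek R1-0528": 1062,
    "ChatGPT o4-mini": 1054,
}

leader = "ChatGPT o3"
for name, score in scores.items():
    if name == leader:
        continue
    p = expected_win_probability(scores[leader], score)
    print(f"{leader} vs {name}: expected preference rate {p:.1%}")
```

If the Elo assumption holds, the 92-point lead over Claude Opus 4 corresponds to ChatGPT o3 being preferred in roughly 63% of direct comparisons, which is a clear but not overwhelming advantage.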
SciArena is intended primarily for professional scientists and relies solely on verified, credible sources of information.