

Optimizing LLMs to be great at particular tests backfires on Meta, Stability.



Hugging Face has released its second LLM leaderboard to rank the best language models it has tested. The new leaderboard aims to be a more challenging, uniform standard for testing open large language model (LLM) performance across a variety of tasks. Alibaba's Qwen models appear dominant in the leaderboard's inaugural rankings, taking three spots in the top 10.

Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-pro for all major open LLMs!
Some learning:
- Qwen 72B is the king and Chinese open models are dominating overall
- Previous evaluations have become too easy for recent ... June 26, 2024

Hugging Face's second leaderboard tests language models across four skills: knowledge testing, reasoning on extremely long contexts, complex math abilities, and instruction following. Six benchmarks are used to test these qualities, with tests including solving 1,000-word murder mysteries, explaining PhD-level questions in layman's terms, and, most daunting of all, high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face's blog.

The frontrunner of the new leaderboard is Qwen, Alibaba's LLM, which takes first, third, and tenth place with its handful of variants. Also showing up are Llama3-70B, Meta's LLM, and a handful of smaller open-source projects that managed to outperform the pack. Notably absent is any sign of ChatGPT