If there's Intelligent Life out There
Abbey Imlay edited this page 1 month ago


Optimizing LLMs to be good at specific tests backfires on Meta, Stability.



Hugging Face has released its second LLM leaderboard to rank the best language models it has tested. The new leaderboard aims to be a more challenging, uniform standard for evaluating open large language model (LLM) performance across a variety of tasks. Alibaba's Qwen models appear dominant in the leaderboard's inaugural rankings, taking three spots in the top 10.

Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-pro for all major open LLMs! Some learning: - Qwen 72B is the king and Chinese open models are dominating overall - Previous evaluations have become too easy for recent ... June 26, 2024

Hugging Face's second leaderboard tests language models across four tasks: knowledge testing, reasoning on extremely long contexts, complex math abilities, and instruction following. Six benchmarks are used to test these qualities, with tests including solving 1,000-word murder mysteries, explaining PhD-level questions in layman's terms, and most daunting of all: high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face's blog.

The frontrunner of the new leaderboard is Qwen, Alibaba's LLM, which takes 1st, 3rd, and 10th place with its handful of variants. Also appearing are Llama3-70B, Meta's LLM, and a handful of smaller open-source projects that managed to outperform the pack. Notably absent is any sign of ChatGPT