
1 If there's Intelligent Life out There

Optimizing LLMs to be proficient at particular tests backfires on Meta, Stability.



Hugging Face has released its second LLM leaderboard to rank the best language models it has tested. The new leaderboard aims to be a more challenging, uniform standard for evaluating open large language model (LLM) performance across a variety of tasks. Alibaba's Qwen models dominate the leaderboard's inaugural rankings, taking three spots in the top 10.

Pumped to announce the brand new open LLM leaderboard. We burned 300 H100 to re-run new evaluations like MMLU-pro for all major open LLMs! Some learning: - Qwen 72B is the king and Chinese open models are dominating overall - Previous evaluations have become too easy for recent ... June 26, 2024

Hugging Face's second leaderboard tests language models across four tasks: knowledge testing, reasoning on extremely long contexts, complex math abilities, and instruction following. Six benchmarks are used to test these qualities, with tests including solving 1,000-word murder mysteries, explaining PhD-level questions in layman's terms, and, most daunting of all, high-school math equations. A full breakdown of the benchmarks used can be found on Hugging Face's blog.

The frontrunner of the new leaderboard is Qwen, Alibaba's LLM, which takes 1st, 3rd, and 10th place with its handful of variants. Also appearing are Llama3-70B, Meta's LLM, and a handful of smaller open-source projects that managed to outperform the pack. Notably absent is any sign of ChatGPT