Link to original content: http://huggingface.co/lewtun
lewtun (Lewis Tunstall)

AI & ML interests

LLMs, LLMs, LLMs

Organizations

Hugging Face, AutoNLP, Natural Language Processing with Transformers, BigScience Workshop, Hugging Face Internal Testing Organization, Ought, Hugging Face Course, Testing Benchmarks on the Hub, NLP en ES, GEM benchmark, SetFit, GEM benchmark submissions, Benchmarks Hosting, ALPS test, Evaluation datasets, Deep Learning for Particle Physicists, fast.ai community, trl internal testing, DreamBooth Hackathon, SomosNLP, Marsyas (Music Analysis, Retrieval and Synthesis for Audio Signals), ONNXConfig for all, HF Course Demos, How to teach Hugging Face?, Jet Universe, Evaluation on the Hub, The ML Landscape of Top Taggers, HuggingFaceM4, HF Canonical Model Maintainers, TRL, BigCode, Hugging Face H4, Inference Endpoints, Hugging Face OSS Metrics, BigCode Data, Reading Group, Hugging Face H4 Community, Hugging Face TB Research, Hugging Face Smol Cluster, Open LLM Leaderboard, EPFL LLM Team, H4 Alignment Handbook, h4-argilla-collab, ZeroGPU Explorers, Project-Numina, ORPO Explorers, Kato, Distillation Hugs, Hugging Face Discord Community, Data Agents, nltpt, IOPO Experiments, Hugging Face FineVideo, Reliable Agents, Hugging Face Science, HF CMU Collab

Posts

Introducing Zephyr 141B-A35B 🪁:

HuggingFaceH4/zephyr-orpo-141b-A35b-v0.1

Yesterday, Mistral released their latest base model (via magnet link of course 😅) and the community quickly converted it to transformers format and pushed it to the Hub: mistral-community/Mixtral-8x22B-v0.1

Early evals of this model looked extremely strong, so we teamed up with Argilla and KAIST AI to cook up a Zephyr recipe with a few new alignment techniques that came out recently:

🧑‍🍳 Align the base model with Odds Ratio Preference Optimisation (ORPO). This novel algorithm, developed by @JW17, @nlee-208, and @j6mes, does not require an SFT step to achieve high performance and is thus much more computationally efficient than methods like DPO and PPO.
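
If you're curious what ORPO actually optimises, here's a rough PyTorch sketch of the loss (simplified from the paper, with illustrative names -- the real implementation lives in TRL's ORPOTrainer):

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, lam=0.1):
    """Rough sketch of the ORPO objective.

    `chosen_logps` / `rejected_logps` are the length-normalised mean
    log-probs of each response under the model being trained -- note
    that no frozen reference model is needed, unlike DPO.
    """
    # odds(y|x) = p / (1 - p), so log-odds = logp - log(1 - exp(logp))
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # Odds-ratio term: push the chosen response to be more likely than the rejected one
    or_term = -F.logsigmoid(log_odds_chosen - log_odds_rejected)
    # Full loss = standard NLL on the chosen response + weighted odds-ratio penalty
    return (-chosen_logps + lam * or_term).mean()
```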

🦫 Use a brand new dataset of 7k high-quality, multi-turn preferences that has been developed by our friends at Argilla. To create this dataset, they took the excellent Capybara SFT dataset from @LDJnr LDJnr/Capybara and converted it into a preference dataset by augmenting the final turn with responses from new LLMs that were then ranked by GPT-4.
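
Schematically, the conversion looks something like this (all names here are hypothetical -- a sketch of the idea, not Argilla's actual pipeline):

```python
# Hypothetical sketch: `candidate_models`, `judge`, `generate`, and `rank`
# are illustrative stand-ins, not real library APIs.
def to_preference_example(sft_example, candidate_models, judge):
    prefix = sft_example["messages"][:-1]   # conversation up to the final turn
    original = sft_example["messages"][-1]  # the original Capybara response
    # Augment the final turn with completions from newer LLMs
    candidates = [original] + [m.generate(prefix) for m in candidate_models]
    # Have GPT-4 rank the candidates, keep the best/worst as chosen/rejected
    ranked = judge.rank(prefix, candidates)
    return {"chosen": prefix + [ranked[0]], "rejected": prefix + [ranked[-1]]}
```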

What we find especially neat about this approach is that training on 7k samples only takes ~1.3h on 4 H100 nodes, yet produces a model that is very strong on benchmarks like IFEval and BBH.

Kudos to @alvarobartt @JW17 and @nlee-208 for this very nice and fast-paced collab!

For more details on the paper and dataset, check out our collection: HuggingFaceH4/zephyr-orpo-6617eba2c5c0e2cc3c151524
Can we align code generation models to be good at chat without compromising their base capabilities 🤔?

This was the question the H4 team asked itself when BigCode released StarCoder2 a bit over a week ago. We knew that code models like deepseek-ai/deepseek-coder-6.7b-instruct and m-a-p/OpenCodeInterpreter-DS-33B get impressive scores on code benchmarks like HumanEval, but they tend to score poorly on chat benchmarks like MT Bench and IFEval. We also knew that the Zephyr recipe we applied to Mistral 7B produced a strong chat model, so we wondered -- could it be tweaked to produce a strong coding assistant?

It turns out the answer is yes and I'm happy to share StarChat2, a DPO fine-tune of StarCoder2 15B that scores highly on both HumanEval and MT Bench / IFEval 🌟!
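
For readers new to DPO, here's a minimal sketch of the loss it optimises (illustrative only -- in practice we train with TRL via the alignment handbook recipe linked below):

```python
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    """Rough sketch of the DPO objective (Rafailov et al., 2023).

    Log-probs are summed over each response's tokens; the `ref_*` values
    come from a frozen reference model (here, the SFT checkpoint).
    """
    # Implicit rewards: how much more likely each response is under the
    # policy than under the reference model
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Maximise the margin between chosen and rejected implicit rewards
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```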

The most interesting lesson for me was that you get better models by blending in more code/math data than chat during the SFT step - in terms of tokens, we found a ratio of 3:1 worked best.
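
If you want to try a similar mixture with 🤗 Datasets, something like this works (dataset names are placeholders, and since sampling happens per example you'd tune the probabilities until the *token* ratio lands near 3:1):

```python
from datasets import interleave_datasets, load_dataset

# Placeholder dataset names -- the point is the mixing strategy
code_math = load_dataset("my-org/code-math-sft", split="train")
chat = load_dataset("my-org/chat-sft", split="train")

# Sample ~3 code/math examples for every chat example
mixed = interleave_datasets(
    [code_math, chat],
    probabilities=[0.75, 0.25],
    seed=42,
    stopping_strategy="all_exhausted",
)
```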

Anyway, here's a demo of the model, along with all the code and datasets we used to train it:

* Demo: HuggingFaceH4/starchat2-playground
* Collection: HuggingFaceH4/starchat2-15b-65f068417b330fafad751fce
* Recipe: https://github.com/huggingface/alignment-handbook

Hope it's useful to others!