LocalLLaMA@sh.itjust.worksEnglish · 2 years ago

Mistral 7B model

mistral.ai

cross-posted to:
programming

rufus@discuss.tchncs.de to LocalLLaMA@sh.itjust.works · English · 2 years ago

Mistral 7B

The best 7B model to date, Apache 2.0

Yesterday Mistral AI released a new language model called Mistral 7B. @[email protected] already posted about the sliding window attention part here in LocalLLaMA yesterday. But I think the model and the company behind it are even more noteworthy, and the release of the model is worth its own post.

Mistral 7B is not based on Llama. They claim it outperforms Llama 2 13B on all benchmarks (at its size of 7B). It has additional coding abilities and an 8k sequence length. And it’s released under the Apache 2.0 license. ~~So truly an ‘open’ model, usable without restrictions.~~ [Edit: Unfortunately I couldn’t find the dataset or a paper. They call it ‘open-weight’. So my conclusion regarding the openness might be a bit premature. We’ll see.]

(It uses Grouped-query attention and Sliding Window Attention.)
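For anyone wondering what those two terms mean in practice, here is a rough sketch in plain Python. It is not Mistral's code. The function names and the toy sizes are made up for illustration, and I'm assuming the window counts the current token itself, which may differ by one from a given implementation.

```python
def sliding_window_mask(seq_len, window):
    """Return mask[i][j], True when query position i may attend to key position j.

    Causal (j <= i) and limited to the most recent `window` positions.
    Assumption: the window includes the current position.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        [j <= i and i - j < window for j in range(seq_len)]
        for i in range(seq_len)
    ]


def kv_head_for_query_head(query_head, n_query_heads, n_kv_heads):
    """Grouped-query attention: several query heads share one key/value head."""
    if n_query_heads % n_kv_heads != 0:
        raise ValueError("n_query_heads must be a multiple of n_kv_heads")
    group_size = n_query_heads // n_kv_heads
    return query_head // group_size


if __name__ == "__main__":
    # Toy sizes, chosen only so the pattern is easy to see.
    for row in sliding_window_mask(6, 3):
        print("".join("x" if allowed else "." for allowed in row))
    print([kv_head_for_query_head(h, 8, 2) for h in range(8)])
```

The mask shows why memory per token stays bounded by the window instead of growing with the full sequence, and the head mapping shows why the key/value cache shrinks when fewer KV heads are shared across query heads.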

Also worth noting: Mistral AI (the company) is based in Paris. They are one of the few big European AI startups and raised $113 million in funding in June.

Details are in Mistral AI’s announcement
TechCrunch news article including information about the company
They released a base/foundation model and an instruction-tuned one on HuggingFace
And llama.cpp is already compatible, and GGUF versions are out there.

I’ve tried it and it indeed looks promising. It certainly has features that distinguish it from Llama. And I like the competition. Our world is currently completely dominated by Meta. And if it performs exceptionally well at its size, I hope people pick up on it and fine-tune it for all kinds of specific tasks. (The lack of a dataset and of detail regarding the training could be a downside, though. These were not included in this initial release of the model.)

EDIT 2023-10-12: Paper released at: https://arxiv.org/abs/2310.06825 (But I’d say there is no new information in it; they mostly copied their announcement.)

As of now, it is clear they don’t want to publish any details about the training.

rufus@discuss.tchncs.de (OP) · 2 years ago
I just found this interview with Zuckerberg from a few days ago:

https://youtu.be/9aCg7jH4S1w?feature=shared&t=1238

Starting at 23:00 Zuckerberg talks about “open-sourcing” Llama.
