- cross-posted to:
- programming
Yesterday Mistral AI released a new language model called Mistral 7B. @[email protected] already posted about the sliding window attention part here in LocalLLaMA yesterday. But I think the model and the company behind it are even more noteworthy, and the release of the model deserves its own post.
Mistral 7B is not based on Llama. And they claim it outperforms Llama2 13B on all benchmarks (at its size of 7B). It has additional coding abilities and an 8k sequence length. And it’s released under the Apache 2.0 license. So truly an ‘open’ model, usable without restrictions. [Edit: Unfortunately I couldn’t find the dataset or a paper. They call it ‘open-weight’. So my conclusion regarding the openness might be a bit premature. We’ll see.]
(It uses Grouped-query attention and Sliding Window Attention.)
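For anyone wondering what the sliding window part means in practice, here is a minimal sketch of the attention mask, assuming a causal model where each token may only look at itself and the few tokens right before it. The function name `sliding_window_mask` and the tiny window size are my own picks for illustration; this is not taken from Mistral's code.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Build a causal sliding-window attention mask.

    mask[i][j] is True when query position i may attend to key position j,
    i.e. j is not in the future (j <= i) and lies within the last
    `window` positions (i - j < window).
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    # Illustrative sizes only; real models use much larger windows.
    for row in sliding_window_mask(seq_len=6, window=3):
        print("".join("x" if allowed else "." for allowed in row))
```

With plain causal attention every row would be filled up to the diagonal. Here each row holds at most `window` allowed positions, so the per-token attention cost stays bounded as the sequence grows.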
Also worth noting: Mistral AI (the company) is based in Paris. They are one of the few big European AI startups and raised $113 million in funding in June.
- Details are on Mistral AI’s Announcement
- techcrunch news article including information about the company
- They released a base/foundation model and an instruction-tuned one on HuggingFace
- And llama.cpp is already compatible, and GGUF versions are out there.
I’ve tried it and it indeed looks promising. It certainly has features that distinguish it from Llama. And I like the competition. Our world is currently completely dominated by Meta. And if it performs exceptionally well at its size, I hope people pick up on it and fine-tune it for all kinds of specific tasks. (The lack of a dataset and of detail regarding the training could be a downside, though. These were not included in this initial release of the model.)
EDIT 2023-10-12: Paper released at: https://arxiv.org/abs/2310.06825 (But I’d say no new information in it, they mostly copied their announcement)
As of now, it is clear they don’t want to publish any details about the training.
To be honest, the same could be said of LLaMa/Facebook (which doesn’t particularly claim to be “open”, but I don’t see many people criticising Facebook for doing a potential future marketing “bait and switch” with their LLMs).
They’re only giving these away for free because they aren’t commercially viable. If anyone actually develops a leading-edge LLM, I doubt they will be giving it away for free regardless of their prior “ethics”.
And the chance of a leading-edge LLM being developed by someone other than a company with prior plans to market it commercially is quite small, as they wouldn’t attract the same funding to cover the development costs.
I think the criticism of Meta (LLaMa licensing) has just dialed down a bit. In the days of LLaMA 1 I read quite a few “f*** Meta” comments, and people had zero respect for their licensing. They even spent quite some money to train an open version with the RedPajama dataset and wanted to break free from Meta.
Meta also uses the words “open”, “open science” and even “open source” for their models. But I think they mean yet another thing with that. And in reality they have stopped providing the exact sources starting with their paper on Llama2(?!)
I still hate that nowadays everyone invents their own license. I mean once your dependencies all have distinct and incompatible licensing, you can’t incorporate anything into your project any more. The free software world works by incremental improvements and combining stuff. This is very difficult without proper free licenses. And furthermore, no one likes their “Acceptable Use Policy”.
I didn’t mean “bait and switch”. I think I didn’t find the right words. I mean we won’t ever build real scientific advancements upon this, because that process is a trade secret. The big companies and AI startups will do the science behind closed doors and decide for us in which direction AI develops.
And the “commercially viable” is exactly the point. For now, they can still afford to give things away for free. A Llama2 is still far away from being a viable product in itself. But once smartphones/computers/edge-devices have 12GB of fast(er) memory and AI accelerators, AI gets more intelligent, hallucinates less and gets adapters for specific tasks and multimodal capabilities, you have a viable product you can tie into your ecosystem and sell millions of times. And that’s where I expect their gifts to stop. I will still have my chatbot / AI companion. But not the smart assistant that organizes my everyday life, translates between arbitrary languages on the fly and helps me with whatever I take a picture of, or record with my phone.
I think that’s my main point. And for me it has already started a long time ago. I own a de-googled smartphone. I struggle with simple things like having a TTS that gives me directions while driving (in my native language). Because TTS is part of the proprietary Google services. The camera is significantly worse without all the enhancement that is clever trickery and machine learning. Again, part of the proprietary parts and a trade secret. I expect other parts of machine learning to become worse, too.
I just found that interview with Zuckerberg from a few days ago:
https://youtu.be/9aCg7jH4S1w?feature=shared&t=1238
Starting at 23:00 Zuckerberg talks about “open-sourcing” Llama.