GPT-4o ("o" for "omni") is OpenAI's latest AI model, supporting both text and image inputs with text outputs. It maintains the intelligence level of [GPT-4 Turbo](/models/openai/gpt-4-turbo) while being twice as fast and 50% more cost-effective. GPT-4o also offers improved performance in processing non-English languages and enhanced visual capabilities.
For benchmarking against other models, it was briefly called ["im-also-a-good-gpt2-chatbot"](https://twitter.com/LiamFedus/status/1790064963966370209).
#multimodal • 128000 context
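A minimal sketch of calling the model's text-plus-image interface with the official `openai` Python SDK (the prompt and image URL below are illustrative placeholders, not values from this listing):

```python
# Minimal sketch: send one text prompt plus one image to GPT-4o.
# Assumes the official `openai` Python SDK and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this chart in two sentences."},
                # Hypothetical image URL, for illustration only.
                {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```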
Gemini 1.5 Flash is a foundation model that performs well at a variety of multimodal tasks such as visual understanding, classification, summarization, and creating content from image, audio and video. It's adept at processing visual and text inputs such as photographs, documents, infographics, and screenshots.
Gemini 1.5 Flash is designed for high-volume, high-frequency tasks where cost and latency matter. On most common tasks, Flash achieves quality comparable to the larger Gemini Pro models at a significantly reduced cost. Flash is well-suited for applications like chat assistants and on-demand content generation where speed and scale matter.
#multimodal • 2800000 context
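A minimal sketch of a multimodal request via the `google-generativeai` Python SDK (the image path is illustrative; the SDK and key setup are assumptions about your environment):

```python
# Minimal sketch: multimodal prompt to Gemini 1.5 Flash via google-generativeai.
# Assumes GOOGLE_API_KEY is set; the screenshot path is a placeholder.
import os

import google.generativeai as genai
from PIL import Image

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
model = genai.GenerativeModel("gemini-1.5-flash")

response = model.generate_content(
    ["Summarize what this screenshot shows.", Image.open("screenshot.png")]
)
print(response.text)
```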
A generative model developed by OpenAI that produces 3D objects conditioned on text. It directly generates the parameters of implicit functions, which can be rendered as textured meshes or as neural radiance fields. • 2048 context
Qwen1.5 7B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:
- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes
For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5).
Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). • 32768 context
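A minimal sketch of chatting with the 7B chat checkpoint through Hugging Face `transformers` (assumes transformers >= 4.37, which added native Qwen1.5/Qwen2 support; the prompt is illustrative):

```python
# Minimal sketch: chat with Qwen1.5-7B-Chat via Hugging Face transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen1.5-7B-Chat"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "Give a short introduction to large language models."}]
# The tokenizer ships a chat template, so we don't hand-build the prompt format.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(text, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))
```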
Code Llama is a family of large language models for code. This one is based on [Llama 2 70B](/models/meta-llama/llama-2-70b-chat) and provides zero-shot instruction-following ability for programming tasks. • 2048 context
Code Llama is built upon Llama 2 and excels at filling in code, handling long input contexts, and following programming instructions zero-shot, without task-specific training. • 8192 context
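A rough sketch of zero-shot instruction prompting with a Code Llama Instruct checkpoint (the checkpoint name is an assumption, and the `[INST]` wrapping follows the Llama-2-style chat format the Instruct variants use):

```python
# Minimal sketch: zero-shot code instruction with a Code Llama Instruct checkpoint.
# The checkpoint name is an assumption; Instruct variants use Llama-2-style [INST] tags.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="codellama/CodeLlama-7b-Instruct-hf",
    device_map="auto",
)

prompt = "[INST] Write a Python function that checks whether a string is a palindrome. [/INST]"
print(generator(prompt, max_new_tokens=200)[0]["generated_text"])
```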
A blazing-fast vision-language model, FireLLaVA understands both text and images. It achieves impressive chat performance in evaluations and was designed to mimic multimodal GPT-4.
The first commercially permissive open source LLaVA model, trained entirely on open source LLM generated instruction following data. • 4096 context
LLaVA Yi 34B is an open-source model trained by fine-tuning an LLM on multimodal instruction-following data. It is an auto-regressive language model based on the transformer architecture. Base LLM: [NousResearch/Nous-Hermes-2-Yi-34B](/models/nousresearch/nous-hermes-yi-34b)
It was trained in December 2023. • 4096 context
Zephyr is a series of language models that are trained to act as helpful assistants. Zephyr-7B-β is the second model in the series, and is a fine-tuned version of [mistralai/Mistral-7B-v0.1](/models/mistralai/mistral-7b-instruct) that was trained on a mix of publicly available, synthetic datasets using Direct Preference Optimization (DPO). • 4096 context
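The DPO objective behind Zephyr-7B-β can be sketched in a few lines of PyTorch. This is an illustrative reduction of the standard DPO loss, not code from the Zephyr training recipe; `beta` and the per-sequence log-probabilities are the usual symbols from the DPO formulation:

```python
# Illustrative sketch of the DPO loss used to align models like Zephyr-7B-β.
# Each logp_* tensor holds summed log-probabilities of chosen/rejected responses
# under the policy being trained and a frozen reference model (shape: [batch]).
import torch.nn.functional as F

def dpo_loss(policy_logp_chosen, policy_logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # How much more the policy prefers each response than the reference does.
    chosen_ratio = policy_logp_chosen - ref_logp_chosen
    rejected_ratio = policy_logp_rejected - ref_logp_rejected
    # Maximize the margin between the two implicit rewards.
    return -F.logsigmoid(beta * (chosen_ratio - rejected_ratio)).mean()
```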
A merge with a complex family tree, this model was crafted for roleplaying and storytelling. Midnight Rose is a successor to Rogue Rose and Aurora Nights and improves upon them both. It tends to produce lengthy output by default and is sophosympatheia's best creative-writing merge to date.
Descending from earlier versions of Midnight Rose and [Wizard Tulu Dolphin 70B](https://huggingface.co/sophosympatheia/Wizard-Tulu-Dolphin-70B-v1.0), it inherits the best qualities of each. • 4096 context
OLMo 7B Instruct by the Allen Institute for AI is a model finetuned for question answering. It demonstrates **notable performance** across multiple benchmarks including TruthfulQA and ToxiGen.
**Open Source**: The model, code, checkpoints, and logs are released under the [Apache 2.0 license](https://choosealicense.com/licenses/apache-2.0).
- [Core repo (training, inference, fine-tuning etc.)](https://github.com/allenai/OLMo)
- [Evaluation code](https://github.com/allenai/OLMo-Eval)
- [Further fine-tuning code](https://github.com/allenai/open-instruct)
- [Paper](https://arxiv.org/abs/2402.00838)
- [Technical blog post](https://blog.allenai.org/olmo-open-language-model-87ccfc95f580)
- [W&B Logs](https://wandb.ai/ai2-llm/OLMo-7B/reports/OLMo-7B--Vmlldzo2NzQyMzk5) • 2048 context
Qwen1.5 32B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:
- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes
For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5).
Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). • 32768 context
Nous Hermes 2 Mixtral 8x7B SFT is the supervised-fine-tuning-only version of [the Nous Research model](/models/nousresearch/nous-hermes-2-mixtral-8x7b-dpo) trained over the [Mixtral 8x7B MoE LLM](/models/mistralai/mixtral-8x7b).
The model was trained on over 1,000,000 entries of primarily GPT-4-generated data, as well as other high-quality data from open datasets across the AI landscape, achieving state-of-the-art performance on a variety of tasks.
#moe • 32768 context
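Nous Hermes 2 models are prompted with ChatML; a minimal sketch of building the prompt via the tokenizer's chat template (the Hugging Face repo name is inferred from the model's title and should be verified):

```python
# Minimal sketch: ChatML-formatted prompt for Nous Hermes 2 Mixtral 8x7B SFT.
# The repo name is an assumption inferred from the model title.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("NousResearch/Nous-Hermes-2-Mixtral-8x7B-SFT")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Explain mixture-of-experts routing in one paragraph."},
]
# The tokenizer's built-in chat template emits ChatML (<|im_start|> ... <|im_end|>).
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```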
The latest GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Training data: up to Sep 2021. • 16385 context
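JSON mode is enabled through the `response_format` parameter; a minimal sketch with the official `openai` SDK (the prompt is illustrative, and note that JSON mode requires the word "JSON" to appear somewhere in the messages):

```python
# Minimal sketch: JSON mode with GPT-3.5 Turbo via the official `openai` SDK.
# JSON mode requires the conversation to mention JSON explicitly.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "Extract entities and reply in JSON."},
        {"role": "user", "content": "Ada Lovelace wrote the first program in 1843."},
    ],
)
print(response.choices[0].message.content)  # e.g. {"person": "Ada Lovelace", ...}
```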
Qwen1.5 14B is the beta version of Qwen2, a transformer-based decoder-only language model pretrained on a large amount of data. Compared with the previously released Qwen, the improvements include:
- Significant performance improvement in human preference for chat models
- Multilingual support of both base and chat models
- Stable support of 32K context length for models of all sizes
For more details, see this [blog post](https://qwenlm.github.io/blog/qwen1.5/) and [GitHub repo](https://github.com/QwenLM/Qwen1.5).
Usage of this model is subject to [Tongyi Qianwen LICENSE AGREEMENT](https://huggingface.co/Qwen/Qwen1.5-110B-Chat/blob/main/LICENSE). • 32768 context