At Inflection, our mission is to create a personal AI for everyone. In May 2023, we released Pi, your personal AI, designed to be empathetic, useful, and safe (Pi press release).

We believe that pre-training is as important as finetuning when it comes to creating high-quality, safe, and useful AI experiences. That’s why we set out to develop our own state-of-the-art LLMs. As a vertically integrated AI studio, we do everything in-house for AI training and inference: from data ingestion, to model design, to high-performance infrastructure.

To offer our users superb quality and speed, we needed to develop a model that is both scalable in production and more capable than widely deployed LLMs such as GPT-3.5 and LLaMA. We are excited to share that we have now achieved this goal.

Today, we are announcing “Inflection-1”, our in-house LLM, which powers Pi and will soon be available via our conversational API.

Inflection-1 was trained using thousands of NVIDIA H100 GPUs on a very large dataset. Our team has been able to take advantage of our end-to-end pipeline to develop a number of proprietary technical advances that have enabled these results. This technical memo summarizes our evaluations and compares our performance against other LLMs.

The memo shows that Inflection-1 is the best model in its compute class, outperforming GPT-3.5, LLaMA, Chinchilla, and PaLM-540B on a wide range of benchmarks commonly used for comparing LLMs. We will also be releasing a technical memo detailing one of our models in the same compute class as PaLM-2 and GPT-4.

This is an achievement we are proud of, having started Inflection just over a year ago. We expect dramatic improvements in the coming months as we continue to scale and innovate to deliver on our mission to build the most capable and safe AI products, accessible to millions of users.

Summary Evaluation Results

We evaluated Inflection-1 on a wide range of benchmarks against models in the same compute class, defined as models trained using at most the FLOPs of PaLM-540B. A summary of the six most popular benchmarks follows. Further details are available in our technical memo.
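
To make the "compute class" definition concrete, here is a minimal sketch of how such a check could be expressed. It assumes the common rule of thumb that training compute is roughly 6 × parameters × training tokens; that approximation, the function names, and the small table of publicly reported model sizes are illustrative assumptions, not the method used in our technical memo.

```python
# Sketch only: estimates training compute with the rule of thumb
# FLOPs ~= 6 * parameters * training tokens. This approximation is an
# assumption of this example, not the memo's stated methodology.

def approx_training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer."""
    return 6.0 * n_params * n_tokens


# Publicly reported (parameters, training tokens) for a few models.
# The table name and its contents are illustrative.
PUBLIC_MODELS = {
    "PaLM-540B": (540e9, 780e9),
    "Chinchilla-70B": (70e9, 1.4e12),
    "LLaMA-65B": (65e9, 1.4e12),
}


def in_compute_class(name: str, reference: str = "PaLM-540B") -> bool:
    """True if `name` used at most the estimated FLOPs of `reference`."""
    budget = approx_training_flops(*PUBLIC_MODELS[reference])
    return approx_training_flops(*PUBLIC_MODELS[name]) <= budget


if __name__ == "__main__":
    for model in PUBLIC_MODELS:
        flops = approx_training_flops(*PUBLIC_MODELS[model])
        print(f"{model}: ~{flops:.2e} FLOPs, in class: {in_compute_class(model)}")
```

Under this estimate, the reference model itself sits exactly at the budget, so the comparison is inclusive ("at most").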

