[Submitted on 30 Mar 2023]
Abstract: Solving complicated AI tasks with different domains and modalities is a key
step toward artificial general intelligence (AGI). While there are abundant AI
models available for different domains and modalities, they cannot handle
complicated AI tasks on their own. Since large language models (LLMs) have exhibited
exceptional ability in language understanding, generation, interaction, and
reasoning, we propose that LLMs could act as a controller that manages existing
AI models to solve complicated AI tasks, with language serving as a generic
interface to enable this. Based on this philosophy, we present HuggingGPT, a
system that leverages LLMs (e.g., ChatGPT) to connect various AI models in
machine learning communities (e.g., HuggingFace) to solve AI tasks.
Specifically, upon receiving a user request, we use ChatGPT to plan tasks,
select models according to their function descriptions available in
HuggingFace, execute each subtask with the selected AI model, and summarize the
response based on the execution results. By leveraging the strong language
capability of ChatGPT and the abundant AI models in HuggingFace, HuggingGPT can
cover numerous sophisticated AI tasks in different modalities and domains
and achieves impressive results in language, vision, speech, and other
challenging tasks, paving a new way towards AGI.
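The four stages the abstract names (task planning, model selection, task execution, and response generation) can be illustrated with a minimal Python sketch. It is not HuggingGPT's implementation and calls neither ChatGPT nor any HuggingFace API. The `Task` and `ModelCard` types, the `@<id>` dependency syntax, the toy model registry, and the keyword and word-overlap rules that stand in for the LLM are all hypothetical.

```python
# Hypothetical sketch: every type, model name, and data format below is an
# illustrative stub, not HuggingGPT's code. Rule-based stubs replace the LLM.
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Task:
    task_id: int
    task_type: str                  # e.g. "image-to-text"
    args: Dict[str, str]            # value "@<id>" = output of task <id> (hypothetical syntax)
    deps: List[int] = field(default_factory=list)


@dataclass
class ModelCard:
    name: str                       # hypothetical model name
    task_type: str
    description: str                # function description consulted during selection
    run: Callable[[Dict[str, str]], str]


# Toy registry standing in for the models hosted on HuggingFace.
REGISTRY = [
    ModelCard("toy-captioner", "image-to-text", "generate a caption for an image",
              lambda a: "a dog running on grass"),
    ModelCard("toy-ocr", "image-to-text", "extract printed text from a scanned document",
              lambda a: "no text found"),
    ModelCard("toy-tts", "text-to-speech", "read text aloud as audio",
              lambda a: f"speech.wav (says: {a['text']})"),
]


def plan_tasks(request: str) -> List[Task]:
    """Stage 1, task planning. An LLM would do this; the stub matches keywords."""
    text = request.lower()
    tasks: List[Task] = []
    if "caption" in text:
        tasks.append(Task(0, "image-to-text", {"image": "photo.jpg"}))
        if "read" in text:
            tasks.append(Task(1, "text-to-speech", {"text": "@0"}, deps=[0]))
    return tasks


def select_model(task: Task, request: str, registry: List[ModelCard]) -> ModelCard:
    """Stage 2, model selection: filter by task type, then prefer the description
    sharing the most words with the request (a stand-in for the LLM's choice)."""
    candidates = [m for m in registry if m.task_type == task.task_type]
    if not candidates:
        raise LookupError(f"no model registered for {task.task_type}")
    words = set(request.lower().split())
    return max(candidates, key=lambda m: len(words & set(m.description.lower().split())))


def execute(tasks: List[Task], request: str,
            registry: List[ModelCard]) -> Dict[int, Tuple[str, str]]:
    """Stage 3, task execution: run tasks in planned order (dependencies first),
    substituting "@<id>" arguments with earlier outputs."""
    results: Dict[int, Tuple[str, str]] = {}
    for task in tasks:
        if any(d not in results for d in task.deps):
            raise ValueError(f"task {task.task_id} depends on an unexecuted task")
        args = {k: results[int(v[1:])][1] if v.startswith("@") else v
                for k, v in task.args.items()}
        model = select_model(task, request, registry)
        results[task.task_id] = (model.name, model.run(args))
    return results


def respond(request: str, results: Dict[int, Tuple[str, str]]) -> str:
    """Stage 4, response generation. An LLM would summarize; the stub lists results."""
    lines = [f"Request: {request}"]
    lines += [f"task {i} ({name}) -> {out}" for i, (name, out) in sorted(results.items())]
    return "\n".join(lines)


if __name__ == "__main__":
    req = "caption this photo and read it aloud"
    print(respond(req, execute(plan_tasks(req), req, REGISTRY)))
```

In the system the abstract describes, the keyword rules in `plan_tasks`, the word-overlap ranking in `select_model`, and the plain listing in `respond` would each be handled by ChatGPT. The stubs only show how the stages hand data to one another, including how one subtask's output feeds a dependent subtask.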
Submission history
From: Yongliang Shen
[v1] Thu, 30 Mar 2023 17:48:28 UTC (2,931 KB)