
Cappy: Outperforming and boosting large multi-task language models with a small scorer

February 27, 2026
In a significant development for the field of artificial intelligence, researchers have introduced Cappy, a small scorer designed to enhance the performance and efficiency of multi-task large language models (LLMs). The approach relies on a scoring-based pre-training strategy that teaches the model to distinguish high-quality responses from low-quality ones, integrating contrastive information that sets it apart from traditional methods.

With just 360 million parameters, Cappy has delivered impressive results. Evaluated on eleven held-out language understanding classification tasks from PromptSource, it outperforms much larger models such as OPT-175B and OPT-IML-30B and matches the accuracy of the best existing multi-task LLMs, including T0-11B and OPT-IML-175B. The researchers credit this parameter efficiency to Cappy's unique pre-training methodology.

Cappy can also boost other models. On complex tasks from BIG-Bench, a set of manually curated tasks considered beyond the capability of many LLMs, attaching Cappy significantly improved the performance of FLAN-T5 models, and it consistently outperformed the strongest baseline, in which candidate outputs are selected by the LLM's own self-scoring (see the sketch below).

According to the researchers, Cappy improves the accuracy of LLMs in a markedly more parameter-efficient manner, and its potential extends beyond a single LLM: the team suggests it could be applied in other creative ways in the future. Cappy represents a step forward in making LLMs more efficient and effective, offering a promising path for future AI advancements. The researchers acknowledge valuable feedback from Bowen Tan, Jindong Chen, Lei Meng, Abhanshu Sharma, and Ewa Dominowska, as well as suggestions from Eric Xing and Zhiting Hu.
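To make the candidate-selection idea concrete, here is a minimal sketch of how a small scorer might be used to rerank outputs sampled from a generative LLM. It assumes the scorer is a RoBERTa-style sequence-classification model with a single regression head that maps an (instruction, candidate response) pair to a quality score; the checkpoint name, example instruction, and helper functions are illustrative placeholders, not the published system.

```python
# Sketch: rerank candidate responses with a small scorer.
# Assumption: the scorer is loaded as a sequence-classification model with one
# regression output; "scorer-checkpoint" is a placeholder model name.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer


def score_candidates(instruction, candidates, scorer, tokenizer):
    """Return one quality score per (instruction, candidate) pair."""
    inputs = tokenizer(
        [instruction] * len(candidates),  # pair the instruction with each candidate
        candidates,
        padding=True,
        truncation=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        # Single regression head -> shape (num_candidates, 1); squeeze to a vector.
        logits = scorer(**inputs).logits.squeeze(-1)
    return logits.tolist()


def select_best(instruction, candidates, scorer, tokenizer):
    """Pick the candidate the scorer rates highest."""
    scores = score_candidates(instruction, candidates, scorer, tokenizer)
    best = max(range(len(candidates)), key=lambda i: scores[i])
    return candidates[best], scores[best]


if __name__ == "__main__":
    tokenizer = AutoTokenizer.from_pretrained("scorer-checkpoint")
    scorer = AutoModelForSequenceClassification.from_pretrained(
        "scorer-checkpoint", num_labels=1
    )
    instruction = "Summarize: The committee approved the new budget after a long debate."
    candidates = [
        "The committee approved the new budget after a long debate.",
        "A budget was discussed at some point.",
        "The weather was pleasant all week.",
    ]
    answer, score = select_best(instruction, candidates, scorer, tokenizer)
    print(f"Selected: {answer!r} (score={score:.3f})")
```

In this setup the large generative model only proposes candidates while the small scorer judges them, which is the sample-selection role the article describes Cappy playing alongside FLAN-T5.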
