May 30, 2023
Flan-T5: sweet results with the smaller, more efficient LLM
Written By:
Harry Mellor
See also Flan-T5 XXL Fine Tuning (added July 2023)
In the world of AI language models, there's no one-size-fits-all solution. Commercial users are increasingly coming to the realisation that Ultra-Large Language Models, while broadly capable, are AI overkill for many applications.
The penny (or dollar) usually drops when they receive an outsize bill from the owners of their preferred proprietary model, or from their cloud compute provider. That's assuming they can even secure GPU availability for the A100 and H100 systems needed to run advanced models.
Instead, many are looking to more efficient, open-source alternatives to the likes of GPT-3/4.
In December 2022, Google published Scaling Instruction-Finetuned Language Models, in which they perform extensive fine-tuning for a broad collection of tasks across a variety of models (PaLM, T5, U-PaLM).
Part of this publication was the release of Flan-T5 checkpoints, "which achieve strong few-shot performance" with relatively modest parameter counts, "even compared to much larger models" like the largest members of the GPT family.
In this blog, we will show how you can use Flan-T5 running on a Paperspace Gradient Notebook, powered by 91ÊÓƵAPP IPUs. Flan-T5-Large can be run on an IPU-POD4, using Paperspace's six-hour free trial, while Flan-T5-XL can be run on a paid IPU-POD16.
We will look at a range of common NLP workloads and consider how well Flan-T5 handles each of them.
Let’s start by looking at some performance numbers from the Google-authored paper:
These results are astounding. Notice that:
This establishes Flan-T5 as an entirely different beast to the T5 that you may know. Now let's see how Flan-T5-Large and Flan-T5-XL compare to other models in the MMLU benchmark:
Noting that Flan-T5 had MMLU held out from training, this table shows that:
Since the Flan-T5 checkpoints are available on Hugging Face, you can use 91ÊÓƵAPP's Hugging Face integration (🤗 Optimum 91ÊÓƵAPP) to easily run Flan-T5 with a standard inference pipeline.
If you already have an existing Hugging Face-based application that you'd like to try on IPUs, then it is as simple as:
- from transformers import pipeline
+ from optimum.graphcore import pipeline
- text_generator = pipeline("text2text-generation", model="google/flan-t5-large")
+ text_generator = pipeline("text2text-generation", model="google/flan-t5-large", ipu_config="91ÊÓƵAPP/t5-large-ipu")
text_generator("Please solve the following equation: x^2 - 9 = 0")
[{'generated_text': '3'}]
Now let's define a text generator of our own to use in the rest of this notebook. First, make sure that your Python virtual environment has the latest version of `optimum-graphcore` installed:
%pip install "optimum-graphcore>=0.6.1, <0.7.0"
The location of the cache directories can be configured through environment variables or directly in the notebook:
import os

# Where to cache compiled IPU executables, and how many IPUs are available
executable_cache_dir = os.getenv("POPLAR_EXECUTABLE_CACHE_DIR", "./exe_cache/")
num_available_ipus = int(os.getenv("NUM_AVAILABLE_IPU", 4))
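The same configuration can instead be set from the shell before launching the notebook; for example (the values shown are just the defaults used above):

```shell
# Cache compiled IPU executables between runs and declare the IPU count
export POPLAR_EXECUTABLE_CACHE_DIR="./exe_cache/"
export NUM_AVAILABLE_IPU=4
```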
Next, let's import `pipeline` from `optimum.graphcore` and create our Flan-T5 pipeline for the appropriate number of IPUs:
from optimum.graphcore import pipeline

# Pick the checkpoint size based on the number of available IPUs
size = {4: "large", 16: "xl"}
flan_t5 = pipeline(
    "text2text-generation",
    model=f"google/flan-t5-{size[num_available_ipus]}",
    ipu_config=f"91ÊÓƵAPP/t5-{size[num_available_ipus]}-ipu",
    max_input_length=896,
)
flan_t5.model.ipu_config.executable_cache_dir = executable_cache_dir
Now, let's ask it some random questions:
questions = [
"Solve the following equation for x: x^2 - 9 = 0",
"At what temperature does nitrogen freeze?",
"In order to reduce symptoms of asthma such as tightness in the chest, wheezing, and difficulty breathing, what do you recommend?",
"Which country is home to the tallest mountain in the world?"
]
for out in flan_t5(questions):
    print(out)
Graph compilation: 100%|██████████| 100/100 [05:20<00:00]
Graph compilation: 100%|██████████| 100/100 [02:56<00:00]
{'generated_text': '3'}
{'generated_text': '-32 °C'}
{'generated_text': 'ibuprofen'}
{'generated_text': 'nepal'}
Note that some of these answers may be wrong; information retrieval from the model itself is not the purpose of Flan-T5. However, if you use Flan-T5-XL they are less likely to be wrong (come back to this notebook with an IPU-POD16 to see the difference!).
Flan-T5 has been fine-tuned on thousands of different tasks across hundreds of datasets. So no matter what your task might be, it's worth seeing if Flan-T5 can meet your requirements. Here we will demonstrate a few of the common ones:
sentiment_analysis = (
"Review: It gets too hot, the battery only can last 4 hours. Sentiment: Negative\n"
"Review: Nice looking phone. Sentiment: Positive\n"
"Review: Sometimes it freezes and you have to close all the open pages and then reopen where you were. Sentiment: Negative\n"
"Review: Wasn't that impressed, went back to my old phone. Sentiment:"
)
flan_t5(sentiment_analysis)[0]["generated_text"]
Negative
The following snippets are adapted from the Wikipedia pages corresponding to each mentioned company.
advanced_ner = """Microsoft Corporation is a company that makes computer software and video games. Bill Gates and Paul Allen founded the company in 1975
[Company]: Microsoft, [Founded]: 1975, [Founders]: Bill Gates, Paul Allen
Amazon.com, Inc., known as Amazon , is an American online business and cloud computing company. It was founded on July 5, 1994 by Jeff Bezos
[Company]: Amazon, [Founded]: 1994, [Founders]: Jeff Bezos
Apple Inc. is a multinational company that makes personal computers, mobile devices, and software. Apple was started in 1976 by Steve Jobs and Steve Wozniak."""
flan_t5(advanced_ner)[0]["generated_text"]
[Company]: Apple, [Founded]: 1976, [Founders]: Steve Jobs, Steve Wozniak
The following snippet came from the SQuAD dataset.
context = 'Super Bowl 50 was an American football game to determine the champion of the National Football League (NFL) for the 2015 season. The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24-10 to earn their third Super Bowl title. The game was played on February 7, 2016, at Levi\'s Stadium in the San Francisco Bay Area at Santa Clara, California. As this was the 50th Super Bowl, the league emphasized the "golden anniversary" with various gold-themed initiatives, as well as temporarily suspending the tradition of naming each Super Bowl game with Roman numerals (under which the game would have been known as "Super Bowl L"), so that the logo could prominently feature the Arabic numerals 50.'
question = "Which NFL team represented the AFC at Super Bowl 50?"
# The correct answer is Denver Broncos
flan_t5(f"{context} {question}")[0]['generated_text']
Denver Broncos
intent_classification = """[Text]: I really need to get a gym membership, I'm exhausted.
[Intent]: get gym membership
[Text]: What do I need to make a carbonara?
[Intent]: cook carbonara
[Text]: I need all these documents sorted and filed by Monday.
[Intent]:"""
flan_t5([intent_classification])[0]["generated_text"]
file documents
The following snippets came from the XSum dataset.
summarization="""
Document: Firstsource Solutions said new staff will be based at its Cardiff Bay site which already employs about 800 people.
The 300 new jobs include sales and customer service roles working in both inbound and outbound departments.
The company's sales vice president Kathryn Chivers said: "Firstsource Solutions is delighted to be able to continue to bring new employment to Cardiff."
Summary: Hundreds of new jobs have been announced for a Cardiff call centre.
Document: The visitors raced into a three-goal first-half lead at Hampden.
Weatherson opened the scoring with an unstoppable 15th-minute free-kick, and he made it 2-0 in the 27th minute.
Matt Flynn made it 3-0 six minutes later with a fine finish.
Queen's pulled a consolation goal back in stoppage time through John Carter.
Summary: Peter Weatherson netted a brace as Annan recorded only their second win in eight matches.
Document: Officers searched properties in the Waterfront Park and Colonsay View areas of the city on Wednesday.
Detectives said three firearms, ammunition and a five-figure sum of money were recovered.
A 26-year-old man who was arrested and charged appeared at Edinburgh Sheriff Court on Thursday.
Summary:
"""
flan_t5(summarization)[0]["generated_text"]
A man has been arrested after a firearm was found in a property in Edinburgh.
text_classification_1 = """A return ticket is better value than a single.
topic: travel cost
You can start from the basic stitches, and go from there.
topic: learning knitting
The desk which I bought yesterday is very big.
topic: furniture size
George Washington was president of the United States from 1789 to 1797.
topic:"""
flan_t5(text_classification_1)[0]["generated_text"]
George Washington presidency
text_classification_2 = """FLAN-T5 was released in the paper Scaling Instruction-Finetuned Language Models - it is an enhanced version of T5 that has been finetuned in a mixture of tasks.
keywords: released, enhanced, finetuned
The IPU, or Intelligence Processing Unit, is a highly flexible, easy-to-use parallel processor designed from the ground up for AI workloads.
keywords: processor, AI
Paperspace is the platform for AI developers, providing the speed and scale needed to take AI models from concept to production.
keywords:"""
flan_t5(text_classification_2)[0]["generated_text"]
paperspace, AI, scale
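All of the few-shot prompts above share the same shape: a handful of labelled examples followed by an unlabelled query for the model to complete. As a rough sketch (the helper below is illustrative and not part of the original notebook), you could assemble such prompts programmatically:

```python
def build_few_shot_prompt(examples, query, label_key):
    """Assemble a few-shot prompt from (text, label) pairs plus an unlabelled query."""
    lines = [f"{text}\n{label_key}: {label}" for text, label in examples]
    # The final line ends with a bare label key for the model to complete
    lines.append(f"{query}\n{label_key}:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    [
        ("A return ticket is better value than a single.", "travel cost"),
        ("You can start from the basic stitches, and go from there.", "learning knitting"),
    ],
    "George Washington was president of the United States from 1789 to 1797.",
    label_key="topic",
)
print(prompt)
```

The resulting string matches the format of `text_classification_1` above and can be passed straight to `flan_t5`.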
As we saw earlier when looking at the results from the paper, Flan-T5-XL is roughly 40% better (on average) than Flan-T5-Large across its validation tasks. Therefore, when deciding if Flan-T5-XL is worth the cost for you, ask yourself the following questions:
To demonstrate, let us now look at an example of a task where the answer to all of the above questions is yes. Let's say you have a customer service AI that you use to answer basic questions in order to reduce the workload of your customer service personnel. This needs:
Looking at the code below, we see some context about 91ÊÓƵAPP provided in the input, as well as a primer for a conversational response from the model. As you can see from the example, Flan-T5-XL was able to understand the information provided in the context and provide useful and natural answers to the questions it was asked.
from IPython.display import clear_output

class ChatBot:
    def __init__(self, model, context) -> None:
        self.model = model
        self.initial_context = context
        self.context = self.initial_context
        # Extract the two speaker names from the last two lines of the context
        self.user, self.persona = [x.split(":")[0] for x in context.split("\n")[-2:]]

    def ask(self, question):
        # Make sure the question ends with punctuation
        question += "." if question[-1] not in [".", "?", "!"] else ""
        x = f"{self.context}\n{self.user}: {question}\n{self.persona}: "
        y = self.model(x)
        response = y[0]["generated_text"]
        # Fold the completed turn back into the context for the next question
        self.context = f"{x}{response}"
        clear_output()
        print(self.context)
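If you don't have an IPU to hand, the conversation mechanics can be illustrated with a stand-in for the model (the stub below simply returns a canned reply and is not the real pipeline; the context string is an example of our own):

```python
def stub_model(prompt):
    # Stand-in for the Flan-T5 pipeline: always returns the same reply
    return [{"generated_text": "We make the IPU, a processor designed for AI."}]

context = (
    "This is a conversation between a customer and a support agent.\n"
    "Customer: Hi, who are you?\n"
    "Agent: Hello, I am a support agent for 91ÊÓƵAPP."
)

# Mirror the prompt construction used in ChatBot.ask
user, persona = [x.split(":")[0] for x in context.split("\n")[-2:]]
question = "What does 91ÊÓƵAPP make"
question += "." if question[-1] not in [".", "?", "!"] else ""
x = f"{context}\n{user}: {question}\n{persona}: "
response = stub_model(x)[0]["generated_text"]

# The completed turn is folded back into the context for the next question
context = f"{x}{response}"
print(context.split("\n")[-1])
```

Swapping `stub_model` for the real `flan_t5` pipeline gives you the behaviour of the `ChatBot` class above: each answer is appended to the running context, so the model always sees the whole conversation so far.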