Introducing the Production Release of PyTorch for the IPU

Dec 09, 2020

Written By: Matt Fyles

We've just released Poplar SDK 1.4 and with it the first production version of PyTorch for the IPU. Known as PopTorch, our connecting library brings together the performance of the IPU-M2000 platform and the developer-ready accessibility of PyTorch.

This tightly coupled solution allows users to run standard PyTorch programs on the IPU by simply changing a couple of lines of code:

import torch
import poptorch
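
For example, running a standard model on the IPU is one extra call. The sketch below is illustrative (the model and tensor shapes are assumptions, not from the original post):

model = torch.nn.Sequential(
    torch.nn.Linear(128, 64),
    torch.nn.ReLU(),
    torch.nn.Linear(64, 10),
)

# Wrap the model for IPU execution; the call signature is unchanged.
inference_model = poptorch.inferenceModel(model)
output = inference_model(torch.randn(4, 128))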

In opening up Graphcore technology to PyTorch developers, we are bringing together the most advanced AI compute platform - designed around the needs of next-generation models - with the framework that has become synonymous with innovation in machine intelligence.

Support for the PyTorch framework was first made available in preview earlier in 2020. Since then it has been extensively refined and extended, based on developer feedback, making this release not just a team effort but a product of the wider PyTorch community.

Open source

Continuing the theme of community-centric product development, we are open-sourcing PyTorch for IPU, with the code available on GitHub. Contributions can be made as standard GitHub pull requests once our Contributor License Agreement (CLA) has been accepted.

As well as helping to refine and accelerate the evolution of PyTorch for IPU, open sourcing allows developers to dive deep into our code, building an understanding of how Graphcore's broader hardware and software offering works.

Our commitment to open source extends across the Poplar software stack, including PopLibs, PopTorch and our TensorFlow port.

Key features

PyTorch for IPU is a simple wrapper for PyTorch programs that lets them run models for both training and inference on the Graphcore IPU.

When using multiple IPUs, individual layers are wrapped in an IPU helper which designates the target IPU. The model is then parallelised via PopART.
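
As a sketch, assuming the poptorch.BeginBlock helper (the layer names and IPU assignments here are illustrative, and the exact annotation API may differ between SDK versions):

import torch
import poptorch

class TwoStageModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        # Annotate each stage with a target IPU; PopART then
        # parallelises the model across the devices.
        self.stage1 = poptorch.BeginBlock(torch.nn.Linear(128, 128), ipu_id=0)
        self.stage2 = poptorch.BeginBlock(torch.nn.Linear(128, 10), ipu_id=1)

    def forward(self, x):
        return self.stage2(torch.relu(self.stage1(x)))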

The PyTorch model is passed down through Graphcore's software stack to the Graph Compiler, which schedules workloads for execution on the available IPUs.

The PyTorch for IPU interface library supports popular features developers will be familiar with from other hardware platforms, with some additional capabilities:

  • Support for inference and training
  • Data and model parallel support, with model replication across up to 64 IPUs
  • Optimised host dataloader support (see the training sketch after this list)
  • FP32.32, FP16.32, FP16.16 precision with FLOAT32, FLOAT16, INT32, BOOL data types
  • Support for popular optimisers (SGD, RMSprop, AdamW, LAMB) and features for float16 models, such as loss scaling
  • Broad PyTorch loss function coverage with support for arbitrary loss functions
  • Multi-convolution support
  • Ability to implement custom optimised operators
  • PyTorch for IPU Docker containers
  • Full support within Graphcore's analysis tools
  • Examples and tutorials available on GitHub
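
As referenced in the list above, here is a training sketch tying several of these features together. The Options, DataLoader and trainingModel calls are PopTorch API; the model, dataset and hyperparameters are illustrative assumptions:

import torch
import poptorch

# For training, the forward pass also returns the loss, which
# PopTorch uses to drive the backward pass on the IPU.
class Classifier(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(128, 10)
        self.loss = torch.nn.CrossEntropyLoss()

    def forward(self, x, labels=None):
        out = self.fc(x)
        if labels is None:
            return out  # inference path
        return out, self.loss(out, labels)

opts = poptorch.Options()
# opts.replicationFactor(2)  # uncomment for data-parallel replication

# poptorch.DataLoader wraps torch.utils.data.DataLoader and handles
# the IPU's batching; this random dataset is purely illustrative.
dataset = torch.utils.data.TensorDataset(
    torch.randn(1024, 128), torch.randint(0, 10, (1024,)))
loader = poptorch.DataLoader(opts, dataset, batch_size=16, shuffle=True)

model = Classifier()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
training_model = poptorch.trainingModel(model, options=opts, optimizer=optimizer)

for x, labels in loader:
    out, loss = training_model(x, labels)  # one compiled training step per call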

Other SDK 1.4 features

In addition to the production release and open sourcing of PyTorch for IPU, our SDK 1.4 release provides many other features including:

  • Significant Poplar compiler optimisations to reduce compile time for faster development, plus kernel-level optimisations that take advantage of the MK2 IPU architecture, enabling larger batch sizes and greater model coverage
  • Optimised distributed deployment tools, including a distributed configuration library to improve scale-out of data-parallel applications across multiple IPU-POD systems
  • The new PopVision System Analyser to better understand and optimise large distributed systems
  • More detailed reporting features to understand and optimise models better in PopVision Graph Analyser V2.2
  • Additional ONNX operator coverage and model support

Preview features in 1.4

  • Updated sparsity kernel libraries using dynamic/reconfigurable sparsity patterns 
  • Support for CentOS 8 
  • Support for Ubuntu 20.04 

Getting started with PyTorch

PyTorch for IPU is designed to require minimal manual alterations to PyTorch models. The example below shows the kind of code changes (marked in comments) required to perform inference with a standard pre-trained BERT PyTorch model on the IPU.
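
This is a minimal sketch, assuming the Hugging Face transformers library and the bert-base-uncased checkpoint (both illustrative choices, not from the original post):

import poptorch
from transformers import BertModel, BertTokenizer  # assumed dependency

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
# return_dict=False keeps the outputs as a tuple, which traces cleanly.
model = BertModel.from_pretrained("bert-base-uncased", return_dict=False)
model.eval()

# IPU change: wrap the standard PyTorch model for IPU execution.
ipu_model = poptorch.inferenceModel(model)

inputs = tokenizer("PopTorch runs BERT on the IPU.", return_tensors="pt")
# IPU change: call the wrapped model; the arguments are unchanged.
sequence_output, pooled_output = ipu_model(
    inputs["input_ids"], inputs["attention_mask"])
print(sequence_output.shape)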

PyTorch Model Support and Performance

We are publishing new benchmarks for our IPU-M2000 system today too, including some PyTorch training and inference results. We also provide reference implementations for a range of models on GitHub. In most cases, the models require very few code changes to run on IPU systems. We will be regularly adding more PyTorch-based performance results to our website and as code examples on GitHub, so please keep checking in.

We also have a video overview of PyTorch for the IPU.

Other PyTorch for IPU resources

Graphcore Developer Portal