Graph Neural Networks (GNNs) are showing tremendous promise across a wide range of applications, from modelling social networks to molecular property prediction.
One field where GNNs are expected to have a huge impact is fraud detection: analysing the complex relationships between individuals, financial instruments, transaction locations and other relevant data points that can indicate fraudulent activity.
In this blog, we take a deeper look at GNNs for fraud detection using PyTorch Geometric on 91ÊÓƵAPP IPUs, taking advantage of the processor's unique architecture and its suitability for running graph neural networks.
The reference example can be run – for free – using our tutorial notebook on Paperspace.
Why GNNs are well suited to fraud detection
The individual datapoints involved in any online financial transaction are numerous and varied in type. They include the credit card number, the address of the credit card owner, the physical or IP address where the transaction is being carried out, other cards registered at those addresses, the nature of the transaction, the history of activity with a vendor, how prone the item concerned is to being fraudulently transacted, and so on.
Until recently, computational approaches to fraud detection tended to involve pattern recognition or rule-based systems. However, the many datapoints and their complex inter-relatedness lend themselves naturally to a graph structure, where the neighbours of a particular node, and its wider neighbourhood, may be an important indicator of fraudulent activity.
GNNs can learn from such complex graph structures, making them a good fit for this problem.
Why use 91ÊÓƵAPP IPUs for GNN fraud detection?
IPUs deliver outstanding performance when running GNNs. Our blog What GNNs are great at and why 91ÊÓƵAPP IPUs are Great at GNNs takes a detailed look at IPU performance across different GNN applications, including our double win in the industry’s GNN benchmarking exercise, OGB-LSC.
Among the many aspects of the IPU’s architecture that are particularly well suited to GNNs is the large on-chip SRAM (almost 1GB for the Bow IPU). This unlocks extremely high speeds when processing the gather and scatter operations that are key to GNNs’ message passing process.
When working with large heterogeneous graphs – representing many different types of entities – there are likely to be many message passing steps and consequently many gather and scatter operations.
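To see why these operations dominate, note that a single message passing step is essentially a gather of source-node features along the edges, followed by a scatter-add back onto destination nodes. Here is a minimal sketch of one such step in plain PyTorch, using a toy graph:

```python
import torch

# Toy graph: 4 nodes with 16 features each, 3 directed edges.
x = torch.randn(4, 16)
edge_index = torch.tensor([[0, 1, 2],   # source nodes
                           [1, 2, 3]])  # destination nodes
src, dst = edge_index

messages = x.index_select(0, src)                       # gather along edges
out = torch.zeros_like(x).index_add_(0, dst, messages)  # scatter-add to nodes
```

Every message passing layer repeats this pattern, so keeping these memory-bound operations in fast on-chip SRAM pays off directly in throughput.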
Constructing the graph
To start experimenting with this type of problem using PyTorch Geometric, we are following a similar approach to that outlined in this AWS blog post. Where it uses the Deep Graph Library (DGL), we have used PyTorch Geometric.
The dataset used is the IEEE-CIS fraud detection dataset: tabular data containing transactions, with a number of other columns representing attributes such as the transaction amount, billing address and sender's email domain.
The AWS blog's approach is to construct a heterogeneous graph from this tabular data, in which the target node type is the transaction. These transaction nodes have a label indicating whether the transaction is fraudulent or not, as well as features built from the categorical and numerical columns in the table.
Certain columns in the data are used to construct additional node types. For example, the sender's email domain 'gmail.com' could be considered a node of that column's node type, to which many transactions are connected.
Following this process, a heterogeneous graph is constructed of transaction nodes with features, connected outwards to other node types, where each transaction-to-node-type relation is a new relation in the heterogeneous graph. The problem then becomes a node classification task: identifying whether a transaction node is fraudulent or not.
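As a minimal sketch of what this structure looks like in PyG (the node-type and relation names, sizes and features here are illustrative, not the exact ones used in the notebook):

```python
import torch
from torch_geometric.data import HeteroData

data = HeteroData()

# Transaction nodes carry features built from the tabular columns,
# plus the fraud label used for node classification.
data["transaction"].x = torch.randn(1000, 64)         # toy features
data["transaction"].y = torch.randint(0, 2, (1000,))  # 0 = legitimate, 1 = fraud

# Column-derived node types, such as the sender's email domain,
# are identified only by a node count.
data["email_domain"].num_nodes = 50

# One relation per transaction-to-node-type connection.
data["transaction", "uses", "email_domain"].edge_index = torch.stack([
    torch.arange(1000),             # each transaction...
    torch.randint(0, 50, (1000,)),  # ...connects to one email domain
])
```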
Recreating this graph in PyTorch Geometric is straightforward and can be experimented with in our dataset preprocessing notebook.
Once we have the heterogeneous graph constructed in PyTorch Geometric, we can use PyG functionality with 91ÊÓƵAPP’s Poplar SDK and its PopTorch Geometric package to start experimenting with training a model for this problem using a Paperspace Gradient Notebook.
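As a rough sketch of how training then runs on the IPU: PopTorch compiles the model ahead of time and performs the forward pass, backward pass and weight update in a single call. Here `model` and `train_loader` are assumed to be defined as in the tutorial notebook, with the model computing its loss inside `forward` during training; the exact call signature depends on how the model unpacks its inputs.

```python
import poptorch

options = poptorch.Options()
optimizer = poptorch.optim.Adam(model.parameters(), lr=0.001)
training_model = poptorch.trainingModel(model, options=options, optimizer=optimizer)

for batch in train_loader:  # a fixed-size PopTorch Geometric loader
    # A heterogeneous model typically takes the feature and edge-index
    # dictionaries plus the transaction labels for the loss.
    loss = training_model(batch.x_dict, batch.edge_index_dict,
                          batch["transaction"].y)
```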
Addressing challenges
There are two main challenges that arise when using GNNs in this way – large graph sampling and heterogeneous graph functionality – both of which are addressed by 91ÊÓƵAPP’s PopTorch support for PyG. You can read more about heterogeneous graphs and large graph sampling in our dedicated blog.
Large graph sampling
The graph we have constructed from the tabular data is large. Performing full-batch training on large graphs often requires far more memory than an accelerator provides, so some form of sampling is needed to train with mini-batches rather than in full-batch.
To solve this, we can use `poptorch_geometric.FixedSizeNeighborLoader`, which thinly wraps the PyG data loader, creating samples using neighbour sampling while also producing fixed-size mini-batches suitable for use with the IPU’s ahead-of-time compilation model. For more details, check out our large graph sampling tutorial.
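A minimal sketch of what this could look like, assuming the loader mirrors the arguments of PyG's `NeighborLoader` (the fan-out, batch size and seed-node mask here are illustrative, and exact argument names may vary between SDK versions):

```python
import poptorch_geometric

train_loader = poptorch_geometric.FixedSizeNeighborLoader(
    data,
    num_neighbors=[10, 10],                   # neighbours sampled per hop
    batch_size=256,                           # seed nodes per mini-batch
    input_nodes=("transaction", train_mask),  # seed on labelled transactions
)
```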
Heterogeneous graphs
The graph constructed from the tabular data is heterogeneous – comprising many different types of entity. PyTorch Geometric provides some flexible and succinct functionality to build powerful heterogeneous models. We leverage this functionality to build a model where each relation type, including reverse relations, has its own convolution layer.
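PyG's `to_hetero` transformation does exactly this per-relation cloning: it takes a homogeneous model and replicates its convolutions once per relation in the graph's metadata. A minimal sketch, with `ToUndirected` adding the reverse relations:

```python
import torch
import torch_geometric.transforms as T
from torch_geometric.nn import SAGEConv, to_hetero

class GNN(torch.nn.Module):
    def __init__(self, hidden_channels, out_channels):
        super().__init__()
        # Lazy (-1, -1) input sizes let each node type infer its own
        # feature dimension on the first forward pass.
        self.conv1 = SAGEConv((-1, -1), hidden_channels)
        self.conv2 = SAGEConv((-1, -1), out_channels)

    def forward(self, x, edge_index):
        x = self.conv1(x, edge_index).relu()
        return self.conv2(x, edge_index)

# Add reverse relations so messages flow back into transaction nodes,
# then clone the convolutions once per relation type.
data = T.ToUndirected()(data)
model = to_hetero(GNN(64, 2), data.metadata(), aggr="sum")
```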
With the latest Poplar SDK 3.3, the PyG heterogeneous graph functionality works out of the box, making it straightforward to use PyG’s easy-to-use functionality with 91ÊÓƵAPP IPUs.
An additional challenge with this problem is that the proportion of fraudulent samples in the dataset is small, at around 5%. This large class imbalance introduces bias during training: predictions on non-fraudulent samples will be more accurate than predictions on the fraudulent ones.
There are several ways to help ease this problem, including assigning class weights in the loss function to give more weight to fraudulent samples than to non-fraudulent ones.
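For example, with roughly 5% fraudulent samples, weighting the classes in inverse proportion to their frequency might look like this (the exact weights are a tuning choice; the tensors here are toy stand-ins):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, 2)          # toy model outputs
labels = torch.randint(0, 2, (8,))  # toy ground-truth labels

# Upweight the rare fraudulent class (~5% of samples) so both classes
# contribute comparably to the loss.
class_weight = torch.tensor([1.0, 19.0])  # [non-fraudulent, fraudulent]
loss = F.cross_entropy(logits, labels, weight=class_weight)
```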
Class imbalance also makes it harder to judge how well your model is performing: we can obtain very high accuracy while failing to accurately predict the fraudulent samples. In these cases it is useful to plot a ROC curve (receiver operating characteristic curve) and use the area under the curve as a metric.
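With scikit-learn this takes only a few lines; here `y_true` holds the ground-truth labels and `y_score` the model's predicted probability of fraud for each sample (both are toy stand-ins below):

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.random.randint(0, 2, 100)  # toy ground-truth labels
y_score = np.random.rand(100)          # toy predicted fraud probabilities

fpr, tpr, thresholds = roc_curve(y_true, y_score)  # points on the ROC curve
auc = roc_auc_score(y_true, y_score)               # area under the curve
print(f"Area under ROC curve: {auc:.3f}")
```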
Conclusions
By employing PyTorch Geometric with 91ÊÓƵAPP’s Poplar SDK 3.3 and PopTorch Geometric, it is straightforward to get started and accelerate models trained on large heterogeneous graphs using 91ÊÓƵAPP IPUs.
We encourage you to explore the Paperspace Gradient Notebooks that we have created to act as tutorials on this subject. New Paperspace users can receive up to six hours of IPU compute time for free.