
[Figure]
- Graph-level prediction: predict the class or properties of an entire graph or subgraph
 

[Figure]
How: Workflow

[Figure]
Taking fraud detection as an example: the software stack
- The TabFormer dataset
 
[Figure]
- Workflow
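
As a concrete (hypothetical) illustration of this workflow's data plane, the sketch below turns TabFormer-style transaction rows into a heterogeneous graph, assuming PyTorch Geometric's HeteroData; the column names and features are made up for illustration, not TabFormer's exact schema:

```python
# Build a card/merchant transaction graph from tabular rows.
# Assumption: PyTorch Geometric is the GNN library; TabFormer's real
# preprocessing differs -- this only shows the tabular -> graph step.
import torch
from torch_geometric.data import HeteroData

# Toy transaction table: (card_id, merchant_id, amount).
card_id     = torch.tensor([0, 0, 1, 2])
merchant_id = torch.tensor([0, 1, 1, 2])
amount      = torch.tensor([[12.5], [3.0], [99.9], [7.2]])

data = HeteroData()
data['card'].num_nodes = 3
data['merchant'].num_nodes = 3
# One edge per transaction; the amount becomes an edge feature, and
# fraud detection becomes edge (or node) classification on this graph.
data['card', 'pays', 'merchant'].edge_index = torch.stack([card_id, merchant_id])
data['card', 'pays', 'merchant'].edge_attr = amount
```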
 
[Figure]

[Figure]
- Compute plane
 

[Figure]
- Data plane
 

[Figure]
SW Challenges
Graph Sampler
For many small-graph datasets, full-batch training works most of the time; full-batch training means we train on the whole graph at once. When it comes to a single large graph, real scenarios often run into the neighbor explosion problem: a node's multi-hop neighborhood grows exponentially with the number of GNN layers. The graph sampler comes to the rescue: sample only a fraction of the target nodes and, for each target node, a subgraph of its ego-network to train on. This is called mini-batch training. Graph sampling is triggered on every data load, and the number of hops in the sampled subgraph equals the number of GNN layers, which makes the graph sampler in the data loader a critical piece of GNN training.
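
PyTorch Geometric's NeighborLoader exposes exactly this pattern; a minimal sketch (the dataset and fan-outs below are arbitrary choices, assuming PyG is available):

```python
# Mini-batch training via neighbor sampling (PyTorch Geometric).
# num_neighbors has one fan-out entry per hop, so a 2-layer GNN samples
# a 2-hop ego-network around each seed node; sampling runs on every
# data-loading step, which puts the sampler on the training critical path.
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.loader import NeighborLoader

data = Planetoid(root='/tmp/Cora', name='Cora')[0]

loader = NeighborLoader(
    data,
    num_neighbors=[10, 5],        # fan-out for hop 1 and hop 2
    batch_size=256,               # number of target ("seed") nodes per batch
    input_nodes=data.train_mask,  # only sample around training nodes
    shuffle=True,
)

for batch in loader:
    # batch is a sampled subgraph; its first batch.batch_size nodes
    # are the seeds whose loss we would compute in a training step.
    pass
```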
[Figure]

[Figure]
Challenge: How to optimize the sampler, both standalone and in the training pipeline?
When the graph gets huge (billions of nodes, tens of billions of edges), we face new at-scale challenges:
- How to store the huge graph across nodes? -> graph partitioning
- How to build a training system with not only distributed model computation, but also a distributed graph store and distributed sampling?
- How to cut the graph while minimizing cross-partition connections? (see the sketch after this list)
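
A toy sketch of why the cut matters: a naive hash partition of a random graph, and the fraction of edges it cuts. Production systems wrap METIS-style partitioners (e.g. DGL's partition_graph or PyG's ClusterData) precisely to shrink this number:

```python
# Count cross-partition edges under a naive hash partition (plain PyTorch).
# Every cut edge turns into network traffic during distributed sampling.
import torch

num_nodes, num_parts = 10_000, 4
edge_index = torch.randint(num_nodes, (2, 100_000))  # random toy graph

part = torch.arange(num_nodes) % num_parts           # hash-style partition
cut = (part[edge_index[0]] != part[edge_index[1]]).sum()
print(f'cut edges: {cut.item()} / {edge_index.size(1)}')
# On a random graph roughly (num_parts-1)/num_parts of edges are cut;
# a good min-cut partitioner exploits locality to do far better.
```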
 

[Figure]
A possible GNN distributed training architecture:

[Figure]
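
A rough sketch of how the compute plane of such an architecture might look in code, assuming plain torch.distributed with one graph partition per rank; `local_loader` and the model are placeholders, and the distributed graph store / sampler services are elided:

```python
# Skeleton of the model-computing side: per-rank local sampling plus
# gradient synchronization via DistributedDataParallel (launch with torchrun).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train(model, local_loader, epochs=1):
    dist.init_process_group('nccl')  # reads rank/world size from torchrun env
    model = DDP(model.cuda())
    opt = torch.optim.Adam(model.parameters())
    for _ in range(epochs):
        for batch in local_loader:   # subgraphs sampled from the local
            opt.zero_grad()          # partition (+ remote neighbor fetches)
            loss = model(batch)      # placeholder: model returns the loss
            loss.backward()          # DDP all-reduces gradients here
            opt.step()
```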
Scatter-Gather
- Fuse adjacent graph ops
One common fusion pattern for GCN & GraphSAGE:
Challenge: How to fuse more GNN patterns, across different ApplyEdge and ApplyVertex functions, automatically?
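
A minimal sketch of that pattern in plain PyTorch: sum-aggregation written as gather + scatter-add, and the fused SpMM it can be rewritten into (toy graph, identity ApplyEdge):

```python
# Scatter-gather vs. fused SpMM for GCN/GraphSAGE-style sum aggregation.
# The unfused form materializes one message per edge; fusing Aggregate
# into a sparse matmul avoids that edge-sized intermediate entirely.
import torch

num_nodes, feat = 5, 8
x = torch.randn(num_nodes, feat)
edge_index = torch.tensor([[0, 1, 2, 3],    # src
                           [1, 2, 3, 4]])   # dst
src, dst = edge_index

# Unfused: gather -> (identity ApplyEdge) -> scatter-add.
messages = x[src]                                        # one row per edge
out_scatter = torch.zeros(num_nodes, feat).index_add_(0, dst, messages)

# Fused: a single SpMM with the (dst, src) adjacency matrix.
A = torch.sparse_coo_tensor(torch.stack([dst, src]),
                            torch.ones(src.numel()),
                            (num_nodes, num_nodes))
out_spmm = torch.sparse.mm(A, x)

assert torch.allclose(out_scatter, out_spmm, atol=1e-6)
```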

[Figure]
 - How to implement a fused Aggregate?

[Figure]
Challenge:
- Different graph data structures lead to different implementations of the same logical operations;
 - Different graph characteristics favor different data structures (e.g. low-degree graphs favor COO, high-degree graphs favor CSR);
 - How to find the applicable zone for each, and hide this complexity from data scientists?
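
To see how the data structure leaks into the kernel, here is the same sum-aggregation written against COO and CSR in plain PyTorch (toy graph; the Python loop stands in for a per-row segmented reduction):

```python
# COO iterates edges (trivially edge-parallel, but needs atomics/scatter);
# CSR walks each destination row via row pointers (contiguous per-node
# segments, which suits high-degree graphs).
import torch

x = torch.randn(4, 3)

# COO: edge list sorted by destination -- edges 0->1, 2->1, 1->3.
src = torch.tensor([0, 2, 1])
dst = torch.tensor([1, 1, 3])
out_coo = torch.zeros(4, 3).index_add_(0, dst, x[src])  # per-edge scatter

# CSR over destinations: rowptr[v]..rowptr[v+1] spans v's incoming edges.
rowptr = torch.tensor([0, 0, 2, 2, 3])  # rows: [], [0,2], [], [1]
col    = torch.tensor([0, 2, 1])        # source of each incoming edge
out_csr = torch.zeros(4, 3)
for v in range(4):                      # per-node segment reduction
    out_csr[v] = x[col[rowptr[v]:rowptr[v + 1]]].sum(0)

assert torch.allclose(out_coo, out_csr)
```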
 
- Inference challenges
 - GNN inference requires full-batch inference; how to make it efficient? (one common answer is sketched after this list)
 - Distributed inference for big graphs?
 - Vector quantization for node and edge features?
 - Distilling GNNs into MLPs?
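
One standard answer to the first question is layer-wise inference: compute layer l for all nodes from layer l-1's stored outputs, so multi-hop fan-out never compounds and each node is processed exactly once per layer. A minimal sketch, with linear-plus-sum-aggregation standing in for a real GNN layer:

```python
# Layer-wise full-graph inference (plain PyTorch sketch).
# `weights` is one projection matrix per layer -- a stand-in for real
# GNN layers; in practice each sweep is itself batched over nodes,
# which is exactly where the distributed-inference question shows up.
import torch

@torch.no_grad()
def layerwise_inference(weights, x, edge_index):
    src, dst = edge_index
    h = x
    for W in weights:  # one full-graph sweep per layer, no L-hop sampling
        agg = torch.zeros_like(h).index_add_(0, dst, h[src])  # Aggregate
        h = torch.relu(agg @ W)                               # ApplyVertex
    return h
```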
- SW-HW co-design challenges
 - How to relieve the irregular memory access in scatter-gather?
 - Do we need a dataflow engine for acceleration?
 - …
 

[Figure]
References
- Graph + AI: What’s Next? Progress in Democratizing Graph for All
 - Recent Advances in Efficient and Scalable Graph Neural Networks
 - Crossing the Chasm – Technology adoption lifecycle
 - Understanding and Bridging the Gaps in Current GNN Performance Optimizations
 - Automatic Generation of High-Performance Inference Kernels for Graph Neural Networks on Multi-Core Systems
 - Understanding GNN Computational Graph: A Coordinated Computation, IO, And Memory Perspective
		  	
 



