algorithm analysis – Why does it take O(n!) time to specify a canonical ordering when learning over flattened adjacency matrices/graphs?

I was reading a paper on learning generative models of graphs (GraphRNN), and section 2.2 says (emphasis mine):

> **Vector-representation based models.** One naive approach would be to represent $G$ by flattening $A^\pi$ into a vector in $\mathbb{R}^{n^2}$, which is then used as input to any off-the-shelf generative model, such as a VAE or GAN. However, this approach suffers from serious drawbacks: it cannot naturally generalize to graphs of varying size, and requires training on all possible node permutations or specifying a canonical permutation, **both of which require $O(n!)$ time in general**.

I understand how flattening the adjacency matrix would require the algorithm to see all possible orderings, so training over them takes $O(n!)$ – but I don't understand why specifying a canonical ordering has that issue too. With some abuse of notation, just compute $\pi(G, A) = (G^\pi, A^\pi)$ for every graph before handing it to the training algorithm. Something like this (say we have $N$ graphs with $n$ vertices each and we loop $T$ times):

    for t in range(T):
        for G, A in dataset:
            G_pi, A_pi = pi(G, A)  # applying the ordering is worst case O(n^2), i.e. enough to write out the adjacency matrix
            y = model(G_pi, A_pi)
            loss(y, G_pi, A_pi).backward()  # assume we are doing SGD or something
            optimizer.step()

The run time seems to be $O(N) \cdot O(n^2)$ per epoch, not $O(N) \cdot O(n!)$. So I am not sure why the canonical representation takes $O(n!)$. (I am aware that multiplying big-Os like that can be sloppy in general, but writing out the summation explicitly leads to the same conclusion here, I think.)
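For concreteness, here is what I have in mind: once a permutation is already fixed, applying it to an adjacency matrix touches each of the $n^2$ entries once, so it is $O(n^2)$. A small numpy sketch (`apply_permutation` is my own illustrative helper, not something from the paper):

```python
import numpy as np

def apply_permutation(A, perm):
    # Reorder both the rows and the columns of the adjacency matrix
    # according to perm. This reads/writes each of the n^2 entries
    # exactly once, so it is O(n^2).
    return A[np.ix_(perm, perm)]

# Adjacency matrix of the path graph 0 - 1 - 2.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])

perm = [2, 0, 1]  # some fixed node ordering
A_pi = apply_permutation(A, perm)
# A_pi is [[0, 0, 1], [0, 0, 1], [1, 1, 0]] -- same graph, relabeled nodes.
```

So the per-graph preprocessing cost of *applying* a given ordering is clearly polynomial, which is why the $O(n!)$ claim for the canonical-ordering route confuses me.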

Thanks for the help!