I hope someone can help me with checking this code to make it faster.
My goal is to recursively create a graph:
- Query of a node and its neighbors from a database (not available here)
- Add it and its neighbors to a chart
- until you find a target, recur (1, 2) for each neighbor
I also set a control (maxnodes_in_graph) if the target is not found.
I've tried an optimization: I do not want to reprocess a node if I've already met someone else's neighbor and already added its child nodes.
So I use
The central part is this:
// version using networkx // keep track of process nodes_processed = () nodes_to_process = () def construct_simple_sub_graph(startNode, g): // get my list of edges like ( (startNode, n1) ... ) g.add_edges_from( edge_list) // I just proccessed nodes_processed.append( startNode ) for node in list(g.nodes): // here I am doing something like this: // I check if already I processed this node // if not done, I will add it to a TODO list nodes_to_process // if done, I will remove it from the TODO list if node not in nodes_processed: if node not in nodes_to_process: nodes_to_process.append( node ) else: if node in nodes_to_process: nodes_to_process.remove( node ) return g
and then I put this feature in a while:
def subgraph(fromNode, g): while not ( g.has_node( fromNode) and g.has_node( toNode) ): for node in nodes_to_process: g = construct_simple_sub_graph( node, g) if (g.number_of_nodes() > maxNodes) : return g return g // first line will populate initial nodes in graph, and put // populate nodes_to_process with the first neighbors g = construct_simple_sub_graph( fromNode, g) g = subgraph(fromNode, g)
Well, this code does what I want, but it takes a lot of time.
This is probably the reason for how
networkx is done.
I've tried to revise the code with simple lists and sets, use adjoin lists instead of graphs, and only build them when I'm done.
I have revised the code as:
// version usings lists and sets() // adj_list is like (parent, neighbor) adj_list = () // store the nodes in my graph / adj adj_nodes = () // same as above nodes_processed = set() nodes_to_process = () def construct_simple_adj_list(startNode, nodes_to_process): // get edge_list // isFound is a bool that tell me if target node is met in someone's neighbors adj_list.extend( edge_list ) # nodes in graph adj_nodes.append(startNode) adj_nodes.extend((e(1) for e in edge_list)) # to process nodes_processed.add( startNode) // I make a copy of the nodes to process: I need to update the TODO list nodes_to_process = nodes_to_process.copy() for node in nodes_processed: if node in nodes_to_process: nodes_to_process.remove(node) return is_found, nodes_to_process
and back to while:
// first iteration is_found, nodes_to_process = construct_simple_adj_list( fromNode, adj_nodes) while not (is_found): for node in nodes_to_process: is_found, nodes_to_process = construct_simple_adj_list( node, nodes_to_process) if (len(adj_nodes) > maxNodes) : return adj_list, nodes_to_process return adj_list, nodes_to_proces
This second code is 2 orders of magnitude faster (YAI)
but I noticed that the final graphic is very different
and I get lower quality results.
Could you please help us to recognize the differences between the two approaches?
Can you help provide a recursive version that uses only lists and set manipulations to recursively create an Adj matrix?