Social Network Analysis (SNA) is one of those Use Cases that everyone mentions when talking about the strengths of graph databases. It’s not a secret that the network of people interacting together makes instantly a good image of a graph in everyone’s head. Once you have constructed the social graph it opens plenty of possibilities to explore it wisely in order to effectively answer questions like the one we are going to deal in today’s post: how to discover whom is more likely to make my message viral in the network.

To give a more insight about how to construct a good algorithm that will find us the most viral users and which exploits the capabilities of the graph we are using the literature and will refer to the “greedy algorithm”. For those still not familiar with this algorithm let us introduce its definition.

The greedy algorithm starting with a solution tree is able to calculate those solutions that maximise a defined function f(n). Therefore for each iteration the algorithm will take a look at the child nodes of a certain source node and select the one that maximises the f(n) and move forward.

Figure 1.0 shows an example of an execution of the greedy algorithms. Blue nodes are the ones already included in our solution, yellow nodes are the ones being evaluated with our function f(n) (we also call these nodes candidates) and the white nodes that are those never visited and thus not evaluated. It’s of vital importance that we are able to establish the best heuristic for f(n) so the algorithm delivers the optimal solution. We can see how important is to tune the algorithm on the example shown in Figure 1.0 where we are looking for three consecutive nodes that maximise the sum of their values. A simple greedy algorithm will answer the blue nodes (5, 7 and 5) while a more optimal solution in our example would be nodes 5, 3 and 50.

*Figure 1.0 – Example of a Greedy Algorithm *

Let’s see then which ideas you could use to construct a good function for a greedy algorithm to discover the most viral users in our social network. For each node (users) you can evaluate a weight so the greedy algorithm can move through the ones that maximise that propagation weight. The measure of propagation should take into account things like the previous propagations of that user, that propagation could also be valued against the rest of propagations of the other users or the number of documents ever propagated by that user. Also one important matter that we could maybe consider are restricting to only previous propagations from a similar theme.

With all those ideas you should be able to tune your own and unique function of propagation that could then be used in an algorithm such as the following:

Require: A graph G and a node N Ensure: I are infected by N 1: I = empty set; 2: P = pendent nodes with N queue; 3: V = visited set; 4: while P no empty do 5: x P.dequeue(); 6: edges edges(x, source); 7: for edge 2 edges do 8: tail = edge.tail(); 9: if V not contains tail then 10: V union( V, tail); 11: P union( P, tail); 12: if Math:random() > edge:weight() then 13: if not tail 2 I then 14: I union( I, tail); 15: end if 16: end if 17: end if 18: end for 19: end while 20: return I;

Hope you find our successful story of using Sparksee with this greedy algorithm to discover the most viral users inspiring for your Social Networks Analysis projects.

Download now Sparksee for free, start building your social graph to search for propagation constructing the algorithms explained here!