# Social Network Analysis


In this lesson, you’re expected to learn how to measure influence and centrality within a social network.

Degree and Average Degree

The main property of each node is its degree (k). The degree of a node is the number of links it has to other nodes in the network.

In the context of cell-phone calls, the degree quantifies the number of different people someone has spoken to. An important property that characterizes an undirected network is its average degree, the mean of the degrees of all its nodes.
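As a minimal sketch of these two quantities, the snippet below computes the degree of every node and the average degree from a list of undirected links. The call records and the names `calls`, `degrees`, and `average_degree` are hypothetical and chosen only for illustration.

```python
# Hypothetical call records: each pair is two people who spoke at least once.
calls = [("ana", "ben"), ("ana", "cho"), ("ana", "dev"), ("ben", "cho")]


def degrees(edges):
    """Map each node to its degree in an undirected network."""
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)
    return {node: len(nbrs) for node, nbrs in neighbors.items()}


def average_degree(edges):
    """Average degree = (sum of all degrees) / N, which equals 2L / N."""
    deg = degrees(edges)
    return sum(deg.values()) / len(deg)


deg = degrees(calls)
print(deg)                    # {'ana': 3, 'ben': 2, 'cho': 2, 'dev': 1}
print(average_degree(calls))  # 2.0
```

Because every link contributes to the degree of two nodes, the sum of all degrees is twice the number of links, which is why the average here is 2 × 4 / 4 = 2.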

Clustering

The clustering, or transitivity, of a network measures how often two neighbors of the same node are themselves connected, that is, the fraction of connected triplets that form closed triangles. It is a key quantity in sociology and social network analysis.

In the context of social networks, high clustering indicates that your friends are also friends with one another, whereas low clustering means that your friends do not know each other. Most networks, including social networks, exhibit high clustering, so it is probable that your friends know each other.

The clustering of a network can be defined as:

$$C = \frac{\text{number of closed triplets}}{\text{number of connected triplets (open and closed)}}$$

where a “connected triplet” consists of three nodes that are connected by two (open triplet) or three (closed triplet) undirected links.
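To make the triplet counting concrete, here is a small sketch that computes clustering as the number of closed triplets divided by the number of all connected triplets. The friendship list and the function names are assumptions made for illustration, and each triplet is counted once per center node.

```python
from itertools import combinations


def build_adjacency(edges):
    """Neighbor sets for an undirected network given as a list of links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def transitivity(edges):
    """Closed triplets divided by all connected triplets (open + closed)."""
    adj = build_adjacency(edges)
    closed = 0  # triplets whose two end nodes are also linked
    total = 0   # every pair of neighbors of a center node forms a triplet
    for center, nbrs in adj.items():
        for a, b in combinations(nbrs, 2):
            total += 1
            if b in adj[a]:
                closed += 1
    return closed / total if total else 0.0


# Hypothetical friendships: one triangle (ana, ben, cho) plus a pendant friend.
friendships = [("ana", "ben"), ("ana", "cho"), ("ben", "cho"), ("cho", "dev")]
print(transitivity(friendships))  # 3 closed triplets out of 5
```

A single triangle contributes three closed triplets, one for each of its nodes acting as the center, so this ratio matches the common "three times the number of triangles" formulation.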

Measuring Influence

A network approach to our data can be very useful for finding influential individuals. For example, a company interested in monitoring its reputation on social media needs to know which accounts are the most influential.

Measures of centrality determine which nodes are the most influential on the network.

For example, for Twitter, we could build the retweet network, where we link users according to who retweets whom, and after that perform a centrality analysis to identify the most influential accounts.

But why bother computing network centrality measures when you already know how many direct retweets a user gets? Because a user may get few direct retweets and still be very influential. How? Although the user gets only a few retweets, some popular accounts tend to propagate the user's messages.

Once retweeted by these popular accounts, the posts quickly propagate through the whole network, so the messages reach a large audience in the second or third step. Local information alone would fail to capture this phenomenon.
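The difference between direct retweets and multi-step propagation can be sketched with a toy retweet network. The accounts, the `retweeted_by` mapping, and the `reach` helper below are all hypothetical; the sketch only counts which accounts pass a post along within a given number of retweet hops.

```python
from collections import deque

# Hypothetical retweet network: retweeted_by[a] lists the accounts that retweet a.
retweeted_by = {
    "quiet_expert": ["hub"],
    "hub": ["u1", "u2", "u3", "u4"],
    "chatty": ["u1", "u2"],
    "u1": ["u5"],
}


def reach(source, max_steps):
    """Accounts that pass on posts from `source` within `max_steps` retweet hops."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == max_steps:
            continue  # do not expand beyond the allowed number of hops
        for nxt in retweeted_by.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return seen - {source}


for account in ("quiet_expert", "chatty"):
    print(account, len(reach(account, 1)), len(reach(account, 2)))
# quiet_expert 1 5
# chatty 2 3
```

In this toy data, `chatty` wins on direct retweets, but `quiet_expert` reaches more accounts by the second step because its single retweeter is a hub.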

In the context of network science, there are several centrality measures that quantify the influence of each node in a given network.

The goal of centrality is to determine the importance or influence of a given node. The most basic measure of centrality is the degree.

The main advantage of the degree is that it can be computed for each node using only local information about that node. Hence, it is a measure of direct influence with a very low computational cost.

Other measures of centrality include:

– Betweenness centrality
– Closeness centrality
– PageRank centrality

1) Betweenness Centrality

Betweenness centrality measures the influence of a node within a network. It quantifies the number of times a node acts as an intermediary along the shortest path between two other nodes. It is useful for quantifying the control that an individual can exert over communication between other people in a social network.
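One common way to compute this count is Brandes' algorithm, which runs a breadth-first search from every node and then accumulates how much each intermediary contributes to the shortest paths. The sketch below is an unnormalized version for undirected, unweighted networks; the example network and all names are assumptions for illustration, and when two nodes are joined by several shortest paths, each path contributes a fractional share.

```python
from collections import deque


def betweenness(adj):
    """Unnormalized betweenness for an undirected, unweighted network."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s: count shortest paths and predecessors.
        order = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}   # -1 marks "not reached yet"
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair of endpoints was counted once from each side.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical network: ana - ben - cho, with dev and eve both attached to cho.
network = {
    "ana": ["ben"],
    "ben": ["ana", "cho"],
    "cho": ["ben", "dev", "eve"],
    "dev": ["cho"],
    "eve": ["cho"],
}
print(betweenness(network))
```

Here `cho` lies on the shortest path of five pairs and `ben` on three, while the three peripheral nodes never act as intermediaries.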

2) Closeness Centrality

Closeness centrality quantifies how close (or far) a node is from the rest of the nodes in a network. Intuitively, it represents how long it will take a node to spread information to all other nodes.
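A frequently used definition, assumed here since the lesson does not give a formula, is the number of other nodes divided by the sum of shortest-path distances to them. The sketch below applies it to a small hypothetical network and assumes the network is connected.

```python
from collections import deque


def closeness(adj, node):
    """(N - 1) / (sum of shortest-path distances from node); assumes a connected network."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0


# Hypothetical network: ana - ben - cho, with dev and eve both attached to cho.
network = {
    "ana": ["ben"],
    "ben": ["ana", "cho"],
    "cho": ["ben", "dev", "eve"],
    "dev": ["cho"],
    "eve": ["cho"],
}
print(closeness(network, "cho"))  # 4 / 5 = 0.8
print(closeness(network, "ana"))  # 4 / 9
```

A value near 1 means the node is one step away from almost everyone; `ana`, at the edge of the network, needs more steps on average and scores lower.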

3) PageRank Centrality

PageRank centrality is a network-based diffusion algorithm originally developed by Google's founders to rank web pages. More recently, it has emerged as one of the preferred methods for ranking large amounts of data in different types of networks. For instance, it has been used to rank streets, ecological species, and leadership groups in social networks.

We can think about it in the following way. A ‘random surfer’ browses the network by following links between nodes. Every so often, the surfer instead jumps to a randomly chosen node and continues the process. The PageRank of a node is the probability that the surfer is visiting that node at any given moment. Hence, a node has a high PageRank when it is highly connected or when it is linked to by leading nodes.

For example, my company's account may not have many followers on Twitter. However, if the community manager manages to get followed by an extremely high-profile account, our PageRank will rise considerably. Does this make sense? Absolutely: being connected to influential entities increases the probability that our tweets will reach a large audience.
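The random-surfer picture can be sketched as a simple power iteration. The damping value of 0.85, the fixed iteration count, the handling of nodes without outgoing links, and the follower data are all assumptions made for illustration; a link from A to B here means "A follows B".

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank; `links` maps each node to the nodes it points to."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {v: 1 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes with no outgoing links is spread over every node.
        dangling = sum(rank[v] for v in nodes if not links.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        # Each node passes its rank, in equal shares, along its outgoing links.
        for v, targets in links.items():
            for t in targets:
                new[t] += damping * rank[v] / len(targets)
        rank = new
    return rank


# Hypothetical follower network: four fans follow a celebrity, who follows
# our company; a rival account has two followers that nobody follows.
follows = {
    "f1": ["celebrity"], "f2": ["celebrity"],
    "f3": ["celebrity"], "f4": ["celebrity"],
    "celebrity": ["company"],
    "f5": ["rival"], "f6": ["rival"],
}
rank = pagerank(follows)
print(rank["company"] > rank["rival"])  # True
```

The company has a single follower and the rival has two, yet the company ends up with the higher score because its one follower is itself highly ranked.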

Community Structure

Most social networks exhibit community structure. A community can be defined as a subgraph of densely connected nodes within a larger and sparser network. In a social network context, a community would correspond to a group of closely related friends, while in a political context, it would correspond to a political party.
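The "densely connected within a sparser network" idea can be checked numerically by comparing the link density inside a candidate group with the density of the whole network. This is only an illustrative sketch: the network, the group, and the `density` helper are hypothetical, and the sketch does not detect communities by itself.

```python
def density(nodes, edges):
    """Fraction of possible undirected links among `nodes` that are present.

    Assumes `edges` lists each link once.
    """
    nodes = set(nodes)
    possible = len(nodes) * (len(nodes) - 1) / 2
    present = sum(1 for u, v in edges if u in nodes and v in nodes)
    return present / possible if possible else 0.0


# Hypothetical network: two tight groups of friends joined by a single link.
edges = [
    ("a", "b"), ("a", "c"), ("b", "c"),  # first group
    ("d", "e"), ("d", "f"), ("e", "f"),  # second group
    ("c", "d"),                          # bridge between the groups
]
everyone = {"a", "b", "c", "d", "e", "f"}

print(density({"a", "b", "c"}, edges))  # 1.0: every possible link is present
print(density(everyone, edges))         # 7 of 15 possible links
```

A group whose internal density is much higher than that of the network as a whole is a plausible community in the sense described above.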

The figure below shows a network of political blogs. Lada Adamic et al. analyzed this network and showed that it exhibits a highly segregated community structure divided along party lines.

Image Source: http://dl.acm.org/
[Optional] How Social Connections and Business Ties Can Boost Trade: An Application of Social Network Analysis
[Optional] Gephi for Network Analysis and Visualization