Data Science Behind HBO's "Game of Thrones"

Professionals are mining HBO's "Game of Thrones" for business secrets, from how to use machine learning to deduce who is likely to die, to why Duolingo started teaching High Valyrian.

The final season, which debuted Sunday night, leaves HBO with a challenge of its own: how to keep viewers coming back.

How does it work?

1. Measuring the lexical diversity

Much like biodiversity in an ecosystem, the perceived richness of a text can be expressed in numerical terms. We look at three metrics: volume, variability and density.

Volume: the length of the text in number of words

Variability: the ratio of the number of unique words to the total number of words

Density: an estimated measure of information density
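As a rough illustration, the three metrics can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the article does not say how density is estimated, so the code uses the share of non-stop-word tokens as a stand-in, and the stop-word list, the tokenizer and the function name lexical_metrics are all hypothetical.

```python
import re

# Hypothetical, deliberately tiny stop-word list; a real analysis would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that", "was"}


def lexical_metrics(text):
    """Return volume, variability and a rough density proxy for a text."""
    # Naive tokenizer (assumption): lowercase runs of letters and apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    volume = len(tokens)
    if volume == 0:
        return {"volume": 0, "variability": 0.0, "density": 0.0}
    # Variability: unique words divided by total words (type-token ratio).
    variability = len(set(tokens)) / volume
    # Density proxy (assumption, not the article's method): share of tokens
    # that are not stop words.
    content_words = [t for t in tokens if t not in STOP_WORDS]
    density = len(content_words) / volume
    return {"volume": volume, "variability": variability, "density": density}


sample = "Winter is coming and the night is dark"
metrics = lexical_metrics(sample)
print(metrics)
```

Note that variability tends to fall as a text gets longer, so comparing books of very different lengths may call for equal-sized samples.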

2. Character footprints and word frequency analysis

Each chapter of ‘A Song of Ice and Fire’ is told from a particular point of view (POV), which determines through whose perspective the story is presented. Picking three of the main POV characters, we can draw a word dispersion plot that gives an idea of where the different names are present throughout the novels. Let’s take a closer look at some select words and plot the Kernel Density Estimate (KDE) of their occurrences throughout the novels.

The KDE is a method to estimate and plot the underlying distribution based on a set of observations. It gives us a smoothed version of the corresponding histogram and highlights overall tendencies rather than individual observations.
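To make the idea concrete, here is a minimal sketch of a Gaussian KDE over the positions at which a name occurs. The kernel choice, the bandwidth of 0.1, the toy text and the helper names word_positions and gaussian_kde are all assumptions made for illustration; the article does not state which tools or settings were used.

```python
import math
import re


def word_positions(text, word):
    """Relative positions (0 to 1) at which `word` occurs in the token stream."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return []
    return [i / len(tokens) for i, t in enumerate(tokens) if t == word.lower()]


def gaussian_kde(samples, bandwidth):
    """Return a function that estimates the density of `samples` at a point x."""
    if not samples or bandwidth <= 0:
        raise ValueError("need at least one sample and a positive bandwidth")
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Sum one Gaussian bump per observation, then normalise.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return density


# Toy text invented for illustration, not a quote from the novels.
text = "jon rode north . jon met sam . arya fled south . arya hid . arya ran"

jon_kde = gaussian_kde(word_positions(text, "jon"), bandwidth=0.1)
arya_kde = gaussian_kde(word_positions(text, "arya"), bandwidth=0.1)

# Evaluate both curves on a coarse grid from the start to the end of the text.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"{x:.2f}  jon={jon_kde(x):.3f}  arya={arya_kde(x):.3f}")
```

The raw positions returned by word_positions are what a dispersion plot marks one by one; the KDE turns them into a smooth curve. A smaller bandwidth follows individual occurrences more closely, while a larger one smooths them into broader trends.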

3. Calculating the importance of the characters using network theory

Network theory will help us to calculate the most important characters by evaluating how central they are in the entire web of connected characters.

Specifically, we will calculate the importance of a single character (node) in relation to all the other characters (nodes) using four key measures of centrality in network theory:

Degree centrality: the number of nodes directly connected to the node in question, as a share of the total number of nodes

Closeness centrality: measures ‘degrees of separation’, i.e. how many steps away the node is, on average, from all the other nodes

Betweenness centrality: quantifies the number of times a node acts as a bridge along the shortest path between two other nodes

Prestige centrality (aka Eigencentrality): does not focus on the number of connections to a node, but rather on the importance of the nodes it is connected to.

Highly connected nodes are considered more important than less connected nodes. This centrality measure is often used in web search algorithms.
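The four measures can be sketched with the Python standard library alone. This is an illustrative sketch, not the code behind the article's results: the edge list is a toy network invented for the example (it is not derived from the novels), all function names are hypothetical, betweenness is left unnormalised, and closeness assumes a connected graph.

```python
import math
from collections import deque


def build_graph(edges):
    """Undirected adjacency sets from a list of (a, b) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def degree_centrality(graph):
    # Share of the other nodes that are direct neighbours.
    return {v: len(nbrs) / (len(graph) - 1) for v, nbrs in graph.items()}


def closeness_centrality(graph):
    # Inverse of the average shortest-path distance (assumes a connected graph).
    result = {}
    for source in graph:
        dist, queue = {source: 0}, deque([source])
        while queue:
            v = queue.popleft()
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total = sum(dist.values())
        result[source] = (len(dist) - 1) / total if total else 0.0
    return result


def betweenness_centrality(graph):
    # Brandes' algorithm for unweighted, undirected graphs (unnormalised).
    score = {v: 0.0 for v in graph}
    for s in graph:
        order, preds = [], {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist, queue = {s: 0}, deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Every unordered pair was counted from both endpoints.
    return {v: b / 2 for v, b in score.items()}


def eigenvector_centrality(graph, iterations=100):
    # Power iteration on A + I; the shift keeps the same eigenvectors
    # and avoids oscillation on bipartite graphs.
    x = {v: 1.0 for v in graph}
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[w] for w in graph[v]) for v in graph}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x


# Toy edges invented for illustration only, NOT taken from the novels.
edges = [("Jon", "Sam"), ("Jon", "Tyrion"), ("Tyrion", "Jaime"),
         ("Tyrion", "Daenerys"), ("Jaime", "Cersei")]
graph = build_graph(edges)

degree = degree_centrality(graph)
closeness = closeness_centrality(graph)
betweenness = betweenness_centrality(graph)
eigen = eigenvector_centrality(graph)

for name in sorted(graph, key=betweenness.get, reverse=True):
    print(f"{name:9s} deg={degree[name]:.2f} close={closeness[name]:.2f} "
          f"btw={betweenness[name]:.1f} eig={eigen[name]:.2f}")
```

In practice a graph library would usually be preferable to hand-written routines; the point here is only to show what each measure counts.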

Calculating the above metrics for all characters in ‘A Song of Ice and Fire’ we end up with the following highest ranked characters:

From this we can see that Jon, Tyrion and Jaime are the most important characters in terms of their centrality in connecting with all other characters.

On the other hand, Daenerys is not very well connected. The main characters are defined as those with more than five chapters told from their point of view (red nodes).

In addition, we add characters that have significant betweenness centrality but are not defined as POV characters (grey nodes). Betweenness centrality is also reflected in the size of each character’s circle in the diagram. The graph helps us get an intuitive understanding of the relative importance of our main characters.

All in all, it is clear that, from a network theory perspective, Jon is the most important character, and that he would be hard to kill off while keeping the story connected and coherent.