Chapter 4: Formation of Networks
About This Chapter
Every social network you have ever studied — the citation graph of a scientific field, the follower graph of Twitter, the friendship graph of a university cohort — was not designed. Nobody sat down with a blueprint and decided who should be connected to whom. Instead, the network grew out of millions of individual decisions: two researchers who shared an interest and wrote a paper together, a celebrity who posted something that attracted followers, two students who sat next to each other in a course and remained friends for years. The question this chapter addresses is whether such decentralized, seemingly random processes leave behind recognizable signatures in the structure of the resulting network — and if so, what those signatures tell us about the social forces that created them.
The answer, it turns out, is yes — but different social forces produce different signatures. Three generative models have become the canonical framework for understanding this:
The Erdős–Rényi random graph treats each potential edge as an independent coin flip. It captures the baseline: what would a network look like if connections were formed purely at random, with no regard for geography, shared interests, or popularity?
The Watts–Strogatz small-world model begins with a highly structured, locally clustered network and introduces a small number of random long-range links. It captures the observation that real social networks combine local density (your friends know each other) with global efficiency (you are never more than a handful of handshakes from anyone else on Earth).
The Barabási–Albert preferential-attachment model builds the network incrementally, with new nodes preferring to attach to nodes that already have many connections. It captures the “rich-get-richer” dynamic that produces the extreme inequality of connections observed in the World Wide Web, citation networks, and social media platforms.
Each model illuminates something true about real networks. None of them is the whole truth. Together, they give you a vocabulary — both qualitative and mathematical — for diagnosing what kind of formation process is most likely at work when you pick up an unknown network and start measuring it. That diagnostic skill is what this chapter builds.
Table of Contents
- The Erdős–Rényi Random Graph \(G(n, p)\)
- What Erdős–Rényi Gets Wrong
- The Watts–Strogatz Small-World Model
- The Six Degrees of Kevin Bacon and Milgram’s Experiment
- Preferential Attachment and the Barabási–Albert Model
- Why Scale-Free Networks Matter: Robustness and Fragility
- Diffusion on Networks: The SI and SIR Models
- Mini Case Study: Fitting the Three Models to a Real Graph
The Erdős–Rényi Random Graph \(G(n, p)\)
Mathematical definition
In 1959 and 1960, Paul Erdős and Alfréd Rényi published a pair of papers that introduced rigorous probability theory to the study of networks. Their model is deceptively simple. Fix a positive integer \(n\) (the number of nodes) and a probability \(p \in [0, 1]\). Form a graph \(G(n, p)\) by considering every pair of nodes \(\{i, j\}\) independently, and including the edge \(\{i, j\}\) with probability \(p\) and omitting it with probability \(1 - p\). There are \(\binom{n}{2} = \frac{n(n-1)}{2}\) potential edges in total, and each one is decided by an independent Bernoulli trial.
The expected number of edges in \(G(n, p)\) is therefore:
\[\mathbb{E}[|E|] = \binom{n}{2} p = \frac{n(n-1)}{2} p\]
More useful for most purposes is the expected degree of a single node. Node \(i\) has \(n - 1\) potential neighbors, and each one is connected to \(i\) with probability \(p\), so:
\[\mathbb{E}[k_i] = (n - 1) p \approx np \quad \text{for large } n\]
The quantity \(\lambda = np\) is the mean degree and is the single most important parameter of the model. Throughout this chapter we will write \(\lambda\) when we want to emphasize the mean degree rather than the edge probability.
Degree distribution: convergence to a Poisson
What fraction of nodes have exactly \(k\) neighbors? Since each of the \(n - 1\) other nodes connects to node \(i\) independently with probability \(p\), the degree of node \(i\) follows a Binomial distribution:
\[P(k_i = k) = \binom{n-1}{k} p^k (1-p)^{n-1-k}\]
For large \(n\) with \(p = \lambda / n\) held fixed (so that the mean degree \(\lambda\) remains constant), the Binomial distribution converges to a Poisson distribution:
\[P(k) \to \frac{\lambda^k e^{-\lambda}}{k!}\]
This is a celebrated result that connects combinatorics to probability theory. What it means practically is that in a large random graph with mean degree \(\lambda\), most nodes have degrees close to \(\lambda\), and the fraction of nodes with degree far from the mean decays very rapidly — exponentially fast. There are no hubs in an Erdős–Rényi graph, at least not in any systematic sense.
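The convergence can be checked numerically. The sketch below is not one of the chapter's cells; it simply compares the Binomial\((n-1, \lambda/n)\) probabilities against the Poisson\((\lambda)\) probabilities as \(n\) grows, using only the standard library.

```python
import math

def binom_pmf(n, p, k):
    """P(X = k) for X ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(lam, k):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * math.exp(-lam) / math.factorial(k)

lam = 4.0  # mean degree held fixed while n grows
for n in (50, 500, 5000):
    p = lam / n
    gap = max(abs(binom_pmf(n - 1, p, k) - poisson_pmf(lam, k)) for k in range(20))
    print(f"n = {n:5d}: max |Binomial(n-1, p) - Poisson(lam)| = {gap:.5f}")
```

The gap shrinks roughly like \(1/n\), which is why the Poisson approximation is excellent for networks with thousands of nodes.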
The phase transition: emergence of a giant component
Erdős and Rényi’s most striking discovery was that the global connectivity of \(G(n, p)\) undergoes a phase transition at a critical threshold. Let \(\lambda = np\). When \(\lambda < 1\), the graph is fragmented into many small components, the largest of which has at most \(O(\log n)\) nodes. When \(\lambda > 1\), a single giant component abruptly emerges, containing a positive fraction of all \(n\) nodes, while all remaining components are still small.
The phase transition is sharp. Crossing the boundary \(p = 1/n\) (equivalently \(\lambda = 1\)) takes the network from a scattering of small fragments to a graph dominated by a single giant component, all within a narrow interval of \(p\) values. (Full connectivity, in which even the last isolated node is absorbed, arrives only at the larger threshold \(p \approx \ln n / n\).) Jackson (2008) describes this as one of the most remarkable theorems in graph theory: a tiny increment in edge probability causes a qualitative reorganization of the entire network’s structure.
The critical value \(p^* = 1/n\) has a clean interpretation. At \(p^*\), the expected degree is exactly 1 — each node has, on average, one neighbor. Below this point, the network is subcritical and tree-like; above it, cycles form and the giant component crystallizes.
The phase transition in \(G(n, p)\) is not just a mathematical curiosity. It is an approximate model of epidemic thresholds in disease spread, percolation thresholds in material science, and information cascade thresholds in social networks. In epidemiology, the basic reproduction number \(R_0\) plays the same role as \(\lambda = np\): when \(R_0 < 1\) an infection dies out in small clusters; when \(R_0 > 1\) it spreads to a macroscopic fraction of the population. The structural analogy between network connectivity and epidemic dynamics is one reason network models became central to public health modeling during COVID-19.
Live experiment: sweep \(p\) and watch the giant component emerge
The cell below constructs 50 values of \(p\) from 0 to \(5/n\) on a network of \(n = 100\) nodes, builds the Erdős–Rényi graph at each \(p\), and records the size of the largest connected component. Run it and observe the step: the giant component is essentially absent until \(p \approx 0.01 = 1/n\), then grows rapidly as \(p\) increases.
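The cell itself is not embedded in this copy. The following standard-library sketch reproduces the experiment in miniature (three values of \(\lambda\) instead of a full sweep, printed numbers instead of a plot), so treat the output as illustrative.

```python
import random
from collections import deque

def erdos_renyi(n, p, rng):
    """Sample G(n, p): each of the n(n-1)/2 possible edges appears independently."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def largest_component_size(adj):
    """Size of the largest connected component, found by BFS."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        queue, size = deque([start]), 0
        seen.add(start)
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best

rng = random.Random(42)
n = 100
for lam in (0.5, 1.0, 3.0):  # mean degree lambda = n * p
    sizes = [largest_component_size(erdos_renyi(n, lam / n, rng)) for _ in range(20)]
    print(f"lambda = {lam}: mean largest-component size = {sum(sizes) / len(sizes):.1f}")
```

Below the threshold the largest component is a handful of nodes; at \(\lambda = 3\) it contains most of the network — the step described in the figure.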
Reading the figure. Below the dashed red line, the largest connected component is a tiny fraction of the network — a handful of nodes, essentially noise. At \(p = 1/n\) you see the inflection point, and above it the giant component grows steeply. By \(p = 0.03\) (mean degree \(\lambda = 3\)), roughly 95% of all nodes belong to a single connected component. This is the phase transition: structurally, crossing \(p^*\) is the difference between a world where information cannot travel globally and one where it can.
What Erdős–Rényi Gets Wrong
The Erdős–Rényi model is mathematically tractable and theoretically beautiful, but it makes two predictions that are systematically violated in almost every real social network.
The clustering problem. In \(G(n, p)\), the probability that two neighbors of a node are themselves connected equals \(p\), the same as for any random pair of nodes. There is no tendency for “friends of friends to be friends.” In real social networks, the opposite is true: if Alice knows Bob and Alice knows Carol, the probability that Bob and Carol also know each other is many times higher than the background probability of a random connection. This property is measured by the clustering coefficient, which we studied in Chapter 1. Empirically, social networks have clustering coefficients in the range of 0.3 to 0.6; a random graph with the same number of nodes and edges has a clustering coefficient close to zero.
The hub problem. In \(G(n, p)\), degree follows a Poisson distribution. The Poisson distribution has a thin tail: the probability of finding a node with degree much larger than \(\lambda\) decays exponentially. In real networks — the World Wide Web, Twitter follower counts, citation networks, the network of sexual contacts — the degree distribution follows a power law, \(P(k) \propto k^{-\gamma}\), which has a much heavier tail. A few nodes accumulate degrees orders of magnitude larger than the average. These hubs are absent from the random graph.
These two failures point to two real-world mechanisms that \(G(n, p)\) ignores: local social structure (triadic closure, the tendency to form triangles) and cumulative advantage (the tendency of popular nodes to attract more connections). The Watts–Strogatz and Barabási–Albert models address these in turn.
The Watts–Strogatz Small-World Model
Construction and intuition
In 1998, Duncan Watts and Steven Strogatz published a paper in Nature that sparked an explosion of interest in complex networks. Their insight was that a very simple modification of a completely regular lattice could reproduce two seemingly contradictory properties of real social networks simultaneously: high clustering and short average path length.
The construction proceeds in three steps:
Build a ring lattice. Place \(n\) nodes on a ring and connect each node to its \(k\) nearest neighbors (\(k/2\) on each side), where \(k\) is even. This produces a highly clustered, highly regular graph. The clustering coefficient is approximately \(3(k-2)/[4(k-1)] \approx 3/4\) for large \(k\), which is high. But the average path length is approximately \(n/(2k)\), which grows linearly with \(n\) — an extremely “long” world for large networks.
Rewire edges randomly. Go through every edge in the lattice and, with probability \(\beta\), replace it by a randomly chosen edge (a “shortcut” connecting two nodes chosen uniformly at random). The probability \(\beta\) is the single parameter that interpolates between the two extremes.
Observe the interpolation. When \(\beta = 0\), nothing changes and you have the original lattice. When \(\beta = 1\), every edge has been replaced and you approach an Erdős–Rényi random graph. At intermediate values of \(\beta\), something remarkable happens: the average path length drops precipitously while the clustering coefficient barely moves.
The mathematical intuition is elegant. A handful of random long-range shortcuts drastically reduces the diameter of the network — any two nodes can now reach each other by traveling along familiar local edges and then “jumping” via a shortcut. But shortcuts are rare, so the dense local neighborhood structure (the source of high clustering) is almost entirely preserved.
Formally, let \(L(\beta)\) denote the average shortest path length and \(C(\beta)\) the average clustering coefficient, both normalized by their values at \(\beta = 0\). Watts and Strogatz showed empirically (and later researchers verified analytically) that there is a wide range of \(\beta\) — roughly \(0.01 < \beta < 0.1\) — for which:
\[\frac{L(\beta)}{L(0)} \approx 0.1 \quad \text{while} \quad \frac{C(\beta)}{C(0)} \approx 0.9\]
The path length has collapsed to a tenth of its original value while the clustering coefficient is still ninety percent of its original value. This is the small-world regime.
Small-world structure has been documented in an impressive range of empirical networks. The C. elegans neural connectome — the complete wiring diagram of a 302-neuron organism, the only nervous system ever fully mapped — has a clustering coefficient 5.6 times higher than an equivalent random graph but an average path length only 1.18 times longer. The power grid of the Western United States shows similar properties, as does the co-authorship network of researchers in neuroscience. The interpretation is the same across all of these contexts: local clustering supports robustness and specialization, while shortcuts enable fast global coordination. In brain science, the co-existence of densely clustered local circuits with long-range projection fibers is thought to be one of the structural signatures of an efficient information-processing architecture.
Live experiment: replicate the Watts–Strogatz figure
The cell below replicates the classic figure from the 1998 Watts–Strogatz paper. It sweeps \(\beta\) on a logarithmic scale from 0.0001 to 1, builds a Watts–Strogatz graph at each \(\beta\), and measures the normalized clustering coefficient and normalized average path length. Run the cell and look for the regime where the two curves diverge: that is the small-world window.
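Since the cell is not embedded in this copy, here is a standard-library stand-in: it builds the ring lattice, rewires edges, and measures clustering and path length directly. Parameters (\(n = 200\), \(k = 8\)) are illustrative choices, smaller than a production sweep.

```python
import random
from collections import deque

def watts_strogatz(n, k, beta, rng):
    """Ring lattice (k nearest neighbors, k even); each lattice edge is then
    rewired to a uniformly random new endpoint with probability beta."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j); adj[j].add(i)
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            if rng.random() < beta and j in adj[i]:
                new = rng.randrange(n)
                while new == i or new in adj[i]:
                    new = rng.randrange(n)
                adj[i].discard(j); adj[j].discard(i)
                adj[i].add(new); adj[new].add(i)
    return adj

def avg_clustering(adj):
    """Mean local clustering coefficient."""
    total = 0.0
    for u, nbrs in adj.items():
        if len(nbrs) < 2:
            continue
        links = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
        total += 2.0 * links / (len(nbrs) * (len(nbrs) - 1))
    return total / len(adj)

def avg_path_length(adj):
    """Mean shortest-path length over reachable ordered pairs (BFS per node)."""
    total = pairs = 0
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(0)
n, k = 200, 8
C0 = avg_clustering(watts_strogatz(n, k, 0.0, rng))
L0 = avg_path_length(watts_strogatz(n, k, 0.0, rng))
for beta in (0.01, 0.05, 0.5):
    g = watts_strogatz(n, k, beta, rng)
    print(f"beta = {beta}: L/L0 = {avg_path_length(g) / L0:.2f}, "
          f"C/C0 = {avg_clustering(g) / C0:.2f}")
```

At \(\beta = 0.05\) you should see the signature divergence: \(L/L(0)\) well below one half while \(C/C(0)\) stays near one.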
Reading the figure. The red curve (path length) drops sharply as soon as \(\beta\) rises from zero, because even a tiny fraction of random shortcuts creates dramatically shorter routes across the network. The blue curve (clustering) remains high across a wide range of \(\beta\), because removing a single edge from a dense local neighborhood leaves most triangles intact. The green shaded region marks the small-world regime — the range of \(\beta\) where you have simultaneously short paths and high clustering. Real social networks sit in or near this window.
The Six Degrees of Kevin Bacon and Milgram’s Experiment
The phrase “six degrees of separation” entered popular culture through a 1990 play by John Guare, but the empirical evidence underlying it is considerably older. In 1967, Harvard psychologist Stanley Milgram sent 296 letters to randomly selected residents of Omaha, Nebraska, and Wichita, Kansas. Each letter was addressed to a target person in Boston, Massachusetts. The instructions were simple: if you do not know the target personally, forward the letter to someone you know on a first-name basis who is more likely to know the target. Milgram tracked the letters that ultimately reached the target and counted the number of intermediaries. Of the letters that arrived, the median chain length was 5.5 — roughly six steps.
Milgram’s experiment attracted enormous attention, but it also attracted criticism. The completion rate was low (only about 20–25% of letters arrived), raising questions about selection bias in the completed chains. Jeff Travers and Milgram repeated the experiment in 1969 with a larger sample and more careful controls, obtaining a median chain length of 6.2. In 2003, Duncan Watts, Peter Dodds, and Mark Newman replicated the experiment over email with a global target set, finding median chain lengths of 5–7, consistent across many target countries and occupations.
The “Kevin Bacon game,” popular since 1994, operationalizes this intuition using the Hollywood film network. Every actor is assigned a “Bacon number” equal to the length of the shortest path to Kevin Bacon in the co-appearance graph (two actors are connected if they appeared in the same film). The average Bacon number across all Hollywood actors is approximately 2.9, not six — the film network is considerably denser and more interconnected than a social network of the general population.
Both examples are empirical demonstrations of the Watts–Strogatz mechanism: a network with local clustering and a small number of long-range connections has a short average path length. The mathematical insight of Watts and Strogatz was to show that you do not need many shortcuts to achieve this — a few percent of random rewirings is enough.
Preferential Attachment and the Barabási–Albert Model
The observation: scale-free degree distributions
In 1999, Réka Albert and Albert-László Barabási published measurements of the World Wide Web that would change the field. They found that the number of web pages with \(k\) incoming links follows a power law:
\[P(k) \propto k^{-\gamma}\]
with \(\gamma \approx 2.1\). The same pattern appeared in the citation network of physics papers (\(\gamma \approx 3\)), the actor co-appearance network, and eventually hundreds of other datasets. A power law degree distribution has a qualitatively different character from the Poisson distribution of the random graph. The Poisson is concentrated near its mean; the power law has a heavy tail that ensures a non-negligible fraction of nodes with degrees many standard deviations above the mean. Such networks are called scale-free because a power law looks the same at every scale (the ratio \(P(2k)/P(k) = 2^{-\gamma}\) is constant, independent of \(k\)).
Power-law degree distributions appear throughout the digital economy. On Twitter, the follower count distribution follows a power law with exponent \(\gamma \approx 2.4\): most accounts have tens of followers, but a handful have tens of millions. On YouTube, viewership per video follows a similar distribution. In citation networks, most papers receive no citations, but a handful of landmark papers accumulate thousands. The economic consequences are profound: in a scale-free network, a tiny fraction of nodes — the hubs — account for a disproportionate share of traffic, influence, and value. Platform businesses understand this: the top 1% of content creators on a platform often generate 50–70% of total engagement, which is precisely why creator monetization programs focus on retaining the high-degree nodes.
The mechanism: preferential attachment
Erdős–Rényi graphs fail to produce hubs for a simple reason: every potential edge is equally likely regardless of the current degree of its endpoints. Real network growth does not work that way. A new researcher entering a field is more likely to cite already-famous papers; a new Twitter user is more likely to follow already-popular accounts; a new web page is more likely to link to already-visible sites. This mechanism — preferential attachment, or the “rich-get-richer” rule — was formalized by Barabási and Albert in their 1999 model.
The BA model has two ingredients:
Growth. The network grows over time. At each time step, one new node is added, and it attaches \(m\) edges to existing nodes.
Preferential attachment. Each new edge connects to an existing node \(j\) with probability proportional to that node’s current degree \(k_j\):
\[\Pi_j = \frac{k_j}{\sum_i k_i}\]
A node with degree 10 is twice as likely to attract a new link as a node with degree 5. Popularity compounds over time.
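The attachment rule \(\Pi_j = k_j / \sum_i k_i\) has a standard implementation trick: keep a list in which node \(j\) appears once per unit of degree; a uniform draw from that list is then automatically degree-proportional. The node names and degrees below are illustrative, not from the chapter.

```python
import random
from collections import Counter

# Node j appears k_j times in the endpoint list, so a uniform draw from the
# list selects j with probability k_j / sum_i k_i.
degrees = {"a": 10, "b": 5, "c": 5}
endpoint_list = [node for node, k in degrees.items() for _ in range(k)]

rng = random.Random(7)
draws = Counter(rng.choice(endpoint_list) for _ in range(100_000))
for node in degrees:
    print(node, draws[node] / 100_000)  # ~ 0.50 for "a", ~ 0.25 for "b" and "c"
```

The degree-10 node is drawn about twice as often as each degree-5 node, exactly as the formula requires.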
Deriving the degree distribution: the master equation
Let \(p_k(t)\) be the fraction of nodes with degree \(k\) at time \(t\). When a new node arrives with \(m\) edges, it changes the degree distribution in two ways: it adds \(m\) links to existing nodes (drawn proportionally to their degree) and introduces itself as a degree-\(m\) node.
The rate at which a node of degree \(k\) gains a new edge is \(m \cdot \Pi_k = m \cdot k / (2m t) = k / (2t)\), since the total degree at time \(t\) (with \(t\) nodes each having expected degree \(2m\) — every edge contributes two to the degree count) is approximately \(2mt\). Writing the balance equation for the fraction of degree-\(k\) nodes:
\[\frac{d}{dt}(t \cdot p_k) = \frac{(k-1)}{2} p_{k-1} - \frac{k}{2} p_k + \delta_{k, m}\]
where the first term accounts for nodes of degree \(k-1\) that receive a new edge and move to degree \(k\), the second term accounts for nodes of degree \(k\) that receive a new edge and move to degree \(k+1\), and \(\delta_{k,m}\) accounts for the newly added node which arrives at degree \(m\). Seeking a stationary solution \(p_k(t) \to p_k\) (independent of \(t\)) and solving the recursion yields:
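The step from the recursion to the closed form, which the text compresses, goes as follows. In the stationary state the left-hand side is \(p_k\), so for \(k > m\) the balance equation rearranges to a simple ratio, and the boundary case \(k = m\) (where \(p_{m-1} = 0\)) pins down the starting value:

\[
p_k = \frac{k-1}{2}\,p_{k-1} - \frac{k}{2}\,p_k + \delta_{k,m}
\quad\Longrightarrow\quad
p_k = \frac{k-1}{k+2}\,p_{k-1} \;\;(k > m), \qquad p_m = \frac{2}{m+2}.
\]

Telescoping the ratio from \(m+1\) up to \(k\) gives

\[
p_k = \frac{2}{m+2}\prod_{j=m+1}^{k}\frac{j-1}{j+2}
    = \frac{2}{m+2}\cdot\frac{m(m+1)(m+2)}{k(k+1)(k+2)}
    = \frac{2m(m+1)}{k(k+1)(k+2)}.
\]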
\[p_k = \frac{2m(m+1)}{k(k+1)(k+2)}\]
For large \(k\) this behaves as:
\[p_k \sim \frac{2m(m+1)}{k^3} \propto k^{-3}\]
The degree distribution follows a power law with exponent \(\gamma = 3\), regardless of \(m\). The parameter \(m\) controls only the density of the network, not the shape of the degree distribution. This is the central theoretical result of the BA model.
Live experiment: grow a BA graph and verify the power law
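The growth cell is not embedded in this copy. The standard-library sketch below grows a BA graph and checks two predictions of the master-equation solution: mean degree \(\approx 2m\), and the fraction of minimum-degree nodes \(p_m = 2/(m+2)\) (for \(m = 3\), exactly \(0.4\)).

```python
import random

def barabasi_albert(n, m, rng):
    """Grow a BA graph: start from a clique on m+1 nodes; each new node attaches
    m edges to distinct existing nodes, chosen with probability proportional to
    current degree via the endpoint-list trick."""
    adj = {i: set() for i in range(m + 1)}
    ends = []  # node j appears k_j times in this list
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j); adj[j].add(i); ends += [i, j]
    for u in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))  # degree-proportional draw
        adj[u] = set()
        for t in targets:
            adj[u].add(t); adj[t].add(u); ends += [u, t]
    return adj

rng = random.Random(3)
adj = barabasi_albert(5000, 3, rng)
degs = [len(v) for v in adj.values()]
mean_deg = sum(degs) / len(degs)
frac_min = sum(1 for d in degs if d == 3) / len(degs)
print(f"mean degree = {mean_deg:.2f}   (theory: 2m = 6)")
print(f"max degree  = {max(degs)}   (the hubs)")
print(f"fraction with degree m = 3: {frac_min:.3f}   (theory: 2/(m+2) = 0.4)")
```

A full exponent fit (the log-log regression the text describes) would be layered on top of this; even without it, the maximum degree is dozens of times the mean, which a Poisson degree distribution essentially never produces.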
Reading the output. The left panel shows a steeply declining histogram in linear coordinates — a shape that looks qualitatively similar to many skewed economic distributions. The right panel plots the same data on logarithmic axes on both dimensions. If the distribution is a true power law, the points should fall on a straight line in this space, and the fitted line (red) should have a slope close to \(-3\). The fitted exponent reported in the text output will typically lie between \(-2.8\) and \(-3.2\) depending on the random seed — consistent with the theoretical prediction of \(\gamma = 3\).
Why Scale-Free Networks Matter: Robustness and Fragility
The asymmetry of attack and failure
The power-law degree distribution creates a profound asymmetry in how a network responds to node removal. This asymmetry has large practical consequences for the design of resilient infrastructure, the spread of disease, and the structure of financial contagion.
Consider two scenarios. In the first — random failure — each node is removed independently with probability \(f\) (a uniform random fraction of the network fails). In the second — targeted attack — nodes are removed in decreasing order of degree (the highest-degree hubs are removed first). In a random graph, these two scenarios produce similar degradation: the network breaks apart at roughly the same threshold \(f\) in both cases. In a scale-free network, the two curves diverge dramatically.
The intuition is straightforward. In a scale-free network, the overwhelming majority of nodes have very low degree. A random removal is therefore almost certain to remove a low-degree node, leaving the hubs intact. The hubs are what holds the network together as a connected whole. Random failures therefore cause only gradual degradation — the giant component shrinks slowly even as \(f\) increases. This property is called robustness to random failure.
Targeted attack is the reverse. Remove the top \(1\%\) of nodes by degree, and you have removed the hubs — the very nodes on which global connectivity depends. The giant component collapses rapidly. This property is called fragility to targeted attack.
Réka Albert, Hawoong Jeong, and Albert-László Barabási demonstrated both properties in a 2000 Nature paper using the internet topology as their empirical network. They showed that the internet could absorb the random failure of 80% of its nodes with only modest impact on global connectivity, but that targeting as few as 5–10% of the highest-degree routers caused complete fragmentation.
Live experiment: compare random failure vs. targeted attack
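The comparison cell is not embedded in this copy; the sketch below is a self-contained standard-library version at a single removal fraction \(f = 0.2\) (the notebook presumably sweeps \(f\) and plots both curves). The BA generator is redefined inline so the cell runs on its own.

```python
import random
from collections import deque

def barabasi_albert(n, m, rng):
    """BA growth via the endpoint-list trick (node j appears k_j times in ends)."""
    adj = {i: set() for i in range(m + 1)}
    ends = []
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j); adj[j].add(i); ends += [i, j]
    for u in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        adj[u] = set()
        for t in targets:
            adj[u].add(t); adj[t].add(u); ends += [u, t]
    return adj

def giant_size(adj):
    """Largest connected component size, via BFS."""
    seen, best = set(), 0
    for s in adj:
        if s in seen:
            continue
        q, size = deque([s]), 0
        seen.add(s)
        while q:
            u = q.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v); q.append(v)
        best = max(best, size)
    return best

def remove(adj, doomed):
    """Return the subgraph with the doomed nodes (and their edges) deleted."""
    doomed = set(doomed)
    return {u: nbrs - doomed for u, nbrs in adj.items() if u not in doomed}

rng = random.Random(7)
adj = barabasi_albert(1000, 2, rng)
n_remove = int(0.2 * len(adj))
random_doomed = rng.sample(sorted(adj), n_remove)                       # random failure
hubs = sorted(adj, key=lambda u: len(adj[u]), reverse=True)[:n_remove]  # targeted attack
g_rand = giant_size(remove(adj, random_doomed))
g_attack = giant_size(remove(adj, hubs))
print(f"giant component after removing {n_remove} random nodes:        {g_rand}")
print(f"giant component after removing {n_remove} highest-degree nodes: {g_attack}")
```

The same removal budget produces a much larger surviving giant component under random failure than under targeted attack — the asymmetry the figure illustrates.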
Reading the figure. The two curves start at the same point (no nodes removed, giant component at full size) but diverge as \(f\) increases. The blue curve (random failure) degrades slowly — even after removing 40–50% of nodes at random, a large connected component remains. The red curve (targeted attack) drops sharply: removing 10–20% of the highest-degree nodes is enough to fragment the network substantially. The key takeaway is that the same structural property — the heavy-tailed degree distribution — that makes scale-free networks robust against random failures simultaneously makes them fragile against deliberate attack.
The robustness-fragility duality has direct implications for cybersecurity, financial regulation, and public health. In cybersecurity, an adversary who knows the internet’s degree distribution will target high-degree routers and exchanges rather than random hosts — and a handful of such targeted strikes can cause widespread outages. In finance, the 2008 crisis revealed that the interbank lending network had scale-free properties: a handful of systemically important financial institutions (SIFIs) held the network together, and the failure or near-failure of Lehman Brothers, Bear Stearns, and AIG threatened to collapse the entire global system. Regulators responded by designating SIFIs for additional capital requirements — an explicit attempt to reduce the fragility of the high-degree nodes in the financial network.
Diffusion on Networks: The SI and SIR Models
Motivation: why network structure shapes diffusion
The three generative models produce networks with different structures: an Erdős–Rényi graph with roughly uniform connectivity, a Watts–Strogatz graph with high clustering and short paths, and a Barabási–Albert graph with hubs and a power-law degree distribution. A natural question is: how do these structural differences affect the spread of information, innovations, or diseases?
The workhorse framework for modeling diffusion on networks comes from mathematical epidemiology. Two models are particularly useful.
The SI model (Susceptible–Infected) is the simplest possible model of irreversible diffusion. Each node is in one of two states: Susceptible (\(S\)) or Infected (\(I\)). At each time step, each \(S\)–\(I\) neighbor pair produces a new infection with probability \(\beta\) (the transmission rate). Once infected, a node stays infected forever. The SI model is a reasonable first approximation for the spread of an irreversible adoption — the purchase of a new technology, the adoption of a social norm, the spread of a rumor that cannot be “un-heard.”
The SIR model (Susceptible–Infected–Recovered) adds a third state. An infected node recovers with probability \(\gamma\) per time step and moves to the Recovered (\(R\)) state, in which it is immune and no longer infectious. The SIR model better approximates the spread of a disease with a finite infectious period, or the adoption of a technology that can be abandoned.
For an SIR model on a network with mean degree \(\langle k \rangle\) and mean squared degree \(\langle k^2 \rangle\), the epidemic threshold condition is:
\[\frac{\beta}{\gamma} > \frac{\langle k \rangle}{\langle k^2 \rangle - \langle k \rangle}\]
For a scale-free network with \(\gamma_{\text{deg}} \leq 3\) (the exponent of the degree distribution), \(\langle k^2 \rangle\) diverges as \(n \to \infty\), so the right-hand side of this inequality approaches zero. The epidemic threshold vanishes: any nonzero transmission rate is sufficient to cause a large-scale outbreak. This is the mathematical statement of why hubs make epidemics (and viral information) almost impossible to contain in scale-free networks without targeting the high-degree nodes.
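The threshold formula can be evaluated directly from a degree sequence. The sketch below (illustrative, not one of the chapter's cells) contrasts a homogeneous network with a heavy-tailed one sampled from \(P(k) \propto k^{-3}\); the cutoff at \(k = 1000\) stands in for the finite-\(n\) cutoff of a real network.

```python
import random

def epidemic_threshold(degrees):
    """SIR epidemic threshold for a network with this degree sequence:
    an outbreak requires beta/gamma > <k> / (<k^2> - <k>)."""
    k1 = sum(degrees) / len(degrees)
    k2 = sum(d * d for d in degrees) / len(degrees)
    return k1 / (k2 - k1)

# Homogeneous network: every node has degree 6.
homog = [6] * 10_000

# Heavy-tailed degrees sampled from p_k proportional to k^(-3), k = 3..1000.
rng = random.Random(0)
ks = range(3, 1001)
heavy = rng.choices(list(ks), weights=[k ** -3 for k in ks], k=10_000)

print(f"homogeneous threshold:  {epidemic_threshold(homog):.3f}")  # 6/(36-6) = 0.2
print(f"heavy-tailed threshold: {epidemic_threshold(heavy):.3f}")
```

Raising the degree cutoff pushes \(\langle k^2 \rangle\) higher and the heavy-tailed threshold lower, which is the finite-size shadow of the vanishing threshold in the \(n \to \infty\) limit.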
Live experiment: SI diffusion on ER, WS, and BA graphs
The cell below builds one network of each type (100 nodes each, comparable density), seeds a single infected node, and simulates the SI diffusion process for 30 time steps. It then plots the fraction of infected nodes over time for each network type. Run it and compare the speed and eventual reach of the spreading process.
Reading the output. The BA network (scale-free, red dashed-dotted line) typically shows the fastest early spread. This happens because the seed node may or may not be a hub, but the hubs are always present and quickly become infected and then infect many of their neighbors simultaneously. The Watts–Strogatz small-world network (blue dashed line) often shows faster diffusion than ER at early steps because its short average path length means any infected node quickly reaches geographically distant parts of the network. The ER graph (gray solid line) lies somewhere between the two, depending on parameter values. What is most important is the asymptotic behavior: in all three cases diffusion reaches a large fraction of the network, but the speed of diffusion depends critically on network structure.
Before running the cell, predict: which network type will show the highest fraction infected after just 5 time steps? Think about which structural property — hubs, short paths, or neither — gives diffusion the biggest early advantage. Write down your prediction, then run the cell and check.
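The diffusion cell is not embedded in this copy; the following is a compact standard-library stand-in. The three generators are redefined inline so the cell is self-contained, and the parameters (\(\beta = 0.2\), densities matched only roughly) are illustrative choices, not necessarily the notebook's.

```python
import random

def er(n, p, rng):
    """Erdos-Renyi G(n, p)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j); adj[j].add(i)
    return adj

def ws(n, k, beta, rng):
    """Watts-Strogatz: ring lattice with each lattice edge rewired w.p. beta."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            if rng.random() < beta:
                j = rng.randrange(n)
                while j == i or j in adj[i]:
                    j = rng.randrange(n)
            adj[i].add(j); adj[j].add(i)
    return adj

def ba(n, m, rng):
    """Barabasi-Albert preferential attachment (endpoint-list trick)."""
    adj = {i: set() for i in range(m + 1)}
    ends = []
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j); adj[j].add(i); ends += [i, j]
    for u in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        adj[u] = set()
        for t in targets:
            adj[u].add(t); adj[t].add(u); ends += [u, t]
    return adj

def si_spread(adj, beta, steps, rng, seed=0):
    """SI dynamics: each S-I neighbor pair transmits w.p. beta per step.
    Returns the infected fraction after each step."""
    infected = {seed}
    history = []
    for _ in range(steps):
        new = {v for u in infected for v in adj[u]
               if v not in infected and rng.random() < beta}
        infected |= new
        history.append(len(infected) / len(adj))
    return history

rng = random.Random(1)
n = 100
graphs = {"ER": er(n, 0.04, rng), "WS": ws(n, 4, 0.1, rng), "BA": ba(n, 2, rng)}
for name, g in graphs.items():
    frac = si_spread(g, beta=0.2, steps=30, rng=rng)
    print(f"{name}: infected after 5 steps = {frac[4]:.2f}, after 30 = {frac[-1]:.2f}")
```

Because diffusion is stochastic, a single run can be misleading; averaging the curves over many seeds (as a careful version of this cell would) makes the BA early-speed advantage much cleaner.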
Mini Case Study: Fitting the Three Models to a Real Graph
Setup: the Karate Club network
Zachary’s karate club network is one of the most studied social graphs in network science. In a study published in 1977, Wayne Zachary documented the social interactions among 34 members of a university karate club observed over a three-year period. He recorded a tie between two members if they had a documented social interaction outside of club activities. The network has 34 nodes and 78 edges, and it famously split into two factions (around the instructor and around the club president) following a dispute over fees — a real-world example of community structure, which we studied in Chapter 3.
Here we use the karate club network for a different purpose: as a reference against which to compare the structural properties of the three generative models. For each model, we will grow a graph with the same number of nodes and approximately the same number of edges as the karate club network, and then compare three statistics: average clustering coefficient, average shortest path length, and degree distribution shape.
Comparing degree distributions visually
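The comparison cells are not reproduced in this copy. As a stand-in, the sketch below builds each model at the karate club's scale (34 nodes, roughly 78 edges; the commonly reported reference values are an average clustering of about 0.57 and a maximum degree of 17) and prints summary statistics averaged over 20 seeds. The parameter choices (\(k = 4\), \(\beta = 0.2\), \(m = 2\)) only approximately match the edge count; exact numbers vary with the seed.

```python
import random

def er(n, p, rng):
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j); adj[j].add(i)
    return adj

def ws(n, k, beta, rng):
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            if rng.random() < beta:
                j = rng.randrange(n)
                while j == i or j in adj[i]:
                    j = rng.randrange(n)
            adj[i].add(j); adj[j].add(i)
    return adj

def ba(n, m, rng):
    adj = {i: set() for i in range(m + 1)}
    ends = []
    for i in range(m + 1):
        for j in range(i + 1, m + 1):
            adj[i].add(j); adj[j].add(i); ends += [i, j]
    for u in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        adj[u] = set()
        for t in targets:
            adj[u].add(t); adj[t].add(u); ends += [u, t]
    return adj

def avg_clustering(adj):
    total = 0.0
    for u, nbrs in adj.items():
        if len(nbrs) < 2:
            continue
        links = sum(1 for v in nbrs for w in nbrs if v < w and w in adj[v])
        total += 2.0 * links / (len(nbrs) * (len(nbrs) - 1))
    return total / len(adj)

n, reps = 34, 20
rng = random.Random(0)
clust = {"ER": [], "WS": [], "BA": []}
maxdeg = {"ER": [], "WS": [], "BA": []}
for _ in range(reps):
    for name, g in (("ER", er(n, 78 / (n * (n - 1) / 2), rng)),
                    ("WS", ws(n, 4, 0.2, rng)),
                    ("BA", ba(n, 2, rng))):
        clust[name].append(avg_clustering(g))
        maxdeg[name].append(max(len(v) for v in g.values()))
print("karate club reference: clustering ~ 0.57, max degree 17")
for name in clust:
    print(f"{name}: mean clustering = {sum(clust[name]) / reps:.3f}, "
          f"mean max degree = {sum(maxdeg[name]) / reps:.1f}")
```

The pattern the next subsection interprets shows up directly: WS beats ER on clustering, BA beats both on maximum degree, and no single model matches the real network on both statistics at once.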
Interpreting the comparison
After running the two cells above, a clear pattern emerges, and it is worth reading carefully because it captures the central lesson of this chapter.
Erdős–Rényi matches the real network on mean degree (by construction) but produces a symmetric, bell-shaped degree distribution tightly concentrated around the mean. The karate club network has a much more spread-out distribution with a visible tail — a few nodes with degree 12 or higher (the club officers and central members) that the ER model cannot reproduce. The ER clustering coefficient is also consistently lower than the real network’s clustering coefficient, often by a factor of two to three.
Watts–Strogatz does substantially better on clustering. With \(\beta = 0.2\) in the small-world regime, the WS graph produces a clustering coefficient close to the real network’s value, and its average path length is also a reasonable match for a network this small. The weakness of WS for this network is the degree distribution: the ring lattice construction forces all degrees to be close to \(k\), producing an even more concentrated distribution than ER. The karate club’s degree heterogeneity is still not captured.
Barabási–Albert captures the tail of the degree distribution best — the right skew and the presence of high-degree nodes are natural outputs of preferential attachment. However, with only 34 nodes the power-law tail does not yet manifest as clearly as it does in larger networks. The BA model also produces the lowest clustering coefficient of the three, which is a known weakness: preferential attachment does not generate triangles, so BA graphs have near-zero clustering for small \(n\).
What this tells us about the karate club. The real network sits in a regime that none of the three canonical models fully captures: it has more clustering than ER or BA, more degree heterogeneity than WS, and shorter paths than a pure lattice. This is consistent with a social formation process that combines at least two mechanisms: triadic closure (a friend of a friend becomes your friend — contributing to the high clustering) and mild status effects (central members attract more connections — contributing to the degree heterogeneity). This combination points toward models like the Jackson–Rogers model (2007), which combines random meeting with network-based meeting, or the fitness model of Bianconi and Barabási (2001), in which nodes have heterogeneous intrinsic attractiveness in addition to preferential attachment. These extended models are beyond the scope of this chapter, but the diagnostic exercise we just performed is exactly how network scientists use generative models in practice.
In business and policy applications, fitting generative models to observed networks is not merely an academic exercise. At a financial regulator, fitting an interbank exposure network to an ER or BA model tells you whether systemic risk arises from dense random cross-holdings (ER-like) or from a few heavily connected banks that serve as hubs (BA-like) — and the policy response is different in each case. In platform design, understanding whether your user-follow graph looks more like a small-world or a scale-free network tells you whether recommendations should prioritize local exploration (triadic closure — “people like you also follow…”) or hub amplification (“followed by 2 million people”). The three models in this chapter are the vocabulary for those conversations.
Summary
This chapter built three mathematical models of network formation and connected each one to the structural properties and social processes it captures.
The Erdős–Rényi random graph \(G(n, p)\) establishes the baseline. Edges form independently and uniformly at random; the degree distribution converges to a Poisson; and the network undergoes a sharp phase transition at \(p = 1/n\), at which point a giant connected component emerges. ER captures the idea of random, anonymous interaction, but misses the clustering and degree heterogeneity that characterize real social networks.
The Watts–Strogatz model adds local structure to the random baseline. Starting from a regular ring lattice and rewiring a small fraction of edges at random, the model produces networks that are simultaneously highly clustered (like a lattice) and have short average path lengths (like a random graph). This small-world property has been documented in brain connectomes, trade networks, co-authorship graphs, and the Hollywood actor network. The Milgram experiment and the Kevin Bacon game are empirical confirmations of the same phenomenon in human social networks.
The Barabási–Albert model captures the growth and preferential attachment that generate scale-free degree distributions. The master equation derivation shows that preferential attachment inevitably produces \(P(k) \propto k^{-3}\), regardless of the number of edges \(m\) added per step. Scale-free networks are robust to random failure but fragile to targeted attack on hubs — a structural property with direct implications for internet security, financial regulation, and epidemic control.
Diffusion on networks is faster and reaches more nodes in small-world and scale-free networks than in random graphs, for different structural reasons: small-world networks have short paths that let infections jump quickly across the network, while scale-free networks have hubs that simultaneously infect many neighbors.
The mini case study on the karate club network showed that real social networks typically combine features from multiple models — moderate clustering (suggesting triadic closure processes closer to WS) and moderate degree heterogeneity (suggesting some preferential attachment). Diagnosing which model is the better fit is the starting point for any deeper analysis of what social process generated the network.
Looking ahead. Chapter 5 takes the networks we have studied as given and asks how agents embedded in those networks learn, update beliefs, and converge (or fail to converge) to consensus. The structural properties developed in this chapter — path length, clustering, the presence of hubs — turn out to be central to the speed and accuracy of social learning. A Watts–Strogatz world, with its short paths and tight clusters, produces fast but potentially biased social learning; a world with strong hubs can generate information cascades that sweep the network regardless of private signals. Chapter 5 develops the formal models that make these intuitions precise.