Social Network Analysis in R

Author

Kevin Reuning

Published

May 3, 2024

For anyone discovering this, the code below here is pulled from quarto slides so there are a lot of headers. I urge you to use the Table of Contents on the right to navigate this. The beginning of each section should indicate what packages are being used. In general the following packages are used:

The course this was written for followed Analyzing Social Networks in R but did not make use of the xUCINET package as it appears to be no longer maintained (it is not on CRAN).

Basic R

Matrices in R

One basic format for network data is the matrix. You can make basic matrices using the matrix() function:

mat <- matrix(c(0, 1, 0, 
                0, 0, 0, 
                1, 1, 0), 
                nrow=3, ncol=3, 
                byrow=T)
mat
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    0    0    0
[3,]    1    1    0

Adding Names

The rownames() and colnames() functions are used to access or set the row names and column names.

rownames(mat) <- c("JoJo", "Frida", "Kevin ")
colnames(mat) <- rownames(mat)
mat
       JoJo Frida Kevin 
JoJo      0     1      0
Frida     0     0      0
Kevin     1     1      0

Using igraph

igraph Objects

Although we can do a lot with basic matrices, what we really want to create is an object that has the properties of a network.

The igraph package has a graph_from_adjacency_matrix() function used to create network data. It turns a matrix into an igraph object.

library(igraph)

Attaching package: 'igraph'
The following objects are masked from 'package:stats':

    decompose, spectrum
The following object is masked from 'package:base':

    union
net <- graph_from_adjacency_matrix(mat, mode="directed") 

graph_from_adjacency_matrix()

There are some options we can set:

  • mode=
    • "directed" directed network
    • "undirected" undirected network (the matrix should typically be symmetric; the related "upper" and "lower" options use only one triangle of the matrix)
  • weighted=
    • NULL (default) the numbers in the matrix give the number of edges between vertices.
    • TRUE creates edge weights.
  • diag= whether to include the diagonal, set to FALSE to ignore diagonals.
  • add.colnames=
    • NULL (default) use column names as the vertex names.
    • NA ignore the column names.
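To see what weighted= does, here is a small hypothetical example that reuses the idea of the 3x3 matrix above with one entry changed to 2:

```r
library(igraph)

# Same shape as the matrix above, but with a 2 in the first row
mat2 <- matrix(c(0, 2, 0,
                 0, 0, 0,
                 1, 1, 0),
               nrow = 3, byrow = TRUE)

# Default (weighted = NULL): each number is a count of edges,
# so the 2 becomes two parallel edges
net_multi <- graph_from_adjacency_matrix(mat2, mode = "directed")
ecount(net_multi)  # 4

# weighted = TRUE: one edge per nonzero entry, with the entry as its weight
net_w <- graph_from_adjacency_matrix(mat2, mode = "directed", weighted = TRUE)
ecount(net_w)      # 3
E(net_w)$weight    # one weight per edge, taken from the matrix
```

This is why weighted=TRUE matters for data like interaction counts: without it, a cell of 4 would create four separate edges rather than one edge with weight 4.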

We can call our network object to then see some information about it.

net
IGRAPH a75aeb6 DN-- 3 3 -- 
+ attr: name (v/c)
+ edges from a75aeb6 (vertex names):
[1] JoJo  ->Frida Kevin ->JoJo  Kevin ->Frida

The first line indicates a directed (D), named (N) network with 3 vertices and 3 edges. Below that are the attributes for the vertices (name) and some of the edges using the names of the vertices.

Loading Data

Let's load some more interesting data. We need to read in a CSV of an adjacency matrix (creating a data frame). Because we know we will make this into an adjacency matrix we set row.names=1 to convert the first column of the csv into row names.

The data is here and is an adjacency matrix based on interactions in the first episode of Amazon Prime’s Wheel of Time.

net_mat <- read.csv("network_data/wheel_of_time_ep1.csv", row.names=1)
# setting row.names=1, turns the first column into rownames
net_mat[1:5, 1:5] # look at first 5 rows and cols
         Moraine Lan Marin Nynavene Rand
Moraine        0   4     1        1    1
Lan            4   0     0        1    0
Marin          1   0     0        1    1
Nynavene       1   1     0        0    1
Rand           1   0     0        0    0

Converting to igraph object

We again use graph_from_adjacency_matrix to convert to an igraph object, but first wrap net_mat in as.matrix() because the function expects a matrix, not a data frame. Why do we set weighted=TRUE?

library(igraph)
net <- graph_from_adjacency_matrix(as.matrix(net_mat),
    mode="directed", weighted=TRUE) 

Converting Back to Adjacency Matrix

If we ever want to go back to the adjacency matrix we can (note that as.matrix() does not include the edge weights by default):

mat <- as.matrix(net)
mat[1:5,1:5]
5 x 5 sparse Matrix of class "dgCMatrix"
         Moraine Lan Marin Nynavene Rand
Moraine        .   1     1        1    1
Lan            1   .     .        1    .
Marin          1   .     .        1    1
Nynavene       1   1     .        .    1
Rand           1   .     .        .    .

Another Way to Access the Adj Matrix

The igraph object can also be indexed directly like an adjacency matrix (this version keeps the edge weights):

net[1:5,1:5]
5 x 5 sparse Matrix of class "dgCMatrix"
         Moraine Lan Marin Nynavene Rand
Moraine        .   4     1        1    1
Lan            4   .     .        1    .
Marin          1   .     .        1    1
Nynavene       1   1     .        .    1
Rand           1   .     .        .    .

Basic Information

What would we want to know about our network as a whole?

  • Number of edges, number of nodes?
  • Number of components?

Number of Edges and Vertices/Nodes

ecount() counts the number of edges, vcount() the number of vertices

ecount(net)
[1] 63
vcount(net)
[1] 22

edge_density() gives the proportion of possible edges that exist.

edge_density(net)
[1] 0.1363636

Number of Components

The count_components() function returns the number of components within a network.

count_components(net)
[1] 2

For directed data it defaults to weak components, we can also switch to strong components using mode=:

count_components(net, mode="strong")
[1] 7
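To see the weak/strong distinction on something small, here is a toy directed network (a hypothetical three-node cycle plus one extra vertex):

```r
library(igraph)

# A -> B -> C -> A is a cycle; C -> D is a one-way edge out of it
toy <- graph_from_literal(A -+ B, B -+ C, C -+ A, C -+ D)

count_components(toy, mode = "weak")   # 1: all four vertices connect if we ignore direction
count_components(toy, mode = "strong") # 2: {A, B, C} are mutually reachable, D is alone
```

D is in the weak component because an edge reaches it, but it is its own strong component because nothing can be reached from it.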

Simple Plot

We are going to work on this more later, but it is useful for us to be able to quickly visualize our data (and check components).

plot(net)

Making it Undirected (Symmetrizing)

We can also create an undirected network out of a directed network using as.undirected(). The way it creates these depends on mode=.

und_net <- as.undirected(net, mode="mutual") 
## Undirected edge exists only if both have an edge 
as.matrix(und_net)[1:5,1:5]
5 x 5 sparse Matrix of class "dgCMatrix"
         Moraine Lan Marin Nynavene Rand
Moraine        .   1     1        1    1
Lan            1   .     .        1    .
Marin          1   .     .        .    .
Nynavene       1   1     .        .    .
Rand           1   .     .        .    .

Switching it to mode="collapse"

und_net <- as.undirected(net, mode="collapse") 
## Undirected edge exists if either have an edge 
as.matrix(und_net)[1:5,1:5]
5 x 5 sparse Matrix of class "dgCMatrix"
         Moraine Lan Marin Nynavene Rand
Moraine        .   1     1        1    1
Lan            1   .     .        1    .
Marin          1   .     .        1    1
Nynavene       1   1     1        .    1
Rand           1   .     1        1    .

Access Vertices and Edges

We use the V() and E() functions to access the vertices or edges of our network (note the capitalization)

V(net)
+ 22/22 vertices, named, from 32d1bee:
 [1] Moraine        Lan            Marin          Nynavene       Rand          
 [6] Mat            Perrin         Egwene         False.Dragon   Eladia        
[11] Dying.Man      Tam            Laila          Tom            Brandelwyn    
[16] Cabin.Trolloc  Danya          Matt.s.Mom     Matt.s.Sisters Padan.Fain    
[21] Town.Trolloc   Daise         
E(net)
+ 63/63 edges from 32d1bee (vertex names):
 [1] Moraine ->Lan            Moraine ->Marin          Moraine ->Nynavene      
 [4] Moraine ->Rand           Moraine ->Mat            Moraine ->Perrin        
 [7] Moraine ->Egwene         Lan     ->Moraine        Lan     ->Nynavene      
[10] Marin   ->Moraine        Marin   ->Nynavene       Marin   ->Rand          
[13] Marin   ->Egwene         Nynavene->Moraine        Nynavene->Lan           
[16] Nynavene->Rand           Nynavene->Perrin         Nynavene->Egwene        
[19] Nynavene->Dying.Man      Rand    ->Moraine        Rand    ->Mat           
[22] Rand    ->Perrin         Rand    ->Egwene         Rand    ->Tam           
[25] Mat     ->Moraine        Mat     ->Rand           Mat     ->Perrin        
[28] Mat     ->Danya          Mat     ->Matt.s.Mom     Mat     ->Matt.s.Sisters
+ ... omitted several edges

Vertex and Edge Attributes

We can have attributes at three different levels of the network: whole network, vertex, and edges. We can get the full list of the different attributes using *_attr_names() functions:

vertex_attr_names(net)
[1] "name"
edge_attr_names(net)
[1] "weight"
graph_attr_names(net)
character(0)

Accessing an Attribute

There are several ways to access a vertex or edge attribute, but the easiest is something like V(net)$attr, where attr is the name of the attribute. You can do this with E() as well.

V(net)$name
 [1] "Moraine"        "Lan"            "Marin"          "Nynavene"      
 [5] "Rand"           "Mat"            "Perrin"         "Egwene"        
 [9] "False.Dragon"   "Eladia"         "Dying.Man"      "Tam"           
[13] "Laila"          "Tom"            "Brandelwyn"     "Cabin.Trolloc" 
[17] "Danya"          "Matt.s.Mom"     "Matt.s.Sisters" "Padan.Fain"    
[21] "Town.Trolloc"   "Daise"         
E(net)$weight
 [1] 4 1 1 1 1 1 1 4 1 1 1 1 2 1 1 1 1 3 1 1 3 4 4 3 1 3 4 1 2 2 1 1 3 3 4 1 1 4
[39] 4 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 2 1 2 1 2 1 1 1

Setting Attributes/Modifying Networks

You modify attributes in a way similar to how dataframes are modified in R: E(net)$attr <- "Good"

V(net)$type <- "Character"
V(net)$type
 [1] "Character" "Character" "Character" "Character" "Character" "Character"
 [7] "Character" "Character" "Character" "Character" "Character" "Character"
[13] "Character" "Character" "Character" "Character" "Character" "Character"
[19] "Character" "Character" "Character" "Character"

Indexing Vertices and Edges

Each vertex and edge can be indexed using [ ]. For example V(net)[2] will return the second vertex

V(net)[2]
+ 1/22 vertex, named, from 32d1bee:
[1] Lan
E(net)[2]
+ 1/63 edge from 32d1bee (vertex names):
[1] Moraine->Marin

Modifying Attributes

We can also use the indexing to modify the attributes:

V(net)$name[9] ## Missing a space
[1] "False.Dragon"
V(net)$name[9] <- "False Dragon"
V(net)$name ## Fixed
 [1] "Moraine"        "Lan"            "Marin"          "Nynavene"      
 [5] "Rand"           "Mat"            "Perrin"         "Egwene"        
 [9] "False Dragon"   "Eladia"         "Dying.Man"      "Tam"           
[13] "Laila"          "Tom"            "Brandelwyn"     "Cabin.Trolloc" 
[17] "Danya"          "Matt.s.Mom"     "Matt.s.Sisters" "Padan.Fain"    
[21] "Town.Trolloc"   "Daise"         

Handling Edges

Dealing with edges can be a bit more confusing; they also have IDs. The best way to identify an edge is by the vertices it is incident to.

Identifying Edge IDs

get.edge.ids() will give you the edges from one vertex to another (given using the vp= argument)

V(net)[5] # Rand
+ 1/22 vertex, named, from 32d1bee:
[1] Rand
V(net)[6] # Mat
+ 1/22 vertex, named, from 32d1bee:
[1] Mat
edid <- get.edge.ids(net, vp=c(V(net)[5], V(net)[6]))
edid
[1] 21
E(net)[edid]
+ 1/63 edge from 32d1bee (vertex names):
[1] Rand->Mat

Getting all Incident Edges

incident() is used to get all edges that are incident to a vertex. You can set mode= to decide what to do with directed edges.

incident(net, V(net)[5], mode="in")
+ 7/63 edges from 32d1bee (vertex names):
[1] Moraine ->Rand Marin   ->Rand Nynavene->Rand Mat     ->Rand Perrin  ->Rand
[6] Egwene  ->Rand Tam     ->Rand
incident(net, V(net)[5], mode="out")
+ 5/63 edges from 32d1bee (vertex names):
[1] Rand->Moraine Rand->Mat     Rand->Perrin  Rand->Egwene  Rand->Tam    
incident(net, V(net)[5], mode="all")
+ 12/63 edges from 32d1bee (vertex names):
 [1] Rand    ->Moraine Moraine ->Rand    Marin   ->Rand    Nynavene->Rand   
 [5] Rand    ->Mat     Mat     ->Rand    Rand    ->Perrin  Perrin  ->Rand   
 [9] Rand    ->Egwene  Egwene  ->Rand    Rand    ->Tam     Tam     ->Rand   

Basic Network Statistics

Degree

We set mode="in" for the number of edges pointing towards a node, mode="out" for the number pointing out, and mode="all" for all.

It can be useful to then add this back to our network data.

degree(net) [1:5] ## Show first 5 so I don't go off screen
 Moraine      Lan    Marin Nynavene     Rand 
      13        4        8       13       12 
mean(degree(net)) # average degree
[1] 5.727273
V(net)$degree <- degree(net)
V(net)$indegree <- degree(net, mode="in")
V(net)$outdegree <- degree(net, mode="out")

Component Membership

We can do the same thing but with membership in different components. We are going to use the components() function, which returns an object with information about the components.

comps <- components(net, mode="strong")
comps$no #number of components 
[1] 7
comps$membership #Which component each vertex is in
       Moraine            Lan          Marin       Nynavene           Rand 
             4              4              4              4              4 
           Mat         Perrin         Egwene   False Dragon         Eladia 
             4              4              4              3              3 
     Dying.Man            Tam          Laila            Tom     Brandelwyn 
             5              4              4              6              4 
 Cabin.Trolloc          Danya     Matt.s.Mom Matt.s.Sisters     Padan.Fain 
             7              4              4              4              4 
  Town.Trolloc          Daise 
             2              1 
V(net)$components <- comps$membership 

Tangent

One thing I like to do is convert numbers into letters for labels. All R sessions have two built-in vectors, letters and LETTERS, containing the letters from a to z (lower and upper case).

letters[c(1, 3, 5)]
[1] "a" "c" "e"
LETTERS[comps$membership]
 [1] "D" "D" "D" "D" "D" "D" "D" "D" "C" "C" "E" "D" "D" "F" "D" "G" "D" "D" "D"
[20] "D" "B" "A"

Cutpoints

We can check which vertices are cutpoints using the articulation_points() function.

For directed networks this can only identify cutpoints for weak components.

articulation_points(net)
+ 5/22 vertices, named, from 32d1bee:
[1] Tam      Egwene   Daise    Mat      Nynavene
cut_points <- articulation_points(net)
V(net)$cuts <- FALSE 
V(net)$cuts[cut_points] <- TRUE
V(net)$cuts
 [1] FALSE FALSE FALSE  TRUE FALSE  TRUE FALSE  TRUE FALSE FALSE FALSE  TRUE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE

Bridges

We can also identify bridges, using the bridges() function

For directed networks this only can identify bridges for weak components.

bridges(net)
+ 5/63 edges from 32d1bee (vertex names):
[1] Tam     ->Cabin.Trolloc Egwene  ->Tom           Daise   ->Town.Trolloc 
[4] Daise   ->Egwene        Nynavene->Dying.Man    
bridge_edges <- bridges(net)
E(net)$bridges <- FALSE 
E(net)$bridges[bridge_edges] <- TRUE
E(net)$bridges
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[61] FALSE  TRUE  TRUE

Retrieving Data from Networks

We’ve saved a lot of useful information about our vertices in our network. It is often easier to work with this info as a dataframe though. We use as_data_frame() to convert a network into a dataframe and set what="vertices" to give us all that wonderful vertex info (we can even save it).

df_out <- as_data_frame(net, what="vertices")
tail(df_out, 5)
                         name      type degree indegree outdegree components
Matt.s.Mom         Matt.s.Mom Character      3        1         2          4
Matt.s.Sisters Matt.s.Sisters Character      3        2         1          4
Padan.Fain         Padan.Fain Character      2        1         1          4
Town.Trolloc     Town.Trolloc Character      1        1         0          2
Daise                   Daise Character      2        0         2          1
                cuts
Matt.s.Mom     FALSE
Matt.s.Sisters FALSE
Padan.Fain     FALSE
Town.Trolloc   FALSE
Daise           TRUE
write.csv(df_out, "vertices.csv", row.names=F)

Geodesic Distances

Calculating Distances

The last thing for this week is calculating geodesic distances. We use the distances() function to do that. If we set mode="out" then the resulting matrix can be read as the distances from the row to the column.

dist <- distances(net, mode="out")
dist[1:5, 1:5] 
         Moraine Lan Marin Nynavene Rand
Moraine        0   2     1        1    1
Lan            2   0     3        1    2
Marin          1   2     0        1    1
Nynavene       1   1     2        0    1
Rand           1   3     2        2    0

Average Distances

To calculate the average distance we can use the mean_distance() function. It automatically ignores pairs of vertices that are not connected. We can treat the network as directed or undirected.

mean_distance(net, directed=TRUE)
[1] 2.978261
mean_distance(net, directed=FALSE)
[1] 2.827225

Distances Matrix Issues

One problem with the distances() function is that it returns unconnected distances as Inf, which can make it impossible to calculate summary statistics. We can use is.infinite() and subsetting to replace those values with NA, which we can exclude more easily.

max(dist, na.rm=T) # BROKEN 
[1] Inf
dist[is.infinite(dist)] <- NA
max(dist, na.rm=T) # not broken
[1] 6

Loading Edge Lists

Another type of network data is the edge list format, where each row gives an edge as “start_of_edge, end_of_edge” (tail to head).

edge <- as_edgelist(net)
edge[1:8,]
     [,1]      [,2]      
[1,] "Moraine" "Lan"     
[2,] "Moraine" "Marin"   
[3,] "Moraine" "Nynavene"
[4,] "Moraine" "Rand"    
[5,] "Moraine" "Mat"     
[6,] "Moraine" "Perrin"  
[7,] "Moraine" "Egwene"  
[8,] "Lan"     "Moraine" 
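Going the other direction is also possible: graph_from_edgelist() turns a two-column matrix back into an igraph object (a small hypothetical example):

```r
library(igraph)

# Each row is one edge, tail in column 1 and head in column 2
el <- matrix(c("A", "B",
               "B", "C",
               "C", "A"), ncol = 2, byrow = TRUE)
g <- graph_from_edgelist(el, directed = TRUE)
vcount(g)  # 3 vertices, named from the labels in the edge list
ecount(g)  # 3 edges
```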

Edge List Data

Edge lists are common ways of providing network data.

I have data that shows connections between members of the 134th Ohio Senate by the number of bills they cosponsored with each other.

edge_data <- read.csv("network_data/OH_134_cosponsor.csv")
head(edge_data)
            .tail         .head weight
1      Kenny Yuko Jay Hottinger    129
2   Matthew Dolan Jay Hottinger    107
3    Vernon Sykes Jay Hottinger     91
4 Sandra Williams Jay Hottinger     53
5    Teresa Fedor Jay Hottinger     70
6  Robert Hackett Jay Hottinger    136

Edge List Data

We can create the network from this data frame with graph_from_data_frame(), indicating that this is undirected data.

leg_net <- graph_from_data_frame(edge_data, directed=FALSE)
leg_net
IGRAPH 5063edf UNW- 34 560 -- 
+ attr: name (v/c), weight (e/n)
+ edges from 5063edf (vertex names):
 [1] Kenny Yuko      --Jay Hottinger Matthew Dolan   --Jay Hottinger
 [3] Vernon Sykes    --Jay Hottinger Sandra Williams --Jay Hottinger
 [5] Teresa Fedor    --Jay Hottinger Robert Hackett  --Jay Hottinger
 [7] Andrew Brenner  --Jay Hottinger Kristina Roegner--Jay Hottinger
 [9] Terry Johnson   --Jay Hottinger Nickie Antonio  --Jay Hottinger
[11] Bob Peterson    --Jay Hottinger Louis Blessing  --Jay Hottinger
[13] Stephanie Kunze --Jay Hottinger Mark Romanchuk  --Jay Hottinger
[15] Bill Reineke    --Jay Hottinger Hearcel Craig   --Jay Hottinger
+ ... omitted several edges

Adding Vertex Attributes

With an edge list we can also easily add in information about each node/vertex. Here I load a second dataset OH_134_people.csv which has information about each individual. The first column needs to be the vertex names.

vertex_data <- read.csv("network_data/OH_134_people.csv")
leg_net <- graph_from_data_frame(edge_data, directed=FALSE, 
            vertices=vertex_data)
leg_net
IGRAPH 1fa10a5 UNW- 34 560 -- 
+ attr: name (v/c), people_id (v/n), first_name (v/c), middle_name
| (v/c), last_name (v/c), suffix (v/c), nickname (v/c), party_id (v/n),
| party (v/c), role_id (v/n), role (v/c), district (v/c),
| followthemoney_eid (v/n), votesmart_id (v/n), opensecrets_id (v/l),
| ballotpedia (v/c), knowwho_pid (v/n), committee_id (v/n), weight
| (e/n)
+ edges from 1fa10a5 (vertex names):
[1] Jay Hottinger--Kenny Yuko      Matthew Dolan--Jay Hottinger  
[3] Jay Hottinger--Vernon Sykes    Jay Hottinger--Sandra Williams
[5] Jay Hottinger--Teresa Fedor    Jay Hottinger--Robert Hackett 
+ ... omitted several edges

Using ggplot2

ggplot2 is the library for making figures in R. The following slides give you an introduction to how to use it.

Starting a Plot

All plots start with the ggplot() function, which takes the dataframe you are going to use as the first argument. By itself it doesn’t do anything.

library(ggplot2)
ggplot(df_out)

Indicating What Variables

The other common part of a ggplot() call is a call to another function aes(). This function maps parts of your dataframe onto parts of the plot. In the below example I map the indegree variable to the x axis and the outdegree variable to the y axis.

ggplot(df_out, aes(x=indegree, y=outdegree))

Adding a Geom

We use geom_*() functions to actually add things to the plot. To do this we literally add the geom_point() function to our previous call

ggplot(df_out, aes(x=indegree, y=outdegree)) + 
    geom_point()

Changing Aesthetics for Everything

We can change things about the points by adding other arguments to geom_point() (there are a lot of options, scroll to Aesthetics here to see them).

ggplot(df_out, aes(x=indegree, y=outdegree)) + 
    geom_point(color="steelblue4", size=6)

Changing Aesthetics for Some Things

We can also use the aes() function within the geom_point() call to have different aesthetics mapped to our data. The below will change the color of the dots depending on whether they are a cutpoint or not.

ggplot(df_out, aes(x=indegree, y=outdegree)) + 
    geom_point(aes(color=cuts), size=6)

Changing Geoms

The power of ggplot is all the geoms. Let’s change geom_point() to geom_jitter(), which bumps the dots around, and add on another one, geom_smooth(), to show the trend.

ggplot(df_out, aes(x=indegree, y=outdegree)) + 
    geom_jitter(aes(color=cuts), size=6) + 
    geom_smooth(method="lm")
`geom_smooth()` using formula = 'y ~ x'

Where aes() Goes Matters

Look what happens when you move the color=cuts part to the ggplot() call. Any aes() mappings made there affect all geoms.

ggplot(df_out, aes(x=indegree, y=outdegree, 
    color=cuts)) + 
    geom_jitter(size=6) + 
    geom_smooth(method="lm")
`geom_smooth()` using formula = 'y ~ x'

Adding Labels

We can add or modify the labels on our plot using the labs() function.

ggplot(df_out, aes(x=indegree, y=outdegree)) + 
    geom_jitter(aes(color=cuts), size=6) + 
    geom_smooth(method="lm") + 
    labs(y="Out Degree", x="In Degree", 
        title="Out vs In Degree")
`geom_smooth()` using formula = 'y ~ x'

Editing the Scale

There are also scale_*_*() functions that can be used to change the style and labels of the scales.

ggplot(df_out, aes(x=indegree, y=outdegree)) + 
    geom_jitter(aes(color=cuts), size=6) + 
    geom_smooth(method="lm") + 
    labs(y="Out Degree", x="In Degree", 
        title="Out vs In Degree") + 
    scale_color_brewer("Cut Point?", 
        type="qual", palette=2)
`geom_smooth()` using formula = 'y ~ x'

Themes

Finally, you can use theme_*() functions to change the overall style of the plot.

ggplot(df_out, aes(x=indegree, y=outdegree)) + 
    geom_jitter(aes(color=cuts), size=6) + geom_smooth(method="lm") + 
    labs(y="Out Degree", x="In Degree", 
        title="Out vs In Degree") + 
    scale_color_brewer("Cut Point?", 
        type="qual", palette=2) + theme_minimal()
`geom_smooth()` using formula = 'y ~ x'

Scaling and Visualization

Doing this in R

There are three main functions we will need for this:

  • mds() from the smacof package.
  • CA() from the FactoMineR package.
  • hclust() a base R function (along with cutree() and dist()).

MDS Scaling

The mds() function in smacof lets us estimate multiple types of MDS:

  • mds(distance, type="interval") - Metric MDS
  • mds(distance, type="ordinal") - Non-Metric MDS

In both cases the distance object needs to be a symmetric dissimilarity (distance) matrix. Larger values mean observations are more different from each other.

You select the number of dimensions using ndim=
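If your data is instead a similarity matrix (larger = more alike), you have to reverse it first. A minimal base-R sketch of that conversion, using a hypothetical similarity matrix (smacof also ships a sim2diss() helper with more options):

```r
# Hypothetical similarity matrix: larger values = more similar
sim <- matrix(c(1.0, 0.8, 0.2,
                0.8, 1.0, 0.4,
                0.2, 0.4, 1.0), nrow = 3)

# Reverse it so larger values mean more different, as mds() expects
diss <- max(sim) - sim
diag(diss) <- 0  # an observation is at distance 0 from itself
diss
```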

Recreating a Map from Distances

R has a built-in object, UScitiesD, that you can call at any time; it gives the straight-line distances between several US cities.

# UScitiesD #Run this by itself to see 
library(smacof)
mds <- mds(UScitiesD, type="interval")
mds$stress # The stress
[1] 0.001913151
mds$conf # the actual points
                      D1          D2
Atlanta       -0.4543817  0.09024782
Chicago       -0.2415148 -0.21576888
Denver         0.3051669 -0.01607968
Houston       -0.1021788  0.36232737
LosAngeles     0.7612104  0.24459966
Miami         -0.7164226  0.36679903
NewYork       -0.6783018 -0.32755339
SanFrancisco   0.8982951  0.06986850
Seattle        0.8477661 -0.36291042
Washington.DC -0.6196388 -0.21153000

Recreating a Map from Distances

We can plot this with ggplot2

library(ggplot2)
points <- as.data.frame(mds$conf)
ggplot(points, aes(x=D1, y=D2)) + 
    geom_point() + 
    theme_minimal()

Let’s add some labels

points$names <- rownames(points)
ggplot(points, aes(x=D1, y=D2)) + 
    geom_point() + geom_text(aes(label=names)) + 
    theme_minimal()

Flip the scales

ggplot(points, aes(x=-D1, y=-D2)) + 
    geom_text(aes(label=names)) + 
    theme_minimal()

Finding the Number of MDS Dimensions

UN Data

For the next section we are going to use data on UN Votes (this is in an RData format and holds a matrix of vote dissimilarities).

UN_votes <- readRDS("network_data/distance_1960s.RData")

out <- mds(UN_votes, ndim=2, type="ordinal")
out$stress
[1] 0.1247679

Estimating Many Models

If we want to check different levels of stress we could estimate several models like this:

out_1 <- mds(UN_votes, ndim=1, type="ordinal")
out_2 <- mds(UN_votes, ndim=2, type="ordinal")
out_3 <- mds(UN_votes, ndim=3, type="ordinal")
out_4 <- mds(UN_votes, ndim=4, type="ordinal")

This is repetitive, and good code minimizes repetition.

Using a Loop

We can use a for loop to repeat some lines of code multiple times changing things as we do:

Basic idea:

for(ii in 1:5){
    print(ii)
}
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5

More Complicated Loop

We want to create something to store the results and then store it every time we go through the loop.

out <- numeric(5) ## Vector of length 5
for(ii in 1:5){
    out[ii] <- ii
}
print(out)
[1] 1 2 3 4 5

Using a Loop to Estimate Models

Now we use the same pattern to run the models, storing the stress from each one as we go:

out <- numeric(5) ## Vector of length 5
for(ii in 1:5){
    tmp <- smacof::mds(UN_votes, ndim=ii, type="ordinal")
    out[ii] <- tmp$stress
}
print(out)
[1] 0.32174980 0.12476788 0.08293623 0.06374924 0.05249708

Scree Plot

I use the basic plot() function to make a scree plot:

plot(x=1:5, y=out, ylab="Stress", 
    xlab="Dimensions", 
    type="b")

Hierarchical Clustering

The hclust() function needs a distance object to work. Our UN_votes data already shows “distances” but isn’t a distance object so we use as.dist() to convert it.

hcl <- hclust(as.dist(UN_votes))
plot(hcl) ## creates a dendrogram 

Methods of Aggregation

You can change the method with the method= argument (“single”, “complete”, “average”)

hcl <- hclust(as.dist(UN_votes), method="single")
plot(hcl) ## creates a dendrogram 

Creating Groupings

The cutree() function “cuts the tree” to make clusters. Set k= to set the number of groups you want.

groups <- cutree(hcl, k=4)
groups[1:20]
AFG ALB DZA ARG AUS AUT BRB BLR BEL BEN BOL BWA BRA BGR BFA BDI CIV KHM CMR CAN 
  1   2   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1   3   1   1 

Correspondence Analysis

For correspondence analysis we use the CA() function, and can set the number of dimensions to estimate with ncp= (feel free to set this to something high).

The data here shows the number of each country’s EU MEPs in each party group.

df <- read.csv("network_data/EU_seats.csv", row.names=1)
library(FactoMineR)

ca <- CA(df, ncp=5)
Warning: ggrepel: 10 unlabeled data points (too many overlaps). Consider
increasing max.overlaps

Correspondence Analysis - Dimensions

We can see how many dimensions to use by looking at the $eig element of the output.

ca$eig
      eigenvalue percentage of variance cumulative percentage of variance
dim 1 0.29470402              29.073034                          29.07303
dim 2 0.23559433              23.241767                          52.31480
dim 3 0.17732184              17.493090                          69.80789
dim 4 0.12367321              12.200564                          82.00846
dim 5 0.09625459               9.495673                          91.50413
dim 6 0.05514655               5.440298                          96.94443
dim 7 0.03097337               3.055574                         100.00000

Plotting One Set

If we want to extract the scales we use output$row$coord or output$col$coord. We can plot them easily by converting to a data.frame and sticking it into ggplot.

countries <- as.data.frame(ca$row$coord)
ggplot(countries, aes(x=`Dim 1`, y=`Dim 2`)) + 
    geom_point(size=5) + theme_minimal() 

Plotting Both

If we want to plot them both we need to combine the two data.frames (using rbind()) giving them labels first:

countries <- as.data.frame(ca$row$coord)
countries$Type <- "Country" 
countries$Name <- rownames(countries)
parties <- as.data.frame(ca$col$coord)
parties$Type <- "Party" 
parties$Name <- rownames(parties)

all <- rbind(countries, parties)

ggplot(all, aes(x=`Dim 1`, y=`Dim 2`, color=Type)) + 
    geom_point(size=5) + theme_minimal() 

Visualizing Networks

Now we are going to move onto visualizing networks.

Visualizations can be useful with network data, but they are also hard to do:

  • We have complicated data.
  • We often want to show multiple “types” of information.

We are going to use the ggraph package.

Benefits:

  • It uses a ggplot2 style interface.
  • It allows a lot of fine-tuning of plots.
  • Has a fair amount of useful online documentation on layouts, nodes, and edges

Cons:

  • It is a bit overly complicated at times.

We are going to use a small canned dataset you can download from the internet: strike.paj

It is a communication network between workers at a sawmill. It also comes in a less common data format, “Pajek”, which thankfully igraph has a function for.

library(igraph)
net <- read_graph("network_data/strike.paj", format="pajek")
Warning in read.graph.pajek(file, ...): At
src/vendor/cigraph/src/io/pajek-lexer.l:130 : Skipping unknown section
'*Partition' on line 68.
ecount(net)
[1] 38
vcount(net)
[1] 24

Basics of Plot

Just like ggplot2 all visualizations will start with a call to ggraph()

library(ggraph)
theme_set(theme_graph())
ggraph(net)
Using "stress" as default layout
Warning: Existing variables `x` and `y` overwritten by layout variables

Adding Nodes and Edges

To add nodes and edges to this plot we will use geom_node_point() and geom_edge_link()

  • geom_node_point: Adds our nodes as circles
  • geom_edge_link: Adds edges as straight lines (no arrows)
ggraph(net) + geom_node_point(size=6) +
    geom_edge_link()
Using "stress" as default layout
Warning: Existing variables `x` and `y` overwritten by layout variables
Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.

Layouts

Laying out a plot can greatly impact how useful it is.


Layouts: Two Broad Approaches

  • Dimension Reduction: Use multivariate techniques to scale into two dimensions
    • MDS, Pivot Scaling, Eigenvector
  • Force-Directed: Simulates a physical process
    • Fruchterman and Reingold, Kamada and Kawai, Davidson-Harel, DrL, Large Graph Layout (LGL), Graphopt, and GEM

Force-Directed

Most of these layouts do something like the following:

  • Each node repulses all other nodes.
  • Edges pull two nodes together.
  • The balance of these forces means that densely connected groups of nodes end up close together, while sparsely connected groups end up far apart.

Fruchterman and Reingold Example

FR views vertexes as “atomic particles or celestial bodies, exerting attractive and repulsive forces from one another”.

How does this algorithm work?

  1. Calculate the amount of repulsion between all nodes.
  2. Calculate the amount of attraction between all adjacent nodes.
  3. Move nodes based on the weight of attraction and repulsion, but limit the amount of movement by a temperature.
  4. Reduce the temperature, go back to step 1.
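
The steps above can be sketched in a few lines of base R. This is only a toy illustration, not igraph's actual implementation; the tiny star network, starting temperature, and cooling rate are all made up for demonstration.

```r
# Toy Fruchterman-Reingold sketch (NOT igraph's implementation):
# a 3-node star, random starting positions, 50 cooling steps.
set.seed(1)
adj <- matrix(c(0, 1, 1,
                1, 0, 0,
                1, 0, 0), nrow=3, byrow=TRUE)
pos <- matrix(runif(6), nrow=3)  # random (x, y) for each node
k <- sqrt(1 / nrow(adj))         # ideal distance between nodes
temp <- 0.1                      # the "temperature"
for (step in 1:50) {
  disp <- matrix(0, nrow=3, ncol=2)
  for (i in 1:3) for (j in 1:3) {
    if (i == j) next
    d <- pos[i, ] - pos[j, ]
    len <- sqrt(sum(d^2)) + 1e-9
    disp[i, ] <- disp[i, ] + (d / len) * (k^2 / len)    # repulsion (all pairs)
    if (adj[i, j] == 1) {
      disp[i, ] <- disp[i, ] - (d / len) * (len^2 / k)  # attraction (edges only)
    }
  }
  for (i in 1:3) {  # move, but cap movement at the temperature
    m <- sqrt(sum(disp[i, ]^2)) + 1e-9
    pos[i, ] <- pos[i, ] + (disp[i, ] / m) * min(m, temp)
  }
  temp <- temp * 0.95  # cool down, go back to step 1
}
pos  # final layout coordinates
```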

Fruchterman and Reingold Example

set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr", 
    niter=1) 
ggplot(lo) + geom_node_point(size=6) +
    geom_edge_link() + labs(title="1 Iteration")
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr", 
    niter=2) 
ggplot(lo) + geom_node_point(size=6) +
    geom_edge_link() + labs(title="2 Iterations")
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr", 
    niter=3) 
ggplot(lo) + geom_node_point(size=6) +
    geom_edge_link() + labs(title="3 Iterations")
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr", 
    niter=4) 
ggplot(lo) + geom_node_point(size=6) +
    geom_edge_link() + labs(title="4 Iterations")

set.seed(1)

lo <- create_layout(net, layout="igraph", algorithm="fr", 
    niter=10) 
ggplot(lo) + geom_node_point(size=6) +
    geom_edge_link() + labs(title="10 Iterations")
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr", 
    niter=25) 
ggplot(lo) + geom_node_point(size=6) +
    geom_edge_link() + labs(title="25 Iterations")
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr", 
    niter=50) 
ggplot(lo) + geom_node_point(size=6) +
    geom_edge_link() + labs(title="50 Iterations")
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr", 
    niter=100) 
ggplot(lo) + geom_node_point(size=6) +
    geom_edge_link() + labs(title="100 Iterations")

Setting Layouts

To set the layout, set layout= to the layout you want; you can also pass additional arguments as necessary.

If you want to create the exact same layout every time, run set.seed() directly prior to making the plot. This sets the “random seed” that is used.

set.seed(1)
ggraph(graph=net, layout="fr", niter=250) + 
  geom_edge_link() +  geom_node_point(size=6)
Warning: Existing variables `x` and `y` overwritten by layout variables

Large Network - DNC

Network of DNC emails from here.

dnc_net <- read_graph("network_data/dnc.gml", format='gml')
Warning in read.graph.gml(file, ...): At vendor/cigraph/src/io/gml.c:149 : One
or more unknown entities will be returned verbatim (&NewLine;).
ggraph(graph=dnc_net, "graphopt") + 
  geom_edge_link() +  geom_node_point()

Large Network - Only Main Component

We can use the largest_component() function to grab just that part. Also, the |> is a pipe, which passes the output of the left-hand side to the function on the right.

dnc_net |> largest_component() |> 
    ggraph("fr") + 
    geom_edge_link() +  geom_node_point()
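
If the base pipe is new to you, here is a minimal sketch of how it works (sort() and rev() are just stand-in examples; any functions would do):

```r
# x |> f() is just another way of writing f(x)
x <- c(3, 1, 2)
x |> sort()           # same as sort(x)
x |> sort() |> rev()  # same as rev(sort(x))
```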

Large Network - No Isolates

Deleting all the isolates using which() and delete_vertices().

isolates <- which(degree(dnc_net)==0)
dnc_net |> delete_vertices(v=isolates) |> 
    ggraph("fr") + geom_edge_link() +  geom_node_point()

Labeling Nodes

We can use geom_node_text() or geom_node_label() to label our nodes.

ggraph(graph=net, "stress") + 
  geom_edge_link() +  
  geom_node_label(aes(label=name)) 
Warning: Existing variables `x` and `y` overwritten by layout variables

They also have a repel=T argument that will move the labels away from the center of the node.

ggraph(graph=net, "stress") + 
  geom_edge_link() +  geom_node_point() +
  geom_node_text(aes(label=name), repel=T) 
Warning: Existing variables `x` and `y` overwritten by layout variables

Vertex Attributes

Vertex attributes are included for a variety of reasons. This includes:

  • Demonstrating who is important in a network.
  • Showing groups in a network.
  • Presenting other relevant details

Scaling by Degree

Often we will scale a node size by a measure of importance, like degree:

V(net)$degree <- degree(net)
ggraph(graph=net, "stress") + 
  geom_edge_link() +  geom_node_point(aes(size=degree)) +
  ggtitle("Sized by Degree") + 
  scale_size("Degree")
Warning: Existing variables `x` and `y` overwritten by layout variables

New Network

This data is of Spanish high school students and includes negative and positive relations. We are going to delete the negative edges.

edges <- read.csv("network_data/spanish_hs_edges.csv")
nodes <- read.csv("network_data/spanish_hs_nodes.csv")
net <- graph_from_data_frame(edges, vertices=nodes, directed=T)
neg_edges <- which(E(net)$weight < 0)
net <- delete_edges(net, neg_edges)
net
IGRAPH fdda5ca DNW- 105 1058 -- 
+ attr: name (v/n), Colegio (v/n), Curso (v/n), Grupo (v/c), Sexo
| (v/c), prosocial (v/n), crttotal (v/n), X_pos (v/c), id (e/n), weight
| (e/n)
+ edges from fdda5ca (vertex names):
 [1] 3043->3047 3043->3087 3043->3093 3043->3065 3043->3097 3043->3044
 [7] 3043->3045 3043->3088 3043->3056 3043->3090 3043->3073 3043->3066
[13] 3043->3060 3043->3092 3043->3096 3043->3077 3043->3084 3043->3105
[19] 3043->3067 3043->3064 3043->3081 3043->3068 3043->3061 3043->3058
[25] 3043->3055 3043->3072 3043->3095 3043->3051 3043->3054 3043->3086
[31] 3043->3085 3043->3089 3043->3047 3043->3048 3043->3049 3043->3050
+ ... omitted several edges

Coloring Vertices

We can color the nodes by setting aes(color=) to a vertex attribute.

ggraph(graph=net, "stress") + 
  geom_edge_link() +    
  geom_node_point(aes(color=Sexo), size=4) +
  ggtitle("Colored by Sex")

Edges

There are a few things we might want to do with our edges:

  • Add arrows for a directed network
  • Show edge attributes.

General

I think it is easier to see a network by making the edges gray.

ggraph(graph=net, "stress") + 
  geom_edge_link(color="gray") +    
  geom_node_point(aes(color=Sexo), size=4) +
  ggtitle("Colored by Sex") 

Adding Arrows

Arrows are annoying to add here, but there is some good help online. We manually create an arrow (arrow) and manually end the edges before the node (end_cap)

ggraph(graph=net, "stress") + 
  geom_edge_link(color="gray", 
    arrow = arrow(length = unit(4, 'mm')), 
    end_cap = circle(3, 'mm')) +    
  geom_node_point(aes(color=Sexo), size=4) 

Adding Attributes

Finally, we can assign edge attributes to aesthetics:

ggraph(graph=net, "stress") + 
  geom_edge_link(color="gray", aes(width=weight), 
    arrow = arrow(length = unit(4, 'mm')), 
    end_cap = circle(3, 'mm')) +    
  geom_node_point(aes(color=Sexo), size=4) 

Multiple

By default ggraph draws mutual edges on top of each other, so only a single edge is visible. We can change that by using geom_edge_fan()

ggraph(graph=net, "stress") + 
  geom_edge_fan(aes(color=weight),
    arrow = arrow(length = unit(4, 'mm')), 
    end_cap = circle(3, 'mm')) +    
  geom_node_point(aes(color=Sexo), size=4) 

Network Centrality

Ohio Network

This data shows the relationships between donors in the Ohio legislature. You can download the edgelist here and the metadata here

library(igraph)
edges <- read.csv("network_data/edge_OH.csv")
nodes <- read.csv("network_data/meta_OH.csv")
net <- graph_from_data_frame(edges, directed=F, vertices=nodes)

Functions

  • Degree: degree()
  • Eigenvector: eigen_centrality()
  • Beta: power_centrality() set the beta value with exponent=
  • Closeness: closeness()
  • Harmonic Mean: harmonic_centrality()
  • Betweenness: betweenness()
  • K-Step: ego_size()
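
As a quick sanity check on these functions, here is a toy star network (made up for illustration; in make_star() vertex 1 is the center):

```r
library(igraph)
# In a 5-node undirected star, the center should top every measure
star <- make_star(5, mode="undirected")  # vertex 1 is the center
degree(star)       # center: 4, leaves: 1
betweenness(star)  # center sits on all 6 leaf-to-leaf shortest paths
eigen_centrality(star)$vector
```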

Degree, Eigenvector, and Beta - Undirected

These are generally relatively simple to use, but eigen_centrality() outputs a whole object with information. To get just the centralities, use $vector

V(net)$degree <-  degree(net)
tmp <- eigen_centrality(net)
V(net)$eigen <-  tmp$vector
tmp$value ## The eigenvalue 
[1] 115.055
V(net)$beta <-  power_centrality(net, exponent=0.008)

Plotting Relationship between Total and Centrality

stats <- as_data_frame(net, what="vertices")
ggplot(stats, aes(x=Total, y=eigen)) + 
    geom_point() + theme_minimal() + 
    geom_smooth(method="lm") + 
    scale_x_log10()
`geom_smooth()` using formula = 'y ~ x'

Plotting Relationship between Org Type and Centrality

ggplot(stats, aes(x=CatCodeGroup, y=eigen)) + 
    geom_boxplot() + theme_minimal() + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1))

Betweenness and Closeness

Again, for the undirected, unweighted versions there isn’t a lot to these:

V(net)$between <-  betweenness(net, normalized=T)
V(net)$close <-  closeness(net, normalized=T)
V(net)$harmonic <- harmonic_centrality(net, normalized=T)
V(net)$step_2 <- ego_size(net, 2) # Two step
V(net)$step_3 <- ego_size(net, 3) # Three step
ggraph(net) + geom_edge_link(color="gray") + 
    geom_node_point(aes(size=between))
Using "stress" as default layout

Weighted

Things get a bit more complicated if we want to use edge weights. The data we have has something we can use to weight the edges:

Creating Weights

E(net)$weight <- ifelse(E(net)$edge == "Strong", 2, 1)

Weighted Degree

Weighted degree is the strength() function.

weighted_degree <- strength(net)
regular_degree <- degree(net)
plot(x=regular_degree, y=weighted_degree)

Weighted Eigenvector Centrality

If there is a weight edge attribute, it will automatically be used, but you can turn this off by setting weights=NA

weighted_eigen <- eigen_centrality(net)$vector
regular_eigen <- eigen_centrality(net, weights=NA)$vector
plot(x=regular_eigen, y=weighted_eigen)

Directed Networks

For directed networks you can calculate directed versions of these measures, but the interface isn’t always consistent:

  • eigen_centrality(): you have to set directed=T, and it automatically provides the “in-centrality”. If you want the out-centrality, reverse the edge directions first: eigen_centrality(reverse_edges(net), directed=T)
  • power_centrality() will automatically do a directed measure if you give it a directed network. It calculates the “out-centrality”; reverse the edges to calculate the “in-centrality”

Whole Network Statistics

Let’s use the Spanish High School Network you used earlier this week:

library(igraph)
edges <- read.csv("network_data/spanish_hs_edges.csv")
nodes <- read.csv("network_data/spanish_hs_nodes.csv")
net <- graph_from_data_frame(edges, vertices=nodes, directed=T)
net <- delete_edges(net, E(net)[E(net)$weight <= 0])

Density

Density can be calculated with edge_density(). There is an option to count loops (edges that start and stop at the same node).

edge_density(net)
[1] 0.09688645
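
As a sketch of what edge_density() is doing for a directed network without loops, it is just edges over possible edges (the tiny network here is made up for illustration):

```r
library(igraph)
g <- make_graph(c(1,2, 1,3), directed=TRUE)  # 3 nodes, 2 edges
edge_density(g)                              # 2 / (3 * 2)
ecount(g) / (vcount(g) * (vcount(g) - 1))    # same value by hand
```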

Average Degree

Degree can be calculated with degree() and then averaged with mean():

deg <- degree(net)
mean(deg)
[1] 20.15238
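
A handy check: since every edge contributes to two nodes’ degrees, the mean total degree is always 2m/n (toy ring network made up for illustration):

```r
library(igraph)
g <- make_ring(5)          # 5 nodes, 5 edges, everyone has degree 2
mean(degree(g))            # 2
2 * ecount(g) / vcount(g)  # same value, no degree() needed
```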

Component Ratio

There is no built-in function for the component ratio, but the formula is pretty simple:

\[ \textnormal{Component Ratio} = \frac{c-1}{n-1} \]

So we need to calculate the number of components (c), and the number of nodes (n).

comps <- components(net, mode="strong")$no
n <- vcount(net)

(comps-1)/(n-1)
[1] 0.2788462

Connectedness

There is no native function to calculate connectedness but we have all the tools:

dist <- distances(net, mode="out") ## Get the distances matrix
reachability <- is.finite(dist) ## Where are there finite paths?
diag(reachability) <- NA ## Ignore the diag
sum(reachability, na.rm=T) / (vcount(net) * (vcount(net) - 1))
[1] 0.7168498

Compactness

Again, no native function but we can get it relatively quickly in a similar manner:

dist <- distances(net, mode="out") ## Get the distances matrix
diag(dist) <- NA ## Ignore the diag
sum(1/dist, na.rm=T) / (vcount(net) * (vcount(net) - 1))
[1] 0.2914342

Reciprocity

We calculate reciprocity with the reciprocity() function.

The default is to calculate arc-reciprocity, but we can switch it to the other (less good) measure by setting mode="ratio"

reciprocity(net)
[1] 0.389414
edge_density(net)
[1] 0.09688645
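
To see the difference between the two measures, here is a toy directed network (made up for illustration) where one of two connected dyads is mutual:

```r
library(igraph)
# 1->2 and 2->1 reciprocate each other; 1->3 does not
g <- make_graph(c(1,2, 2,1, 1,3), directed=TRUE)
reciprocity(g)                # arc version: 2 of 3 arcs are reciprocated
reciprocity(g, mode="ratio")  # dyad version: 1 of 2 connected dyads is mutual
```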

Transitivity

There is a transitivity() function, which works well with undirected networks. For directed networks we have to make use of the sna library and the gtrans() function.

transitivity(net) ## Treats as undirected
[1] 0.4361186
mat_net <- as_adjacency_matrix(net, sparse=F) # Convert to adj mat
sna::gtrans(mat_net) ## Call SNA function
[1] 0.4617656

Centralization

Finally, we have centralization. You can use many different measures of centrality to measure centralization, and so there is a suite of functions:

  • centr_degree() - Uses degree centrality
  • centr_eigen() - Uses eigen centrality
  • centr_betw() - Uses betweenness

They all output an object with multiple elements; to get the centralization score use $centralization

centr_betw(net)$centralization
[1] 0.09744318
centr_degree(net)$centralization
[1] 0.144878
centr_eigen(net)$centralization
[1] 0.7185371

Clusters

R Work

  • Find cliques
  • Identify overlaps/analyze cliques
  • Implement different cluster methods
  • Access information about clusters

We are going to use the Cocaine Mambo network today.

df <- read.csv("network_data/COCAINE_MAMBO.csv", row.names=1)
mat <- as.matrix(df)
mat[mat>1] <- 1 # there is a weird cell with a 2, not a 1. 
coke_net <- graph_from_adjacency_matrix(mat, mode="undirected")
Warning: The `adjmatrix` argument of `graph_from_adjacency_matrix()` must be symmetric
with mode = "undirected" as of igraph 1.6.0.
ℹ Use mode = "max" to achieve the original behavior.

Cliques

There are several igraph functions for cliques. We want to use max_cliques()

max_cliques() returns a list of the cliques in your network with each item in that list a clique.

cl <- max_cliques(coke_net, min=3)
length(cl) # number of cliques 
[1] 15
cl[[1]] # the first clique 
+ 3/31 vertices, named, from 755ae35:
[1] ARG  JDD  JPPM

Who is in each clique

If we want to look at who is in each clique we are going to have to process this list a little. In particular we will make a matrix showing who is in each clique:

# Create a matrix of 0s 
cl_mat <- matrix(0, nrow=vcount(coke_net), ncol=length(cl))

# Use a for loop to step through the cliques
for(ii in seq_along(cl)){
    cl_mat[cl[[ii]] , ii] <- 1
}
# We can add rownames
rownames(cl_mat) <- V(coke_net)$name
# Size of each clique
colSums(cl_mat)
 [1] 3 3 3 3 3 4 3 4 4 3 3 4 4 4 4
# Number of cliques each individual is in
rowSums(cl_mat)
   A   AP  ARG CPGT EDRS  ELM  ERC   FR   GA HAMS  HPM   JB  JDD   JE JEAB   JG 
   0    1    1    2    0    5    1    2    0    3    1    1    3    0    2    1 
JHLM JJTE   JO JPPM   JV LIRP   OS PCCP RAJH   RB   RS SNRM   VR  WGV YPMG 
   1    2    0   10    0    1    1    2    0    0    3    7    0    0    2 

Number of Cliques

We can calculate the number of cliques each individual is in and put that back on the original data:

V(coke_net)$numb_cliques <- rowSums(cl_mat)

ggraph(coke_net) + geom_edge_link(color="gray") + 
    geom_node_point(size=8, 
    aes(color=numb_cliques))
Using "stress" as default layout

Clique Overlap

With that matrix we can see easily how many cliques each person is in with every other person:

# Multiply the matrix by its transpose 
overlap_mat <- cl_mat %*% t(cl_mat)
overlap_net <- graph_from_adjacency_matrix(overlap_mat, diag=F, 
    mode="undirected", weighted=T)
# ignores the diag, has it undirected,
# and uses the values (overlaps) as weights
ggraph(overlap_net, "fr") + 
    geom_edge_link(color="gray", aes(width=weight)) + 
    geom_node_point(size=8, color="purple")

Clusters

There are several functions we will use to find clusters:

  • Girvan-Newman/Edge Betweenness: cluster_edge_betweenness()
  • Fast-Greedy: cluster_fast_greedy()
  • Walktrap: cluster_walktrap()
  • Louvain: cluster_louvain()

Each function is used in a similar way and returns something similar.

Example with Fast-Greedy

All you have to do is give it the network.

clust <- cluster_fast_greedy(coke_net)

clust
IGRAPH clustering fast greedy, groups: 5, mod: 0.39
+ groups:
  $`1`
  [1] "ARG"  "CPGT" "FR"   "GA"   "JDD"  "JPPM" "OS"   "PCCP"
  
  $`2`
  [1] "EDRS" "JB"   "JE"   "JO"   "SNRM" "VR"   "WGV" 
  
  $`3`
  [1] "HAMS" "JG"   "JJTE" "JV"   "RAJH" "RB"   "YPMG"
  
  $`4`
  + ... omitted several groups/vertices

Accessing things

You can access the modularity score for each clustering level with $modularity, while $membership provides the membership of each vertex at the clustering level with the highest modularity

clust$membership
 [1] 4 4 1 1 2 4 5 1 1 3 4 2 1 2 5 3 5 3 2 1 3 4 1 1 3 3 4 2 2 2 3
clust$modularity
 [1] -5.960166e-02 -4.295482e-02 -1.085018e-02  5.499405e-03  2.170036e-02
 [6]  3.775268e-02  5.380499e-02  6.970868e-02  8.546373e-02  1.053805e-01
[11]  1.336207e-01  1.544293e-01  1.728597e-01  1.878716e-01  2.028835e-01
[16]  2.266647e-01  2.535672e-01  2.702140e-01  2.850773e-01  2.997919e-01
[21]  3.143579e-01  3.287753e-01  3.430440e-01  3.573127e-01  3.757432e-01
[26]  3.840666e-01  3.862961e-01  3.834721e-01  3.745541e-01  3.567182e-01
[31] -5.551115e-17
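
A toy example (two triangles joined by a single edge, made up for illustration) shows the relationship: the reported membership corresponds to the merge step where $modularity peaks.

```r
library(igraph)
# Two triangles {1,2,3} and {4,5,6} bridged by the edge 3--4
g <- make_graph(c(1,2, 2,3, 1,3, 4,5, 5,6, 4,6, 3,4), directed=FALSE)
clust <- cluster_fast_greedy(g)
which.max(clust$modularity)  # the best clustering level
clust$membership             # splits the graph into the two triangles
```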

Plotting Clusters

I personally like giving each cluster a letter so we can do the following:

V(coke_net)$Cluster <- LETTERS[clust$membership]

ggraph(coke_net) + geom_edge_link(color="gray") + 
    geom_node_point(size=8, 
    aes(color=Cluster))
Using "stress" as default layout

Extracting Data

We can use as_data_frame() to access the vertex data again from our network

vertex_data <- igraph::as_data_frame(coke_net, what="vertices")
vertex_data[1:5,]
     name numb_cliques Cluster
A       A            0       D
AP     AP            1       D
ARG   ARG            1       A
CPGT CPGT            2       A
EDRS EDRS            0       B
#write.csv(vertex_data, "tmp.csv", row.names=F)

Dendrogram

Finally you can also plot the dendrogram (if it is applicable)

plot(as.dendrogram(clust))

Structural Similarity and Blockmodeling

  • How to calculate structural similarity.
  • How to do dyad level comparisons.
  • How to create a blockmodel from structural similarity.
  • How to directly blockmodel.

The data we will use is a london gang network.

net <- read_graph("network_data/london.graphml", format="graphml")
net
IGRAPH 638f4bf UN-- 54 1010 -- 
+ attr: name (v/c), Age (v/n), Birthplace (v/n), Residence (v/n),
| Arrests (v/n), Convictions (v/n), Prison (v/n), Music (v/n), Ranking
| (v/n), id (v/c)
+ edges from 638f4bf (vertex names):
 [1] X1--X2  X1--X3  X1--X4  X1--X4  X1--X5  X1--X6  X1--X7  X1--X7  X1--X8 
[10] X1--X8  X1--X8  X1--X9  X1--X9  X1--X10 X1--X10 X1--X11 X1--X11 X1--X11
[19] X1--X12 X1--X17 X1--X18 X1--X20 X1--X20 X1--X21 X1--X21 X1--X22 X1--X22
[28] X1--X22 X1--X23 X1--X25 X1--X27 X1--X28 X1--X29 X1--X43 X1--X45 X1--X46
[37] X1--X51 X1--X2  X2--X3  X2--X3  X2--X3  X2--X6  X2--X6  X2--X7  X2--X8 
[46] X2--X8  X2--X9  X2--X10 X2--X10 X2--X11 X2--X11 X2--X12 X2--X13 X2--X14
+ ... omitted several edges

Structural similarity

The sedist() function from the blockmodeling package calculates structural similarity.

sedist() accepts the following arguments:

  • M= the adjacency matrix.
  • method= A method for calculating the distance (anything the dist() function accepts can be used)
    • "euclidean" - Calculate the Euclidean distance
    • "cor" - Calculate the Pearson correlation.
  • handle.interaction= If you want to play around with how it deals with the interactions (don’t worry about this).

It outputs a dist matrix.

library(blockmodeling)
To cite package 'blockmodeling' in publications please use package
citation and (at least) one of the articles:

  Žiberna, Aleš (2007). Generalized blockmodeling of valued networks.
  Social Networks 29(1), 105-126.

  Žiberna, Aleš (2008). Direct and indirect approaches to blockmodeling
  of valued networks in terms of regular equivalence. Journal of
  Mathematical Sociology 32(1), 57–84.

  Žiberna, Aleš (2014). Blockmodeling of multilevel networks. Social
  Networks 39, 46–61. https://doi.org/10.1016/j.socnet.2014.04.002.

  Žiberna, Aleš (2023).  Generalized and Classical Blockmodeling of
  Valued Networks, R package version 1.1.5.
  Cugmas, Marjan (2023).  Generalized and Classical Blockmodeling of
  Valued Networks, R package version 1.1.5.

To see these entries in BibTeX format, use 'format(<citation>,
bibtex=TRUE)', or 'toBibtex(.)'.
## Convert to an adjacency matrix 
mat <- as_adjacency_matrix(net, sparse=F)
## Calculate structural similarity using euclidean
london_se <- sedist(mat, method="euclidean")
## Look at first 5 rows/columns 
as.matrix(london_se)[1:5,1:5]
         X1       X2       X3        X4        X5
X1  0.00000 21.35416 25.29822 27.276363 26.076810
X2 21.35416  0.00000 26.38181 24.657656 24.000000
X3 25.29822 26.38181  0.00000 22.090722 22.090722
X4 27.27636 24.65766 22.09072  0.000000  8.944272
X5 26.07681 24.00000 22.09072  8.944272  0.000000

Comparing it

Previously we tested whether structural similarity was related to ranking similarity; let’s look at age similarity instead now.

What we need, then, is the distance in ages for all dyads in our network. We can use the dist() function for this.

If we give dist() a single vector, it will calculate the pairwise distance between all items in the vector.

## Calculate the distance in ages
age_dist <- dist(V(net)$Age, method="euclidean")
## Check the value (person 1 and 2 are the same age)
## Person 1 and 3 are off by one year. 
as.matrix(age_dist)[1:5, 1:5]
  1 2 3 4 5
1 0 0 1 1 4
2 0 0 1 1 4
3 1 1 0 2 5
4 1 1 2 0 3
5 4 4 5 3 0

Converting to Vectors

What we need though is a full vector of ages and distances so we can directly compare them.

If we put a matrix into c() it will convert it to a vector. Example:

test_mat <- matrix(1:4, nrow=2, ncol=2)
test_mat
     [,1] [,2]
[1,]    1    3
[2,]    2    4
c(test_mat)
[1] 1 2 3 4

Creating a Dataframe

We can do the same thing to our distance matrix.

data <- data.frame("Age_Dist"=c(age_dist),
                   "SE_Dist"=c(london_se))
data[1:5,]
  Age_Dist  SE_Dist
1        0 21.35416
2        1 25.29822
3        1 27.27636
4        4 26.07681
5        5 24.97999

Testing

We can create a scatter plot or run a correlation (cor()):

cor(data$Age_Dist, data$SE_Dist, use="complete.obs")
[1] -0.02831001
ggplot(data, aes(x=Age_Dist, y=SE_Dist)) +
  geom_jitter() + theme_minimal() + 
  labs(y="Struct Distance", 
       x="Age Distance")

Creating Clusters and Blockmodels

We can also hierarchically cluster the output from sedist() just as we did before:

london_se <- sedist(mat, method="euclidean")
clu <- hclust(london_se) 
plot(clu)

Remember, k= sets the number of clusters you want, while h= says how far down the y-axis to go to create the clusters (set only one)

memberships <- cutree(clu, k=4)
memberships[1:5]
X1 X2 X3 X4 X5 
 1  1  2  2  2 

Using as a Blockmodel

We can use this to look at it as a blockmodel. There are a few ways to do this:

  • The plotMat() function creates a matrix plot showing the blocks and coloring the squares
    • M= is the adjacency matrix
    • clu= is the memberships in each cluster.
    • orderClu= can be used to order the clusters by their total value (though this can be wonky if there are ties)
    • mar= sets the margins of the plots (the defaults are gigantic)
  • The funByBlocks() function applies a function to each block. The default is to calculate the mean() of each block.
    • M= and clu= (same as above)
    • FUN= the function you want to use.

Plotting Blockmodel

plotMat(M=mat, clu=memberships, orderClu=T, 
        mar=c(0.1, 0.1, 0.1, 0.1))
plotMat(M=mat>0, clu=memberships, orderClu=T,
        mar=c(0.1, 0.1, 0.1, 0.1))

Calculating a version of the image matrix

I use mat > 0 here as the matrix has weights and this converts it to just 1s and 0s.

funByBlocks(M=mat>0, clu=memberships)
          1         2         3         4
1 0.7692308 0.4175824 0.1615385 0.4807692
2 0.4175824 1.0000000 0.1619048 0.2142857
3 0.1615385 0.1619048 0.1080460 0.1250000
4 0.4807692 0.2142857 0.1250000 1.0000000
funByBlocks(M=mat>0, clu=memberships, FUN=median, na.rm=T)
  1 2 3 4
1 1 0 0 0
2 0 1 0 0
3 0 0 0 0
4 0 0 0 1

Direct Blockmodel Optimization

We can find a blockmodel by direct optimization using optRandomParC(). This function can do a lot; we are going to use it with a limited set of options.

  • M= The matrix
  • k= The number of clusters to identify
  • approaches="bin" This sets it to use the binary algorithm.
  • blocks=c("nul", "com") Allows for null or complete blocks.
  • rep= The number of random starts to use (should be at least 100)
  • nCores= The number of cores on your computer to use (defaults to 1)
    • If you want to run this in parallel you can set this to 2 or 4, you might need to install the doParallel and doRNG packages.

Example

It will automatically display the number of solutions and the error (for the best solution):

blocks <- optRandomParC(M=mat, k= 4, approaches="bin", 
                        blocks=c("nul", "com"), 
                        rep=500, nCores=4)
Loading required namespace: doRNG


Optimization of all partitions completed
2 solution(s) with minimal error = 384 found. 

Looking at the Structure

# The image matrix
IM(blocks)
     [,1]  [,2]  [,3]  [,4] 
[1,] "nul" "nul" "nul" "nul"
[2,] "nul" "com" "nul" "com"
[3,] "nul" "nul" "com" "com"
[4,] "nul" "com" "com" "com"
# The error matrix
EM(blocks)
     [,1] [,2] [,3] [,4]
[1,]   76   46   37   35
[2,]   46   10   12    9
[3,]   37   12    2    8
[4,]   35    9    8    2

Accessing Clusters

You can use either clu() or orderClu() to access the clusters. orderClu() simply reorders the clusters (similar to above)

clu(blocks)[1:10]
 [1] 4 4 4 3 3 3 4 2 2 2
orderClu(blocks)[1:10]
4 4 4 3 3 3 4 2 2 2 
1 1 1 2 2 2 1 3 3 3 

Plotting

You can plot it as is

plot(x=blocks, mar=c(0.1, 0.1, 0.1, 0.1))

Or ordered

plot(x=blocks, mar=c(0.1, 0.1, 0.1, 0.1), orderClu=T)

Bipartite Networks

Loading in R

For igraph the graph_from_biadjacency_matrix() function can be used to convert the matrix into a bipartite network. We are using a network of interlocking boards from Scotland. You can download it here.

library(igraph)
library(ggraph)

inc_df <- read.csv("network_data/scotland.csv", row.names=1)
inc_mat <- as.matrix(inc_df)

net <- graph_from_biadjacency_matrix(inc_mat)

net
IGRAPH cbaad78 UN-B 244 358 -- 
+ attr: type (v/l), name (v/c)
+ edges from cbaad78 (vertex names):
 [1] earl of dalkeith--north.british.railway         
 [2] earl of dalkeith--forth.bridge.railway          
 [3] earl of dalkeith--scottish.widows.fund.life.ass.
 [4] carlow, c.      --north.british.railway         
 [5] carlow, c.      --fife.coal                     
 [6] carlow, c.      --royal.bank.of.scotland        
 [7] gilroy, a.(b.)  --north.british.railway         
 [8] gilroy, a.(b.)  --victoria.jute                 
+ ... omitted several edges

Type

The important thing that sets this as a bipartite network is that there is a nodal attribute type that defines which node is in which group.

table(V(net)$type)

FALSE  TRUE 
  136   108 

This is useful, but also frustrating, as it doesn’t tell you which is which. I recommend creating a new attribute with better labels

V(net)$label <- NA
# I know which is which because 
# I know there are 108 companies. 
V(net)$label[V(net)$type] <- "Company"
V(net)$label[!V(net)$type] <- "Board Member"

table(V(net)$label)

Board Member      Company 
         136          108 

Plotting Bipartite Networks

There is a layout for bipartite networks (I don’t like it)

ggraph(net, "bipartite") + 
    geom_edge_link(color="gray") + 
    geom_node_point(size=3, aes(color=label))

Bipartite Statistics

Average Degree

Let’s start by calculating the degree for each group independently. This is easy by subsetting the results using our label (or the type) attribute

corp_degree <- degree(net)[V(net)$label == "Company"]
member_degree <- degree(net)[V(net)$label != "Company"]

mean(corp_degree)
[1] 3.314815
mean(member_degree)
[1] 2.632353

Density

For density, we need the number of edges, and the number of each type:

table(V(net)$type)

FALSE  TRUE 
  136   108 
ecount(net)/(136*108)
[1] 0.02437364

Other Statistics

There is no easy way to calculate the 4-cycle closure measure in R (or UCINET, as far as I can tell).

Projecting

Projecting can be pretty easy:

## Get the "true"
company_net <- bipartite_projection(net, which="true")

## Get the "false"
members_net <- bipartite_projection(net, which="false")

members_net
IGRAPH fefd4dd UNW- 136 678 -- 
+ attr: name (v/c), label (v/c), weight (e/n)
+ edges from fefd4dd (vertex names):
 [1] earl of dalkeith--carlow, c.               
 [2] earl of dalkeith--gilroy, a.(b.)           
 [3] earl of dalkeith--grierson, h.             
 [4] earl of dalkeith--mccosh, a.k.             
 [5] earl of dalkeith--macpherson, h.s.         
 [6] earl of dalkeith--simpson, a.              
 [7] earl of dalkeith--younger, h.g.            
 [8] earl of dalkeith--lord balfour of burleigh 
+ ... omitted several edges

One-Mode Network and Plot

We can then plot this

ggraph(members_net, "kk") + 
    geom_edge_link(color="gray", aes(width=weight)) + 
    geom_node_point(size=3)

Pruning Projection

First identify what the distribution of edge weights looks like.

hist(E(members_net)$weight)

Let’s drop any edge with a weight less than 2 (THIS IS A BAD IDEA FOR THIS NETWORK)

We can use which() to identify which edges are below a certain threshold and delete_edges() to… delete edges

edge_to_delete <- which(E(members_net)$weight < 2)
pruned_net <- delete_edges(members_net, edge_to_delete)
ggraph(pruned_net, "kk") + 
    geom_edge_link(color="gray", aes(width=weight)) + 
    geom_node_point(size=3)

Bonacich’s Metric

This isn’t a super common metric, so I wrote an R function you can download. You can use it easily by calling the file with the code through source()

It works on the matrix, so we create that with as_biadjacency_matrix()

source("network_data/project_bonac.r")
# you will now have a project_bonac() function 

mat <- as_biadjacency_matrix(net)
bonac_normalized <- project_bonac(mat)
bonac_memb_net <- graph_from_adjacency_matrix(bonac_normalized, 
    diag=F, mode="undirected", weighted=T)
ggraph(bonac_memb_net, "kk") + 
    geom_edge_link(color="gray", aes(width=weight)) + 
    geom_node_point(size=3) + scale_edge_width(range=c(0.5, 2))

Clustering Bipartite Network

If you want to cluster the bipartite network using the Dual Louvain method then you have to do some work to get it back into the right spot:

clust_1 <- cluster_louvain(members_net)
clust_2 <- cluster_louvain(company_net)

## Put each label in the right spot
node_select <- V(net)$label=="Board Member"
V(net)$dual_cluster[node_select] <- letters[clust_1$membership]

node_select <- V(net)$label=="Company"
V(net)$dual_cluster[node_select] <- LETTERS[clust_2$membership]
# To clean things up I'm going to drop
# the isolates
plot_net <- delete_vertices(net, degree(net) == 0)
ggraph(plot_net, "kk") + 
    geom_edge_link(color="gray") + 
    geom_node_point(size=3, 
    aes(color=dual_cluster, shape=label)) 

Quadratic Assignment Procedure (QAP)

The actual running of QAP-based regression and correlation isn’t that hard, but getting the data into the right format can be.

For dyadic QAP tests we need to have matrices/networks that represent all of our variables of interest.

The sna package has (most of the) QAP functions.

We are going to use three networks and two other datasets.

library(igraph)

setwd("network_data")
mids_df <- read.csv("cow/mids_net.csv")
nmc_data <- read.csv("cow/nmc_vertex.csv")
mids_net <- graph_from_data_frame(mids_df, 
        vertices=nmc_data, directed=F)

trade_df <- read.csv("cow/trade_net.csv")
trade_net <- graph_from_data_frame(trade_df,
        vertices=nmc_data, directed=F)

cont_df <- read.csv("cow/cont_net.csv")
cont_net <- graph_from_data_frame(cont_df, 
        vertices=nmc_data, directed=F)

SNA Package and Matrices

Most of what we will use is in the SNA package. The SNA package accepts matrices, so we can convert everything into matrices.

Remember, we set sparse=F and if we want to use weights we set attr="weight" (we don’t have weights here).

mids_mat <- as_adjacency_matrix(mids_net,sparse=F)

trade_mat <- as_adjacency_matrix(trade_net, sparse=F)

cont_mat <- as_adjacency_matrix(cont_net, sparse=F)

Correlation

gcor() will correlate two matrices, but it doesn’t give us a hypothesis test (no p-value).

Set mode="graph" if you have an undirected network, or mode="digraph" if you have a directed network.
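Under the hood, gcor() with mode="graph" is essentially the Pearson correlation of the corresponding off-diagonal cells of the two matrices. A base-R sketch with two tiny made-up adjacency matrices:

```r
# Two toy 3-node undirected adjacency matrices (made up for illustration)
m1 <- matrix(c(0, 1, 0,
               1, 0, 1,
               0, 1, 0), nrow=3, byrow=TRUE)
m2 <- matrix(c(0, 1, 1,
               1, 0, 0,
               1, 0, 0), nrow=3, byrow=TRUE)
# For an undirected network each tie appears once in the upper triangle,
# so correlating those cells mirrors gcor(m1, m2, mode="graph")
cor(m1[upper.tri(m1)], m2[upper.tri(m2)])
# -0.5
```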

library(sna)
Loading required package: statnet.common

Attaching package: 'statnet.common'
The following objects are masked from 'package:base':

    attr, order
Loading required package: network

'network' 1.18.2 (2023-12-04), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information

Attaching package: 'network'
The following objects are masked from 'package:igraph':

    %c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
    get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
    is.directed, list.edge.attributes, list.vertex.attributes,
    set.edge.attribute, set.vertex.attribute
sna: Tools for Social Network Analysis
Version 2.7-2 created on 2023-12-05.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
 For citation information, type citation("sna").
 Type help(package="sna") to get started.

Attaching package: 'sna'
The following object is masked from 'package:blockmodeling':

    sedist
The following objects are masked from 'package:igraph':

    betweenness, bonpow, closeness, components, degree, dyad.census,
    evcent, hierarchy, is.connected, neighborhood, triad.census
gcor(cont_mat, trade_mat, mode="graph")
[1] 0.2254307

Hypothesis Testing Correlation

To test the hypothesis we use qaptest(), which takes a list of networks, the function used to calculate the test statistic, and two annoying indicators (g1=1, g2=2). Finally, reps= tells it how many permutations to do.

qap <- qaptest(list(cont_mat, trade_mat), 
    gcor, g1=1, g2=2, reps=1000, mode="graph")

You can call summary() on your output to get details

  • p(f(perm) >= f(d)): The proportion of permuted values that are greater than or equal to your real value. A one-sided greater-than p-value.
  • p(f(perm) <= f(d)): The proportion of permuted values that are less than or equal to your real value. A one-sided less-than p-value.
summary(qap)

QAP Test Results

Estimated p-values:
    p(f(perm) >= f(d)): 0 
    p(f(perm) <= f(d)): 1 

Test Diagnostics:
    Test Value (f(d)): 0.2254307 
    Replications: 1000 
    Distribution Summary:
        Min:     -0.01351818 
        1stQ:    -0.005134005 
        Med:     -0.0009419196 
        Mean:    -0.000174768 
        3rdQ:    0.003250166 
        Max:     0.02840268 

If you are curious, you can access all of the correlations found through the permutation/QAP test with $dist; the observed value is $testval.

sum(abs(qap$dist) < qap$testval) # all 1000 permuted values are below the observed value, so the two-sided p-value is 0
[1] 1000
hist(qap$dist)
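The p-values reported by qaptest() are just tail proportions of the permutation distribution. A base-R sketch with a simulated stand-in distribution (the numbers are illustrative, not the actual values from the qap object above):

```r
set.seed(42)
perm_dist <- rnorm(1000, mean=0, sd=0.01)   # stand-in for qap$dist
obs <- 0.225                                # stand-in for qap$testval
mean(perm_dist >= obs)                      # p(f(perm) >= f(d))  -> 0
mean(perm_dist <= obs)                      # p(f(perm) <= f(d))  -> 1
mean(abs(perm_dist) >= abs(obs))            # two-sided p-value   -> 0
```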

QAP Regression

We can use netlogit() or netlm() to calculate our regression.

It takes:

  • y= our DV (as a matrix or network)
  • x= a list of independent variables (as matrices or networks)
  • reps= the number of permutations (for now set this at something small, like 100, so it doesn’t take forever).
  • nullhyp= whether to do Y-permutations ("qap") or semi-partialling ("qapspp")

We already have our trade network, and contiguity network, so we can just put them into this:

mod <- netlogit(y=mids_mat,
        x=list(trade_mat, cont_mat), 
        reps=100, nullhyp="qapspp")

To get the results just call the object

mod

Network Logit Model

Coefficients:
            Estimate  Exp(b)       Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -5.553377   0.00387435 1.00    0.00    1.00     
x1           0.341634   1.40724509 0.86    0.14    0.33     
x2           4.620454 101.54015727 1.00    0.00    0.00     

Goodness of Fit Statistics:

Null deviance: 50308.62 on 36290 degrees of freedom
Residual deviance: 2578.148 on 36287 degrees of freedom
Chi-Squared test of fit improvement:
     47730.47 on 3 degrees of freedom, p-value 0 
AIC: 2584.148   BIC: 2609.646 
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.5680815 
    (Dn-Dr)/Dn: 0.9487534 

Each row shows the estimated statistics for one variable:

  • Label (x1)
  • The coefficient estimate (0.34)
  • The exponentiation of the coefficient (1.41); for logistic regression this is sometimes helpful as it is the odds ratio.
  • Pr(<=b) One-sided less than hypothesis p-value.
  • Pr(>=b) One-sided greater than hypothesis p-value.
  • Pr(>=|b|) Two-sided hypothesis p-value.

Adding in More Interesting Variables

We talked about creating variables that indicate whether two nodes are equal, the distance between two nodes, or the sum of two nodes… How? The outer() function.

outer() takes two vectors and an operation, and applies that operation to every pairwise combination.

test_vector <- c(1,3,5)
outer(test_vector, test_vector, "+")
     [,1] [,2] [,3]
[1,]    2    4    6
[2,]    4    6    8
[3,]    6    8   10

We can also do this with differences, but note that a-b is different from b-a, so for undirected data we want to take the absolute value.

test_vector <- c(1,3,5)
outer(test_vector, test_vector, "-")
     [,1] [,2] [,3]
[1,]    0   -2   -4
[2,]    2    0   -2
[3,]    4    2    0
outer(test_vector, test_vector, "-") |> abs()
     [,1] [,2] [,3]
[1,]    0    2    4
[2,]    2    0    2
[3,]    4    2    0

It is often interesting to look at whether two nodes have the same value. We can do that with the function "==" in outer()

test_vector <- c("Miami", "OSU", "OU")
outer(test_vector, test_vector, "==")
      [,1]  [,2]  [,3]
[1,]  TRUE FALSE FALSE
[2,] FALSE  TRUE FALSE
[3,] FALSE FALSE  TRUE

Difference in Military Personnel

So to add the difference in military personnel we need to extract that vector from the network, clean it up slightly, put it through outer(), and then take the absolute value of the result.

mil_cap <- V(mids_net)$milper
mil_cap[mil_cap==-9] <- 0 # Replacing missing values with 0s. 
mil_cap_mat <- outer(mil_cap, mil_cap, "-")
mil_cap_mat <- (abs(mil_cap_mat))

We can add that matrix to the netlogit function just as we did with the networks before.

mod <- netlogit(y=mids_mat,
        x=list(trade_mat, cont_mat, mil_cap_mat), 
        reps=100)
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
summary(mod)

Network Logit Model

Coefficients:
            Estimate      Exp(b)      Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -5.7848114241  0.00307389 1.00    0.00    1.00     
x1           0.3666349780  1.44287114 0.87    0.13    0.24     
x2           4.5827769946 97.78556750 1.00    0.00    0.00     
x3           0.0008590974  1.00085947 1.00    0.00    0.00     

Goodness of Fit Statistics:

Null deviance: 50308.62 on 36290 degrees of freedom
Residual deviance: 2509.59 on 36286 degrees of freedom
Chi-Squared test of fit improvement:
     47799.03 on 4 degrees of freedom, p-value 0 
AIC: 2517.59    BIC: 2551.587 
Pseudo-R^2 Measures:
    (Dn-Dr)/(Dn-Dr+dfn): 0.5684336 
    (Dn-Dr)/Dn: 0.9501161 
Contingency Table (predicted (rows) x actual (cols)):

         Actual
Predicted       0       1
        0   35942     314
        1      22      12

    Total Fraction Correct: 0.9907413 
    Fraction Predicted 1s Correct: 0.3529412 
    Fraction Predicted 0s Correct: 0.9913394 
    False Negative Rate: 0.9631902 
    False Positive Rate: 0.0006117228 

Test Diagnostics:

    Null Hypothesis: qap 
    Replications: 100 
    Distribution Summary:

       (intercept)        x1        x2        x3
Min      -92.54464  -2.63579  -2.35812  -4.43309
1stQ     -89.01781  -1.04673  -1.07821  -1.95423
Median   -88.30635  -0.43733  -0.05183  -0.38320
Mean     -88.33197  -0.11787   0.03580  -0.09013
3rdQ     -87.53640   0.76717   0.78712   1.33367
Max      -83.63469   5.23985   4.63217   8.70260

Second Example - Spanish High School

We are going to load the high school data we had before and then delete the nodes that have missing information.

edges <- read.csv("network_data/spanish_hs_edges.csv")
nodes <- read.csv("network_data/spanish_hs_nodes.csv")
net <- graph_from_data_frame(edges, vertices=nodes, directed=T)
net <- delete_vertices(net, which(is.na(V(net)$prosocial)))

Model - DV

We want to look at what leads an individual to indicate that they have a better relationship with someone.

The DV is the rating the sender gave the relationship (-2 to +2), with 0 meaning no relationship.

Model - IV

Our dyadic independent variables:

  • Are the sender and the receiver the same sex?
  • Are the sender and the receiver in the same group?
  • Is the sender male?
  • Is the receiver male?
  • Does the sender have a higher sociability score?
  • Does the receiver have a higher sociability score?

The dependent variable is easy enough: use as_adjacency_matrix() and extract the weight attribute. We’ve already seen how to identify whether two nodes match.

dv_mat <- as_adjacency_matrix(net, attr="weight", sparse=F)
same_sex <- outer(V(net)$Sexo, V(net)$Sexo, "==")
same_group <- outer(V(net)$Grupo, V(net)$Grupo, "==")

To create a ‘sender is male’ matrix we can compare the nodal sex with a vector of just "Male".

Remember, outer() runs through each item in the first argument along the rows, so for the sender matrix we put the nodes’ values first, and we switch the order for the receiver matrix.

send_male <- outer(V(net)$Sexo, rep("Male", length(V(net))), "==")
rec_male <- outer(rep("Male", length(V(net))), V(net)$Sexo, "==")

We can do something similar with the prosocial variable by just multiplying it by a vector of 1s.

send_soc <- outer(V(net)$prosocial, rep(1, length(V(net))), "*")
rec_soc <- outer(rep(1, length(V(net))), V(net)$prosocial, "*")

If you are confused by what we are doing try running:

outer(c(1.4, 0.4, 0.1), rep(1, 3), "*")
outer(rep(1, 3), c(1.4, 0.4, 0.1), "*")

Estimation

We can estimate this just like before using netlm(). mode="digraph" is the default and indicates a directed network.

mod <- netlm(y=dv_mat, x=list(same_sex, 
                same_group,
                send_male, rec_male, 
                rec_soc, send_soc), 
         mode="digraph")
mod

OLS Network Model

Coefficients:
            Estimate     Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -0.054080439 0.193   0.807   0.381    
x1           0.106753064 1.000   0.000   0.000    
x2           0.391130533 1.000   0.000   0.000    
x3          -0.005801325 0.416   0.584   0.865    
x4           0.040534712 0.979   0.021   0.044    
x5          -0.024541180 0.265   0.735   0.554    
x6           0.029253310 0.655   0.345   0.662    

Residual standard error: 0.5489 on 6313 degrees of freedom
F-statistic: 114.3 on 6 and 6313 degrees of freedom, p-value:     0 
Multiple R-squared: 0.09797     Adjusted R-squared: 0.09711 

Explanation

The coefficients on same sex (x1) and same group (x2) are both statistically significant and positive. In particular, if the receiver is in the same group as the sender then they are likely to be ranked 0.39 points higher, and if they are the same sex then 0.11 points higher.

The coefficient on receiver male has a two-sided p-value of 0.044 and so is also statistically significant. This indicates that male students are more likely to receive a higher rating than female students (although the effect is relatively small as the coefficient is 0.04).

Comparison

If you want to compare it to the version without QAP you can set nullhyp="classic".

You’ll see that the two-sided p-values are generally smaller in this case (except for the ones that are exactly 0).

mod_wrong <- netlm(y=dv_mat, x=list(same_sex, 
                same_group,
                send_male, rec_male, 
                rec_soc, send_soc), 
         mode="digraph", nullhyp="classic")
mod_wrong

OLS Network Model

Coefficients:
            Estimate     Pr(<=b)    Pr(>=b)       Pr(>=|b|)    
(intercept) -0.054080439 0.03956882  9.604312e-01  7.913763e-02
x1           0.106753064 1.00000000  1.766833e-14  3.533667e-14
x2           0.391130533 1.00000000 9.981165e-126 1.996233e-125
x3          -0.005801325 0.34136768  6.586323e-01  6.827354e-01
x4           0.040534712 0.99784800  2.151996e-03  4.303991e-03
x5          -0.024541180 0.20131319  7.986868e-01  4.026264e-01
x6           0.029253310 0.84077019  1.592298e-01  3.184596e-01

Residual standard error: 0.5489 on 6313 degrees of freedom
F-statistic: 114.3 on 6 and 6313 degrees of freedom, p-value:     0 
Multiple R-squared: 0.09797     Adjusted R-squared: 0.09711 

Nodal Tests with Permutations

For nodal tests we need another library (I know, right?), this one is permuco, so run install.packages("permuco").

What do we need to do to test what leads to having more trading partners?

  1. Calculate the degree centrality on our trade network.
  2. Extract all that wonderful nodal info into a dataframe.
  3. Add in the democracy data.
  4. Run our permutation regression.

(If degree() is freaking out it might be a clash between sna and igraph; fix it with igraph::degree().)

V(trade_net)$degree <- igraph::degree(trade_net)
trade_df <- as_data_frame(trade_net, what="vertices")
setwd("network_data")
democracy_df <- read.csv("cow/democracy_data.csv")

Merging Data

To merge two datasets we use the merge() function:

  • x= The first dataset.
  • y= The second dataset.
  • by.x= The column from the first dataset to use to merge.
  • by.y= The column from the second dataset to use to merge.
all_data <- merge(trade_df, democracy_df, 
            by.x="name", 
            by.y="COWcode")
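To see what merge() does, here is a toy example (the values are made up): by default only rows whose keys appear in both datasets are kept, and the key column keeps the by.x name.

```r
df1 <- data.frame(name=c(2, 20, 200), degree=c(5, 3, 1))
df2 <- data.frame(COWcode=c(2, 20, 999), polyarchy=c(0.9, 0.8, 0.2))
merge(df1, df2, by.x="name", by.y="COWcode")
#   name degree polyarchy
# 1    2      5       0.9
# 2   20      3       0.8
```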

Estimating Regression

Running the regression is pretty simple, and is similar to what you would do with normal linear regression in R.

lmperm() is the function; it takes a formula, a dataset, and a number of permutations (let’s do 50 so this doesn’t take forever).
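The core idea behind a permutation test for a coefficient can be sketched in base R. This is a simplified version: permuco’s freedman_lane method permutes residuals to handle nuisance variables, while here we just permute the outcome, and the data are made up.

```r
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)                  # simulated data with a real effect
t_obs <- summary(lm(y ~ x))$coefficients["x", "t value"]
t_perm <- replicate(500, {
  y_star <- sample(y)                    # break the x-y link by permuting y
  summary(lm(y_star ~ x))$coefficients["x", "t value"]
})
mean(abs(t_perm) >= abs(t_obs))          # resampled two-sided p-value
```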

library(permuco)

mod <- lmperm(degree~v2x_polyarchy + milper,
        data=all_data, np=50)
Warning in lmperm_fix(formula = formula, data = data, method = method, type =
type, : The number of permutations is below 2000, p-values might be unreliable.
Warning in check_distribution(distribution = distribution, digits = 10, : the
distribution of v2x_polyarchy, milper may be discrete.

Results

The results are similarly formatted.

summary(mod)
Table of marginal t-test of the betas
Resampling test using freedman_lane to handle nuisance variables and 50 permutations.
               Estimate Std. Error  t value parametric Pr(>|t|)
(Intercept)   -0.046836  0.4700088 -0.09965           9.207e-01
v2x_polyarchy  3.265364  0.7930987  4.11722           6.017e-05
milper         0.003611  0.0007406  4.87514           2.515e-06
              resampled Pr(<t) resampled Pr(>t) resampled Pr(>|t|)
(Intercept)                                                       
v2x_polyarchy                1             0.02               0.02
milper                       1             0.02               0.02

Exponential Random Graph Models (ERGM)

We are going to go back to the MIDs data we were using before (scroll above for info on it)

library(network)
library(ergm)
Registered S3 methods overwritten by 'ergm':
  method               from 
  simulate.formula     lme4 
  simulate.formula_lhs lme4 
  summary.formula      Hmisc

'ergm' 4.6.0 (2023-12-17), part of the Statnet Project
* 'news(package="ergm")' for changes since last version
* 'citation("ergm")' for citation information
* 'https://statnet.org' for help, support, and other information
'ergm' 4 is a major update that introduces some backwards-incompatible
changes. Please type 'news(package="ergm")' for a list of major
changes.

Attaching package: 'ergm'
The following object is masked from 'package:statnet.common':

    snctrl
library(intergraph)
setwd("network_data")

mids_df <- read.csv("cow/mids_net.csv")
nmc_data <- read.csv("cow/nmc_vertex.csv")
mids_net <- graph_from_data_frame(mids_df, 
        vertices=nmc_data, directed=F)
V(mids_net)$milper[V(mids_net)$milper<0] <- 0
mids_net <- asNetwork(mids_net)

trade_df <- read.csv("cow/trade_net.csv")
trade_net <- graph_from_data_frame(trade_df,
        vertices=nmc_data, directed=F)
trade_net <- asNetwork(trade_net)

cont_df <- read.csv("cow/cont_net.csv")
cont_net <- graph_from_data_frame(cont_df, 
        vertices=nmc_data, directed=F)
cont_net <- asNetwork(cont_net)

Modeling MIDS

We started by incorporating our other networks into the model; we can do this here by putting those network objects into dyadcov().

mod <- ergm(mids_net ~ edges + 
            dyadcov(trade_net) + 
            dyadcov(cont_net) )
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate. 
summary(mod)
Call:
ergm(formula = mids_net ~ edges + dyadcov(trade_net) + dyadcov(cont_net))

Maximum Likelihood Results:

                  Estimate Std. Error MCMC % z value Pr(>|z|)    
edges              -5.5534     0.1207      0 -46.026   <1e-04 ***
dyadcov.trade_net   0.3416     0.2993      0   1.141    0.254    
dyadcov.cont_net    4.6205     0.1813      0  25.479   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 25154  on 18145  degrees of freedom
 Residual Deviance:  1289  on 18142  degrees of freedom
 
AIC: 1295  BIC: 1318  (Smaller is better. MC Std. Err. = 0)

Adding in Dyadic-Independent Terms

We can add in a variety of dyadic-independent terms:

  • nodematch() will add a term indicating that two nodes match on a categorical attribute.
  • nodecov() will add the dyadic sum of the attribute.
  • nodefactor() will add a count of how many times each level of an attribute shows up in a dyad.
    • Example: If you have an attribute “Region” with the levels “Northwest”, “West Coast”, etc., you could have a dyad where both nodes are in the “Northwest”; this dyad would have a 2 on the “Northwest” level and 0 on the other levels. If one node was “West Coast” and the other “Northwest” then it would have a 1 on both of those and 0 on all others. This is equivalent to adding indicator variables from a factor.
  • absdiff() will add a term that is the absolute difference between the nodes on whatever variable you include.
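The nodefactor() counting logic can be reproduced with outer() (toy region attribute, made up for illustration):

```r
region <- c("Northwest", "West Coast", "Northwest")
is_nw <- as.numeric(region == "Northwest")
# each cell counts how many nodes in the dyad are in the "Northwest"
outer(is_nw, is_nw, "+")
#      [,1] [,2] [,3]
# [1,]    2    1    2
# [2,]    1    0    1
# [3,]    2    1    2
```

(The diagonal is a node paired with itself, which the model ignores.)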

We will start by accounting for the absolute difference in military personnel.

mod <- ergm(mids_net ~ edges + 
            dyadcov(trade_net) + 
            dyadcov(cont_net) + 
            absdiff("milper") )
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate. 

The coefficient estimates are going to be the same as QAP, but the p-values aren’t. We still haven’t included any dyad-dependent network terms here.

summary(mod)
Call:
ergm(formula = mids_net ~ edges + dyadcov(trade_net) + dyadcov(cont_net) + 
    absdiff("milper"))

Maximum Likelihood Results:

                    Estimate Std. Error MCMC % z value Pr(>|z|)    
edges             -5.7848114  0.1328795      0 -43.534   <1e-04 ***
dyadcov.trade_net  0.3666350  0.3015442      0   1.216    0.224    
dyadcov.cont_net   4.5827770  0.1845116      0  24.837   <1e-04 ***
absdiff.milper     0.0008591  0.0001316      0   6.530   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 25154  on 18145  degrees of freedom
 Residual Deviance:  1255  on 18141  degrees of freedom
 
AIC: 1263  BIC: 1294  (Smaller is better. MC Std. Err. = 0)

Including Dyad Dependent Terms

The two gw-terms we’ve learned:

  • gwesp() captures the tendency towards transitivity. Positive values mean open triads are more likely to close.
  • gwdegree() captures the tendency towards equal or unequal distributions of edges. Positive values mean edges are more equally distributed; negative values mean edges tend to concentrate on nodes that already have edges (popularity).

Both can have their decay set to 0 with decay=0, fixed=T.

full_mod <- ergm(mids_net ~ edges + dyadcov(trade_net) + 
            dyadcov(cont_net) + nodecov("milper") +
            gwdegree(decay=0, fixed=T) + 
            gwesp(decay=0, fixed=T) )
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Optimizing with step length 0.1218.
The log-likelihood improved by 2.0849.
Estimating equations are not within tolerance region.
Iteration 2 of at most 60:
Optimizing with step length 0.1449.
The log-likelihood improved by 1.8646.
Estimating equations are not within tolerance region.
Iteration 3 of at most 60:
Optimizing with step length 0.1308.
The log-likelihood improved by 1.8698.
Estimating equations are not within tolerance region.
Iteration 4 of at most 60:
Optimizing with step length 0.4978.
The log-likelihood improved by 2.9557.
Estimating equations are not within tolerance region.
Iteration 5 of at most 60:
Optimizing with step length 0.9212.
The log-likelihood improved by 2.3475.
Estimating equations are not within tolerance region.
Iteration 6 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.1328.
Estimating equations are not within tolerance region.
Iteration 7 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0764.
Convergence test p-value: 0.0577. Not converged with 99% confidence; increasing sample size.
Iteration 8 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0353.
Convergence test p-value: 0.8188. Not converged with 99% confidence; increasing sample size.
Iteration 9 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0316.
Convergence test p-value: 0.1981. Not converged with 99% confidence; increasing sample size.
Iteration 10 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0166.
Convergence test p-value: 0.1883. Not converged with 99% confidence; increasing sample size.
Iteration 11 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0123.
Convergence test p-value: 0.0042. Converged with 99% confidence.
Finished MCMLE.
Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.

This model was fit using MCMC.  To examine model diagnostics and check
for degeneracy, use the mcmc.diagnostics() function.
summary(full_mod)
Call:
ergm(formula = mids_net ~ edges + dyadcov(trade_net) + dyadcov(cont_net) + 
    nodecov("milper") + gwdegree(decay = 0, fixed = T) + gwesp(decay = 0, 
    fixed = T))

Monte Carlo Maximum Likelihood Results:

                    Estimate Std. Error MCMC % z value Pr(>|z|)    
edges             -5.4664006  0.1834414      0 -29.799   <1e-04 ***
dyadcov.trade_net  0.2838874  0.2630876      0   1.079    0.281    
dyadcov.cont_net   4.0072176  0.1818470      0  22.036   <1e-04 ***
nodecov.milper     0.0006911  0.0001079      0   6.403   <1e-04 ***
gwdeg.fixed.0     -1.0343249  0.2488827      0  -4.156   <1e-04 ***
gwesp.fixed.0      0.8095523  0.1512802      0   5.351   <1e-04 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 25154  on 18145  degrees of freedom
 Residual Deviance:  1123  on 18139  degrees of freedom
 
AIC: 1135  BIC: 1182  (Smaller is better. MC Std. Err. = 0.459)

What does this mean?

  • There is a positive relationship between contiguity and MID. Countries that are contiguous are much more likely to have a dispute with each other.
  • Countries with dissimilar military capabilities are more likely to have a dispute than countries with similar military capabilities.
  • There is a tendency for countries that get into disputes to get into disputes often (gwdegree)
  • There is a tendency for transitivity in disputes as well. Disputes seem to cluster around certain sets of countries. (gwesp)

Model Fit

If you look at the ergm documentation, there are a lot of network variables. This leads to the question: can we identify whether we have a good (or bad) model?

One way to check this is to see if our model does a good job simulating a network with similar characteristics.

This can be done through the gof() function and then plot().

Checking Model Fit

gof() takes in a model object, simulates a large number of networks, and then calculates descriptive statistics on it.

The boxplots show the distribution of descriptive statistics for the simulated networks. The black line shows the statistic for the observed network. A model that fits well is a model where the black line is generally within the gray boxplots.

  • Top left: Shows the descriptive statistics related to all the parameters you’ve included.
  • Top right: Shows the degree distribution
  • Bottom left: edge-wise shared partners distribution
  • Bottom right: distribution of geodesic distances.

Different Example - High Schools

Lets go back to the Spanish high school data, this time dropping negative ratings and keeping only mutual ties so that we have an undirected friendship network.

Need to do some cleaning to get the data in:

edges <- read.csv("network_data/spanish_hs_edges.csv")
nodes <- read.csv("network_data/spanish_hs_nodes.csv")
net <- graph_from_data_frame(edges, vertices=nodes, directed=T)
net <- delete_vertices(net, which(is.na(V(net)$prosocial)))
net <- delete_edges(net, which(E(net)$weight < 0))
net <- as.undirected(net,   mode="mutual")

net <- asNetwork(net)

Run our simple model.

mod <- ergm(net ~ edges + nodematch("Sexo") + nodematch("Grupo") + 
            nodefactor("Sexo") + nodefactor("Grupo") +
            nodecov("prosocial"))
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate. 
summary(mod)
Call:
ergm(formula = net ~ edges + nodematch("Sexo") + nodematch("Grupo") + 
    nodefactor("Sexo") + nodefactor("Grupo") + nodecov("prosocial"))

Maximum Likelihood Results:

                     Estimate Std. Error MCMC % z value Pr(>|z|)    
edges                -4.61398    0.38530      0 -11.975   <1e-04 ***
nodematch.Sexo        1.22256    0.18073      0   6.765   <1e-04 ***
nodematch.Grupo       2.95815    0.19446      0  15.212   <1e-04 ***
nodefactor.Sexo.Male  0.10895    0.11254      0   0.968   0.3330    
nodefactor.Grupo.B   -0.09376    0.11821      0  -0.793   0.4277    
nodefactor.Grupo.C   -0.15182    0.13449      0  -1.129   0.2590    
nodefactor.Grupo.D    0.23911    0.14274      0   1.675   0.0939 .  
nodecov.prosocial    -0.35101    0.23933      0  -1.467   0.1425    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 4381  on 3160  degrees of freedom
 Residual Deviance: 1108  on 3152  degrees of freedom
 
AIC: 1124  BIC: 1173  (Smaller is better. MC Std. Err. = 0)

How well does it fit?

est_gof <- gof(mod)
par(mfrow=c(2,2)) # Creates a 2 by 2
plot(est_gof)

Can we improve it?

Lets add in dyadic dependent terms, starting with a fixed decay of 0.

set.seed(1)
mod <- ergm(net ~ edges +nodematch("Sexo") + nodematch("Grupo") + 
            nodefactor("Sexo") + nodefactor("Grupo") +
            nodecov("prosocial") + 
    gwesp(decay=0, fixed=T) + gwdegree(0, fixed=T))
est_gof <- gof(mod)
par(mfrow=c(2,2)) 
plot(est_gof)

Changing the decay a bit?

mod <- ergm(net ~ edges +nodematch("Sexo") + nodematch("Grupo") + 
            nodefactor("Sexo") + nodefactor("Grupo") +
            nodecov("prosocial") + 
    gwesp(decay=.25, fixed=T) + gwdegree(.25, fixed=T))
est_gof <- gof(mod)
par(mfrow=c(2,2)) 
plot(est_gof)

mod <- ergm(net ~ edges +nodematch("Sexo") + nodematch("Grupo") + 
            nodefactor("Sexo") + nodefactor("Grupo") +
            nodecov("prosocial") + 
    gwesp(decay=.5, fixed=T) + gwdegree(.5, fixed=T))
est_gof <- gof(mod)
par(mfrow=c(2,2)) 
plot(est_gof)

mod <- ergm(net ~ edges +nodematch("Sexo") + nodematch("Grupo") + 
            nodefactor("Sexo") + nodefactor("Grupo") +
            nodecov("prosocial") + 
    gwesp(decay=.75, fixed=T) + gwdegree(.75, fixed=T))
est_gof <- gof(mod)
par(mfrow=c(2,2)) 
plot(est_gof)

summary(mod)
Call:
ergm(formula = net ~ edges + nodematch("Sexo") + nodematch("Grupo") + 
    nodefactor("Sexo") + nodefactor("Grupo") + nodecov("prosocial") + 
    gwesp(decay = 0.75, fixed = T) + gwdegree(0.75, fixed = T))

Monte Carlo Maximum Likelihood Results:

                      Estimate Std. Error MCMC % z value Pr(>|z|)    
edges                -5.940007   0.351806      0 -16.884  < 1e-04 ***
nodematch.Sexo        0.724214   0.121896      0   5.941  < 1e-04 ***
nodematch.Grupo       1.408892   0.147166      0   9.573  < 1e-04 ***
nodefactor.Sexo.Male  0.034010   0.067383      0   0.505  0.61375    
nodefactor.Grupo.B   -0.005018   0.056520      0  -0.089  0.92925    
nodefactor.Grupo.C   -0.048045   0.070391      0  -0.683  0.49490    
nodefactor.Grupo.D    0.136707   0.081334      0   1.681  0.09280 .  
nodecov.prosocial    -0.192379   0.168620      0  -1.141  0.25391    
gwesp.fixed.0.75      1.316061   0.136495      0   9.642  < 1e-04 ***
gwdeg.fixed.0.75      1.075639   0.384633      0   2.797  0.00517 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

     Null Deviance: 4380.7  on 3160  degrees of freedom
 Residual Deviance:  952.1  on 3150  degrees of freedom
 
AIC: 972.1  BIC: 1033  (Smaller is better. MC Std. Err. = 0.4174)

Interpretation

We can look at each term here in kind:

  • edges This is equivalent to the intercept of a normal regression and so generally is not worth discussing.
  • nodematch.Sexo This shows the effect of two students having the same sex. The positive coefficient (0.72) is statistically significant (p-value < 0.05). This means that two students are more likely to have a friendship (an edge) if they are of the same sex.
  • nodematch.Grupo This shows the effect of two students being in the same group. This coefficient is again positive (1.41) and significant (p-value < 0.05), meaning that students in the same group are more likely to be friends.
  • nodefactor.Sexo.Male This shows whether male students are more likely to have friends than female students. It is not significant so there is no difference.
  • nodefactor.Grupo.B, nodefactor.Grupo.C, and nodefactor.Grupo.D show whether students in group B, C, or D are more likely to have friends than those in group A. These are again not significant, so there is no difference.
  • nodecov.prosocial This shows whether the total amount of “prosocial” behavior in a dyad increases the chance of a friendship between the two students. There is, again, no effect.
  • gwesp.fixed.0.75 This shows how much transitivity matters to the network structure. The coefficient is positive (1.32) and statistically significant (p-value < 0.05), meaning that triangles are more likely to form in this network than in a comparable random network. In practice this indicates that transitivity is at work in this network.
  • gwdeg.fixed.0.75 This models the distribution of degrees. The positive (1.08) and significant (p-value < 0.05) coefficient means that ties tend to be spread relatively evenly across students rather than piling up on a few popular ones. (If it were negative we would expect a “rich get richer” pattern, with a few high-degree students and many low-degree ones.)