For anyone discovering this: the code below is pulled from Quarto slides, so there are a *lot* of headers. I urge you to use the Table of Contents on the right to navigate. The beginning of each section *should* indicate what packages are being used. In general, the following packages are used:

- igraph: To do most of the network work here.
- ggraph: To visualize the networks.
- smacof: For MDS scaling.
- FactoMineR: For correspondence analysis.
- blockmodeling: For blockmodeling and structural equivalence.
- sna: A few random things that igraph cannot do, plus QAP.
- ergm: For ERGMs (not currently on this page).
- ggplot2: Used throughout for visualizations.
- permuco: Used for an easy way to do permutation tests with regression (you could use the more standard boot library, but that would require more coding).
- intergraph: Used to translate between igraph and network objects.

The course this was written for followed *Analyzing Social Networks in R* but did not make use of the xUCINET package, as it appears to be no longer maintained (it is not on CRAN).

One basic format for network data is the matrix. You can make basic matrices using the `matrix()` function:

```
mat <- matrix(c(0, 1, 0,
                0, 0, 0,
                1, 1, 0),
              nrow=3, ncol=3,
              byrow=TRUE)
mat
```

```
     [,1] [,2] [,3]
[1,]    0    1    0
[2,]    0    0    0
[3,]    1    1    0
```

The `rownames()` and `colnames()` functions are used to access or set the row names and column names.

```
rownames(mat) <- c("JoJo", "Frida", "Kevin")
colnames(mat) <- rownames(mat)
mat
```

```
      JoJo Frida Kevin
JoJo     0     1     0
Frida    0     0     0
Kevin    1     1     0
```

Although we can do a lot with basic matrices, what we really want to create is an *object* that has the properties of a network.

The `igraph` package has a `graph_from_adjacency_matrix()` function used to create network data. It turns a matrix into an `igraph` object.

`library(igraph)`

```
Attaching package: 'igraph'
```

```
The following objects are masked from 'package:stats':
decompose, spectrum
```

```
The following object is masked from 'package:base':
union
```

```
net <- graph_from_adjacency_matrix(mat, mode="directed")
```

There are some options we can set in `graph_from_adjacency_matrix()`:

- `mode=`: `"directed"` creates a directed network; `"undirected"` creates an undirected network, using the upper triangle of the matrix.
- `weighted=`: `NULL` (default) means the numbers in the matrix give the *number* of edges between vertices; `TRUE` creates edge weights instead.
- `diag=`: whether to include the diagonal; set to `FALSE` to ignore the diagonal.
- `add.colnames=`: `NULL` (default) uses the column names as the vertex names; `NA` ignores the column names.
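As a quick illustration of the `weighted=` option, here is a minimal sketch on a made-up 3×3 matrix (not the course data):

```r
library(igraph)

# Toy matrix containing a 2 (made-up data)
m <- matrix(c(0, 2, 0,
              0, 0, 0,
              1, 1, 0), nrow=3, byrow=TRUE)

# weighted=NULL (default): the 2 is read as TWO parallel edges
multi <- graph_from_adjacency_matrix(m, mode="directed")
ecount(multi)

# weighted=TRUE: the 2 becomes ONE edge with weight 2
wtd <- graph_from_adjacency_matrix(m, mode="directed", weighted=TRUE)
ecount(wtd)
E(wtd)$weight
```

This is why we set `weighted=TRUE` for valued data: otherwise igraph builds a multigraph.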

We can call our network object to see some information about it.

`net`

```
IGRAPH a75aeb6 DN-- 3 3 --
+ attr: name (v/c)
+ edges from a75aeb6 (vertex names):
[1] JoJo ->Frida Kevin ->JoJo Kevin ->Frida
```

This shows the attributes for the vertices (name) and some of the edges, using the names of the vertices.

Let's load some more interesting data. We need to read in a CSV of an adjacency matrix (creating a data frame). Because we know we will make this into an adjacency matrix, we set `row.names=1` to convert the first column of the CSV into row names.

The data is here and is an adjacency matrix based on interactions in the first episode of Amazon Prime's *Wheel of Time*.

```
net_mat <- read.csv("network_data/wheel_of_time_ep1.csv", row.names=1)
# setting row.names=1 turns the first column into rownames
net_mat[1:5, 1:5] # look at first 5 rows and cols
```

```
         Moraine Lan Marin Nynavene Rand
Moraine        0   4     1        1    1
Lan            4   0     0        1    0
Marin          1   0     0        1    1
Nynavene       1   1     0        0    1
Rand           1   0     0        0    0
```

We again use `graph_from_adjacency_matrix()` to convert to an igraph object, but first wrap `net_mat` in `as.matrix()`. Why do we set `weighted=TRUE`?

```
library(igraph)
net <- graph_from_adjacency_matrix(as.matrix(net_mat),
                                   mode="directed", weighted=TRUE)
```

If we ever want to go back to the adjacency matrix we can:

```
mat <- as.matrix(net)
mat[1:5, 1:5]
```

```
5 x 5 sparse Matrix of class "dgCMatrix"
         Moraine Lan Marin Nynavene Rand
Moraine        .   1     1        1    1
Lan            1   .     .        1    .
Marin          1   .     .        1    1
Nynavene       1   1     .        .    1
Rand           1   .     .        .    .
```

The igraph object always carries the adjacency matrix with it; we can index it directly:

```
net[1:5, 1:5]
```

```
5 x 5 sparse Matrix of class "dgCMatrix"
         Moraine Lan Marin Nynavene Rand
Moraine        .   4     1        1    1
Lan            4   .     .        1    .
Marin          1   .     .        1    1
Nynavene       1   1     .        .    1
Rand           1   .     .        .    .
```

What would we want to know about our network as a whole?

- Number of edges, number of nodes?
- Number of components?

`ecount()` counts the number of edges; `vcount()` counts the number of vertices.

`ecount(net)`

`[1] 63`

`vcount(net)`

`[1] 22`


`edge_density()` gives the proportion of possible edges that exist.

`edge_density(net)`

`[1] 0.1363636`
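That density is just the edge count over the number of possible directed edges; checking by hand with base R arithmetic:

```r
# Density of a directed network with no self-loops:
# possible edges = n * (n - 1)
n_edges <- 63
n_nodes <- 22
dens <- n_edges / (n_nodes * (n_nodes - 1))
round(dens, 7)  # matches edge_density(net)
```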

The `count_components()` function returns the number of components within a network.

`count_components(net)`

`[1] 2`

For directed data it defaults to weak components; we can switch to strong components using `mode=`:

`count_components(net, mode="strong")`

`[1] 7`

We are going to work on this more later, but it is useful for us to be able to quickly visualize our data (and check components).

`plot(net)`

We can also create an undirected network out of a directed network using `as.undirected()`. How it creates these depends on `mode=`.

```
und_net <- as.undirected(net, mode="mutual")
## Undirected edge exists only if both have an edge
as.matrix(und_net)[1:5, 1:5]
```

```
5 x 5 sparse Matrix of class "dgCMatrix"
         Moraine Lan Marin Nynavene Rand
Moraine        .   1     1        1    1
Lan            1   .     .        1    .
Marin          1   .     .        .    .
Nynavene       1   1     .        .    .
Rand           1   .     .        .    .
```

Switching it to `mode="collapse"`:

```
und_net <- as.undirected(net, mode="collapse")
## Undirected edge exists if either has an edge
as.matrix(und_net)[1:5, 1:5]
```

```
5 x 5 sparse Matrix of class "dgCMatrix"
         Moraine Lan Marin Nynavene Rand
Moraine        .   1     1        1    1
Lan            1   .     .        1    .
Marin          1   .     .        1    1
Nynavene       1   1     1        .    1
Rand           1   .     1        1    .
```
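The difference between the two modes can be mimicked by hand with base R logic on a small made-up matrix (a sketch of the idea, not how `as.undirected()` is actually implemented):

```r
# Made-up 3x3 directed adjacency matrix
m <- matrix(c(0, 1, 0,
              1, 0, 1,
              0, 0, 0), nrow=3, byrow=TRUE)

mutual   <- 1 * ((m > 0) & (t(m) > 0))  # edge only if BOTH directions exist
collapse <- 1 * ((m > 0) | (t(m) > 0))  # edge if EITHER direction exists
mutual
collapse
```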

We use the `V()` and `E()` functions to access the vertices and edges of our network (note the capitalization).

`V(net)`

```
+ 22/22 vertices, named, from 32d1bee:
[1] Moraine Lan Marin Nynavene Rand
[6] Mat Perrin Egwene False.Dragon Eladia
[11] Dying.Man Tam Laila Tom Brandelwyn
[16] Cabin.Trolloc Danya Matt.s.Mom Matt.s.Sisters Padan.Fain
[21] Town.Trolloc Daise
```

`E(net)`

```
+ 63/63 edges from 32d1bee (vertex names):
[1] Moraine ->Lan Moraine ->Marin Moraine ->Nynavene
[4] Moraine ->Rand Moraine ->Mat Moraine ->Perrin
[7] Moraine ->Egwene Lan ->Moraine Lan ->Nynavene
[10] Marin ->Moraine Marin ->Nynavene Marin ->Rand
[13] Marin ->Egwene Nynavene->Moraine Nynavene->Lan
[16] Nynavene->Rand Nynavene->Perrin Nynavene->Egwene
[19] Nynavene->Dying.Man Rand ->Moraine Rand ->Mat
[22] Rand ->Perrin Rand ->Egwene Rand ->Tam
[25] Mat ->Moraine Mat ->Rand Mat ->Perrin
[28] Mat ->Danya Mat ->Matt.s.Mom Mat ->Matt.s.Sisters
+ ... omitted several edges
```

We can have attributes at three different levels of the network: whole network, vertex, and edge. We can get the full list of the different attributes using the `*_attr_names()` functions:

`vertex_attr_names(net)`

`[1] "name"`

`edge_attr_names(net)`

`[1] "weight"`

`graph_attr_names(net)`

`character(0)`

There are several ways to access a vertex or edge attribute, but the easiest is something like `V(net)$attr`, where `attr` is the name of the attribute. You can do this with `E()` as well.

`V(net)$name`

```
[1] "Moraine" "Lan" "Marin" "Nynavene"
[5] "Rand" "Mat" "Perrin" "Egwene"
[9] "False.Dragon" "Eladia" "Dying.Man" "Tam"
[13] "Laila" "Tom" "Brandelwyn" "Cabin.Trolloc"
[17] "Danya" "Matt.s.Mom" "Matt.s.Sisters" "Padan.Fain"
[21] "Town.Trolloc" "Daise"
```

`E(net)$weight`

```
[1] 4 1 1 1 1 1 1 4 1 1 1 1 2 1 1 1 1 3 1 1 3 4 4 3 1 3 4 1 2 2 1 1 3 3 4 1 1 4
[39] 4 1 1 1 1 1 1 1 1 3 1 1 1 1 1 1 1 2 1 2 1 2 1 1 1
```

You modify attributes in a way similar to how dataframes are modified in R: `E(net)$attr <- "Good"`

```
V(net)$type <- "Character"
V(net)$type
```

```
[1] "Character" "Character" "Character" "Character" "Character" "Character"
[7] "Character" "Character" "Character" "Character" "Character" "Character"
[13] "Character" "Character" "Character" "Character" "Character" "Character"
[19] "Character" "Character" "Character" "Character"
```

Each vertex and edge can be indexed using `[ ]`. For example, `V(net)[2]` will return the second vertex.

`V(net)[2]`

```
+ 1/22 vertex, named, from 32d1bee:
[1] Lan
```

`E(net)[2]`

```
+ 1/63 edge from 32d1bee (vertex names):
[1] Moraine->Marin
```

We can also use the indexing to modify the attributes:

`V(net)$name[9] ## Missing a space`

`[1] "False.Dragon"`

```
V(net)$name[9] <- "False Dragon"
V(net)$name ## Fixed
```

```
[1] "Moraine" "Lan" "Marin" "Nynavene"
[5] "Rand" "Mat" "Perrin" "Egwene"
[9] "False Dragon" "Eladia" "Dying.Man" "Tam"
[13] "Laila" "Tom" "Brandelwyn" "Cabin.Trolloc"
[17] "Danya" "Matt.s.Mom" "Matt.s.Sisters" "Padan.Fain"
[21] "Town.Trolloc" "Daise"
```

Dealing with edges can be a bit more confusing; they also have IDs. The best way to identify an edge is by the vertices it is incident to.

`get.edge.ids()` will give you the ID of the edge from one vertex to another (given using the `vp=` argument).

`V(net)[5] # Rand`

```
+ 1/22 vertex, named, from 32d1bee:
[1] Rand
```

`V(net)[6] # Mat`

```
+ 1/22 vertex, named, from 32d1bee:
[1] Mat
```

```
edid <- get.edge.ids(net, vp=c(V(net)[5], V(net)[6]))
edid
```

`[1] 21`

`E(net)[edid]`

```
+ 1/63 edge from 32d1bee (vertex names):
[1] Rand->Mat
```

`incident()` is used to get all edges that are incident to a vertex. You can set `mode=` to decide what to do with directed edges.

`incident(net, V(net)[5], mode="in")`

```
+ 7/63 edges from 32d1bee (vertex names):
[1] Moraine ->Rand Marin ->Rand Nynavene->Rand Mat ->Rand Perrin ->Rand
[6] Egwene ->Rand Tam ->Rand
```

`incident(net, V(net)[5], mode="out")`

```
+ 5/63 edges from 32d1bee (vertex names):
[1] Rand->Moraine Rand->Mat Rand->Perrin Rand->Egwene Rand->Tam
```

`incident(net, V(net)[5], mode="all")`

```
+ 12/63 edges from 32d1bee (vertex names):
[1] Rand ->Moraine Moraine ->Rand Marin ->Rand Nynavene->Rand
[5] Rand ->Mat Mat ->Rand Rand ->Perrin Perrin ->Rand
[9] Rand ->Egwene Egwene ->Rand Rand ->Tam Tam ->Rand
```

The `degree()` function counts the edges incident to each vertex. We set `mode="in"` for the number of edges pointing towards a node, `mode="out"` for the number pointing out, and `mode="all"` (the default) for all.
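For a binary adjacency matrix, these counts are just row and column sums. A base-R sketch on a made-up 3-node matrix (equivalent to `degree()` with the matching `mode=`):

```r
# Made-up binary adjacency matrix with named nodes
adj <- matrix(c(0, 1, 0,
                0, 0, 0,
                1, 1, 0), nrow=3, byrow=TRUE,
              dimnames=list(c("A", "B", "C"), c("A", "B", "C")))

outdeg <- rowSums(adj)    # mode="out": edges leaving each node
indeg  <- colSums(adj)    # mode="in":  edges entering each node
alldeg <- indeg + outdeg  # mode="all"
outdeg
indeg
```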

It can be useful to then add this back to our network data.

`degree(net)[1:5] ## Show first 5 so I don't go off screen`

```
Moraine Lan Marin Nynavene Rand
13 4 8 13 12
```

`mean(degree(net)) # average degree`

`[1] 5.727273`

```
V(net)$degree <- degree(net)
V(net)$indegree <- degree(net, mode="in")
V(net)$outdegree <- degree(net, mode="out")
```

We can do the same thing with membership in different components. We are going to use `components()`, which returns an object with information about the components.

```
comps <- components(net, mode="strong")
comps$no # number of components
```

`[1] 7`

`comps$membership # Which component each vertex is in`

```
Moraine Lan Marin Nynavene Rand
4 4 4 4 4
Mat Perrin Egwene False Dragon Eladia
4 4 4 3 3
Dying.Man Tam Laila Tom Brandelwyn
5 4 4 6 4
Cabin.Trolloc Danya Matt.s.Mom Matt.s.Sisters Padan.Fain
7 4 4 4 4
Town.Trolloc Daise
2 1
```

`V(net)$components <- comps$membership`

One thing I like to do is convert numbers into letters for labels. All R sessions have two built-in variables, `letters` and `LETTERS`, that contain the letters from a to z.

`letters[c(1, 3, 5)]`

`[1] "a" "c" "e"`

`LETTERS[comps$membership]`

```
[1] "D" "D" "D" "D" "D" "D" "D" "D" "C" "C" "E" "D" "D" "F" "D" "G" "D" "D" "D"
[20] "D" "B" "A"
```

We can check which vertices are cutpoints using the `articulation_points()` function. For directed networks this can *only* identify cutpoints for weak components.

`articulation_points(net)`

```
+ 5/22 vertices, named, from 32d1bee:
[1] Tam Egwene Daise Mat Nynavene
```

```
cut_points <- articulation_points(net)
V(net)$cuts <- FALSE
V(net)$cuts[cut_points] <- TRUE
V(net)$cuts
```

```
[1] FALSE FALSE FALSE TRUE FALSE TRUE FALSE TRUE FALSE FALSE FALSE TRUE
[13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
```

We can also identify bridges using the `bridges()` function. For directed networks this can *only* identify bridges for weak components.

`bridges(net)`

```
+ 5/63 edges from 32d1bee (vertex names):
[1] Tam ->Cabin.Trolloc Egwene ->Tom Daise ->Town.Trolloc
[4] Daise ->Egwene Nynavene->Dying.Man
```

```
bridge_edges <- bridges(net)
E(net)$bridges <- FALSE
E(net)$bridges[bridge_edges] <- TRUE
E(net)$bridges
```

```
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[13] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[37] FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[49] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[61] FALSE TRUE TRUE
```

We’ve saved a lot of useful information about our vertices in our network. It is often easier to use this info as a dataframe, though. We use `as_data_frame()` to convert a network into a dataframe, and set `what="vertices"` to get all that wonderful vertex info (we can then even save it).

```
df_out <- as_data_frame(net, what="vertices")
tail(df_out, 5)
```

```
name type degree indegree outdegree components
Matt.s.Mom Matt.s.Mom Character 3 1 2 4
Matt.s.Sisters Matt.s.Sisters Character 3 2 1 4
Padan.Fain Padan.Fain Character 2 1 1 4
Town.Trolloc Town.Trolloc Character 1 1 0 2
Daise Daise Character 2 0 2 1
cuts
Matt.s.Mom FALSE
Matt.s.Sisters FALSE
Padan.Fain FALSE
Town.Trolloc FALSE
Daise TRUE
```

`write.csv(df_out, "vertices.csv", row.names=F)`

The last thing for this week is calculating geodesic distances. We use the `distances()` function to do that. If we set `mode="out"`, then the resulting matrix can be read as the distances from the *row* to the *column*.

```
dist <- distances(net, mode="out")
dist[1:5, 1:5]
```

```
         Moraine Lan Marin Nynavene Rand
Moraine        0   2     1        1    1
Lan            2   0     3        1    2
Marin          1   2     0        1    1
Nynavene       1   1     2        0    1
Rand           1   3     2        2    0
```
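Under the hood a geodesic distance is just a shortest path. The idea can be sketched by hand with the Floyd–Warshall algorithm on a tiny made-up directed graph (base R only; `distances()` uses faster algorithms internally):

```r
# Floyd-Warshall shortest paths on a small directed adjacency matrix
adj <- matrix(c(0, 1, 0,
                0, 0, 1,
                0, 0, 0), nrow=3, byrow=TRUE)
n <- nrow(adj)
d <- ifelse(adj > 0, 1, Inf)  # edge -> distance 1, no edge -> Inf
diag(d) <- 0
for (k in 1:n) for (i in 1:n) for (j in 1:n)
  d[i, j] <- min(d[i, j], d[i, k] + d[k, j])
d  # d[1, 3] is 2: the path 1 -> 2 -> 3; unreachable pairs stay Inf
```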

To calculate the average distance we can use the `mean_distance()` function. It will automatically ignore any *disconnected* pairs of vertices. We can treat the network as directed or undirected.

`mean_distance(net, directed=TRUE)`

`[1] 2.978261`

`mean_distance(net, directed=FALSE)`

`[1] 2.827225`

One problem with the `distances()` function is that it returns unconnected distances as `Inf`, which can make it impossible to calculate some statistics. We can use `is.infinite()` and subsetting to replace `Inf` with `NA`, which we can exclude more easily.

`max(dist, na.rm=T) # BROKEN `

`[1] Inf`

```
dist[is.infinite(dist)] <- NA
max(dist, na.rm=T) # not broken
```

`[1] 6`

Another type of network data is the edge list format, where each row gives one edge as "start_of_edge, end_of_edge" (tail to head).

```
edge <- as_edgelist(net)
edge[1:8, ]
```

```
[,1] [,2]
[1,] "Moraine" "Lan"
[2,] "Moraine" "Marin"
[3,] "Moraine" "Nynavene"
[4,] "Moraine" "Rand"
[5,] "Moraine" "Mat"
[6,] "Moraine" "Perrin"
[7,] "Moraine" "Egwene"
[8,] "Lan" "Moraine"
```

Edge lists are common ways of providing network data.
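Going the other way, from an edge list back to an adjacency matrix, can be done by hand with base R matrix indexing (a sketch on made-up names):

```r
# Small made-up edge list: one edge per row
el <- rbind(c("A", "B"),
            c("A", "C"),
            c("B", "C"))
nodes <- sort(unique(c(el)))
adj <- matrix(0, length(nodes), length(nodes),
              dimnames=list(nodes, nodes))
adj[el] <- 1  # a two-column character matrix indexes (row, col) pairs
adj
```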

I have data that shows connections between members of the 134th Ohio Senate by the number of bills they cosponsored with each other.

```
edge_data <- read.csv("network_data/OH_134_cosponsor.csv")
head(edge_data)
```

```
.tail .head weight
1 Kenny Yuko Jay Hottinger 129
2 Matthew Dolan Jay Hottinger 107
3 Vernon Sykes Jay Hottinger 91
4 Sandra Williams Jay Hottinger 53
5 Teresa Fedor Jay Hottinger 70
6 Robert Hackett Jay Hottinger 136
```

We can create the network the same way as before though we want to indicate this is undirected data.

```
leg_net <- graph_from_data_frame(edge_data, directed=FALSE)
leg_net
```

```
IGRAPH 5063edf UNW- 34 560 --
+ attr: name (v/c), weight (e/n)
+ edges from 5063edf (vertex names):
[1] Kenny Yuko --Jay Hottinger Matthew Dolan --Jay Hottinger
[3] Vernon Sykes --Jay Hottinger Sandra Williams --Jay Hottinger
[5] Teresa Fedor --Jay Hottinger Robert Hackett --Jay Hottinger
[7] Andrew Brenner --Jay Hottinger Kristina Roegner--Jay Hottinger
[9] Terry Johnson --Jay Hottinger Nickie Antonio --Jay Hottinger
[11] Bob Peterson --Jay Hottinger Louis Blessing --Jay Hottinger
[13] Stephanie Kunze --Jay Hottinger Mark Romanchuk --Jay Hottinger
[15] Bill Reineke --Jay Hottinger Hearcel Craig --Jay Hottinger
+ ... omitted several edges
```

With an edge list we can also easily add in information about each node/vertex. Here I load a second dataset OH_134_people.csv which has information about each individual. The first column needs to be the vertex names.

```
vertex_data <- read.csv("network_data/OH_134_people.csv")
leg_net <- graph_from_data_frame(edge_data, directed=FALSE,
                                 vertices=vertex_data)
leg_net
```

```
IGRAPH 1fa10a5 UNW- 34 560 --
+ attr: name (v/c), people_id (v/n), first_name (v/c), middle_name
| (v/c), last_name (v/c), suffix (v/c), nickname (v/c), party_id (v/n),
| party (v/c), role_id (v/n), role (v/c), district (v/c),
| followthemoney_eid (v/n), votesmart_id (v/n), opensecrets_id (v/l),
| ballotpedia (v/c), knowwho_pid (v/n), committee_id (v/n), weight
| (e/n)
+ edges from 1fa10a5 (vertex names):
[1] Jay Hottinger--Kenny Yuko Matthew Dolan--Jay Hottinger
[3] Jay Hottinger--Vernon Sykes Jay Hottinger--Sandra Williams
[5] Jay Hottinger--Teresa Fedor Jay Hottinger--Robert Hackett
+ ... omitted several edges
```

ggplot2 is *the* library for making figures in R. The following slides give you an introduction to how to use it.

All plots start with the `ggplot()` function, which takes the *dataframe* you are going to use as the first argument. By itself it doesn’t do anything.

```
library(ggplot2)
ggplot(df_out)
```

The other common part of a `ggplot()` call is a call to another function, `aes()`. This function *maps* parts of your dataframe onto parts of the plot. In the example below I map the `indegree` variable to the x axis and the `outdegree` variable to the y axis.

`ggplot(df_out, aes(x=indegree, y=outdegree))`

We use `geom_*()` functions to actually *add* things to the plot. To do this we literally *add* the `geom_point()` function to our previous call:

```
ggplot(df_out, aes(x=indegree, y=outdegree)) +
geom_point()
```

We can change things about the points by adding other arguments to `geom_point()` (there are a lot of options; scroll to the Aesthetics section here to see them).

```
ggplot(df_out, aes(x=indegree, y=outdegree)) +
geom_point(color="steelblue4", size=6)
```

We can also use the `aes()` function within the `geom_point()` call to map different aesthetics to our data. The below will change the color of the dots depending on whether they are a cutpoint or not.

```
ggplot(df_out, aes(x=indegree, y=outdegree)) +
geom_point(aes(color=cuts), size=6)
```

The power of ggplot is all the geoms. Let's change `geom_point()` to `geom_jitter()`, which bumps the dots around, and add on another one, `geom_smooth()`, to show the trend.

```
ggplot(df_out, aes(x=indegree, y=outdegree)) +
geom_jitter(aes(color=cuts), size=6) +
geom_smooth(method="lm")
```

```
`geom_smooth()` using formula = 'y ~ x'
```

Look what happens when you *move* the `color=cuts` part to the `ggplot()` call. Any `aes()` mappings made there impact *all* geoms.

```
ggplot(df_out, aes(x=indegree, y=outdegree,
color=cuts)) +
geom_jitter(size=6) +
geom_smooth(method="lm")
```

```
`geom_smooth()` using formula = 'y ~ x'
```

We can add or modify labels on our plot using the `labs()` function:

```
ggplot(df_out, aes(x=indegree, y=outdegree)) +
geom_jitter(aes(color=cuts), size=6) +
geom_smooth(method="lm") +
labs(y="Out Degree", x="In Degree",
title="Out vs In Degree")
```

```
`geom_smooth()` using formula = 'y ~ x'
```

There are also `scale_*_*()` functions that can be used to change the style and labels of the scales.

```
ggplot(df_out, aes(x=indegree, y=outdegree)) +
geom_jitter(aes(color=cuts), size=6) +
geom_smooth(method="lm") +
labs(y="Out Degree", x="In Degree",
title="Out vs In Degree") +
scale_color_brewer("Cut Point?",
type="qual", palette=2)
```

```
`geom_smooth()` using formula = 'y ~ x'
```

Finally, you can use a `theme_*()` function to change the overall style of the plot:

```
ggplot(df_out, aes(x=indegree, y=outdegree)) +
geom_jitter(aes(color=cuts), size=6) + geom_smooth(method="lm") +
labs(y="Out Degree", x="In Degree",
title="Out vs In Degree") +
scale_color_brewer("Cut Point?",
type="qual", palette=2) + theme_minimal()
```

```
`geom_smooth()` using formula = 'y ~ x'
```

There are three main functions we will need for this:

- `mds()` from the smacof package.
- `CA()` from the FactoMineR package.
- `hclust()`, a base R function (along with `cutree()` and `dist()`).

The `mds()` function in `smacof` lets us estimate multiple types of MDS:

- `mds(distance, type="interval")` - Metric MDS
- `mds(distance, type="ordinal")` - Non-Metric MDS

In both cases the `distance` object needs to be a symmetric *dissimilarity* (distance) matrix: larger values mean observations are more different from each other. You select the number of dimensions using `ndim=`.
R has a built-in object, `UScitiesD`, that you can call at any time; it gives the distances between several US cities.

```
# UScitiesD # Run this by itself to see
library(smacof)
mds <- mds(UScitiesD, type="interval")
mds$stress # The stress
```

`[1] 0.001913151`

`mds$conf # the actual points`

```
D1 D2
Atlanta -0.4543817 0.09024782
Chicago -0.2415148 -0.21576888
Denver 0.3051669 -0.01607968
Houston -0.1021788 0.36232737
LosAngeles 0.7612104 0.24459966
Miami -0.7164226 0.36679903
NewYork -0.6783018 -0.32755339
SanFrancisco 0.8982951 0.06986850
Seattle 0.8477661 -0.36291042
Washington.DC -0.6196388 -0.21153000
```
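Base R also has classical (metric) MDS via `cmdscale()`; a quick cross-check on the same built-in distances (this is not the smacof estimator, but the configurations are usually very similar):

```r
# Classical MDS on the built-in UScitiesD distance object
coords <- cmdscale(UScitiesD, k=2)  # k = number of dimensions
dim(coords)  # one row per city, two coordinate columns
head(coords, 3)
```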

We can plot this with `ggplot2`

```
library(ggplot2)
points <- as.data.frame(mds$conf)
ggplot(points, aes(x=D1, y=D2)) +
  geom_point() +
  theme_minimal()
```

Let's add some labels:

```
points$names <- rownames(points)
ggplot(points, aes(x=D1, y=D2)) +
  geom_point() + geom_text(aes(label=names)) +
  theme_minimal()
```

Flip the scales

```
ggplot(points, aes(x=-D1, y=-D2)) +
geom_text(aes(label=names)) +
theme_minimal()
```

For the next section we are going to use data on UN votes (stored as pairwise distances in an RData file).

```
UN_votes <- readRDS("network_data/distance_1960s.RData")

out <- mds(UN_votes, ndim=2, type="ordinal")
out$stress
```

`[1] 0.1247679`

If we want to check different levels of stress we could estimate several models like this:

```
out_1 <- mds(UN_votes, ndim=1, type="ordinal")
out_2 <- mds(UN_votes, ndim=2, type="ordinal")
out_3 <- mds(UN_votes, ndim=3, type="ordinal")
out_4 <- mds(UN_votes, ndim=4, type="ordinal")
```

This is repetitive, and good code minimizes repetition.

We can use a *for* loop to repeat some lines of code multiple times changing things as we do:

Basic idea:

```
for(ii in 1:5){
print(ii)
}
```

```
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
```

We want to create something to store the results and then store it every time we go through the loop.

```
out <- numeric(5) ## Vector of length 5
for(ii in 1:5){
  out[ii] <- ii
}
print(out)
```

`[1] 1 2 3 4 5`

We can use the same pattern to store the stress from each MDS model:

```
out <- numeric(5) ## Vector of length 5
for(ii in 1:5){
  tmp <- smacof::mds(UN_votes, ndim=ii, type="ordinal")
  out[ii] <- tmp$stress
}
print(out)
```

`[1] 0.32174980 0.12476788 0.08293623 0.06374924 0.05249708`

I use the basic `plot()` function to make a scree plot:

```
plot(x=1:5, y=out, ylab="Stress",
xlab="Dimensions",
type="b")
```

The `hclust()` function needs a `dist` object to work. Our `UN_votes` data already shows “distances” but isn’t a `dist` object, so we use `as.dist()` to convert it.

```
hcl <- hclust(as.dist(UN_votes))
plot(hcl) ## creates a dendrogram
```

You can change the method with the `method=` argument (“single”, “complete”, “average”).

```
hcl <- hclust(as.dist(UN_votes), method="single")
plot(hcl) ## creates a dendrogram
```

The `cutree()` function “cuts the tree” to make clusters. Set `k=` to the number of groups you want.

```
groups <- cutree(hcl, k=4)
groups[1:20]
```

```
AFG ALB DZA ARG AUS AUT BRB BLR BEL BEN BOL BWA BRA BGR BFA BDI CIV KHM CMR CAN
1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 3 1 1
```
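The same workflow runs on any distance object; a self-contained sketch on the built-in `UScitiesD` distances:

```r
# UScitiesD is already a dist object, so no as.dist() needed
hcl_cities <- hclust(UScitiesD, method="complete")
grp <- cutree(hcl_cities, k=3)  # cut into 3 clusters
grp
table(grp)  # how many cities fall in each cluster
```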

For correspondence analysis we use the `CA()` function, and can set the number of dimensions to estimate with `ncp=` (feel free to set this to something high).

The data here shows the number of each country’s MEPs in each EU party group.

```
df <- read.csv("network_data/EU_seats.csv", row.names=1)
library(FactoMineR)
ca <- CA(df, ncp=5)
```

```
Warning: ggrepel: 10 unlabeled data points (too many overlaps). Consider
increasing max.overlaps
```

We can see how many dimensions to use by calling the `$eig` element of the output:

`ca$eig`

```
eigenvalue percentage of variance cumulative percentage of variance
dim 1 0.29470402 29.073034 29.07303
dim 2 0.23559433 23.241767 52.31480
dim 3 0.17732184 17.493090 69.80789
dim 4 0.12367321 12.200564 82.00846
dim 5 0.09625459 9.495673 91.50413
dim 6 0.05514655 5.440298 96.94443
dim 7 0.03097337 3.055574 100.00000
```
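Those percentage columns are just each eigenvalue over the total; recomputing them by hand from the eigenvalues above:

```r
eig <- c(0.29470402, 0.23559433, 0.17732184, 0.12367321,
         0.09625459, 0.05514655, 0.03097337)
pct <- 100 * eig / sum(eig)  # percentage of variance
cum <- cumsum(pct)           # cumulative percentage
round(pct, 2)
round(cum, 2)
```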

If we want to extract the scales we use the `$row$coord` or `$col$coord` elements of the output. We can plot them easily by converting to a data.frame and sticking it into ggplot:

```
countries <- as.data.frame(ca$row$coord)
ggplot(countries, aes(x=`Dim 1`, y=`Dim 2`)) +
  geom_point(size=5) + theme_minimal()
```

If we want to plot them both we need to combine the two data.frames (using `rbind()`), giving them labels first:

```
countries <- as.data.frame(ca$row$coord)
countries$Type <- "Country"
countries$Name <- rownames(countries)
parties <- as.data.frame(ca$col$coord)
parties$Type <- "Party"
parties$Name <- rownames(parties)

all <- rbind(countries, parties)

ggplot(all, aes(x=`Dim 1`, y=`Dim 2`, color=Type)) +
  geom_point(size=5) + theme_minimal()
```

Now we are going to move onto visualizing networks.

Visualizations *can* be useful with network data, but they are also hard to do:

- We have complicated data.
- We often want to show multiple “types” of information.

We are going to use the `ggraph` package.

Benefits:

- It uses a ggplot2 style interface.
- It allows a lot of fine-tuning of plots.
- Has a fair amount of useful online documentation on layouts, nodes, and edges

Cons:

- It is a bit overly complicated at times.

We are going to use a small canned dataset you can download from the internet: strike.paj. It is a communication network between workers at a sawmill. It is also in a unique data format, “Pajek”, which thankfully igraph has a function for.

```
library(igraph)
net <- read_graph("network_data/strike.paj", format="pajek")
```

```
Warning in read.graph.pajek(file, ...): At
src/vendor/cigraph/src/io/pajek-lexer.l:130 : Skipping unknown section
'*Partition' on line 68.
```

`ecount(net)`

`[1] 38`

`vcount(net)`

`[1] 24`

Just like ggplot2, all visualizations will start with a call to `ggraph()`:

```
library(ggraph)
theme_set(theme_graph())
ggraph(net)
```

`Using "stress" as default layout`

`Warning: Existing variables `x` and `y` overwritten by layout variables`

To add nodes and edges to this plot we will use `geom_node_point()` and `geom_edge_link()`:

- `geom_node_point()`: Adds our nodes as circles.
- `geom_edge_link()`: Adds edges as straight lines (no arrows).

```
ggraph(net) + geom_node_point(size=6) +
geom_edge_link()
```

`Using "stress" as default layout`

`Warning: Existing variables `x` and `y` overwritten by layout variables`

```
Warning: Using the `size` aesthetic in this geom was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` in the `default_aes` field and elsewhere instead.
```

The layout of a plot can greatly impact how useful it is:


- Dimension Reduction: Use multivariate techniques to scale into two dimensions.
  - MDS, Pivot Scaling, Eigenvector
- Force-Directed: Simulates a physical process.
  - Fruchterman and Reingold, Kamada and Kawai, Davidson-Harel, DrL, Large Graph Layout (LGL), Graphopt, and GEM

In most of these layouts they do something like:

- Each node repulses all other nodes.
- Edges pull two nodes together.
- The balance of this is that groups of nodes with many connections end up close together, while groups without connections end up far apart.

FR views vertexes as “atomic particles or celestial bodies, exerting attractive and repulsive forces from one another”.

How does this algorithm work?

- Calculate the amount of repulsion between all nodes.
- Calculate the amount of attraction between all adjacent nodes.
- Move nodes based on the weight of attraction and repulsion, but limit the amount of movement by a **temperature**.
- Reduce the **temperature**, and go back to step 1.
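The steps above can be sketched as a single toy update in base R (a heavily simplified caricature of Fruchterman–Reingold, not igraph's implementation; the constants `k` and `temp` are made up):

```r
# Toy force-directed step on 3 nodes with 2 edges
set.seed(1)
pos <- matrix(runif(6), ncol=2)    # random starting positions
edges <- rbind(c(1, 2), c(2, 3))   # edge list
k <- 0.5                           # ideal spring length (made up)
temp <- 0.1                        # temperature: caps movement per step

disp <- matrix(0, 3, 2)
# Step 1: every node repulses every other node
for (i in 1:3) for (j in 1:3) if (i != j) {
  d <- pos[i, ] - pos[j, ]
  len <- sqrt(sum(d^2))
  disp[i, ] <- disp[i, ] + (d / len) * (k^2 / len)
}
# Step 2: each edge pulls its endpoints together
for (e in 1:nrow(edges)) {
  i <- edges[e, 1]; j <- edges[e, 2]
  d <- pos[i, ] - pos[j, ]
  len <- sqrt(sum(d^2))
  disp[i, ] <- disp[i, ] - (d / len) * (len^2 / k)
  disp[j, ] <- disp[j, ] + (d / len) * (len^2 / k)
}
# Step 3: move nodes, limiting each move by the temperature
for (i in 1:3) {
  len <- sqrt(sum(disp[i, ]^2))
  pos[i, ] <- pos[i, ] + (disp[i, ] / len) * min(len, temp)
}
pos
```

In the real algorithm this loop repeats many times while the temperature shrinks, which is what the `niter=` experiments below show.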

```
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr",
                    niter=1)
ggplot(lo) + geom_node_point(size=6) +
  geom_edge_link() + labs(title="1 Iteration")
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr",
                    niter=2)
ggplot(lo) + geom_node_point(size=6) +
  geom_edge_link() + labs(title="2 Iterations")
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr",
                    niter=3)
ggplot(lo) + geom_node_point(size=6) +
  geom_edge_link() + labs(title="3 Iterations")
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr",
                    niter=4)
ggplot(lo) + geom_node_point(size=6) +
  geom_edge_link() + labs(title="4 Iterations")
```

```
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr",
                    niter=10)
ggplot(lo) + geom_node_point(size=6) +
  geom_edge_link() + labs(title="10 Iterations")
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr",
                    niter=25)
ggplot(lo) + geom_node_point(size=6) +
  geom_edge_link() + labs(title="25 Iterations")
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr",
                    niter=50)
ggplot(lo) + geom_node_point(size=6) +
  geom_edge_link() + labs(title="50 Iterations")
set.seed(1)
lo <- create_layout(net, layout="igraph", algorithm="fr",
                    niter=100)
ggplot(lo) + geom_node_point(size=6) +
  geom_edge_link() + labs(title="100 Iterations")
```

To set the layout you set `layout=` to what you want; you can also pass additional arguments as necessary.

If you want to create the *exact* same layout every time, run `set.seed()` directly prior to making the plot. This sets the “random seed” that is used.

```
set.seed(1)
ggraph(graph=net, layout="fr", niter=250) +
geom_edge_link() + geom_node_point(size=6)
```

`Warning: Existing variables `x` and `y` overwritten by layout variables`

Network of DNC emails from here.

```
dnc_net <- read_graph("network_data/dnc.gml", format='gml')
```

```
Warning in read.graph.gml(file, ...): At vendor/cigraph/src/io/gml.c:149 : One
or more unknown entities will be returned verbatim (
).
```

```
ggraph(graph=dnc_net, "graphopt") +
geom_edge_link() + geom_node_point()
```

We can use the function `largest_component()` to grab just that part. Also, the `|>` is a *pipe*, which passes output from one function into the next.

```
dnc_net |> largest_component() |>
  ggraph("fr") +
  geom_edge_link() + geom_node_point()
```

Deleting all the isolates using `which()` and `delete_vertices()`:

```
isolates <- which(degree(dnc_net)==0)
dnc_net |> delete_vertices(v=isolates) |>
  ggraph("fr") + geom_edge_link() + geom_node_point()
```

We can use `geom_node_text()` or `geom_node_label()` to label our nodes.

```
ggraph(graph=net, "stress") +
geom_edge_link() +
geom_node_label(aes(label=name))
```

`Warning: Existing variables `x` and `y` overwritten by layout variables`

They also have a `repel=T` argument that will move the labels away from the center of the node.

```
ggraph(graph=net, "stress") +
geom_edge_link() + geom_node_point() +
geom_node_text(aes(label=name), repel=T)
```

`Warning: Existing variables `x` and `y` overwritten by layout variables`

Vertex attributes are included for a variety of reasons. This includes:

- Demonstrating who is important in a network.
- Showing groups in a network.
- Presenting other relevant details

Often we will scale a node size by a measure of importance, like degree:

```
V(net)$degree <- degree(net)
ggraph(graph=net, "stress") +
geom_edge_link() + geom_node_point(aes(size=degree)) +
ggtitle("Sized by Degree") +
scale_size("Degree")
```

`Warning: Existing variables `x` and `y` overwritten by layout variables`

This data is of Spanish high school students and includes negative and positive relations. We are going to delete the negative edges.

```
edges <- read.csv("network_data/spanish_hs_edges.csv")
nodes <- read.csv("network_data/spanish_hs_nodes.csv")
net <- graph_from_data_frame(edges, vertices=nodes, directed=T)
neg_edges <- which(E(net)$weight < 0)
net <- delete_edges(net, neg_edges)
net
```

```
IGRAPH fdda5ca DNW- 105 1058 --
+ attr: name (v/n), Colegio (v/n), Curso (v/n), Grupo (v/c), Sexo
| (v/c), prosocial (v/n), crttotal (v/n), X_pos (v/c), id (e/n), weight
| (e/n)
+ edges from fdda5ca (vertex names):
[1] 3043->3047 3043->3087 3043->3093 3043->3065 3043->3097 3043->3044
[7] 3043->3045 3043->3088 3043->3056 3043->3090 3043->3073 3043->3066
[13] 3043->3060 3043->3092 3043->3096 3043->3077 3043->3084 3043->3105
[19] 3043->3067 3043->3064 3043->3081 3043->3068 3043->3061 3043->3058
[25] 3043->3055 3043->3072 3043->3095 3043->3051 3043->3054 3043->3086
[31] 3043->3085 3043->3089 3043->3047 3043->3048 3043->3049 3043->3050
+ ... omitted several edges
```

We can color the nodes by setting `aes(color=)` to a vertex attribute.

```
ggraph(graph=net, "stress") +
geom_edge_link() +
geom_node_point(aes(color=Sexo), size=4) +
ggtitle("Colored by Sex")
```

There are a few things we might want to do with our edges:

- Add arrows for a directed network
- Show edge attributes.

I think it is easier to see a network by making the edges gray.

```
ggraph(graph=net, "stress") +
geom_edge_link(color="gray") +
geom_node_point(aes(color=Sexo), size=4) +
ggtitle("Colored by Sex")
```

Arrows are *annoying* to add here, but there is some good help online. We manually create an arrow (`arrow`) and manually end it before the node (`end_cap`).

```
ggraph(graph=net, "stress") +
geom_edge_link(color="gray",
arrow = arrow(length = unit(4, 'mm')),
end_cap = circle(3, 'mm')) +
geom_node_point(aes(color=Sexo), size=4)
```

Finally we can assign edge attributes to aesthetics

```
ggraph(graph=net, "stress") +
geom_edge_link(color="gray", aes(width=weight),
arrow = arrow(length = unit(4, 'mm')),
end_cap = circle(3, 'mm')) +
geom_node_point(aes(color=Sexo), size=4)
```

The default for ggraph is to show only a single edge when there are two mutual edges. We can change that by using `geom_edge_fan()`.

```
ggraph(graph=net, "stress") +
geom_edge_fan(aes(color=weight),
arrow = arrow(length = unit(4, 'mm')),
end_cap = circle(3, 'mm')) +
geom_node_point(aes(color=Sexo), size=4)
```

This data shows the relationships between donors in the Ohio legislature. You can download the edgelist here and the meta data here

```
library(igraph)
edges <- read.csv("network_data/edge_OH.csv")
nodes <- read.csv("network_data/meta_OH.csv")
net <- graph_from_data_frame(edges, directed=F, vertices=nodes)
```

- Degree: `degree()`
- Eigenvector: `eigen_centrality()`
- Beta: `power_centrality()`; set the **beta** value with `exponent=`
- Closeness: `closeness()`
- Harmonic Mean: `harmonic_centrality()`
- Betweenness: `betweenness()`
- K-Step: `ego_size()`

These are generally relatively simple to use, but `eigen_centrality()` outputs a whole object with information. To get just the centralities use `$vector`.

```
V(net)$degree <- degree(net)
tmp <- eigen_centrality(net)
V(net)$eigen <- tmp$vector
tmp$value ## The eigenvalue
```

`[1] 115.055`

`V(net)$beta <- power_centrality(net, exponent=0.008)`

```
stats <- as_data_frame(net, what="vertices")
ggplot(stats, aes(x=Total, y=eigen)) +
  geom_point() + theme_minimal() +
  geom_smooth(method="lm") +
  scale_x_log10()
```

``geom_smooth()` using formula = 'y ~ x'`

```
ggplot(stats, aes(x=CatCodeGroup, y=eigen)) +
geom_boxplot() + theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
```

Again, for the undirected unweighted version there isn’t a lot to these:

```
V(net)$between <- betweenness(net, normalized=T)
V(net)$close <- closeness(net, normalized=T)
V(net)$harmonic <- harmonic_centrality(net, normalized=T)
V(net)$step_2 <- ego_size(net, 2) # Two step
V(net)$step_3 <- ego_size(net, 3) # Three step
```

```
ggraph(net) + geom_edge_link(color="gray") +
geom_node_point(aes(size=between))
```

`Using "stress" as default layout`

Things get a bit more complicated if we want to use edge weights. The data we have has something we can use to weight the edges:

`E(net)$weight <- ifelse(E(net)$edge == "Strong", 2, 1)`

Weighted degree is calculated by the `strength()` function.

```
weighted_degree <- strength(net)
regular_degree <- degree(net)
plot(x=regular_degree, y=weighted_degree)
```

If there is a `weight` edge attribute it will automatically be used, but you can turn this off by setting `weights=NA`.

```
weighted_eigen <- eigen_centrality(net)$vector
regular_eigen <- eigen_centrality(net, weights=NA)$vector
plot(x=regular_eigen, y=weighted_eigen)
```

For directed networks you can calculate directed versions of these measures, but the functions aren’t always consistent:

- `eigen_centrality()`: you have to set `directed=T`, and it automatically provides the “in-centrality”. If you want the out-centrality, put your network through `t()`: `eigen_centrality(t(net), directed=T)`
- `power_centrality()`: will automatically do a directed measure if you give it a directed network. It calculates the “out-centrality”; use `t()` to calculate the “in-centrality”.

Let’s use the Spanish High School Network you used earlier this week:

```
library(igraph)
edges <- read.csv("network_data/spanish_hs_edges.csv")
nodes <- read.csv("network_data/spanish_hs_nodes.csv")
net <- graph_from_data_frame(edges, vertices=nodes, directed=T)
net <- delete_edges(net, E(net)[E(net)$weight <= 0])
```

Density can be calculated with `edge_density()`. There is an option to turn on loops (edges that start and stop at the same node).

`edge_density(net)`

`[1] 0.09688645`

Degree can be calculated with `degree()` and then averaged with `mean()`.

```
# note that this has gmode, instead of mode. why?
deg <- degree(net)
mean(deg)
```

`[1] 20.15238`

Component ratio is not in the SNA package, but the formula is pretty simple:

\[ \textnormal{Component Ratio} = \frac{c-1}{n-1} \]

So we need to calculate the number of components (c), and the number of nodes (n).

```
comps <- components(net, mode="strong")$no
n <- vcount(net)

(comps-1)/(n-1)
```

`[1] 0.2788462`
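As a quick sanity check on the formula, here is a toy sketch in base R. The helper `component_ratio()` is hypothetical (not from any package), just to show the endpoints of the measure:

```r
# Hypothetical helper illustrating the component ratio formula:
# a single-component network scores 0, all isolates scores 1.
component_ratio <- function(c, n) (c - 1) / (n - 1)
component_ratio(1, 10)   # 0: one component, no fragmentation
component_ratio(10, 10)  # 1: every node its own component
```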

There is no native function to calculate connectedness but we have all the tools:

```
dist <- distances(net, mode="out")  ## Get the distances matrix
reachability <- is.finite(dist)     ## Where are there finite paths?
diag(reachability) <- NA            ## Ignore the diag
sum(reachability, na.rm=T) / (vcount(net) * (vcount(net) - 1))
```

`[1] 0.7168498`

Again, no native function but we can get it relatively quickly in a similar manner:

```
dist <- distances(net, mode="out")  ## Get the distances matrix
diag(dist) <- NA                    ## Ignore the diag
sum(1/dist, na.rm=T) / (vcount(net) * (vcount(net) - 1))
```

`[1] 0.2914342`
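To see what this average-reciprocal-distance calculation is doing, here is a toy sketch on a hypothetical 3-node line graph A–B–C, with the distance matrix written out by hand:

```r
# Hand-built distance matrix for the line graph A-B-C:
# A-B = 1, B-C = 1, A-C = 2
toy_dist <- rbind(c(0, 1, 2),
                  c(1, 0, 1),
                  c(2, 1, 0))
diag(toy_dist) <- NA  ## Ignore the diag, as above
sum(1/toy_dist, na.rm=TRUE) / (3 * 2)  # (1+1+0.5+0.5+1+1)/6
```

The closer nodes are to each other on average, the closer this gets to 1.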

We calculate reciprocity with the `reciprocity()` function. The default is to calculate arc-reciprocity, but we can switch it to the other (less good) measure by setting `mode="ratio"`.

`reciprocity(net)`

`[1] 0.389414`

`edge_density(net)`

`[1] 0.09688645`

There is a `transitivity()` function, which works well with *undirected* networks. For *directed* networks we have to make use of the `sna` library and the `gtrans()` function.

`transitivity(net) ## Treats as undirected`

`[1] 0.4361186`

```
mat_net <- as_adjacency_matrix(net, sparse=F) # Convert to adj mat
sna::gtrans(mat_net) ## Call SNA function
```
```

`[1] 0.4617656`

Finally, we have centralization. You can use a lot of different measures of **centrality** to measure centralization, and so there is a suite of functions:

- `centr_degree()`: uses degree centrality
- `centr_eigen()`: uses eigenvector centrality
- `centr_betw()`: uses betweenness

They all output an object with multiple elements; to get the centralization score use `$centralization`.

`centr_betw(net)$centralization`

`[1] 0.09744318`

`centr_degree(net)$centralization`

`[1] 0.144878`

`centr_eigen(net)$centralization`

`[1] 0.7185371`

- Find cliques
- Identify overlaps/analyze cliques
- Implement different cluster methods
- Access information about clusters

We are going to use the Cocaine Mambo network today.

```
df <- read.csv("network_data/COCAINE_MAMBO.csv", row.names=1)
mat <- as.matrix(df)
mat[mat>1] <- 1 # there is a weird cell with a 2, not a 1.
coke_net <- graph_from_adjacency_matrix(mat, mode="undirected")
```

```
Warning: The `adjmatrix` argument of `graph_from_adjacency_matrix()` must be symmetric
with mode = "undirected" as of igraph 1.6.0.
ℹ Use mode = "max" to achieve the original behavior.
```

There are several igraph functions for cliques. We want to use `max_cliques()`, which returns a `list` of the cliques in your network, with each item in that list being a clique.

```
cl <- max_cliques(coke_net, min=3)
length(cl) # number of cliques
```

`[1] 15`

`cl[[1]] # the first clique`

```
+ 3/31 vertices, named, from 755ae35:
[1] ARG JDD JPPM
```

If we want to look at who is in each clique we are going to have to process this list a little. In particular we will make a matrix showing who is in each clique:

```
# Create a matrix of 0s
cl_mat <- matrix(0, nrow=vcount(coke_net), ncol=length(cl))

# Use a for loop to step through the cliques
for(ii in seq_along(cl)){
  cl_mat[cl[[ii]], ii] <- 1
}
# We can add rownames
rownames(cl_mat) <- V(coke_net)$name
# Size of each clique
colSums(cl_mat)
```

` [1] 3 3 3 3 3 4 3 4 4 3 3 4 4 4 4`

```
# Number of cliques each individual is in
rowSums(cl_mat)
```

```
A AP ARG CPGT EDRS ELM ERC FR GA HAMS HPM JB JDD JE JEAB JG
0 1 1 2 0 5 1 2 0 3 1 1 3 0 2 1
JHLM JJTE JO JPPM JV LIRP OS PCCP RAJH RB RS SNRM VR WGV YPMG
1 2 0 10 0 1 1 2 0 0 3 7 0 0 2
```

We can calculate the number of cliques each individual is in and put that back on the original data:

```
V(coke_net)$numb_cliques <- rowSums(cl_mat)
ggraph(coke_net) + geom_edge_link(color="gray") +
geom_node_point(size=8,
aes(color=numb_cliques))
```

`Using "stress" as default layout`

With that matrix we can easily see how many cliques each person shares with every other person:

```
# Multiply the matrix by its transpose
overlap_mat <- cl_mat %*% t(cl_mat)
overlap_net <- graph_from_adjacency_matrix(overlap_mat, diag=F,
                                           mode="undirected", weighted=T)
# ignores the diag, has it undirected,
# and uses the values (overlaps) as weights
ggraph(overlap_net, "fr") +
  geom_edge_link(color="gray", aes(width=weight)) +
  geom_node_point(size=8, color="purple")
```
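The matrix algebra here is worth seeing on a toy case. In this made-up membership matrix, persons A and B share two cliques, so multiplying it by its transpose puts a 2 in their cell (the diagonal holds each person's own clique count):

```r
# Hypothetical membership matrix: rows are people, columns are cliques
toy_cl <- rbind(A=c(1, 1, 0),
                B=c(1, 1, 1),
                C=c(0, 0, 1))
toy_cl %*% t(toy_cl)
# A and B share 2 cliques, B and C share 1, A and C share 0
```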

There are several functions we will use to find clusters:

- Girvan-Newman/Edge Betweenness: `cluster_edge_betweenness()`
- Fast-Greedy: `cluster_fast_greedy()`
- Walktrap: `cluster_walktrap()`
- Louvain: `cluster_louvain()`

Each function is used in a similar way and returns something similar.

All you have to do is give it the network.

```
clust <- cluster_fast_greedy(coke_net)

clust
```

```
IGRAPH clustering fast greedy, groups: 5, mod: 0.39
+ groups:
$`1`
[1] "ARG" "CPGT" "FR" "GA" "JDD" "JPPM" "OS" "PCCP"
$`2`
[1] "EDRS" "JB" "JE" "JO" "SNRM" "VR" "WGV"
$`3`
[1] "HAMS" "JG" "JJTE" "JV" "RAJH" "RB" "YPMG"
$`4`
+ ... omitted several groups/vertices
```

You can access the modularity scores for each clustering level with `$modularity`, and `$membership` provides the membership of each vertex at the clustering level with the highest modularity.

`clust$membership`

` [1] 4 4 1 1 2 4 5 1 1 3 4 2 1 2 5 3 5 3 2 1 3 4 1 1 3 3 4 2 2 2 3`

`clust$modularity`

```
[1] -5.960166e-02 -4.295482e-02 -1.085018e-02 5.499405e-03 2.170036e-02
[6] 3.775268e-02 5.380499e-02 6.970868e-02 8.546373e-02 1.053805e-01
[11] 1.336207e-01 1.544293e-01 1.728597e-01 1.878716e-01 2.028835e-01
[16] 2.266647e-01 2.535672e-01 2.702140e-01 2.850773e-01 2.997919e-01
[21] 3.143579e-01 3.287753e-01 3.430440e-01 3.573127e-01 3.757432e-01
[26] 3.840666e-01 3.862961e-01 3.834721e-01 3.745541e-01 3.567182e-01
[31] -5.551115e-17
```

I personally like giving each cluster a letter so we can do the following:

```
V(coke_net)$Cluster <- LETTERS[clust$membership]
ggraph(coke_net) + geom_edge_link(color="gray") +
geom_node_point(size=8,
aes(color=Cluster))
```

`Using "stress" as default layout`

We can use `as_data_frame()` to access the vertex data again from our network.

```
vertex_data <- igraph::as_data_frame(coke_net, what="vertices")
vertex_data[1:5,]
```

```
name numb_cliques Cluster
A A 0 D
AP AP 1 D
ARG ARG 1 A
CPGT CPGT 2 A
EDRS EDRS 0 B
```

`#write.csv(vertex_data, "tmp.csv", row.names=F)`

Finally you can also plot the dendrogram (if it is applicable)

`plot(as.dendrogram(clust))`

- How to calculate structural similarity.
- How to do dyad level comparisons.
- How to create a blockmodel from structural similarity.
- How to directly blockmodel.

The data we will use is a London gang network.

```
net <- read_graph("network_data/london.graphml", format="graphml")
net
```

```
IGRAPH 638f4bf UN-- 54 1010 --
+ attr: name (v/c), Age (v/n), Birthplace (v/n), Residence (v/n),
| Arrests (v/n), Convictions (v/n), Prison (v/n), Music (v/n), Ranking
| (v/n), id (v/c)
+ edges from 638f4bf (vertex names):
[1] X1--X2 X1--X3 X1--X4 X1--X4 X1--X5 X1--X6 X1--X7 X1--X7 X1--X8
[10] X1--X8 X1--X8 X1--X9 X1--X9 X1--X10 X1--X10 X1--X11 X1--X11 X1--X11
[19] X1--X12 X1--X17 X1--X18 X1--X20 X1--X20 X1--X21 X1--X21 X1--X22 X1--X22
[28] X1--X22 X1--X23 X1--X25 X1--X27 X1--X28 X1--X29 X1--X43 X1--X45 X1--X46
[37] X1--X51 X1--X2 X2--X3 X2--X3 X2--X3 X2--X6 X2--X6 X2--X7 X2--X8
[46] X2--X8 X2--X9 X2--X10 X2--X10 X2--X11 X2--X11 X2--X12 X2--X13 X2--X14
+ ... omitted several edges
```

The `sedist()` function from the `blockmodeling` package calculates structural similarity. `sedist()` takes the following arguments:

- `M=`: the adjacency matrix.
- `method=`: a method for calculating the distance (anything the `dist()` function accepts can be used):
  - `"euclidean"`: calculate the Euclidean distance
  - `"cor"`: calculate the Pearson correlation.
- `hand.interaction=`: if you want to play around with how it deals with the interactions (don’t worry about this).

It outputs a `dist` matrix.
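The core idea is simple enough to sketch in base R on a made-up matrix: structurally equivalent nodes have identical rows in the adjacency matrix, so the Euclidean distance between their row profiles is 0. (`sedist()` also uses the columns, i.e. incoming ties, but the row-wise version conveys the idea.)

```r
# Rows 1 and 2 send ties to exactly the same nodes,
# so they are structurally equivalent.
toy_mat <- rbind(c(0, 1, 1),
                 c(0, 1, 1),
                 c(1, 0, 0))
dist(toy_mat)  # distance between rows 1 and 2 is 0
```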

`library(blockmodeling)`

```
To cite package 'blockmodeling' in publications please use package
citation and (at least) one of the articles:
Žiberna, Aleš (2007). Generalized blockmodeling of valued networks.
Social Networks 29(1), 105-126.
Žiberna, Aleš (2008). Direct and indirect approaches to blockmodeling
of valued networks in terms of regular equivalence. Journal of
Mathematical Sociology 32(1), 57–84.
Žiberna, Aleš (2014). Blockmodeling of multilevel networks. Social
Networks 39, 46–61. https://doi.org/10.1016/j.socnet.2014.04.002.
Žiberna, Aleš (2023). Generalized and Classical Blockmodeling of
Valued Networks, R package version 1.1.5.
Cugmas, Marjan (2023). Generalized and Classical Blockmodeling of
Valued Networks, R package version 1.1.5.
To see these entries in BibTeX format, use 'format(<citation>,
bibtex=TRUE)', or 'toBibtex(.)'.
```

```
## Convert to an adjacency matrix
mat <- as_adjacency_matrix(net, sparse=F)
## Calculate structural similarity using euclidean
london_se <- sedist(mat, method="euclidean")
## Look at first 5 rows/columns
as.matrix(london_se)[1:5,1:5]
```

```
X1 X2 X3 X4 X5
X1 0.00000 21.35416 25.29822 27.276363 26.076810
X2 21.35416 0.00000 26.38181 24.657656 24.000000
X3 25.29822 26.38181 0.00000 22.090722 22.090722
X4 27.27636 24.65766 22.09072 0.000000 8.944272
X5 26.07681 24.00000 22.09072 8.944272 0.000000
```

Previously we tested whether structural similarity related to ranking similarity; let’s look at age similarity instead now.

What we need then is the *distance* in ages for all *dyads* in our network. We can use the `dist()` function for this.

If we give `dist()` a single vector, it will calculate the pairwise distance between all items in the vector.

```
## Calculate the distance in ages
age_dist <- dist(V(net)$Age, method="euclidean")
## Check the values (persons 1 and 2 are the same age,
## persons 1 and 3 are off by one year)
as.matrix(age_dist)[1:5, 1:5]
```

```
1 2 3 4 5
1 0 0 1 1 4
2 0 0 1 1 4
3 1 1 0 2 5
4 1 1 2 0 3
5 4 4 5 3 0
```

What we need though is a full vector of ages and distances so we can directly compare them.

If we put a matrix into `c()` it will be converted to a vector. Example:

```
test_mat <- matrix(1:4, nrow=2, ncol=2)
test_mat
```

```
[,1] [,2]
[1,] 1 3
[2,] 2 4
```

`c(test_mat)`

`[1] 1 2 3 4`

We can do the same thing to our distance matrix.

```
data <- data.frame("Age_Dist"=c(age_dist),
                   "SE_Dist"=c(london_se))
data[1:5,]
```

```
Age_Dist SE_Dist
1 0 21.35416
2 1 25.29822
3 1 27.27636
4 4 26.07681
5 5 24.97999
```

We can create a scatter plot or run a correlation (`cor()`):

`cor(data$Age_Dist, data$SE_Dist, use="complete.obs")`

`[1] -0.02831001`

```
ggplot(data, aes(x=Age_Dist, y=SE_Dist)) +
geom_jitter() + theme_minimal() +
labs(y="Struct Distance",
x="Age Distance")
```

We can also hierarchically cluster the output from `sedist()` just as we did before:

```
london_se <- sedist(mat, method="euclidean")
clu <- hclust(london_se)
plot(clu)
```

We can then cut the tree into cluster memberships with `cutree()`. Remember `k=` sets the number of clusters you want, while `h=` says how far down the y-axis to go to create the clusters (only set one of them).
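A quick base-R sketch of the difference, on made-up one-dimensional data: cutting at a height groups together everything that merges below that height.

```r
# Four points in two obvious clumps: {1, 2} and {10, 11}
clu_toy <- hclust(dist(c(1, 2, 10, 11)))
cutree(clu_toy, k=2)  # ask for 2 clusters directly
cutree(clu_toy, h=5)  # cut below the final merge: same 2 clusters
```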

```
memberships <- cutree(clu, k=4)
memberships[1:5]
```

```
X1 X2 X3 X4 X5
1 1 2 2 2
```

We can use this to look at it as a blockmodel. There are a few ways to do this:

- The `plotMat()` function creates a matrix plot showing the blocks and coloring the squares:
  - `M=`: the adjacency matrix
  - `clu=`: the memberships in each cluster
  - `orderClu=`: can be used to order the clusters by their total value (though this can be wonky if there are ties)
  - `mar=`: sets the margins of the plots (the defaults are gigantic)
- The `funByBlocks()` function applies a function to each block. The default is to calculate the `mean()` of each block:
  - `M=` and `clu=`: same as above
  - `FUN=`: the function you want to use

```
plotMat(M=mat, clu=memberships, orderClu=T,
mar=c(0.1, 0.1, 0.1, 0.1))
plotMat(M=mat>0, clu=memberships, orderClu=T,
mar=c(0.1, 0.1, 0.1, 0.1))
```

I use `mat > 0` here as the matrix has weights and this just switches it to only 1s and 0s.

`funByBlocks(M=mat>0, clu=memberships)`

```
1 2 3 4
1 0.7692308 0.4175824 0.1615385 0.4807692
2 0.4175824 1.0000000 0.1619048 0.2142857
3 0.1615385 0.1619048 0.1080460 0.1250000
4 0.4807692 0.2142857 0.1250000 1.0000000
```

`funByBlocks(M=mat>0, clu=memberships, FUN=median, na.rm=T)`

```
1 2 3 4
1 1 0 0 0
2 0 1 0 0
3 0 0 0 0
4 0 0 0 1
```
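A simplified base-R sketch of what `funByBlocks()` computes (unlike the real function, this version keeps the diagonal cells in the average): for every pair of clusters, apply the function to the corresponding submatrix.

```r
# Toy 4-node adjacency matrix and a 2-cluster partition
toy_M <- rbind(c(0, 1, 0, 0),
               c(1, 0, 0, 1),
               c(0, 0, 0, 1),
               c(0, 1, 1, 0))
toy_clu <- c(1, 1, 2, 2)
# Block (i, j) is the mean of ties from cluster i to cluster j
sapply(1:2, function(j)
  sapply(1:2, function(i) mean(toy_M[toy_clu == i, toy_clu == j])))
```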

We can find a blockmodel by direct optimization using `optRandomParC()`. This function can do a lot; we are going to use it with a limited set of options:

- `M=`: the matrix
- `k=`: the number of clusters to identify
- `approaches="bin"`: sets it to use the binary algorithm
- `blocks=c("nul", "com")`: allows for null or complete blocks
- `rep=`: the number of random starts to use (should be at least 100)
- `nCores=`: the number of cores on your computer to use (defaults to 1). If you want to run this in parallel you can set this to 2 or 4; you might need to install the `doParallel` and `doRNG` packages.

It will automatically display the number of solutions and the error (for the best solution).

```
blocks <- optRandomParC(M=mat, k=4, approaches="bin",
                        blocks=c("nul", "com"),
                        rep=500, nCores=4)
```

`Loading required namespace: doRNG`

```
Optimization of all partitions completed
2 solution(s) with minimal error = 384 found.
```

```
# The image matrix
IM(blocks)
```

```
[,1] [,2] [,3] [,4]
[1,] "nul" "nul" "nul" "nul"
[2,] "nul" "com" "nul" "com"
[3,] "nul" "nul" "com" "com"
[4,] "nul" "com" "com" "com"
```

```
# The error matrix
EM(blocks)
```

```
[,1] [,2] [,3] [,4]
[1,] 76 46 37 35
[2,] 46 10 12 9
[3,] 37 12 2 8
[4,] 35 9 8 2
```

You can use either `clu()` or `orderClu()` to access the clusters. `orderClu()` simply reorders the clusters (similar to above).

`clu(blocks)[1:10]`

` [1] 4 4 4 3 3 3 4 2 2 2`

`orderClu(blocks)[1:10]`

```
4 4 4 3 3 3 4 2 2 2
1 1 1 2 2 2 1 3 3 3
```

You can plot it as is

`plot(x=blocks, mar=c(0.1, 0.1, 0.1, 0.1))`

Or ordered

`plot(x=blocks, mar=c(0.1, 0.1, 0.1, 0.1), orderClu=T)`

For igraph, the `graph_from_biadjacency_matrix()` function can be used to convert the matrix into a bipartite network. We are using a network of interlocking boards from Scotland. You can download it here.

```
library(igraph)
library(ggraph)
inc_df <- read.csv("network_data/scotland.csv", row.names=1)
inc_mat <- as.matrix(inc_df)

net <- graph_from_biadjacency_matrix(inc_mat)

net
```

```
IGRAPH cbaad78 UN-B 244 358 --
+ attr: type (v/l), name (v/c)
+ edges from cbaad78 (vertex names):
[1] earl of dalkeith--north.british.railway
[2] earl of dalkeith--forth.bridge.railway
[3] earl of dalkeith--scottish.widows.fund.life.ass.
[4] carlow, c. --north.british.railway
[5] carlow, c. --fife.coal
[6] carlow, c. --royal.bank.of.scotland
[7] gilroy, a.(b.) --north.british.railway
[8] gilroy, a.(b.) --victoria.jute
+ ... omitted several edges
```

The *important* thing that sets this as a bipartite network is that there is a nodal attribute `type` that defines which node is in which group.

`table(V(net)$type)`

```
FALSE TRUE
136 108
```

This is useful, but also frustrating, as it doesn’t tell you what is what. I recommend creating a new attribute with better labels.

```
V(net)$label <- NA
# I know which is which because
# I know there are 109 companies.
V(net)$label[V(net)$type] <- "Company"
V(net)$label[!V(net)$type] <- "Board Member"
table(V(net)$label)
```

```
Board Member Company
136 108
```

There is a layout for bipartite networks (I don’t like it)

```
ggraph(net, "bipartite") +
geom_edge_link(color="gray") +
geom_node_point(size=3, aes(color=label))
```

Let’s start by calculating the degree for each group *independently*. This is easy by subsetting the results with our label (or the type) attribute.

```
corp_degree <- degree(net)[V(net)$label == "Company"]
member_degree <- degree(net)[V(net)$label != "Company"]

mean(corp_degree)
```

`[1] 3.314815`

`mean(member_degree)`

`[1] 2.632353`

For density, we need the number of edges, and the number of each type:

`table(V(net)$type)`

```
FALSE TRUE
136 108
```

`ecount(net)/(136*108)`

`[1] 0.02437364`
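The same arithmetic as a small helper function (the name `bip_density()` is mine, not from a package): bipartite density is edges over the product of the two group sizes.

```r
# e = number of edges, n1 and n2 = sizes of the two node sets
bip_density <- function(e, n1, n2) e / (n1 * n2)
bip_density(358, 136, 108)  # matches the manual calculation above
```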

There is no *easy* way to calculate the 4-cycle closure measure in R (or in UCINET, as far as I can tell).

Projecting can be pretty easy:

```
## Get the "true"
company_net <- bipartite_projection(net, which="true")

## Get the "false"
members_net <- bipartite_projection(net, which="false")

members_net
```

```
IGRAPH fefd4dd UNW- 136 678 --
+ attr: name (v/c), label (v/c), weight (e/n)
+ edges from fefd4dd (vertex names):
[1] earl of dalkeith--carlow, c.
[2] earl of dalkeith--gilroy, a.(b.)
[3] earl of dalkeith--grierson, h.
[4] earl of dalkeith--mccosh, a.k.
[5] earl of dalkeith--macpherson, h.s.
[6] earl of dalkeith--simpson, a.
[7] earl of dalkeith--younger, h.g.
[8] earl of dalkeith--lord balfour of burleigh
+ ... omitted several edges
```

We can then plot this

```
ggraph(members_net, "kk") +
geom_edge_link(color="gray", aes(width=weight)) +
geom_node_point(size=3)
```

First identify what the distribution of edge weights looks like.

`hist(E(members_net)$weight)`

Let’s drop any edge with a weight less than 2 (THIS IS A BAD IDEA FOR THIS NETWORK).

We can use `which()` to identify which edges are below a certain threshold and `delete.edges()` to… delete edges.

```
edge_to_delete <- which(E(members_net)$weight < 2)
pruned_net <- delete.edges(members_net, edge_to_delete)
```

```
Warning: `delete.edges()` was deprecated in igraph 2.0.0.
ℹ Please use `delete_edges()` instead.
```

```
ggraph(pruned_net, "kk") +
geom_edge_link(color="gray", aes(width=weight)) +
geom_node_point(size=3)
```

This isn’t a super common metric, so I wrote an R function you can download. You can use it easily by calling the file with the code through `source()`. It works on the matrix, so we create that with `as_biadjacency_matrix()`.

```
source("network_data/project_bonac.r")
# you will now have a project_bonac() function
mat <- as_biadjacency_matrix(net)
bonac_normalized <- project_bonac(mat)
bonac_memb_net <- graph_from_adjacency_matrix(bonac_normalized,
                                              diag=F, mode="undirected",
                                              weighted=T)
```

```
ggraph(bonac_memb_net, "kk") +
geom_edge_link(color="gray", aes(width=weight)) +
geom_node_point(size=3) + scale_edge_width(range=c(0.5, 2))
```

If you want to cluster the bipartite network using the Dual Louvain method then you have to do some work to get it back into the right spot:

```
clust_1 <- cluster_louvain(members_net)
clust_2 <- cluster_louvain(company_net)

## Put each label in the right spot
node_select <- V(net)$label=="Board Member"
V(net)$dual_cluster[node_select] <- letters[clust_1$membership]
node_select <- V(net)$label=="Company"
V(net)$dual_cluster[node_select] <- LETTERS[clust_2$membership]
```

```
# To clean things up I'm going to drop
# the isolates
plot_net <- delete.vertices(net, degree(net) == 0)
```

```
Warning: `delete.vertices()` was deprecated in igraph 2.0.0.
ℹ Please use `delete_vertices()` instead.
```

```
ggraph(plot_net, "kk") +
geom_edge_link(color="gray") +
geom_node_point(size=3,
aes(color=dual_cluster, shape=label))
```

The actual *running* of QAP based regression and correlation isn’t that hard but getting the data into the right format can be.

For dyadic QAP tests we need to have matrices/networks that represent all of our variables of interest.

The `sna`

package has (most of the) QAP functions.

We are going to use three networks and two other datasets.

- mids_net.csv is an edgelist of militarized interstate disputes
- trade_net.csv is an edgelist of “major trading partners”
- cont_net.csv is an edgelist of which countries are next to which countries
- nmc_vertex.csv is a dataset of military capabilities.
- democracy_data.csv is a dataset of democratic levels of countries.

```
library(igraph)
setwd("network_data")
mids_df <- read.csv("cow/mids_net.csv")
nmc_data <- read.csv("cow/nmc_vertex.csv")
mids_net <- graph_from_data_frame(mids_df,
                                  vertices=nmc_data, directed=F)
trade_df <- read.csv("cow/trade_net.csv")
trade_net <- graph_from_data_frame(trade_df,
                                   vertices=nmc_data, directed=F)
cont_df <- read.csv("cow/cont_net.csv")
cont_net <- graph_from_data_frame(cont_df,
                                  vertices=nmc_data, directed=F)
```

Most of what we will use is in the SNA package. The SNA package accepts matrices, so we can convert everything into matrices.

Remember, we set `sparse=F`, and if we want to use weights we set `attr="weight"` (we don’t have weights here).

```
mids_mat <- as_adjacency_matrix(mids_net, sparse=F)

trade_mat <- as_adjacency_matrix(trade_net, sparse=F)

cont_mat <- as_adjacency_matrix(cont_net, sparse=F)
```

`gcor()` will correlate two matrices, but doesn’t give us a p-value hypothesis test. Set `mode="graph"` if you have an undirected network, or `mode="digraph"` if you have a directed network.

`library(sna)`

`Loading required package: statnet.common`

```
Attaching package: 'statnet.common'
```

```
The following objects are masked from 'package:base':
attr, order
```

`Loading required package: network`

```
'network' 1.18.2 (2023-12-04), part of the Statnet Project
* 'news(package="network")' for changes since last version
* 'citation("network")' for citation information
* 'https://statnet.org' for help, support, and other information
```

```
Attaching package: 'network'
```

```
The following objects are masked from 'package:igraph':
%c%, %s%, add.edges, add.vertices, delete.edges, delete.vertices,
get.edge.attribute, get.edges, get.vertex.attribute, is.bipartite,
is.directed, list.edge.attributes, list.vertex.attributes,
set.edge.attribute, set.vertex.attribute
```

```
sna: Tools for Social Network Analysis
Version 2.7-2 created on 2023-12-05.
copyright (c) 2005, Carter T. Butts, University of California-Irvine
For citation information, type citation("sna").
Type help(package="sna") to get started.
```

```
Attaching package: 'sna'
```

```
The following object is masked from 'package:blockmodeling':
sedist
```

```
The following objects are masked from 'package:igraph':
betweenness, bonpow, closeness, components, degree, dyad.census,
evcent, hierarchy, is.connected, neighborhood, triad.census
```

`gcor(cont_mat, trade_mat, mode="graph")`

`[1] 0.2254307`

To test the hypothesis we need `qaptest()`, which takes a list of networks, the function you want to use to calculate the test statistic, and two annoying indicators (`g1=1, g2=2`). Finally, `reps=` tells it how many permutations to do.

```
qap <- qaptest(list(cont_mat, trade_mat),
               gcor, g1=1, g2=2, reps=1000, mode="graph")
```

You can call `summary()` on your output to get details:

- `p(f(perm) >= f(d))`: the proportion of permuted values that are greater than or equal to your real value. A one-sided *greater than* p-value.
- `p(f(perm) <= f(d))`: the proportion of permuted values that are less than or equal to your real value. A one-sided *less than* p-value.
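The permutation logic behind these p-values can be sketched in base R on two made-up binary matrices: permute the rows and columns of one matrix *together* (so its structure is preserved) and recompute the correlation each time.

```r
set.seed(1)
# Two hypothetical 5x5 binary networks (no self-ties)
m1 <- matrix(rbinom(25, 1, 0.3), 5, 5); diag(m1) <- 0
m2 <- matrix(rbinom(25, 1, 0.3), 5, 5); diag(m2) <- 0
obs <- cor(c(m1), c(m2))   # observed correlation
perm_cors <- replicate(1000, {
  p <- sample(5)                # one random relabeling of the nodes
  cor(c(m1[p, p]), c(m2))       # correlation under that relabeling
})
mean(perm_cors >= obs)  # p(f(perm) >= f(d))
```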

`summary(qap)`

```
QAP Test Results
Estimated p-values:
p(f(perm) >= f(d)): 0
p(f(perm) <= f(d)): 1
Test Diagnostics:
Test Value (f(d)): 0.2254307
Replications: 1000
Distribution Summary:
Min: -0.01351818
1stQ: -0.005134005
Med: -0.0009419196
Mean: -0.000174768
3rdQ: 0.003250166
Max: 0.02840268
```

If you are curious you can access all the different correlations found through the permutation/QAP test with `$dist`, and the observed value with `$testval`.

`sum(abs(qap$dist) < qap$testval) # Two-sided p-value`

`[1] 1000`

`hist(qap$dist)`

We can use `netlogit()` or `netlm()` to calculate our regression. It takes:

- `y=`: our DV (as a matrix or network)
- `x=`: a list of independent variables (as matrices or networks)
- `reps=`: the number of reps (for now, set this at something like 10)
- `nullhyp=`: whether we want to do Y-permutations (`"qap"`) or semi-partial permutations (`"qapspp"`)

We already have our trade network, and contiguity network, so we can just put them into this:

```
mod <- netlogit(y=mids_mat,
                x=list(trade_mat, cont_mat),
                reps=100, nullhyp="qapspp")
```

To get the results just call the object:

`mod`

```
Network Logit Model
Coefficients:
Estimate Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -5.553377 0.00387435 1.00 0.00 1.00
x1 0.341634 1.40724509 0.86 0.14 0.33
x2 4.620454 101.54015727 1.00 0.00 0.00
Goodness of Fit Statistics:
Null deviance: 50308.62 on 36290 degrees of freedom
Residual deviance: 2578.148 on 36287 degrees of freedom
Chi-Squared test of fit improvement:
47730.47 on 3 degrees of freedom, p-value 0
AIC: 2584.148 BIC: 2609.646
Pseudo-R^2 Measures:
(Dn-Dr)/(Dn-Dr+dfn): 0.5680815
(Dn-Dr)/Dn: 0.9487534
```

Each row shows estimated statistics for the variable:

- Label (x1)
- The coefficient estimate (0.34)
- The exponentiation of the coefficient (1.41); this is often helpful for interpreting logistic regression.
- `Pr(<=b)`: One-sided less-than hypothesis p-value.
- `Pr(>=b)`: One-sided greater-than hypothesis p-value.
- `Pr(>=|b|)`: Two-sided hypothesis p-value.
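The `Exp(b)` column is nothing mysterious; it is just `exp()` applied to the estimate, which you can verify from the summary above:

```r
# Reproducing the Exp(b) column from the Estimate column:
exp(0.341634) # ~1.407, matching x1's Exp(b)
exp(4.620454) # ~101.54, matching x2's Exp(b)
```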

We talked about creating variables that indicate whether two nodes are equal, the distance between two nodes, or the sum of two nodes… How? The `outer()` function.

`outer()` takes two vectors and an operation, and applies that operation to every pairwise combination.

```
test_vector <- c(1,3,5)
outer(test_vector, test_vector, "+")
```

```
[,1] [,2] [,3]
[1,] 2 4 6
[2,] 4 6 8
[3,] 6 8 10
```

We can do this also with differences, but note that `a-b` is different from `b-a`, so for undirected data we want to take the absolute value.

```
test_vector <- c(1,3,5)
outer(test_vector, test_vector, "-")
```

```
[,1] [,2] [,3]
[1,] 0 -2 -4
[2,] 2 0 -2
[3,] 4 2 0
```

`outer(test_vector, test_vector, "-") |> abs()`

```
[,1] [,2] [,3]
[1,] 0 2 4
[2,] 2 0 2
[3,] 4 2 0
```

It is often interesting to look at whether two nodes share the same value. We can do that by passing the function `"=="` to `outer()`:

```
test_vector <- c("Miami", "OSU", "OU")
outer(test_vector, test_vector, "==")
```

```
[,1] [,2] [,3]
[1,] TRUE FALSE FALSE
[2,] FALSE TRUE FALSE
[3,] FALSE FALSE TRUE
```

So to add the difference in Mil Personnel we need to extract that vector from the network, clean it up slightly, put it through `outer()`, and then take the absolute value of the result.

```
mil_cap <- V(mids_net)$milper
mil_cap[mil_cap==-9] <- 0 # Replacing missing values with 0s.
mil_cap_mat <- outer(mil_cap, mil_cap, "-")
mil_cap_mat <- abs(mil_cap_mat)
```

We can add that matrix to the `netlogit()` function just as we did with the networks before.

```
mod <- netlogit(y=mids_mat,
                x=list(trade_mat, cont_mat, mil_cap_mat),
                reps=100)
```

```
Warning: glm.fit: fitted probabilities numerically 0 or 1 occurred
```

`summary(mod)`

```
Network Logit Model
Coefficients:
Estimate Exp(b) Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -5.7848114241 0.00307389 1.00 0.00 1.00
x1 0.3666349780 1.44287114 0.87 0.13 0.24
x2 4.5827769946 97.78556750 1.00 0.00 0.00
x3 0.0008590974 1.00085947 1.00 0.00 0.00
Goodness of Fit Statistics:
Null deviance: 50308.62 on 36290 degrees of freedom
Residual deviance: 2509.59 on 36286 degrees of freedom
Chi-Squared test of fit improvement:
47799.03 on 4 degrees of freedom, p-value 0
AIC: 2517.59 BIC: 2551.587
Pseudo-R^2 Measures:
(Dn-Dr)/(Dn-Dr+dfn): 0.5684336
(Dn-Dr)/Dn: 0.9501161
Contingency Table (predicted (rows) x actual (cols)):
Actual
Predicted 0 1
0 35942 314
1 22 12
Total Fraction Correct: 0.9907413
Fraction Predicted 1s Correct: 0.3529412
Fraction Predicted 0s Correct: 0.9913394
False Negative Rate: 0.9631902
False Positive Rate: 0.0006117228
Test Diagnostics:
Null Hypothesis: qap
Replications: 100
Distribution Summary:
(intercept) x1 x2 x3
Min -92.54464 -2.63579 -2.35812 -4.43309
1stQ -89.01781 -1.04673 -1.07821 -1.95423
Median -88.30635 -0.43733 -0.05183 -0.38320
Mean -88.33197 -0.11787 0.03580 -0.09013
3rdQ -87.53640 0.76717 0.78712 1.33367
Max -83.63469 5.23985 4.63217 8.70260
```

We are going to load the high school data we had before and then delete the nodes that have missing information.

```
edges <- read.csv("network_data/spanish_hs_edges.csv")
nodes <- read.csv("network_data/spanish_hs_nodes.csv")
net <- graph_from_data_frame(edges, vertices=nodes, directed=T)
net <- delete_vertices(net, which(is.na(V(net)$prosocial)))
```

We want to look at *what* leads an individual to indicate that they have a *better* relationship with someone.

The DV is the rating the sender gave the relationship (-2 to +2), with 0 meaning no relationship.

Our dyadic independent variables:

- Are the sender and the receiver the same sex?
- Are the sender and the receiver in the same group?
- Is the sender male?
- Is the receiver male?
- Does the sender have a higher sociability score?
- Does the receiver have a higher sociability score?

The dependent variable is easy enough using `as_adjacency_matrix()` and extracting the weight attribute. We've seen how to identify if two nodes are the same before.

```
dv_mat <- as_adjacency_matrix(net, attr="weight", sparse=F)
same_sex <- outer(V(net)$Sexo, V(net)$Sexo, "==")
same_group <- outer(V(net)$Grupo, V(net)$Grupo, "==")
```

To create a 'sender is male' matrix we can compare the nodal sex with a vector of just `"Male"`.

Remember that `outer()` goes through each item in the *first* argument along the rows, so for the sender matrix we put the nodes' attribute first, and we switch the order for the receiver matrix.

```
send_male <- outer(V(net)$Sexo, rep("Male", length(V(net))), "==")
rec_male <- outer(rep("Male", length(V(net))), V(net)$Sexo, "==")
```

We can do something similar with the prosocial variable but just multiply it by 1.

```
send_soc <- outer(V(net)$prosocial, rep(1, length(V(net))), "*")
rec_soc <- outer(rep(1, length(V(net))), V(net)$prosocial, "*")
```

If you are confused by what we are doing try running:

```
outer(c(1.4, 0.4, 0.1), rep(1, 3), "*")
outer(rep(1, 3), c(1.4, 0.4, 0.1), "*")
```
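For reference, here is what those two calls produce: the first spreads the vector down the rows (the sender pattern), the second across the columns (the receiver pattern).

```r
# Sender pattern: each row repeats one element of the vector.
outer(c(1.4, 0.4, 0.1), rep(1, 3), "*")
#      [,1] [,2] [,3]
# [1,]  1.4  1.4  1.4
# [2,]  0.4  0.4  0.4
# [3,]  0.1  0.1  0.1

# Receiver pattern: each column repeats one element of the vector.
outer(rep(1, 3), c(1.4, 0.4, 0.1), "*")
#      [,1] [,2] [,3]
# [1,]  1.4  0.4  0.1
# [2,]  1.4  0.4  0.1
# [3,]  1.4  0.4  0.1
```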

We can estimate this just like before using `netlm()`. `mode="digraph"` is the default and indicates a directed network.

```
mod <- netlm(y=dv_mat, x=list(same_sex,
                              same_group,
                              send_male, rec_male,
                              rec_soc, send_soc), mode="digraph")
```

`mod`

```
OLS Network Model
Coefficients:
Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -0.054080439 0.193 0.807 0.381
x1 0.106753064 1.000 0.000 0.000
x2 0.391130533 1.000 0.000 0.000
x3 -0.005801325 0.416 0.584 0.865
x4 0.040534712 0.979 0.021 0.044
x5 -0.024541180 0.265 0.735 0.554
x6 0.029253310 0.655 0.345 0.662
Residual standard error: 0.5489 on 6313 degrees of freedom
F-statistic: 114.3 on 6 and 6313 degrees of freedom, p-value: 0
Multiple R-squared: 0.09797 Adjusted R-squared: 0.09711
```

The coefficients on same sex (`x1`) and same group (`x2`) are both statistically significant and positive. In particular, if the receiver is in the same group as the sender then they are likely to be rated 0.39 points higher, and if they are the same sex then they are likely to be rated 0.11 points higher.

The coefficient on receiver male has a two-sided p-value of 0.044 and so is also statistically significant. This indicates that male students are more likely to receive a higher rating than female students (although the effect is relatively small as the coefficient is 0.04).

If you want to compare it to the version without QAP you can set `nullhyp="classic"`. You'll see that the two-sided p-values are generally smaller in this case (except for the ones that are exactly 0).

```
mod_wrong <- netlm(y=dv_mat, x=list(same_sex,
                                    same_group,
                                    send_male, rec_male,
                                    rec_soc, send_soc),
                   mode="digraph", nullhyp="classic")
mod_wrong
```

```
OLS Network Model
Coefficients:
Estimate Pr(<=b) Pr(>=b) Pr(>=|b|)
(intercept) -0.054080439 0.03956882 9.604312e-01 7.913763e-02
x1 0.106753064 1.00000000 1.766833e-14 3.533667e-14
x2 0.391130533 1.00000000 9.981165e-126 1.996233e-125
x3 -0.005801325 0.34136768 6.586323e-01 6.827354e-01
x4 0.040534712 0.99784800 2.151996e-03 4.303991e-03
x5 -0.024541180 0.20131319 7.986868e-01 4.026264e-01
x6 0.029253310 0.84077019 1.592298e-01 3.184596e-01
Residual standard error: 0.5489 on 6313 degrees of freedom
F-statistic: 114.3 on 6 and 6313 degrees of freedom, p-value: 0
Multiple R-squared: 0.09797 Adjusted R-squared: 0.09711
```

For nodal tests we need another library (I know, right?); this one is `permuco`, so run `install.packages("permuco")`.

What do we need to do to test what leads to having more trading partners?

- Calculate the degree centrality on our trade network.
- Extract all that wonderful nodal info into a dataframe.
- Add in the democracy data.
- Run our permutation regression.

(If `degree()` is freaking out it might be a clash between `sna` and `igraph`; fix it with `igraph::degree()`.)

```
V(trade_net)$degree <- igraph::degree(trade_net)
trade_df <- as_data_frame(trade_net, what="vertices")
setwd("network_data")
democracy_df <- read.csv("cow/democracy_data.csv")
```

To merge two datasets we use the `merge()` function:

- `x=`: The first dataset.
- `y=`: The second dataset.
- `by.x=`: The column from the first dataset to use to merge.
- `by.y=`: The column from the second dataset to use to merge.
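A toy example (hypothetical data frames, not the real trade data) showing how `by.x=`/`by.y=` line up columns with different names:

```r
df_a <- data.frame(name = c(2, 20, 200), degree = c(5, 3, 1))
df_b <- data.frame(COWcode = c(2, 20, 999), polyarchy = c(0.8, 0.9, 0.2))

# Rows are kept only where df_a$name matches df_b$COWcode (codes 2 and 20):
merge(df_a, df_b, by.x = "name", by.y = "COWcode")
```

By default `merge()` keeps only matched rows; add `all.x=TRUE` if you want to keep unmatched rows from the first dataset.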

```
all_data <- merge(trade_df, democracy_df,
                  by.x="name",
                  by.y="COWcode")
```

Running the regression is pretty simple, and is similar to what you would do with normal linear regression in R. `lmperm()` is the function; it takes a formula, a dataset, and a number of permutations (let's do 50 so this doesn't take *forever*).

```
library(permuco)
mod <- lmperm(degree~v2x_polyarchy + milper,
              data=all_data, np=50)
```

```
Warning in lmperm_fix(formula = formula, data = data, method = method, type =
type, : The number of permutations is below 2000, p-values might be unreliable.
```

```
Warning in check_distribution(distribution = distribution, digits = 10, : the
distribution of v2x_polyarchy, milper may be discrete.
```

The results are similarly formatted.

`summary(mod)`

```
Table of marginal t-test of the betas
Resampling test using freedman_lane to handle nuisance variables and 50 permutations.
Estimate Std. Error t value parametric Pr(>|t|)
(Intercept) -0.046836 0.4700088 -0.09965 9.207e-01
v2x_polyarchy 3.265364 0.7930987 4.11722 6.017e-05
milper 0.003611 0.0007406 4.87514 2.515e-06
resampled Pr(<t) resampled Pr(>t) resampled Pr(>|t|)
(Intercept)
v2x_polyarchy 1 0.02 0.02
milper 1 0.02 0.02
```

We are going to go back to the MIDs data we were using before (scroll above for info on it).

```
library(network)
library(ergm)
```

```
Registered S3 methods overwritten by 'ergm':
method from
simulate.formula lme4
simulate.formula_lhs lme4
summary.formula Hmisc
```

```
'ergm' 4.6.0 (2023-12-17), part of the Statnet Project
* 'news(package="ergm")' for changes since last version
* 'citation("ergm")' for citation information
* 'https://statnet.org' for help, support, and other information
```

```
'ergm' 4 is a major update that introduces some backwards-incompatible
changes. Please type 'news(package="ergm")' for a list of major
changes.
```

```
Attaching package: 'ergm'
```

```
The following object is masked from 'package:statnet.common':
snctrl
```

```
library(intergraph)
setwd("network_data")
mids_df <- read.csv("cow/mids_net.csv")
nmc_data <- read.csv("cow/nmc_vertex.csv")
mids_net <- graph_from_data_frame(mids_df,
                                  vertices=nmc_data, directed=F)
V(mids_net)$milper[V(mids_net)$milper<0] <- 0
mids_net <- asNetwork(mids_net)

trade_df <- read.csv("cow/trade_net.csv")
trade_net <- graph_from_data_frame(trade_df,
                                   vertices=nmc_data, directed=F)
trade_net <- asNetwork(trade_net)

cont_df <- read.csv("cow/cont_net.csv")
cont_net <- graph_from_data_frame(cont_df,
                                  vertices=nmc_data, directed=F)
cont_net <- asNetwork(cont_net)
```

We started by incorporating our other networks into the model; we can do this here by putting those network objects into `dyadcov()`.

```
mod <- ergm(mids_net ~ edges +
              dyadcov(trade_net) +
              dyadcov(cont_net))
```

```
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate.
```

`summary(mod)`

```
Call:
ergm(formula = mids_net ~ edges + dyadcov(trade_net) + dyadcov(cont_net))
Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -5.5534 0.1207 0 -46.026 <1e-04 ***
dyadcov.trade_net 0.3416 0.2993 0 1.141 0.254
dyadcov.cont_net 4.6205 0.1813 0 25.479 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 25154 on 18145 degrees of freedom
Residual Deviance: 1289 on 18142 degrees of freedom
AIC: 1295 BIC: 1318 (Smaller is better. MC Std. Err. = 0)
```

We can add in a variety of dyadic-independent terms:

- `nodematch()` will add a term indicating that two nodes match on a categorical attribute.
- `nodecov()` will add the dyadic sum of the attribute.
- `nodefactor()` will add a count of how many times each level of an attribute shows up in a dyad.
  - Example: If you have an attribute "Region" with the levels "Northwest", "West Coast", etc., then a dyad where both nodes are in the "Northwest" would have a 2 on the "Northwest" level and 0 on the other levels; if one node was "West Coast" and the other "Northwest" it would have a 1 on both of those and 0 on all others. This is equivalent to adding indicator variables from a factor.
- `absdiff()` will add a term that is the absolute difference between the nodes on whatever variable you include.
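To see what `nodefactor()` is counting, we can rebuild the dyad-level scores for one level by hand with `outer()`, using a hypothetical three-node "Region" attribute:

```r
region <- c("Northwest", "Northwest", "West Coast")

# For each dyad, count how many of its two endpoints are in the Northwest.
# TRUE is treated as 1, so adding the two indicators gives the 0/1/2 score.
outer(region == "Northwest", region == "Northwest", "+")
```

The dyad between the two Northwest nodes scores 2, mixed dyads score 1, and the West Coast node's self-pairing scores 0, matching the example above.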

We will start by accounting for the absolute difference in military personnel.

```
mod <- ergm(mids_net ~ edges +
              dyadcov(trade_net) +
              dyadcov(cont_net) +
              absdiff("milper"))
```

```
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate.
```

The coefficients are going to be the same as with QAP, but the p-values aren't. (We haven't included any dyad-dependent network terms here yet.)

`summary(mod)`

```
Call:
ergm(formula = mids_net ~ edges + dyadcov(trade_net) + dyadcov(cont_net) +
absdiff("milper"))
Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -5.7848114 0.1328795 0 -43.534 <1e-04 ***
dyadcov.trade_net 0.3666350 0.3015442 0 1.216 0.224
dyadcov.cont_net 4.5827770 0.1845116 0 24.837 <1e-04 ***
absdiff.milper 0.0008591 0.0001316 0 6.530 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 25154 on 18145 degrees of freedom
Residual Deviance: 1255 on 18141 degrees of freedom
AIC: 1263 BIC: 1294 (Smaller is better. MC Std. Err. = 0)
```

The two gw-terms we've learned:

- `gwesp()` captures the tendency towards transitivity. Positive values mean open triads are more likely to close.
- `gwdegree()` captures a tendency towards having equal or unequal distributions of edges. Positive values mean edges are more equally distributed; negative values mean edges tend to go to those who already have edges (popularity).

Both can have their decay set to 0 with `decay=0, fixed=T`.
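To make "edgewise shared partners" (the quantity `gwesp()` weights) concrete, you can count them by hand on a toy adjacency matrix: for an edge i–j, the shared-partner count is the number of two-paths between i and j.

```r
# A 4-node undirected network: a triangle (1-2-3) plus a pendant node 4.
A <- matrix(c(0, 1, 1, 0,
              1, 0, 1, 0,
              1, 1, 0, 1,
              0, 0, 1, 0), nrow = 4, byrow = TRUE)

two_paths <- A %*% A  # [i, j] = number of common neighbors of i and j
two_paths * A         # restricted to actual edges: edgewise shared partners
```

Each triangle edge has one shared partner, while the pendant edge 3–4 has zero; `gwesp()` sums a geometrically weighted function of these counts.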

```
full_mod <- ergm(mids_net ~ edges + dyadcov(trade_net) +
                   dyadcov(cont_net) + nodecov("milper") +
                   gwdegree(decay=0, fixed=T) +
                   gwesp(decay=0, fixed=T))
```

```
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Optimizing with step length 0.1218.
The log-likelihood improved by 2.0849.
Estimating equations are not within tolerance region.
Iteration 2 of at most 60:
Optimizing with step length 0.1449.
The log-likelihood improved by 1.8646.
Estimating equations are not within tolerance region.
Iteration 3 of at most 60:
Optimizing with step length 0.1308.
The log-likelihood improved by 1.8698.
Estimating equations are not within tolerance region.
Iteration 4 of at most 60:
Optimizing with step length 0.4978.
The log-likelihood improved by 2.9557.
Estimating equations are not within tolerance region.
Iteration 5 of at most 60:
Optimizing with step length 0.9212.
The log-likelihood improved by 2.3475.
Estimating equations are not within tolerance region.
Iteration 6 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.1328.
Estimating equations are not within tolerance region.
Iteration 7 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0764.
```

```
Convergence test p-value: 0.0577. Not converged with 99% confidence; increasing sample size.
Iteration 8 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0353.
Convergence test p-value: 0.8188. Not converged with 99% confidence; increasing sample size.
Iteration 9 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0316.
Convergence test p-value: 0.1981. Not converged with 99% confidence; increasing sample size.
Iteration 10 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0166.
Convergence test p-value: 0.1883. Not converged with 99% confidence; increasing sample size.
Iteration 11 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0123.
Convergence test p-value: 0.0042. Converged with 99% confidence.
Finished MCMLE.
Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.
This model was fit using MCMC. To examine model diagnostics and check
for degeneracy, use the mcmc.diagnostics() function.
```

`summary(full_mod)`

```
Call:
ergm(formula = mids_net ~ edges + dyadcov(trade_net) + dyadcov(cont_net) +
nodecov("milper") + gwdegree(decay = 0, fixed = T) + gwesp(decay = 0,
fixed = T))
Monte Carlo Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -5.4664006 0.1834414 0 -29.799 <1e-04 ***
dyadcov.trade_net 0.2838874 0.2630876 0 1.079 0.281
dyadcov.cont_net 4.0072176 0.1818470 0 22.036 <1e-04 ***
nodecov.milper 0.0006911 0.0001079 0 6.403 <1e-04 ***
gwdeg.fixed.0 -1.0343249 0.2488827 0 -4.156 <1e-04 ***
gwesp.fixed.0 0.8095523 0.1512802 0 5.351 <1e-04 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 25154 on 18145 degrees of freedom
Residual Deviance: 1123 on 18139 degrees of freedom
AIC: 1135 BIC: 1182 (Smaller is better. MC Std. Err. = 0.459)
```

- There is a positive relationship between contiguity and MIDs: countries that are contiguous are much more likely to have a dispute with each other.
- Dyads with more military personnel overall (the `nodecov` sum) are more likely to have a dispute.
- There is a tendency for countries that get into disputes to get into disputes often (gwdegree).
- There is a tendency for transitivity in disputes as well: disputes seem to cluster around certain sets of countries (gwesp).

If you look at the ergm documentation, there are a *lot* of network variables. This leads to the question: can we identify whether we have a good (or bad) model?

One way to check this is to see if our model does a good job simulating a network with similar characteristics.

This can be done through the `gof()` function and then `plot()`. `gof()` takes in a model object, simulates a large number of networks, and then calculates descriptive statistics on them.

The boxplots show the distribution of descriptive statistics for the simulated networks. The black line shows the statistic for the observed network. A model that fits well is a model where the black line is generally within the gray boxplots.

- Top left: the descriptive statistics related to all the parameters you've included.
- Top right: the degree distribution.
- Bottom left: the edge-wise shared partners distribution.
- Bottom right: the distribution of geodesic distances.

Let's look at trade networks in particular; let's see if democracies are more likely to trade with other democracies.

Need to do some cleaning to get the data in:

```
edges <- read.csv("network_data/spanish_hs_edges.csv")
nodes <- read.csv("network_data/spanish_hs_nodes.csv")
net <- graph_from_data_frame(edges, vertices=nodes, directed=T)
net <- delete_vertices(net, which(is.na(V(net)$prosocial)))
net <- delete_edges(net, which(E(net)$weight < 0))
net <- as.undirected(net, mode="mutual")
net <- asNetwork(net)
```

Run our simple model.

```
mod <- ergm(net ~ edges + nodematch("Sexo") + nodematch("Grupo") +
              nodefactor("Sexo") + nodefactor("Grupo") +
              nodecov("prosocial"))
```

```
Starting maximum pseudolikelihood estimation (MPLE):
Obtaining the responsible dyads.
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Evaluating log-likelihood at the estimate.
```

`summary(mod)`

```
Call:
ergm(formula = net ~ edges + nodematch("Sexo") + nodematch("Grupo") +
nodefactor("Sexo") + nodefactor("Grupo") + nodecov("prosocial"))
Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -4.61398 0.38530 0 -11.975 <1e-04 ***
nodematch.Sexo 1.22256 0.18073 0 6.765 <1e-04 ***
nodematch.Grupo 2.95815 0.19446 0 15.212 <1e-04 ***
nodefactor.Sexo.Male 0.10895 0.11254 0 0.968 0.3330
nodefactor.Grupo.B -0.09376 0.11821 0 -0.793 0.4277
nodefactor.Grupo.C -0.15182 0.13449 0 -1.129 0.2590
nodefactor.Grupo.D 0.23911 0.14274 0 1.675 0.0939 .
nodecov.prosocial -0.35101 0.23933 0 -1.467 0.1425
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 4381 on 3160 degrees of freedom
Residual Deviance: 1108 on 3152 degrees of freedom
AIC: 1124 BIC: 1173 (Smaller is better. MC Std. Err. = 0)
```

```
est_gof <- gof(mod)
par(mfrow=c(2,2)) # Creates a 2 by 2 grid of plots
plot(est_gof)
```

Let's add in dyad-dependent terms, starting with a fixed decay of 0.

```
set.seed(1)
mod <- ergm(net ~ edges + nodematch("Sexo") + nodematch("Grupo") +
              nodefactor("Sexo") + nodefactor("Grupo") +
              nodecov("prosocial") +
              gwesp(decay=0, fixed=T) + gwdegree(0, fixed=T))
est_gof <- gof(mod)
```

```
par(mfrow=c(2,2))
plot(est_gof)
```

```
mod <- ergm(net ~ edges + nodematch("Sexo") + nodematch("Grupo") +
              nodefactor("Sexo") + nodefactor("Grupo") +
              nodecov("prosocial") +
              gwesp(decay=.25, fixed=T) + gwdegree(.25, fixed=T))
est_gof <- gof(mod)
```

```
par(mfrow=c(2,2))
plot(est_gof)
```

```
mod <- ergm(net ~ edges + nodematch("Sexo") + nodematch("Grupo") +
              nodefactor("Sexo") + nodefactor("Grupo") +
              nodecov("prosocial") +
              gwesp(decay=.5, fixed=T) + gwdegree(.5, fixed=T))
est_gof <- gof(mod)
```

```
par(mfrow=c(2,2))
plot(est_gof)
```

```
mod <- ergm(net ~ edges + nodematch("Sexo") + nodematch("Grupo") +
              nodefactor("Sexo") + nodefactor("Grupo") +
              nodecov("prosocial") +
              gwesp(decay=.75, fixed=T) + gwdegree(.75, fixed=T))
est_gof <- gof(mod)
```

```
par(mfrow=c(2,2))
plot(est_gof)
```

`summary(mod)`

```
Call:
ergm(formula = net ~ edges + nodematch("Sexo") + nodematch("Grupo") +
nodefactor("Sexo") + nodefactor("Grupo") + nodecov("prosocial") +
gwesp(decay = 0.75, fixed = T) + gwdegree(0.75, fixed = T))
Monte Carlo Maximum Likelihood Results:
Estimate Std. Error MCMC % z value Pr(>|z|)
edges -5.940007 0.351806 0 -16.884 < 1e-04 ***
nodematch.Sexo 0.724214 0.121896 0 5.941 < 1e-04 ***
nodematch.Grupo 1.408892 0.147166 0 9.573 < 1e-04 ***
nodefactor.Sexo.Male 0.034010 0.067383 0 0.505 0.61375
nodefactor.Grupo.B -0.005018 0.056520 0 -0.089 0.92925
nodefactor.Grupo.C -0.048045 0.070391 0 -0.683 0.49490
nodefactor.Grupo.D 0.136707 0.081334 0 1.681 0.09280 .
nodecov.prosocial -0.192379 0.168620 0 -1.141 0.25391
gwesp.fixed.0.75 1.316061 0.136495 0 9.642 < 1e-04 ***
gwdeg.fixed.0.75 1.075639 0.384633 0 2.797 0.00517 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Null Deviance: 4380.7 on 3160 degrees of freedom
Residual Deviance: 952.1 on 3150 degrees of freedom
AIC: 972.1 BIC: 1033 (Smaller is better. MC Std. Err. = 0.4174)
```

We can look at each term here in kind:

- `edges`: This is equivalent to the intercept of a normal regression and so is generally *not* worth discussing.
- `nodematch.Sexo`: This shows the effect of two students having the *same* sex. The positive coefficient (0.72) is statistically significant (p-value < 0.05). This means that two students are more likely to have a friendship (an edge) if they are of the same sex.
- `nodematch.Grupo`: This shows the effect of two students being in the *same* group. This coefficient is again positive (1.41) and significant (p-value < 0.05), meaning that students in the same group are more likely to be friends.
- `nodefactor.Sexo.Male`: This shows whether male students are more likely to have friends than female students. It is not significant, so there is no difference.
- `nodefactor.Grupo.B`, `nodefactor.Grupo.C`, and `nodefactor.Grupo.D`: These show whether students in groups B, C, and D are more likely to have friends than those in group A. These are again not significant, so there is no difference.
- `nodecov.prosocial`: This shows whether the dyad's total amount of "prosocial" behavior increases the chance of a friendship between two people. There is, again, no effect.
- `gwesp.fixed.0.75`: This shows how much transitivity matters to the network structure. The coefficient is positive (1.32) and statistically significant (p-value < 0.05), meaning that triangles are more likely to form in this network than in a random network. In practice this indicates that transitivity happens in this network.
- `gwdeg.fixed.0.75`: This models the distribution of degrees. The positive (1.08) and significant (p-value < 0.05) coefficient means that edges are spread more evenly across students than chance would predict. (If it were negative we'd expect a "rich get richer" pattern, with a few students holding a large number of ties and many holding few.)
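Because ERGM coefficients are on the log-odds scale, you can exponentiate them to talk about odds, just as with logistic regression. Using the `nodematch.Sexo` estimate from the summary above:

```r
# exp() of a log-odds coefficient gives an odds multiplier:
exp(0.724214) # ~2.06: same-sex dyads have about twice the odds of a tie,
              # holding the other terms constant
```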