We are going to use three networks today:
The worker data is in the “pajek” format:
The other two can be loaded by combining the edge and node files:
Now we are going to move onto visualizing networks.
Visualizations can be useful with network data, but they are also hard to do:
We are going to use the ggraph package.
Benefits:
Cons:
We are going to start with a small canned dataset you can download from the internet: strike.paj
It is a communication network between workers at a sawmill. It also comes in a distinctive data format, “pajek”, which igraph thankfully has a function for reading.
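As a rough sketch of reading a pajek file with igraph: the tiny three-person network below is made up and written to a temporary file so the example runs without downloading anything. The names and the file are stand-ins, not the actual strike.paj data.

```r
library(igraph)

# Hypothetical three-worker network in Pajek format (an assumption,
# standing in for strike.paj so nothing has to be downloaded).
paj_lines <- c(
  "*Vertices 3",
  '1 "Ana"',
  '2 "Ben"',
  '3 "Cal"',
  "*Edges",
  "1 2",
  "2 3"
)
paj_file <- tempfile(fileext = ".paj")
writeLines(paj_lines, paj_file)

# read_graph() parses the file once it is told the format is "pajek".
toy_net <- read_graph(paj_file, format = "pajek")

vcount(toy_net)  # number of vertices
ecount(toy_net)  # number of edges
```

For the real data the call would presumably look like read_graph("strike.paj", format = "pajek"), with the path adjusted to wherever the file was saved.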
Just like ggplot2, all visualizations will start with a call to ggraph()
To add nodes and edges to this plot we will use geom_node_point() and geom_edge_link()
geom_node_point(): Adds our nodes as circles
geom_edge_link(): Adds edges as straight lines (no arrows)
We can change the visuals of edges and nodes by assigning:
color=: Color of edge or node
size=: Size of the nodes (defaults to 1)
width=: The width of the edges (defaults to .5 (I think))
alpha=: The amount of transparency for nodes or edges.
The order of what you add on matters, with later things being added on top of earlier things.
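A minimal sketch of how these pieces fit together. It uses Zachary's karate club network, which ships with igraph, as a stand-in for the sawmill data; the colors and sizes are arbitrary choices for illustration.

```r
library(igraph)
library(ggraph)

# Stand-in network (assumption): Zachary's karate club from igraph.
g <- make_graph("Zachary")

p <- ggraph(g) +
  # Edges are added first so that the nodes are drawn on top of them.
  geom_edge_link(color = "grey50", width = 0.5, alpha = 0.6) +
  geom_node_point(color = "steelblue", size = 4, alpha = 0.9)

p  # printing the object draws the plot
```

Swapping the order of the two geoms would draw the edges over the nodes instead.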
You can add on themes to your visualizations as well; theme_graph() (designed for networks) or theme_void() can work well.
We can use geom_node_text() or geom_node_label() to label our nodes.
They also have a repel=T argument that will move the labels away from the center of the node.
Finally you can label your plot with the labs() function.
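Labels, a theme, and a title can be combined roughly as follows. The karate club network is again a stand-in, and the node names and title text are invented for the example.

```r
library(igraph)
library(ggraph)
library(ggplot2)

g <- make_graph("Zachary")
# The stand-in graph has no names, so simple labels are made up (assumption).
V(g)$name <- paste0("m", seq_len(vcount(g)))

p <- ggraph(g) +
  geom_edge_link(color = "grey70") +
  geom_node_point(size = 3) +
  # repel = TRUE nudges each label away from its node.
  geom_node_text(aes(label = name), repel = TRUE) +
  labs(title = "Karate club friendships",
       caption = "Stand-in data from igraph") +
  theme_void()

p
```

theme_void() is used here because theme_graph() may warn about a missing font on some systems; either should work.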
How a plot is laid out can greatly affect how useful it is:
Most of these layout algorithms do something like:
The Fruchterman-Reingold (FR) layout views vertices as “atomic particles or celestial bodies, exerting attractive and repulsive forces from one another”.
How does this algorithm work?
To set the layout you set layout= to what you want; you can also pass additional arguments as necessary.
If you want to create the exact same layout every time, run set.seed() directly prior to making the plot. This sets the “random seed” that is used.
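A small sketch of setting a layout and a seed, using the karate club stand-in network. The seed value and niter = 1000 (an argument passed through to igraph's FR layout) are arbitrary choices. create_layout() is used only to show that the same seed gives the same coordinates.

```r
library(igraph)
library(ggraph)

g <- make_graph("Zachary")  # stand-in network (assumption)

# create_layout() returns the node coordinates that ggraph() would plot.
set.seed(42)
lay_a <- create_layout(g, layout = "fr", niter = 1000)
set.seed(42)
lay_b <- create_layout(g, layout = "fr", niter = 1000)

# Compare the coordinates from the two runs.
same_layout <- identical(lay_a$x, lay_b$x) && identical(lay_a$y, lay_b$y)

# In practice, set the seed right before building the plot.
set.seed(42)
p <- ggraph(g, layout = "fr", niter = 1000) +
  geom_edge_link(color = "grey60") +
  geom_node_point()
```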
Working with larger networks can be difficult for a few reasons:
There are a few useful things to do:
Warning
If you are not showing the whole network you must make that clear.
The Ohio data connects groups based on their shared donation patterns.
We can drop the edges that aren’t “strong”:
We can use the function largest_component() to grab just the largest connected part of the network. Also, the |> is a pipe, which passes the output of one function on as the first argument of the next.
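The two steps can be sketched like this. The random weighted network and the 0.5 cutoff are assumptions made for illustration; they are not the Ohio data or the threshold used for it.

```r
library(igraph)

# Hypothetical weighted network standing in for the Ohio data.
set.seed(7)
g <- sample_gnp(60, 0.08)
E(g)$weight <- runif(ecount(g))

cutoff <- 0.5  # illustrative threshold for a "strong" tie (assumption)

strong <- g |>
  delete_edges(which(E(g)$weight < cutoff)) |>  # drop the weak edges
  largest_component()                           # keep the biggest connected piece

vcount(g)       # nodes before
vcount(strong)  # nodes that remain
```

Because nodes are dropped along the way, a plot of strong should say that it shows only part of the network.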
I don’t really care about the nodes themselves; what I care about is how polarized this network is. There is a vertex attribute that shows the percentage of donations to Democratic candidates. Can we add that?
We can add vertex attributes to visuals on our network by using the aes() function inside of geom_node_point() and connecting the attribute name to the aesthetic we want to change:
geom_node_point(aes(color=degree))
Will color the node based on the vertex attribute named degree.
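A short sketch of mapping a vertex attribute to an aesthetic. The karate club network is a stand-in, and the degree attribute is computed here because the stand-in graph does not come with one. The viridis color scale at the end is an optional extra.

```r
library(igraph)
library(ggraph)
library(ggplot2)

g <- make_graph("Zachary")  # stand-in network (assumption)
# Store each node's degree as a vertex attribute named "degree".
V(g)$degree <- degree(g)

p <- ggraph(g) +
  geom_edge_link(color = "grey70") +
  # Inside aes(), degree refers to the vertex attribute, not the function.
  geom_node_point(aes(color = degree, size = degree)) +
  scale_color_viridis_c()  # optional: colorblind-friendly color scale

p
```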
We can also change the scales using the scale_*_*() functions.
scale_color_gradient2(low="green", high="purple", midpoint=.5) - The color scale will go from green to purple (through white by default) with the midpoint at 0.5. Note that midpoint= belongs to scale_color_gradient2(), not scale_color_gradient().
scale_size_continuous(trans="log10") - Will impose a log transformation.
scale_fill_brewer(type="qual", palette=3) - Fills things using a specific colorbrewer palette.
scale_color_viridis_c() - Colorblind friendly colors.
Vertex attributes are included for a variety of reasons. This includes:
This data is of Spanish high school students and includes negative and positive relations. We are going to delete the negative edges.
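library(igraph)
# Read the edge list and the node attributes
edges <- read.csv("data/spanish_hs_edges.csv")
nodes <- read.csv("data/spanish_hs_nodes.csv")
# Combine them into a directed network
net <- graph_from_data_frame(edges, vertices=nodes, directed=TRUE)
# Find the negative relations and drop them
neg_edges <- which(E(net)$weight < 0)
net <- delete_edges(net, neg_edges)
net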
IGRAPH 1bc71d6 DNW- 105 1058 --
+ attr: name (v/n), Colegio (v/n), Curso (v/n), Grupo (v/c), Sexo
| (v/c), prosocial (v/n), crttotal (v/n), X_pos (v/c), id (e/n), weight
| (e/n)
+ edges from 1bc71d6 (vertex names):
[1] 3043->3047 3043->3087 3043->3093 3043->3065 3043->3097 3043->3044
[7] 3043->3045 3043->3088 3043->3056 3043->3090 3043->3073 3043->3066
[13] 3043->3060 3043->3092 3043->3096 3043->3077 3043->3084 3043->3105
[19] 3043->3067 3043->3064 3043->3081 3043->3068 3043->3061 3043->3058
[25] 3043->3055 3043->3072 3043->3095 3043->3051 3043->3054 3043->3086
[31] 3043->3085 3043->3089 3043->3047 3043->3048 3043->3049 3043->3050
+ ... omitted several edges
Arrows are annoying to add here, but there is some good help online. We manually create an arrow (arrow) and manually end the edges before they reach the node (end_cap).
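One way this could look, sketched on a small random directed network rather than the Spanish high school data. The arrow length and cap size are guesses that will usually need tuning against the node size.

```r
library(igraph)
library(ggraph)
library(ggplot2)  # re-exports arrow() and unit() from grid

# Small random directed network as a stand-in (assumption).
set.seed(3)
g <- sample_gnp(15, 0.15, directed = TRUE)

p <- ggraph(g, layout = "fr") +
  geom_edge_link(
    arrow   = arrow(length = unit(2, "mm"), type = "closed"),  # the arrowhead
    end_cap = circle(3, "mm"),  # stop each edge 3 mm short of its target node
    color   = "grey40"
  ) +
  geom_node_point(size = 4) +
  theme_void()

p
```

Without end_cap the arrowheads tend to end up hidden underneath the node circles.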
Finally we can assign edge attributes to aesthetics
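A sketch of mapping an edge attribute to edge aesthetics. The network and its weight values are randomly generated for illustration, and the width range is an arbitrary choice.

```r
library(igraph)
library(ggraph)

# Random directed network with made-up integer weights (assumption).
set.seed(5)
g <- sample_gnp(15, 0.2, directed = TRUE)
E(g)$weight <- sample(1:5, ecount(g), replace = TRUE)

p <- ggraph(g, layout = "fr") +
  # Heavier edges are drawn wider and more opaque.
  geom_edge_link(aes(width = weight, alpha = weight)) +
  scale_edge_width(range = c(0.3, 2)) +  # keep the widths in a readable range
  geom_node_point(size = 3)

p
```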
The default for ggraph is to show only a single edge when there are two mutual edges. We can change that by using geom_edge_fan()
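A small sketch of geom_edge_fan() on a made-up three-node network in which A and B nominate each other. The node names and edges are invented for the example.

```r
library(igraph)
library(ggraph)
library(ggplot2)

# Tiny hypothetical network: A and B have a mutual tie.
edges <- data.frame(
  from = c("A", "B", "B", "C"),
  to   = c("B", "A", "C", "A")
)
g <- graph_from_data_frame(edges, directed = TRUE)

sum(which_mutual(g))  # counts the edges that belong to a mutual pair

set.seed(1)
p <- ggraph(g, layout = "fr") +
  # geom_edge_fan() bends parallel edges apart so both directions are visible.
  geom_edge_fan(
    arrow   = arrow(length = unit(2, "mm"), type = "closed"),
    end_cap = circle(3, "mm")
  ) +
  geom_node_point(size = 4) +
  geom_node_text(aes(label = name), repel = TRUE)

p
```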
Modeling networks is difficult because there are a lot of dependencies.
There are a few methods that have been developed:
Both attempt to model the complex interdependencies. I’m more familiar with ERGMs.
statnet suite of packages.