Today we are going to use a subset of country data from The Quality of Government Institute.
Read in the data with read_csv(), and if you need more help check out the first day of slides.
Install the packages we use today with install.packages(c("tidyverse", "gt", "rmarkdown")).
There is a description of all the variables I’ve included here.
For now though we are going to use a few of them:
bl_asymf: average schooling years, females and males between 15 and 64 years old.
wdi_expedu: general government expenditure on education (current, capital, and transfers), expressed as a percentage of GDP.
Often you want to select just specific rows of data that meet certain requirements.
We need to include some more operators to do this:
< less than and > greater than
<= less than or equal to and >= greater than or equal to
== equal to and != not equal to
We can do the same thing but using a variable from our dataset:
[1] FALSE TRUE FALSE NA NA NA NA TRUE TRUE TRUE NA FALSE
[13] FALSE TRUE FALSE TRUE NA FALSE NA TRUE FALSE TRUE NA FALSE
[25] TRUE FALSE FALSE NA FALSE FALSE TRUE NA FALSE TRUE NA TRUE
[37] FALSE TRUE FALSE NA FALSE FALSE FALSE TRUE TRUE TRUE TRUE FALSE
[49] TRUE NA FALSE FALSE FALSE NA NA NA TRUE TRUE TRUE TRUE
[61] NA FALSE NA FALSE TRUE FALSE NA TRUE NA FALSE NA FALSE
[73] FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE
[85] TRUE TRUE TRUE TRUE FALSE NA TRUE FALSE TRUE FALSE NA FALSE
[97] TRUE FALSE FALSE NA TRUE TRUE NA FALSE TRUE FALSE FALSE TRUE
[109] FALSE FALSE FALSE NA FALSE TRUE NA FALSE FALSE NA FALSE NA
[121] FALSE TRUE NA TRUE FALSE FALSE NA TRUE NA NA NA FALSE
[133] TRUE FALSE FALSE FALSE FALSE TRUE FALSE NA NA FALSE TRUE TRUE
[145] FALSE NA NA NA NA NA TRUE FALSE TRUE NA FALSE TRUE
[157] TRUE FALSE TRUE NA TRUE FALSE TRUE NA FALSE NA FALSE TRUE
[169] TRUE FALSE TRUE FALSE FALSE TRUE TRUE TRUE FALSE FALSE NA NA
[181] FALSE TRUE NA FALSE TRUE FALSE TRUE NA FALSE NA TRUE NA
[193] FALSE FALSE
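Here is a minimal sketch of the comparison operators on a small made-up vector. The name `schooling` and its values are assumptions for illustration, not values from the course data. Notice that a missing value gives NA rather than TRUE or FALSE, which is where the NAs in a logical check come from.

```r
# Made-up schooling values (not from the QoG data); NA marks a missing value
schooling <- c(11.0, 7.7, NA, 12.5)

schooling > 10     # greater than
schooling <= 11    # less than or equal to
schooling != 12.5  # not equal to

# Store the result of a check so we can reuse it
above_ten <- schooling > 10
above_ten          # TRUE FALSE NA TRUE
```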
We can use logical checks to filter our data.
The filter() function is part of the dplyr package in the tidyverse.
Note
Within the filter() call you do not need to use data$ before the variable name; it already knows you are using the data you put in the first argument.
cname | ccode | ti_cpi | vdem_academ | wdi_fertility | wdi_afp | bl_asymf | wdi_expedu | wdi_elprodcoal | wef_iu | wdi_foodins | ht_colonial | lp_legor | cai_foetal | cai_mental | cai_physical | ccp_initiat | ccp_market | h_j | wdi_homicides | ccp_strike | wdi_lfpr | br_pvote | br_elect | van_part | bmr_demdur | fh_polity2 | vdem_polyarchy | mad_gdppc | top_top1_income_share | wef_sp |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Albania | 8 | 36 | 0.876 | 1.62 | 0.643 | 11 | 3.61 | 0 | 71.8 | 10 | 0 | 3 | 1 | 1 | 1 | 1 | 1 | 0 | 2.29 | 1 | 68.3 | 1 | 3 | 52.8 | 22 | 8.08 | 0.52 | 1.11e+04 | 0.0908 | 57.3 |
Argentina | 32 | 40 | 0.935 | 2.26 | 0.512 | 10.2 | 5.46 | 2.03 | 74.3 | 12.9 | 2 | 2 | 0 | 0 | 1 | 1 | 2 | 0 | 5.32 | 1 | 69.2 | 1 | 3 | 58.6 | 36 | 8.92 | 0.779 | 1.86e+04 | 0.153 | 365 |
Australia | 36 | 77 | 0.847 | 1.74 | 0.438 | 12.5 | 5.12 | 62.9 | 86.5 | 3.8 | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 0.892 | 3 | 78.1 | 0 | 3 | 60.8 | 118 | 10 | 0.865 | 4.98e+04 | 0.129 | 852 |
Austria | 40 | 76 | 0.973 | 1.47 | 0.497 | 10.8 | 5.36 | 8.23 | 87.7 | 1.1 | 0 | 4 | 1 | 1 | 1 | 1 | 2 | 1 | 0.967 | 3 | 76.6 | 1 | 3 | 58.5 | 73 | 10 | 0.846 | 4.3e+04 | 0.0992 | 579 |
Armenia | 51 | 35 | 0.8 | 1.75 | 3.92 | 11.8 | 2.71 | 0 | 64.7 | 1.1 | 0 | 1 | 1 | 1 | 2 | 96 | 0 | 1.69 | 1 | 61.6 | 1 | 3 | 51.7 | 1 | 6.75 | 0.472 | 1.15e+04 | 0.178 | 162 | |
Belgium | 56 | 75 | 0.967 | 1.62 | 0.619 | 11.6 | 6.41 | 6.11 | 88.7 | 1.1 | 0 | 2 | 1 | 1 | 1 | 2 | 2 | 1 | 1.69 | 3 | 68.6 | 1 | 3 | 68.2 | 125 | 9.5 | 0.891 | 3.98e+04 | 0.086 | 704 |
Botswana | 72 | 61 | 0.874 | 2.87 | 0.858 | 10.3 | 96.4 | 47 | 21.5 | 5 | 1 | 1 | 1 | 1 | 2 | 2 | 0 | 3 | 73.1 | 0 | 3 | 32.4 | 53 | 8.25 | 0.686 | 1.58e+04 | 0.227 | 93.7 | ||
Belize | 84 | 2.31 | 0.863 | 11.3 | 7.56 | 5 | 1 | 1 | 1 | 2 | 2 | 37.8 | 3 | 67.5 | 0 | 3 | 41.1 | 38 | 0.197 | |||||||||||
Bulgaria | 100 | 42 | 0.906 | 1.56 | 1.11 | 11.2 | 4.09 | 46.2 | 64.8 | 1.9 | 0 | 3 | 1 | 1 | 1 | 2 | 1 | 0 | 1.3 | 1 | 71.7 | 1 | 3 | 40.7 | 29 | 8.92 | 0.615 | 1.84e+04 | 0.182 | 221 |
Canada | 124 | 81 | 0.919 | 1.5 | 0.356 | 12.9 | 9.84 | 91 | 0.7 | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1.76 | 3 | 78.5 | 0 | 3 | 50.4 | 152 | 10 | 0.849 | 4.49e+04 | 0.149 | 1.03e+03 | |
Sri Lanka | 144 | 38 | 0.733 | 2.2 | 3.65 | 11.1 | 2.12 | 33.7 | 34.1 | 5 | 1 | 0 | 0 | 0 | 2 | 2 | 0 | 2.42 | 3 | 57.9 | 1 | 3 | 57.1 | 4 | 6.92 | 0.628 | 1.17e+04 | 0.206 | 148 | |
Chile | 152 | 67 | 0.959 | 1.65 | 1.3 | 10.6 | 5.42 | 37.1 | 82.3 | 3.6 | 2 | 2 | 0 | 0 | 0 | 2 | 2 | 1 | 4.4 | 2 | 69 | 1 | 3 | 37.7 | 29 | 10 | 0.863 | 2.21e+04 | 0.265 | 319 |
Taiwan (Province of China) | 158 | 63 | 0.897 | 12.4 | 92.8 | 0 | 4 | 1 | 2 | 1 | 3 | 0 | 3 | 52.8 | 23 | 10 | 0.84 | 4.47e+04 | 0.145 | 439 | ||||||||||
Croatia | 191 | 48 | 0.873 | 1.47 | 1.01 | 12 | 3.92 | 20.6 | 72.7 | 0.9 | 0 | 1 | 1 | 1 | 2 | 1 | 1 | 0.577 | 1 | 66.6 | 1 | 3 | 51.6 | 19 | 9.33 | 0.732 | 2.2e+04 | 0.104 | 239 | |
Cuba | 192 | 47 | 0.117 | 1.62 | 1.49 | 11.1 | 0 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 0 | 5.05 | 3 | 64.3 | 0 | 1 | 64.4 | 66 | 1.67 | 0.182 | 8.33e+03 | 0.145 | ||||
Cyprus | 196 | 59 | 0.958 | 1.33 | 2.56 | 11.9 | 5.78 | 0 | 84.4 | 5 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1.26 | 1 | 74.1 | 1 | 3 | 31 | 43 | 10 | 0.856 | 2.72e+04 | 0.117 | 170 | |
Czechia | 203 | 59 | 0.942 | 1.71 | 0.408 | 12.9 | 3.85 | 53.1 | 80.7 | 0 | 0 | 1 | 1 | 1 | 2 | 2 | 1 | 0.62 | 1 | 76.8 | 1 | 3 | 48.5 | 26 | 9.75 | 0.812 | 3.07e+04 | 0.102 | 397 | |
Denmark | 208 | 88 | 0.941 | 1.73 | 0.486 | 12.9 | 7.82 | 24.5 | 97.6 | 1.1 | 0 | 5 | 1 | 1 | 1 | 96 | 2 | 1 | 1.01 | 3 | 78.2 | 1 | 3 | 63.9 | 74 | 10 | 0.913 | 4.63e+04 | 0.124 | 662 |
Estonia | 233 | 73 | 0.97 | 1.67 | 0.949 | 12.4 | 4.97 | 5.33 | 89.4 | 0.9 | 0 | 1 | 1 | 1 | 2 | 2 | 1 | 2.12 | 1 | 79.1 | 1 | 3 | 43.9 | 28 | 9.75 | 0.901 | 2.74e+04 | 0.13 | 235 | |
Fiji | 242 | 55 | 0.357 | 2.77 | 1.12 | 10.2 | 2 | 5 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 2 | 60.4 | 1 | 2 | 55.4 | 5 | 6.33 | 0.415 | |||||||
Finland | 246 | 85 | 0.947 | 1.41 | 0.919 | 11.3 | 6.38 | 8.3 | 88.9 | 2 | 0 | 5 | 1 | 1 | 1 | 1 | 2 | 1 | 1.63 | 3 | 77.8 | 1 | 3 | 54.4 | 102 | 10 | 0.88 | 3.89e+04 | 0.105 | 571 |
France | 250 | 72 | 0.881 | 1.88 | 1 | 10.3 | 5.45 | 2.16 | 82 | 0.7 | 0 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 1.2 | 3 | 72 | 0 | 3 | 43.7 | 73 | 9.58 | 0.88 | 3.85e+04 | 0.0966 | 1.03e+03 |
Germany | 276 | 80 | 0.971 | 1.57 | 0.416 | 12.3 | 4.91 | 44.3 | 89.7 | 0.7 | 0 | 1 | 1 | 1 | 2 | 1 | 1 | 0.948 | 3 | 78.5 | 1 | 3 | 57.3 | 29 | 10 | 0.878 | 4.62e+04 | 0.129 | 1.13e+03 | |
Greece | 300 | 45 | 0.854 | 1.35 | 3.08 | 11.1 | 42.7 | 73 | 2.3 | 0 | 2 | 1 | 1 | 1 | 2 | 2 | 1 | 0.941 | 1 | 68.4 | 1 | 3 | 63.5 | 45 | 9.58 | 0.858 | 2.35e+04 | 0.109 | 434 | |
Hungary | 348 | 46 | 0.467 | 1.55 | 0.841 | 12 | 4.67 | 19.5 | 76.1 | 0.8 | 0 | 3 | 1 | 1 | 1 | 1 | 96 | 1 | 2.49 | 1 | 71.9 | 1 | 3 | 59.2 | 29 | 8.33 | 0.489 | 2.56e+04 | 0.12 | 391 |
Ireland | 372 | 73 | 0.94 | 1.75 | 0.363 | 12.4 | 3.51 | 17.3 | 84.5 | 3.5 | 0 | 1 | 0 | 0 | 0 | 2 | 2 | 1 | 0.872 | 3 | 73.2 | 1 | 3 | 44.8 | 97 | 10 | 0.88 | 6.47e+04 | 0.12 | 451 |
Israel | 376 | 61 | 0.945 | 3.09 | 4.33 | 12.7 | 6.09 | 45.4 | 81.6 | 1.7 | 0 | 1 | 1 | 1 | 1 | 1 | 1.49 | 72.4 | 1 | 3 | 50.8 | 71 | 7.75 | 0.7 | 3.3e+04 | 0.165 | 624 | |||
Italy | 380 | 52 | 0.967 | 1.29 | 1.31 | 11 | 4.04 | 16.1 | 74.4 | 1.1 | 0 | 2 | 1 | 1 | 1 | 1 | 2 | 1 | 0.569 | 2 | 65.7 | 1 | 3 | 58.1 | 73 | 10 | 0.867 | 3.44e+04 | 0.0913 | 897 |
Jamaica | 388 | 44 | 0.943 | 1.98 | 0.404 | 10.6 | 5.41 | 0 | 55.1 | 5 | 1 | 0 | 1 | 1 | 2 | 2 | 0 | 43.9 | 3 | 70.4 | 0 | 3 | 30.2 | 57 | 8.92 | 0.812 | 7.27e+03 | 0.197 | 89.7 | |
Japan | 392 | 73 | 0.711 | 1.42 | 0.382 | 12.8 | 3.18 | 33.2 | 84.6 | 0.7 | 0 | 4 | 0 | 0 | 1 | 2 | 2 | 1 | 0.263 | 3 | 79.1 | 0 | 3 | 44.9 | 67 | 10 | 0.827 | 3.87e+04 | 0.131 | 919 |
Kazakhstan | 398 | 31 | 0.338 | 2.84 | 0.786 | 11.4 | 2.62 | 71.6 | 78.9 | 0 | 0 | 1 | 1 | 1 | 2 | 2 | 5.06 | 1 | 76.6 | 1 | 2 | 52.7 | 28 | 1.83 | 0.236 | 2.53e+04 | 0.154 | 83.7 | ||
Jordan | 400 | 49 | 0.326 | 2.76 | 3.93 | 10.2 | 3.03 | 0 | 66.8 | 5 | 2 | 1 | 1 | 1 | 2 | 2 | 1 | 1.36 | 3 | 41.8 | 0 | 2 | 8.95 | 73 | 3.42 | 0.27 | 1.15e+04 | 0.174 | 143 | |
Korea (the Republic of) | 410 | 57 | 0.836 | 0.977 | 2.15 | 12.8 | 4.33 | 43.1 | 95.9 | 0 | 0 | 4 | 1 | 1 | 1 | 2 | 2 | 1 | 0.604 | 1 | 68.9 | 0 | 3 | 55.7 | 31 | 8.67 | 0.868 | 3.79e+04 | 0.149 | 579 |
Kyrgyzstan | 417 | 29 | 0.621 | 3.3 | 0.827 | 11 | 6.03 | 13.2 | 38 | 0.8 | 0 | 1 | 1 | 1 | 1 | 96 | 1 | 2.19 | 1 | 62.5 | 1 | 3 | 27.6 | 28 | 6.58 | 0.465 | 5.18e+03 | 0.145 | 57.7 | |
Latvia | 428 | 58 | 0.965 | 1.6 | 0.69 | 11.7 | 4.4 | 0 | 83.6 | 0.6 | 0 | 1 | 1 | 1 | 1 | 2 | 1 | 4.36 | 1 | 78.1 | 1 | 3 | 43.8 | 26 | 8.67 | 0.833 | 2.43e+04 | 0.0969 | 141 | |
Lithuania | 440 | 59 | 0.938 | 1.63 | 2.69 | 11.8 | 3.81 | 0 | 79.7 | 1.1 | 0 | 1 | 1 | 1 | 1 | 2 | 1 | 4.57 | 1 | 77.6 | 0 | 3 | 44.3 | 27 | 10 | 0.824 | 2.74e+04 | 0.113 | 182 | |
Luxembourg | 442 | 81 | 0.946 | 1.38 | 0.628 | 12 | 3.57 | 0 | 97.1 | 0.9 | 0 | 2 | 1 | 1 | 1 | 2 | 2 | 1 | 0.338 | 1 | 70.8 | 1 | 3 | 39.5 | 129 | 10 | 0.874 | 5.74e+04 | 0.101 | 153 |
Malaysia | 458 | 47 | 0.504 | 2 | 0.876 | 11.4 | 4.48 | 42.3 | 81.2 | 6.7 | 5 | 1 | 0 | 1 | 1 | 2 | 2 | 1 | 3 | 68.5 | 0 | 3 | 38.9 | 62 | 6.75 | 0.383 | 2.48e+04 | 0.149 | 251 | |
Malta | 470 | 54 | 0.895 | 1.23 | 0.739 | 11.8 | 4.82 | 0 | 81.4 | 0.8 | 0 | 2 | 0 | 0 | 0 | 2 | 2 | 1 | 1.59 | 3 | 73.2 | 1 | 3 | 67.6 | 55 | 0.756 | 3.2e+04 | 0.0961 | 104 | |
Moldova (the Republic of) | 498 | 33 | 0.846 | 1.26 | 0.698 | 11.2 | 5.44 | 0 | 76.1 | 4 | 0 | 1 | 1 | 1 | 2 | 1 | 1 | 4.1 | 1 | 43.7 | 1 | 3 | 46 | 28 | 7.67 | 0.526 | 6.75e+03 | 0.0974 | 97.7 | |
Netherlands (the) | 528 | 82 | 0.93 | 1.59 | 0.449 | 11.8 | 5.18 | 38.7 | 94.7 | 1.7 | 0 | 2 | 1 | 1 | 1 | 2 | 2 | 1 | 0.586 | 3 | 80.3 | 1 | 3 | 61.7 | 122 | 10 | 0.876 | 4.75e+04 | 0.0699 | 895 |
New Zealand | 554 | 87 | 0.897 | 1.71 | 0.341 | 11 | 6.28 | 4.25 | 90.8 | 4.4 | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 0.744 | 3 | 81.1 | 1 | 3 | 45.3 | 162 | 10 | 0.892 | 3.53e+04 | 0.119 | 461 |
Norway | 578 | 84 | 0.934 | 1.56 | 0.83 | 12.7 | 7.91 | 0.105 | 96.5 | 1.1 | 0 | 5 | 1 | 1 | 1 | 2 | 2 | 1 | 0.468 | 3 | 77.8 | 1 | 3 | 55.8 | 119 | 10 | 0.889 | 8.46e+04 | 0.109 | 532 |
Panama | 591 | 37 | 0.901 | 2.46 | 1.28 | 10.1 | 6.92 | 57.9 | 2 | 2 | 1 | 0 | 0 | 96 | 2 | 0 | 9.39 | 1 | 71.4 | 1 | 3 | 52.3 | 28 | 9.33 | 0.756 | 2.26e+04 | 0.197 | 174 | ||
Poland | 616 | 60 | 0.943 | 1.46 | 1.07 | 11.8 | 4.56 | 80.9 | 77.5 | 0.5 | 0 | 3 | 1 | 1 | 1 | 1 | 2 | 1 | 0.73 | 1 | 70.4 | 1 | 3 | 40.3 | 30 | 9.17 | 0.695 | 2.75e+04 | 0.147 | 481 |
Romania | 642 | 47 | 0.91 | 1.76 | 1.4 | 11.4 | 3.1 | 27.6 | 70.7 | 3.4 | 0 | 3 | 1 | 1 | 1 | 1 | 1 | 0 | 1.28 | 1 | 67.9 | 1 | 3 | 43.8 | 28 | 8.92 | 0.672 | 2.01e+04 | 0.137 | 228 |
Russian Federation (the) | 643 | 28 | 0.376 | 1.58 | 1.97 | 12.1 | 4.69 | 14.8 | 80.9 | 0.5 | 0 | 1 | 1 | 1 | 2 | 2 | 0 | 8.21 | 1 | 74.4 | 0 | 2 | 43.7 | 20 | 3.92 | 0.27 | 2.47e+04 | 0.215 | 503 | |
Saudi Arabia | 682 | 49 | 0.074 | 2.32 | 1.8 | 10.1 | 0 | 93.3 | 0 | 1 | 0 | 1 | 1 | 2 | 2 | 0 | 1.27 | 3 | 57.5 | 0 | 0 | 0 | 93 | 0 | 0.016 | 5.03e+04 | 0.209 | 274 | ||
Serbia | 688 | 39 | 0.725 | 1.49 | 0.992 | 11.7 | 3.59 | 72.4 | 73.4 | 2 | 0 | 1 | 1 | 1 | 1 | 1 | 0 | 1.23 | 1 | 67.4 | 1 | 3 | 53.5 | 13 | 7.83 | 0.348 | 1.41e+04 | 0.108 | 180 | |
Singapore | 702 | 85 | 0.466 | 1.14 | 1.69 | 12.8 | 1.2 | 88.2 | 1.4 | 5 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 0.156 | 3 | 77 | 0 | 2 | 40.6 | 54 | 4.5 | 0.387 | 6.84e+04 | 0.142 | 494 | |
Slovakia | 703 | 50 | 0.945 | 1.54 | 0.576 | 12 | 3.94 | 12.5 | 80.7 | 0.8 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 1.14 | 1 | 72.5 | 1 | 3 | 44.8 | 26 | 9.58 | 0.815 | 2.71e+04 | 0.0785 | 242 | |
Slovenia | 705 | 60 | 0.953 | 1.61 | 0.699 | 12.2 | 4.78 | 29.6 | 79.7 | 0.5 | 0 | 1 | 1 | 1 | 1 | 1 | 1 | 0.481 | 1 | 75.1 | 1 | 3 | 39.2 | 28 | 10 | 0.838 | 2.92e+04 | 0.0803 | 255 | |
South Africa | 710 | 43 | 0.772 | 2.4 | 0.391 | 10.2 | 6.16 | 92.7 | 56.2 | 19.3 | 5 | 1 | 1 | 1 | 1 | 2 | 2 | 0 | 35.9 | 1 | 60 | 1 | 2 | 34.5 | 25 | 8.92 | 0.738 | 1.22e+04 | 0.193 | 392 |
Spain | 724 | 58 | 0.95 | 1.26 | 0.851 | 10.9 | 4.21 | 19 | 86.1 | 1.8 | 0 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 0.621 | 1 | 74.1 | 3 | 52 | 42 | 10 | 0.86 | 3.15e+04 | 0.125 | 776 | |
Sweden | 752 | 85 | 0.964 | 1.76 | 0.281 | 12 | 7.57 | 0.667 | 92.1 | 1.2 | 0 | 5 | 1 | 1 | 1 | 2 | 2 | 1 | 1.08 | 96 | 83 | 1 | 3 | 64.9 | 108 | 10 | 0.909 | 4.55e+04 | 0.094 | 779 |
Switzerland | 756 | 85 | 0.959 | 1.52 | 0.433 | 12.2 | 5.13 | 0 | 89.7 | 0.7 | 0 | 4 | 1 | 1 | 1 | 96 | 2 | 1 | 0.586 | 1 | 84.2 | 1 | 3 | 46 | 171 | 10 | 0.896 | 6.14e+04 | 0.11 | 868 |
Tajikistan | 762 | 25 | 0.087 | 3.59 | 0.723 | 10.8 | 5.23 | 1.53 | 22 | 0 | 1 | 1 | 1 | 2 | 2 | 0 | 3 | 42.1 | 0 | 2 | 46.1 | 28 | 2.17 | 0.17 | 4.44e+03 | 0.149 | 38 | |||
Tonga | 776 | 3.56 | 11.3 | 6 | 5 | 1 | 0 | 0 | 0 | 2 | 2 | 3 | 49.3 | 0 | 3 | 18.3 | 49 | |||||||||||||
Trinidad and Tobago | 780 | 41 | 0.816 | 1.73 | 0.605 | 11.2 | 0 | 77.3 | 5 | 1 | 0 | 1 | 1 | 2 | 2 | 0 | 30.6 | 3 | 68.7 | 0 | 3 | 54 | 57 | 9.17 | 0.752 | 2.85e+04 | 0.197 | 88.3 | ||
United Arab Emirates (the) | 784 | 70 | 0.123 | 1.41 | 0.933 | 11.7 | 0 | 98.5 | 5 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 0.464 | 3 | 82.8 | 0 | 0 | 0 | 48 | 0.917 | 0.095 | 7.64e+04 | 0.158 | 171 | ||
Ukraine | 804 | 32 | 0.448 | 1.3 | 1.45 | 11.6 | 5.41 | 34.6 | 58.9 | 1.6 | 0 | 1 | 1 | 1 | 2 | 2 | 1 | 6.18 | 1 | 66.6 | 1 | 3 | 40.8 | 28 | 6.42 | 0.405 | 9.81e+03 | 0.0978 | 229 | |
United Kingdom of Great Britain and Northern Ireland (the) | 826 | 80 | 0.926 | 1.68 | 0.432 | 12.9 | 5.44 | 22.8 | 94.9 | 1.3 | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 1.2 | 3 | 77.7 | 0 | 3 | 48.8 | 134 | 9.5 | 0.874 | 3.81e+04 | 0.13 | 1.29e+03 |
United States of America (the) | 840 | 71 | 0.91 | 1.73 | 0.833 | 12.8 | 34.2 | 87.3 | 0.8 | 0 | 1 | 1 | 1 | 1 | 2 | 2 | 1 | 4.96 | 3 | 72.6 | 0 | 3 | 70 | 219 | 9.08 | 0.831 | 5.53e+04 | 0.19 | 2.09e+03 | |
Venezuela (Bolivarian Republic of) | 862 | 18 | 0.259 | 2.27 | 2.72 | 10.2 | 0 | 72 | 2 | 2 | 0 | 0 | 0 | 1 | 2 | 36.7 | 1 | 64.7 | 1 | 1 | 29.4 | 14 | 2.17 | 0.215 | 1.07e+04 | 0.197 | 193 |
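A small sketch of filter() on a toy data frame. The data frame `toy`, its country labels, and its values are invented stand-ins for the course data, and the cutoff of 10 is only an example condition. Rows where the check is NA are dropped along with the rows where it is FALSE.

```r
library(dplyr)

# Toy stand-in for the course data; the countries and values are invented
toy <- tibble(
  cname    = c("Country A", "Country B", "Country C", "Country D"),
  bl_asymf = c(11.0, 7.7, NA, 12.5)
)

# Keep rows where average schooling is above 10 years.
# Inside filter() we write bl_asymf, not toy$bl_asymf.
high_edu <- filter(toy, bl_asymf > 10)

high_edu$cname  # "Country A" "Country D"
```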
What if we want to check whether our rows meet multiple conditions? Then we need logical operators.
! means not (e.g. !TRUE == FALSE)
& means and
| means or (shift + backslash)
& returns TRUE if both values are TRUE
| returns TRUE if at least one value is TRUE
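To make the rules concrete, here is a short sketch using hand-written TRUE/FALSE vectors. The names `long_school` and `low_spend` are hypothetical and only stand in for the results of two logical checks.

```r
!TRUE         # FALSE
TRUE & FALSE  # FALSE: both sides must be TRUE
TRUE | FALSE  # TRUE: one TRUE side is enough

# The operators work element by element on vectors
long_school <- c(TRUE, TRUE, FALSE, FALSE)
low_spend   <- c(TRUE, FALSE, TRUE, FALSE)

both   <- long_school & low_spend  # TRUE FALSE FALSE FALSE
either <- long_school | low_spend  # TRUE TRUE TRUE FALSE
```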
We can then combine logical checks together.
Let's collect countries with more than 10 years of average education that spend less than 5% of their GDP on education:
[1] "Albania" "Armenia"
[3] "Bulgaria" "Sri Lanka"
[5] "Croatia" "Czechia"
[7] "Estonia" "Germany"
[9] "Hungary" "Ireland"
[11] "Italy" "Japan"
[13] "Kazakhstan" "Jordan"
[15] "Korea (the Republic of)" "Latvia"
[17] "Lithuania" "Luxembourg"
[19] "Malaysia" "Malta"
[21] "Poland" "Romania"
[23] "Russian Federation (the)" "Serbia"
[25] "Slovakia" "Slovenia"
[27] "Spain"
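A sketch of how two checks can be combined inside one filter() call. The data frame `toy` and its values are invented for illustration; only the variable names come from the course data. A row is kept only when both checks are TRUE, so a row with a missing value in either variable drops out.

```r
library(dplyr)

# Invented values; only the column names match the course data
toy <- tibble(
  cname      = c("Country A", "Country B", "Country C", "Country D"),
  bl_asymf   = c(11.0, 7.7, 12.0, 12.5),
  wdi_expedu = c(3.6, 4.1, NA, 5.1)
)

# More than 10 years of schooling AND less than 5% of GDP on education
selected <- filter(toy, bl_asymf > 10 & wdi_expedu < 5)

selected$cname  # "Country A"
```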
Create two new datasets.
[1] "Algeria"
[2] "Bangladesh"
[3] "Myanmar"
[4] "Cameroon"
[5] "Congo (the)"
[6] "Congo (the Democratic Republic of the)"
[7] "Benin"
[8] "El Salvador"
[9] "Guatemala"
[10] "Haiti"
[11] "Honduras"
[12] "India"
[13] "Iraq"
[14] "Kenya"
[15] "Kuwait"
[16] "Lao People's Democratic Republic (the)"
[17] "Lesotho"
[18] "Malawi"
[19] "Maldives"
[20] "Mauritania"
[21] "Morocco"
[22] "Namibia"
[23] "Nepal"
[24] "Nicaragua"
[25] "Pakistan"
[26] "Rwanda"
[27] "Viet Nam"
[28] "Eswatini"
[29] "Syrian Arab Republic (the)"
[30] "Togo"
[31] "Turkey"
[32] "Uganda"
[33] "Egypt"
[34] "Tanzania, the United Republic of"
[35] "Zambia"
Tidyverse syntax makes use of pipes to chain multiple functions together.
You put a pipe (%>%) in between each step.
For example (in pseudo-code):
Output <- Step 1(Input) %>% Step 2() %>% Step 3()
Translation: Take the Input, apply Step 1 to it, then take the output of Step 1 and apply Step 2 to it, then take the output of Step 2 and apply Step 3 to it, and finally store the output of Step 3 as Output.
[1] "Albania" "Armenia"
[3] "Bulgaria" "Sri Lanka"
[5] "Croatia" "Czechia"
[7] "Estonia" "Germany"
[9] "Hungary" "Ireland"
[11] "Italy" "Japan"
[13] "Kazakhstan" "Jordan"
[15] "Korea (the Republic of)" "Latvia"
[17] "Lithuania" "Luxembourg"
[19] "Malaysia" "Malta"
[21] "Poland" "Romania"
[23] "Russian Federation (the)" "Serbia"
[25] "Slovakia" "Slovenia"
[27] "Spain"
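The pseudo-code can be turned into a runnable sketch like the one here. The data frame `toy` and its values are invented, so the result differs from the country list in the slides. The nested version and the piped version do the same work; the piped one just reads top to bottom.

```r
library(dplyr)

# Invented values; only the column names match the course data
toy <- tibble(
  cname      = c("Country A", "Country B", "Country C", "Country D"),
  bl_asymf   = c(11.0, 7.7, 12.0, 12.5),
  wdi_expedu = c(3.6, 4.1, NA, 5.1)
)

# Without pipes: read from the inside out
nested <- pull(filter(toy, bl_asymf > 10 & wdi_expedu < 5), cname)

# With pipes: take toy, then filter it, then pull out one column
piped <- toy %>%
  filter(bl_asymf > 10 & wdi_expedu < 5) %>%
  pull(cname)

identical(nested, piped)  # TRUE
```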
What does the pull() function do? It pulls out a column from your data.
[1] "Albania" "Armenia"
[3] "Bulgaria" "Sri Lanka"
[5] "Croatia" "Czechia"
[7] "Estonia" "Germany"
[9] "Hungary" "Ireland"
[11] "Italy" "Japan"
[13] "Kazakhstan" "Jordan"
[15] "Korea (the Republic of)" "Latvia"
[17] "Lithuania" "Luxembourg"
[19] "Malaysia" "Malta"
[21] "Poland" "Romania"
[23] "Russian Federation (the)" "Serbia"
[25] "Slovakia" "Slovenia"
[27] "Spain"
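A quick sketch of the difference between pull() and select(), using an invented data frame `toy`. select() is not covered on this slide and is used here only as a contrast: it keeps a data frame, while pull() hands back the plain vector, the same as using `$`.

```r
library(dplyr)

# Invented values for illustration
toy <- tibble(
  cname    = c("Country A", "Country B", "Country C"),
  bl_asymf = c(11.0, 7.7, 12.0)
)

countries_tbl <- toy %>% select(cname)  # still a data frame with one column
countries_vec <- toy %>% pull(cname)    # a plain character vector

is.data.frame(countries_tbl)            # TRUE
identical(countries_vec, toy$cname)     # TRUE
```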
%>% has been around for a while in the tidyverse.
Base R now has its own pipe (added in R 4.1), so you will sometimes see |> instead.
For what we do here, %>% is the same as |>.
Yes, this is all kind of silly and strange.
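A small sketch showing both pipes giving the same answer on an invented data frame `toy`. It assumes R 4.1 or newer, since older versions do not understand |>.

```r
library(dplyr)

# Invented values for illustration
toy <- tibble(bl_asymf = c(11.0, 7.7, 12.0, 12.5))

# The tidyverse (magrittr) pipe
with_magrittr <- toy %>% filter(bl_asymf > 10) %>% nrow()

# The base R pipe (needs R >= 4.1)
with_native <- toy |> filter(bl_asymf > 10) |> nrow()

identical(with_magrittr, with_native)  # TRUE
```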
One of the most useful tidyverse functions is summarize().
summarize() transforms data by applying one or more functions to columns in the data.
What if we want to figure out the mean of average education for all countries in our data?
mean(bl_asymf, na.rm = TRUE) |
---|
9.11 |
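A minimal sketch of summarize() with mean(), on an invented data frame `toy` (so the number differs from the one in the slides). The `na.rm = TRUE` part matters: without it, a single missing value makes the whole mean NA.

```r
library(dplyr)

# Invented values; one country has no schooling data
toy <- tibble(bl_asymf = c(11.0, 7.0, NA, 12.0))

# One row, one column: the mean of the non-missing values
overall <- toy %>%
  summarize(mean(bl_asymf, na.rm = TRUE))

overall[[1]]         # 10

# Without na.rm = TRUE the missing value takes over
mean(toy$bl_asymf)   # NA
```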
What if we want to calculate other statistics?
You generally want to use functions that only return 1 value. Why?
What if we want to figure out the average education for countries that spend less than 5% of their GDP on education?
mean(bl_asymf, na.rm = T) |
---|
8.81 |
We can improve the output by changing the column name: summarize(col_name = mean(variable))
Note
You can spread a pipeline over multiple lines; it is common to put the pipe at the end of each line and indent the next line.
There is also a function specifically for the number of observations: n()
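Putting these pieces together, here is a sketch with named summary columns, n(), and one pipe per line. The data frame `toy`, its values, and the column names `mean_edu` and `n` are assumptions chosen for illustration.

```r
library(dplyr)

# Invented values; only the column names match the course data
toy <- tibble(
  bl_asymf   = c(11.0, 7.0, 9.0, 12.0),
  wdi_expedu = c(3.6, 4.1, 6.2, 5.1)
)

low_spenders <- toy %>%
  filter(wdi_expedu < 5) %>%                   # keep the low spenders
  summarize(
    mean_edu = mean(bl_asymf, na.rm = TRUE),   # named column instead of "mean(...)"
    n        = n()                             # number of rows that went into the summary
  )

low_spenders$mean_edu  # 9
low_spenders$n         # 2
```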
Find the mean and median of average education and of education expenditure for countries with a GDP per capita (mad_gdppc) of more than 10,000.
Often we want to provide summaries of groups within the data. For example: how does GDP vary by election type? br_pvote is an indicator for having proportional representation.
Here we’ll use the group_by() function to create groups of our data.
group_by() alone
group_by() expects variable(s) that you want to use to group your dataset:
# A tibble: 194 × 31
# Groups: br_pvote [3]
cname ccode ti_cpi vdem_academ wdi_fertility wdi_afp bl_asymf wdi_expedu
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Afghanist… 4 16 0.560 4.47 2.64 4.83 4.06
2 Albania 8 36 0.876 1.62 0.643 11.0 3.61
3 Algeria 12 35 0.338 3.02 2.52 7.71 NA
4 Andorra 20 NA NA NA NA NA 3.25
5 Angola 24 19 0.440 5.52 0.921 NA NA
6 Antigua a… 28 NA NA 1.99 NA NA NA
7 Azerbaijan 31 25 0.0770 1.73 1.61 NA 2.46
8 Argentina 32 40 0.935 2.26 0.512 10.2 5.46
9 Australia 36 77 0.847 1.74 0.438 12.5 5.12
10 Austria 40 76 0.973 1.47 0.497 10.8 5.36
# ℹ 184 more rows
# ℹ 23 more variables: wdi_elprodcoal <dbl>, wef_iu <dbl>, wdi_foodins <dbl>,
# ht_colonial <dbl>, lp_legor <dbl>, cai_foetal <dbl>, cai_mental <dbl>,
# cai_physical <dbl>, ccp_initiat <dbl>, ccp_market <dbl>, h_j <dbl>,
# wdi_homicides <dbl>, ccp_strike <dbl>, wdi_lfpr <dbl>, br_pvote <dbl>,
# br_elect <dbl>, van_part <dbl>, bmr_demdur <dbl>, fh_polity2 <dbl>,
# vdem_polyarchy <dbl>, mad_gdppc <dbl>, top_top1_income_share <dbl>, …
The only change is the addition of # Groups: br_pvote [3] (the grouping variable and the number of groups).
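A sketch that checks this on an invented data frame `toy`: grouping leaves the rows untouched and only records how they are grouped. group_vars() and n_groups() are dplyr helpers used here to inspect that record. Missing values count as their own group.

```r
library(dplyr)

# Invented values; one country is missing the indicator
toy <- tibble(
  cname    = c("Country A", "Country B", "Country C", "Country D"),
  br_pvote = c(0, 1, 1, NA)
)

grouped <- toy %>% group_by(br_pvote)

nrow(grouped) == nrow(toy)  # TRUE: no rows added or removed
group_vars(grouped)         # "br_pvote"
n_groups(grouped)           # 3: the groups are 0, 1, and NA
```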
Let's chain together group_by() and summarize():
br_pvote | mean | n |
---|---|---|
0 | 1.79e+04 | 95 |
1 | 1.97e+04 | 93 |
NA | 1.39e+04 | 6 |
What is ugly about this?
is.na() checks if something is missing or not.
br_pvote | mean | n |
---|---|---|
0 | 1.79e+04 | 95 |
1 | 1.97e+04 | 93 |
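A sketch of the before and after on an invented data frame `toy`. Without the filter, the countries with a missing br_pvote form a third group; adding filter(!is.na(br_pvote)) before grouping removes it.

```r
library(dplyr)

# Invented values; the last country is missing br_pvote
toy <- tibble(
  br_pvote  = c(0, 0, 1, 1, NA),
  mad_gdppc = c(10000, 20000, 15000, 25000, 12000)
)

# Grouped summary with the NA group included
with_na <- toy %>%
  group_by(br_pvote) %>%
  summarize(mean = mean(mad_gdppc, na.rm = TRUE), n = n())

# Same summary after dropping rows where br_pvote is missing
without_na <- toy %>%
  filter(!is.na(br_pvote)) %>%
  group_by(br_pvote) %>%
  summarize(mean = mean(mad_gdppc, na.rm = TRUE), n = n())

nrow(with_na)     # 3
nrow(without_na)  # 2
```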
Tip
The drop_na() tidyverse function can replace filter(!is.na()).
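A short sketch comparing the two approaches on an invented data frame `toy`. drop_na() lives in the tidyr package, which is loaded explicitly here. Naming a column limits the check to that column; calling drop_na() with no columns drops rows with a missing value anywhere.

```r
library(dplyr)
library(tidyr)

# Invented values; one missing br_pvote and one missing mad_gdppc
toy <- tibble(
  br_pvote  = c(0, 1, NA, 1),
  mad_gdppc = c(10000, NA, 12000, 25000)
)

via_filter  <- toy %>% filter(!is.na(br_pvote))
via_drop_na <- toy %>% drop_na(br_pvote)

# Both keep the same three rows
all.equal(via_filter, via_drop_na)

# With no columns named, any missing value drops the row
complete_rows <- toy %>% drop_na()
nrow(complete_rows)  # 2
```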
There are several variables that can be used to group countries. Pick one of them, pick an interval variable that you think might vary by the group, and then calculate the number of observations, mean, and median for each group.
There is a description of all the variables I’ve included here.
br_pvote
van_part
Data that I am using
Filtering out observations that are missing a value for br_pvote
Grouping the data frame by br_pvote
Summarizing (number of observations, mean of van_part, median of van_part)
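The four steps listed above could look like the sketch here. This is one way to write it, not necessarily the exact code from the slide; the data frame `toy`, its values, and the summary column names are assumptions.

```r
library(dplyr)

# Invented values; only the column names match the course data
toy <- tibble(
  br_pvote = c(0, 0, 1, 1, 1, NA),
  van_part = c(30, 50, 40, 60, 80, 45)
)

summary_tbl <- toy %>%                       # data that I am using
  filter(!is.na(br_pvote)) %>%               # drop rows missing br_pvote
  group_by(br_pvote) %>%                     # one group per value of br_pvote
  summarize(
    n      = n(),                            # number of observations
    mean   = mean(van_part, na.rm = TRUE),   # mean of van_part
    median = median(van_part, na.rm = TRUE)  # median of van_part
  )

summary_tbl
```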
There are two ways to save our summary results. Both can be helpful depending on what you are doing:
write_csv(): Writes to a CSV file.
We are going to use: gt
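For the CSV route, a minimal sketch with write_csv() from the readr package. The summary values are invented, and the file goes to a temporary location as a placeholder; in practice you would give your own file name.

```r
library(readr)

# Invented summary results for illustration
summary_tbl <- data.frame(
  br_pvote = c(0, 1),
  mean     = c(40, 60)
)

# Placeholder path; replace with something like "my_summary.csv"
out_file <- tempfile(fileext = ".csv")

write_csv(summary_tbl, out_file)

# Read it back in to check that it was saved
read_back <- read_csv(out_file, show_col_types = FALSE)
read_back
```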
The gt package is a lot, so we are not going to get to all of it, but it lets you do a lot of things:
The function gt() will create a table object that we can then modify. Let's see what happens when we make a table.
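A minimal sketch of making a table with gt(). The summary values in `summary_tbl` are invented stand-ins for real summarize() output.

```r
library(dplyr)
library(gt)

# Invented summary results for illustration
summary_tbl <- tibble(
  br_pvote = c(0, 1),
  n        = c(2L, 3L),
  mean     = c(40, 60),
  median   = c(40, 60)
)

# Turn the data frame into a gt table object
tbl <- summary_tbl %>% gt()

# Printing tbl (or knitting the document) renders the table
class(tbl)
```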
We can then modify the style by piping it into functions like opt_stylize() and cols_align().
We can also easily change the labeling of our columns by giving the actual column name and what we want it to be called.
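A sketch of styling, aligning, and relabeling in one pipeline. I am assuming cols_label() is the relabeling function meant here, since the slide text does not name it; the style number, color, labels, and summary values are arbitrary choices for illustration.

```r
library(dplyr)
library(gt)

# Invented summary results for illustration
summary_tbl <- tibble(
  br_pvote = c(0, 1),
  n        = c(2L, 3L),
  mean     = c(40, 60),
  median   = c(40, 60)
)

styled <- summary_tbl %>%
  gt() %>%
  opt_stylize(style = 1, color = "blue") %>%              # one of gt's preset looks
  cols_align(align = "center", columns = everything()) %>% # center every column
  cols_label(                                             # actual name = displayed name
    br_pvote = "Proportional representation",
    n        = "Countries",
    mean     = "Mean",
    median   = "Median"
  )

class(styled)
```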
We can also modify values using the text_case_match() function. This one will check if a cell matches what is on the left side of the tilde (~) and replace it with the right side.
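A sketch of text_case_match() that swaps the 0 and 1 codes for words. The replacement labels and the summary values are invented. I am assuming the cells show up as the text "0" and "1" when the table is rendered, and I restrict the match to the br_pvote column with `.locations` so that other columns are left alone.

```r
library(dplyr)
library(gt)

# Invented summary results for illustration
summary_tbl <- tibble(
  br_pvote = c(0, 1),
  n        = c(2L, 3L),
  mean     = c(40, 60)
)

relabelled <- summary_tbl %>%
  gt() %>%
  text_case_match(
    "0" ~ "Other systems",                 # left of ~ is matched, right replaces it
    "1" ~ "Proportional representation",
    .locations = cells_body(columns = br_pvote)
  )

class(relabelled)
```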
Finally, tab_header() and tab_source_note() can be used to add other information about your table:
And now we export it with gtsave().
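A sketch of adding a header and source note and then saving the table. The title, note text, and summary values are invented, and the table is saved as HTML to a temporary file as a placeholder. Other file types such as .png or .pdf may need extra software installed, so HTML is the safest one to try first.

```r
library(dplyr)
library(gt)

# Invented summary results for illustration
summary_tbl <- tibble(
  br_pvote = c(0, 1),
  n        = c(2L, 3L),
  mean     = c(40, 60)
)

final_tbl <- summary_tbl %>%
  gt() %>%
  tab_header(
    title    = "van_part by br_pvote",
    subtitle = "Toy values for illustration"
  ) %>%
  tab_source_note(source_note = "Source: invented numbers, not the QoG data")

# Placeholder path; replace with something like "my_table.html"
out_file <- tempfile(fileext = ".html")
gtsave(final_tbl, filename = out_file)

file.exists(out_file)
```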
There are a lot of options to modify your table here.
Try to see if you can change the mean and median to be listed in percentages and then make the number of observations bold.
Using my table from before: