breakout = function(n_fellows, n_facils = 5){
  # Takes in the number of fellows and returns a possible
  # permutation based on the number of facilitators
  gpSize = n_fellows %/% n_facils
  remainder = n_fellows %% n_facils

  # Determine distribution of groups
  gpDist = rep(gpSize, n_facils)
  if (remainder > 0){gpDist[1:remainder] = gpDist[1:remainder] + 1}

  group = NULL
  for (i in 1:length(gpDist)){
    group = c(group, rep(LETTERS[i], gpDist[i]))
  }

  dist = cbind.data.frame(id = sample(1:n_fellows, size = n_fellows),
                          group)
  return(dist)
}
Recently, I had the opportunity to join a fellowship with 27 other fellows. As part of the fellowship, we would meet for regular discussions once a week with the guidance of 5 facilitators. The program has been immensely rewarding so far - but in this post I will discuss a probability problem that I’ve been pondering over since the first meeting.
The problem is as follows: Every week for 10 weeks, all 28 fellows (assuming no absentees, but this may be explored later) and 5 facilitators (barring absentees again) meet for a discussion. Each meeting has at least one breakout session, where the 28 fellows are separated into as many groups as there are facilitators - so usually 5 - to discuss the assigned readings. The allocation of fellows is independent across breakout sessions. The organisers of the fellowship were concerned about whether there were enough opportunities for everyone to get to know each other during the breakout sessions.
We can address that question from two different angles. We can fix the total number of breakout rounds and observe the distribution of the simulated interactions: how many times do the same two people interact, and how many pairs never meet? The alternative is to keep simulating until everyone has interacted with every other person at least once, and observe the number of rounds required.
I’m sure there are solutions based on permutations and combinatorics that can derive exact answers to the questions above. My personal take is that not only would such solutions be complicated and tedious, the results may also not be very useful. Understanding the probability distribution of the answers is far more interesting and interpretable, since it accounts for the random variation that occurs in practice. To that end, I present my solution below with supplementary code in R.
Defining the breakout function
First, I defined a function that takes in the number of fellows n_fellows and the number of facilitators n_facils, and outputs a random assignment of the fellows into a number of groups equal to n_facils. In other words, it randomly assigns a group (A, B, C, D, E, …) to each fellow. We will rely on this function to simulate the answer to our questions.
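As a quick illustration (a minimal example of my own, not part of the original analysis), a single call shows the kind of allocation breakout produces for 28 fellows and 5 facilitators:

# Example: one random allocation of 28 fellows into 5 groups
set.seed(1)            # only for a reproducible illustration
dist = breakout(28)
head(dist)             # each fellow id is paired with a group label
table(dist$group)      # group sizes: three groups of 6 and two groups of 5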
Simulating interactions over a fixed number of breakout rounds
The first angle focuses on the number of interactions between fellows after a fixed number of breakout rounds. To do this we first specify the essential parameters of the model. Here we assume n_facils is fixed at 5.
n_fellows = 28   # Number of fellows
N = 1000         # Number of simulations
rounds = 20      # Number of breakout rounds
To track the total number of interactions between two particular fellows, we can store the results of each breakout round in a matrix met, which is initialised with the number of rows and columns both equal to n_fellows. The simulation then proceeds as follows: at the start of every simulation run, we set all interaction counts to 0, i.e. met is a zero matrix. The breakout rounds are then simulated, and in each round the entries for every pair of fellows in the same group are incremented by 1. At the end of the 20 breakout rounds, met contains the number of interactions between each pair of fellows.
library(dplyr)   # for %>%, filter() and pull()

met_null = rep(NA, N)
met_mean = rep(NA, N)

for (n in 1:N){
  met = matrix(0L, nrow = n_fellows, ncol = n_fellows)
  for (round in 1:rounds){
    dist = breakout(n_fellows)

    # Check to see if fellows met; if they met, their
    # corresponding entry in the met matrix increases by 1
    for (gp in dist$group %>% unique){
      fellows = dist %>% filter(group == gp) %>% pull(id)
      for (i in fellows){
        for (j in fellows){
          met[i,j] = met[i,j] + 1
        }
      }
    }
  }
  met_mean[n] = ((rowSums(met) - rounds)/n_fellows) %>% mean
  met_null[n] = sum(met[upper.tri(met, diag = FALSE)] == 0)
}
We can draw quite interesting inferences from the met matrices generated by the 1000 simulations. In the code above, met_null records, for each simulation, the number of interactions that never happened, i.e. pairs of fellows who never met. This is computed by taking the upper triangle of met and counting the number of 0's. met_mean records the average number of times any two fellows have had an interaction.
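As a small aside on the upper-triangle counting used for met_null, here is a made-up 4-fellow interaction matrix (purely illustrative, not simulation output); counting zeros above the diagonal counts each unmet pair exactly once:

# Hypothetical 4-fellow interaction matrix after a few rounds
toy = matrix(c(2, 1, 0, 3,
               1, 2, 2, 0,
               0, 2, 2, 1,
               3, 0, 1, 2), nrow = 4, byrow = TRUE)
sum(toy[upper.tri(toy, diag = FALSE)] == 0)   # 2: pairs (1,3) and (2,4) never met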
met_mean is constant across the simulations and equal to 3.32; that is, on average, each fellow meets every other fellow about 3.32 times. At first this constancy surprised me, but it follows from the design: with 28 fellows and 5 facilitators, the group sizes are fixed at three groups of 6 and two groups of 5 in every round, so the total number of pairwise meetings per round is always the same, and averaging over all fellows removes the remaining randomness.
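As a rough sanity check - a small sketch that assumes those fixed group sizes and reuses rounds and n_fellows from above - the constant can be reproduced directly:

# With group sizes 6,6,6,5,5 every round, each group of size s contributes
# s*(s-1) ordered meetings, so the per-round total is fixed
gp_sizes = c(6, 6, 6, 5, 5)
meetings_per_round = sum(gp_sizes * (gp_sizes - 1))      # 130
meetings_per_round * rounds / (n_fellows * n_fellows)    # 3.316..., i.e. the 3.32 above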
The histogram of the number of missed interactions, met_null, is plotted above. It seems that over 20 breakout rounds there are on average about 10 missed interactions - meaning about 10 pairs of fellows never meet. This is quite interesting to note: out of a possible 378 interactions (taking 27+26+25+…+1), only around 2.6% are missed. It might reflect a high level of success in helping fellows establish interactions with one another.
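For reference, here is a minimal sketch of how such a histogram and the summary figures might be produced - the plotting parameters are illustrative, not the original figure:

hist(met_null,
     breaks = 20,
     main = "Missed interactions after 20 breakout rounds",
     xlab = "Number of pairs of fellows who never met")
mean(met_null)          # around 10 missed pairs on average
mean(met_null) / 378    # around 2.6% of the 378 possible pairs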
Simulating the number of breakout rounds needed for every fellow to meet
Now instead of fixing the number of breakout rounds at 20, let's consider the number of breakout rounds it takes for every fellow to interact with every other fellow at least once. This requires every simulation to run a 'while' loop that keeps simulating breakout rounds (in what might be the worst Zoom nightmare) until every fellow has met every other fellow at least once. The code is attached below.
As with the previous simulation, the met matrix stores the interactions between each of the 28 fellows. The difference here is that it only stores indicator (0/1) values recording whether or not a pair of fellows have met, since we are not interested in the number of times they interacted. As before, when fellows are split into breakout groups, the matrix entries corresponding to themselves and all of their group mates are set to 1, indicating that they have interacted. The breakout rounds continue inside the 'while' loop, which runs until the sum of all entries in met reaches n_fellows^2 - the value it takes once every fellow has met every other fellow at least once.
counts = rep(NA, N)

for (iter in 1:N){
  # initialise an n x n zero matrix to store 'met'
  met = matrix(0L, nrow = n_fellows, ncol = n_fellows)

  # randomly allocate groups each time
  counter = 0
  # loop until everyone has met each other at least once
  while (sum(met) < n_fellows^2){
    dist = breakout(n_fellows)

    # Check to see if fellows met; if they met, their
    # corresponding entry in the met matrix is set to 1
    for (gp in dist$group %>% unique){
      fellows = dist %>% filter(group == gp) %>% pull(id)
      for (i in fellows){
        for (j in fellows){
          met[i,j] = 1
        }
      }
    }
    counter = counter + 1
  }
  counts[iter] = counter
}
counts contains the number of breakout rounds required, in each simulation, for every fellow to meet every other fellow at least once. The distribution of counts is plotted below.
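A minimal sketch of how that distribution might be summarised and plotted (again, the plotting parameters are illustrative rather than the original figure):

summary(counts)   # ranges roughly from 22 to 71, with a mean around 35
hist(counts,
     breaks = 25,
     main = "Rounds needed for every pair of fellows to meet",
     xlab = "Number of breakout rounds")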
We see that the number of breakout rounds needed for all fellows to meet has a wide range, from 22 to 71, with the mean around 35. In other words, it would take an average of 3.5 breakout sessions every week over the 10 weeks for every pair of fellows to meet at least once.
Putting the two simulations together, I conclude that although it takes on average 35 breakout sessions for all fellows to meet at least once, 20 breakout sessions already achieve a coverage of close to 97%. Pursuing the goal of every fellow meeting every other fellow at least once might not be worth the cost of roughly 15 additional breakout sessions. As a caveat, the limitations of the simulations are clear - the number of interactions tells us nothing about the quality of the interactions the fellows have, which is difficult to quantify in and of itself. At its core, the fellowship should prioritise the quality of interactions and discussions over the number of links created in the social network.
This was quite an entertaining and insightful exercise in simulation, and it's comforting to know that I can solve some complex problems with simulation instead of probability theory.