JSvg: Javascript Analysis and Vector Graphics

Monday, August 17, 2020

Modeling the Spread of COVID-19: Part 5 - Back on Campus

In previous posts, we've looked at the probabilistic math behind estimating the spread of infection, and also estimated the chances of acquiring COVID-19 under various conditions given daily exposure. In this post, we will finally look at how it all comes together, along with some initial results from more intricate modeling. We're not taking the approach many epidemiologists use, which involves choosing an R value (how many people, on average, one infected person spreads it to) and then propagating the numbers from there. Rather, as in prior posts, we're looking at the probabilities: how mask wearing and social distancing decrease the probability of acquiring COVID-19. We then start our model with the probabilities implied by the rate of infection in the community at large. Ultimately, for the many campuses which are opening shortly, the surrounding community infection rate is the most relevant factor in the success or failure of any mitigation strategy. If students are routinely mixing with the outside community, campus strategies for limiting the spread will be marginal at best.

Another factor that we will include in our model this time is the possibility of airborne transmission: poor ventilation that does not bring in fresh air, or that is directed improperly (for example, flowing from a potentially sick person toward healthy individuals), can spread the virus. This has been demonstrated previously with Legionnaires' disease and was a concern during the 2009 H1N1 pandemic. A recent paper from Morawska et al. (May 27, 2020) confirms the possibility of indoor transmission in this manner, stating:

"Multiple studies provide strong evidence for indoor airborne transmission of viruses, particularly in crowded, poorly ventilated environments (Coleman et al., 2018; Distasio and Trump, 1990; Knibbs et al., 2012; Li et al., 2005; Moser et al., 1979; Nishiura et al., 2020)."

In particular, recirculating air is to be avoided. Considering that many places are simply "updating their filters" and turning the air up higher, it's not clear that this type of mitigation will be effective. Furthermore, most universities have relatively haphazard HVAC systems. Some are brand new and state of the art, and others, not so much. It's pretty much a crapshoot which building and which classroom you end up in. Some classrooms, such as organic chem labs, have fume hoods with suction. One would presume that such areas are safer than the generic classroom.

Our model today treats ventilation as a variable. If ventilation is correctly implemented (adequate), then social distancing is considered to be effective. If, on the other hand, air is being recirculated, you can be far away from someone who is sick, but for all you know, you're being hit with virus particles from ANOTHER classroom when the vent blows on you. Many HVAC systems have vents that blow air downward rather than drawing it upward. (But this is the subject of another post later.) So if you're in one of those improperly ventilated rooms, we model social distancing as ineffective, and all you have is your mask. It is likely that the truth lies somewhere in between.
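The ventilation rule above can be sketched as a per-contact infection probability. This is a hypothetical sketch, not the author's code: it assumes the odds ratios pMask and pDist from the configuration file are applied as simple multiplicative reductions of pSpread, that exposures are independent, and the function names are made up for illustration.

```javascript
// Hypothetical sketch (not the author's code): per-contact infection
// probability for one class meeting. Assumes the odds ratios from the
// configuration file act as simple multiplicative reductions of pSpread.
const pSpread = 0.022; // risk of infection if exposed
const pMask = 0.15;    // mask-wearing reduction factor (OR)
const pDist = 0.18;    // 1 m distancing reduction factor (OR)

// sdEffective: true if ventilation is adequate, so distancing still helps.
// With recirculating air, distancing is treated as useless; only masks count.
function contactProbability(sdEffective) {
  const distFactor = sdEffective ? pDist : 1.0;
  return pSpread * pMask * distFactor;
}

// Chance of catching it from at least one of k contagious classmates in one
// meeting, treating each exposure as independent.
function classDayProbability(k, sdEffective) {
  return 1 - Math.pow(1 - contactProbability(sdEffective), k);
}

console.log(contactProbability(true), contactProbability(false));
console.log(classDayProbability(3, true), classDayProbability(3, false));
```

Under these assumptions, poor ventilation makes each contact 1/0.18, or about 5.6, times riskier than the same contact in a well-ventilated room.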

This model is much more sophisticated than the prior modeling shown on this site. It accounts for social distancing and masking as before, but now also includes ventilation, when someone is contagious, and whether they are asymptomatic. In this model, we presume that someone is contagious between 2 and 20 days after acquiring the virus, and that if they are symptomatic, the symptoms will show up on day 5. In reality, contagion is not a step function (yes or no over a period). Rather, it is something like a Poisson distribution with a low lambda (as shown on Wikipedia), where you are most contagious at the start of it all. This sort of complexity may make it into the model later.
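Here is a small sketch contrasting the step-function contagion window with the Poisson-shaped alternative mentioned above. It is illustrative only: treating day 20 as the first non-contagious day, using lambda = 1, and the function names are all assumptions, not part of the author's model.

```javascript
// Contagion window from the configuration file.
const dayContagious = 2;     // first contagious day after infection
const dayNotContagious = 20; // day you stop being contagious (treated as exclusive here)

// Step function, as in the current model: contagious or not.
function isContagious(daysSinceInfection) {
  return daysSinceInfection >= dayContagious && daysSinceInfection < dayNotContagious;
}

// Poisson probability mass P(K = k) for rate lambda, built up iteratively.
function poissonPmf(k, lambda) {
  let p = Math.exp(-lambda);
  for (let i = 1; i <= k; i++) p *= lambda / i;
  return p;
}

// Hypothetical refinement: scale infectiousness by a low-lambda Poisson pmf in
// days since contagion began, so it is highest early and then tails off.
function infectiousness(daysSinceInfection, lambda = 1) {
  if (!isContagious(daysSinceInfection)) return 0;
  return poissonPmf(daysSinceInfection - dayContagious, lambda);
}

for (let d = 0; d <= 6; d++) console.log(d, isContagious(d), infectiousness(d).toFixed(3));
```

With lambda = 1 the weight is equal on the first two contagious days and then falls off quickly, which matches the "most contagious at the start" picture.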

Other assumptions of this model are that everyone enters class every class day with the possibility of community-based infection. When you go home, you re-enter the community cesspool, and you have the possibility of coming back infected based on the community infection rate. Likewise, we assume that if you are infected, you have a 40% chance of being asymptomatic (if you are a young person), and as such, you might just stay in class because you don't know you're sick. For those who are symptomatic, day 5 is when they are ejected from class because they have either been told to leave ("stop sneezing everywhere!") or they have realized that they are sick in some way and have chosen to do the right thing. While they would, in theory, just be out for 2-3 weeks, in my experience as an educator very few students have been able to catch up in physics after missing 2-3 weeks. Therefore, once they are out, they are OUT, and we no longer consider them in our probabilities. Also, unlike prior models, this one can keep track of class days and weekends. It's assumed that classes meet only Monday, Wednesday, and Friday, and that weekends are off.
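The scheduling and removal rules in this paragraph can be sketched as follows. This is an illustrative reconstruction, not the author's code: it assumes semester day 0 is a Monday, and the helper names (isClassDay, drawSymptomatic, stillInClass) and the student fields are hypothetical.

```javascript
const daysTillVisiblyContagious = 5; // from the configuration file
const sympRate = 0.60;               // 60% symptomatic, so 40% asymptomatic

// Assumes semester day 0 is a Monday, so day % 7 gives 0 = Mon ... 6 = Sun.
function isClassDay(day, dailyModel = false) {
  if (dailyModel) return true; // e.g. a dorm, where contact happens every day
  const weekday = day % 7;
  return weekday === 0 || weekday === 2 || weekday === 4; // Mon, Wed, Fri
}

// Decide once, at infection time, whether this student will show symptoms.
function drawSymptomatic(rng = Math.random) {
  return rng() < sympRate;
}

// A symptomatic student leaves class for good on the day symptoms appear;
// an asymptomatic student keeps attending because they don't know they're sick.
function stillInClass(student, day) {
  if (student.infectedDay === null || !student.symptomatic) return true;
  return day - student.infectedDay < daysTillVisiblyContagious;
}

console.log([0, 1, 2, 3, 4, 5, 6].map(d => isClassDay(d)));
```

Passing the random-number generator in as a parameter keeps the draw testable and makes it easy to seed runs later.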

Here's what the configuration file for the simulation looks like:

/***************************** Initial Parameters **********************************************/

var daysTillVisiblyContagious = 5;    // Days until you KNOW you have COVID after being infected
    // Note that this can be changed to a higher number in case people are waiting for test
    // confirmation before taking steps (Yes this has happened, especially when symptoms seem mild.)
var dayContagious = 2;            // Day you are contagious after being infected
var dayNotContagious = 20;        // Day you stop being contagious
var sympRate = 0.60;    // 60% chance of being symptomatic (find source)

var dailyModel = false;    // Run classes daily (for example to consider a dorm);

// https://www.medrxiv.org/content/10.1101/2020.03.24.20042606v1.full.pdf
// p = 0.022    // 18-45 years, it goes up as you get older, also, this is just initial data
var pSpread = 0.022;    // risk of infection if exposed -
                    // this is probably low as there are multiple contact points not considered
                    // Additionally, it does not consider time of contact

var pInf = 0.001049; // 1662/1.586e6 (those infected over last 14 days/ Philly population)
// Real infection rate may be 6-24 times greater - here go with 10x greater
// https://jamanetwork.com/journals/jamainternalmedicine/fullarticle/2768834?guestAccessKey=7a5c32e6-3c27-41b3-b46c-43c4a38bbe00&utm_source=For_The_Media&utm_medium=referral&utm_campaign=ftm_links&utm_content=tfl&utm_term=072120
pInf = 10*pInf;

// Mask wearing
// https://www.thelancet.com/journals/lancet/article/PIIS0140-6736(20)31142-9/fulltext
// Reduction due to mask wearing: OR = 0.15 [0.07-0.34]
// Reduction due to 1m distancing: OR = 0.18 [0.09-0.38]
var pMask = 0.15;    // Will adjust these with slider later. Build the model first
var pDist = 0.18;

This gives you a sense of the assumptions going in. There is also, in this model, the ability to assign a student to a cohort and a class so as to track where they are infected, if infected. So when a student is created using the makeStudent() function, they have all kinds of properties one can keep track of:

Likewise, stats for a given class can be tracked as well:

function makeClassArr(name, number, sdEffective ) {
    this.name = name;
    this.initialNumber = number;   // initial number of students in class
    this.number = number;           // current number of students in class
    this.propInfected = [];   // daily array of proportion of class infected based on current numbers
    this.currentInClass = [];
    this.NinfectedHere = 0;   // Number of cumulative infections starting in this class
    this.sdEffective = sdEffective; // Is social distancing effective or not due to poor ventilation?
    return this;
}
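As an illustration of how these fields might be updated during a run, here is a hedged sketch. The recordClassDay helper and the student objects it takes are hypothetical; only the constructor fields come from the code above. The constructor is repeated so the example runs on its own, and it assumes makeClassArr is called with new.

```javascript
// Copy of the constructor above so this sketch is self-contained.
function makeClassArr(name, number, sdEffective) {
  this.name = name;
  this.initialNumber = number;
  this.number = number;
  this.propInfected = [];
  this.currentInClass = [];
  this.NinfectedHere = 0;
  this.sdEffective = sdEffective;
  return this;
}

// Hypothetical daily bookkeeping: students is an array of
// { infected: boolean, inClass: boolean } for everyone enrolled in this class.
function recordClassDay(cls, students, newInfectionsHere) {
  const present = students.filter(s => s.inClass);
  cls.number = present.length; // students still attending
  const infected = present.filter(s => s.infected).length;
  // proportion infected relative to the current (not initial) class size
  cls.propInfected.push(cls.number > 0 ? infected / cls.number : 0);
  cls.NinfectedHere += newInfectionsHere; // cumulative infections starting here
}

const physics = new makeClassArr("Physics I", 4, false); // poorly ventilated room
recordClassDay(physics, [
  { infected: true, inClass: true },
  { infected: false, inClass: true },
  { infected: false, inClass: true },
  { infected: true, inClass: false }, // symptomatic student who has left
], 1);
console.log(physics.propInfected, physics.NinfectedHere); // [ 0.3333333333333333 ] 1
```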

At this point, I'm going to skip to some results. I ran two major versions of the model. The first involved nearly 60 classes (more small classes than large ones) and 180 students divided among those classes. In this case, I wanted to separate out the effects of class size and classroom ventilation, so the large number of classes for the small number of students was chosen to keep class sizes low and to let me track the cohorts separately. The model was run 100 times over the first 60 days of the semester. First, let's look at where people get infected.

The above graph shows the averages for the 100 model runs. After 60 days, on average 60.2% of students were infected. However, 65.3% of these infections came from the community at large, NOT from the campus. This doesn't let the campus off the hook: 28.1% of all infections came from larger classes, whereas only 6.4% came from smaller classes. Divided by ventilation type, 82.3% of on-campus infections were in areas with poor ventilation and 17.7% were in areas with good ventilation.
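Because the class-size shares here are fractions of all infections (65.3 + 28.1 + 6.4 is about 100), a bit of arithmetic converts them into other useful views: the share of all students infected on campus, and the large/small split among campus infections alone. A quick sketch using the averages quoted above:

```javascript
// Averages from the first scenario (180 students, 100 runs, 60 days).
const fracStudentsInfected = 0.602;
const shares = { community: 0.653, largeClasses: 0.281, smallClasses: 0.064 };

// Fraction of all students who were infected on campus (any class).
const campusShare = shares.largeClasses + shares.smallClasses;
const studentsInfectedOnCampus = fracStudentsInfected * campusShare;

// Split of campus infections only, renormalized to sum to 1.
const largeOfCampus = shares.largeClasses / campusShare;
const smallOfCampus = shares.smallClasses / campusShare;

console.log(studentsInfectedOnCampus.toFixed(3));                // 0.208
console.log(largeOfCampus.toFixed(3), smallOfCampus.toFixed(3)); // 0.814 0.186
```

With these averages, roughly 21% of all students were infected on campus, and about four in five campus infections started in the larger classes.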

Looking at mean growth rate within classes regardless of infection source:

we see that larger classes have a higher rate of spread than do smaller classes IF the ventilation is inadequate. [Note: please excuse the error bars going below zero!] However, even with adequate ventilation, after 60 days (end of October-ish), we can expect about 5% of a given class to be infected. Given that in this model there were 60 classes, this is still an appreciable number.
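Error bars that dip below zero usually come from plotting mean ± one standard deviation for a quantity that cannot be negative. One hedged alternative, not what the original plots use, is an empirical percentile interval across the runs, which can never leave the range of the data. A sketch, using a toy array in place of real per-run values:

```javascript
// Mean and sample standard deviation of per-run values.
function meanSd(xs) {
  const n = xs.length;
  const mean = xs.reduce((a, b) => a + b, 0) / n;
  const variance = xs.reduce((a, x) => a + (x - mean) ** 2, 0) / (n - 1);
  return { mean, sd: Math.sqrt(variance) };
}

// Empirical percentile with linear interpolation between order statistics.
function percentile(xs, q) {
  const s = [...xs].sort((a, b) => a - b);
  const pos = (s.length - 1) * q;
  const lo = Math.floor(pos), hi = Math.ceil(pos);
  return s[lo] + (s[hi] - s[lo]) * (pos - lo);
}

// Toy per-run proportions (NOT model output): mostly zero with a few outbreaks.
const runs = [0, 0, 0, 0, 0.05, 0, 0, 0.2, 0, 0.1];
const { mean, sd } = meanSd(runs);
console.log(mean - sd < 0); // true: a symmetric bar would go below zero
console.log(percentile(runs, 0.025), percentile(runs, 0.975));
```

Skewed outcomes like these (most runs with no class outbreak, a few with large ones) are exactly where symmetric bars mislead.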

What happens if we just let students mix and match classes without separating out students who are in large and small classes, and equally, allowing them into classrooms that are both well and poorly ventilated? Here's the first graph again:

In this case, an average of 58.7% of students were infected. This is probably not significantly different from the prior scenario, but I haven't run the stats yet. In any event, the results are very close. Of these infections, 62.7% came from the community and not from the campus, 24.8% came from larger classes, and 12.4% came from smaller classes. The division by ventilation type is actually the same as in the prior model: 82.3% of on-campus infections were in areas with poor ventilation and 17.7% were in areas with good ventilation. I suspect that, when the models are ultimately tested, the significant differences will show up in the ratio of infections coming from larger vs. smaller classes.
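To check whether 60.2% and 58.7% are really different, one option is Welch's two-sample t statistic on the per-run infection fractions from each scenario (100 runs each). The post-processing here was done in R, where t.test() performs Welch's test by default; the sketch below only shows the calculation in JavaScript, and the small arrays in the tests are placeholders, not model output.

```javascript
// Welch's t statistic and Welch-Satterthwaite degrees of freedom for two
// samples with possibly unequal variances.
function welch(a, b) {
  const stats = xs => {
    const n = xs.length;
    const m = xs.reduce((s, x) => s + x, 0) / n;
    const v = xs.reduce((s, x) => s + (x - m) ** 2, 0) / (n - 1);
    return { n, m, v };
  };
  const A = stats(a), B = stats(b);
  const sa = A.v / A.n, sb = B.v / B.n; // squared standard errors
  const t = (A.m - B.m) / Math.sqrt(sa + sb);
  const df = (sa + sb) ** 2 / (sa ** 2 / (A.n - 1) + sb ** 2 / (B.n - 1));
  return { t, df };
}

// Usage: welch(perRunSeparated, perRunMixed), where each array would hold the
// 100 per-run infection fractions from one scenario.
console.log(welch([1, 2, 3, 4], [2, 3, 4, 5]));
```

With around 200 runs in total, an |t| comfortably above 2 would suggest a real difference at roughly the 5% level.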

Let's follow that up looking at the time data:

Okay, so we have more class sizes here, with the largest class (60 students) having adequate ventilation. The rates of infection are a bit lower in this case, but that may be due to a lower total number of students (120 vs. 180 in the prior simulation). Most interestingly, in these results, as the class sizes get larger, the ventilation outcomes converge, implying that if you don't do ventilation right everywhere, it probably doesn't matter that you're doing it at all.

Now, to be perfectly clear: these results come from a quick model (about a week to write and test), and the structure of the inputs matters. For example, the mixed model should probably have been limited to class sizes of 15 and 60, as with the earlier model. However, the way the code is set up, that is not easy to do. One wants to compare apples to apples. Before these results are ready for publication in a journal, a great deal more work examining model stability needs to be done.

Nonetheless... I believe that there are a few key points that this early work indicates:

Surrounding community spread is ultimately quite important. If you are in an endemic area, and your students and faculty are not isolated, that endemic spread will reach your campus. Remember that one of the initial inputs to this model is the community infection rate. If it's higher, then we expect more infection to come back to campus every day when there is class, and then the campus becomes a vehicle for further spread.
When cohorts of students stay together, smaller classes and better ventilation systems are probably better for preventing spread.
When cohorts of students mix, mitigating measures like ventilation may not be as effective.
When compared to other work of this nature (which I've hosted on niiler.com since the authors from a big university wish to remain anonymous), the results seem reasonable. That paper considers only on-campus propagation of the virus (but also a latency period and asymptomatic infection), and at the end of 60 days, nearly a quarter of the campus population is infected. When we look at the bar graphs above indicating campus spread, this is quite close to our numbers. (Again, there is some nuance, but given that we're starting with completely different codes and methods, the fact that we are within each other's error bars is comforting.)

If you are hankering for a better look at the JavaScript code (post-processing was done in R after harvesting the information in the "studentResults" and "classNumbersResults" variables from the JavaScript console), you can find it here. It's assumed that you have some JavaScript and HTML experience. As time permits, I will try to craft a more user-friendly simulation online.
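For anyone repeating the post-processing, one way to move results out of the browser console and into R is to serialize them as CSV. This is a sketch under an assumption: it treats the results as arrays of flat objects, and the actual structure of studentResults and classNumbersResults may differ. The copy() call mentioned in the comment is a Chrome DevTools console utility, not standard JavaScript.

```javascript
// Convert an array of flat objects into CSV text, taking the header row
// from the keys of the first object. Values containing commas, quotes, or
// newlines are quoted, with embedded quotes doubled.
function toCSV(rows) {
  if (rows.length === 0) return "";
  const cols = Object.keys(rows[0]);
  const escape = v => {
    const s = String(v ?? "");
    return /[",\n]/.test(s) ? '"' + s.replace(/"/g, '""') + '"' : s;
  };
  const lines = [cols.join(",")];
  for (const r of rows) lines.push(cols.map(c => escape(r[c])).join(","));
  return lines.join("\n");
}

// In the DevTools console: copy(toCSV(studentResults));
// then paste into a file and read it in R with read.csv().
console.log(toCSV([{ run: 1, infected: 108 }, { run: 2, infected: 99 }]));
```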

In the meantime, wear a mask when you go out, social distance and avoid crowds, and remember to take this thing seriously. There are a lot of otherwise healthy individuals who have had COVID and are not so healthy afterwards. We still don't know the long-term side effects.

Monday, August 10, 2020

Efficacy of Masks

As we wander around in our not-quite locked down state here in the US, one sees all manner of masks being worn by our friends, coworkers, and the odd person on the street. How well do they work, and how much do they decrease the risk of COVID-19 transmission?

The goal of the mask is primarily to protect other people from YOU. That is, when you wear a mask, it mainly serves to keep what you spew out from getting into the air and thence into other people's respiratory tracts. However, masks can, to some extent, protect you from others. The idea is that the mask creates a barrier to trap droplets or particulates. If you are breathing out, they are trapped on the inside, in theory, and don't make it out to the public at large. However, if other people are breathing on you, a mask can still work to trap particles on the outside, subject to some limitations. In this mode, the theory goes, the mask is less effective for a number of reasons: 1) with successive breaths, you are more likely to inhale some of what was trapped on the outside, and 2) if your mask doesn't fit perfectly, particulates can detour around the mask and still make it into your nose or mouth. You might say that the same works in reverse, and you'd be right, but other people aren't right next to your mask inhaling, so this is much less of a risk. There's a third reason why masks may not protect you perfectly: if their outsides are getting covered in particulates, when you take off the mask you may touch this material and transfer it to other locations on your face or to your mucous membranes, and thus be exposed to COVID.

No matter which type of mask you use, these principles remain, more or less, the same. However, certain types of masks may be better than others at filtering particles.

A recent study in "Science Advances," the American Association for the Advancement of Science's journal, takes a new, low-cost look at how well various masks work, using a home-built black box into which one can speak, a green laser, and a cellphone. The study, titled "Low-cost measurement of facemask efficacy for filtering expelled droplets during speech," rated masks based on the count of particles that emerged.

Emma Fischer and her team from Duke University tested 14 different commonly available mask types by having four different speakers say the words "Stay healthy, people" five times in a row while either wearing or not wearing the mask in question. The data were recorded using cell phone video, and a custom algorithm written in Mathematica was used to count the particles in each frame. The results graded masks by the number of particles which emerged on the far side.

In the above graph, the larger the number of particles (the higher up the dot), the worse the mask is, if we assume that particle count is the end of the story. By this measure, the gaiter-style fleece mask is worse than wearing no mask at all!

Particle counts may be very important, especially considering the multiple indicators that airborne transmission of COVID-19 may play a larger role than previously thought. However, they are not the end of the story. The study's main assumption is that the more virus-carrying particles there are, the worse, especially given the potential for true airborne spread.
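The particle counts discussed above come from counting bright spots in each video frame, which at heart is counting connected bright regions in an image. The study's own algorithm was written in Mathematica and is not reproduced here; the sketch below is a generic stand-in that thresholds a grayscale frame (assumed to be a 2D array of brightness values) and counts 4-connected blobs with a flood fill.

```javascript
// Count 4-connected regions brighter than `threshold` in a 2D grayscale array.
function countParticles(frame, threshold) {
  const h = frame.length, w = frame[0].length;
  const seen = frame.map(row => row.map(() => false));
  let count = 0;
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      if (seen[y][x] || frame[y][x] <= threshold) continue;
      count++; // found a new particle; flood-fill all of its pixels
      const stack = [[y, x]];
      seen[y][x] = true;
      while (stack.length) {
        const [cy, cx] = stack.pop();
        for (const [dy, dx] of [[1, 0], [-1, 0], [0, 1], [0, -1]]) {
          const ny = cy + dy, nx = cx + dx;
          if (ny >= 0 && ny < h && nx >= 0 && nx < w &&
              !seen[ny][nx] && frame[ny][nx] > threshold) {
            seen[ny][nx] = true;
            stack.push([ny, nx]);
          }
        }
      }
    }
  }
  return count;
}

const toyFrame = [
  [0, 9, 9, 0, 0],
  [0, 9, 0, 0, 8],
  [0, 0, 0, 0, 8],
  [7, 0, 0, 0, 0],
];
console.log(countParticles(toyFrame, 5)); // 3
```

Note that a blob count says nothing about blob size, which is precisely the limitation raised next.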

One of the flaws in the study is that it doesn't consider potential viral load. Larger particles or droplets can carry more virus. In the paper's supplemental materials [https://advances.sciencemag.org/content/suppl/2020/08/07/sciadv.abd3083.DC1/abd3083_SM.pdf] is a helpful set of graphs showing, for one trial and one subject, the estimated particle size spectrum. Below is one such panel for the fleece vs. no-mask conditions, where the x-axis is droplet diameter in mm and the y-axis is the particle count.

It should be immediately obvious that there is a much higher count of smaller particles emitted from the speaker in the fleece gaiter condition (orange) than in the no-mask condition (green). But much larger particles come out when not wearing a mask. If viral load is correlated with droplet volume, perhaps efficacy should be measured not just by particle count but by the NET volume of droplets?

To that end, I thought it would be interesting to calculate the net volume of droplets for each of the two conditions. We can read information directly from the figure in order to make some ballpark estimates. The figure has some problems in that the histogram doesn't really line up with the x-axis, but otherwise it gives us enough information to go on if we assume (as physicists are wont to do) that the droplets are all spherical.
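Written out, the net volume for a histogram with $n_i$ droplets in the bin at diameter $d_i$, treating every droplet as a sphere, is the count-weighted sum of sphere volumes:

```latex
V_{\text{net}} = \sum_{i} n_i \cdot \frac{4}{3}\pi\left(\frac{d_i}{2}\right)^{3} = \frac{\pi}{6}\sum_{i} n_i\, d_i^{3}
```

Because volume scales with the cube of the diameter, a single 1.0 mm droplet holds as much liquid as 125 droplets of 0.2 mm.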

# Code is all in R statistical language - counts and diameters read from the graph

# First, the fleece gaiter
count = c(410, 400, 150, 40, 10)
diameter = c(0.2, 0.3, 0.4, 0.5, 0.6)
fleece = data.frame(count, diameter)

# Next, the no-mask condition
count = c(230, 250, 150, 100, 80, 60, 40, 20, 10, 5)
diameter = c(0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 1.1)
nomask = data.frame(count, diameter)

# Now compare the particle counts
sum(nomask$count)
# [1] 945
sum(fleece$count)
# [1] 1010
# Yup, there are more particles coming through the fleece, no debate there.

# Volume of one droplet in each bin, presuming the droplets are spherical
# (diameters in mm, so volumes are in cubic mm)
fleece$volume = (4/3)*pi*(fleece$diameter/2)^3
nomask$volume = (4/3)*pi*(nomask$diameter/2)^3

# Net volume: weight each bin's droplet volume by the number of droplets in it
sum(fleece$count * fleece$volume)
# [1] 16.14779
sum(nomask$count * nomask$volume)
# [1] 62.97061

# How much more droplet volume is there with the no-mask condition?
sum(nomask$count * nomask$volume) / sum(fleece$count * fleece$volume)
# [1] 3.899643

# Almost 4 times more stuff comes out in the no-mask condition.

This volume consideration is particularly relevant because the amount of viral load one is exposed to is one of the more popular explanations for why some individuals get COVID-19 more severely. That said, aerosolization of virus particles is also relevant.

Bottom line? There are a number of different ways to measure mask efficacy, and each has its own limitations inherent in the assumptions made by the authors studying the phenomenon. Another example is that the bandana is likely to have more side leakage than the gaiters (this was not measured, so far as I can tell).

Regardless, wearing a mask when in public remains an important way to begin to control the COVID-19 pandemic. In closing, let me leave you with this thought from Mandal et al.'s letter to the editor in the International Journal of Nursing Studies:

"Above all contradictions, nonetheless, masks are trouble-free, easily available, low-priced and clearly efficient. Masks are the visual alarm to the need for social distancing along with additional protective measures. Furthermore masks are also the insignia for front-line healthcare professionals reminding them of their well-being, self-confidence, security, and trust upon their hospital authorities (Klompas et al., 2020). Mass-masking, no less than altruism, would work more than an individual level in community settings. It may lead to a significant reduction of the basic reproduction number of SARS-CoV-2 and consequently may portray an effect parallel to herd immunity (Cheng et al., 2020). In this regard, researchers are urging policymakers to re-evaluate the role of universal masking in the way to combat COVID-19."

Friday, July 31, 2020

Reopening Schools: Kids and COVID-19

One of the key reasons for reopening schools in the fall has been the idea that kids don’t get COVID-19, or if they do get it, it’s not so severe. I want to address these topics in this post based on the current data available at the time of this writing.

I’m going to lead with a statement that isn’t particularly sciency and say this: considering that kids spread just about everything else with apparent ease, why do we somehow think that they magically don’t spread COVID-19? The reasons given for this are based, somewhat, on fact. But I will argue that the facts have been poorly interpreted. First, the early observed rates of COVID-19 in kids in the United States were quite low. Second, at the current time, when we look at other countries that are successfully reopening, the rates of spread in schools and juvenile social situations are basically negligible. So that’s it, right? Nothing to see here. Let’s move along and reopen everything school related.

With regard to the first issue, if we presume that kids don’t get COVID as severely as adults and that only severe cases that present for hospitalization are tested, then we are undersampling kids. Also, once we understood that COVID-19 was going to be a serious problem, we locked down the schools. So using smart thermometers (yes, they are internet connected if need be), one can track combined flu and COVID incidence.

It’s not just that local district; the effect has been reported elsewhere. New Scientist reports that flu is way down in Australia compared to previous years due to the lockdowns. This trend has also been reported in Singapore and more globally. All this points to the idea that if you are taking the low COVID-19 rates in kids as an indicator that they don’t get it or can’t spread it, you are likely mistaken. We’ll see that there are other lines of evidence pointing this way.

What about the idea that since other countries can safely reopen schools, we can too? Well, in theory, that’s a nice sentiment. However, let’s remember that schools here are already severely underfunded, and that teachers in some districts often buy supplies for their students because the students can’t afford them and the districts won’t pay for them. Additionally, schools are not only not getting extra money to adapt to the situation; they are being threatened with a loss of funding if they don’t reopen. The situation is dire. Therefore, the expectation that schools here can do what schools in Europe or Japan are doing is nonsense from the get-go. Likewise, the other countries we are comparing ourselves to have much lower rates of COVID-19 than we do.

So kids aren’t great indicators of the spread of COVID-19 due to the factors mentioned above. But let’s presume, for argument’s sake, that the conclusion (that kids don’t spread COVID-19) is actually true. Why not reopen schools? If this is the case, then all those extra precautions everyone is taking might be overkill, right? What is not being accounted for are the caregivers, bus drivers, teachers, and other adults in their lives who are more susceptible and who will now have contact with each other. Remember that one doesn’t have to have a disease to spread it. Although we are trying to keep everything clean and not touch our faces, invariably something slips through the cracks, and there is a contaminated surface that a kid or adult touches. Think this isn’t an issue? See this video.

The nurse who made it isn’t infected with the green goo she touches, but she does spread it everywhere.

There are signs that despite institutions taking recommended precautions, COVID-19 spread remains an issue. For example, a number of summer camps have closed due to campers and staff testing positive for COVID-19. The worst of these involved 82 infected individuals at a camp which followed relatively strict guidelines for social distancing. But a number of other camps have been affected as well as noted here and here.

There is also a case in which three teachers had been teaching together, virtually from the same classroom it seems, and all three came down with COVID-19 and one died. It seems from that article that no students were even involved, just the teachers. The teachers all claim to have followed CDC guidelines to the letter. This last example might be an argument for closing it all down except for truly essential businesses.

If we look at the CDC page on pediatric infections with COVID-19, it’s clear that their presentation is based on early work and is not up to date. The data they present seem to indicate that COVID-19 is simply not much of a problem in kids. I’ve copied part of their page here and added my comments in parentheses.

"Pediatric cases of coronavirus disease 2019 (COVID-19), caused by severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), have been reported. However, there are relatively fewer cases of COVID-19 among children compared to cases among adult patients.¹⁻⁵

In the United States, 2% of confirmed cases of COVID-19 were among persons aged <18 years.⁴ (article dated Apr 6, 2020 - 44,672 cases with age reported, 965 of them under 18, majority had missing data on symptoms, report published early after testing – so pediatric outcomes were largely unknown, prioritization of testing given to those with severe cases, so it is almost certain that the case count is under-reported.)
In China, 2.2% of confirmed cases of COVID-19 were among persons aged <19 years old.¹ (article dated Feb 24, 2020 - 44,672 cases with age reported, 965 of them under 18. The total number of tests was not reported – so it’s not clear how many individuals under 18 were tested compared to individuals over 18.) [https://jamanetwork.com/journals/jama/fullarticle/2762130]
In Italy, 1.2% of COVID-19 cases were among children aged <18 years.² (22,512 cases reported. Article dated Mar 17, 2020 - This article is an infographic only and also does not report the number of tests given.) [https://jamanetwork.com/journals/jama/fullarticle/2762130]
In Spain, 0.8% of confirmed cases of COVID-19 were among persons aged < 18 years.⁵ (The study cites the following information: “At the end of the second week, 41 of 365 patients (children) (11.2%) had positive test results (Table).” So while 0.8% of cases were in persons < 18 years, 11.2% of kids tested had COVID-19. Note also that after March 9 when Madrid was declared an area of community transmission, “the recommendation was to test only hospitalized children with symptoms and signs of COVID-19 or patients with comorbidities and a high risk of complications. Some children at risk of hospitalization were also tested, although they were ultimately discharged.”) [https://jamanetwork.com/journals/jamapediatrics/fullarticle/2764394]

If we look at the early US data regarding infections of kids, remember that 1) there was very little testing available until late April/early May and it was reserved for severe cases, and 2) starting from beginning to middle March, schools in most places in the US were shut down, and kids stayed home with their parents or caregivers. In the meantime, many adults continued to go to work since they were either “essential personnel” or else their state shut the schools but only had very limited shutdowns of private companies.

So who tests positive mostly during this time period? Adults. It is only after Memorial Day when everyone came out of lockdown and ran out to the beach or to graduation parties or some such that we start seeing a spike in cases for kids. In Florida where 54,022 kids under 18 were tested for COVID-19 as of July 15, a third of them tested positive. This is a rate much greater than for adults in the state, although it is possible that it is simply due to the increased rate of testing.

Is Florida a one-off, or are kids getting infected elsewhere too? English data from the start of July indicate no significant differences between infection rates in different age groups. However, their data do not cover group homes, and it is expected that the rate of infection may be higher in the elderly.

A study of 59,073 contacts of 5,706 COVID-19 index patients in South Korea indicated that kids from 0-9 spread the disease less than adults, but kids older than that spread the disease at the same rate. An analysis of viral load in 3,712 patients in Germany found that the viral load (a measure of infectivity) of children was not significantly different from that of adults. Their conclusions are that: “Based on these results, we have to caution against an unlimited re-opening of schools and kindergartens in the present situation. Children may be as infectious as adults.” This may be borne out by what we saw in Israel. While children there followed social distancing and face mask protocols, things were good. However, when these protocols were let up, things took a turn for the worse. Time Magazine reports:

While Israeli children initially followed the “bubble model” when they returned to school on May 3, limitations on class sizes were lifted two weeks later. During a heatwave, children were even permitted to leave their masks at home.

By June 3, the Israeli government was forced to close down schools after 2,026 students, teachers and staff had tested positive for COVID-19. 28,147 students were placed under quarantine due to possible exposure to the virus, according to the education ministry. At one single school, there were over 130 cases.

A good recent summary (July 23, 2020) of these studies and others on children is given in Smithsonian Magazine here.

I’d be remiss if I didn’t mention the article written by my cousin Eric Niiler at Wired, which indicates that under the right conditions it may be possible to reopen safely. However, this largely depends on what’s happening in the outside community and the resources given to the school. If there is rampant community spread and a school is not using bubbling, masks, and other risk reduction strategies, COVID will spread. But the Israeli case indicates that even when things go well, it doesn’t take much to screw up badly. Until there is a vaccine that is proven safe and effective, it looks like we’re in it for the long haul.

Sunday, June 28, 2020

Modeling the Spread of COVID-19: Part 4 - Real Exposure

In the last post, we discussed the efficacy of mitigating strategies to prevent the spread of COVID-19. Based on the work of Chu et al. (2020), who reviewed a number of other studies, both mask usage and social distancing are relatively effective means of preventing the spread of COVID-19. However, neither is 100% effective, and as such, even when taking such precautions, it is possible to acquire COVID-19 with enough exposure.

In this post, we will attempt to estimate the “true” exposure rate based on recorded rates of COVID-19 in the population. Additionally, we will start looking at how repeated daily exposure to the same group of people can still spread the disease, given that the group is large enough.
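The idea that repeated daily exposure still adds up can be made concrete: if each day carries an independent probability p of infection, the chance of at least one infection after n days is 1 − (1 − p)^n. A small sketch, with a made-up daily probability used for illustration only:

```javascript
// Probability of at least one infection over n independent exposures,
// each with probability p.
function cumulativeRisk(p, n) {
  return 1 - Math.pow(1 - p, n);
}

// Smallest number of exposures for the cumulative risk to reach `target`.
function exposuresToReach(p, target) {
  return Math.ceil(Math.log(1 - target) / Math.log(1 - p));
}

// Illustrative only: a 0.1% chance per day.
console.log(cumulativeRisk(0.001, 60).toFixed(3)); // 0.058
console.log(exposuresToReach(0.001, 0.5));         // 693
```

Even a small per-day risk becomes a coin flip after enough days, which is why group size and duration both matter.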

There are a number of places one can go to get estimates of both population and rate of infection. Google will now pop up a nice graphic if you search for something like “COVID-19 PA case count”.

Below it are further links to state and county dashboards. The state dashboard currently (June 26) shows 81,374 cases for a population of 12.7 million. This is a case rate of

percentInfected = 81374/12.7e6 = 0.006407 = 0.641%

According to the State of Pennsylvania’s COVID-19 dashboard, 78% of these cases are recovered. What does that mean?

“Individuals who have recovered is determined using a calculation, similar to what is being done by several other states. If a case has not been reported as a death, and it is more than 30 days past the date of their first positive test (or onset of symptoms) then an individual is considered recovered.”

This is an attempt to quantify the number of active cases (22% of 81,374 = 17,902) which can then be used as a better estimate of contagion. Instead of considering the percent infected, we consider the percent who are contagious:

percentContagious = 17902/12.7e6 = 0.00141 = 0.141%
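The two state-level proportions above can be reproduced with a small helper. This is a minimal sketch: `caseRates` is a hypothetical name, and, like the dashboard, it treats every case not yet counted as recovered as still contagious.

```javascript
// Hypothetical helper: converts dashboard counts into proportions.
// Assumes every case not counted as recovered is still contagious.
function caseRates(cases, population, fractionRecovered) {
  const percentInfected = cases / population;
  const activeCases = cases * (1 - fractionRecovered);
  const percentContagious = activeCases / population;
  return { percentInfected, activeCases, percentContagious };
}

// Pennsylvania on June 26: 81,374 cases, 12.7 million people, 78% recovered
const pa = caseRates(81374, 12.7e6, 0.78);
console.log(pa.percentInfected.toFixed(6));   // 0.006407
console.log(Math.round(pa.activeCases));      // 17902
console.log(pa.percentContagious.toFixed(5)); // 0.00141
```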

That said, it may be better to work at a more local level, since some parts of the state have a much lower rate of contagion and others a much higher one. While there is (sadly) still a fair bit of travel in between, a first-order approximation should work with the local data, as it is likely to be more relevant. Let’s consider Chester County, PA, since that’s where I am. Chester County is not far outside of Philadelphia, which has been a COVID-19 hotspot.

Let’s calculate the infection rate for Chester County. To do so we need the number of cases and the total population. I’m not sure where the state got its population count for Chester County; according to Google it’s 524,989. If we use the county’s case count of 3,578 and Google’s population count, we get an incidence rate of

percentInfected = 3578/524989 = 0.00681 = 0.681%

which translates to 681 cases per 100,000 population. This is really close to what the above graphic shows, so we’ll go with the official count of 685 per 100,000. How many have recovered? Of these cases, 64.45% have recovered, leaving 35.55% potentially infectious. Thus, we get a contagion rate of

percentContagious = 0.3555*0.00681 = 0.00242 = 0.242%.

We could get even more specific and look at West Chester Borough statistics.

According to the Chester County dashboard, the population of the borough is 20,048, and there have been 126 cases overall with 34 in the last 30 days. This makes the contagious percentage:

percentContagious = 34/20048 = 0.00169 = 0.169%

This is slightly higher than the rate for Pennsylvania at large (0.141%), but considerably lower than the rate for Chester County as a whole (0.242%).
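The county and borough figures rely on two different definitions of “contagious”: cumulative cases scaled by the share not yet recovered, and cases reported in the last 30 days. A short sketch of both (the function names are hypothetical):

```javascript
// Method 1 (county): cumulative cases scaled by the fraction not yet recovered.
function contagiousFromRecovered(cases, population, fractionRecovered) {
  return (cases / population) * (1 - fractionRecovered);
}

// Method 2 (borough): treat cases reported in the last 30 days as contagious.
function contagiousFromRecent(recentCases, population) {
  return recentCases / population;
}

const chesterCounty = contagiousFromRecovered(3578, 524989, 0.6445);
const westChester = contagiousFromRecent(34, 20048);
console.log(chesterCounty.toFixed(5)); // 0.00242
console.log(westChester.toFixed(6));   // 0.001696
```

The borough value prints as 0.001696; the calculation above truncates it to 0.00169, which is the figure carried through the rest of this post.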

The percent-recovered statistic is an attempt to quantify the potential load on the health-care system, but also infectiousness moving forward. The 30-day window is not perfect, but it’s a start. In the next post, we’ll look at better estimates of contagion.

Now we have enough information to start building our model at the state, county, or local level. The model itself rests on several assumptions, and its goal is to give us a sense of how day-to-day contacts within a single group of a certain size may contribute to the spread of COVID-19. We will run the model for each condition – unprotected, with a mask, with social distancing, and with mask and social distancing. The presumption is that ALL group members are following the condition specified. We’ll see later how to partition the group based on protective measures, but that introduces too much complexity for the moment. The key assumptions are as follows:

* Your chance of encountering an infected person in the group at the start of it all is based on the population stats calculated above, for example 0.00169 if you are considering West Chester Borough.
* Your chance of encountering an infected person will only go up from there if members of the group share the infection; if they don’t, it stays at the initial percentage.
* Given that you encounter an infected person, your chance of getting infected yourself is 0.022 (2.2%), from the Luo et al. (2020) paper. This assumes that the group is made up of 18- to 44-year-olds.
* Your risk is reduced by wearing a mask (odds ratio = 0.15).
* Your risk is reduced by social distancing by at least a meter (odds ratio = 0.18).
* A person who is infected today can infect someone tomorrow (not realistic, but we’ll deal with this in an upcoming post).
* Everyone in the group stays in the group (again, not realistic – once someone starts getting sick, they will probably leave the group. We’ll deal with this later also).
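The list above applies the odds ratios by multiplying them straight into a probability, and the calculations later in this post do the same. Strictly speaking, an odds ratio scales the odds p/(1 - p) rather than p itself. For probabilities as small as the ones used here, the two approaches give nearly the same answer, as this sketch shows (the function names are hypothetical):

```javascript
// Shortcut used in this post: multiply the probability by the odds ratio.
function scaleProbability(p, oddsRatio) {
  return oddsRatio * p;
}

// Exact odds-ratio adjustment: convert p to odds, scale, convert back.
function applyOddsRatio(p, oddsRatio) {
  const odds = p / (1 - p);
  const newOdds = oddsRatio * odds;
  return newOdds / (1 + newOdds);
}

const p = 0.022; // chance of infection from one infected contact (Luo et al., 2020)
for (const [label, oddsRatio] of [["mask", 0.15], ["distancing", 0.18]]) {
  console.log(label, scaleProbability(p, oddsRatio).toFixed(5), applyOddsRatio(p, oddsRatio).toFixed(5));
}
// mask 0.00330 0.00336
// distancing 0.00396 0.00403
```

For a single contact the exact adjustment comes out about 2% higher than direct multiplication, and the gap shrinks further as p gets smaller, so the shortcut is reasonable for this model.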

The approach we will take is called Monte Carlo modeling, named after the famous casino in Monte Carlo, Monaco. The idea is to run the model many times, comparing the output of a random number generator with the stated probabilities to decide whether or not each person gets infected. We then compute the average of the results and the standard deviation (an estimate of the uncertainty) to get a sense of how bad things get.
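Summarizing the runs only needs a mean and a standard deviation for each day. The post doesn’t say which form of the standard deviation it uses; this minimal sketch assumes the population form (dividing by n), and `meanAndStd` is a hypothetical name.

```javascript
// Mean and population standard deviation (divide by n) of a list of results.
// A sample standard deviation (divide by n - 1) would be an equally valid choice.
function meanAndStd(values) {
  const n = values.length;
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const variance = values.reduce((sum, v) => sum + (v - mean) ** 2, 0) / n;
  return { mean, std: Math.sqrt(variance) };
}

// Example: number infected on one day across eight hypothetical runs
console.log(meanAndStd([2, 4, 4, 4, 5, 5, 7, 9])); // { mean: 5, std: 2 }
```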

How does such a model deal with chance? Most programming languages have a random number generator function that returns a different random number every time it is called. In JavaScript, which we will use here, that function is Math.random(). It returns a double-precision floating-point number from 0 (inclusive) up to but not including 1, with roughly 16 significant digits. If you have a probability of 0.022 of catching COVID-19 from an infected friend in your group, the random number generator would have to return a number less than 0.022 for you to be considered infected. In fact, we would run this test every day for each member of the group. In a group of 20 people, Math.random() would be run 19 times in a given day in order to estimate whether the infection spread to any of the 20 people. Every day, for every person and every iteration of the model, we run the function below to see whether we have added to the number of infected people.

function checkInfections(prob1, prob2, prob3, prob4) {
	// See if an individual is infected this round
	// prob1 - No precautions
	// prob2 - With mask
	// prob3 - With social distancing
	// prob4 - With mask + social distancing
	// Additional number infected (in the same order)
	let N1 = 0;
	let N2 = 0;
	let N3 = 0;
	let N4 = 0;
	// Generate one random number in [0, 1), declared locally so it is not an implicit global
	const ran = Math.random();
	// Then compare it to each probability in turn
	if (ran < prob1) {
		N1++;
	}
	if (ran < prob2) {
		N2++;
	}
	if (ran < prob3) {
		N3++;
	}
	if (ran < prob4) {
		N4++;
	}
	// Return the number of additional infected in each category
	return [N1, N2, N3, N4];
}
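Because checkInfections draws a single random number and compares it with all four probabilities, the four conditions are coupled: whenever someone is infected under the strictest condition, they are also infected under every less protected one. The sketch below tallies many calls to show this; it uses a compact restatement of the function and an illustrative no-precaution probability of 0.01, both assumptions for demonstration only.

```javascript
// Compact restatement of checkInfections: one shared draw for all four conditions.
function checkInfections(prob1, prob2, prob3, prob4) {
  const ran = Math.random();
  return [prob1, prob2, prob3, prob4].map((prob) => (ran < prob ? 1 : 0));
}

// Illustrative probabilities: none, mask, distancing, mask + distancing.
const pNone = 0.01;
const probs = [pNone, 0.15 * pNone, 0.18 * pNone, 0.15 * 0.18 * pNone];

const trials = 200000;
const totals = [0, 0, 0, 0];
let violations = 0;
for (let t = 0; t < trials; t++) {
  const [n1, n2, n3, n4] = checkInfections(...probs);
  totals[0] += n1; totals[1] += n2; totals[2] += n3; totals[3] += n4;
  // Probability ordering is none >= distancing >= mask >= both, so outcomes nest.
  if (n4 > n2 || n2 > n3 || n3 > n1) violations++;
}
console.log(totals.map((n) => n / trials)); // each close to its probability
console.log(violations);                     // 0
```

Sharing one draw across conditions is a form of common random numbers: it does not change each condition’s average, but it makes the differences between conditions less noisy than independent draws would.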

We need a way to calculate the initial probabilities of infection that go into this function. This will be slightly different from what was done previously because, previously, we assumed that EVERYONE we came into contact with was infected and contagious. This is no longer true. So instead of calculating our probability of infection as:

N = 10; // Number of exposures
p1 = 0.022; // Probability of getting COVID-19 from close contact
pN = 1 - p1; // Probability of NOT getting it
pInfected = (1 - pN**N);

we now have to do the following:

N = 10; // Number of exposures
p = 0.022; // Probability of getting COVID-19 from close contact
propInfected = 0.00169; // Proportion of infected people in your population
propNinf = 1 - propInfected; // Proportion of people NOT infected in the population
pExposure = (1 - propNinf**N);
pInfected = p*pExposure;
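Side by side, the two formulas show how much the assumption about who is infected matters. This sketch wraps each in a hypothetical function and evaluates both for 10 contacts at the West Chester rate.

```javascript
// Old model (previous post): all N contacts are known to be infected.
function pInfectedKnownContacts(N, p1) {
  return 1 - (1 - p1) ** N;
}

// New model: each contact is infected with probability propInfected;
// the chance of at least one exposure is multiplied by the per-contact risk p.
function pInfectedUnknownContacts(N, p, propInfected) {
  const pExposure = 1 - (1 - propInfected) ** N;
  return p * pExposure;
}

console.log(pInfectedKnownContacts(10, 0.022).toFixed(3));            // 0.199
console.log(pInfectedUnknownContacts(10, 0.022, 0.00169).toFixed(5)); // 0.00037
```

With 10 contacts who are all known to be infected, the chance of infection is about 20%; with 10 contacts drawn from a population in which 0.169% are contagious, it drops to about 0.037%.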

Wait! Why haven’t we calculated pInfected in the same way? Why not just use our prior result and then multiply it by the probability of encountering an infected person in the population? This is because our earlier calculation was the probability of being infected when encountering N known infected people. In fact, we are not in that situation; we are encountering N people who may or may not be infected. Therefore, we first calculate the probability of exposure due to N encounters and then multiply it by the chance of becoming infected due to a single encounter. So if there are 10 people in our group, using the above math:

propNinf = 0.99831 (99.8% no)

pExposure = 0.016772 (1.67% chance of exposure)

pInfected = 0.0003689 (about a 0.037% chance of infection with no precautions)

Let’s be clear about what this means. This roughly 0.037% chance of being infected if you live life as normal seems pretty small, until you start actually mixing with the usual number of people. If you simply count close contacts (including those due to air flow) from your shopping trip (N = 30?), your training at the gym (N = 20?), your working day (for me N = 100), and your trip to the pub afterwards (N = 30), your net number of contacts is now perhaps as high as 150-200. Let’s rerun the infection-probability calculation for 150 contacts.

pExposure = (1 – 0.99831**150) = 0.2241 = (22.41%)

pInfected = 0.022*0.2241 = 0.00493 = (0.493%)
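To see how quickly the exposure term grows with the number of contacts, the same formula can be evaluated for several daily contact counts. `infectionRisk` is a hypothetical name, with defaults taken from the West Chester figures above; for N = 150 it gives the 0.2241 and 0.00493 computed above.

```javascript
// Chance that at least one of N contacts is contagious (pExposure), and the
// resulting chance of infection when that is multiplied by p, as in this post.
function infectionRisk(N, p = 0.022, propInfected = 0.00169) {
  const propNinf = 1 - propInfected;   // proportion NOT infected
  const pExposure = 1 - propNinf ** N; // at least one contagious contact
  return { pExposure, pInfected: p * pExposure };
}

for (const N of [10, 30, 100, 150, 200]) {
  const { pExposure, pInfected } = infectionRisk(N);
  console.log(`N = ${N}: pExposure = ${pExposure.toFixed(4)}, pInfected = ${pInfected.toFixed(5)}`);
}
```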

So your chance of exposure goes up to more than 1 in 5 overall, but the chance of infection is still, in theory, only about half a percent. Those are good odds, right? But what if every one of those 150 people has the same odds? What’s the chance that at least one of you will get COVID-19? Note that we can calculate the effect of wearing a mask or social distancing as before:

p2 = 0.15; // Odds ratio for mask
p3 = 0.18; // Odds ratio for social distancing
pWithMask = p2*pInfected; // Probability of being infected while wearing a mask
pWithSocialDist = p3*pInfected; // Probability of being infected while social distancing
pWithMaskPlusDist = p2*p3*pInfected; // Probability of being infected with both
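The precaution adjustments, plus the question raised above about whether at least one of the 150 people gets infected, can be sketched as follows. `withPrecautions` and `atLeastOne` are hypothetical names. `atLeastOne` assumes every person faces the same risk independently, a simplification the group model in this post deliberately drops, since one member’s infection raises everyone else’s exposure.

```javascript
// Apply the odds ratios multiplicatively, as in the snippet above.
function withPrecautions(pInfected, maskOR = 0.15, distOR = 0.18) {
  return {
    none: pInfected,
    mask: maskOR * pInfected,
    distancing: distOR * pInfected,
    maskPlusDistancing: maskOR * distOR * pInfected,
  };
}

// Hypothetical extension: chance that at least one of groupSize people is
// infected, assuming each faces probability p independently.
function atLeastOne(p, groupSize) {
  return 1 - (1 - p) ** groupSize;
}

const risks = withPrecautions(0.00493); // 150 contacts at the West Chester rate
for (const [condition, p] of Object.entries(risks)) {
  console.log(`${condition}: ${p.toFixed(6)} each, ${atLeastOne(p, 150).toFixed(3)} for at least one of 150`);
}
```

Under that independence assumption, the no-precaution case gives roughly a 52% chance that at least one of the 150 gets infected.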

In any event, even if we isolate a group except for going home, carefully getting gas or picking up groceries, or hanging out with family members who may have commitments outside the home, there is the off chance that a group member gets infected outside the group and then returns to the group the next day. Therefore, the exposure rate will go up. Here’s the way to think of it. Let’s call the initial proportion of the population that is infected and contagious propInfected. For West Chester:

propInfected = 0.00169

at the start of it all. Let’s say we have 10 people in the group, and one of them was unlucky enough to get infected outside our group since we last met up. This means that we have the outside population-wide proportion of people who are infected and contagious (0.00169) plus our new source of infection from within our group, which now, if self-contained, has a contagious proportion of 1/10 = 0.10. The new net exposure to infected people is now 0.10 + 0.00169 = 0.10169 (yikes!). Therefore the exposure rate, which controls the other probabilities, will likely go up from day to day.
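The exposure update described above is a one-line calculation. In this sketch `netExposure` is a hypothetical name, and capping the result at 1 is an added assumption so that it always stays a valid proportion.

```javascript
// Outside contagious proportion plus the share of our own group that is infected,
// capped at 1 so the result remains a valid proportion.
function netExposure(populationRate, infectedInGroup, groupSize) {
  return Math.min(1, populationRate + infectedInGroup / groupSize);
}

console.log(netExposure(0.00169, 0, 10).toFixed(5)); // 0.00169
console.log(netExposure(0.00169, 1, 10).toFixed(5)); // 0.10169
```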

So the bottom line for our model is threefold: we need to randomly test everyone every day to see if they have become infected; our risk is based on population statistics of infection at the start, but then also on changes within our group; and we need to run the model many times (since it is random) to determine how stable our results are.
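The three requirements above can be combined into one small simulation. This sketch is one possible reading of the listed assumptions, not the author’s full code, so it is not expected to reproduce the graphs exactly. The names `simulate` and `params` are hypothetical; exposure within the group is the population rate plus the infected share of the group (capped at 1), every member is assumed to contact the other groupSize - 1 members daily, and the four conditions share one random draw per person per day, as checkInfections does.

```javascript
// Hypothetical parameters mirroring the assumptions listed in this post.
const params = {
  groupSize: 50,           // people meeting every day
  days: 45,                // length of each simulated run
  runs: 200,               // number of Monte Carlo iterations
  populationRate: 0.00169, // initial contagious proportion (West Chester)
  p: 0.022,                // chance of infection per infected contact
  multipliers: [1, 0.15, 0.18, 0.15 * 0.18], // none, mask, distancing, both
};

function simulate({ groupSize, days, runs, populationRate, p, multipliers }) {
  // counts[c][d][r]: number infected under condition c at the end of day d in run r
  const counts = multipliers.map(() =>
    Array.from({ length: days }, () => new Array(runs).fill(0)));

  for (let r = 0; r < runs; r++) {
    // infected[c][i] becomes true once person i is infected under condition c
    const infected = multipliers.map(() => new Array(groupSize).fill(false));
    for (let d = 0; d < days; d++) {
      // Probabilities use the state at the start of the day, so a person
      // infected today can only infect others from tomorrow on.
      const probs = infected.map((group, c) => {
        const nInf = group.filter(Boolean).length;
        const exposure = Math.min(1, populationRate + nInf / groupSize);
        const pExposure = 1 - (1 - exposure) ** (groupSize - 1);
        return multipliers[c] * p * pExposure;
      });
      for (let i = 0; i < groupSize; i++) {
        const ran = Math.random(); // one shared draw per person, as in checkInfections
        probs.forEach((prob, c) => {
          if (!infected[c][i] && ran < prob) infected[c][i] = true;
        });
      }
      infected.forEach((group, c) => {
        counts[c][d][r] = group.filter(Boolean).length;
      });
    }
  }

  // Mean and population standard deviation across runs, per condition and day
  return counts.map((byDay) => byDay.map((values) => {
    const mean = values.reduce((s, v) => s + v, 0) / runs;
    const variance = values.reduce((s, v) => s + (v - mean) ** 2, 0) / runs;
    return { mean, std: Math.sqrt(variance) };
  }));
}

const results = simulate(params);
const labels = ["none", "mask", "distancing", "mask + distancing"];
labels.forEach((label, c) => {
  const { mean, std } = results[c][params.days - 1];
  console.log(`${label}: ${mean.toFixed(1)} ± ${std.toFixed(1)} of ${params.groupSize} infected after ${params.days} days`);
});
```

Because nobody recovers or leaves in this sketch, each run’s infected count can only grow, so the daily averages never decrease.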

A first run of the model with a group of 50 people and an initial proportion of infected population of 0.00169 (0.169%) leads to the following outcomes.

The solid lines in the graph show averages over the runs, while the dotted lines indicate the standard deviations. These results show that with the inputs as noted, after about 35 days about half of the group is infected if no one takes precautions. Taking precautions clearly makes for a better result.

But what if your group is larger? Perhaps during your day, you come into contact with 100 people rather than 50. These results are shown below:

In this case, the spread of results is lower, but the timeline is sped up – the 50% infection mark for no precautions is now at 25 days instead of 35 days. At the end of 45 days, 90.1% of those who don’t take precautions are infected, and about 1% (0.98%) of those who social distance + wear masks are also infected.

Finally, let’s consider the possibility that the CDC is correct in its recent announcement that the national infection rate is ten times higher than what is published in the statistics. If that is true, then instead of an initial proportion of infected/contagious of 0.00169 (0.169%), it could be more like 0.0169 (1.69%). With these inputs our model looks like this:

The timeline is sped up again, with the unprotected reaching a 50% infection rate by day 21-22. At the end of 45 days, the unprotected are 100% infected, and even those both wearing masks and social distancing are, on average, 4.7% infected.
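Part of the sensitivity to the initial rate is visible before anyone in the group is infected. This sketch compares the starting per-person daily risk at the reported rate and at ten times that rate; `initialDailyRisk` is a hypothetical name, and 50 daily contacts is an assumption chosen to match the 50-person group.

```javascript
// Per-person daily infection chance before anyone in the group is infected.
// Assumes N = 50 contacts per day and the per-contact risk p = 0.022.
function initialDailyRisk(propInfected, N = 50, p = 0.022) {
  return p * (1 - (1 - propInfected) ** N);
}

const reported = initialDailyRisk(0.00169);
const tenTimes = initialDailyRisk(0.0169);
console.log(reported.toFixed(5), tenTimes.toFixed(5), (tenTimes / reported).toFixed(1));
```

Under these assumptions the starting risk rises by roughly a factor of seven rather than ten, because the exposure term 1 - (1 - propInfected)^N begins to saturate.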

As noted before, there are a number of assumptions here which are not quite true. Likewise, in real groups we don’t have 100% of people doing only one thing. Rather there are likely to be a proportion of people wearing masks, social distancing, or throwing caution to the wind. In the next post, we’ll look at how to deal with these factors in order to get an even better estimate of how COVID-19 is spread. In fact, we’ll be able to see just how much of a problem it can be if you’ve got just that one person who refuses to wear a mask or social distance.

Despite the shortcomings of this model, it does highlight how quickly contagion of this nature can spread. These results should be alarming, especially since they are quite sensitive to the initial conditions (the active infection rate), which we can only guess at. Even with a small initial proportion, the proportion infected in our group increases rapidly, especially for the unprotected.

Full Code below the jump...