Organised by
R Ladies+ Melbourne
We are Dionne and Kirsty, and we are from
R Ladies+ Melbourne.
We are here to teach you how to code in R!
Coding is writing instructions that a machine or computer can understand to perform a task.
 
 
 
R is a coding language used in statistics, data science and many other disciplines. It is also a very powerful tool for visualising data.
R is free and open source, and everyone can access it
It can do so many things
There are lots of resources to help you learn and perform tasks, and a very large user community
You can get a super cool job! It's a fantastic skill that is highly sought after in the workforce
What is an infectious disease, what is an outbreak and who are disease detectives?
How can R be used to investigate outbreaks?
Then YOU are going to solve an outbreak!
Infectious diseases are caused by certain microorganisms, such as bacteria, viruses, parasites or fungi, which make people (or animals) sick when they are exposed to them.
Most microorganisms don't make us sick, but some do, and these are known as pathogens.
There are lots of different ways that you might get an infectious disease, including breathing in respiratory droplets, consuming contaminated food or water, insect bites and many more!
The World Health Organization definition of an outbreak:
"occurrence of disease in excess of normal"
This might be because of a new pathogen, a pathogen being brought to a new area, or a mutating pathogen.
Many infectious diseases (for example measles, chickenpox and influenza) now have vaccines, which means outbreaks of these diseases occur much less frequently.

The COVID-19 pandemic was caused by a virus known as SARS-CoV-2.
A pandemic is defined as an infectious disease affecting multiple parts of the world simultaneously, i.e. multiple countries are experiencing an epidemic.
There are now several vaccines that have helped to reduce the severity of this infection.

Norovirus causes vomiting and diarrhoea (you might call this gastro).
It is extremely stable in the environment and highly contagious.
Ingesting norovirus causes disease.
There is no vaccine available for norovirus.

Buruli ulcer is caused by a bacterium known as Mycobacterium ulcerans.
The bacterium is slow-growing and flesh-eating, and is endemic to Melbourne and surrounds.
Endemic means that there is ongoing transmission of a pathogen in that area.
The mode of transmission is not entirely understood, but it likely involves mosquitoes and possums.
Public health departments in the government
Academic research at Research Institutes and Universities
Doctors and health staff
And many others too !
An epidemiologist is someone who studies how…
"to keep the public informed on different public health issues and offer solutions to keep communities safe. They perform studies on outbreaks, their causes, transmission and effect on the public, collating that information into accessible data and health recommendations."
Source: What is epidemiology and what does an epidemiologist do?
What is the problem?
What is the cause?
What can we do to make the situation better?
Unknown illness causes students to fall ill at local school.
Local authorities are unsure of the cause of this outbreak, but it appears to be linked to a recent excursion to Melbourne CBD
A few days earlier, students at a local school went into Melbourne to see an exhibition at the National Gallery of Victoria (NGV).
At 8am, they took the bus to the NGV.
They were in the gallery from 9am to 12pm.
They had lunch from 12pm to 2pm at three different locations.
They went back to the gallery for another exhibition from 2pm to 4pm.
At 4pm, they were back on the bus to go home.
And you have been given some information including:
Name
Class
Age
Sex
Height
Weight
Was the student feeling sick
What were their symptoms
And we are going to do this in R !
We can save important information for later.
We can use <- or =
[1] "My name is Joy"
[1] "My name is Joy"     "My name is Anger"   "My name is Sadness"
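A minimal sketch of code that would print output like the two lines above. The names my_name and my_friends are made up for illustration.

```r
# Save a sentence under a name using <- (my_name is a hypothetical name)
my_name <- "My name is Joy"
my_name   # typing the name prints what we saved

# = works too; c() combines several values into one object
my_friends = c("My name is Joy", "My name is Anger", "My name is Sadness")
my_friends
```

After running this, both names should appear in the environment pane.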
Your turn !
[1] 16440
Try saving a number or a word to a meaningful name - can you see where it goes?
First we need to load in the data and tools to help us.
One of the great things about R is that it comes with lots of pre-installed commands, and we can also use commands that others have created. These extra commands come in "packages" that we load with the library() command.
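A small sketch of loading packages. It assumes ggplot2 and dplyr, because they provide the ggplot() and summarise() commands used later. The exact packages loaded in the workshop may differ.

```r
# Load packages so their commands become available
# (assumes both packages are already installed)
library(ggplot2)  # for making plots with ggplot()
library(dplyr)    # for %>%, summarise(), group_by(), count() and filter()

# search() lists everything that is currently loaded
search()
```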
The number one step in data analysis is looking at the data.
For this, we are going to use our first pre-installed command called colnames().
 [1] "firstname"           "class"               "height"             
 [4] "weight"              "age"                 "sex"                
 [7] "temperature"         "sickness"            "shortness_of_breath"
[10] "chills"              "palpitation"         "bloody_stools"      
[13] "pain_chest"          "abdominal_cramps"    "dizziness"          
[16] "nausea"              "vomiting"            "vertigo"            
[19] "cough"               "fever"               "diarrhea"           
[22] "constipation"        "headache"            "pain_abdominal"     
[25] "throat_sore"         "muscle_pains"       
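The listing above is what colnames() prints for outbreak_data. Here is a small sketch of the same command on a made-up data frame. toy_outbreak and its values are hypothetical; only the column names are borrowed from the workshop data.

```r
# A tiny made-up data frame for illustration only
toy_outbreak <- data.frame(
  firstname   = c("Ana", "Ben", "Cam"),
  class       = c("A", "B", "C"),
  temperature = c(36.5, 39.0, 37.0),
  sickness    = c("no", "yes", "no"),
  stringsAsFactors = FALSE
)

colnames(toy_outbreak)  # the name of every column
head(toy_outbreak)      # the first few rows
```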
Can you find outbreak_data in the environment pane?
What happens if you click on it there?
In outbreak_data we have several kinds of information and it's important to identify what these are!
How would you describe the data we have in firstname?
How is this different to weight?
What are some terms we could use to define these?
The types of data we have to work with will determine how best to visualise them!
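One way to check what kind of data a column holds is the class() command. This is a sketch on a made-up data frame (toy_outbreak is hypothetical); in the workshop you would use outbreak_data instead.

```r
# Made-up data for illustration only
toy_outbreak <- data.frame(
  firstname = c("Ana", "Ben", "Cam"),
  weight    = c(45, 55, 50),
  stringsAsFactors = FALSE
)

# $ picks out one column; class() tells us what type of data it holds
class(toy_outbreak$firstname)  # "character": words
class(toy_outbreak$weight)     # "numeric": things we can count or measure
```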
A scatter plot

Notice that both the x-axis and the y-axis show things we can count or measure.
A boxplot

Notice that the y-axis shows things we can count or measure, but the x-axis shows specific groups.
We will be using a function called ggplot(). This is a fantastic tool we can use to make amazing graphs!
Like any graph we need to decide:
other ideas like colour, design, groups … we'll get to this later!
aes() refers to aesthetics: these are all the things we are going to customise.
Next we choose what kind of graph we need
geom_point() or geom_boxplot()
Let's first try geom_point().
We use the + sign to add it to our first ggplot command
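The two steps can be sketched like this. The data frame toy_outbreak is made up, and putting height and weight on the axes is an assumption for illustration.

```r
library(ggplot2)

# Made-up data for illustration; in the workshop you would use outbreak_data
toy_outbreak <- data.frame(
  height = c(150, 162, 158, 171, 149, 166),
  weight = c(45, 55, 50, 63, 44, 58)
)

# Step 1: tell ggplot() which data to use and what goes on each axis
# Step 2: use + to add the kind of graph we want
first_plot <- ggplot(toy_outbreak, aes(x = height, y = weight)) +
  geom_point()

first_plot  # printing the plot draws it
```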
Where would we put geom_boxplot() ?
Can you change size, shape and colour… what kinds of plots can you make?

If you're feeling stuck, try this:
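A minimal sketch of a customised scatter plot. toy_outbreak is a made-up data frame, and the size, shape and colour values are just one possible choice.

```r
library(ggplot2)

# Made-up data for illustration; swap in outbreak_data in the workshop
toy_outbreak <- data.frame(
  height = c(150, 162, 158, 171, 149, 166),
  weight = c(45, 55, 50, 63, 44, 58)
)

# size, shape and colour go inside geom_point()
# shape = 17 draws filled triangles
custom_points <- ggplot(toy_outbreak, aes(x = height, y = weight)) +
  geom_point(size = 4, shape = 17, colour = "purple")

custom_points
```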


If you're feeling stuck, try this:
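A minimal sketch of a boxplot, assuming we want temperature on the y-axis and class on the x-axis. toy_outbreak is made up for illustration.

```r
library(ggplot2)

# Made-up data for illustration; swap in outbreak_data in the workshop
toy_outbreak <- data.frame(
  class       = c("A", "A", "A", "B", "B", "B"),
  temperature = c(36.5, 37.0, 39.0, 36.8, 38.9, 39.2)
)

# The x-axis holds groups, the y-axis holds something we can measure
custom_box <- ggplot(toy_outbreak, aes(x = class, y = temperature)) +
  geom_boxplot(fill = "lightblue")

custom_box
```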

If we don't know what R is doing, we can always get help!
The fill can be specified as an aesthetic, just like the x and y axes.
Can you specify the fill with our sickness information about each student?
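One possible answer, sketched on a made-up data frame (toy_outbreak is hypothetical). Putting fill inside aes() lets the colour depend on a column of the data.

```r
library(ggplot2)

# Made-up data for illustration; swap in outbreak_data in the workshop
toy_outbreak <- data.frame(
  class       = c("A", "A", "A", "A", "B", "B", "B", "B"),
  sickness    = c("no", "no", "yes", "yes", "no", "no", "yes", "yes"),
  temperature = c(36.4, 36.6, 38.9, 39.1, 36.9, 37.1, 39.0, 39.2)
)

# fill = sickness draws one coloured box per sickness group within each class
fill_box <- ggplot(toy_outbreak,
                   aes(x = class, y = temperature, fill = sickness)) +
  geom_boxplot()

fill_box
```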

This symbol %>% is called a pipe. It is used to send our data into another command, such as summarise().
As the name suggests, summarise() generates summaries, such as the median.
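A small sketch of the pipe and summarise() together, using a made-up data frame (toy_outbreak and its temperatures are hypothetical).

```r
library(dplyr)

# Made-up data for illustration; swap in outbreak_data in the workshop
toy_outbreak <- data.frame(
  temperature = c(36.5, 39.0, 37.0, 39.2, 37.1, 38.8)
)

# Send the data into summarise() and calculate the median temperature
overall <- toy_outbreak %>%
  summarise(median = median(temperature))

overall  # one row, one column called median
```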
group_by() with multiple groups
Just like our boxplot, we want to know the median temperature of the students, grouped by their class and sickness status.
Try to do this by adding group_by(class, sickness).
How can we relate this to the boxplot we made earlier ?
# A tibble: 8 × 3
# Groups:   class [4]
  class sickness median
  <chr> <chr>     <dbl>
1 A     no         36.5
2 A     yes        39  
3 B     no         37  
4 B     yes        39  
5 C     no         37  
6 C     yes        39  
7 D     no         37  
8 D     yes        39  
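A table shaped like the one above could come from code along these lines. This is a sketch on made-up data (toy_outbreak is hypothetical), so the numbers will not match the workshop output.

```r
library(dplyr)

# Made-up data for illustration; swap in outbreak_data in the workshop
toy_outbreak <- data.frame(
  class       = c("A", "A", "A", "A", "B", "B", "B", "B"),
  sickness    = c("no", "no", "yes", "yes", "no", "no", "yes", "yes"),
  temperature = c(36.4, 36.6, 38.9, 39.1, 36.9, 37.1, 39.0, 39.2),
  stringsAsFactors = FALSE
)

# group_by() splits the students into groups before summarise() runs,
# so we get one median per combination of class and sickness
by_group <- toy_outbreak %>%
  group_by(class, sickness) %>%
  summarise(median = median(temperature))

by_group
```

Each row of the result corresponds to the middle line of one box in a boxplot of temperature by class and sickness.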
We are going to use a new type of graph to do this. It's called a bar plot.
As well as the bar plot, we can also use the count() command
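A minimal sketch of both approaches on a made-up data frame (toy_outbreak is hypothetical). geom_bar() counts the rows in each group for us, and count() gives the same numbers as a table.

```r
library(dplyr)
library(ggplot2)

# Made-up data for illustration; swap in outbreak_data in the workshop
toy_outbreak <- data.frame(
  sickness = c("no", "yes", "no", "yes", "yes", "no", "no"),
  stringsAsFactors = FALSE
)

# A bar plot only needs an x-axis: the bar height is the number of rows
sick_bars <- ggplot(toy_outbreak, aes(x = sickness)) +
  geom_bar()
sick_bars

# count() returns the same counts in a column called n
sick_counts <- toy_outbreak %>% count(sickness)
sick_counts
```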
Sickness summarises all of the symptoms. Now it's time to get specific!
Can you make some other bar plots to see what the students were sick with?
Remember we can use colnames(outbreak_data) to find out what information we have in outbreak_data!
You might also like to try customising your plot. Remember we can change the colour by specifying fill = within the geom_bar() command. You might also like to try aes(fill = ).

Data can be formatted in different ways.
For example, one format prioritises having one row per individual, with lots of columns.
This is like our outbreak_data.
Or we can change the format so that instead we have a long list with one observation per row.
This is what we have in our symptoms data.
For the next few plots we are going to use symptoms and not outbreak_data, but they record exactly the same thing.
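How the workshop's symptoms data was built is not shown here. As a sketch, one common way to go from one row per individual to one observation per row is pivot_longer() from the tidyr package. The data frame toy_wide and the new column names symptom and present are made up for illustration.

```r
library(tidyr)

# Made-up data with one row per student and one column per symptom
toy_wide <- data.frame(
  firstname = c("Ana", "Ben"),
  sickness  = c("no", "yes"),
  nausea    = c("no", "yes"),
  fever     = c("no", "yes"),
  stringsAsFactors = FALSE
)

# Stack the symptom columns into a long list with one observation per row
toy_long <- pivot_longer(
  toy_wide,
  cols      = c(nausea, fever),  # the columns to stack
  names_to  = "symptom",         # new column holding the old column names
  values_to = "present"          # new column holding the old values
)

toy_long
```

Both tables hold the same information: two students with two symptoms each become four rows.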


This doesn't necessarily help us decide what is making the students sick, because the symptoms are not separated by whether the students were sick or not.
We really want to keep only those that have a YES in Sickness.
This needs the filter() command.
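A small sketch of filter() followed by a bar plot. The data frame toy_symptoms, its column names and its lower-case yes/no values are assumptions for illustration; the real symptoms data may be laid out differently.

```r
library(dplyr)
library(ggplot2)

# Made-up long-format data: one row per student per symptom
toy_symptoms <- data.frame(
  firstname = c("Ana", "Ana", "Ben", "Ben", "Cam", "Cam"),
  sickness  = c("no", "no", "yes", "yes", "yes", "yes"),
  symptom   = c("nausea", "fever", "nausea", "fever", "nausea", "fever"),
  present   = c("no", "no", "yes", "yes", "yes", "no"),
  stringsAsFactors = FALSE
)

# Keep only rows for sick students where the symptom was reported
# == means "is equal to"
sick_symptoms <- toy_symptoms %>%
  filter(sickness == "yes", present == "yes")

# Count how often each symptom appears among the sick students
sick_plot <- ggplot(sick_symptoms, aes(x = symptom)) +
  geom_bar()

sick_plot
```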


What does this graph tell us?
What are the symptoms of those that are sick?
What might be some causes of illness that give these symptoms?
A bacterial infection
Commonly from contaminated food
Examples: undercooked chicken, raw egg, insufficient cleaning
You can start to feel sick within 6 to 36 hours after eating
At the gallery?
On the bus?
At lunch?
We have just been sent some information about where the students ate lunch!
They ate at three different restaurants!

What are pathogens and what are outbreaks
Who are epidemiologists
What is R
How to make plots with ggplot
How to organise and arrange complex data
Our online book of today's lesson
Scan the QR code on our poster!

It Takes a Spark !
and to all of you for helping solve the outbreak !
