Introduction

This assignment covers the material in topic A, including visualisation/charting and data wrangling. If you have completed all the workshops in topic A you should have already met most of the code you’ll need to complete the assignment.

The assignment is assessed and should be your own work. You’re welcome to discuss it with other students, but submitted work must be your own. If you use external resources, please ensure that you cite them.

There are three exercises in total, each worth the same number of marks.

Start by downloading the R Markdown template for the assignment from the link below (right click -> Save link as):

https://www.massey.ac.nz/~jcmarsha/161122/assessment/assignmentA.Rmd

NOTE: The R Markdown file contains instructions for obtaining the data. The data is specific to your student ID, so the region and sites you’ll be looking at in Exercise 2 will differ from those of other students. Make sure you follow the instructions in the .Rmd file before you start!

All answers for each exercise should be written up in the assignmentA.Rmd file. Before submission, Knit your assignmentA.Rmd file to HTML and submit the .html file to the Assignment A dropbox on Stream.

Grading

There are typically multiple ways to achieve a correct answer for each question.

Exercise 1

Answer the following questions by producing code in the code blocks provided in the markdown file:

  1. In the first code block, use dplyr to manipulate the data so that you have the proportion immunised (i.e. Immunised divided by Eligible) for each DHB, for each Age, and each Date. Save the result so you can use it for the remaining questions.

    • You should end up with a data frame containing variables for DHB, Date, Age and Proportion.
    • It should have 5787 observations.
  2. One of the health targets for immunisation was to have 95% of 8-month-olds immunised. Which DHBs have met this target, and how often have they met it?

    • Use filter to find the rows you want, and then count by DHB.
  3. Produce a plot of the proportion immunised at 8 months for each DHB through time, comparing them to the 95% health target.

    • The geom_hline() function can be used to put a horizontal line on the plot.
    • Consider how to clearly differentiate each DHB.
    • Make sure you clearly label axes and have a suitable title.
  4. Choose ONE (1) DHB (e.g. where you grew up, if you did so in NZ, or where a friend or family member grew up if you are not from NZ) and produce a plot to demonstrate how that DHB is tracking over time for all age groups.

    • Consider smoothing the data.
    • Make sure you can clearly distinguish the different age groups.
    • Make sure you clearly label axes and have a suitable title.
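The hints for questions 1 to 3 can be sketched on a small made-up data set. This is only an illustration under stated assumptions: the toy values, the DHB names "North" and "South", and the assumption that the data has Eligible and Immunised count columns are all hypothetical, and the real data may need different handling.

```r
library(dplyr)
library(ggplot2)

# Made-up toy data: values and DHB names are hypothetical, not the assignment data
toy <- tibble(
  DHB       = rep(c("North", "South"), each = 4),
  Date      = rep(as.Date(c("2020-03-01", "2020-06-01",
                            "2020-09-01", "2020-12-01")), times = 2),
  Age       = "8 months",
  Eligible  = c(200, 210, 190, 205, 100, 120, 110, 115),
  Immunised = c(192, 195, 183, 197,  90, 115, 100, 110)
)

# One proportion per DHB, Date and Age; summing first also copes with
# data that has several rows per combination
props <- toy %>%
  group_by(DHB, Date, Age) %>%
  summarise(Proportion = sum(Immunised) / sum(Eligible), .groups = "drop")

# Keep the rows that meet the 95% target, then count them per DHB
met_target <- props %>%
  filter(Proportion >= 0.95) %>%
  count(DHB)

# Proportion through time, with the target as a dashed horizontal line
target_plot <- ggplot(props, aes(x = Date, y = Proportion, colour = DHB)) +
  geom_line() +
  geom_hline(yintercept = 0.95, linetype = "dashed") +
  labs(x = "Date", y = "Proportion immunised",
       title = "Toy example: proportion immunised against a 95% target")
```

Printing met_target shows one row per DHB that met the target at least once, with the number of times in column n. With many DHBs, colour alone may not separate the lines well, so other ways of distinguishing them are worth considering.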

Exercise 2

Your goal in this question is to use river quality data from Land and Water Aotearoa (https://www.lawa.org.nz) to reproduce the chart below as best you can.

  1. The second code block performs some data wrangling. Describe what data manipulation is being performed in that code block. Ensure you detail the purpose of each command in the pipeline.

  2. Produce the plot by adding code to the third code block.

    • The student-specific datasets cover portions of Aotearoa New Zealand, and so some of them may not have all four LandCover types (e.g. might be missing the Exotic forest group). That is OK! All datasets have at least 3 groups.
    • Each curve represents a SiteID. You may find the group aesthetic to be useful.
    • The colours used are #7C7189, #FAE093, #D04E59 and #BC8E7D.
    • The same amount of transparency has been used for both the points and the uncertainty bands.
    • Don’t worry too much about matching font size as it may depend on your computer settings.
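As a rough sketch of how the group aesthetic, manual colours and a shared transparency value fit together, the example below uses made-up data. The Year and Value columns, the LandCover label "Other", the pairing of colours with groups, and the choice of a linear smoother are all assumptions made for illustration; none of them is taken from the chart to be reproduced.

```r
library(ggplot2)

set.seed(1)

# Made-up toy data: Year, Value and the "Other" label are hypothetical
toy <- expand.grid(Year = 2010:2019, SiteID = c("A", "B", "C", "D"))
toy$LandCover <- ifelse(toy$SiteID %in% c("A", "B"), "Exotic forest", "Other")
toy$Value <- rnorm(nrow(toy), mean = 5)

# One transparency value, reused for the points and the uncertainty bands
band_alpha <- 0.3

# Arbitrary pairing of two of the listed colours with the toy groups
land_colours <- c("Exotic forest" = "#7C7189", "Other" = "#FAE093")

p <- ggplot(toy, aes(x = Year, y = Value,
                     colour = LandCover, fill = LandCover,
                     group = SiteID)) +          # one curve per SiteID
  geom_point(alpha = band_alpha) +
  geom_smooth(method = "lm", formula = y ~ x,    # smoother choice is an assumption
              alpha = band_alpha) +
  scale_colour_manual(values = land_colours) +
  scale_fill_manual(values = land_colours)
```

Setting group = SiteID while mapping colour and fill to LandCover gives one curve per site, coloured by its land cover group. Without the group aesthetic, ggplot2 would fit a single curve per LandCover value instead.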

Exercise 3

This question uses data from the automatic counter on He Ara Kotahi, the pedestrian and cycling bridge over the Manawatu River.

  1. In the first code block, write code to create a table of the mean daily counts over the bridge (regardless of direction) for each day of the week by each mode (pedestrian or cyclist).

    • The wday() function from the lubridate package will be useful.
    • Pivoting wider will give a more readable table.
  2. Write code to create a table of the total monthly counts over the bridge (regardless of direction) for each month by each mode (pedestrian or cyclist).

    • The month() function from the lubridate package will be useful.
  3. As seen on the plot below, typically there are more pedestrians crossing in the To Massey direction compared to the To City direction, as many people walking do a loop, coming back across the river using the vehicle bridge at Fitzherbert.

    However, during the period recorded there was a running event which started at Massey and had participants run across the bridge from Massey towards the City. Use the data to find the date on which this event was held. You must include the code you use to find the answer.

    • Pivoting the data wider by Direction will help.
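The lubridate and pivoting hints above can be tried out on a small made-up table first. In this sketch the column names Date, Mode and Count, the Mode labels, and all of the counts are hypothetical; only the Direction values "To Massey" and "To City" come from the text above.

```r
library(dplyr)
library(tidyr)
library(lubridate)

# Made-up toy counts for two full weeks; names and values are hypothetical
toy <- tibble(
  Date      = rep(as.Date("2021-03-01") + 0:13, each = 4),
  Mode      = rep(c("Pedestrian", "Pedestrian", "Cyclist", "Cyclist"), times = 14),
  Direction = rep(c("To Massey", "To City"), times = 28),
  Count     = rep(c(30, 20, 15, 15), times = 14)
)

# Daily totals per mode, ignoring direction
daily <- toy %>%
  group_by(Date, Mode) %>%
  summarise(Total = sum(Count), .groups = "drop")

# Mean daily count per weekday and mode, one column per mode
by_weekday <- daily %>%
  mutate(Weekday = wday(Date, label = TRUE)) %>%
  group_by(Weekday, Mode) %>%
  summarise(MeanCount = mean(Total), .groups = "drop") %>%
  pivot_wider(names_from = Mode, values_from = MeanCount)

# Total count per month and mode
monthly <- daily %>%
  mutate(Month = month(Date, label = TRUE)) %>%
  group_by(Month, Mode) %>%
  summarise(Total = sum(Total), .groups = "drop")

# One column per direction, so the two directions can be compared row by row
by_direction <- toy %>%
  filter(Mode == "Pedestrian") %>%
  pivot_wider(names_from = Direction, values_from = Count) %>%
  mutate(Difference = `To City` - `To Massey`)
```

Column names containing spaces, such as `To City`, need backticks when used in later code. Averaging the daily totals, rather than the raw rows, is what makes by_weekday a mean of daily counts regardless of direction.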