--- title: "Point patterns - estimation and modelling disease data: Portfolio" author: "YOUR NAME HERE" date: "Week 11" output: html_document --- ## Introduction In this worksheet we will continue with using kernel density estimates to estimate the spatial intensities of point patterns from Poisson processes. We'll first look at how we can extend our ideas of KDEs to look at adaptive bandwidths, will then look at the importance of denominators, and how we can compute risk surfaces. Finally, we'll look at how we might model the spatial intensity of a point pattern by incorporating spatially varying covariates. **You should work through the worksheet_week11.Rmd worksheet before this one. You should then work through this worksheet and fill in your answers, Knit, and hand the .html into the dropbox on stream.** ## Packages Today we'll be using the `tidyverse` for ggplot charts, `sparr` for our spatial analyses, and will use `terra` and `stars` for loading up a pixel image. You might need to install these if you haven't done so already. Let's load them: ```{r, message=FALSE} library(tidyverse) library(sparr) library(terra) library(stars) ``` ## Data The data for this worksheet consists of locations of cases of campylobacteriosis within Palmerston North in 2005-2007. In addition, we have locations of controls, sampled from the population of Palmerston North to act as a proxy of where people live. We'll load them off the web, setup an appropriate rectangular window, and convert to point patterns: ```{r, message=FALSE} cases <- read_csv("https://www.massey.ac.nz/~jcmarsha/233214/data/palmy_cases.csv") controls <- read_csv("https://www.massey.ac.nz/~jcmarsha/233214/data/palmy_controls.csv") palmy.owin <- owin(xrange=c(2725000, 2740000), yrange=c(6085000, 6097000)) cases.ppp <- as.ppp(cases, W = palmy.owin) controls.ppp <- as.ppp(controls, W = palmy.owin) ``` The coordinates are linear, so we don't need to worry about the CRS for the analysis that we're doing, or for our planer plots. We would need to worry about this if we wanted to align the points with other mapping tools! Our key question concerns the distribution of campylobacteriosis cases relative to the distribution of people (i.e. risk of campylobacteriosis in people) and whether this might be related to social deprivation, as measured by the social deprivation index. This is plausible for a few reasons: 1. People in areas that are relatively more affluent may have better access to healthcare, so may be more likely to be diagnosed. 2. Consumption of contaminated chicken is the main source of campylobacteriosis, and people in areas that are relatively more affluent may consume more fresh chicken. Frozen chicken, even if originally contaminated is typically not a risk of infection. ## Data exploration We'll start by plotting the data using base R and ggplot: ```{r} plot(palmy.owin, main="Palmerston North") points(controls.ppp, col='black') points(cases.ppp, col='red') combined <- bind_rows(control=controls, case=cases, .id = 'type') ggplot(combined) + geom_point(mapping=aes(x=x, y=y, col=type), alpha=0.5) + coord_fixed() + theme_void() ``` ### Q1: Looking at the above plot, would you think the point processes for either cases or controls are random across the area? WRITE YOUR ANSWER HERE Let's use `sparr` to produce adaptive kernel density estimates of cases and controls: ```{r} case.bd <- bivariate.density(cases.ppp, h0 = 800, adapt=TRUE) plot(case.bd, main="Cases") control.bd <- bivariate.density(controls.ppp, h0 = 800, adapt=TRUE) plot(control.bd, main="Controls") ``` And use this to form a relative risk surface: ```{r} rr <- risk(case.bd, control.bd) ``` ### Q2: Alter the below code to colour using a diverging scale centered at log relative risk of 0. Are there areas of increased or decreased risk of campylobacteriosis? ```{r} plot(rr) ``` DISCUSS THIS HERE ## Modelling Below is a simple model of cases of campylobacteriosis in terms of the density of controls (i.e. the population) ```{r} control.im <- control.bd$z out <- ppm(cases.ppp ~ log(control.im)) out ``` ### Q3: Is this the result you expect? Why, or why not? WRITE YOUR ANSWER HERE The following code reads in a raster of the Social Deprivation Index (SDI), converts it to a pixel image for use in our `ppm` modelling, and plots it. A value of 1 indicates low social deprivation (i.e. relatively affluent) whereas a value of 10 indicates high social deprivation (i.e. relatively poor). ```{r} sdi.rast = rast("https://www.massey.ac.nz/~jcmarsha/233214/data/palmy_SDI.tif") sdi.im <- sdi.rast |> st_as_stars() |> as.im() plot(sdi.im) ``` ### Q4: In the code block below, model the case distribution using the control density and social deprivation. What are your conclusions? ```{r} ``` DISCUSS THIS HERE