Homework 1

Overview

Please provide a reproducible .Rmd script that answers all questions below and produces all of your analysis and plots. Some reminders:

Reproducible: This means when I run your .Rmd file, it will run all analyses and create all plots without errors, without me having to reset my working directory, and without forcing me to install anything on my machine (i.e., use the require(librarian); shelf(your packages, lib = tempdir()) approach as in the code block below). This also means there should be NO ERRORS when I run it!
Format: Answer each question with text and code, plus figures/tables where appropriate.

Background

“Coral Bleaching” by NOAA’s National Ocean Service is marked with Public domain mark 1.0.

For our first homework assignment, we will be modeling the relationship between sea surface temperature anomalies and coral bleaching events. This data is derived from a paper on the regional severity of coral bleaching events triggered by high ocean temperatures:

McWilliams, J et al (2005) Accelerating impacts of temperature-induced coral bleaching in the Caribbean. Ecology 86(5):2055-2060.

Do not use the analysis presented in the paper, although you can refer to the paper for the ecological background.

Data column descriptions:

SST: sea surface temperature “anomalies.” That is, the difference between the average sea surface temperature for the Caribbean between Aug and October, and the long run average from 1961 to 1990. Positive values represent years in which the water was warmer than average; negative values represent cool years.
LOGBLEACH: the log10 transformed percentage of 1 degree latitude/longitude cells reporting at least one coral bleaching event in a given year.
MASS: an indicator marking those years which were considered “Mass Bleaching Events”. 1 = mass bleaching event; 0 = not a mass bleaching event.

# List of packages necessary to run this script:
require(librarian, quietly = TRUE)
shelf(tidyverse, # for tidying data
      mgcv, # for qq.gam
      quiet = TRUE,
      lib = tempdir())

# Load data:
bleach <- 
  read.csv("https://github.com/LivingLandscapes/Course_EcologicalModeling/raw/master/data/bleach.csv")

Homework questions

Create two models:
1. a linear regression model ( function = lm() ) relating LOGBLEACH to SST, with LOGBLEACH being the response variable.
2. a generalized linear model ( function = glm() ) relating MASS to SST, with MASS being the response variable.
  - NOTE: for this model, you will need to decide on the family and link function.
Assess model assumptions (i.e., check model diagnostics) for both models.
1. Linear regression: revisit the Linear models and probability distributions lab for information on how to do this.
2. Generalized linear model: depending on your family/link choices, model diagnostics that assume normality may be useless. Instead, check out the mgcv::qq.gam function documentation and examples. Also, check out Ben Bolker’s logistic regression walkthrough, especially page 5.
Summarize results from both models, as well as your model diagnostics. Be careful to differentiate the linear regression results from the generalized linear model results.
The authors log transformed the bleaching variable “… to generate linear relationships with temperature.” In R, back transform LOGBLEACH to the original percentage scale, and plot against SST (No need to re-run the model). Does a log-transformation make ecological sense, given the type of process and the data?
At what range of values does the linear regression model (still using the LOGBLEACH response variable) predict 50% bleaching will occur? Provide an estimate of uncertainty for this prediction (Hint: use the predict.lm() function with options newdata and interval=”prediction”. See help(predict.lm) ).
Given the generalized linear model results, at what value of SST is there a >50% chance of a mass bleaching event occurring? Provide an estimate of uncertainty for this prediction. (Hint: check out the differences between ?predict.lm() and ?predict.glm(). I suggest setting ‘type = link’, ‘se.fit = TRUE’, and transforming the resultant predictions into probabilities using the plogis() function).
Choose your favorite model and create a figure showing the relationship between the response and predictor. Include confidence intervals. There are multiple ways to create predicted effects plots (e.g., ggeffects R package, use the base R package ‘predict’ function). But be careful that the confidence intervals are correct (see hint in question #6)!