Homework 2

Overview

Please provide a reproducible .Rmd script that answers all questions below and produces all of your analysis and plots. Some reminders:

Reproducible: This means when I run your .Rmd file, it will run all analyses and create all plots without errors, without me having to reset my working directory, and without forcing me to install anything on my machine (i.e., use the require(librarian); shelf(your packages, lib = tempdir()) approach as in the code block below). This also means there should be NO ERRORS when I run it!
Format: Answer each question with text and code, plus figures/tables where appropriate.

Background

“Mount Lofty in the Adelaide Hills” by Ruben Schade is licensed under CC BY-NC 2.0

The discipline of landscape ecology frequently postulates that the spatial pattern of habitat is important, in addition to local characteristics such as patch area, vegetation type, and climate. Westphal et al. (2003) analyzed data from the South Australian Bird Atlas using a series of landscape pattern metrics estimated at 3 spatial scales. They concluded that landscape structure had a positive effect on many bird species. However, this dataset was never designed to be analyzed using logistic regression, and consequently their conclusions were somewhat weak and badly compromised by model selection uncertainty. They used AIC rather than AICc because the number of data points was relatively large (n = 499) compared to the number of parameters in the most complex model (K = 5), so n/K ~ 100 and AIC is probably adequate. However, the number of models considered (R = 45) is quite large. In addition, because of the way the Atlas data were archived, it was not possible to directly compare the effect of local patch variables to landscape pattern covariates.
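
To see why AIC is "probably adequate" at n = 499 and K = 5, it can help to compute the small-sample term that separates AICc from AIC. The sketch below uses base R only; the helper name aicc_correction is hypothetical, and the second call assumes 68 site-year rows (34 sites in each of 2 years), which you should confirm against the data.

```r
# Small-sample correction term that turns AIC into AICc:
#   AICc = AIC + 2K(K + 1) / (n - K - 1)
# aicc_correction() is a hypothetical helper written for this illustration.
aicc_correction <- function(n, K) {
  2 * K * (K + 1) / (n - K - 1)
}

n <- 499  # number of data points reported in the text
K <- 5    # parameters in the most complex model

n / K                  # roughly 100
aicc_correction(n, K)  # about 0.12 AIC units, so AIC and AICc nearly agree

# Same K with an ASSUMED n of 68 rows (34 sites x 2 years): the term is
# several times larger, which is one reason to prefer AICc for small samples.
aicc_correction(68, K)
```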

At the same time, Dr. Scott Field and colleagues of the University of Adelaide collected an independent dataset at 34 woodland sites in the Mt. Lofty Ranges using a standard 2 ha, 20 min timed count procedure in each of 2 years. Field et al. (2002) describe some of the issues related to designing this survey; we have data for 34 of the 38 sites in that study. The bird data have sites and years in rows and bird species in columns. For a given site in a particular year, a 1 indicates that the species was observed at least once in three visits to the site, and a 0 indicates that the species was not observed in any of the three visits. This provides some correction for the problem of false negatives. FYI: if you’re thinking “duh, these folks should have just used an occupancy model,” check out the date of the OG occupancy modeling papers (e.g., MacKenzie et al. 2002 in Ecology).

Here are citations for the papers mentioned above:

Westphal, M. I., Field, S. A., Tyre, A. J., Paton, D., & Possingham, H. P. (2003). Effects of landscape pattern on bird species distribution in the Mt. Lofty Ranges, South Australia. Landscape Ecology, 18, 413-426.
Field, S. A., Tyre, A. J., & Possingham, H. P. (2002). Estimating bird species richness: how should repeat surveys be organized in time? Austral Ecology, 27(6), 624-629.

Do not use the analyses presented in these papers, although you can refer to the papers for the ecological background.

Data descriptions:

We have two datasets for this homework. First, we have the bird occurrence data in mlrbird:

Year - year code
Patch - patch code
Name - Name of survey location
East - UTM easting
North - UTM northing
All the rest are four-letter bird species codes.

We also have landscape pattern variables similar to those used by Westphal et al. at 2 km, 5 km, and 10 km scales, as well as three “local” covariates. Each landscape variable name starts with the variable's abbreviation, followed by 2k, 5k, or 10K according to which buffer size was used. The variables used by Westphal et al. are indicated with one asterisk, and “local” covariates are indicated with two asterisks. These landscape data are in mlrland. Note that these data have NOT been modified to reduce correlations among the variables as Westphal et al. described (this is a hint, maybe).

NOTE: There are twice as many rows in mlrbird as in mlrland. This is because the bird surveys were conducted over two years, but landscape variables were only collected once… How should you deal with this?

Patch - patch code
Name - Name of survey location
*TLA - total landscape area
NumP - number of patches
MPS - mean patch size
LPI - largest patch index (% of TLA covered by largest patch)
*LSI - landscape shape index (measure of edge) = 1.0 if one circular patch, increases with edge
PSSD - patch size standard deviation
*MNN - mean nearest neighbor distance (note: if NumP is 0, it should be NA, not 0 - check)
TE - total edge
*MPAR - mean patch perimeter-area ratio, calculated as TE/TLA/NumP. Note that MPAR is a derived variable not found in the provided dataset.
**MRF - mean annual rainfall
**PA - patch area
**PP - patch perimeter
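
Because MPAR is not supplied and MNN may need checking, here is a minimal sketch of both steps on a toy data frame. The column names (TE2k, TLA2k, NumP2k, MNN2k) are assumptions based on the naming rule described above, and the values are invented; run names(mlrland) to see the real column names before adapting this.

```r
# Toy stand-in for mlrland. Column names are ASSUMED from the naming rule
# "variable name + buffer size"; the values are invented for illustration.
land <- data.frame(
  Patch  = c("a", "b", "c"),
  TE2k   = c(1200, 0, 5400),
  TLA2k  = c(300, 0, 900),
  NumP2k = c(4, 0, 6),
  MNN2k  = c(150, 0, 80)
)

# MPAR as defined in the variable list: TE / TLA / NumP
land$MPAR2k <- land$TE2k / land$TLA2k / land$NumP2k

# With no patches in the buffer, MPAR is undefined (0 / 0 gives NaN) and a
# nearest neighbor distance of 0 is not meaningful, so both become NA.
no_patches <- land$NumP2k == 0
land$MPAR2k[no_patches] <- NA
land$MNN2k[no_patches]  <- NA

land
```

The same pattern would be repeated for the 5 km and 10 km buffers.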

# List of packages necessary to run this script:
require(librarian, quietly = TRUE)
shelf(tidyverse, cowplot,
      AICcmodavg, # for aictab, modavg (model averaging), etc.
      mgcv, # for qq.gam
      quiet = TRUE,
      lib = tempdir())

# Load data:
mlrbird <- 
  read.csv("https://github.com/LivingLandscapes/Course_EcologicalModeling/raw/master/data/mlrbird.csv")
mlrland <- 
  read.csv("https://github.com/LivingLandscapes/Course_EcologicalModeling/raw/master/data/mlrland.csv")

# NOTE: You will need to join these datasets to conduct the analysis.
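
One possible way to handle the join is a many-to-one merge on the patch code, so that each patch's landscape row is repeated for every year in which it was surveyed. The sketch below uses small invented data frames rather than the real files, and joining on Patch alone is an assumption; check which key columns the two datasets actually share.

```r
# Toy stand-ins for mlrbird and mlrland (values invented for illustration)
bird <- data.frame(
  Year  = c(1, 1, 2, 2),
  Patch = c("a", "b", "a", "b"),
  grcu  = c(1, 0, 1, 1)
)
land <- data.frame(
  Patch = c("a", "b"),
  PA    = c(12.5, 40.0)
)

# Many-to-one join: keep every bird row and attach the matching landscape row
joined <- merge(bird, land, by = "Patch", all.x = TRUE)

# Sanity checks worth running on the real data as well
nrow(joined) == nrow(bird)  # the join should neither add nor drop bird rows
anyNA(joined$PA)            # TRUE would flag patches with no landscape match
```

With the tidyverse loaded, dplyr::left_join(bird, land, by = "Patch") gives the same result. If both real datasets carry a Name column, either join on both keys or drop one copy so you do not end up with duplicated Name columns.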

Homework questions

1. Describe a set of 7-10 hypotheses (i.e., models composed of combinations of covariates) that will allow you to compare patch vs. landscape explanations for variation in patch use by woodland birds, and to estimate the best scale for landscape effects.
2. Fit your selected models to the data using binomial generalized linear models (GLMs) for at least two bird species. I recommend using grcu (Grey Currawong), silv (Silvereye), or yfhe (Yellow-faced Honeyeater); at first glance they appear to have something interesting to say with this limited dataset.
3. Use AICc-based model selection to compare your approximating models with each other, and draw conclusions about the relative importance of patch vs. landscape effects for the species you considered.
4. Calculate a model-averaged estimate for one or more of the biologically important parameters in your models. For example, the effect of patch area (PA) is directly relevant to estimating the effects of proposed land use changes (e.g., vegetation clearance) on bird species.
5. Create model ranking tables (AICc tables) and “marginal effects” plots for the model-averaged parameters from question #4. A marginal effect tells us how the response variable changes when a specific predictor variable changes while the other covariates are held constant (e.g., at zero or at their means). There are multiple ways to generate marginal effects plots (e.g., the ggeffects R package or the base R predict function), but make sure the confidence interval calculations are correct (Hint: you may want to revisit Question #6 from Homework #1 for this).
6. Interpret the results of your model selection and model averaging.
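
As a rough illustration of the mechanics behind the model selection, model averaging, and confidence interval steps, here is a minimal base R sketch on simulated data. Everything in it is an assumption made for illustration: the data are simulated, the covariate names LPI2k and LPI10k and the four-model candidate set are hypothetical, the unconditional standard error is one commonly used form, and the interval is shown for a single model rather than for model-averaged predictions. It is not a template for your hypotheses.

```r
set.seed(1)

# Simulated stand-in data: presence/absence driven by patch area only
n   <- 68
dat <- data.frame(PA = rnorm(n), LPI2k = rnorm(n), LPI10k = rnorm(n))
dat$y <- rbinom(n, size = 1, prob = plogis(-0.3 + 1.0 * dat$PA))

# A tiny, hypothetical candidate set: patch-only, landscape-only, and both
forms <- list(
  patch      = y ~ PA,
  land_2k    = y ~ LPI2k,
  land_10k   = y ~ LPI10k,
  patch_land = y ~ PA + LPI2k
)
fits <- lapply(forms, function(f) glm(f, family = binomial, data = dat))

# AICc = AIC + 2K(K + 1) / (n - K - 1), with K = number of coefficients
aicc <- function(fit) {
  K <- length(coef(fit))
  AIC(fit) + 2 * K * (K + 1) / (nobs(fit) - K - 1)
}

# Model ranking table with AICc differences and Akaike weights
tab <- data.frame(
  model = names(fits),
  K     = sapply(fits, function(f) length(coef(f))),
  AICc  = sapply(fits, aicc)
)
tab$delta  <- tab$AICc - min(tab$AICc)
tab$weight <- exp(-tab$delta / 2) / sum(exp(-tab$delta / 2))
tab <- tab[order(tab$delta), ]

# Model-averaged PA effect over the models that contain PA,
# with weights rescaled to sum to 1 within that subset
has_pa   <- sapply(fits, function(f) "PA" %in% names(coef(f)))
w        <- tab$weight[match(names(fits)[has_pa], tab$model)]
w        <- w / sum(w)
beta     <- sapply(fits[has_pa], function(f) coef(f)["PA"])
se       <- sapply(fits[has_pa], function(f) sqrt(diag(vcov(f)))["PA"])
beta_avg <- sum(w * beta)
se_avg   <- sqrt(sum(w * (se^2 + (beta - beta_avg)^2)))

# Marginal effect of PA from ONE model, with LPI2k held at zero.
# The interval is built on the link (logit) scale and then back-transformed,
# which keeps the limits inside [0, 1].
new <- data.frame(PA = seq(min(dat$PA), max(dat$PA), length.out = 50),
                  LPI2k = 0)
pr  <- predict(fits$patch_land, newdata = new, type = "link", se.fit = TRUE)
new$fit <- plogis(pr$fit)
new$lwr <- plogis(pr$fit - 1.96 * pr$se.fit)
new$upr <- plogis(pr$fit + 1.96 * pr$se.fit)

tab
c(estimate = beta_avg, se = se_avg)
```

The AICcmodavg package loaded earlier provides aictab() and modavg() for the table and averaging steps; computing them by hand once is a useful check that you understand what those functions return.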