LIME: Length-based Integrated Mixed Effects

Package Description

Developed by Rudd and Thorson (2017), Length-based Integrated Mixed Effects (LIME) is a flexible stock assessment method for fisheries with developing data collection programs and/or limited capacity for monitoring. For these fisheries, it is often easier to collect length measurements from fishery catch than to quantify total catch.

Conventional stock assessment tools that rely on length measurements without total catch do not directly account for variable fishing mortality and recruitment over time. However, this equilibrium assumption is likely violated in almost every fishery, degrading estimation performance. LIME is an extension of length-only approaches that accounts for time-varying recruitment and fishing mortality.

The LIME model uses length data and biological information to estimate stock status. Key attributes of LIME include:

  • Accounting for time-varying fishing mortality and recruitment

  • Requirement of at least 1 year of length composition data of the catch, and assumptions about growth, natural mortality, and maturity

  • Estimation of annual fishing mortality, length at 50% and 95% selectivity, and recruitment variation

  • Derivation of random effects for time-varying recruitment

  • Fitting to multiple years of length composition data and/or catch and/or an abundance index time series, if available.

  • Estimation of spawning potential ratio reference points (and MSY-based reference points if there is information on scale, e.g. catch data)

LIME requires at least one year of length composition data and assumptions about biological parameters. It can also integrate more years of length composition data, a time series of catch, and/or an abundance index. The mixed effects structure accounts for time-variation in fishing mortality and recruitment, making it more effective than conventional stock assessment tools that rely only on length measurements.

This package is run from R as found on CRAN. R can be installed https://cran.r-project.org/.

The recommended way to run R is using RStudio. After you have installed R, then install RStudio. It can be found https://www.rstudio.com/products/rstudio/download/.

Installation

The website https://github.com/merrillrudd/LIME/wiki/2---Introduction-and-installation provides instructions on package installation.

###################################################
# Installing packages from GitHub directly
# NB Update all your packages before installing
##################################################


check.packages <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg)) 
    install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}

check.packages("devtools")

#require("devtools")
#install.packages("devtools")

You now need to install the R package Template Model Builder (TMB): - Windows: https://github.com/kaskr/adcomp/wiki/Windows-installation - Linux: https://github.com/kaskr/adcomp (See README) - Mac: should be the same as Linux, please contact via email at mbrudd@uw.edu if that doesn’t work.

# Run once devtools is successfully installed:

# Install the required package TMBhelper:

github_pkg = "kaskr/TMB_contrib_R/TMBhelper"
check.packages2 <- function(pkg,github_pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg)) 
    devtools::install_github(github_pkg)
   # install.packages(new.pkg, dependencies = TRUE)
  sapply(pkg, require, character.only = TRUE)
}

github_pkg = "kaskr/TMB_contrib_R/TMBhelper"
check.packages2("TMBhelper",github_pkg=github_pkg)
## Loading required package: TMBhelper
## TMBhelper 
##      TRUE
#devtools::install_github("kaskr/TMB_contrib_R/TMBhelper")

# Now, install LIME:
github_pkg = c("merrillrudd/LIME")
check.packages2("LIME",github_pkg=github_pkg)
## Loading required package: LIME
## LIME 
## TRUE
#devtools::install_github("merrillrudd/LIME",dependencies=TRUE)   

# Finally, load the LIME and tidyverse packages. You may have to install the tidyverse package from "Tools -> Imstall Pacakages":

library(LIME)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 3.6.2
## -- Attaching packages -------------------------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.2     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.3
## v tidyr   1.0.0     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.6.3
## Warning: package 'tidyr' was built under R version 3.6.2
## Warning: package 'readr' was built under R version 3.6.1
## Warning: package 'purrr' was built under R version 3.6.2
## Warning: package 'dplyr' was built under R version 3.6.1
## Warning: package 'forcats' was built under R version 3.6.2
## -- Conflicts ----------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(formatR)
## Warning: package 'formatR' was built under R version 3.6.3

NB Once you have installed LIME, you don’t need to run the install.packages code chunk again.

At https://github.com/merrillrudd/LIME/wiki/2---Introduction-and-installation, it is stated that “the LIME package can be used in two ways: 1) simulating the expected age-structured population dynamics, length composition, catch data, and an abundance index using LIME and 2) fitting to empirical length data (at a minimum), plus any available catch and/or abundance index data, to provide an estimate of the spawning potential ratio (SPR).

We focus here on the second application.

LIME has been developed for data-limited fisheries, where few data are available other than a representative sample of the size structure of the vulnerable portion of the population (i.e., the catch) and an understanding of the life history of the species. LIME relaxes the equilibrium assumptions of other length-based methods by estimating annual fishing mortality and recruitment variation (among other parameters), deriving annual recruitment as a random effect.

See Rudd and Thorson (2017) in the reference list for full details of the model, including simulation testing to evaluate performance across life history types, population variability scenarios, and data availability scenarios, as well as violations of the model assumptions."

The description below is to fit our own empirical length data using LIME. Although we describe the steps below, there are good descriptions in

https://github.com/merrillrudd/LIME/wiki, and

https://github.com/merrillrudd/LIME/blob/master/vignettes/LIME.pdf

The latter is the full user manual, on which we are basing our description.

Specifying the biological inputs and starting values:

The user is required to input the following parameters to create_lh_list for simulation and estimation:

Minimum inputs: Biology * linf : von Bertalanffy asymptotic length vbk : von Bertalanffy growth coefficient M : Annual natural mortality rate lwa : Length-weight scaling parameter lwb : Length-weight allometric parameter M50 : Length or age at 50% maturity maturity_input *: Whether M50 input is in “length” or “age”

Minimum inputs: Exploitation * S50 : Length or age at 50% selectivity S95 : Length or age at 95% selectivity selex_input *: Whether S50 and S95 are in “length” or “age”

The S50 and S95 values are used as starting values when estimating the length or age at 50% and 95% selectivity

There are a range of other biological, exploitation and variation inputs that can be included; see pages 3-5 of https://github.com/merrillrudd/LIME/blob/master/vignettes/LIME.pdf.

For our worked example, we will specify the minimum inputs, using female growth rates and the default values for non-minimum inputs.

lh <- create_lh_list(vbk = 0.15, linf = 81.44505, t0 = -1.87856856036483, lwa = 0.00000244, lwb = 3.34694,
S50 = c(3.96802434600842), S95 = c(13.141251755892), selex_input = "age", selex_type = c("logistic"),
M50 = 45, maturity_input = "length", M = 0.20, binwidth = 1, CVlen = 0.1,
SigmaR = 0.737, SigmaF = 0.3, SigmaC = 0.1, SigmaI = 0.1, R0 = 1, Frate = 0.1,
Fequil = 0.25, qcoef = 1e-05, start_ages = 0, rho = 0.43, theta = 10, nseasons = 1,
nfleets = 1)

As per the LIME.pdf vignettes:

“Now we should check out the biological parameters and selectivity we’ve created. Note that even if maturity and selectivity are input by length, create_lh_list converts to age and outputs both selectivity-at-age by fleet (lh\(S_fa) and selectivity-at-length by fleet (lh\)S_fl)”.

par(mfrow=c(2,2), mar=c(4,4,3,1))
plot(lh$L_a, type="l", lwd=4, xlab="Age", ylab="Length (cm)")
plot(lh$W_a, type="l", lwd=4, xlab="Age", ylab="Weight (g)")
plot(lh$Mat_l, type="l", lwd=4, xlab="Length (cm)", ylab="Proportion mature")
# plot selectivity for the first (and only) fleet (first row)
plot(lh$S_fl[1,], type="l", lwd=4, xlab="Length (cm)", ylab="Proportion selected to gear")

The LIME.pdf vignettes next recommend checking the time step regarding predicted growth:

“Particularly for short-lived fish, it is possible that an annual time step is too coarse of a time scale to capture individual growth between years. For example, if a fish grows rapidly between ages 1 and 2, it is possible the probability of being a length given age will result in a negligible probability of being certain lengths.However, it is likely those lengths will be observed, and in this case LIME will not be able to fit the data well.

“To check for this issue, we recommend plotting the probability of being in a length bin given age to make sure there is greater than negligible probability of observing all lengths.

“For tips on what to do when there is a negligible probability of observing some lengths, see the”Format data" section in the vignettes“.

plba <- with(lh, age_length(highs, lows, L_a, CVlen)) # create the matrix of probability of being in a length bin given age
ramp <- colorRamp(c("purple4", "darkorange"))
col_vec <- rgb( ramp(seq(0, 1, length = nrow(plba))), max = 255)
par(mfrow=c(1,1))
matplot(t(plba[-1,]), type="l", lty=1, lwd=3, col=col_vec, xaxs="i", yaxs="i", ylim=c(0, 0.5), xlab="Length bin (cm)", ylab="Density")
legend("topright", legend=lh$ages[seq(2,length(lh$ages),by=3)], col=col_vec[seq(2,length(lh$ages),by=3)],, lwd=3, title="Age")

Data inputs to LIME

LIME contains functions to simulate a population and generate data. These are detailed in the vignettes.

Here, we are using our own test dataset. This needs to be appropriately formatted for LIME.

Length composition data

Length-composition data can be formatted as a matrix, or, where there are multiple fleets, as an array or list.

As our test data has only one fleet, we format the length-composition data as a matrix, with the years along the rows and length bins along the columns.

Let us now read in these length-composition data.

The rows of the length data matrix must be labeled with the years (either 1971, 1972, etc. or 1, 2, etc.) and the columns must be labeled with the midpoints of the length bins.

We include code to format the matrix in this manner. In our case, we have multiple years of length frequency data with headers. This data have the first column as the year (n=31), the next 3 columns as the fleet, sex and Stage1_wght, and then the next 17 columns as 17 length bins. We will need to remove the first 4 columns. We will do this by reading in our data as “temp”, and reformatting this before assigning it to the required “Len2”.

NB We are assuming that you have saved our test data in the same directory as the R markdown code and that you have opened a new R session. Otherwise you can change our code to point to the directory where the data file is and call it your working directory using the setwd command.

temp <- read.csv("LengthData.csv",header=T)

### assign the first column to be years, and columns 5:21 as the length-frequency data
years <- temp[,1]
LenFreq <- as.matrix(temp[,5:21])
rownames(LenFreq) <- years

## adjust with the true midpoints
mids <- seq(10, by = 5, length.out = ncol(LenFreq))
colnames(LenFreq) <- mids

Catch index

(NB LIME can also use abundance data, if available - see vignettes)

Our test data also includes a time series of catch, which LIME can use.

This should be in matrix form, with fleets along the the rows and years along the columns. With one fleet, the time series should still be in matrix form rather than a vector (the model will look for the first row, for the one and only fleet).

Note that the length-frequency example data are from 1971-2001. The number of years of catch data should match the total number of years modelled, so we will need only the catch data from 1971-2001.

(That is, for our example data, we only need rows 12-42).

We have observed catch-by-age data; we will read this in and sum over the ages.

Note that the catch data columns must be labelled with the years. Years must match up across the catch and length-frequency data sets.

If there are missing years in the length data, all length bins can be filled with zeros (they are not registered as true zeros, but a row summing to 0 is a flag not to fit to that row).

If there are missing years in the catch data, a negative value can be inserted for that year.

We again assume you have saved the data to the same directory as your working directory.

temp <- read.csv("TotalCatchAge.csv",header=T)
Catch <- as.matrix(t(apply(temp[12:42,],1,sum)))

##MR edit
colnames(Catch) <- years 

Data input

Observations

LIME requires inputing observed data in a list form, including: . years: inclusive years to be modeled . LF: length frequency data in matrix, array, or list form . neff_ft: effective sample size to use in case of multinomial distribution (otherwise use Dirichlet multinomial to estimate effective sample size, or assume nominal sample size = effective sample size if not included) . I_ft: Index of abundance by fleet and time (if applicable) . C_ft: Catch by fleet and time (if applicable) in numbers or biomass (to be specified later)

We don’t have sample sizes for our test data set, but this is okay - under default run_LIME arguments, the effective sample size is estimated using the Dirichlet-multinomial likelihood so you don’t need to specify them. The input sample sizes are only used with the multinomial likelihood.

data_all <- list(years = years, LF = LenFreq, C_ft = Catch)

Life history and starting values

LIME includes a function create_inputs which checks the length frequency data and includes all input data, life history information, and starting values in a list together to input into LIME. create_inputs requires:

  • lh: life history and starting values list output from create_lh_inputs

  • input_data: input data list described in Data input section above.

inputs_all <- create_inputs(lh = lh, input_data = data_all)

Fitting the model to our data

Per the vignettes:

LIME is set up to estimate certain parameters and fix others by default.

LIME estimates by default:

  • log_F_ft: matrix of fishing mortality estimates in log-space by fleet (rows) over time (columns)

  • log_sigma_R: recruitment standard deviation in log-space

  • log_S50_f: length at 50% selectivity in log space, one for each fleet

  • log_Sdelta_f: difference between length at 95% selectivity and length at 50% selectivity in log-space, so that length at 95% selectivity can never be less than length at 50% selectivity, one for each fleet

  • Nu_input: temporal variation in recruitment, treated as random effect

Parameters estimated by default under certain conditions:

  • log_theta: Dirichlet-multinomial parameter relating to effective sample size (only estimated if using Dirichlet-multinomial to fit to length composition data, where run_LIME argument LFdist=1)

  • beta: equilibrium recruitment in log-space, serves as population scaling parameter (only estimated if fitting to catch data, run_LIME argument data_avail must include “Catch”)

  • log_q_f: catchability coefficient in log-space, one for each fleet (only estimated if fitting to an index, run_LIME argument data_avail must include “Index”)

Other parameters that could be estimated but are fixed by default:

  • log_sigma_C: observation error on catch, fixed at log(0.2)

  • log_sigma_I: observation error on abundance index, fixed at log(0.2)

  • log_CV_L: coefficient of variation on predicted age-length curve, fixed at log(0.1)

The function run_LIME used to run LIME models has many settings, but hopefully most models will keep the defaults and use only a few arguments.

Basic inputs to run_LIME:

  • modpath: model path to save results and flags; default=NULL to save in R environment locally

  • input: list output from create_inputs

  • data_avail: what data types will LIME fit to? “LC”= length composition only, “Index_LC”= index and length comps, “Catch_LC”= catch and length comps, and “Index_Catch_LC” = index, catch, and length comps.

If fitting to catch data, the user must specify:

  • C_type: default=0 (no catch data), 1 for catch in numbers, 2 for catch in biomass.

There are many additional settings that are listed on pages 14-15 of the vignettes. Key ones include:

  • LFdist: which distribution to fit to length frequency data? default=1 to use Dirichlet-multinomial and estimate additional parameter related to effective sample size, multinomial = 0

  • est_F_ft: which F parameters to estimate? default=TRUE to estimate all. Otherwise, a matrix with fleets as rows and years along columns, with a 1 where the F parameter should be estimated and a 0 where the F parameter should be fixed at the starting value.

  • est_selex_f: estimate selectivity parameters? default=TRUE to estimate selectivity parameters for all fleets. Turn off selectivity estimation for all or a single fleet with FALSE (e.g. estimate selectivity parameters for fleet 1 but not fleet 2 with c(TRUE, FALSE))

The vignettes provide examples for data-rich, length-data-only, and length- and catch-data situations.

In our case, we are fitting to length and catch data.

This is the scenario most similar to the ideas behind catch-curve stock reduction analysis (CCSRA). CCSRA used catch and age composition data to estimate stock status. LIME can use catch and length composition, which is much easier to collect than ages. With accurate assumptions about life history information, the combination of catch and length data can be very informative of stock status.

The fitting process can take a little time (around 1-2 minutes, perhaps). Our test data took a little longer.

catch_lc <- run_LIME(modpath = NULL, input = inputs_all, data_avail = "Catch_LC", 
    C_type = 2)
## #########################
## The model is likely not converged
## #########################
## #########################
## The model is likely not converged
## #########################
## #########################
## The model is likely not converged
## #########################
## #########################
## The model is likely not converged
## #########################
## #########################
## The model is likely not converged
## #########################
## #########################
## The model is likely not converged
## #########################

We now check the convergence:

gradient <- catch_lc$opt$max_gradient <= 0.001
hessian <- catch_lc$Sdreport$pdHess
hessian == TRUE & gradient == TRUE
## [1] FALSE

Our test data returned “FALSE” indicating that convergence was not achieved. That is, the program did not find the global minimum of the negative log likelihood surface and can move on to examine results.

Given the issues with convergence, let’s now try just fitting to the length data only.

This is the scenario for which LIME was developed: integrating multiple years of length data to account for time-varying fishing mortality and recruitment. The big change from the data-rich case in the code is to make sure data_avail="LC" and C_type is set to zero or removed from the function call (because the default is zero). We can still use the inputs_all list that includes the index and catch data, but as long as data_avail="LC" the model will not include the other data types.

lc_only <- run_LIME(modpath = NULL, input = inputs_all, data_avail = "LC")
## Warning: 'TMBhelper::Optimize' is deprecated.
## Use 'fit_tmb' instead.
## See help("Deprecated")
## #########################
## The model is likely not converged
## #########################
## Warning: 'TMBhelper::Optimize' is deprecated.
## Use 'fit_tmb' instead.
## See help("Deprecated")
## #########################
## The model is likely not converged
## #########################
## Warning: 'TMBhelper::Optimize' is deprecated.
## Use 'fit_tmb' instead.
## See help("Deprecated")
## #########################
## The model is likely not converged
## #########################
## Warning: 'TMBhelper::Optimize' is deprecated.
## Use 'fit_tmb' instead.
## See help("Deprecated")
## #########################
## The model is likely not converged
## #########################
## Warning: 'TMBhelper::Optimize' is deprecated.
## Use 'fit_tmb' instead.
## See help("Deprecated")
## #########################
## The model is likely not converged
## #########################
## Warning: 'TMBhelper::Optimize' is deprecated.
## Use 'fit_tmb' instead.
## See help("Deprecated")
## #########################
## The model is likely not converged
## #########################
## Time difference of 4.020208 mins

Note the time difference is shorter than the data-rich case.

Check convergence

gradient <- lc_only$opt$max_gradient <= 0.001
hessian <- lc_only$Sdreport$pdHess
hessian == TRUE & gradient == TRUE
## [1] FALSE

The model still has not converged. That is, the program did not find the global minimum of the negative log likelihood surface.

So, we need to consider adjustments to the fitting process, as follows.

Turning off parameter estimation (if issues with non-convergence)

(This is taken directly from the vignettes)

There may be non-convergence issues (apparent from either a high final gradient or non-positive definite Hessian matrix).

Neither model using our test-data examples (length-frequency only, or length-frequency with catch data) converged.

Let us consider the case where we used both the length-frequency and catch test data.

Some first checks in these cases are to

  1. see if the Dirichlet-multinomial theta parameter is estimated too high, in which case it can be fixed to a high value and not estimated:
inputs_all$theta <- 50
check1 <- run_LIME(modpath = NULL, input = inputs_all, data_avail = "Catch_LC", 
    fix_more = "log_theta", C_type = 2)

Check convergence

gradient <- check1$opt$max_gradient <= 0.001
hessian <- check1$Sdreport$pdHess
hessian == TRUE & gradient == TRUE
## [1] TRUE

We can see that this now resolves the convergence issues experienced using our test data set.

That is, because both the hessian and gradient checks passed, we assume that the program found the global minimum of the negative log likelihood surface and can move on to examine results.

All the results can be viewed using

check1$Report

where check1 is simply the name we assigned the run_LIME command.

For a summary of all the output contained within $Report, simply enter

names(check1$Report)

Plot results

The LIME package includes functions to plot length composition model fits and other results.

LIME model fits including catch data with length composition data:

plot_LCfits(LF_df = LF_df, Inputs = check1$Inputs, Report = check1$Report)
Fits to length composition data for the LIME model run with catch and length data.

Fits to length composition data for the LIME model run with catch and length data.

LIME estimates of key population parameters:

plot_output(Inputs = check1$Inputs, Report = check1$Report, Sdreport = check1$Sdreport, 
    lh = lh, True = NULL, plot = c("Fish", "Rec", "SPR", "ML", "SB", "Selex"), 
    set_ylim = list(Fish = c(0, 0.5), SPR = c(0, 1)))
LIME estimates of key population parameters (green) compared to the truth (black) with catch and length data.

LIME estimates of key population parameters (green) compared to the truth (black) with catch and length data.

The model captures the strong increase and then decline in fishing mortality, with corresponding declines in SPR and relative spawner biomass (belwo the target and limit reference points) before a recovery.

Turning off parameter estimation (if issues with non-convergence) (continued)

If issues with non-convergence persist, other options are (noting that the examples from here on use only the length-frequency data):

  1. check if the recruitment standard deviation is estimated too high or too low, in which case it can be fixed at a reasonable value:
inputs_all$SigmaR <- 0.3
check2 <- run_LIME(modpath = NULL, input = inputs_all, data_avail = "LC", fix_more = "log_sigma_R")

(Note: this was not the case for our test data set. Running LIME with this adjustment does not overcome our test data convergence issue).

  1. Some length datasets may not be informative enough to tease apart fishing mortality, recruitment deviations, and selectivity. In these cases it may be convenient or helpful to fix selectivity at the starting values:
check3 <- run_LIME(modpath = NULL, input = inputs_all, data_avail = "LC", est_selex_f = FALSE)
plot_output(Inputs = check3$Inputs, Report = check3$Report, Sdreport = check3$Sdreport, 
    lh = lh, True = true, plot = c("Fish", "Rec", "SPR", "ML", "SB", "Selex"), 
    set_ylim = list(Fish = c(0, 0.5), SPR = c(0, 1)))

(Note: this was not relevant to our test data set, whose length data were informative. Running LIME with this adjustment does not overcome our test data convergence issue).

It is also possible to fix selectivity in a shape that is not the 2-parameter logistic curve. For example, a dome-shaped selectivity can be set up using the 2-parameter logistic function on the left side of the fully selected lengths, and a normal distribution on the right side of the fully selected lengths.

This is further explaining in the vignettes.

Finally, LIME can handle data from multiple fleets where selectivities differ between them. This is further explained in the vignettes.

Interpreting results in a management context

LIME estimates annual fishing mortality rates, lengths at 50% and 95% selectivity to the fishing gear, and the Dirichlet-multinomial parameter theta as fixed effects. Thes can all be found via the output under NAME$Report, where NAME is the name given the run_LIME command.

LIME can estimate unbiased SPR when length data are available. This can be compared to the target SPR (in our case 40%) and the ratio SPR/SPRtarget gives whether overfishing is occurring or not.

LIME does not require catch data: if no information on the scale of population size is available, recruitment will be estimated relative to average levels for an unfished population. As measures of stock status, LIME derives SPR reference points, and F30% and F40% reference points (the fishing mortality rates that would result in SPR of 30% and 40%, respectively; Clark 2002). LIME derives MSY by finding the fishing mortality rate that results in the highest yield per recruit.

When total catch data are available (thereby providing information on scale of the population size), LIME estimates equilibrium recruitment, which scales the MSY based on a per-recruit equation to a scale appropriate for the population size.

Key references

LIME Methodology

Rudd, MB and Thorson, JT. 2017. Accounting for variable recruitment and fishing mortality in length-based stock assessments for data-limited fisheries. Canadian Journal of Fisheries and Aquatic Sciences https://doi.org/10.1139/cjfas-2017-0143.

LIME Application

Other

Clark, W.G.W. 2002. F35% revisited ten years later. North American Journal of Fisheries Management 22: 251 -257.