# LIME: Length-based Integrated Mixed Effects

## Package Description

Developed by Rudd and Thorson (2017), Length-based Integrated Mixed Effects (LIME) is a flexible stock assessment method for fisheries with developing data collection programs and/or limited capacity for monitoring. For these fisheries, it is often easier to collect length measurements from fishery catch than to quantify total catch.

Conventional stock assessment tools that rely on length measurements without total catch do not directly account for variable fishing mortality and recruitment over time. However, this equilibrium assumption is likely violated in almost every fishery, degrading estimation performance. LIME is an extension of length-only approaches that accounts for time-varying recruitment and fishing mortality.

The LIME model uses length-compostion data and biological information to estimate stock status. Key attributes of LIME include:

• Accounting for time-varying fishing mortality and recruitment

• Requirement of at least 1 year of catch length-composition data, and assumptions about growth, natural mortality, and maturity

• Estimation of annual fishing mortality, lengths-at-50%- and -95%-selectivity, and recruitment variation

• Estimation of random effects for time-varying recruitment

• Fitting to multiple years of length-composition data and/or catch and/or an abundance index time series, if available.

• Estimation of spawning potential ratio (SPR) reference points (and MSY-based reference points if there is information on scale, e.g. catch data)

LIME requires at least one year of length-composition data and assumptions about biological parameters. It can also fit to more years of length-composition data, a time series of catch, and/or an abundance index. The mixed effects structure accounts for time-variation in fishing mortality and recruitment, making it more effective than conventional stock assessment tools that rely only on length measurements.

This package is run from R as found on CRAN. R can be installed from https://cran.r-project.org/.

The recommended way to run R is using RStudio. After you have installed R, then install RStudio. It can be found at https://www.rstudio.com/products/rstudio/download/.

# Installation

The website https://github.com/merrillrudd/LIME/wiki/2---Introduction-and-installation provides instructions on package installation.

Let us set up checking/installing packages functions

# Ordinary checking/installing function
Check_Install.packages <- function(pkg){
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg))
install.packages(new.pkg,dependencies = TRUE) else print(paste0("'",pkg,"' has been installed already!"))
sapply(pkg, require, character.only = TRUE)
}

# github checking/installing function
Github_checkInstall.packages <- function(github_pkg){
pkg <- tail(unlist(strsplit(github_pkg,"/")),1)
new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
if (length(new.pkg))
devtools::install_github(github_pkg, build_vignettes = TRUE)
sapply(pkg, require, character.only = TRUE)
}

Installing packages from GitHub directly. NB Update all your packages before installing

Check and install “devtools”

Check_Install.packages("devtools")

Check and install “TMBhelper” and “LIME” from github

# Check if existed and if not, install the required package TMBhelper:
Github_checkInstall.packages(github_pkg=github_pkg)
## Loading required package: TMBhelper
## TMBhelper
##      TRUE
# Check if existed and if not, install the required package LIME:
github_pkg = c("merrillrudd/LIME")
Github_checkInstall.packages(github_pkg=github_pkg)
## Loading required package: LIME
## Warning in checkMatrixPackageVersion(): Package version inconsistency detected.
## TMB was built with Matrix version 1.2.18
## Current Matrix version is 1.3.2
## Please re-install 'TMB' from source using install.packages('TMB', type = 'source') or ask CRAN for a binary version of 'TMB' matching CRAN's 'Matrix' package
## LIME
## TRUE
#devtools::install_github("merrillrudd/LIME",dependencies=TRUE)   

Load the LIME and tidyverse packages. You may have to install the tidyverse package from “Tools -> Install Pacakages”:

library(LIME)
library(tidyverse)
## Warning: package 'tidyverse' was built under R version 4.0.3
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.3     v purrr   0.3.4
## v tibble  3.0.6     v dplyr   1.0.4
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## Warning: package 'ggplot2' was built under R version 4.0.3
## Warning: package 'tibble' was built under R version 4.0.3
## Warning: package 'readr' was built under R version 4.0.3
## Warning: package 'dplyr' was built under R version 4.0.3
## Warning: package 'forcats' was built under R version 4.0.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::lag()    masks stats::lag()
library(formatR)

NB Once you have installed LIME, you don’t need to run the install.packages code chunk again.

At https://github.com/merrillrudd/LIME/wiki/2---Introduction-and-installation, it is stated that "the LIME package can be used in two ways: 1) simulating the expected age-structured population dynamics, length composition, catch data, and an abundance index using LIME and 2) fitting to empirical length data (at a minimum), plus any available catch and/or abundance index data, to provide an estimate of the spawning potential ratio (SPR).

We focus here on the second application.

LIME has been developed for data-limited fisheries, where few data are available other than a representative sample of the size structure of the vulnerable portion of the population (i.e., the catch) and an understanding of the life history of the species. LIME relaxes the equilibrium assumptions of other length-based methods by estimating annual fishing mortality and recruitment variation (among other parameters), modelling annual recruitment as a random effect.

See Rudd and Thorson (2017) in the reference list for full details of the model, including simulation testing to evaluate performance across life history types, population variability scenarios, and data availability scenarios, as well as violations of the model assumptions.

The description below is to fit our empirical length-compostion data using LIME. Although we describe the steps below, there are good descriptions in

https://github.com/merrillrudd/LIME/blob/master/vignettes/LIME.pdf

The latter is the full user manual, on which we are basing our description.

### Specifying the biological inputs and starting values:

The user is required to input the following parameters to create_lh_list for simulation and estimation:

Minimum inputs: Biology * linf : von Bertalanffy asymptotic length vbk : von Bertalanffy growth coefficient M : Annual natural mortality rate lwa : Length-weight scaling parameter lwb : Length-weight allometric parameter M50 : Length- or age-at-50% maturity maturity_input *: Whether M50 input is in “length” or “age”

Minimum inputs: Exploitation * S50 : Length- or age-at-50%-selectivity S95 : Length- or age-at-95%-selectivity selex_input *: Whether S50 and S95 are in “length” or “age”

The S50 and S95 values are used as starting values when estimating the length- or age-at-50%- and 95%-selectivity.

There are a range of other biological, exploitation and variation inputs that can be included; see pages 3-5 of https://github.com/merrillrudd/LIME/blob/master/vignettes/LIME.pdf.

For our worked example, we will specify the minimum inputs, using female growth rates and the default values for non-minimum inputs.

Set up inputs and starting valus:

lh <- create_lh_list(vbk = 0.15, linf = 81.44505, t0 = -1.87856856036483, lwa = 0.00000244, lwb = 3.34694,
S50 = c(3.96802434600842), S95 = c(13.141251755892), selex_input = "age", selex_type = c("logistic"),
M50 = 45, maturity_input = "length", M = 0.20, binwidth = 1, CVlen = 0.1,
SigmaR = 0.737, SigmaF = 0.3, SigmaC = 0.1, SigmaI = 0.1, R0 = 1, Frate = 0.1,
Fequil = 0.25, qcoef = 1e-05, start_ages = 0, rho = 0.43, theta = 10, nseasons = 1,
nfleets = 1)

As per the LIME.pdf vignettes:

“Now we should check out the biological parameters and selectivity we’ve created. Note that even if maturity and selectivity are input by length, create_lh_list converts to age and outputs both selectivity-at-age by fleet (lh$$S_fa</em>) and selectivity-at-length by fleet (<em>lh$$S_fl)”.

plot biological and selectivity inputs:

par(mfrow=c(2,2), mar=c(4,4,3,1))
plot(lh$L_a, type="l", lwd=4, xlab="Age", ylab="Length (cm)") plot(lh$W_a, type="l", lwd=4, xlab="Age", ylab="Weight (g)")
plot(lh$Mat_l, type="l", lwd=4, xlab="Length (cm)", ylab="Proportion mature") # plot selectivity for the first (and only) fleet (first row) plot(lh$S_fl[1,], type="l", lwd=4, xlab="Length (cm)", ylab="Proportion selected to gear")

The LIME.pdf vignettes next recommends checking the time step regarding predicted growth:

"Particularly for short-lived fish, it is possible that an annual time step is too coarse of a time scale to capture individual growth between years. For example, if a fish grows rapidly between ages 1 and 2, it is possible the probability of being a length given age will result in a negligible probability of being certain lengths. However, it is likely those lengths will be observed, and in this case LIME will not be able to fit the data well.

"To check for this issue, we recommend plotting the probability of being in a length bin given age to make sure there is greater than negligible probability of observing all lengths.

“For tips on what to do when there is a negligible probability of observing some lengths, see the”Format data" section in the vignettes".

plot probability of being in a length bin given age:

plba <- with(lh, age_length(highs, lows, L_a, CVlen)) # create the matrix of probability of being in a length bin given age
ramp <- colorRamp(c("purple4", "darkorange"))
col_vec <- rgb( ramp(seq(0, 1, length = nrow(plba))), max = 255)
par(mfrow=c(1,1))
matplot(t(plba[-1,]), type="l", lty=1, lwd=3, col=col_vec, xaxs="i", yaxs="i", ylim=c(0, 0.5), xlab="Length bin (cm)", ylab="Density")
legend("topright", legend=lh$ages[seq(2,length(lh$ages),by=3)], col=col_vec[seq(2,length(lh$ages),by=3)],, lwd=3, title="Age") ### Data inputs to LIME LIME contains functions to simulate a population and generate data. These are detailed in the vignettes. Here, we are using our own test dataset. This needs to be appropriately formatted for LIME. #### Length composition data Length-composition data can be formatted as a matrix, or, where there are multiple fleets, as an array or list. As our test data has only one fleet, we format the length-composition data as a matrix, with the years as the rows and length bins as the columns. Let us now read in these length-composition data. The rows of the length data matrix must be labeled with the years (either 1971, 1972, etc. or 1, 2, etc.) and the columns must be labeled with the midpoints of the length bins. We include code to format the matrix in this manner. In our case, we have multiple years of length-composition data with headers. These data have the first column as the year (n=31), the next 3 columns as the fleet, sex and Stage1_wght, and then the next 17 columns as 17 length bins. We will need to remove the first 4 columns. We will do this by reading in our data as “temp”, and reformatting this before assigning it to the required “Len2”. NB We are assuming that you have saved our test data in the same directory as the R markdown code and that you have opened a new R session. Otherwise you can change our code to point to the directory where the data file is and call it your working directory using the setwd command. Create length data: temp <- read.csv("LengthData.csv",header=T) ### assign the first column to be years, and columns 5:21 as the length-frequency data years <- temp[,1] LenFreq <- as.matrix(temp[,5:21]) rownames(LenFreq) <- years ## adjust with the true midpoints mids <- seq(10, by = 5, length.out = ncol(LenFreq)) colnames(LenFreq) <- mids #### Catch index (NB LIME can also use abundance data, if available - see vignettes) Our test data also includes a time series of catch, which LIME can use. These data should be in matrix form, with fleets as the the rows and years as the columns. With one fleet, the time series should still be in matrix form rather than a vector (the model will look for the first row, for the one and only fleet). Note that the length-frequency example data are from 1971-2001. The number of years of catch data should match the total number of years modelled, so we will need only the catch data from 1971-2001. (That is, for our example data, we only need rows 12-42). We have observed catch-by-age data; we will read this in and sum over the ages. Note that the catch data columns must be labelled with the years. Years must match up across the catch and length-frequency data sets. If there are missing years in the length data, all length bins can be filled with zeros (they are not registered as true zeros, but a row summing to 0 is a flag not to fit to that row). If there are missing years in the catch data, a negative value can be inserted for that year. We again assume you have saved the data to the same directory as your working directory. Create catch data: temp <- read.csv("TotalCatchAge.csv",header=T) Catch <- as.matrix(t(apply(temp[12:42,],1,sum))) ##MR edit colnames(Catch) <- years  #### Data input Observations LIME requires inputing observed data in a list form, including: . years: inclusive years to be modeled . LF: length-frequency data in matrix, array, or list form . neff_ft: effective sample size to use in case of the multinomial distribution (otherwise use Dirichlet multinomial to estimate effective sample size, or assume nominal sample size = effective sample size if not included) . I_ft: Index of abundance by fleet and time (if applicable) . C_ft: Catch by fleet and time (if applicable) in numbers or biomass (to be specified later) We don’t have sample sizes for our test data set, but this is okay - under default run_LIME arguments, the effective sample size is estimated using the Dirichlet-multinomial likelihood so you don’t need to specify them. The input sample sizes are only used with the multinomial likelihood. Formatting the model input data type: data_all <- list(years = years, LF = LenFreq, C_ft = Catch) Life history and starting values LIME includes a function create_inputs that checks the length-frequency data and includes all input data, life history information, and starting values in a list together to input into LIME. create_inputs requires: • lh: life history and starting values list output from create_lh_inputs • input_data: input data list described in Data input section above. inputs_all <- create_inputs(lh = lh, input_data = data_all) ### Fitting the model to our data Per the vignettes: LIME is set up to estimate certain parameters and fix others by default. LIME estimates by default: • log_F_ft: matrix of fishing mortality estimates in log-space by fleet (rows) over time (columns) • log_sigma_R: recruitment standard deviation in log-space • log_S50_f: length-at-50%-selectivity in log space, one for each fleet • log_Sdelta_f: difference between the length-at-95% selectivity and the length-at-50%-selectivity in log-space, so that the length-at-95%-selectivity can never be less than the length-at-50%- selectivity, one for each fleet • Nu_input: temporal variation in recruitment, treated as random effect Parameters estimated by default under certain conditions: • log_theta: Dirichlet-multinomial parameter relating to effective sample size (only estimated if using Dirichlet-multinomial to fit to length composition data, where run_LIME argument LFdist=1) • beta: equilibrium recruitment in log-space, serves as population scaling parameter (only estimated if fitting to catch data, run_LIME argument data_avail must include “Catch”) • log_q_f: catchability coefficient in log-space, one for each fleet (only estimated if fitting to an index, run_LIME argument data_avail must include “Index”) Other parameters that could be estimated but are fixed by default: • log_sigma_C: observation error on catch, fixed at log(0.2) • log_sigma_I: observation error on abundance index, fixed at log(0.2) • log_CV_L: coefficient of variation on predicted age-length curve, fixed at log(0.1) The function run_LIME used to run LIME models has many settings, but hopefully most models will keep the defaults and use only a few arguments. Basic inputs to run_LIME: • modpath: model path to save results and flags; default=NULL to save in R environment locally • input: list output from create_inputs • data_avail: what data types will LIME fit to? “LC”= length composition only, “Index_LC”= index and length comps, “Catch_LC”= catch and length comps, and “Index_Catch_LC” = index, catch, and length comps. If fitting to catch data, the user must specify: • C_type: default=0 (no catch data), 1 for catch in numbers, 2 for catch in biomass. There are many additional settings that are listed on pages 14-15 of the vignettes. Key ones include: • LFdist: which distribution to fit to length frequency data? default=1 to use Dirichlet-multinomial and estimate additional parameter related to effective sample size, multinomial = 0 • est_F_ft: which F parameters to estimate? default=TRUE to estimate all. Otherwise, a matrix with fleets as rows and years along columns, with a 1 where the F parameter should be estimated and a 0 where the F parameter should be fixed at the starting value. • est_selex_f: estimate selectivity parameters? default=TRUE to estimate selectivity parameters for all fleets. Turn off selectivity estimation for all or a single fleet with FALSE (e.g. estimate selectivity parameters for fleet 1 but not fleet 2 with c(TRUE, FALSE)) The vignettes provide examples for data-rich, length-data-only, and length- and catch-data situations. In our case, we are fitting to length and catch data. This is the scenario most similar to the ideas behind catch-curve stock reduction analysis (CCSRA). CCSRA uses catch and age-composition data to estimate stock status. LIME can use catch and length-composition data, which are much easier to collect than ages. With accurate assumptions about life history information, the combination of catch and length data can be very informative of stock status. The fitting process can take a little time (around 1-2 minutes, perhaps). Our test data took a little longer. catch_lc <- run_LIME(modpath = NULL, input = inputs_all, data_avail = "Catch_LC", C_type = 2) ## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present ## ######################### ## The model is likely not converged ## ######################### ## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present ## ######################### ## The model is likely not converged ## ######################### ## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present ## ######################### ## The model is likely not converged ## ######################### ## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present ## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present ## ######################### ## The model is likely not converged ## ######################### ## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present ## ######################### ## The model is likely not converged ## ######################### #### We now check the convergence: gradient <- catch_lc$opt$max_gradient <= 0.001 hessian <- catch_lc$Sdreport$pdHess hessian == TRUE & gradient == TRUE ## [1] FALSE Our test data returned “FALSE” indicating that convergence was not achieved. That is, the program did not find the global minimum of the negative log likelihood surface and can move on to examine results. Given the issues with convergence, let’s now try just fitting to the length data only. This is the scenario for which LIME was developed: integrating multiple years of length data to account for time-varying fishing mortality and recruitment. The big change from the data-rich case in the code is to make sure data_avail="LC" and C_type is set to zero or removed from the function call (because the default is zero). We can still use the inputs_all list that includes the index and catch data, but as long as data_avail="LC" the model will not include the other data types. lc_only <- run_LIME(modpath = NULL, input = inputs_all, data_avail = "LC") ## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present ## ######################### ## The model is likely not converged ## ######################### ## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present ## ######################### ## The model is likely not converged ## ######################### ## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present ## ######################### ## The model is likely not converged ## ######################### ## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present ## ######################### ## The model is likely not converged ## ######################### ## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present ## ######################### ## The model is likely not converged ## ######################### ## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present ## ######################### ## The model is likely not converged ## ######################### ## Time difference of 2.906552 mins Note the time difference is shorter than the data-rich case. #### Check convergence gradient <- lc_only$opt$max_gradient <= 0.001 hessian <- lc_only$Sdreport$pdHess hessian == TRUE & gradient == TRUE ## [1] FALSE The model still has not converged. That is, the program did not find the global minimum of the negative log likelihood surface. So, we need to consider adjustments to the fitting process, as follows. ### Turning off parameter estimation (if issues with non-convergence) (This is taken directly from the vignettes) There may be non-convergence issues (apparent from either a high final gradient or non-positive definite Hessian matrix). Neither model using our test-data examples (length-frequency only, or length-frequency with catch data) converged. Let us consider the case where we used both the length-frequency and catch test data. Some first checks in these cases are to 1. see if the Dirichlet-multinomial theta parameter is estimated too high, in which case it can be fixed to a high value and not estimated: inputs_all$theta <- 50
check1 <- run_LIME(modpath = NULL, input = inputs_all, data_avail = "Catch_LC", fix_more = "log_theta",
C_type = 2)
## Note that getReportCovariance=FALSE causes an error in TMB::sdreport when no ADREPORTed variables are present

#### Check convergence

gradient <- check1$opt$max_gradient <= 0.001
hessian <- check1$Sdreport$pdHess
hessian == TRUE & gradient == TRUE
## [1] TRUE

We can see that this now resolves the convergence issues experienced using our test data set.

That is, because both the hessian and gradient checks passed, we assume that the program found the global minimum of the negative log likelihood surface and can move on to examine results.

All the results can be viewed using

check1$Report where check1 is the name we assigned the run_LIME command. For a summary of all the output contained within$Report, simply enter

names(check1$Report) ### Plot results The LIME package includes functions to plot fits of the model to the length-composition data and other results. LIME model fits including catch data with length-composition data: plot_LCfits(LF_df = LF_df, Inputs = check1$Inputs, Report = check1$Report) LIME estimates of key population parameters: plot_output(Inputs = check1$Inputs, Report = check1$Report, Sdreport = check1$Sdreport,
lh = lh, True = NULL, plot = c("Fish", "Rec", "SPR", "ML", "SB", "Selex"), set_ylim = list(Fish = c(0,
0.5), SPR = c(0, 1)))

The model captures the strong increase and then decline in fishing mortality, with corresponding declines in SPR and relative spawner biomass (belwo the target and limit reference points) before a recovery.

### Turning off parameter estimation (if issues with non-convergence) (continued)

If issues with non-convergence persist, other options are (noting that the examples from here on use only the length-frequency data):

1. check if the recruitment standard deviation is estimated too high or too low, in which case it can be fixed at a reasonable value:
inputs_all$SigmaR <- 0.3 check2 <- run_LIME(modpath = NULL, input = inputs_all, data_avail = "LC", fix_more = "log_sigma_R") (Note: this was not the case for our test data set. Running LIME with this adjustment did not overcome our test data convergence issue). 1. Some length datasets may not be informative enough to tease apart fishing mortality, recruitment deviations, and selectivity. In these cases it may be convenient or helpful to fix selectivity at the starting values: check3 <- run_LIME(modpath = NULL, input = inputs_all, data_avail = "LC", est_selex_f = FALSE) plot_output(Inputs = check3$Inputs, Report = check3$Report, Sdreport = check3$Sdreport,
lh = lh, True = true, plot = c("Fish", "Rec", "SPR", "ML", "SB", "Selex"), set_ylim = list(Fish = c(0,
0.5), SPR = c(0, 1)))

(Note: this was not relevant to our test data set, whose length data were informative. Running LIME with this adjustment does not overcome our test data convergence issue).

It is also possible to fix selectivity in a shape that is not the 2-parameter logistic curve. For example, a dome-shaped selectivity can be set up using the 2-parameter logistic function on the left side of the fully selected lengths, and a normal distribution on the right side of the fully selected lengths.

This is further explaining in the vignettes.

Finally, LIME can handle data from multiple fleets where selectivities differ between them. This is further explained in the vignettes.

### Interpreting results in a management context

LIME estimates annual fishing mortality rates, lengths at 50%- and 95%-selectivity to the fishing gear, and the Dirichlet-multinomial parameter theta as fixed effects. Thes can all be found via the output under NAME\$Report, where NAME is the name given the run_LIME command.

LIME can estimate unbiased SPR when length data are available. This can be compared to the target SPR (in our case 40%) and the ratio SPR/SPRtarget gives whether overfishing is occurring or not.

LIME does not require catch data: if no information on the scale of population size is available, recruitment will be estimated relative to average levels for an unfished population. As measures of stock status, LIME derives SPR reference points, and F30% and F40% reference points (the fishing mortality rates that would result in SPR of 30% and 40%, respectively; Clark 2002). LIME derives MSY by finding the fishing mortality rate that results in the highest yield-per-recruit.

When total catch data are available (thereby providing information on scale of the population size), LIME estimates equilibrium recruitment, which scales the MSY based on a per-recruit equation to a scale appropriate for the population size.

## Key references

### LIME Methodology

Rudd, M.B. and J.T. Thorson. 2018. Accounting for variable recruitment and fishing mortality in length-based stock assessments for data-limited fisheries. Canadian Journal of Fisheries and Aquatic Sciences.75(7):1019-1035. https://doi.org/10.1139/cjfas-2017-0143

### Other

Clark, W.Clark, W.G. 2002. F35% revised ten years later. em>North American Journal Fisheries Management. https://doi.org/10.1577/1548-8675(2002)022<0251:FRTYL>2.0.CO;2.