NB Once you have installed LIME, you don't need to run the install.packages code chunk again.

According to https://github.com/merrillrudd/LIME/wiki/2---Introduction-and-installation, "the LIME package can be used in two ways: 1) simulating the expected age-structured population dynamics, length composition, catch data, and an abundance index using LIME and 2) fitting to empirical length data (at a minimum), plus any available catch and/or abundance index data, to provide an estimate of the spawning potential ratio (SPR). We focus here on the second application. LIME has been developed for data-limited fisheries, where few data are available other than a representative sample of the size structure of the vulnerable portion of the population (i.e., the catch) and an understanding of the life history of the species. LIME relaxes the equilibrium assumptions of other length-based methods by estimating annual fishing mortality and recruitment variation (among other parameters), deriving annual recruitment as a random effect. See Rudd and Thorson (2017) in the reference list for full details of the model, including simulation testing to evaluate performance across life history types, population variability scenarios, and data availability scenarios, as well as violations of the model assumptions."

The description below shows how to fit our own empirical length data using LIME. Although we describe the steps here, there are good descriptions at https://github.com/merrillrudd/LIME/wiki and https://github.com/merrillrudd/LIME/blob/master/vignettes/LIME.pdf. The latter is the full user manual, on which we base our description.

### Specifying the biological inputs and starting values

The user is required to input the following parameters to create_lh_list for simulation and estimation:

Minimum inputs: Biology

* **linf**: von Bertalanffy asymptotic length
* **vbk**: von Bertalanffy growth coefficient
* **M**: Annual natural mortality rate
* **lwa**: Length-weight scaling parameter
* **lwb**: Length-weight allometric parameter
* **M50**: Length or age at 50% maturity
* **maturity_input**: Whether M50 input is in "length" or "age"

Minimum inputs: Exploitation

* **S50**: Length or age at 50% selectivity
* **S95**: Length or age at 95% selectivity
* **selex_input**: Whether S50 and S95 are in "length" or "age"

The S50 and S95 values are used as starting values when estimating the length or age at 50% and 95% selectivity.

There are a range of other biological, exploitation and variation inputs that can be included; see pages 3-5 of https://github.com/merrillrudd/LIME/blob/master/vignettes/LIME.pdf. For our worked example, we will specify the minimum inputs, using female growth rates and the default values for non-minimum inputs.

```{r read in inputs and starting values, eval=TRUE}
lh <- create_lh_list(vbk = 0.15,
                     linf = 81.44505,
                     t0 = -1.87856856036483,
                     lwa = 0.00000244,
                     lwb = 3.34694,
                     S50 = c(3.96802434600842),
                     S95 = c(13.141251755892),
                     selex_input = "age",
                     selex_type = c("logistic"),
                     M50 = 45,
                     maturity_input = "length",
                     M = 0.20,
                     binwidth = 1,
                     CVlen = 0.1,
                     SigmaR = 0.737,
                     SigmaF = 0.3,
                     SigmaC = 0.1,
                     SigmaI = 0.1,
                     R0 = 1,
                     Frate = 0.1,
                     Fequil = 0.25,
                     qcoef = 1e-05,
                     start_ages = 0,
                     rho = 0.43,
                     theta = 10,
                     nseasons = 1,
                     nfleets = 1)
```

As per the LIME.pdf vignettes: "Now we should check out the biological parameters and selectivity we've created. Note that even if maturity and selectivity are input by length, create_lh_list converts to age and outputs both selectivity-at-age by fleet (lh$S_fa) and selectivity-at-length by fleet (lh$S_fl)".

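Before looking at the LIME output, it may help to see what these inputs imply. The following is a minimal base-R sketch, not LIME's internal code: it assumes the standard von Bertalanffy growth equation and a two-parameter logistic selectivity curve. The helper names `vb_length` and `logistic_sel` and the age range 0 to 25 are ours, chosen only for illustration.

```r
# Inputs copied from the create_lh_list() call in the worked example
linf <- 81.44505
vbk  <- 0.15
t0   <- -1.87856856036483
S50  <- 3.96802434600842   # age at 50% selectivity (selex_input = "age")
S95  <- 13.141251755892    # age at 95% selectivity

ages <- 0:25  # hypothetical age range, for illustration only

# von Bertalanffy expected length at age (assumed standard form)
vb_length <- function(age, linf, vbk, t0) {
  linf * (1 - exp(-vbk * (age - t0)))
}

# Logistic curve that equals 0.5 at s50 and 0.95 at s95 (assumed standard form)
logistic_sel <- function(x, s50, s95) {
  1 / (1 + exp(-log(19) * (x - s50) / (s95 - s50)))
}

len_at_age <- vb_length(ages, linf, vbk, t0)
sel_at_age <- logistic_sel(ages, S50, S95)

# First few ages with their expected length and selectivity
round(head(data.frame(age = ages, length = len_at_age, selectivity = sel_at_age)), 3)
```

If the assumed forms match those used by create_lh_list, the curves in lh should look similar to these hand-computed values; any difference would point to a different parameterisation rather than an error in the inputs.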
```{r plot biological and selectivity inputs, eval=TRUE}
par(mfrow=c(2,2), mar=c(4,4,3,1))
plot(lh$L_a, type="l", lwd=4, xlab="Age", ylab="Length (cm)")
plot(lh$W_a, type="l", lwd=4, xlab="Age", ylab="Weight (g)")
plot(lh$Mat_l, type="l", lwd=4, xlab="Length (cm)", ylab="Proportion mature")
# plot selectivity for the first (and only) fleet (first row)
plot(lh$S_fl[1,], type="l", lwd=4, xlab="Length (cm)", ylab="Proportion selected to gear")
```

The LIME.pdf vignettes next recommend checking the time step against predicted growth: "Particularly for short-lived fish, it is possible that an annual time step is too coarse of a time scale to capture individual growth between years. For example, if a fish grows rapidly between ages 1 and 2, it is possible the probability of being a length given age will result in a negligible probability of being certain lengths. However, it is likely those lengths will be observed, and in this case LIME will not be able to fit the data well.

"To check for this issue, we recommend plotting the probability of being in a length bin given age to make sure there is greater than negligible probability of observing all lengths.

"For tips on what to do when there is a negligible probability of observing some lengths, see the "Format data" section in the vignettes".

```{r plot probability of being in a length bin given age, eval=TRUE}
# create the matrix of probability of being in a length bin given age
plba <- with(lh, age_length(highs, lows, L_a, CVlen))

# one colour per age, running from purple to orange
ramp <- colorRamp(c("purple4", "darkorange"))
col_vec <- rgb(ramp(seq(0, 1, length.out = nrow(plba))), maxColorValue = 255)

par(mfrow=c(1,1))
matplot(t(plba[-1,]), type="l", lty=1, lwd=3, col=col_vec, xaxs="i", yaxs="i",
        ylim=c(0, 0.5), xlab="Length bin (cm)", ylab="Density")
legend("topright", legend=lh$ages[seq(2,length(lh$ages),by=3)],
       col=col_vec[seq(2,length(lh$ages),by=3)], lwd=3, title="Age")
```

### Data inputs to LIME

LIME contains functions to simulate a population and generate data.
These are detailed in the vignettes. Here, we are using our own test dataset, which needs to be appropriately formatted for LIME.

#### Length composition data

Length-composition data can be formatted as a matrix, or, where there are multiple fleets, as an array or list. As our test data have only one fleet, we format the length-composition data as a matrix, with the years along the rows and length bins along the columns. Let us now read in these length-composition data. The rows of the length data matrix must be labeled with the years (either 1971, 1972, etc. or 1, 2, etc.) and the columns must be labeled with the midpoints of the length bins. We include code to format the matrix in this manner.

In our case, we have multiple years of length-frequency data with headers. These data have the first column as the year (n=31), the next 3 columns as the fleet, sex and Stage1_wght, and then the next 17 columns as 17 length bins. We need to remove the first 4 columns. We do this by reading in our data as "temp" and reformatting it before assigning it to the required "LenFreq".

NB We are assuming that you have saved our test data in the same directory as the R Markdown file and that you have opened a new R session. Otherwise, you can change our code to point to the directory where the data file is, or make that directory your working directory using the setwd command.
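As a minimal sketch of the target layout, the following builds a small length-composition matrix from made-up counts (not the test data). The object names `toy` and `toy_LF`, the three length bins, and the bin midpoints are all hypothetical; only the structure (four leading columns dropped, rows labelled with years, columns labelled with bin midpoints) follows the description above.

```r
# Made-up raw table in the described layout: year, fleet, sex, Stage1_wght,
# then one column per length bin (3 bins here instead of 17)
toy <- data.frame(
  Year = 1971:1973,
  Fleet = 1,
  Sex = "F",
  Stage1_wght = 1,
  bin1 = c(4, 6, 7),
  bin2 = c(10, 9, 12),
  bin3 = c(3, 2, 5)
)

toy_LF <- as.matrix(toy[, 5:7])          # drop the four leading columns
rownames(toy_LF) <- toy$Year             # rows labelled with the years
colnames(toy_LF) <- c(12.5, 17.5, 22.5)  # hypothetical bin midpoints (cm)

toy_LF
```

The result is a numeric matrix with one row per year and one column per length bin, which is the shape described for a single-fleet data set.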

```{r Create_LengthData, echo=TRUE}
temp <- read.csv("LengthData.csv", header = TRUE)

### assign the first column to be years, and columns 5:21 as the length-frequency data
years <- temp[, 1]
LenFreq <- as.matrix(temp[, 5:21])
rownames(LenFreq) <- years

## adjust with the true midpoints
mids <- seq(10, by = 5, length.out = ncol(LenFreq))
colnames(LenFreq) <- mids
```

#### Catch index (NB LIME can also use abundance data, if available - see vignettes)

Our test data also include a time series of catch, which LIME can use. This should be in matrix form, with fleets along the rows and years along the columns. With one fleet, the time series should still be in matrix form rather than a vector (the model will look for the first row, for the one and only fleet).

Note that the length-frequency example data are from 1971-2001. The number of years of catch data should match the total number of years modelled, so we need only the catch data from 1971-2001. (That is, for our example data, we only need rows 12-42.) We have observed catch-by-age data; we will read these in and sum over the ages.

Note that the catch data columns must be labelled with the years. Years must match up across the catch and length-frequency data sets. If there are missing years in the length data, all length bins can be filled with zeros (they are not registered as true zeros, but a row summing to 0 is a flag not to fit to that row). If there are missing years in the catch data, a negative value can be inserted for that year.

We again assume you have saved the data to the same directory as your working directory.

```{r Create_CatchData, echo=TRUE}
temp <- read.csv("TotalCatchAge.csv", header = TRUE)

# sum over ages within each year (rows 12:42 = 1971-2001); t() gives a 1 x 31 matrix
Catch <- as.matrix(t(apply(temp[12:42, ], 1, sum))) ##MR edit
colnames(Catch) <- years
```

#### Data input

##### Observations

LIME requires inputting observed data in list form, including:

- years: inclusive years to be modeled
- LF: length frequency data in matrix, array, or list form
- neff_ft: effective sample size to use in case of multinomial distribution (otherwise use Dirichlet multinomial to estimate effective sample size, or assume nominal sample size = effective sample size if not included)
- I_ft: Index of abundance by fleet and time (if applicable)
- C_ft: Catch by fleet and time (if applicable) in numbers or biomass (to be specified later)

We don't have sample sizes for our test data set, but this is okay: under the default run_LIME arguments, the effective sample size is estimated using the Dirichlet-multinomial likelihood, so you don't need to specify them. The input sample sizes are only used with the multinomial likelihood.

```{r DataInput, echo=TRUE}
data_all <- list(years = years, LF = LenFreq, C_ft = Catch)
```

##### Life history and starting values

LIME includes a function create_inputs, which checks the length frequency data and combines all input data, life history information, and starting values in a single list to input into LIME. create_inputs requires:

- lh: life history and starting values list output from create_lh_list
- input_data: input data list described in the Data input section above

```{r Create Inputs, echo=TRUE}
inputs_all <- create_inputs(lh = lh, input_data = data_all)
```

### Fitting the model to our data

Per the vignettes: LIME is set up to estimate certain parameters and fix others by default.
LIME estimates by default:

- log_F_ft: matrix of fishing mortality estimates in log-space by fleet (rows) over time (columns)
- log_sigma_R: recruitment standard deviation in log-space
- log_S50_f: length at 50% selectivity in log space, one for each fleet
- log_Sdelta_f: difference between length at 95% selectivity and length at 50% selectivity in log-space, so that length at 95% selectivity can never be less than length at 50% selectivity, one for each fleet
- Nu_input: temporal variation in recruitment, treated as random effect

Parameters estimated by default under certain conditions:

- log_theta: Dirichlet-multinomial parameter relating to effective sample size (only estimated if using Dirichlet-multinomial to fit to length composition data, where run_LIME argument LFdist=1)
- beta: equilibrium recruitment in log-space, serves as population scaling parameter (only estimated if fitting to catch data, run_LIME argument data_avail must include "Catch")
- log_q_f: catchability coefficient in log-space, one for each fleet (only estimated if fitting to an index, run_LIME argument data_avail must include "Index")

Other parameters that could be estimated but are fixed by default:

- log_sigma_C: observation error on catch, fixed at log(0.2)
- log_sigma_I: observation error on abundance index, fixed at log(0.2)
- log_CV_L: coefficient of variation on predicted age-length curve, fixed at log(0.1)

The function run_LIME used to run LIME models has many settings, but hopefully most models will keep the defaults and use only a few arguments.

Basic inputs to run_LIME:

- modpath: model path to save results and flags; default=NULL to save in R environment locally
- input: list output from create_inputs
- data_avail: what data types will LIME fit to? "LC"= length composition only, "Index_LC"= index and length comps, "Catch_LC"= catch and length comps, and "Index_Catch_LC" = index, catch, and length comps.

If fitting to catch data, the user must specify:

- C_type: default=0 (no catch data), 1 for catch in numbers, 2 for catch in biomass.

There are many additional settings that are listed on pages 14-15 of the vignettes. Key ones include:

- LFdist: which distribution to fit to length frequency data? default=1 to use Dirichlet-multinomial and estimate additional parameter related to effective sample size, multinomial = 0
- est_F_ft: which F parameters to estimate? default=TRUE to estimate all. Otherwise, a matrix with fleets as rows and years along columns, with a 1 where the F parameter should be estimated and a 0 where the F parameter should be fixed at the starting value.
- est_selex_f: estimate selectivity parameters? default=TRUE to estimate selectivity parameters for all fleets. Turn off selectivity estimation for all or a single fleet with FALSE (e.g. estimate selectivity parameters for fleet 1 but not fleet 2 with c(TRUE, FALSE))

The vignettes provide examples for data-rich, length-data-only, and length- and catch-data situations.
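
To make the matrix form of est_F_ft concrete, here is a minimal base-R sketch for the one-fleet, 1971-2001 case. The object names `model_years` and `est_F`, and the choice to fix F in the first five years, are hypothetical and purely illustrative. The dimnames are added for readability; whether run_LIME uses or requires them is not stated in the text above.

```r
n_fleets <- 1
model_years <- 1971:2001   # the 31 modelled years in the worked example

# Start from "estimate every F parameter": a 1 in every cell,
# fleets along the rows and years along the columns
est_F <- matrix(1, nrow = n_fleets, ncol = length(model_years),
                dimnames = list(paste0("fleet", seq_len(n_fleets)),
                                as.character(model_years)))

# Hypothetical choice: hold F at its starting value for 1971-1975
est_F[1, as.character(1971:1975)] <- 0

# Inspect the first eight years
est_F[, 1:8, drop = FALSE]
```

A matrix of this shape would be supplied through the est_F_ft argument in place of the default TRUE.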