SCRS/2008/166

USING DELTA-GAMMA GENERALIZED LINEAR MODELS TO STANDARDIZE CATCH RATES OF YELLOWFIN TUNA CAUGHT BY BRAZILIAN BAIT-BOATS

Humber A. Andrade 1

SUMMARY

Catch per unit effort (CPUE) data for yellowfin tuna caught by Brazilian bait-boats, as reported in the Task II database, were analyzed during the assessment meeting for tropical tunas in order to calculate standardized catch rates. The models and results gathered during that exercise are described in this document. I have used generalized linear models (GLMs) to estimate standardized CPUE. Gamma density distributions were used to model the positive catches, while binomial distributions were used to model the proportion of positive catches. The factors included in the GLMs were area, quarter and year. Two databases were analyzed, namely complete and set by set. In the latter, fishing days are reported one by one, while the complete database includes all available data, no matter whether effort and catch were summed up by trip, fleet, month or area. Standardized catch rates calculated using both databases showed slight increasing trends. Because data on some potentially important explanatory variables (e.g. the amount of bait released) are not available, those estimates are not ideal relative abundance indices.

KEYWORDS

Catch/effort, Fishery statistics, Abundance

1 UFSC INE/CTC, Caixa Postal 476, 88010-970, Florianópolis-SC, Brazil. humber@inf.ufsc.br
1. Introduction

Catch rates may be used as relative abundance indices in fish stock assessments insofar as they reflect the biomass of the stock. In most papers the relationship between the catch rate index (I) and the biomass (B) at time t is assumed to be linear (I_t = q.B_t). The proportionality between the index and the biomass is represented by the catchability coefficient q. If q is constant across time, or if we can estimate it, the index I_t will be useful to infer biomass trends. Usually catch rates are analyzed in order to obtain quantitative indices affected by biomass changes across the years, but not by variations of q. Those indices are usually denominated standardized catch rates, and ideally they should reflect only biomass variations across time.

Generalized linear models (GLM) have often been used to standardize commercial catch rates in order to obtain relative abundance indices (Maunder and Punt, 2004). Hereafter the commercial catch rate is denominated catch per unit effort (CPUE). In the GLM framework a variable (i.e. CPUE) assumed to follow a probability distribution of the exponential family may be modeled as the response to several explanatory variables, which may be qualitative (factors) or quantitative (covariates) (Dobson, 2002). Some transformation of the response variable may be necessary in order not to violate assumptions about the probability distributions used in the analysis. The inclusion of year, besides other optional explanatory factors (e.g. area), is mandatory because it stands for the biomass variations across the years. The effects of the other factors should be removed after they are calculated.

If the abundance of the stock is not too small and/or if the fishing effort is very effective, few catches will be zero. In the Brazilian bait-boat fishery fishermen search for skipjack schools at the surface. Once a school is found, the skipjack catch is rarely zero.
In contrast, yellowfin catches are often zero because fishermen do not target that species. Most of the eventual positive catches probably occur when yellowfin is found mixed with skipjack. Schools of yellowfin are probably rare. When all catches are positive, or when catches equal to zero are rare, models for positive data only may be used to estimate standardized indices (for the skipjack case see Andrade, 2008). Nevertheless, if the amount of zeros is large, as in the yellowfin case, they must be taken into account. Difficulties may arise because some of the transformations usually required (e.g. logarithm) are not applicable to values equal to zero. Furthermore, zero values are not allowed in some alternative distributions, like the potentially useful and flexible gamma density distribution.

The approach often adopted to cope with zero catches is to model the positive catches and the proportion of positive catches separately. For example, normal or gamma distributions may be used to model positive catch rates, while the binomial distribution is the alternative for the proportion of positive catches. The estimates gathered using the two models may then be combined in order to calculate a standardized catch rate for each year. Those combined models are usually denominated delta-X models, where X is the distribution used in the analysis of the positive data. In most papers the delta-lognormal model is selected in advance. Indeed, the delta-lognormal model is one of the best alternatives when analyzing fishery datasets (Ortiz and Arocha, 2004).

Extracting the effects of the levels of year may be difficult if that factor is involved in interactions. In addition, if the proportion of positives and the positive catch variables are dependent, even more complex mathematical derivations may be necessary to obtain an analytic solution. Two simplifications used to cope with those two technical difficulties (Andrade, 2008) are: a) average over the effects of nuisance factors (e.g.
area) in order to extract the effects of year; and b) assume that the positive catches and the proportion of positive catches are independent variables. In this paper I have used GLMs and those simplifications to estimate standardized catch rates for yellowfin tuna caught by the Brazilian bait-boat fleet.

2. Database Overview

Catch and effort of the Brazilian bait-boat fleet as reported in the Task II ICCAT database were analyzed in order to standardize catch rates. Descriptions of that database, concerning changes in the fleet across the years and eventual gaps in the data, are in Andrade (2008). A summary of the relevant information follows. The database
contains information about the national fleet, but part of the vessels were formerly leased boats. There are data for the period 1981 through 2006, with a gap in 2000. The unit of effort for the bait-boat fleet is the fishing day, while catches are in weight. In most of the database entries the effort is larger than one fishing day, hence the data are aggregated. Hereafter database entries with effort equal to one are denominated set by set. All information, including aggregated and set by set data entries, is denominated the complete database.

The number of reports was larger than 140 from the early 1980's to the mid 1990's (Figure 1). Set by set entries account for approximately 20% of the reports through 1997, but they were rare after 2001. The proportion of catches equal to zero is usually larger than 50% until the end of the 1990's. Nevertheless, it decreases in the 2000's in conjunction with the proportion of set by set reports. The overall proportion of catches equal to zero in the set by set database (82.5%) is considerably larger than in the complete database (47.8%).

3. Overall Effort, Catch and Catch Rates

Most of the fish caught by bait-boats is skipjack. Yellowfin accounts for only 5.7% of the total catch in the Task II database. Nevertheless, if skipjack (the main target) is not taken into account, yellowfin sums up to 67% of the remaining total catch. In the set by set database most of the ratios between yellowfin and total catch are smaller than 0.05 (Figure 2). Yellowfin catches were seldom larger than skipjack catches, though yellowfin accounted for more than 95% of the total catch in some cases (see the last black bar at right in Figure 2). A statistical summary of the catches (t) in those specific cases is: minimum = 0.1; 1st quartile = 0.275; median = 1; mean = 4.36; 3rd quartile = 2.1; maximum = 57.3. Therefore, when the yellowfin catch is the largest one, the total catch is usually small. Fishermen seldom find a large and/or vulnerable school dominated by yellowfin.
There is no doubt that yellowfin is caught mainly mixed in schools dominated by skipjack.

Distributions of effort, catch and average catch rate are in Figure 3. The three areas (N - north, C - central and S - south) indicated in that figure were those considered in the generalized linear models. The motivations for selecting those three areas are in Andrade (2008), but a summary of them is warranted here. The bait-boat fleet shows seasonal behavior according to skipjack displacement along the Brazilian coast (Andrade, 2003). The south coast (S) is explored mainly in summer, while the very north of the southeastern Brazilian coast (N) is explored mainly in winter. Most fishing days were in the central area (C), which is explored all year round. Most of the yellowfin landed by the bait-boat fleet was caught in the central area, at least in part because effort there is larger than in the other areas (Figure 3). Total effort in the south area was the smallest. Catch rates of yellowfin are usually small (< 1 t/fishing day). Exceptions occurred mainly in the north and south fishing grounds, but large CPUEs appear only where efforts were not large.

Because the proportion of set by set reports remains larger than 10% from 1981 to 1997 (Figure 1), that was the period selected in order to compare catch rates of the two databases (complete vs. set by set). Standard scores [Z = (value - mean) / standard deviation] as calculated for nominal catch rates are in Figure 4. Smoothed curves are used to indicate overall trends. The catch rate for the complete dataset increased until the early 1990's, when it dropped. Another increasing trend appeared after 1999. The curves for the complete dataset as calculated for the whole and partial periods (1981-1997) showed similar shapes, though the increasing and decreasing trends were sharper for the partial database. The smoothed curve for the set by set database showed a slight but continuous increase through 1997. Hence the catch rate trends are contradictory in the 1990's.
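The standard score used for Figure 4 can be sketched as follows. This is a minimal numpy illustration of the formula just given; the use of the sample standard deviation (ddof=1) and the example CPUE values are assumptions, since the text does not state which denominator was used.

```python
import numpy as np

def z_score(x):
    """Standard score: Z = (value - mean) / standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)  # ddof=1 (sample SD) is an assumption

# Hypothetical annual nominal CPUEs (t/fishing day), for illustration only:
cpue = [0.2, 0.5, 0.9, 0.4, 0.3]
z = z_score(cpue)
```

By construction the resulting series has mean zero and unit standard deviation, which is what makes the complete and set by set trends comparable on the same scale in Figure 4.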
Nominal catch rates calculated for the complete database decrease, while the opposite pattern shows up in the calculations for the set by set database.

4. Selection of Models
In order to select the explanatory factors I have fitted a saturated model and then used conventional hypothesis tests to verify whether dropping each term, one at a time, results in a significant increase of deviance. Interactions were dropped and tested first. The main effect of a factor (e.g. area) was discarded only if all interactions that include that factor had already been discarded. Generalized linear models were fitted to six databases as described in the first two columns at left in Table 1. The factors considered were year, quarter and area. I have tried out the binomial distribution and two link functions (logit and probit) when modeling the proportion of positive catch rates. Normal and gamma distributions and three link functions (identity, log, inverse) were used to analyze positive catch rates. I have also fitted models using the normal distribution and two different link functions (identity and inverse) to the logarithm transformation of the catch rate data. Finally, in order to select a model among those fitted to the same response variable I have relied on the Akaike and Bayesian information criteria (Akaike, 1974; Schwarz, 1978). When comparing models fitted to different response variables (i.e. positive catch rate and logarithm of positive catch rate) I relied on the pseudo-R2, though it is not an ideal criterion.

Models and deviance analysis results are in Tables 1 and 2 respectively. The binomial distribution and logit link function were selected to model the proportion of positive catch rates, while the gamma density distribution and log link function were selected to model positive catch rates. Hence delta-gamma models are the alternative to standardize yellowfin catch rates. This result carries a warning message. Sometimes delta-lognormal models are selected in advance, but that is not always the best option to standardize catch rates. Models should be confronted with the data before choosing one of them.
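The selection criteria above can be written out explicitly. The sketch below uses the standard definitions of AIC, BIC and pseudo-R2; the helper names are illustrative, not the code actually used in the analysis. The deviances plugged in are taken from Table 2 and reproduce the pseudo-R2 column of Table 1 (log-likelihoods are not reported in this paper, so the AIC/BIC values themselves are not recomputed here).

```python
import numpy as np

def aic(loglik, k):
    """Akaike information criterion (Akaike, 1974): 2k - 2 ln L."""
    return 2.0 * k - 2.0 * loglik

def bic(loglik, k, n):
    """Bayesian information criterion (Schwarz, 1978): k ln n - 2 ln L."""
    return k * np.log(n) - 2.0 * loglik

def pseudo_r2(null_deviance, residual_deviance):
    """Proportion of the null model's deviance explained by the fitted model."""
    return 1.0 - residual_deviance / null_deviance

# Deviances from Table 2 reproduce the pseudo-R2 column of Table 1:
print(round(100 * pseudo_r2(3566.44, 2687.16), 2))  # positive, complete 1981-2006 -> 24.65
print(round(100 * pseudo_r2(199.92, 118.26), 2))    # positive, set by set         -> 40.85
print(round(100 * pseudo_r2(1015.82, 110.22), 2))   # proportion, complete         -> 89.15
```

Because BIC penalizes each parameter by ln(n) rather than 2, it favors the simpler model more strongly than AIC when the sample size is large, as it is for the complete database.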
Pseudo-R2 estimates stand for the proportion of the simplest model's deviance (the null model) that is eliminated when the explanatory variables are considered. The factors I have considered can therefore explain most of the deviance for the proportion of positive catch rates, but they are not efficient explanatory variables for the positive catch database (Table 1). The pseudo-R2 for the model fitted to positive catches of the set by set database was larger than that calculated for the complete dataset. In contrast, the pseudo-R2 for the proportion of positive catches of the set by set database was smaller than that calculated for the complete dataset. Perhaps that pattern appeared due to the difference in the balance between sample sizes: the sample size for positive catches in the complete dataset is larger than in the set by set database, while the opposite occurs when we look at the information about the proportion of positive catches.

Models fitted to the proportion of positive catches include more explanatory terms (factors and interactions) than those fitted to positive catch rates (Table 2). Models fitted to the set by set database seem simpler than those fitted to the complete database for the same period (1981-1997). In the analyses of positive catches only the factor year was selected in the model fitted to the set by set database, while the main effects of the three factors (year, quarter and area) and two interactions were selected in the model fitted to the complete dataset. Overall, year is the most important explanatory factor (Table 2). Area is the major explanatory factor only in the model fitted to the proportion of positive catches of the complete dataset (1981-1997). The factor area proved to be more important than quarter in most models.

5. Diagnostics of the Fittings

Diagnostic plots of residuals may show whether there was some violation of assumptions and/or whether there are outliers. Residual plots show that the fittings for the positive catch databases appear not to be of concern (Figure 5).
Perhaps the exception is the apparent increasing trend due to a few points (see the top panel at right). Nevertheless, there is not a sharp increasing trend for the bulk of the data in the range of predicted values between -1 and 1. Therefore I assumed that the overall fittings of the models for the positive databases were acceptable.
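Residual plots such as those in Figure 5 are conventionally built from deviance residuals. For the gamma family they can be computed as in the sketch below, which is a pure-numpy illustration of the standard formula, not the exact code used in the analysis; the example data are hypothetical.

```python
import numpy as np

def gamma_deviance_residuals(y, mu):
    """Deviance residuals for a gamma GLM.

    Unit deviance for the gamma family: d_i = 2 * (-log(y/mu) + (y - mu)/mu);
    the residual is sign(y - mu) * sqrt(d_i)."""
    y, mu = np.asarray(y, dtype=float), np.asarray(mu, dtype=float)
    dev = 2.0 * (-np.log(y / mu) + (y - mu) / mu)
    return np.sign(y - mu) * np.sqrt(dev)

# Hypothetical positive catch rates (t/fishing day) and fitted means:
y = np.array([0.3, 1.2, 0.8, 2.5])
mu = np.array([0.5, 1.0, 0.8, 1.5])
r = gamma_deviance_residuals(y, mu)
```

Plotting these residuals against the linear predictor (log(mu) under the log link) is what reveals the kind of trend discussed above: an acceptable fit shows no systematic pattern in the bulk of the points.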
Diagnostic plots for the proportion of positive catches are in Figure 6. A few points related to predictions larger than 10 or smaller than -10 strongly affected the smoothed curves. Cook's distances were large (>1) for several data points, as indicated in the panels at the bottom (Figure 6). I carried out sensitivity analyses by discarding those data points (one at a time) and fitting the model again. The results gathered in the sensitivity analyses were similar to those gathered when analyzing all the data. Therefore I opted not to discard any data.

6. Standardized Catch Rates

I have used the solutions presented in a companion paper (Andrade, 2008) to calculate standardized catch rates. When averaging over the area and quarter factors in order to extract the effects of year, I have considered that all levels of area and quarter have equal weights. Results gathered for the three databases (complete 1981-2006; complete 1981-1997; set by set 1981-1997) are in Figure 7. Most of the estimates were smaller than 1 t/fishing day. Standardized catch rates peaked in 1990 for all databases, while a peak in 1981 was apparent only for the complete (1981-2006) database. However, if the years from 1998 to 2006 are not considered, the peak in 1981 does not show up for the complete dataset. Therefore, including or excluding some levels of a factor (e.g. year) may strongly affect the estimates for the other levels. Overall, the coefficients of variation for the set by set database were smaller than those calculated for the other databases (Figure 7). That pattern was expected because variability decreases as the data are aggregated. Estimates for the complete (1981-2006) dataset showed a slight increasing trend across the years, as indicated by the Z score calculations (Figure 7). Sharp increasing trends from 1990 to 1997 showed up in the calculations for the other two databases.
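The combination of the two model components into a standardized year index can be sketched as below. The coefficient layout (main effects only, stored in dictionaries) is a hypothetical simplification: the fitted models also include interactions, and the helper names are illustrative rather than the code actually used. The sketch does implement the two simplifications stated earlier: equal weights over area and quarter, and independence of the two components.

```python
import numpy as np

def expit(x):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + np.exp(-x))

def delta_gamma_index(coef_bin, coef_pos, years, areas, quarters):
    """Standardized index per year: equal-weight average over area and quarter
    of p(positive catch) * E[CPUE | positive], assuming the two components
    are independent (simplification b in the text)."""
    index = {}
    for y in years:
        cells = []
        for a in areas:
            for q in quarters:
                eta_b = coef_bin["int"] + coef_bin["Y"][y] + coef_bin["A"][a] + coef_bin["Q"][q]
                eta_g = coef_pos["int"] + coef_pos["Y"][y] + coef_pos["A"][a] + coef_pos["Q"][q]
                # invert the logit and log links, then multiply the two parts
                cells.append(expit(eta_b) * np.exp(eta_g))
        index[y] = float(np.mean(cells))
    return index

# Tiny hypothetical coefficients: two years, two areas, two quarters.
cb = {"int": -0.5, "Y": {1990: 0.0, 1991: 0.4}, "A": {"C": 0.0, "N": 0.2}, "Q": {1: 0.0, 2: -0.1}}
cp = {"int": -1.0, "Y": {1990: 0.0, 1991: 0.3}, "A": {"C": 0.0, "N": 0.1}, "Q": {1: 0.0, 2: 0.05}}
idx = delta_gamma_index(cb, cp, [1990, 1991], ["C", "N"], [1, 2])
```

Averaging the product over all area-quarter cells before comparing years is what removes the nuisance effects: every year is evaluated over the same set of cells, so only the year coefficients drive differences between the index values.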
If we compare Figures 4 and 7 it becomes evident that the standardization procedure has inverted the original trend for the complete database in the 1990's. Nominal catch rates decrease, while the standardized ones increase from 1990 to 1997. Actually, standardized calculations are not expected to match the nominal catch rates closely. If both estimates resulted in similar time trends we should be worried, since that would be an indication that the standardization procedure was inefficient. Although the nominal catch rates from 1990 to 1997 as calculated for the complete and set by set databases are contradictory (Figure 4), the standardized catch rates are not (Figure 7). That is an encouraging result, in the sense that the standardized results converge no matter whether the database includes detailed information or not. However, that convergence does not mean that the standardized catch rates are reasonable relative abundance indices. How reliable the standardized calculations are depends primarily on whether all relevant factors that affect CPUE were appropriately included in the models. Actually the weakness of the analysis is due to the limited set of explanatory variables considered (year, area and quarter). Several other factors may also affect CPUE, like the amount of bait released and the fishermen's skill. In addition, the unit of effort (fishing day) does not hold information about the time the fishermen spent, or even the size of the area they explored when searching for schools. Therefore, the increasing trends apparent in the standardized catch rates may be due to factors affecting the catchability coefficient q, and not to variation of the biomass B across the years. After all, the estimates shown in this paper are useful to compare results gathered in analyses of aggregated and set by set databases, but they are not ideal relative abundance indices.

References

AKAIKE, H. 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19: 716-723.

ANDRADE, H. A. 2008.
Standardized catch rates for skipjack tuna (Katsuwonus pelamis) caught in the southwest of South Atlantic Ocean. SCRS/2008/113. Col. Vol. Sci. Pap. 0(0): 000-000 (in press).
ANDRADE, H. A. 2003. The relationship between skipjack tuna (Katsuwonus pelamis) fishery and seasonal temperature variability in the south-western Atlantic. Fish. Oceanogr. 12(1): 10-18.

DOBSON, A. J. 2002. An introduction to generalized linear models. 2nd edition. Chapman & Hall/CRC. 225 pp.

ICCAT. 2008. Task II database. Available at: http://www.iccat.int. Accessed May 8, 2008.

MAUNDER, M. N. and A. E. Punt. 2004. Standardizing catch and effort data: a review of recent approaches. Fish. Res. 70: 141-159.

ORTIZ, M. and F. Arocha. 2004. Alternative error distribution models for standardization of catch rates of non-target species from a pelagic longline fishery: billfish species in the Venezuelan tuna longline fishery. Fish. Res. 70: 275-297.

SCHWARZ, G. 1978. Estimating the dimension of a model. Annals of Statistics 6: 461-464.
Tables

Table 1. Generalized linear models fitted to positive catch rates and to the proportion of positive catch rates of yellowfin tuna. Factors: A - area; Q - quarter; Y - year. Interactions are indicated by ":". AIC - Akaike information criterion (Akaike, 1974); BIC - Bayesian information criterion (Schwarz, 1978).

Database                Response variable                Explanatory variables   Distribution  Link   AIC      BIC      pseudo-R2 (%)
Complete (1981-2006)    catch rate                       A+Q+Y+A:Y+Q:Y           gamma         log    2104.92  2925.28  24.65
Complete (1981-2006)    proportion of positive catches   A+Q+Y+A:Q+A:Y+Q:Y       binomial      logit  956.93   1494.62  89.15
Complete (1981-1997)    catch rate                       A+Q+Y+A:Y+Q:Y           gamma         log    1683.28  2206.55  23.98
Complete (1981-1997)    proportion of positive catches   A+Q+Y+A:Q+A:Y+Q:Y       binomial      logit  706.93   1042.37  89.45
Set by set (1981-1997)  catch rate                       Y                       gamma         log    314.96   358.20   40.85
Set by set (1981-1997)  proportion of positive catches   Q+Y+Q:Y                 binomial      logit  291.13   485.07   68.94

Table 2. Analysis of deviance tables for the models fitted to positive catch rates and to the proportion of positive catch rates. Df - degrees of freedom. Results of the F and chi-square tests are in the last column at right.

Positive catches
Database                Factor  Df   Resid. Df  Resid. Dev  Pr(>F)
Complete (1981-2006)    null         1752       3566.44
                        Y       24   1728       3286.00     6.65E-18
                        A       2    1726       3152.22     3.44E-15
                        Q       3    1723       3135.64     3.83E-02
                        Y:Q     70   1653       2819.04     1.43E-08
                        Y:A     42   1611       2687.16     9.41E-03
Complete (1981-1997)    null         1248       2532.44
                        Y       16   1232       2315.99     1.61E-15
                        A       2    1230       2247.55     2.87E-08
                        Q       3    1227       2226.13     1.18E-02
                        Y:Q     48   1179       2008.80     1.23E-06
                        Y:A     29   1150       1925.06     4.70E-02
Set by set (1981-1997)  null         93         199.92
                        Y       16   77         118.26      1.50E-04

Proportion of positive catches
Database                Factor  Df   Resid. Df  Resid. Dev  P(>Chi)
Complete (1981-2006)    null         231        1015.82
                        Y       24   207        694.38      7.90E-54
                        A       2    205        474.16      1.51E-48
                        Q       3    202        464.26      1.94E-02
                        Y:Q     70   132        264.90      2.35E-14
                        Y:A     44   88         140.70      1.42E-09
                        A:Q     6    82         110.22      3.19E-05
Complete (1981-1997)    null         164        676.48
                        A       2    162        467.04      3.32E-46
                        Y       16   146        339.61      2.02E-19
                        Q       3    143        331.67      4.74E-02
                        Y:Q     48   95         181.61      1.94E-12
                        A:Y     30   65         97.62       5.19E-07
                        A:Q     6    59         71.40       2.02E-04
Set by set (1981-1997)  null         127        222.03
                        Y       16   111        164.08      1.15E-06
                        Q       3    108        157.18      7.52E-02
                        Y:Q     47   61         68.96       2.56E-04

Figures

Figure 1. Number of reports (solid line), proportion of reports in which effort is aggregated (dotted line) and proportion of reports in which the yellowfin catch is zero. Source: Task II database (ICCAT, 2008).

Figure 2. Histogram of the proportion of yellowfin in the total catch as reported in weight.

Figure 3. Effort (fishing days) of the Brazilian bait-boat fleet from 1981 to 2006, and yellowfin tuna catch (tons) and catch-per-unit-effort (tons/fishing day). Fishing areas: N - north; C - central; S - south. Source: Task II ICCAT database (ICCAT, 2008).
Figure 4. Z score [(value - mean)/standard deviation] of catch rates as calculated for the complete (circles) and set by set (empty triangles) databases. The smoothed trend curve for the triangles is the dashed one.

Figure 5. Diagnostic plots for the models fitted to positive catch rates. Panels from left to right show results for the complete (1981-2006), complete (1981-1997) and set by set (1981-1997) databases respectively. Dashed lines appear in the graph area only if Cook's distance calculations are larger than one.

Figure 6. Diagnostic plots for the models fitted to the proportion of positive catch rates. Panels from left to right show results for the complete (1981-2006), complete (1981-1997) and set by set (1981-1997) databases respectively. Dashed lines appear in the graph area only if Cook's distance calculations are larger than one.
Figure 7. Standardized catch rates for the three databases: a) complete database for the period 1981 through 2006 (bold dotted line and empty circles); b) complete database for the period 1981 through 1997 (solid line and black circles); c) set by set database for the period 1981 through 1997 (bold dashed line and triangles). In the second panel the Z score is (value - mean)/standard deviation and the lines stand for smoothed curves.