SEAMAP Vertical Line Dataset


SEAMAP Vertical Line Dataset: Challenges and Opportunities
Mark A. Albins
University of South Alabama, Dauphin Island Sea Lab

Analysis goals
- Indices of abundance (IOA) for red snapper (RS) using available vertical line (VL) data (AL, LA, TX)
- Relationship between IOA and region, habitat type, depth, etc.
- Power analysis: number of samples needed to detect change
- How does sampling across depth strata contribute to overall IOA, power to detect change, and scope of inference?

Analysis plan
- Download VL data from SEAMAP GSMFC (seamap.gsmfc.org)
- Check, clean, summarize data
- Fit GLMs (or GLMMs) for RS numerical catch and RS biomass catch
  - Candidate distributions
    - Numerical: Poisson, negative binomial, or zero-inflated version (Comp. Poisson)
    - Biomass: log-normal, or zero-inflated version (Tweedie)
  - Candidate predictors
    - Categorical: state or region, year, depth stratum
    - Continuous: effort, depth, latitude, longitude
  - Also fit models with Temp, Sal, DO using a reduced dataset (where these are available)
- Simulate data using
  - Best-fit model parameters
  - A range of effect sizes
  - A range of sample sizes
- Fit models to simulated data and calculate power across a range of effect sizes for a range of sample sizes
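The simulate-then-refit loop in the plan can be sketched in a few lines. This is a minimal sketch, not the analysis itself: it assumes a plain Poisson catch model with one two-level predictor, and it swaps the full GLM fit for the likelihood-ratio test that a Poisson GLM with a single two-level factor would give. The names `base_rate`, `effect`, `rpois`, `lrt_two_poisson`, and `power` are illustrative, not from the slides.

```python
import math
import random

CHI2_CRIT_1DF = 3.841  # 95th percentile of chi-square with 1 degree of freedom


def rpois(rng, lam):
    """Draw one Poisson variate (Knuth's method; adequate for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def lrt_two_poisson(a, b):
    """Likelihood-ratio statistic for equal means in two Poisson samples."""
    sa, sb = sum(a), sum(b)
    pooled = (sa + sb) / (len(a) + len(b))
    stat = 0.0
    for total, n in ((sa, len(a)), (sb, len(b))):
        if total > 0:  # a group with zero catch contributes nothing
            stat += 2.0 * total * math.log((total / n) / pooled)
    return stat


def power(base_rate, effect, n_per_group, n_sims=500, seed=1):
    """Share of simulated datasets in which the effect is detected at alpha = 0.05."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        a = [rpois(rng, base_rate) for _ in range(n_per_group)]
        b = [rpois(rng, base_rate * (1 + effect)) for _ in range(n_per_group)]
        if lrt_two_poisson(a, b) > CHI2_CRIT_1DF:
            hits += 1
    return hits / n_sims


# Hypothetical grid: mean catch of 2 fish per station, 50% increase to detect.
for n in (10, 30, 100):
    print(n, power(base_rate=2.0, effect=0.5, n_per_group=n))
```

In the real analysis the data-generating model would be the best-fit negative binomial or zero-inflated model, and the test would come from refitting that model to each simulated dataset.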

Challenges and opportunities
- Differences in sampling protocol
  - Among years
  - Among state partners
- Differences in sampling design
  - Among years
  - Among state partners
- Overall data consistency and quality
- Independent observational units?
- Dependent subsamples?

Basic observational unit breakdown and consequences for analytical options
- Hierarchy: Sites > Stations > Lines > Hooks
- If we can collapse to site level without introducing bias or eliminating too much data, we can use a fixed-effects model (GLM)
- If we need to model repeated stations at a given site, or different numbers of non-equivalent lines (different hook sizes) at a station, then we'll need to use a mixed-effects model (GLMM)
- GLMMs are doable, but they limit options and can be more difficult to fit (optimizer convergence problems, etc.)

Differences in sampling protocol
Hook size/bait assignment
- 2010: random hook size/bait type assigned within backbones
- 2011: random hook size assigned within backbones
- 2012-2016: single hook size assigned to entire backbone
Lines Bait Gangions Hooks/line Hook size three sequential two simultaneous three simultaneous Mackerel and Squid 11 or 12 (depending on drop) Mackerel Twisted 18 w/swivel sleeve Mackerel Twisted 18 w/out swivel sleeve 10 or 12 (depending on drop) [9, 11] or [3, 8, 11] (depending on drop) 12 8, 11, 13, 15 10 8, 11, 15

Drilling down into protocol differences
What do the data tell us about
- Number of stations per site
- Number of lines per station
- Number of hooks per line
- Hook sizes across lines and stations

Data structure and critical links
- SEAMAP VL Database consists of three linked tables: CRUISE, STATION, CATCH
  - Link keys: Cruise_Id / CID, SID
- Connecting STATION with CATCH
  - Ops. Manual states SEAMAPSTATION is the link
  - SEAMAPSTATION: 6-digit date + station number for the day
  - Not a good key: surveys in TX and AL from the same day might have the same code
  - Unique SID assigned to each row in STATION, and all associated rows in CATCH were flagged with this SID
  - Resulted in some problems
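Both link problems can be checked mechanically. The sketch below is a hedged illustration on toy rows: the column names SID, SEAMAPSTATION, and SOURCE come from the slides, but the values, the assumption that SOURCE holds a state code, and the helper names `orphan_keys` and `shared_codes` are invented for the example.

```python
from collections import defaultdict


def orphan_keys(station_rows, catch_rows, key):
    """Keys present in STATION that never appear in CATCH (an anti-join)."""
    catch_keys = {row[key] for row in catch_rows}
    return sorted({row[key] for row in station_rows} - catch_keys)


def shared_codes(station_rows):
    """SEAMAPSTATION codes that are used by more than one SOURCE."""
    sources = defaultdict(set)
    for row in station_rows:
        sources[row["SEAMAPSTATION"]].add(row["SOURCE"])
    return {code: sorted(s) for code, s in sources.items() if len(s) > 1}


# Toy rows; the code values are made up and do not follow the real format.
station = [
    {"SID": 1, "SEAMAPSTATION": "05061501", "SOURCE": "LA"},
    {"SID": 2, "SEAMAPSTATION": "05061501", "SOURCE": "TX"},
    {"SID": 3, "SEAMAPSTATION": "05061502", "SOURCE": "LA"},
]
catch = [
    {"SID": 1, "HOOKNUM": 1},
    {"SID": 1, "HOOKNUM": 2},
    {"SID": 2, "HOOKNUM": 1},
]

print(orphan_keys(station, catch, "SID"))
print(shared_codes(station))
```

An orphan SID is only a candidate problem: a station that was genuinely not fished should have no CATCH rows, which is why the non-fished flag matters.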

Problems with SID link between STATION and CATCH
- 215 SIDs in STATION with no corresponding rows in CATCH
- Only 15 SEAMAPSTATION in STATION with no corresponding rows in CATCH
- Duplicate rows in STATION (of 2063 rows)
  - 74 primaries + 207 dup vals in SEAMAPSTATION
  - 67 primaries + 200 dup vals in SEAMAPSTATION plus SOURCE
  - 66 primaries + 198 dup vals in above plus LAT, LON
  - 131 primaries + 133 dup vals in above plus TIME
  - 77 primaries + 77 dup vals in all columns (except primary key: SID)
- 252 individual simultaneous lines fished at same stations mapped to unique rows in STATION with different SID & SEAMAPSTATION but duplicates in all other columns
- Duplicate rows in CATCH (of 38722 rows)
  - 92 primaries + 95 dup vals in all columns (except primary key: CDID)

SID in STATION with no corresponding rows in CATCH (215/2064)
- 210 from 2010-2012: 14 from AL, 196 from LA
  - None have OPSCODEs indicating non-fished station
  - A few have COMMENTs suggesting that the site was not fished, e.g. "No Structure", "missed the site", "not close enough to rig"
  - Many have DEPTH, DEPTHFISHED, TIME and TIMESOAK
  - Some have COMMENTs indicating that fish were caught, e.g. "could not get otoliths from fish on hook #7", "awesome site", "same fish caught on hooks #8 and #9"
  - Many have comments indicating problems with bait, sharks, tangled lines, etc.
- 5 from 2013-2016: all from LA, 2015
  - 3 have OPSCODEs and COMMENTs indicating non-fished station
    - Of these 3, 2 have TIMESOAK of 5 min, 1 has no Lat/Lon
  - 2 have no OPSCODEs or COMMENT indicating non-fished station but do have values for DEPTH, DEPTHFISHED, TIME, and TIMESOAK
- Need to go back to beginning of pipeline to fix: check against field data sheets

SEAMAPSTATION in STATION with no corresponding rows in CATCH (15/2064)
- 3 from LA, 2015
  - 3 have OPSCODEs & COMMENTs indicating non-fished station
  - 2 have TIMESOAK = 5
- 12 from AL, 2010-2011
  - 0 have OPSCODEs indicating non-fished station
  - COMMENTs
    - 5 similar to "MARF 6 No Structure"
    - 5 blank
    - 2 what should be STRUCTNAME (e.g. "2004 Pyramid 209")
  - All have ENV_LAT & ENV_LON
  - 5 have TEMP, SAL, DO
  - 9 have DEPTH
  - DEPTHFISHED: 2 data, 7 zeros, 3 blanks
  - All have TIME = 0
  - All have TIMESOAK = blank
- Need to go back to beginning of pipeline to fix: check against field data sheets

Duplicate values in STATION columns
- 77 full duplicate pairs of rows
  - Could be undocumented double lines fished at same site, same time (need to collapse into single row per station)
  - Or cut-paste/copy-paste type errors (need to eliminate)
- 131 primaries + 133 dup vals for SEAMAPSTATION, SOURCE, LAT, LON, TIME
  - Includes above full dups + those with different values in other columns (e.g. abiotics, DEPTHFISHED, COMMENTs, OPSCODEs, etc.)
  - Could be double lines fished at same site, time (need to collapse)
  - Or could be erroneous repeats (need to eliminate)
- 66 primaries + 198 dup vals for SEAMAPSTATION, SOURCE, LAT, LON
  - Includes all above plus stations fished more than once in same day (different TIME)
  - Keep these as separate stations but flag as repeat visits to same site (repeated measures)
- 67 primaries + 200 dup vals in SEAMAPSTATION, SOURCE
  - Includes all above plus 1 primary + 2 dup vals due to typo in SEAMAPSTATION column (AL, checked field data, fixed)
- 74 primaries + 207 dup vals in SEAMAPSTATION
  - 7 primaries + 7 dup vals due to same SEAMAPSTATION being assigned in LA and TX on same day
  - Might think about including state in a unique and informative station identifier, like AL050615VL01
- Need to go back to beginning of pipeline to fix most: check against field data sheets
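The tiered counts above come from repeating one duplicate check over progressively wider column sets. The sketch below shows one way to do that, under an assumed reading of the terms: a "primary" is a key value that occurs more than once, and "dup vals" are the extra rows beyond the first for those keys. The helpers `dup_summary` and `station_label`, the toy rows, and the MMDDYY date layout are illustrative assumptions; only the example identifier AL050615VL01 is from the slide.

```python
from collections import Counter


def dup_summary(rows, cols):
    """Return (primaries, dup_vals) for the given subset of columns."""
    counts = Counter(tuple(row[c] for c in cols) for row in rows)
    primaries = sum(1 for n in counts.values() if n > 1)
    dup_vals = sum(n - 1 for n in counts.values() if n > 1)
    return primaries, dup_vals


def station_label(state, date_mmddyy, number):
    """Build a state-prefixed station identifier such as AL050615VL01."""
    return f"{state}{date_mmddyy}VL{number:02d}"


# Toy STATION rows with made-up values.
rows = [
    {"SEAMAPSTATION": "A", "SOURCE": "LA", "TIME": 900},
    {"SEAMAPSTATION": "A", "SOURCE": "TX", "TIME": 900},
    {"SEAMAPSTATION": "B", "SOURCE": "AL", "TIME": 800},
    {"SEAMAPSTATION": "B", "SOURCE": "AL", "TIME": 1300},
    {"SEAMAPSTATION": "B", "SOURCE": "AL", "TIME": 1300},
    {"SEAMAPSTATION": "C", "SOURCE": "AL", "TIME": 700},
]

# Widen the key one column at a time, as on the slide.
tiers = (
    ["SEAMAPSTATION"],
    ["SEAMAPSTATION", "SOURCE"],
    ["SEAMAPSTATION", "SOURCE", "TIME"],
)
for cols in tiers:
    print(cols, dup_summary(rows, cols))

print(station_label("AL", "050615", 1))
```

Duplicates that disappear when a column is added are explained by that column (for example, the same code issued in two states), which is what makes the tiers diagnostic.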

Rows in STATION with dup vals in all columns except SID & SEAMAPSTATION
- 252 individual simultaneous lines fished at same stations mapped to unique rows (SEAMAPSTATIONs/SIDs) in STATION
- All AL data
  - AL uses alpha code at the end of the SEAMAPSTATION identifier to distinguish lines at a station (e.g. "050515VL05A")
  - Lines mistakenly mapped to the station level at some point in the data prep and/or transfer to GSMFC SEAMAP
- Suggest adding unique line identifier to CATCH for each station
  - GEARLOC doesn't work due to sequential drops from same GEARLOC
  - HOOKSIZE doesn't work because these were randomly assigned to lines in 2010
- Already being fixed at beginning of pipeline

Duplicate rows in CATCH
- 92 primaries + 95 dup vals in all columns
  - 2010-2011: 88 primaries + 91 dup vals
  - 2012-2016: 4 primaries + 4 dup vals
- My best guess is that most of these (2010-2011) represent empty hooks but are missing HOOKNUM and HOOKSIZE info
- Need to go back to beginning of pipeline to fix: check against field data sheets

Drilling down into protocol differences
- Number of stations per site
- Number of lines per station
- Number of hooks per line
- Hook sizes across lines and stations
Need to fix:
- SID link between STATION and CATCH: differentiate clearly between non-fished and fished stations, and ensure that there are rows in CATCH for all hooks at all fished stations
- Duplicates of critical values in STATION: eliminate erroneous repeat stations and flag true repeat stations

Options for including numbers and sizes of hooks in model
- Differences in hook number among stations can be modeled via an effort offset in the predictors
- Differences in hook size representation among stations = source of bias
  - Eliminate stations with non-standard hook size representation
  - Or include hook size in model: requires inclusion of subsamples (lines/hooks), which necessitates GLMM
Unfortunately

Problems with hook number and size data: 2010-2011
- Number of rows in CATCH (hooks fished) per station is very inconsistent
  - 47/233 stations with < 10 rows in CATCH data
  - 70/233 stations with n-rows in CATCH multiple of 10 or 12
- Most empty-hook (no-catch) rows missing
  - HOOKNUM blank for 1115/5460 rows
  - HOOKSIZE blank for 5/5460 rows
- Impossible to assure standardization of effort and avoid hook size bias for these early years
- Will likely require major data entry effort
- Need to go back to beginning of pipeline to fix: check against field data sheets

Problems with hook number and size data: 2012-2016
- Rows in CATCH (hooks fished) per station fairly consistent
  - 0/1616 stations with < 10 rows in CATCH
  - 14/1616 stations with n-rows in CATCH multiple of 10
- Most empty-hook (no-catch) rows present
  - HOOKNUM blank for 5/33262 rows
  - HOOKSIZE blank for 41/33262 rows
- Most can be fixed relatively easily; what can't be fixed can be eliminated from dataset without large loss in sample size
- Need to go back to beginning of pipeline to fix: check against field data sheets

Other problems with hook number: missing hooks vs. lost partial rig
- Potential cause of < 10 rows in CATCH per station: single line stations + inconsistent treatment of missing hooks/lost partial rig
  - Sometimes rows included for missing hooks, often with M for BAITSTATUS
  - Sometimes rows not included for missing hooks, often with COMMENT indicating lost rig or lost partial rig
- Need to clarify difference (if any) between these categories and standardize how they are treated in the data
- Suggestions:
  - Stations with missing hooks: flagged in OPSCODE using Y(HS)
  - Stations with lost partial rig: given full set of rows in CATCH with BAITSTATUS = M for lost hooks
  - Stations with lost (full) rig: no rows in CATCH for lost rig, flagged in OPSCODE using L(HS)
  - Treat missing hooks and lost partial rigs the same in analysis
  - Treat lost (full) rigs the same as a station with less than a full set of lines
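The lost-partial-rig suggestion (a full set of CATCH rows, with BAITSTATUS = M for the lost hooks) amounts to padding each line out to its nominal hook count. A minimal sketch follows, assuming hooks are numbered 1..n in HOOKNUM; the helper name `pad_missing_hooks` and the "F" and "W" status values in the toy rows are placeholders, not documented codes.

```python
def pad_missing_hooks(catch_rows, hooks_per_line):
    """Add a BAITSTATUS='M' row for every hook number that has no CATCH row."""
    present = {row["HOOKNUM"] for row in catch_rows}
    padded = list(catch_rows)  # leave the caller's list untouched
    for hook in range(1, hooks_per_line + 1):
        if hook not in present:
            padded.append({"HOOKNUM": hook, "BAITSTATUS": "M"})
    return sorted(padded, key=lambda row: row["HOOKNUM"])


# One line that should carry 5 hooks but has rows for only 3 of them.
line = [
    {"HOOKNUM": 1, "BAITSTATUS": "F"},
    {"HOOKNUM": 2, "BAITSTATUS": "W"},
    {"HOOKNUM": 4, "BAITSTATUS": "W"},
]
for row in pad_missing_hooks(line, hooks_per_line=5):
    print(row)
```

Padding keeps the row count per line equal to the deployed hook count, so the effort term can be computed by filtering on BAITSTATUS instead of by counting whatever rows happen to exist.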

Drilling down into protocol differences
- Number of stations per site
- Number of lines per station
- Number of hooks per line
- Hook sizes across lines and stations
Need to fix:
- Blanks in HOOKSIZE & HOOKNUM
- Missing rows in CATCH for no-catch hooks
- Missing rows in CATCH for lost partial rigs
- Extra rows in CATCH for lost (full) rigs
- Easy for 2012-2016, but difficult for 2010-2011
Need to fix:
- SID link between STATION and CATCH: differentiate clearly between non-fished and fished stations, and ensure that there are rows in CATCH for all hooks at all fished stations
- Duplicates of critical values in STATION: eliminate erroneous repeat stations and flag true repeat stations

Dealing with differences in sampling design and protocols
- Use all data (2010-2016):
  - Multiple sampling events across time
  - Stations with fewer than three lines (hook number, size)
  - Different sets of hook sizes and different bait types
  - Hook size and bait type uniform on lines in most years, but randomly assigned within lines in other years
- Limit to recent years (2012-2016):
  - Multiple sampling events across time
  - Stations with fewer than three lines (hook number, size)

Dealing with differences in sampling design and protocols
Multiple sampling events across time
1. Eliminate repeat sampling events at a site
2. Incorporate repeat sampling events into a longitudinal design (mixed effects model with site as random effect)
   - Under this option, it might still be good to eliminate repeat samples within the same day, or close together in time, to avoid any depletion effect

Dealing with differences in sampling design and protocols (2010-2016)
- Stations with fewer than three lines (hook number, size)
- Different sets of hook sizes and different bait types
1. Keep all stations, collapse lines into station (ignore potential hook size and bait biases)
2. Keep all stations, collapse lines into station (minimize potential hook size and bait biases by eliminating bait types and/or hook sizes not fished during all years)
3. Keep all stations, incorporate hooks-nested-within-lines as subsamples in station, include hook size as predictor, eliminate bait types and/or hook sizes not fished during all years (mixed effects model with individual level random effect at the hook level, binary response)
- For all three options, include number of hooks as measure of effort in model (offset term)
- Eliminating all stations with fewer than three lines means dropping all/most of 2011, so not really an option
- Additional concern with all three options is depletion effect of sequential lines in 2010

Dealing with differences in sampling design and protocols (2012-2016)
- Stations with fewer than three lines (hook number, hook size)
1. Eliminate stations with fewer than 3 lines, collapse lines into station (no hook size biases because equal representation at all stations)
2. Keep all stations, collapse lines into station (ignore potential hook size bias)
3. Keep all stations, incorporate lines as subsamples in station, include hook size as predictor (mixed effects model with line-nested-within-station as random effect)
- For all three options, include number of hooks as measure of effort in model (offset term)
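The effort offset named in these options can be written out explicitly. The form below is a sketch that assumes a log link (the usual choice for Poisson and negative binomial counts, but not stated on the slides). The symbols are introduced here for illustration: C_i is the RS count at station i, h_i the number of hooks counted as effort, x_i the predictor vector, and beta the coefficients.

```latex
\log \mathbb{E}[C_i] = \log(h_i) + \beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}
\quad\Longleftrightarrow\quad
\frac{\mathbb{E}[C_i]}{h_i} = \exp\!\left(\beta_0 + \mathbf{x}_i^{\top}\boldsymbol{\beta}\right)
```

Because the coefficient on log(h_i) is fixed at 1, the predictors act on expected catch per hook, so stations with different hook counts stay comparable without dividing the counts themselves.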

Other issues: bait/hook status, lost and/or tangled gear, sharks, etc.
- Which hooks should be counted in effort offset?
  - Whole bait (Y)
  - Partial bait (Y)
  - No bait (?)
  - No bait on deployment (N)
  - Damaged (N)
  - Missing (N)
  - Predation (N)
  - Double hooked fish (N)
  - Those that catch other spp. (N)
- Should tangled gear be included?
  - Does it matter if the tangled gear caught fish or not?
  - Should the whole station be removed, or just those lines affected?
- Should stations with large sharks present be included?
  - If not, how do we standardize the filter?

Challenges and opportunities
- Differences in sampling protocol
  - Among years
  - Among state partners
- Differences in sampling design
  - Among years
  - Among state partners
- Overall data consistency and quality

CATCH (38722 rows): Categorical variables
- CAMERA
  - 2422: blank
  - 432: FALSE
  - 240: TRUE
- GEARLOC
  - 3079: blank
- BAITSTATUS
  - 16: 0
  - 1: n
  - 1: S
- HOOKSIZE
  - 1101: 15/0
  - 1100: Aug-00
  - 1081: Nov-00
  - 501: 08
  - 44: blank
  - 2: 1
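The HOOKSIZE values "Aug-00" and "Nov-00" look like spreadsheet date conversions of "8/0" and "11/0". That reading is an assumption, not something the slide states. Under that assumption, a normalizer might look like the sketch below. The accepted sizes (8, 11, 13, 15) are taken from the protocol slide, the function name `normalize_hooksize` is hypothetical, and anything that cannot be mapped confidently is returned as `None` for manual review rather than guessed.

```python
MONTHS = {
    "Jan": 1, "Feb": 2, "Mar": 3, "Apr": 4, "May": 5, "Jun": 6,
    "Jul": 7, "Aug": 8, "Sep": 9, "Oct": 10, "Nov": 11, "Dec": 12,
}
VALID_SIZES = {8, 11, 13, 15}  # assumed set of hook sizes actually fished


def normalize_hooksize(raw):
    """Map a raw HOOKSIZE entry to 'N/0', or None if it needs manual review."""
    text = raw.strip()
    month, sep, year = text.partition("-")
    if sep and year == "00" and month in MONTHS:
        size = MONTHS[month]  # e.g. 'Aug-00' read as a mangled '8/0'
    elif text.endswith("/0") and text[:-2].isdigit():
        size = int(text[:-2])  # already in 'N/0' form
    elif text.isdigit():
        size = int(text)  # bare number such as '08'
    else:
        return None
    return f"{size}/0" if size in VALID_SIZES else None


for value in ("15/0", "Aug-00", "Nov-00", "08", "", "1"):
    print(repr(value), "->", normalize_hooksize(value))
```

Returning `None` for the bare "1" is deliberate: it could be a truncated 11, 13, or 15, and only the field data sheets can settle that.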

CATCH (38722 rows): COMMENT
- Supposed to be a catch-all, but should only be used for data with no home or to clarify (or add caveats to) other codes used
- Often used in place of BAITSTATUS (e.g. "one fish caught on two hooks")
- Need to review all comments and use them to fill in OPSCODEs, BAITSTATUS, etc.

CATCH (38722 rows): BAITSTATUS
- No code for bait lost upon deployment (several examples of this in COMMENTs); should treat these hooks differently in analysis
- One fish caught on multiple hooks treated inconsistently
- Suggestions:
  - Enter fish data (including GENUS, SPECIES, BIOCODE, etc.) on one line only!
  - Use code F on line with fish data and code L for all other hooks
  - Make note in COMMENT identifying primary hook and all other hooks using their HOOKNUM

CATCH (38722 rows): Species identifiers
- BIOCODE: 1 blank (where SPECIES & GENUS not blank)
- SPECIES: 253 blanks (where BIOCODE not blank)
- 10 BIOCODEs with multiple GENUS + SPECIES
  - All of these are shortened versions or different capitalizations of the correct name
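A check for BIOCODEs that carry more than one spelling of GENUS + SPECIES can be sketched as below. The column names are from the slide; the helper name `biocode_name_variants` and the toy rows, including the placeholder code "B001", are invented for illustration.

```python
from collections import defaultdict


def biocode_name_variants(catch_rows):
    """Map each BIOCODE to its distinct (GENUS, SPECIES) spellings, if more than one."""
    names = defaultdict(set)
    for row in catch_rows:
        if row["BIOCODE"] and row["GENUS"] and row["SPECIES"]:
            names[row["BIOCODE"]].add((row["GENUS"], row["SPECIES"]))
    return {code: sorted(v) for code, v in names.items() if len(v) > 1}


# Toy rows: one code with inconsistent capitalization, one consistent code.
catch_rows = [
    {"BIOCODE": "B001", "GENUS": "Lutjanus", "SPECIES": "campechanus"},
    {"BIOCODE": "B001", "GENUS": "LUTJANUS", "SPECIES": "CAMPECHANUS"},
    {"BIOCODE": "B002", "GENUS": "Balistes", "SPECIES": "capriscus"},
    {"BIOCODE": "B002", "GENUS": "Balistes", "SPECIES": "capriscus"},
]
print(biocode_name_variants(catch_rows))
```

Since BIOCODE is nearly complete and the name columns are not, one option is to treat BIOCODE as authoritative and regenerate GENUS and SPECIES from a lookup table.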

CATCH (38722 rows): FISHID
- FISHID assigned when no fish on hook (19K/30K)
- 398 primary + 13235 dup vals of FISHID

CATCH RS only (7195 rows): Fish size data
          GONADWT     PCL      SL      FL       TL   WEIGHT
Min.        0.000   151.0   160.0   184.0     55.0    0.082
1st Qu.     3.917   281.0   305.0   368.0    396.0    0.850
Median     11.367   311.0   370.0   431.0    467.0    1.400
Mean       28.599   325.2   395.4   452.9    489.7    1.899
3rd Qu.    31.900   356.0   475.0   524.0    566.0    2.400
Max.      517.600   695.0   748.0   861.0   4310.0   13.200
NA's         3763    6834    3575      40       32       62

Biological parameters (RS only)


Missing RS weights (62 rows)
- 27 have no COMMENT or other indications why missing data
  - 2 of these have BAITSTATUS indicating one fish caught on two hooks
  - 19 of these have data for length measurements
- 13 have "weight removed" in COMMENT
  - All from LA on same day: 2015-05-01, 6 different stations
- 14 were lost before measuring, or partly consumed
- 7 have comment indicating one fish caught on multiple hooks
  - 4 of these have BAITSTATUS indicating same
- 1 has "data missing" COMMENT

STATION (2064 rows): Categorical variables
- GEARCODE
  - 4: VL
  - 439: blank
- STRUCTTYPE
  - 219: Artificial Structure
  - 189: ARTIFICIAL REEFS
  - 135: ARTIFICIAL
  - 131: Artificial reef
  - 129: artificial reef
  - 16: ARTIFICAL
  - 2: Artificial Reef
  - 1: Artificial Reef -pyramids-
  - 1: Artificial Reef -two-pile structure-
  - 1: Artificial Reef -wreck-
  - 2: Stand Pipe
  - 1: Z-Pipe
  - 648: PETROLEUM PLATFORMS
  - 32: PLATFORM
- STRUCTTYPE (cont.)
  - 62: NATURAL BOTTOM
  - 42: natural bottom
  - 30: Natural Structure
  - 25: Natural bottom
  - 7: Natural Bottom
  - 5: NATURAL
  - 17: No structure
  - 12: No Structure
  - 39: NO STRUCTURE
  - 41: Unknown
  - 24: UNKNOWN
  - 3: Unidentified Structure
  - 250: blank
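Most of the STRUCTTYPE variants differ only in case, spelling, or suffix, so they can be folded into a few categories before modeling. The sketch below is one possible mapping. The target category names and the function name `normalize_structtype` are assumptions made here, and entries whose grouping is not obvious from the slide (Stand Pipe, Z-Pipe) are routed to "review" instead of being assigned.

```python
def normalize_structtype(raw):
    """Fold a raw STRUCTTYPE entry into a coarse habitat category."""
    # Lowercase, treat hyphens as spaces, and collapse runs of whitespace.
    text = " ".join(raw.lower().replace("-", " ").split())
    if not text:
        return "blank"
    if text.startswith(("artificial", "artifical")):  # second form is a misspelling in the data
        return "artificial"
    if "platform" in text:
        return "platform"
    if text.startswith("natural"):
        return "natural"
    if text.startswith("no structure"):
        return "no structure"
    if text.startswith(("unknown", "unidentified")):
        return "unknown"
    return "review"  # anything unrecognized goes to a person, not a guess


for value in ("ARTIFICAL", "Artificial Reef -wreck-", "PETROLEUM PLATFORMS",
              "natural bottom", "No Structure", "Z-Pipe", ""):
    print(repr(value), "->", normalize_structtype(value))
```

Keeping the raw column alongside the normalized one preserves the sub-type detail (pyramids, wreck, two-pile structure) in case it is wanted as a finer predictor later.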

STATION (2064 rows): COMMENT
- Region- or lab-specific codes cause unnecessary clutter (e.g. "Treatment 1")
- Used in place of appropriate OPSCODEs (e.g. "OP CODE L11")
- Used to indicate protocols (e.g. "single line 12 in gangion")
  - Create new OPSCODEs for these?

STATION (2064 rows): OPSCODE
- Of 2064 rows in STATION, only 27 have an entry in OPSCODE (all LA 2015, 2016)
  - J, K, O, S, X: each used once
  - TXX: used 21 times
  - LXX: used 2 times
  - PXX: used 1 time (not listed in Appendix 2)
- Many rows in STATION include COMMENTs indicating an issue that should be reflected in the OPSCODE column, but is not (mostly lost gear situations)
- OPSCODE can be a powerful tool on the analysis end of the pipeline, but it needs to be filled in retroactively, and consistently in the future, to be of any use

STATION (2064 rows): DEPTH + DEPTHFISHED
- DEPTH: 3 zeros, 9 blanks
- DEPTHFISHED: 47 zeros, 119 blanks

STATION (2064 rows): Temp, Sal, DO
- Often measured at nearby station on same or different day
- Better to leave these blank and do any substitutions as part of analysis
- If substitutions are kept in the dataset, need a tractable flag and a column indicating the station ID of the substitute station

STATION (2064 rows): Temp, Sal, DO
          SECCHI   TEMPMAX   SALMAX   DOMAX
Min.        0.00      6.30     6.40   0.280
1st Qu.     0.00     20.37    35.50   4.420
Median      1.20     22.20    36.10   5.670
Mean        3.85     22.81    35.79   5.499
3rd Qu.     5.90     24.90    36.40   6.600
Max.       30.50     35.38    39.96   8.500
NA's        1132       702      702     730

          SECCHI    DFTEMP    DFSAL    DFDO
Min.        0.00     18.19    31.07   0.000
1st Qu.     0.00     20.80    35.53   3.850
Median      1.20     22.53    36.23   4.750
Mean        3.85     23.43    35.91   4.740
3rd Qu.     5.90     25.90    36.39   5.905
Max.       30.50     31.62    39.96   8.300
NA's        1132      1599     1599    1625

Challenges and opportunities
- Differences in sampling protocol
  - Among years
  - Among state partners
- Differences in sampling design
  - Among years
  - Among state partners
- Overall data consistency and quality

Take home
- Many of these issues will require going back to the original field data
- Issues from early years will require extensive data entry/re-entry to deal with no-catch rows, hook position, and hook size issues
- We can move forward faster if we prioritize 2012 onward
  - Most of these issues can be spot checked/confirmed/fixed
- I am more than happy to help with these tasks by providing detailed reports of issues and/or working with your data people to track down and fix problems

AL VLL statistical model
- Collapsed hooks and drops to station
  - Most stations had the same number of drops and the same number/sizes of hooks per drop
  - Those that didn't were eliminated before model fitting (e.g. lost gear)
  - Also eliminated all 2010 and 2011 data due to inconsistencies in number of drops, number/size of hooks per drop, bait type, etc., and insufficient data quality to sort these out
- No repeat stations at sites
- Therefore, able to run fixed-effects only model