Supplementary Figures

Supplementary Figure 1. Optimal number of pair extractions. The plots show the relationship between the average number of checkerboard units in sets of 1000 randomizations of the Vanuatu matrix (A) and the Sashegy matrix (B) and the number of pair extractions applied by the Curveball algorithm to generate each set of random matrices. Grey horizontal lines indicate the expected average numbers of checkerboards reported by Miklós & Podani [1], namely 14060 for the Vanuatu matrix and 549626 for the Sashegy matrix.
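The figure counts pair extractions of the Curveball algorithm, which is specified in the main article rather than in these notes. As an illustration only, the Python sketch below shows one way a pair extraction can be implemented, assuming each row is stored as the set of columns where it has a 1: two distinct rows are drawn at random, the columns occupied by exactly one of them are pooled and shuffled, and each row gets back as many of those columns as it had before. The function name `curveball` and the `rng` parameter are hypothetical, not taken from the article.

```python
import random


def curveball(matrix, n_extractions, rng=None):
    """Randomize a 0/1 matrix while keeping row and column totals fixed.

    Illustrative sketch: each row is held as the set of columns where it has a 1.
    """
    if rng is None:
        rng = random.Random()
    n_cols = len(matrix[0])
    rows = [{c for c, v in enumerate(row) if v} for row in matrix]
    for _ in range(n_extractions):
        # One pair extraction: pick two distinct rows at random.
        a, b = rng.sample(range(len(rows)), 2)
        shared = rows[a] & rows[b]
        only_a = rows[a] - shared
        only_b = rows[b] - shared
        # Pool the columns occupied by exactly one of the two rows and shuffle.
        pool = sorted(only_a | only_b)
        rng.shuffle(pool)
        # Row a gets back as many non-shared columns as it had; row b the rest.
        k = len(only_a)
        rows[a] = shared | set(pool[:k])
        rows[b] = shared | set(pool[k:])
    return [[1 if c in row else 0 for c in range(n_cols)] for row in rows]


m = [[1, 0, 1, 0],
     [0, 1, 1, 0],
     [1, 1, 0, 1]]
null = curveball(m, 1000, random.Random(42))
print([sum(r) for r in null])        # row totals: [2, 2, 3]
print([sum(c) for c in zip(*null)])  # column totals: [2, 2, 2, 1]
```

Row totals are preserved because each row keeps its shared columns plus the same number of non-shared ones; column totals are preserved because every pooled column still ends up in exactly one of the two rows. Sorting the pool before shuffling only makes runs reproducible with a seeded generator.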

Supplementary Tables

Supplementary Table 1. Transition rate matrix of the five possible configurations shown in Fig. 1 of the article for the Curveball algorithm, expressing the probabilities that a single successful pair extraction (i.e. a pair extraction leading to some trade of elements) will rearrange one configuration into another.

        A     B     C     D     E
  A  0.00  0.25  0.25  0.25  0.25
  B  0.25  0.00  0.00  0.25  0.50
  C  0.25  0.00  0.00  0.50  0.25
  D  0.25  0.25  0.50  0.00  0.00
  E  0.25  0.50  0.25  0.00  0.00

Supplementary Table 2. Eigenvector solution of the transition rate matrix reported in Supplementary Table 1, providing the expected frequencies of the different matrix configurations after 50 successful pair extractions (i.e. pair extractions leading to some trade of elements).

          A       B       C       D       E
  A  0.2000  0.2000  0.2000  0.2000  0.2000
  B  0.2000  0.2000  0.2000  0.2000  0.2000
  C  0.2000  0.2000  0.2000  0.2000  0.2000
  D  0.2000  0.2000  0.2000  0.2000  0.2000
  E  0.2000  0.2000  0.2000  0.2000  0.2000
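Every row and every column of Supplementary Table 1 sums to 1, so the matrix is doubly stochastic and its stationary distribution is uniform over the five configurations. Its eigenvalues are 1, 0.25, -0.25, -0.25 and -0.75, so after 50 steps each entry differs from 0.2 by at most 0.75^50, which is below 10^-6. The pure-Python check below, an illustration rather than the authors' own computation, raises the Table 1 matrix to the 50th power and prints it to four decimal places.

```python
# Transition matrix from Supplementary Table 1 (rows/columns: A, B, C, D, E).
P = [
    [0.00, 0.25, 0.25, 0.25, 0.25],
    [0.25, 0.00, 0.00, 0.25, 0.50],
    [0.25, 0.00, 0.00, 0.50, 0.25],
    [0.25, 0.25, 0.50, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00, 0.00],
]


def matmul(x, y):
    """Plain-Python product of two square matrices."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def matpow(m, steps):
    """m raised to a positive integer power by repeated multiplication."""
    result = m
    for _ in range(steps - 1):
        result = matmul(result, m)
    return result


# Doubly stochastic: every row and every column sums to 1.
assert all(abs(sum(row) - 1) < 1e-12 for row in P)
assert all(abs(sum(col) - 1) < 1e-12 for col in zip(*P))

P50 = matpow(P, 50)
for label, row in zip("ABCDE", P50):
    # Every printed entry is 0.2000, matching Supplementary Table 2.
    print(label, " ".join(f"{p:.4f}" for p in row))
```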

Supplementary Notes

Supplementary Note 1. Supplementary evidence that the Curveball algorithm samples null matrices uniformly

We tested the robustness of the Curveball algorithm against unequal sampling of null matrices by comparing its performance with that of the trial-swap algorithm, which has been extensively demonstrated to sample null matrices with equal probability [1]. For this we used a set of 100 random matrices created with a procedure designed to keep the number of possible alternative matrix configurations low. The procedure works as follows. First, an empty matrix (i.e. a matrix of all zeros) of random size (ranging from 5 × 5 to 15 × 15) is built, and a maximum number of checkerboards (ranging from 1 to 5) is selected at random a priori. Then the matrix is filled by randomly extracting one cell at a time and assigning it the value 1 only if this addition does not make the total number of checkerboards exceed the maximum selected for the matrix. This trial-and-error process is repeated until each row and column of the matrix contains at least one presence.

For each of these 100 matrices we generated two sets of 1000 null matrices using, respectively, the Curveball algorithm (with the number of pair extractions conservatively set at 10000) and the trial-swap method [1]. The trial-swap randomizations were performed using the function randomizeMatrix of the R package picante [2], conservatively setting the number of swap attempts to 50000 [3]. We then compared the two sets of null matrices to verify that the Curveball algorithm identified as many different configurations as the trial-swap algorithm. Finally, we used a χ² test to assess whether the frequencies of the different configurations in each set of null matrices generated by the Curveball algorithm differed significantly from a uniform distribution.

The Curveball and the trial-swap algorithms identified, respectively, 110.04 ± 24.88 and 109.23 ± 24.70 (mean ± S.E.) distinct configurations per set of 1000 null matrices. In 86% of cases, the distinct configurations identified by the Curveball algorithm were exactly those identified by the trial-swap procedure. In 4% of cases the trial-swap algorithm found more distinct configurations than the Curveball algorithm (6 configurations in total), while in 10% of cases the Curveball algorithm found more distinct configurations than the trial-swap algorithm (87 configurations in total). The observed frequencies of null matrix configurations generated by the Curveball algorithm fitted those expected under a uniform distribution in all 100 sets (average χ² p-value: 0.60, standard error: 0.03). This demonstrates that the Curveball algorithm samples null matrices from the uniform distribution, and that it explores the universe of possible matrix configurations with an efficiency comparable to that of the trial-swap algorithm.

Supplementary Note 2. Optimal number of pair extractions

Miklós and Podani [1] performed a simple experiment on two real species-by-site matrices to assess the minimum number of trial swaps needed to reach a stable value of the average number of checkerboards. The experiment created sets of null matrices using an increasing number of swaps and computed, for each set, the average total number of checkerboards per null matrix. Both real matrices contained a very large number of checkerboards. As the number of trial swaps increased, the average number of checkerboards in the set of null matrices decreased progressively until it stabilized around a certain value. The authors took the number of trial swaps needed to reach this value as the minimum needed to sample the different matrix configurations with equal probability. We replicated the same analysis on the two matrices of the original experiment using the Curveball algorithm, in order to estimate the minimum number of pair extractions needed to reach a stable average number of checkerboards.
This information may be used as a guideline for setting up the Curveball algorithm in ecological studies. In addition, we verified whether the average number of checkerboards in the null matrices generated by the Curveball algorithm converged towards the values reported in the original study [1], which would provide further indirect evidence of the robustness of our method against unequal sampling of matrix configurations.

The first matrix included data for the avifauna of the Vanuatu Archipelago (56 species on 28 islands) [4], while the second included presence-absence data for 118 plant species in 80 quadrats of 3 × 3 m located in the Sashegy Nature Reserve, Budapest [5,6]. We computed the total number of checkerboards of any species-by-site matrix by summing the checkerboard units (CU) computed for each possible pair of rows. For each pair of species (i.e. rows) $i$ and $j$, CU was computed as $(R_i - S)(R_j - S)$, where $R_i$ and $R_j$ are the total numbers of occurrences of the $i$-th and $j$-th species, and $S$ is the number of shared sites, i.e. the number of sites where both species occur [7]. For both the Vanuatu and the Sashegy matrix we generated 100 sets of 10000 null matrices using an increasing number of pair extractions (an arithmetic progression from 0 to 10000 with a common difference of 100). In both experiments the average numbers of checkerboards converged to those reported in the original experiment [1], namely 14060 for the Vanuatu matrix and 549626 for the Sashegy matrix. Results for the two matrices are reported in Supplementary Figure 1A and 1B, respectively. Very few pair extractions (fewer than 1000 for both matrices) were enough to reach the stable value of average checkerboards. For any set of null matrices generated with more than this number of pair extractions, the observed number of checkerboards was significantly higher than expected by chance in both matrices (p < 0.0001). The Vanuatu matrix has already been investigated in several papers, with differing outcomes [4,8,9]; our results are consistent with those obtained using methods shown to provide unbiased p-values [1,10,11]. However, the optimal number of pair extractions clearly depends on the size of the matrix or, more precisely, on its smallest dimension (i.e. the smaller of the number of rows and the number of columns).
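The checkerboard count defined above, and the averaging over sets of null matrices behind Supplementary Figure 1, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the Curveball step is re-implemented compactly so that the block runs on its own, the function names and the small toy matrix are hypothetical, and the Vanuatu and Sashegy data are not reproduced here.

```python
import random
from itertools import combinations


def count_checkerboards(matrix):
    """Sum of checkerboard units (R_i - S)(R_j - S) over all pairs of rows."""
    total = 0
    for row_i, row_j in combinations(matrix, 2):
        r_i, r_j = sum(row_i), sum(row_j)
        s = sum(x & y for x, y in zip(row_i, row_j))  # shared sites
        total += (r_i - s) * (r_j - s)
    return total


def curveball(matrix, n_extractions, rng):
    """Compact Curveball-style randomization (rows as sets of occupied columns)."""
    rows = [{c for c, v in enumerate(row) if v} for row in matrix]
    for _ in range(n_extractions):
        a, b = rng.sample(range(len(rows)), 2)
        shared = rows[a] & rows[b]
        pool = sorted((rows[a] | rows[b]) - shared)
        k = len(rows[a]) - len(shared)  # non-shared columns of row a
        rng.shuffle(pool)
        rows[a], rows[b] = shared | set(pool[:k]), shared | set(pool[k:])
    n_cols = len(matrix[0])
    return [[int(c in row) for c in range(n_cols)] for row in rows]


def average_checkerboards(matrix, n_extractions, n_null, rng):
    """Mean checkerboard count over n_null independently generated null matrices."""
    counts = [count_checkerboards(curveball(matrix, n_extractions, rng))
              for _ in range(n_null)]
    return sum(counts) / n_null


# Toy 4 x 6 species-by-site matrix (rows = species); not data from the text.
m = [[1, 1, 1, 0, 0, 0],
     [1, 1, 0, 0, 0, 0],
     [0, 0, 0, 1, 1, 1],
     [0, 0, 0, 1, 1, 0]]
print("observed:", count_checkerboards(m))  # observed: 25
rng = random.Random(1)
for steps in range(0, 101, 20):
    print(steps, average_checkerboards(m, steps, 200, rng))
```

For the toy matrix, the species pairs (1, 3), (1, 4), (2, 3) and (2, 4) contribute 9, 6, 6 and 4 checkerboard units, while the nested pairs contribute 0, giving 25. Printing the averages for an increasing number of pair extractions traces the kind of convergence curve shown in Supplementary Figure 1.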
Thus, using the same approach as above, we estimated the number of pair extractions needed to reach a stable average number of checkerboards in a large set of real matrices of various sizes. For this we used all 295 matrices distributed with the Nestedness Temperature Calculator software [12]. For each of these matrices, we generated successive sets of 1000 null matrices with the Curveball algorithm using an increasing number of pair extractions (an arithmetic progression starting from the smallest dimension of the matrix, with a common difference of 1), until the average number of checkerboards stabilized, i.e. it did not change by more than 1% over 100 subsequent sets of null matrices.
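The stopping rule above can be made concrete with a small helper. This Python sketch rests on an assumed reading of the criterion: a number of pair extractions n counts as stable when the averages obtained with n + 1, ..., n + 100 extractions all stay within 1% of the average obtained with n. The callback `average_at` is hypothetical; in practice it would wrap the generation and averaging of a set of 1000 Curveball null matrices.

```python
def first_stable_extractions(average_at, start, window=100, tol=0.01,
                             max_n=100_000):
    """Return the smallest n >= start such that the averages for
    n + 1, ..., n + window all stay within a relative tolerance `tol`
    of the average for n; return None if no such n <= max_n exists.

    average_at(n) is a hypothetical callback giving the mean checkerboard
    count of a set of null matrices generated with n pair extractions.
    """
    cache = {}

    def avg(n):
        # Each set of null matrices is generated (and averaged) only once.
        if n not in cache:
            cache[n] = average_at(n)
        return cache[n]

    # Step of 1, starting from the smallest matrix dimension (passed as start).
    for n in range(start, max_n + 1):
        ref = avg(n)
        if all(abs(avg(n + d) - ref) <= tol * abs(ref)
               for d in range(1, window + 1)):
            return n
    return None


# Deterministic stand-in for a convergence curve (illustrative only).
print(first_stable_extractions(lambda n: 100 + 1000 * 0.9 ** n, start=1))  # 66
```

Caching matters because consecutive candidates share 99 of their 100 comparison sets, so each set of null matrices is generated only once.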

Finally, we compared the size, the minor and major dimensions and the fill of each matrix with the minimum number of pair extractions needed to reach a stable average number of checkerboards. Among all the investigated matrices (n = 295), this number was very small, ranging from 3 to 366 (mean: 24.34, standard error: 2.22). Moreover, in most cases it was smaller than or equal to the largest matrix dimension (on average, 1.2 times the largest matrix dimension). Thus, the value of 1000 pair extractions suggested by the experiment performed on the Vanuatu and Sashegy matrices is likely to be rather conservative.

Supplementary References

1. Miklós, I. & Podani, J. Randomization of presence-absence matrices: comments and new algorithms. Ecology 85, 86–92 (2004).
2. Kembel, S. W. et al. Picante: tools for integrating phylogenies and ecology. Available from URL: http://picante.r-forge.r-project.org/ (2008).
3. Fayle, T. M. & Manica, A. Reducing over-reporting of deterministic co-occurrence patterns in biotic communities. Ecol. Modell. 221, 2237–2242 (2010).
4. Gotelli, N. J. & Entsminger, G. L. Swap and fill algorithms in null model analysis: rethinking the knight's tour. Oecologia 129, 281–291 (2001).
5. Podani, J. Syntaxonomic congruence in a small-scale vegetation survey. Abstr. Bot. 9, 99–128 (1985).
6. Podani, J., Csontos, P. & Tamas, J. Additive trees in the analysis of community data. Community Ecol. 1, 33–41 (2000).
7. Stone, L. & Roberts, A. The checkerboard score and species distributions. Oecologia 85, 74–79 (1990).
8. Roberts, A. & Stone, L. Island-sharing by archipelago species. Oecologia 83, 560–567 (1990).
9. Sanderson, J. G., Moulton, M. P. & Selfridge, R. G. Null matrices and the analysis of species co-occurrences. Oecologia 116, 275–283 (1998).
10. Lehsten, V. & Harmand, P. Null models for species co-occurrence patterns: assessing bias and minimum iteration number for the sequential swap. Ecography 29, 786–792 (2006).
11. Zaman, A. & Simberloff, D. Random binary matrices in biogeographical ecology: instituting a good neighbor policy. Environ. Ecol. Stat. 9, 405–421 (2002).
12. Atmar, W. & Patterson, B. D. The measure of order and disorder in the distribution of species in fragmented habitat. Oecologia 96, 373–382 (1993).