Supplementary Figures

Supplementary Figure 1. Optimal number of pair extractions. The plots show the relationship between the average number of checkerboard units in sets of 1000 randomizations of the Vanuatu matrix (A) and the Sashegy matrix (B), and the number of pair extractions applied by the Curveball algorithm to generate each set of random matrices. Grey horizontal lines indicate the expected average numbers of checkerboards as reported by Miklós & Podani [1], namely 14060 for the Vanuatu matrix and 549626 for the Sashegy matrix.
Supplementary Tables

Supplementary Table 1. Transition rate matrix of the five possible configurations shown in Fig. 1 of the article for the Curveball algorithm, expressing the probabilities that a single successful pair extraction (i.e. a pair extraction leading to some trade of elements) will rearrange one configuration into another.

       A      B      C      D      E
A    0.00   0.25   0.25   0.25   0.25
B    0.25   0.00   0.00   0.25   0.50
C    0.25   0.00   0.00   0.50   0.25
D    0.25   0.25   0.50   0.00   0.00
E    0.25   0.50   0.25   0.00   0.00

Supplementary Table 2. Eigenvector solution of the transition rate matrix reported in Supplementary Table 1, providing the expected frequencies of the different matrix configurations after 50 successful pair extractions (i.e. pair extractions leading to some trade of elements).

       A        B        C        D        E
A    0.2000   0.2000   0.2000   0.2000   0.2000
B    0.2000   0.2000   0.2000   0.2000   0.2000
C    0.2000   0.2000   0.2000   0.2000   0.2000
D    0.2000   0.2000   0.2000   0.2000   0.2000
E    0.2000   0.2000   0.2000   0.2000   0.2000
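The link between the two tables can be checked numerically. The following is a minimal sketch, assuming that Supplementary Table 2 corresponds to the transition matrix of Supplementary Table 1 applied 50 times in succession (i.e. raised to the 50th power); the helper names `mat_mul` and `mat_pow` are illustrative and not part of the original analysis.

```python
# Transition matrix from Supplementary Table 1 (rows and columns ordered A..E).
P = [
    [0.00, 0.25, 0.25, 0.25, 0.25],
    [0.25, 0.00, 0.00, 0.25, 0.50],
    [0.25, 0.00, 0.00, 0.50, 0.25],
    [0.25, 0.25, 0.50, 0.00, 0.00],
    [0.25, 0.50, 0.25, 0.00, 0.00],
]


def mat_mul(a, b):
    """Multiply two square matrices given as lists of lists."""
    n = len(a)
    return [
        [sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Probabilities of moving between configurations after 50 successful extractions.
P50 = mat_pow(P, 50)
for label, row in zip("ABCDE", P50):
    print(label, " ".join(f"{x:.4f}" for x in row))
```

Because every row and every column of the matrix sums to 1, the uniform distribution over the five configurations is stationary, which is consistent with all entries of Supplementary Table 2 being equal.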
Supplementary Notes

Supplementary Note 1. Supplementary evidence that the Curveball algorithm samples null matrices uniformly

We tested the robustness of the Curveball algorithm against unequal sampling of null matrices by comparing its performance with that of the trial-swap algorithm, which has been extensively demonstrated to sample null matrices with equal probability [1]. For this, we used a set of 100 random matrices created with a procedure designed to keep the number of possible alternative matrix configurations low. The procedure works as follows. First, an empty matrix (i.e. a matrix of all zeros) of random size (ranging from 5 × 5 to 15 × 15) is built, and a random number of checkerboards (ranging from 1 to 5) is selected a priori. Then the matrix is filled by randomly extracting one cell at a time and assigning it the value 1 only if this addition does not make the total number of checkerboards exceed the number selected for the matrix. This trial-and-error process is repeated until each row and column of the matrix has at least one presence.

For each of these 100 matrices, we generated two sets of 1000 null matrices using, respectively, the Curveball algorithm (with the number of pair extractions conservatively set at 10000) and the trial-swap method [1]. The trial-swap randomizations were performed using the function randomizeMatrix of the R library picante [2], conservatively setting the number of swap attempts to 50000 [3]. We then compared the two sets of null matrices to verify that the Curveball algorithm was able to identify as many different configurations as the trial-swap algorithm. Finally, we used a χ² test to assess whether the frequency of the different configurations in each set of null matrices generated by the Curveball algorithm differed significantly from a uniform distribution.

The Curveball and the trial-swap algorithms identified, respectively, 110.04 ± 24.88 and 109.23 ± 24.70 (mean ± S.E.)
distinct configurations in each set of 1000 null matrices. In 86% of cases, the matrix configurations identified by the Curveball algorithm were the same as those identified by the trial-swap procedure. In 4% of cases the trial-swap algorithm found more distinct configurations than the Curveball algorithm did (for a total of 6 configurations), while in 10% of cases the Curveball algorithm found more distinct configurations than the trial-swap algorithm did (for a total of 87 configurations). The observed frequencies of null matrix configurations generated using the Curveball algorithm fitted those expected for a uniform distribution in all of the 100 sets (average p-value of the χ² test: 0.60, standard error: 0.03). These results indicate that the Curveball algorithm samples null matrices uniformly, and that it explores the universe of possible matrix configurations with an efficiency comparable to that of the trial-swap algorithm.

Supplementary Note 2. Optimal number of pair extractions

Miklós and Podani [1] performed a simple experiment on two real species-by-site matrices to assess the minimum number of trial swaps necessary to reach a stable value of average checkerboards. The experiment created different sets of null matrices using an increasing number of swaps and computed, for each set, the average of the total number of checkerboards present in each null matrix. The two real matrices were characterized by a very large number of checkerboards. As the number of trial swaps increased, the average number of checkerboards in the set of null matrices decreased progressively until it stabilized around a certain value. The authors identified the number of trial swaps necessary to reach this value as the minimum needed to sample different matrix configurations with equal probability. We replicated the same analysis on the two matrices of the original experiment using the Curveball algorithm, in order to estimate the minimum number of pair extractions necessary to reach stable values in the average number of checkerboards.
This information may be used as a guideline for configuring the Curveball algorithm in ecological studies. In addition, we verified whether the average number of checkerboards in the null matrices generated by the Curveball algorithm converged towards the values reported by the original study [1], which would provide further indirect evidence of the robustness of our method against unequal sampling of matrix configurations.
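For readers who want to see what is being counted, a single pair extraction can be sketched as follows. This note does not restate the mechanics of the algorithm, so the sketch assumes the usual description of Curveball: two rows are drawn at random, the presences found in only one of them are pooled and shuffled, and each row receives back as many of them as it held before. The function names and the small example matrix are hypothetical, and every extraction is counted, including those that produce no trade.

```python
import random


def curveball_pair_extraction(rows, rng):
    """Apply one pair extraction in place.

    `rows` is a list of sets; each set holds the column indices where that row has a 1.
    """
    i, j = rng.sample(range(len(rows)), 2)
    shared = rows[i] & rows[j]
    only_i = rows[i] - shared
    only_j = rows[j] - shared
    # Pool the presences that are not shared, then shuffle them.
    pool = sorted(only_i | only_j)
    rng.shuffle(pool)
    # Each row gets back as many elements as it had, so row totals are unchanged;
    # each pooled column still appears in exactly one of the two rows, so column
    # totals are unchanged as well.
    k = len(only_i)
    rows[i] = shared | set(pool[:k])
    rows[j] = shared | set(pool[k:])


def curveball(matrix, n_extractions, seed=None):
    """Return a randomized copy of a 0/1 matrix after `n_extractions` pair extractions."""
    rng = random.Random(seed)
    n_cols = len(matrix[0])
    rows = [{c for c, v in enumerate(row) if v} for row in matrix]
    for _ in range(n_extractions):
        curveball_pair_extraction(rows, rng)
    return [[1 if c in row else 0 for c in range(n_cols)] for row in rows]


# Hypothetical 3 x 4 presence-absence matrix, used only for illustration.
example = [
    [1, 1, 0, 0],
    [0, 1, 1, 0],
    [1, 0, 0, 1],
]
null_matrix = curveball(example, n_extractions=1000, seed=1)
```

The number passed as `n_extractions` plays the role of the quantity whose minimum useful value is estimated in this note.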
The first matrix included data for the avifauna of the Vanuatu Archipelago (56 species on 28 islands) [4], while the second included presence-absence of 118 plant species in 80 quadrats of 3 × 3 m located in the Sashegy Nature Reserve, Budapest [5,6]. We computed the total number of checkerboards of any species-site matrix by summing the number of checkerboard units (CU) computed for each possible pair of rows. For each pair of species (i.e. rows), a CU value was computed as $(R_i - S)(R_j - S)$, where $R_i$ is the total number of occurrences of the i-th species, $R_j$ is the total number of occurrences of the j-th species, and $S$ is the number of shared sites, i.e. the number of sites where both species occur [7].

For both the Vanuatu and the Sashegy matrices, we generated 100 sets of 10000 null matrices using an increasing number of pair extractions (an arithmetic progression from 0 to 10000, with a common difference of 100). In both experiments the average numbers of checkerboards converged to those reported in the original experiment [1], namely 14060 for the Vanuatu matrix and 549626 for the Sashegy matrix. Results for the two matrices are reported, respectively, in Supplementary Figure 1A and 1B. Very few pair extractions (fewer than 1000 for both matrices) were enough to reach the stable value of average checkerboards. For any set of null matrices generated using more than this number of pair extractions, the abundance of checkerboards in both real matrices was significantly higher than expected by chance (p < 0.0001). The Vanuatu matrix has already been investigated in several papers, with different outcomes [4,8,9]. Our results are consistent with those obtained using methods proven to provide unbiased p-values [1,10,11].

However, the optimal number of pair extractions clearly depends on the size of a matrix or, more precisely, on its smallest dimension (i.e. the minimum between the number of rows and the number of columns).
Thus, using the same approach as above, we estimated the number of pair extractions necessary to reach the stable value of average checkerboards in a large set of real matrices of various sizes. For this, we used all 295 matrices provided together with the Nestedness Temperature Calculator software [12]. For each of these matrices, we generated different sets of 1000 null matrices by using the Curveball algorithm with an increasing number of pair extractions (an arithmetic progression starting from a value equal to the smallest dimension of the matrix, with a common difference of 1), until the expected number of checkerboards stabilized, i.e. until it did not change by more than 1% across 100 subsequent sets of null matrices.
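The checkerboard count and the stopping rule used in this note could be sketched as follows. The CU formula follows the definition given above; the stability check is only one possible reading of the 1% criterion (each of the next 100 averages must stay within 1% of a reference value), and the names `total_checkerboard_units` and `first_stable_index` are hypothetical.

```python
from itertools import combinations


def total_checkerboard_units(matrix):
    """Sum (R_i - S) * (R_j - S) over all pairs of rows of a 0/1 matrix."""
    rows = [{c for c, v in enumerate(row) if v} for row in matrix]
    total = 0
    for a, b in combinations(rows, 2):
        shared = len(a & b)  # S: sites where both species occur
        total += (len(a) - shared) * (len(b) - shared)
    return total


def first_stable_index(averages, tolerance=0.01, window=100):
    """Return the first index whose value the next `window` values all stay within
    `tolerance` (relative) of, or None if no such index exists.

    Assumed interpretation of the 1% / 100-sets rule, not the authors' exact code.
    """
    for t in range(len(averages) - window):
        reference = averages[t]
        following = averages[t + 1 : t + 1 + window]
        if all(abs(x - reference) <= tolerance * abs(reference) for x in following):
            return t
    return None


# Two species that never co-occur form a single checkerboard unit.
print(total_checkerboard_units([[1, 0], [0, 1]]))  # prints 1
```

In such a setup, `averages` would hold the mean checkerboard count of each successive set of null matrices, indexed by the number of pair extractions used to generate that set.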
Finally, we compared the size, the minor and major dimensions and the fill of each matrix with the minimum number of pair extractions necessary to reach the stability of average checkerboards. Among all the investigated matrices (n = 295), this number was very small, ranging from 3 to 366 (mean: 24.34, standard error: 2.22). Moreover, it was in most cases smaller than or equal to the largest matrix dimension (with an average of 1.2 times the largest matrix dimension). Thus, the value of 1000 pair extractions suggested by the experiment performed on the Vanuatu and the Sashegy matrices is likely to be rather conservative.

Supplementary References

1. Miklós, I. & Podani, J. Randomization of presence-absence matrices: comments and new algorithms. Ecology 85, 86–92 (2004).
2. Kembel, S. W. et al. Picante: Tools for Integrating Phylogenies and Ecology. Available from URL: http://picante.r-forge.r-project.org/ (2008).
3. Fayle, T. M. & Manica, A. Reducing over-reporting of deterministic co-occurrence patterns in biotic communities. Ecol. Modell. 221, 2237–2242 (2010).
4. Gotelli, N. J. & Entsminger, G. L. Swap and fill algorithms in null model analysis: rethinking the knight's tour. Oecologia 129, 281–291 (2001).
5. Podani, J. Syntaxonomic congruence in a small-scale vegetation survey. Abstr. Bot. 9, 99–128 (1985).
6. Podani, J., Csontos, P. & Tamas, J. Additive trees in the analysis of community data. Community Ecol. 1, 33–41 (2000).
7. Stone, L. & Roberts, A. The checkerboard score and species distributions. Oecologia 85, 74–79 (1990).
8. Roberts, A. & Stone, L. Island-sharing by archipelago species. Oecologia 83, 560–567 (1990).
9. Sanderson, J. G., Moulton, M. P. & Selfridge, R. G. Null matrices and the analysis of species co-occurrences. Oecologia 116, 275–283 (1998).
10. Lehsten, V. & Harmand, P. Null models for species co-occurrence patterns: assessing bias and minimum iteration number for the sequential swap. Ecography 29, 786–792 (2006).
11. Zaman, A. & Simberloff, D. Random binary matrices in biogeographical ecology: instituting a good neighbor policy. Environ. Ecol. Stat. 9, 405–421 (2002).
12. Atmar, W. & Patterson, B. D. The measure of order and disorder in the distribution of species in fragmented habitat. Oecologia 96, 373–382 (1993).