Normalizing Illumina 450 DNA methylation data


Schalkwyk 1 Normalizing Illumina 450 DNA methylation data
Leo Schalkwyk, Ruth Pidsley, Manuela Volta, Chloe Wong, Katie Lunnon and Jon Mill
Institute of Psychiatry, Social, Genetic and Developmental Psychiatry
April 20, 2012

Schalkwyk 2 Metrics for 450 array normalization
Leonard C Schalkwyk - SGDP - IoP - 2012
- 450K DNA methylation array processing methods as recommended by the manufacturer are very simple
  - the bead intensities are local-background adjusted and averaged
  - an estimate of methylation fraction for each CG feature is calculated from Methylated (M) and Unmethylated (U) intensities: β = M/(M + U + 100)
- the product, like the 27K predecessor, produces stable βs with a characteristic bimodal distribution
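A minimal sketch of the β calculation above, in Python. The function name and the intensities are made up for illustration and are not from the talk's data; only the formula comes from the slide.

```python
def beta_value(methylated, unmethylated, offset=100.0):
    """Methylation fraction estimate: beta = M / (M + U + offset)."""
    return methylated / (methylated + unmethylated + offset)


# Illustrative (M, U) intensity pairs; hypothetical values, not real array data.
probes = {
    "mostly methylated": (9000.0, 1000.0),
    "mostly unmethylated": (1000.0, 9000.0),
    "low signal": (90.0, 10.0),
}

betas = {name: beta_value(m, u) for name, (m, u) in probes.items()}
for name, b in betas.items():
    print(f"{name}: beta = {b:.3f}")
```

The "low signal" probe has the same M:U ratio as the "mostly methylated" one, but the fixed offset of 100 dominates its denominator and pulls its β well below 0.9.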

Schalkwyk 3 Need for normalization
- Illumina claims normalization not required due to division by total intensity M + U
- one of the possible problems with this: background inflates both numerator and denominator of β = M/(M + U + 100)
- this moves β toward 0.5 by a possibly variable amount
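The shift toward 0.5 can be made explicit with a short derivation. It assumes, purely for illustration, that the same additive background b ≥ 0 lands on both the M and U intensities; the symbol b and the notation β_b are not in the original slides.

```latex
\beta_b = \frac{M + b}{M + U + 2b + 100},
\qquad
\beta_b - \tfrac{1}{2}
  = \frac{2(M + b) - (M + U + 2b + 100)}{2\,(M + U + 2b + 100)}
  = \frac{M - U - 100}{2\,(M + U + 2b + 100)}
```

The numerator does not depend on b, while the denominator grows with b. So the distance of β_b from 0.5 shrinks as background increases, and by an amount that varies with b from array to array.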

Schalkwyk 4 experience from gene expression and allelotyping
- nothing you can do with normalization replaces careful experimental design
  - a nuisance variable confounded with what you want to test can't be fixed
  - e.g. case and control in different batches
- nothing you can do with normalization replaces rigorous QC
  - samples with unusual raw intensity distributions
  - multivariate methods such as PCA often identify mislabeled samples
- background correction generally counterproductive

Schalkwyk 5 quantile normalization
- Bolstad, Irizarry et al 2003
- descendant of nonparametric techniques where values are replaced by their ranks within array
- here instead values are replaced by the mean of values of the same rank
[Figure: worked example in four stages: raw data, sorted, adjusted, reordered]
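The sort, average and reorder steps can be sketched in plain Python. This is an illustrative sketch, not the Bolstad et al. implementation: the function name and toy intensities are hypothetical, and ties are broken by position rather than handled specially.

```python
def quantile_normalize(arrays):
    """Quantile-normalize equal-length arrays (one list of intensities per array).

    Each value is replaced by the mean, across arrays, of the values that
    share its rank. Ties are broken by position, which is a simplification.
    """
    n = len(arrays[0])
    if any(len(a) != n for a in arrays):
        raise ValueError("all arrays must have the same number of probes")

    # Step 1 (sorted): for each array, the probe indices in increasing order of value.
    orders = [sorted(range(n), key=a.__getitem__) for a in arrays]

    # Step 2 (adjusted): mean of the values holding the same rank in every array.
    rank_means = [
        sum(a[order[r]] for a, order in zip(arrays, orders)) / len(arrays)
        for r in range(n)
    ]

    # Step 3 (reordered): write the rank means back in each array's original order.
    result = []
    for order in orders:
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        result.append(out)
    return result


# Toy data: three arrays, four probes each (hypothetical values).
raw = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 6.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(raw)
for row in normalized:
    print([round(v, 3) for v in row])
```

After normalization every array contains exactly the same set of values; only their assignment to probes differs.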

Schalkwyk 6 quantile normalization
[Figure: density plots of raw intensity and normalized intensity; x-axis intensity, y-axis density]
- depends on the majority of the profile being similar
- shown to perform well for gene expression using dilution and spike-in data

Schalkwyk 7 possible problems applying quantile normalization to 450 arrays
- no standard testing data sets
- single probe pair for each CpG site
- two different assays on the same array
  - I: M and U same colour, different beads
  - II: M and U different colour, same bead
- mid range beta values most interesting and few in number
  - known potential problem with quantile normalization where quantiles far apart
  - this is one argument against the common practice of quantile-normalizing betas

Schalkwyk 8 Dedeurwaerder et al 2011
- recognised a distribution difference between assay I and assay II
- they devised an ad-hoc scaling to force the distributions together

Schalkwyk 9 Dedeurwaerder et al 2011
- testing: yield of differences in methylation between wild type and DNMT double KO cell lines
- does not distinguish real from spurious positives

Schalkwyk 10 processing ideas
- adjust background difference assay I vs II
  - likely variance penalty
- quantile normalize M and U separately
- quantile normalize I and II separately

Schalkwyk 11 Assay I vs II equalization
[Figure: Assay I is black, II is red. Methylated solid and unmethylated dashed.]

Schalkwyk 12 Data sets
- NIH Alzheimer Project: 90+ arrays for each of
  - A: Dorsolateral prefrontal cortex BA9 (90)
  - E: Entorhinal cortex (94)
  - F: Superior temporal gyrus BA22 (94)
  - H: cerebellum (89)
- Schizophrenia project
  - cortex (44)
  - cerebellum (42)
- Autism project
  - cerebellum (24)

Schalkwyk 13 possible metrics of normalization goodness
- several sets of probes with behaviour we can develop tests for
  - imprinting DMRs: monoallelic methylation
  - non-pseudoautosomal X chromosome: differs between males and females
  - SNP probes: distinct AA, AB, BB genotypes

Schalkwyk 14 imprinting DMR
- known DMR from https://atlas.genetics.kcl.ac.uk/ (Reiner Schulz)
  - these are a conservative subset
- 308 CpGs assayed on the array are within known DMR
  - at least one each in 33 of the 38 documented DMR
- joint distribution quite tight and peaks at β = 0.56
- can look at variance across samples (main issue for normalization) but also consistency across loci

Schalkwyk 15

Schalkwyk 16

Schalkwyk 17 X-inactivation
- less baseline data than you might expect
- expect full monoallelic methylation in female
- expect less methylation in hemizygous males
- should be able to detect male-female difference

Schalkwyk 18 X-inactivation
- there are 11232 X chromosome features in the array annotation
  - none of them are in the pseudoautosomal regions
- in cerebellum data set for example
  - 9796 Bonferroni sex differences
  - 8969 on X
  - the Bonferroni test recovers 80% of X chromosome probes
  - 91% of the Bonferroni differences are X chromosome
- lends itself to ROC analysis
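The two percentages follow directly from the counts on this slide, and the ROC idea can be sketched by treating "probe is on X" as the true class and a per-probe sex-difference score as the classifier output. The scoring scheme (for example a negative log p-value), the function name and the toy data below are assumptions for illustration, not the talk's actual procedure.

```python
# Counts quoted on the slide (cerebellum data set).
x_features = 11232        # X chromosome features in the array annotation
bonferroni_hits = 9796    # probes with a Bonferroni-significant sex difference
hits_on_x = 8969          # of those, the number located on X

recovered = hits_on_x / x_features        # fraction of X probes recovered (about 80%)
on_x_share = hits_on_x / bonferroni_hits  # fraction of hits that are on X (about 91.6%)
print(f"recovered: {recovered:.1%}, on X: {on_x_share:.1%}")


def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney identity).

    scores: larger means stronger evidence of a sex difference (assumed scale).
    labels: True for X chromosome probes, False otherwise.
    Quadratic in the number of probes; fine for a sketch, slow for a full array.
    """
    pos = [s for s, is_x in zip(scores, labels) if is_x]
    neg = [s for s, is_x in zip(scores, labels) if not is_x]
    if not pos or not neg:
        raise ValueError("need at least one probe in each class")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy scores for six hypothetical probes.
scores = [9.1, 7.4, 0.3, 6.0, 0.2, 0.8]
labels = [True, True, True, True, False, False]
auc = roc_auc(scores, labels)
print(f"AUC = {auc}")
```

An AUC closer to 1 for one normalization method than another would indicate that it separates X from autosomal probes more cleanly at every threshold, not only at the Bonferroni cut-off.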

Schalkwyk 19
[Figure legend: Illumina blue, beta6 green]

Schalkwyk 20

Schalkwyk 21 genotyping
- genotyping assay not very different from methylation
- 65 selected SNPs on array (very useful for checking identities)
- βs should fall in 3 genotype classes

Schalkwyk 22 genotype variances
- genotypes assigned by one-dimension k-means clustering
- within-cluster sums of squares summed across SNPs
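A minimal sketch of this metric, assuming three clusters per SNP started at β = 0, 0.5 and 1 (for the AA, AB and BB classes). The starting centres, function names and toy β values are assumptions for illustration; the slides do not say how the clustering was initialized.

```python
def kmeans_1d(values, centers=(0.0, 0.5, 1.0), iterations=50):
    """Lloyd's algorithm in one dimension; returns (labels, centers)."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        labels = [
            min(range(len(centers)), key=lambda k: (v - centers[k]) ** 2)
            for v in values
        ]
        # Move each centre to the mean of its members (keep it if empty).
        new_centers = []
        for k, c in enumerate(centers):
            members = [v for v, lab in zip(values, labels) if lab == k]
            new_centers.append(sum(members) / len(members) if members else c)
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


def within_cluster_ss(values):
    """Within-cluster sum of squares for one SNP's betas across samples."""
    labels, centers = kmeans_1d(values)
    return sum((v - centers[lab]) ** 2 for v, lab in zip(values, labels))


def genotype_metric(betas_by_snp):
    """Sum the within-cluster sums of squares across all SNP probes."""
    return sum(within_cluster_ss(betas) for betas in betas_by_snp.values())


# Toy betas for one hypothetical SNP probe under two processing methods.
method_a = {"rs_example": [0.05, 0.07, 0.48, 0.52, 0.93, 0.95]}
method_b = {"rs_example": [0.06, 0.06, 0.50, 0.50, 0.94, 0.94]}
score_a = genotype_metric(method_a)
score_b = genotype_metric(method_b)
print(f"method A: {score_a:.4f}, method B: {score_b:.4f}")
```

A lower total means the βs sit more tightly in their genotype classes, which is the sense in which one normalization method would "win" on this test.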

Schalkwyk 23 conclusions
- bgeqqn wins almost every time
- requires a better name

Schalkwyk 24 next steps
- package these tests up
- test a wider variety of data
  - bad data!
- separate analysis of performance I and II
- worry about dye bias
- test other methods
- effect size and power