Bioinformatics Resources - PDB -


Bioinformatics Resources - PDB -
Lecture & Exercises
Prof. B. Rost, Dr. L. Richter, J. Reeb
Institut für Informatik I12

Orga - Exam Date
- Exam takes place on Friday, July 31st
- Room: MW 0250 (Mechanical Engineering Building)
- Time scheduled: 8.30-10.30 (might be later)
- Duration: approx. 90 min

Advertisement
- Bachelor thesis: Carry your Genes (CyG)
- In collaboration with Certgate GmbH and Iteratec GmbH
- Affects: personalized medicine, mobile apps, encryption
- Student assistant (HiWi) position included
- See https://www.rostlab.org/teaching/theses

PDB History
- 1968: Brookhaven RAster Display (BRAD)
- 1969: Edgar Meyer came up with a file format for atomic coordinates
- 1971: remote access with the SEARCH program written by Meyer -> PDB functional
- 1998: transfer to RCSB (Research Collaboratory for Structural Biology)
- 2003: formation of wwPDB (PDBe, RCSB, PDBj; BMRB joined in 2006)

References
- F.C. Bernstein, T.F. Koetzle, G.J.B. Williams, E.F. Meyer Jr., M.D. Brice, J.R. Rodgers, O. Kennard, T. Shimanouchi, M. Tasumi (1977) The Protein Data Bank: a computer-based archival file for macromolecular structures. J. Mol. Biol. 112: 535-542.
- H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov, P.E. Bourne (2000) The Protein Data Bank. Nucleic Acids Research 28: 235-242.
- H.M. Berman, K. Henrick, H. Nakamura (2003) Announcing the worldwide Protein Data Bank. Nature Structural Biology 10 (12): 980.
- http://www.rcsb.org/pdb/home/home.do

Current Composition*

Experimental Method    Proteins   Nucleic Acids   Protein/Nucleic Acid complexes   Other     Total
---------------------------------------------------------------------------------------------------
X-ray diffraction        90,662           1,622                            4,510       4    96,798
NMR                       9,597           1,118                              225       8    10,948
Electron microscopy         566              29                              184       0       779
Hybrid                       70               3                                2       1        76
Other                       165               4                                6      13       188
Total                   101,060           2,776                            4,927      26   108,789

* as of May 18th, 2015

[Figure: Growth of PDB, all entries. Yearly and total number of entries, 1972-2014; y-axis 0 to 120,000.]

[Figure: Entries According to Method. Series: Total, X-Ray, NMR, EM; y-axis 0 to 120,000.]

[Figure: Growth of X-Ray Structures. Yearly and total number of entries, 1972-2014; y-axis 0 to 100,000.]

[Figure: Growth of NMR Structures. Yearly and total number of entries, 1972-2014; y-axis 0 to 12,000.]

[Figure: Growth of EM Structures. Yearly and total number of entries, 1972-2014; y-axis 0 to 800.]

[Figure: Unique CATH Folds (Topologies). Yearly and total counts, 1972-2014; y-axis 0 to 1,600.]

[Figure: Unique CATH Superfamilies. Yearly and total counts, 1972-2014; y-axis 0 to 3,000.]

Atomic Coordinate Entry Format
- aka PDB format
- current version: 3.30
- the format description comprises 190 pages
- ftp://ftp.wwpdb.org/pub/pdb/doc/format_descriptions/format_v33_a4.pdf

Record Format
- allowed characters:
  abcdefghijklmnopqrstuvwxyz
  ABCDEFGHIJKLMNOPQRSTUVWXYZ
  1234567890
  `-=[]\;',./~!@#$%^&*()_+{} :<>?
- , : ; are delimiters; in any other context they need to be escaped by a preceding \
- a file consists of multiple lines
- each line is 80 characters wide including EOL
- lines are self-identifying: the first six columns contain the record name, followed by a blank
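Because every line identifies itself through its first six columns, a reader can dispatch on the record name without any other context. A minimal Python sketch of that idea follows; the function name `record_name` and the sample lines are made up for illustration and are shortened (real lines are padded to 80 columns).

```python
def record_name(line: str) -> str:
    """Return the record name stored in columns 1-6 of a PDB line."""
    # The format description counts columns from 1, so columns 1-6 are line[0:6].
    return line[:6].strip()


# Hypothetical, shortened lines; not taken from a real PDB entry.
lines = [
    "HEADER    HYDROLASE",
    "TITLE     AN ILLUSTRATIVE TITLE",
    "ATOM      1  N   MET A   1",
    "END",
]

names = [record_name(line) for line in lines]
print(names)
```

Stripping the slice handles short names such as END, which are padded with blanks up to column 6.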

Single Line Records, One Time/One Line
- CRYST1: Unit cell parameters, space group, and Z.
- END: Last record in the file.
- HEADER: First line of the entry, contains PDB ID code, classification, and date of deposition.
- NUMMDL: Number of models.
- ...

One Time/Multiple Lines (incomplete list)
- AUTHOR: List of contributors.
- KEYWDS: List of keywords describing the macromolecule.
- SOURCE: Biological source of macromolecules in the entry.
- TITLE: Description of the experiment represented in the entry.
Subsequent lines have a continuation number.

Multiple Times/One Line (incomplete list)
- ATOM: Atomic coordinate records for standard groups.
- CONECT: Connectivity records.
- DBREF: Reference to the entry in the sequence database(s).
- HELIX: Identification of helical substructures.
- SHEET: Identification of sheet substructures.
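ATOM is the most frequently parsed of these one-line records. The sketch below slices one ATOM line by fixed columns. The column ranges (serial 7-11, atom name 13-16, residue name 18-20, chain 22, residue number 23-26, x/y/z 31-54) are not given on the slides; they are quoted from memory of the wwPDB format description and should be checked against it. The names `Atom` and `parse_atom` and the sample values are made up for illustration.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain_id: str
    res_seq: int
    x: float
    y: float
    z: float


def parse_atom(line: str) -> Atom:
    """Slice an ATOM record by fixed columns (1-based in the format description)."""
    if not line.startswith("ATOM  "):
        raise ValueError("not an ATOM record")
    return Atom(
        serial=int(line[6:11]),        # columns 7-11
        name=line[12:16].strip(),      # columns 13-16
        res_name=line[17:20].strip(),  # columns 18-20
        chain_id=line[21],             # column 22
        res_seq=int(line[22:26]),      # columns 23-26
        x=float(line[30:38]),          # columns 31-38
        y=float(line[38:46]),          # columns 39-46
        z=float(line[46:54]),          # columns 47-54
    )


# Build a hypothetical line piece by piece so the column alignment is explicit.
example = (
    "ATOM  "        # columns 1-6: record name
    + f"{1:>5}"     # columns 7-11: serial number
    + " "           # column 12: blank
    + " CA "        # columns 13-16: atom name
    + " "           # column 17: alternate location (blank here)
    + "GLY"         # columns 18-20: residue name
    + " "           # column 21: blank
    + "A"           # column 22: chain identifier
    + f"{1:>4}"     # columns 23-26: residue sequence number
    + "    "        # column 27 (insertion code) and columns 28-30
    + f"{1.5:8.3f}{-2.25:8.3f}{10.0:8.3f}"  # columns 31-54: x, y, z
)

atom = parse_atom(example)
print(atom)
```

Slicing by position rather than splitting on whitespace matters here: neighbouring fields may touch without a separating blank, for example when a coordinate fills all eight columns.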

Multiple Times/Multiple Lines (incomplete list)
- FORMUL: Chemical formula of non-standard groups.
- HETNAM: Compound name of the heterogens.
- SEQRES: Primary sequence of backbone residues.
- SITE: Identification of groups comprising important entity sites.
Subsequent lines have a continuation number.
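A reader has to stitch continuation lines back together. The sketch below does this for TITLE, assuming the continuation number sits in columns 9-10 and the text starts at column 11; this layout is recalled from the format description, not stated on the slides, and other record types may place the fields differently. The function name and the sample lines are hypothetical.

```python
def join_continuations(lines, record="TITLE"):
    """Concatenate the text of all lines of one multi-line record type."""
    parts = []
    for line in lines:
        if line[:6].strip() != record:
            continue
        # Assumed layout: columns 9-10 hold the continuation number
        # (blank on the first line), text starts at column 11.
        parts.append(line[10:].strip())
    return " ".join(parts)


# Hypothetical record split over two lines.
lines = [
    "HEADER    HYDROLASE",
    "TITLE     A HYPOTHETICAL TITLE THAT IS TOO LONG",
    "TITLE    2 FOR A SINGLE LINE",
]

title = join_continuations(lines)
print(title)
```

The sketch relies on the lines appearing in file order; a stricter reader could also check that the continuation numbers run 2, 3, ... without gaps.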

Record Order
- Records have to appear in a defined order
- There are mandatory and optional records
- Some mandatory records depend on conditions
- Mandatory records without content are NULL
- Examples of mandatory records:
  - HEADER
  - TITLE
  - COMPND
  - ...

Records Belong to Sections

Section               Record Type
----------------------------------------------------------------------------------
Title                 HEADER, OBSLTE, TITLE, SPLIT, CAVEAT, COMPND, SOURCE, KEYWDS,
                      EXPDTA, NUMMDL, MDLTYP, AUTHOR, REVDAT, SPRSDE, JRNL
Remark                REMARKs 0-999
Primary structure     DBREF, SEQADV, SEQRES, MODRES
Secondary structure   HELIX, SHEET
Coordinate            MODEL, ATOM, ANISOU, TER, HETATM, ENDMDL
...                   ...
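The section table can be turned into a lookup that assigns each line of a file to its section. The sketch below covers only the sections listed above; anything else falls into a catch-all bucket called "other", which is a label chosen here rather than part of the format. The sample lines are hypothetical and shortened.

```python
from collections import Counter

# Sections and record types exactly as listed in the table above (incomplete).
SECTIONS = {
    "Title": ["HEADER", "OBSLTE", "TITLE", "SPLIT", "CAVEAT", "COMPND", "SOURCE",
              "KEYWDS", "EXPDTA", "NUMMDL", "MDLTYP", "AUTHOR", "REVDAT",
              "SPRSDE", "JRNL"],
    "Remark": ["REMARK"],
    "Primary structure": ["DBREF", "SEQADV", "SEQRES", "MODRES"],
    "Secondary structure": ["HELIX", "SHEET"],
    "Coordinate": ["MODEL", "ATOM", "ANISOU", "TER", "HETATM", "ENDMDL"],
}

# Invert the table: record name -> section name.
RECORD_TO_SECTION = {
    record: section for section, records in SECTIONS.items() for record in records
}


def section_of(line: str) -> str:
    """Return the section of a line, or 'other' if the record is not in the table."""
    return RECORD_TO_SECTION.get(line[:6].strip(), "other")


# Hypothetical, shortened lines.
lines = [
    "HEADER    HYDROLASE",
    "REMARK   2 RESOLUTION.",
    "SEQRES   1 A    2  GLY ALA",
    "ATOM      1  N   GLY A   1",
    "HETATM    2  O   HOH A 101",
    "TER",
    "END",
]

counts = Counter(section_of(line) for line in lines)
print(counts)
```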

Records Even Have Formats
- A record consists of fields with specified data types
- Data could be: A-Z, a-z, an atom name, a nine-character string representing a date, a number, ...
- Complex data: a token (string followed by ":"), a comma-separated list of strings, a fixed-format string literal, ...

Example: HEADER

COLUMNS   DATA TYPE     FIELD            DEFINITION
------------------------------------------------------------------------------------
 1 -  6   Record name   "HEADER"
11 - 50   String(40)    classification   Classifies the molecule(s). *
51 - 59   Date          depDate          Deposition date. This is the date the
                                         coordinates were received at the PDB.
63 - 66   IDcode        idCode           This identifier is unique within the PDB.

* taken from a class list in the current wwPDB Annotation Documentation Appendices (http://www.wwpdb.org/docs.html)
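The column table translates directly into string slices. The following Python sketch reads the three HEADER fields using the column ranges above; the function name `parse_header` and the sample values (classification, date, and ID code) are invented for illustration and do not describe a real entry.

```python
def parse_header(line: str) -> dict:
    """Extract the HEADER fields by the column ranges given in the table."""
    if line[:6] != "HEADER":
        raise ValueError("not a HEADER record")
    return {
        "classification": line[10:50].strip(),  # columns 11-50
        "dep_date": line[50:59].strip(),        # columns 51-59
        "id_code": line[62:66].strip(),         # columns 63-66
    }


# Assemble a hypothetical HEADER line so that each field lands in its columns.
example = (
    "HEADER"                  # columns 1-6
    + " " * 4                 # columns 7-10
    + "HYDROLASE".ljust(40)   # columns 11-50
    + "01-JAN-00"             # columns 51-59
    + " " * 3                 # columns 60-62
    + "1ABC"                  # columns 63-66
)

header = parse_header(example)
print(header)
```

The deposition date is left as a nine-character string because the two-digit year is ambiguous without further context.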

Classification of Structures: CATH/SCOP
- both emerged in the mid-1990s
- both are quite similar
- aim: organize the protein structures available in the PDB, based on single domains
- hierarchical system (roughly):
  - secondary structure content
  - fold
  - superfamilies
  - families

SCOP: a Structural Classification of Proteins
- Murzin, A., Brenner, S. E., Hubbard, T. J. P. and Chothia, C. (1995) J. Mol. Biol., 247, 536-540
- Hubbard, T. J. P., Murzin, A., Brenner, S. E. and Chothia, C. (1997) Nucl. Acids Res., 25(1), 236-239 (easier to obtain)
- fully manually curated, driven by expert analysis
- associated with the ASTRAL compendium
- latest news: SCOPe (UC Berkeley), SCOP2 (MRC Lab Mol Biol, Cambridge, UK)

CATH - Faces
[Photos; image sources: http://www.tgac.ac.uk/scientific-advisory-board/ and http://www.ebi.ac.uk/about/people/janet-thornton]

CATH
- semi-automatic procedure for deriving a novel hierarchical classification of protein domain structures
- four main levels:
  - C: protein class, mainly the secondary structure composition of each domain
  - A: architecture, summarizes shapes based on the orientation of secondary structure elements
  - T: topology, sequential connectivity is considered
  - H: homologous superfamily, high similarity with similar functions, evolutionary relationship assumed
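The four levels form a path from the root of the hierarchy down to a superfamily. The sketch below assumes the dotted identifier notation C.A.T.H (four integers separated by dots), which is how CATH nodes are commonly written but is not shown on the slides. The identifiers used are invented, and the helper names are hypothetical.

```python
from typing import NamedTuple, Optional


class CathNode(NamedTuple):
    class_: int                  # C: protein class
    architecture: int            # A: architecture
    topology: int                # T: topology (fold)
    homologous_superfamily: int  # H: homologous superfamily


def parse_cath_id(node_id: str) -> CathNode:
    """Split an assumed 'C.A.T.H' identifier into its four levels."""
    parts = node_id.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four dot-separated levels, got {node_id!r}")
    return CathNode(*(int(part) for part in parts))


def deepest_shared_level(a: str, b: str) -> Optional[str]:
    """Return 'C', 'A', 'T' or 'H' for the deepest level two domains share, else None."""
    shared = None
    for level, x, y in zip("CATH", parse_cath_id(a), parse_cath_id(b)):
        if x != y:
            break
        shared = level
    return shared


# Invented identifiers: same class, architecture and topology, different superfamily.
level = deepest_shared_level("1.10.20.30", "1.10.20.40")
print(level)
```

Two domains that agree down to the T level share a fold, but no evolutionary relationship between them is assumed.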