Chemistry and reactions from non- US patents. Daniel Lowe and Roger Sayle NextMove Software Cambridge, UK

Similar documents
Honeywell Research Chemicals ACS Grade Solvents

Optimizing Vial Pressurization Parameters for the Analysis of <USP 467> Residual Solvents Using the 7697A Headspace Sampler

CHEMISTRY 102D ASSIGNMENTS. WEEK 1 (August 23-27) Introduction, Classification of Matter, Significant Figures, Dimensional Analysis

order-no: / 13 Kunden-Nummer Kunden-Name Aktualisierungsdatum checked (140/140) 1, 2-dichloroethane

SUPPLEMENTARY MATERIAL TO Solvatochromism of isatin based Schiff bases: An LSER and LFER study


SOLVENT PURIFICATION SYSTEMS

Biochemical Applications of Computational Chemistry

PREPARING SOLUBILITY DATA FOR USE UPDATING KEY RESOURCES BY THE GAS PROCESSING INDUSTRY:

Technical Procedure for Drug Chemistry Gas Chromatograph/Mass Spectrometry (GC-MS)

CHEM254 #4 The Synthesis of a Tertiary Alcohol Using a Pre-Made Grignard Reagent 1

TITLE: DEALING WITH LABORATORY SPILLAGES

Physical Pharmacy. Solubility. Khalid T Maaroof MSc. Pharmaceutical sciences School of pharmacy Pharmaceutics department

CH 2 CH 2 CHO CH 3. (d) C H PCC PCC H 3 O + LiAlH PCC CH 2 Cl 2 CH 3 CH 2 CH 2 CH 2 CHO

Gas monitoring using GASMET FTIR

More Carbonyl Chemistry

Instruction Manual TECH0002 / Version 1.0. Sample Concentrator FSC400D & FSC496D

Research. Laboratory. Safety Policy. Manual

SIMPLE, EFFICIENT AND SIMULTANEOUS DETERMINATION OF ANABOLIC STEROIDS IN HUMAN URINE

Food fraud and prediction

SUPPORTING INFORMATION

19. The Grignard Reaction

Technical Manual. Sensepoint XCD Gas Detector

Degasser DG 2.1S User Manual V6880 HPLC

Technical Data Package

Toxicology Gas Chromatography-Mass Spectrometry (GC-MS)

Save 25% on a Selection of Honeywell Research Chemicals Products

Solvent Purification

Pushover Analysis Of Steel Frames Welcome To Ethesis

Presentation Summary Why Use GIS for Ped Planning? What Tools are Most Useful? How Can They be Applied? Pedestrian GIS Tools What are they good for?

Membrane Air Dryer. IDG1-C06 Series IDG1-C06-P Series

Introduction to Pattern Recognition

Grignard Reaction. John Goodwin. Since Victor Grignard discovered his Grignard reagent 110 years ago, these

Outgassing, photoablation and photoionization of organic materials by the electron-impact and photon-impact methods

Packaging HM for shipment:

Acclaim RSLC. (Rapid Separation Liquid Chromatography)

Topic 8: IPC and CPC Basics

SOP: Derivatized testosterone_01 Revision: 01 Date Effective: 05/17/15

DPS MODERN INDIAN SCHOOL, DOHA - QATAR REVISION& ANNUAL EXAMINATION ( ) PORTIONS CLASS: IX

AN INTRODUCTION TO THE DEPTH & COMPLEXITY PROMPTS AND THEIR ICONS. Presented by Daniel Brillhart

S AF E T Y D AT A S HEET

Calgary Christian School

Licensing and change of ownership in international patent legal status data

CHE : Organic Chemistry I Fall 2017 Syllabus MWF 8:10-9:10 AM in Hoyt PLH

EISTEDDFOD SCHOOL EISTEDDFOD 2017

Marine AChE inhibitors isolated from Geodia baretti: Natural compounds and their synthetic analogs

AUK Environmental Sensor. Determining the refractive index of the air. AUK Environmental Sensor

Chemistry 222 Fall 2012

Safety data sheet in accordance with 2001/58/EC AZ MIR 701 PHOTORESIST (14 CPS) (US) Page 1

CHEMICAL ENGINEERING SENIOR LABORATORY CHEG Initiated Chemical Vapor Deposition

Gas Chromatography-Mass Spectrometry Analysis Of Organics In Drinking Water: Concentrates And Advanced Waste Treatment Concentrates, Vol.

Thursday 15 January 2015 Afternoon

Issue 1-6 October 2008

Determination of the Percentage Oxygen in Air

MINIRAE 2000 & MULTIRAE PRE-PROGRAMMED COMPOUND LIBRARIES

AMENDMENTS TO THE IMSBC CODE AND SUPPLEMENTS

Twitter Analysis of IPL cricket match using GICA method

A preliminary analysis of in-depth accident data for powered two-wheelers and bicycles in Europe

A i r c r a f t E l e c t r i c a l S y s t e m s ( 1 2 B )

Brisbane Hotel Performance

USA Computing Olympiad (USACO)

Cambridge International Examinations Cambridge International Advanced Subsidiary and Advanced Level

INTRODUCTION TO PATTERN RECOGNITION

FTIR Detection of Outgassing Chemicals: Instantaneous and Comprehensive Identification and Quantitation

CMSC131. Introduction to Computational Thinking. (with links to some external Scratch exercises) How do you accomplish a task?

: PALLADIUM 2,4-PENTANEDIONATE

Material Safety Data Sheet according to Regulation (EC) No. 1907/2006 (REACH) 1 Identification of the substance/mixture and of the company/undertaking

Table 2. Sample codes and time periods. Sample name period (hours) A Migración 1 24±1h B Migración 2 24±1h

Rules for Determining Significant Figures. AP Chemistry U01L05

Sensitive Analysis of Genotoxins by HPLC-ECD Using Boron-Doped Diamond Electrochemical Cell

Supplementary Information

Paper No and Title. Paper 9: Organic Chemistry-III (Reaction Mechanism-2) Module no.13: Addition of Grignard reagent

PE096: Overview of Gas Processing Technology

FAST EVAPORATION AT NORMAL PRESSURE

Automated Low Background Solid Phase Extraction of Perfluorinated Compounds in Water. Ruud Addink Fluid Management Systems Watertown MA

Chromatography Consumables

FT28_mks.qxp 21/11/ :06 Page 1

THE THEORY AND PRACTICE OF TAIJI QIGONG BY CHRIS JARMEY

University of Saskatchewan Department of Mechanical Engineering Standard Operating Procedure # Mat0002

Safety Data Sheet according to (EC) No 1907/ ISO

Pre-Lab. 1. What do people mean when they say teenagers have raging hormones?

The FTC Gas Model For Balancing Landfill Gas Extraction

Version 1 Print Date Dec IDENTIFICATION OF THE SUBSTANCE/PREPARATION AND THE COMPANY/UNDERTAKING

Exposure Bands. Relationship to the OEL (95. Rating

EXPLOSION PROTECTION USING THE DATABASE. Maria Molnarne. formerly at BAM Federal Institute for Materials Research and Testing, Berlin, Germany

Focus on Enforcement. 7/21/2017 Presentation to SFMTA Policy & Governance Committee. Joe Lapka Corina Monzón

Figure Vapor-liquid equilibrium for a binary mixture. The dashed lines show the equilibrium compositions.

NCUTCD Proposal for Changes to the Manual on Uniform Traffic Control Devices

Standard Operating Procedure

VIRGINIA STANDARDS OF LEARNING TEST ITEM SET CHEMISTRY Science Standards of Learning. Released Spring 2015

Analysis of Curling Team Strategy and Tactics using Curling Informatics

SPILL KITS UNIVERSAL / CHEMICAL. Suitable for water, oils, fuels, solvents, acids and bases OIL-ONLY.

KC Scout Kansas City s Bi-State Transportation Management Center

Participation of Women in the Joint Statistical Meetings: Beginning in 1996, the American Statistical Association s Committee on Women in

MV5: Automated concentration system

EFFECTS OF CLEAROGEN ACNE LOTION ON TESTOSTERONE METABOLISM IN RECONSTRUCTED HUMAN EPIDERMIS

François Auguste Victor Grignard, was a French chemist who discovered one of the world s first synthetic organometallic reactions.

Sinorix MANOB480, MANOCAB, MANOEOL Installation Maintenance

Hazardous Material Emergency Spill Response Protocol WAC

A decade of EUV resist outgas tes2ng Lessons learned. C. Tarrio, S. B. Hill, R. F. Berg, T. B. Lucatorto NIST, Gaithersburg, MD, USA

Transcription:

Chemistry and reactions from non- US patents Daniel Lowe and Roger Sayle NextMove Software Cambridge, UK

Topics Coverage of USPTO vs EPO patents Extraction of non-chemical-compound data Can text-mining provide insights into reaction informatics

What am I missing by only using the USPTO data? 248th ACS National Meeting, San Francisco CA, USA 10th August 2014

EPO and USPTO background USPTO: 303k grants published in 2013 EPO: 67k grants published in 2013 EPO patents must be in one or more of English, French or German USPTO documents are all in English

Epo/USPTO Compounds (using StdInChI) 445,588 3,345,877 2,674,069 USPTO = 1976-2013 patent grants + 2001-2013 patent applications EPO = 1978-2013 patent grants and applications

Epo/USPTO Reactions (using separately sorted StdInChIs for reactants/agents and products) 99,681 706,524 317,533 USPTO = 1976-2013 patent grants + 2001-2013 patent applications EPO = 1978-2013 patent grants and applications

Effect of different Languages 4000000 3500000 3000000 Number of structures extracted 2500000 2000000 1500000 1000000 English German French 500000 0 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 Translation performed by Lexichem v2014.jun

First Disclosure Occurrence by Language All compounds Excluding compounds disclosed by USPTO (1976-2013) patents

Timeliness Competitive intelligence requires up to date information Methodology: Consider compounds present in both EPO and USPTO date not disclosed by either prior to 2006 Compare the earliest publication date for each compound

Average lag between EPO/USPTO 180000 160000 140000 Mean: US 540 days 120000 Compound disclosures 100000 80000 60000 40000 20000 0 US more than 5 US 3-5 US 2-3 US 1-2 US 6-12 months US 3-6 months US 1-3 months US 1-4 weeks Within a week US 1-4 weeks US 1-3 months US 3-6 months US 6-12 months US 1-2 Compounds first disclosed between 2006-2013 US 2-3 US 3-5 US more than 5

70000 Applications Vs Applications 60000 50000 Mean: US 223 days Compound disclosures 40000 30000 20000 10000 0 US more than 5 US 3-5 US 2-3 US 1-2 US 6-12 months US 3-6 months US 1-3 months US 1-4 weeks Within a week US 1-4 weeks US 1-3 months US 3-6 months US 6-12 months US 1-2 Compounds first disclosed between 2006-2013 US 2-3 US 3-5 US more than 5

140000 Grants vs Grants 120000 100000 Mean: US 287 days Compound disclosures 80000 60000 40000 20000 0 US more than 5 US 3-5 US 2-3 US 1-2 US 6-12 months US 3-6 months US 1-3 months US 1-4 weeks Within a week US 1-4 weeks US 1-3 months US 3-6 months US 6-12 months US 1-2 Compounds first disclosed between 2006-2013 US 2-3 US 3-5 US more than 5

Epo/US 1976-2013 and chembl19 overlap 1,155,034 6,278,048 187,486

10000 Average lag between Patents/CHEMbl19 9000 8000 7000 Mean: Patents 345 days 6000 5000 4000 3000 2000 1000 0 Patents 4-5 Patents 3-4 Patents 2-3 Patents 1-2 Patents 0-1 Patents 0-1 Patents 1-2 Patents 2-3 Patents 3-5 Patents 4-5 Compounds first disclosed between 2006-2013

Other data in patents Genes/proteins Reactions including role of reagents Physical quantities Spectra Diseases Organisms Companies

Gene/protein identification Identify trends in drug target popularity Count number of patents (in a year) mentioning a given gene or its gene products and map to the HGNC symbol Many genes are referenced by short ambiguous terms hence great care is required to have good precision

Gene/protein identification EGFR MET Percentage of protein/gene mentions in that year (Patents) 0.35% 0.30% 0.25% 0.20% 0.15% 0.10% 0.05% 0.00% 1976197819801982198419861988199019921994199619982000200220042006200820102012 0.90% 0.80% 0.70% 0.60% 0.50% 0.40% 0.30% 0.20% 0.10% 0.00% Percentage of protein/gene mentions in that year (ChEMBL) Percentage of protein/gene mentions in that year (Patents) 0.14% 0.12% 0.10% 0.08% 0.06% 0.04% 0.02% 0.00% 1976197819801982198419861988199019921994199619982000200220042006200820102012 0.70% 0.60% 0.50% 0.40% 0.30% 0.20% 0.10% 0.00% Percentage of protein/gene mentions in that year (ChEMBL) (Epidermal growth factor receptor) (Hepatocyte growth factor receptor)

Gene/protein identification MTOR PLAT Percentage of protein/gene mentions in that year (Patents) 0.14% 0.12% 0.10% 0.08% 0.06% 0.04% 0.02% 0.00% 1976197819801982198419861988199019921994199619982000200220042006200820102012 0.45% 0.40% 0.35% 0.30% 0.25% 0.20% 0.15% 0.10% 0.05% 0.00% Percentage of protein/gene mentions in that year (ChEMBL) Percentage of protein/gene mentions in that year (Patents) 0.50% 0.45% 0.40% 0.35% 0.30% 0.25% 0.20% 0.15% 0.10% 0.05% 0.00% 1976 1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006 2008 2010 2012 0.40% 0.35% 0.30% 0.25% 0.20% 0.15% 0.10% 0.05% 0.00% Percentage of protein/gene mentions in that year (ChEMBL) (Mammalian target of rapamycin) (Tissue plasminogen activator)

Melting/Boiling Point Extraction

Melting/Boiling Point Extraction Over 99k so far!

CHEMICAL reactions 248th ACS National Meeting, San Francisco CA, USA 10th August 2014

Predicting yield Features to consider: 2D fingerprints (especially around the reaction centers) Reaction type Temperature Time Change in complexity e.g. chiral centres

Data extraction Can pull out yields, quantities, times and temperatures Can sanity check text-mined yield by calculating from amounts (or even masses) mmmm oo cccccccc(g) mmmmm mmmm [cccc ffff sssssssss](gggg 1 ) = mmmm oo cccccccc mmmm oo ppppppp mmmm oo llllllll rrrrrrrr = %yyyyy

Outliers inevitable

scale vs yield 2001-2013 US applications, Suzuki couplings 248th ACS National Meeting, San Francisco CA, USA 10th August 2014

Temperatures

700000 Identify Synthetic Routes 600000 500000 Occurrences 400000 300000 200000 Intermediates Terminal Products 100000 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 Intermediates 197702 103114 56611 31403 17268 9230 5057 2701 1256 639 301 136 58 15 5 2 Terminal Products 385149 149445 81837 47579 27670 16619 9320 5263 2511 1330 678 373 111 63 8 6 5 Number of steps

Number of steps vs Complexity ringcomplexityscore = log 10 (nringbridgeatoms + 1) + log 10 (nspiroatoms + 1) stereocomplexityscore = log 10 (nstereocenters + 1) macrocyclepenalty = log 10 (nmacrocycles + 1) sizepenalty = natoms 1.005 natoms Ertl & Schuffenhauer 2009 doi:10.1186/1758-2946-1-8

Number of steps vs Complexity Number of product heavy atoms 50 45 40 35 30 25 20 15 10 5 Average number of heavy atoms 0 1 2 3 4 5 6 7 8 9 10 11 Number of steps

Yield vs Change in Complexity 2001-2013 US applications, all reactions 248th ACS National Meeting, San Francisco CA, USA 10th August 2014

Trends in Reaction Types 8.0% Suzuki couplings as a percentage of reactions in a year 7.0% 6.0% 5.0% 4.0% 3.0% 2.0% 1.0% 0.0% 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 Classification performed using NameRXN

Trends In Solvent Use 20.0% Percentage of reactions in that year 15.0% 10.0% 5.0% Tetrahydrofuran Dichloromethane Water Dimethylformamide Methanol Ethyl acetate Ethanol 1,4-Dioxane Toluene Acetonitrile Acetic acid Chloroform Acetone Benzene 0.0% 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013

Are solvents getting greener? 1976 2013 Water (21%) Tetrahydrofuran (15%) Ethanol (11%) Dichloromethane (14%) Benzene (8%) Water (13%) Methanol (7%) Dimethylformamide (10%) Tetrahydrofuran (5%) Methanol (8%) Dichloromethane (4%) Ethyl acetate (7%) Dimethylformamide (4%) Ethanol (5%) Acetic acid (4%) 1,4-Dioxane (4%) Chloroform (3%) Toluene (3%) Acetone (3%) Acetonitrile (3%) Total for top 10: 71% 82%

Conclusions A significant amount of novel chemistry, from EPO patents, comes from the non-english patents Compounds disclosed by both the USPTO and EPO are on average published by the USPTO but for many compounds an EPO patent will be the disclosure Gene/protein identification can identify clear changes in patenting behaviour over time Text mining provides the tools to answer many reaction informatics questions

Acknowledgements Funding provided by:

Thank you for your time! http://nextmovesoftware.com http://nextmovesoftware.com/blog daniel@nextmovesoftware.com