Satoshi Yoshida and Takuya Kida Graduate School of Information Science and Technology, Hokkaido University


Compressed pattern matching: a program searches the compressed data directly.

Variable-to-fixed-length (VF) codes have attracted attention for the past few years from the viewpoint of compressed pattern matching.

Compressed text     Fixed-length input    Variable-length input
Fixed-length        FF code               VF code
Variable-length     FV code               VV code

WCTA 2010, October 14th, 2010

Codeword length, short versus long (long codewords need more memory and time):

                                        Short codewords    Long codewords
Size of parse tree                      small              large
Compression ratio                       low                high
Cost to construct and hold              low                high
Preparation cost of pattern matching    low                high

A compression method for both a high compression ratio and fast pattern matching: apply another compression method after VF coding.

VF coding: STVF coding [Kida2009]
Other coding: range coder [Martin1979]

Input text -> VF coding -> intermediate output -> other coding -> output

After decoding the compressed text with the range coder, we obtain STVF-coded text. We run the pattern matching algorithm on it.

STVF (short codeword) + range coder -> range coder (decoding) -> intermediate output (STVF coded) -> pattern matching (PM) on STVF

Which is faster: this pipeline, or pattern matching on STVF (long codeword)?

Compression ratio: STVF(12) + range coder is slightly better than STVF(16).
Pattern matching time: slower, because of the range coder decompression.
Compression time: almost the same (slightly faster!).
Decompression time: almost the same.

Range coder: proposed by G. N. N. Martin in 1979.
G. N. N. Martin, Range encoding: an algorithm for removing redundancy from a digitised message, 1979.
A variation of arithmetic coding [Rissanen, Langdon 1979].
It encodes using integers instead of real numbers.
Encoding is faster than arithmetic coding.
The compression ratio is better than that of Huffman codes.
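The idea of encoding with integers instead of real numbers can be illustrated with a minimal sketch. It assumes a static frequency model built from the message itself and relies on Python's arbitrary-precision integers, so it omits the fixed-width renormalization that a practical range coder needs. The function names (`build_model`, `range_encode`, `range_decode`) are hypothetical and not taken from Martin's paper.

```python
from collections import Counter


def build_model(message):
    """Static model: per-symbol frequency and cumulative frequency."""
    freq = Counter(message)
    symbols = sorted(freq)
    cum = {}
    running = 0
    for s in symbols:
        cum[s] = running
        running += freq[s]
    return symbols, dict(freq), cum, running


def range_encode(message, freq, cum, total):
    """Narrow an integer interval [low, low + rng) once per symbol."""
    low = 0
    rng = total ** len(message)  # large enough that every division is exact
    for s in message:
        step = rng // total      # width of one frequency unit
        low += step * cum[s]     # move to the sub-range of symbol s
        rng = step * freq[s]     # shrink to the width of that sub-range
    return low                   # any value in the final interval would do


def range_decode(code, length, symbols, freq, cum, total):
    """Replay the same interval narrowing, guided by the code value."""
    low = 0
    rng = total ** length
    out = []
    for _ in range(length):
        step = rng // total
        target = (code - low) // step  # which frequency unit holds the code
        for s in symbols:
            if cum[s] <= target < cum[s] + freq[s]:
                break
        out.append(s)
        low += step * cum[s]
        rng = step * freq[s]
    return "".join(out)


if __name__ == "__main__":
    text = "abracadabra"
    symbols, freq, cum, total = build_model(text)
    code = range_encode(text, freq, cum, total)
    print(code, range_decode(code, len(text), symbols, freq, cum, total))
```

Frequent symbols shrink the interval less, so they cost fewer bits; this is what lets the scheme compress better than a code that spends a whole number of bits per symbol.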

STVF coding: proposed by T. Kida in 2009.
T. Kida, Suffix tree based VF-coding for compressed pattern matching, DCC2009, 2009.
VF coding that uses a pruned suffix tree as the parse tree.
It achieves a higher compression ratio than the basic VF code.
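To make the variable-to-fixed idea concrete, here is a minimal sketch of VF encoding and decoding. It assumes a hand-picked phrase list in place of a parse tree and uses greedy longest-match parsing; it does not build or prune a suffix tree, so it is not the STVF construction itself. The names `vf_encode` and `vf_decode` and the example phrases are hypothetical.

```python
def vf_encode(text, phrases, codeword_bits):
    """Parse text into phrases (variable length) and emit one
    fixed-length binary codeword per phrase."""
    if len(phrases) > 2 ** codeword_bits:
        raise ValueError("too many phrases for this codeword length")
    index = {p: i for i, p in enumerate(phrases)}
    max_len = max(len(p) for p in phrases)
    bits = []
    pos = 0
    while pos < len(text):
        # Greedy parsing: try the longest candidate phrase first.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in index:
                bits.append(format(index[piece], f"0{codeword_bits}b"))
                pos += length
                break
        else:
            raise ValueError(f"no phrase matches at position {pos}")
    return "".join(bits)


def vf_decode(bits, phrases, codeword_bits):
    """Split the bit string into fixed-length codewords and look each up."""
    return "".join(
        phrases[int(bits[i:i + codeword_bits], 2)]
        for i in range(0, len(bits), codeword_bits)
    )


if __name__ == "__main__":
    # Hypothetical phrase set; every single character is included so
    # that any text over the alphabet {A, B, C} can be parsed.
    phrases = ["A", "B", "C", "AB", "ABC", "BC", "CA", "BA"]
    encoded = vf_encode("ABCABBCA", phrases, codeword_bits=3)
    print(encoded, vf_decode(encoded, phrases, codeword_bits=3))
```

Because every codeword has the same length, codeword boundaries in the compressed text are known in advance, which is the property that makes VF codes attractive for compressed pattern matching.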

Suffix tree: P. Weiner, Linear pattern matching algorithms, SWAT1973, 1973.
A tree structure representing all suffixes of a string.
Each branch is labeled by a nonempty string.
Each inner node has at least two children.
The labels of the branches leaving an inner node begin with different characters.
[Figure: the suffixes of an example string and the corresponding suffix tree; the example string is not recoverable from the transcript.]
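The properties listed above can be checked on a small example. The following is a minimal sketch that builds a suffix tree by inserting every suffix into a compacted trie; this naive construction takes quadratic time and is not Weiner's linear-time algorithm. It assumes the terminator "$" does not occur in the text, and the names `Node`, `build_suffix_tree`, `contains`, and `count_leaves` are hypothetical.

```python
class Node:
    def __init__(self):
        # first character of the branch label -> (label, child node)
        self.children = {}


def build_suffix_tree(text):
    """Naive construction: insert each suffix of text + '$'."""
    text += "$"  # assumed not to occur in text; makes every suffix a leaf
    root = Node()
    for start in range(len(text)):
        _insert(root, text[start:])
    return root


def _insert(node, suffix):
    while suffix:
        first = suffix[0]
        if first not in node.children:
            node.children[first] = (suffix, Node())  # new leaf branch
            return
        label, child = node.children[first]
        k = 0  # length of the common prefix of label and suffix
        while k < len(label) and k < len(suffix) and label[k] == suffix[k]:
            k += 1
        if k < len(label):
            # Split the branch so the shared prefix ends at an inner node.
            middle = Node()
            middle.children[label[k]] = (label[k:], child)
            node.children[first] = (label[:k], middle)
            child = middle
        node, suffix = child, suffix[k:]


def contains(root, pattern):
    """True if pattern is a substring of the indexed text."""
    node = root
    while pattern:
        entry = node.children.get(pattern[0])
        if entry is None:
            return False
        label, child = entry
        k = min(len(label), len(pattern))
        if label[:k] != pattern[:k]:
            return False
        node, pattern = child, pattern[k:]
    return True


def count_leaves(node):
    if not node.children:
        return 1
    return sum(count_leaves(child) for _, child in node.children.values())


if __name__ == "__main__":
    tree = build_suffix_tree("banana")
    print(count_leaves(tree), contains(tree, "nan"), contains(tree, "nab"))
```

Storing the children in a dictionary keyed by the first character of each label enforces the rule that branches leaving a node begin with different characters.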

Make a compact parse tree.
[Figure: the suffix tree of the string S and the compact parse tree obtained from it; the string and the node labels are not recoverable from the transcript.]

[Figure: encoding example showing an input string and the output codeword sequence produced with the parse tree; the values are not recoverable from the transcript.]

STVF-coded text can be represented by a regular collage system. A collage system is a formal system for representing strings [Kida 2003], and it serves as a general framework that captures the essence of compressed pattern matching. With the collage system, we can systematically derive an Aho-Corasick-type pattern matching algorithm on STVF code. The pattern matching algorithm runs in O(n + m^2) time and O(D + m^2) space.

[Kida 2003] Collage system: a unifying framework for compressed pattern matching.

Compression methods: STVF coding; STVF coding + range coder.
Data: English text (Brown corpus, 6.8 MB, |Σ| = 96).
Environment: CPU: Intel Xeon processor, 3.00 GHz, dual core; memory: 12 GB; OS: Red Hat Enterprise Linux ES Release 4.
Codeword length: l = 8-16 bits.
We compared the compression ratios, compression times, decompression times, and pattern matching times of the two methods.

[Figure: compression ratio (%) versus codeword length (8 to 16 bits) for STVF and STVF + range coder. Labeled data points: 50.6% and 51.3%.]

[Figure: compression time (sec) versus codeword length (8 to 16 bits) for STVF and STVF + range coder.]

[Figure: decompression time (sec) versus codeword length (8 to 16 bits) for STVF and STVF + range coder.]

[Figure: pattern matching time (sec) versus pattern length (5 to 50) for STVF(16), STVF(12), and STVF(12) + range coder.]

Compression ratio: STVF(12) + range coder is slightly better than STVF(16) (51.3% vs. 50.6%).
Pattern matching time: slower, because of the range coder decompression.
Compression time: almost the same (slightly faster!).
Decompression time: almost the same.

We investigated the performance of combining STVF coding with a range coder. The combination costs almost nothing in compression and decompression times. Because range coder decoding was slower than we expected, we could not improve pattern matching speed.

Future work:
Combine STVF coding with other methods whose decompression is fast, such as gzip.
Implement the Set-Horspool algorithm* (Boyer-Moore type) to improve pattern matching speed when the pattern is long.

* G. Navarro and M. Raffinot, Flexible pattern matching in strings, 2007