GOLOMB Compression Technique For FPGA Configuration

Similar documents
Reduction of Bitstream Transfer Time in FPGA

A Novel Decode-Aware Compression Technique for Improved Compression and Decompression

COMPRESSION OF FPGA BIT STREAMS USING EFFECTIVE RUN LENGTH ENCODING TECHIQUES AND ITS PERFORMANCE ESTIMATION

Compression of FPGA Bitstreams Using Improved RLE Algorithm

A Study on Algorithm for Compression and Decompression of Embedded Codes using Xilinx

International Journal of Engineering Trends and Technology (IJETT) Volume 18 Number2- Dec 2014

An Efficient Code Compression Technique using Application-Aware Bitmask and Dictionary Selection Methods

An Architecture of Embedded Decompressor with Reconfigurability for Test Compression

Efficient Placement of Compressed Code for Parallel Decompression

MEMORY is one of the key driving factors in embeddedsystem

A Hybrid Code Compression Technique using Bitmask and Prefix Encoding with Enhanced Dictionary Selection

MEMORY is one of the most constrained resources in an

Reducing Code Size with Run-time Decompression

Fast Floating Point Compression on the Cell BE Processor

A 64 Bit Pipeline Based Decimal Adder Using a New High Speed BCD Adder

EMBEDDED systems have become more and more important

Profile-driven Selective Code Compression

An Architecture for Combined Test Data Compression and Abort-on-Fail Test

Evaluation of a High Performance Code Compression Method

Designing of Low Power and Efficient 4-Bit Ripple Carry Adder Using GDI Multiplexer

Matrix-based software test data decompression for systems-on-a-chip

EMBEDDED computing systems are space and cost sensitive.

Design of Low Power and High Speed 4-Bit Ripple Carry Adder Using GDI Multiplexer

Arithmetic Coding Modification to Compress SMS

Compact Binaries with Code Compression in a Software Dynamic Translator

Control System Nur Istianah-THP-FTP-UB-2014

Instruction Cache Compression for Embedded Systems by Yujia Jin and Rong Chen

Design and Simulation of a Pipelined Decompression Architecture for Embedded Systems

Emergent walking stop using 3-D ZMP modification criteria map for humanoid robot

A Novel Gear-shifting Strategy Used on Smart Bicycles

Post-Placement Functional Decomposition for FPGAs

Compact Binaries with Code Compression in a Software Dynamic Translator

Fast Software-managed Code Decompression

THE PROBLEM OF ON-LINE TESTING METHODS IN APPROXIMATE DATA PROCESSING

Decompression Method For Massive Compressed Files In Mobile Rich Media Applications

Adiabatic Switching. A Survey of Reversible Computation Circuits. Benjamin Bobich, 2004

Code Compression for Low Power Embedded System Design

CPE/EE 427, CPE 527 VLSI Design I L21: Sequential Circuits. Review: The Regenerative Property

Recycling Bits in LZ77-Based Compression

Design-for-Testability for Path Delay Faults in Large Combinational Circuits Using Test-Points

The Implementation and Evaluation of Dynamic Code Decompression Using DISE

Implementation of Modern Traffic Light Control System

Data Extraction from Damage Compressed File for Computer Forensic Purposes

VLSI Design I; A. Milenkovic 1

SUPPLEMENT MATERIALS

HIGH RESOLUTION DEPTH IMAGE RECOVERY ALGORITHM USING GRAYSCALE IMAGE.

Application Notes. SLP85xD Load Cells

ATM 322 Basic Pneumatics H.W.6 Modules 5 7

Computing s Energy Problem:

Satoshi Yoshida and Takuya Kida Graduate School of Information Science and Technology, Hokkaido University

A New Reference Frame Recompression Algorithm and Its VLSI Architecture for UHDTV Video Codec

EVOLVING HEXAPOD GAITS USING A CYCLIC GENETIC ALGORITHM

Distributed Control Systems

Stack Height Analysis for FinFET Logic and Circuit

Determining Occurrence in FMEA Using Hazard Function

VLSI Design 12. Design Styles

AddressBus 32. D-cache. Main Memory. D- Engine CPU. DataBus 1. DataBus 2. I-cache

82C288 BUS CONTROLLER FOR PROCESSORS (82C C C288-8)

Online Companion to Using Simulation to Help Manage the Pace of Play in Golf

Early Skip Decision based on Merge Index of SKIP for HEVC Encoding

WHEN WILL YOUR MULTI-TERABYTE IMAGERY STOP REQUIRING YOU TO BUY MORE DATA STORAGE?

Queue analysis for the toll station of the Öresund fixed link. Pontus Matstoms *

1.1 The size of the search space Modeling the problem Change over time Constraints... 21

Light Loss-Less Data Compression, With GPU Implementation

A self-scrubber for FPGA-based systems

Algorithm for Line Follower Robots to Follow Critical Paths with Minimum Number of Sensors

sorting solutions osx separator series

Predicting Horse Racing Results with Machine Learning

UNIVERSITY OF WATERLOO

The Universal Translator

A Model for Tuna-Fishery Policy Analysis: Combining System Dynamics and Game Theory Approach

A CO 2 Waveform Simulator For Evaluation and Testing of Respiratory Gas Analyzers

THe rip currents are very fast moving narrow channels,

APPLICATION OF TOOL SCIENCE TECHNIQUES TO IMPROVE TOOL EFFICIENCY FOR A DRY ETCH CLUSTER TOOL. Robert Havey Lixin Wang DongJin Kim

Energy Savings Through Compression in Embedded Java Environments

MEDICAL IMAGE WAVELET COMPRESSION- EFFECT OF DECOMPOSITION LEVEL

Understanding Centrifugal Compressor Capacity Controls:

Development of Fish type Robot based on the Analysis of Swimming Motion of Bluefin Tuna Comparison between Tuna-type Fin and Rectangular Fin -

ICES REPORT November gfpc: A Self-Tuning Compression Algorithm. Martin Burtscher and Paruj Ratanaworabhan

Autodesk Moldflow Communicator Process settings

Flyweight Pattern. Flyweight: Intent. Use sharing to support large numbers of fine-grained objects efficiently. CSIE Department, NTUT Chien-Hung Liu

Chapter 5 5. INTERSECTIONS 5.1. INTRODUCTION

Application of Dijkstra s Algorithm in the Evacuation System Utilizing Exit Signs

Operational Comparison of Transit Signal Priority Strategies

Cycle Analysis and Construction of Protographs for QC LDPC Codes With Girth Larger Than 12 ½

Fast Lossless Depth Image Compression

Synchronous Sequential Logic. Topics. Sequential Circuits. Chapter 5 Steve Oldridge Dr. Sidney Fels. Sequential Circuits

HW #5: Digital Logic and Flip Flops

Surfing Interconnect

The Incremental Evolution of Gaits for Hexapod Robots

New Approach for Implementing BCD Adder Using Flagged Logic

Internal Arc Simulation in MV/LV Substations. Charles BESNARD 8 11 June 2009

Front-end Realization of ASIC for Traffic Light Control with Real Time Clock Synchronization

Word-based Statistical Compressors as Natural Language Compression Boosters

Numerical Fluid Analysis of a Variable Geometry Compressor for Use in a Turbocharger

Blocking time reduction for level crossings using the genetic algorithm

LOCOMOTION CONTROL CYCLES ADAPTED FOR DISABILITIES IN HEXAPOD ROBOTS

GaitAnalysisofEightLegedRobot

Introduction to Pneumatics

Data Sheet T 8389 EN. Series 3730 and 3731 Types , , , and. EXPERTplus Valve Diagnostic

Transcription:

GOLOMB Compression Technique For FPGA Configuration P.Hema Assistant Professor,EEE Jay Shriram Group Of Institutions ABSTRACT Bit stream compression is important in reconfigurable system design since it reduces the bit streams size and the memory requirement. It also improves the communication bandwidth and thereby decreases the reconfiguration time. Existing research in this field has explored two directions: efficient compression with slow decompression or fast decompression at the stored in internal or external memory as cost of compression efficiency. This project bitstreams, the limited memory size, and proposes a novel decode-aware compression technique to improve both compression and decompression efficiencies. The proposed work combines golomb codes which is efficient for both variable and fixed length parameters with dictionary and bit mask selection methods for improving compression efficiency. To reduce the hardware overhead during decompression, a smart placement of compressed bitstreams which enables the compressed variable-length coding bit streams to be stored and buffered in the form of multiple fixed length coding bit streams. So that, the decompression hardware for variablelength coding is capable of operating at the speed closest to the best known fieldprogrammable gate array-based decoder for fixed-length coding. The area and configuration delay of the decompression engine are reduced significantly. I. INTRODUCTION FIELD-PROGRAMMABLE GATE ARRAYS (FPGAs) are widely used in reconfigurable systems. Since the configuration information for FPGA has to be access bandwidth become the key factors in determining the different functionalities that a system can be configured and how quickly the configuration can be performed. While it is quite costly to employ memory with more capacity and access bandwidth, bitstream compression technique alleviates the memory constraint by reducing the size of the bitstreams. With compressed bitstreams, more configuration information can be stored using the same memory. The access delay is also reduced, because less bits need to be transferred through the memory interface. To measure the efficiency of bitstream compression, compression ratio (CR) is widely used as a metric. It is defined as the ratio between the compressed bitstream size 2009

(CS) and the original bitstream size (OS).Therefore, a smaller compression ratio implies a better compression technique. There are two major challenges in bitstream compression: 1) how to compress the bitstream as much as possible and 2) how to efficiently decompress the bitstream without affecting the reconfiguration time. Our approach combines the advantages of previous compression techniques with good compression ratio and those with fast decompression. This paper makes three important contributions. First, it performs smart placement of compressed bitstreams to enable fast decompression of variable-length coding. Next, it selects bitmask-based compression parameters suitable for bitstream compression. Finally, it dictionary entries and bitmask patterns. Our decode-aware placement algorithm is employed to place the compressed bitstream in the memory for efficient decompression. During run-time, the compressed bitstream is transmitted from the memory to the decompression engine, and the original configuration bitstream is produced by Decompression. Since memory and communication bus are designed in multiple of bytes (8 bits), storing dictionaries or transmitting data other than multiple of byte size is not efficient. Thus, we restrict the symbol length to be multiples of eight in our current implementation. Since the dictionary for bitstream compression is smaller compared to the size of the bitstream itself, we use d=2^i to fully utilize the bits for dictionary indexing, efficiently combines run lengthencoding and where i is the number of indexing bits. bitmask-based compression to obtain better compression and faster decompression. III DICTIONARY BASED APPROACH Dictionary-based code compression II BLOCK DIAGRAM techniques provide compression efficiency as well as fast decompression mechanism. The basic idea is to take advantage of commonly occurring instruction sequences by using a dictionary. The repeating occurrences are replaced with a codeword that points to the index of the dictionary that contains the pattern. The compressed program consists of both codewords and uncompressed instructions. Figure 3 shows an example of Fig.1 Block Diagram dictionary based code compression using a simple program binary and encoding format The figure 1 shows our decode-aware bitstream compression framework. On the shown in Figure 2. The binary consists of ten 8-bit patterns i.e., total 80 bits. The dictionary compression side, FPGA configuration has two 8-bit entries. The compressed program bitstream is analyzed for selection of profitable requires 62 bits and the dictionary requires 16 2010

bits. In this case, the compression ratio (CR) is 97.5% Fig.2 Format for Dictionary Based Compression The vectors that match directly are compressed with 3 bits. The first bit represents whether it is compressed (using 0) or not (using 1). The second bit indicates whether it is compressed using bitmask (using 0) or not (using 1).The last bit indicates the dictionary index. Data that are compressed using bitmask requires 7 bits. The first two bits, as before, represent if the data is compressed, and whether the data is compressed using bitmasks. The next two bits indicate the bitmask position and followed by two bits that indicate the bitmask pattern. The data which is different for more than 1 bit is left uncompressed. Fig.3 Dictionary Based Compression Example IV BITMASK BASED COMPRESSSION Bitmask-based compression is an enhancement on the dictionary-based compression scheme, which helps us to get more matching patterns. In dictionary- based compression, each vector is compressed only if it completely matches with a dictionary entry. Figure 4 shows an example of bitmask based code compression using a simple program binary and encoding format shown in Figure 5. Fig.4 Bitmask Based Compression Example Fig.5 Format for Bitmask Based Compression 2011

V. GOLOMB CODING In Golomb Coding, the group size, m, defines the code structure. Thus, choosing the m parameter decides variable length code structure which will have direct impact on the compression efficiency. Once the parameter m is decided, a Fig.8 Golomb Coding Example With Parameter m=4 VI MOTIVATION OF WORK In this section, we briefly analyze the table which maps the runs of zeros until the decompression hardware complexity of code is ended with a one is created. common variable-length compression Determination of the run length is shown as in Figure 6. A run length of multiples of m are grouped into Ak and given the same prefix, which is (k 1) number of ones followed by a zero. A tail is given for each members of the group, which is the binary representation of zero until (m 1). The codeword is then produced by combining the prefix and the tail. An example of the table is in Figure 7. techniques. This analysis forms the basis of our approach. In the following discussion, we use the term symbol to refer to a sequence of uncompressed bits and code to refer to the compression result (of a symbol) produced by the compression algorithm.while compression efficiency is straightforward and widely used criteria to evaluate compression techniques, the complexity of decompression hardware determines whether an algorithm with Fig.6 Determination of Run-Length Using Figure 7, binary strings can be divided into subsets of binary strings and replacing the subsets with the equivalent codeword as shown in Figure 8. Fig.7 Golomb Coding Example With Parameter m=4 promising compression ratio can be applied to commercial FPGAs. Interestingly, our study shows that the complexity of the decompression algorithm is not the only determining factor of the hardware complexity. When variable- length coding is employed, the hardware complexity is also determined by the complex buffering circuitry, which is overlooked by previous bitstream compression approaches. Since each code has different length, we have to use barrel shifter to align the input buffer in each cycle. Theoretically, a barrel shifter operating on a n-bit buffer needs nlog(n) multiplexers (MUXes) organized into log(n) layers. When implemented in modern FPGAs (Xilinx Virtex II or Virtex 4), the barrel shifter for an input buffer with typical size of 32 64 bits consumes 200 400 four- 2012

input lookup tables (LUTs), which is similar to the total area of a typical bitmask or Huffman decoder. Moreover, barrel shifter will increase the latency remarkably by introducing several layers of combinational logic in the critical path. Therefore, the buffering circuitry is a major bottleneck in a decompression engine for variable-length coding both in terms of area and performance. VI ALGORITHM Input: Input bitstream Output: Compressed bitstream placed in memory Step 1: Divide input bitstream into symbol sequence SL. Step 2: Perform bitmask pattern selection. Step 3: Perform dictionary selection. Step 4: Compress symbol SL into code sequence CL using bitmask and Golomb coding. Step 5: Perform placement of CL using power two-streams. VII CONCLUSION The existing compression algorithms either provide good compression with slow decompression or fast decompression at the cost of compression efficiency. In this paper, we proposed a decoding-aware compression technique that tries to obtain both best possible compression and fast decompression performance. The proposed compression technique analyzes the effect of parameters on compression ratio and chooses the optimal ones automatically. We also exploit golomb encoding efficiently combined with bitmaskbased compression to further improve both compression ratio and decompression efficiency. VIII REFERENCES 1. Xiaoke Qin, Member, IEEE, Chetan Muthry, and Prabhat Mishra, Senior Member, IEEE(March 2011) Decoding-Aware Compression of FPGA Bitstreams, IEEE transactions on Very Large Scale Integration (VLSI) systems, vol. 19, no. 3. 2. Chandra.A and Chakrabarty.K,(March 2001) System on-a-chip test data compression and decompression architectures based on Golomb codes, IEEE Trans Computer-Aided Design, vol. 20, pp.355-368. 3. Dandalis.A and Prasanna.V.K,(December 2005) Configuration compression for FPGA-based embedded systems, IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 13, no. 12, pp. 1394 1398. 4. Hauck.S and Wilson.W.D,(March 1999) Runlength compression techniques for FPGA configurations, in Proc. IEEE Symp. Field-Program. Custom Comput., pp. 286 287. 5. Hauck.S, Li.Z, and Schwabe.E,( August 1999) Configuration compression for the Xilinx XC6200 FPGA, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 18, no. 8, pp. 1107 1113. 2013

6. Koch.D, Beckhoff.C, and Teich.J,(2007) Bitstream decompression for high speed FPGA configuration from slow memories, in Proc. Int. Conf. Field-Program. Technol., pp. 161 168. 7. Pan.H.J, Mitra.T, and Wong.W.F,(December 2004) Configuration bitstream compression for dynamically reconfigurable FPGAs, in Proc. Int. Conf. Comput.- Aided Des., pp. 766 773. 8. Seong.S and Mishra.P,(April 2008) Bitmask-based code compression for embedded systems, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 27, no. 4, pp. 673 685. 9. Stefan.R and Cotofana.S,(2008) Bitstream compression techniques for Virtex 4 FPGAs, in Proc. Int. Conf. Field Program. Logic Appl., 2014