International Journal of Engineering Trends and Technology (IJETT) Volume 18 Number 2 - Dec 2014

Compression and Decompression of FPGA Bit Stream Using Bitmask Technique

K.Khuresh Gouse 1, N.Chitra 2, K.Maheshwari 3
1 PG Student (M.Tech), 2 Associate Professor, 3 Associate Professor, Dept. of ECE, Gates Institute of Technology, Gooty

Abstract
This paper proposes an efficient decode-aware compression technique for FPGA configuration bit streams that improves the compression ratio and reduces the decompression overhead. This is accomplished by carefully choosing decode-aware parameters (word length, dictionary size, and the number and type of bitmasks), combined with run length coding of repetitive word patterns. The decompression overhead is reduced by reorganizing the compressed bits into fixed-length words. The experimental results illustrate that our approach improves the compression ratio by 10-15% over existing bit stream compression techniques and that the decompression hardware is capable of running at 300 MHz. The decompression time to configure the FPGA is decreased by 20-25% compared with a decompression accelerator.

Keywords
Bit stream compression, Decompression hardware, Decode-aware algorithm, FPGA.

I. INTRODUCTION
In a traditional fixed dictionary-based compression algorithm, the input file of length N bits is divided into n words, each of length w bits. The list of all unique words is then sorted in descending order of occurrence. These words are then encoded using indices to the words stored in the dictionary.

Field Programmable Gate Arrays (FPGAs) store information in memories that are usually limited in capacity and bandwidth. FPGAs are commonly used in reconfigurable systems and application specific integrated circuits (ASICs). Bit stream compression algorithms address the memory constraint by reducing the size of the bit streams, and decompression accelerators increase the decoding speed through simple decoding logic. However, very few algorithms offer both an efficient compression ratio and fast decompression.
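As an illustration of the fixed dictionary-based scheme described in the introduction, the following Python sketch splits a bit string into w-bit words, builds a frequency-sorted dictionary, and encodes each word either as a dictionary index or as a raw word. The function names and the 1-bit flag convention (1 = dictionary index, 0 = uncompressed word) are assumptions made for this example and are not taken from the paper. The ratio computed here is compressed size divided by uncompressed size and ignores the cost of storing the dictionary.

```python
from collections import Counter


def split_words(bits, w):
    """Split a bit string into consecutive w-bit words."""
    if len(bits) % w != 0:
        raise ValueError("bit string length must be a multiple of w")
    return [bits[i:i + w] for i in range(0, len(bits), w)]


def build_dictionary(words, size):
    """Keep the `size` most frequent distinct words, most frequent first."""
    return [word for word, _ in Counter(words).most_common(size)]


def compress(words, dictionary):
    """Encode each word as '1' + index (dictionary hit) or '0' + raw word."""
    # Number of bits needed to address every dictionary entry (at least 1).
    index_bits = max(1, (len(dictionary) - 1).bit_length())
    index = {word: i for i, word in enumerate(dictionary)}
    codes = []
    for word in words:
        if word in index:
            codes.append("1" + format(index[word], "0{}b".format(index_bits)))
        else:
            codes.append("0" + word)
    return codes


def decompress(codes, dictionary):
    """Invert compress(): look up indices, copy raw words unchanged."""
    return [dictionary[int(c[1:], 2)] if c[0] == "1" else c[1:] for c in codes]


def compression_ratio(codes, words):
    """Compressed size over uncompressed size (dictionary storage ignored)."""
    return sum(len(c) for c in codes) / sum(len(w) for w in words)


if __name__ == "__main__":
    words = split_words("00000000111100001010", 4)
    dictionary = build_dictionary(words, 2)
    codes = compress(words, dictionary)
    assert decompress(codes, dictionary) == words
    print(codes, compression_ratio(codes, words))
```

With this flag convention a dictionary hit costs one flag bit plus the index bits, while a miss costs one bit more than the raw word, so the scheme only pays off when most words hit the dictionary.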
Figure 1 shows the typical flow of compressed FPGA bit stream reconfiguration. Bit streams generated by vendor-specific bit generation programs are compressed and stored in persistent memory. The decompression hardware decodes the compressed bits and transfers them from memory to the configuration hardware, from which they are transferred to the configurable logic block (CLB) memory.

Figure 1 Traditional FPGA reconfiguration with compression

Compression ratio is the metric commonly used to measure the effectiveness of a compression technique, defined as

$$\text{Compression ratio} = \frac{\text{compressed size}}{\text{uncompressed size}}$$

so a smaller value indicates better compression. We can classify the existing bit stream compression techniques into two categories: those with a good compression ratio but unacceptable decompression overhead and complexity, and those that accelerate decompression but compromise the compression ratio. The main idea of these algorithms is to store frequently occurring sequences of bits using a static or sliding dictionary, or to use FPGA-specific features (partial reconfiguration or read back) to obtain repetitive patterns.

ISSN: 2231-5381 http://www.ijcttjournal.org Page 77

One of the promising compression techniques is bitmask-based code compression, because of its good compression ratio and simple decompression logic. The direct application of this algorithm is not flexible in choosing the word length, the number, size, and type of bitmasks, or the dictionary size. Unrestricted use of these parameters will produce better matches but will also result in multiple variable-length encodings. Using them in this way is not profitable, because it leads to slower and more complex decompression hardware. Hence it is a major challenge to develop an efficient compression technique that significantly reduces the bit stream size without sacrificing decompression performance.

There are numerous compression algorithms that can be used to compress configuration bit streams. These techniques can be classified into two categories based on how the redundancies are exploited: format-specific compression and generic bit stream compression. The compression techniques in the first category exploit the local redundancies in a single bit stream or in multiple bit streams by reading back the configured data and storing the differences obtained by performing an exclusive-or (XOR) operation. These algorithms require the FPGA to support partial reconfiguration and frame read back functionality. Pan et al. use frame reordering in the absence of a read back facility on the FPGA. In this technique, frames are reordered such that the similarity between subsequently configured frames is maximal. The difference between consecutive frames (the difference vector) is then encoded using either Huffman-based run length encoding or LZSS-based compression. Another technique proposed in the same article organizes and reads back the configured frames.
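The difference-vector idea can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: frames are modelled as equal-length `bytes` objects, the helper names are hypothetical, and neither the frame reordering step nor the Huffman or LZSS encoding of the difference vectors is shown.

```python
def xor_frames(a, b):
    """Bytewise XOR of two equal-length frames."""
    if len(a) != len(b):
        raise ValueError("frames must have the same length")
    return bytes(x ^ y for x, y in zip(a, b))


def difference_vectors(frames):
    """Keep the first frame as is; store every later frame as the XOR
    with its predecessor (the difference vector)."""
    if not frames:
        return []
    diffs = [frames[0]]
    for previous, current in zip(frames, frames[1:]):
        diffs.append(xor_frames(previous, current))
    return diffs


def rebuild_frames(diffs):
    """Recover the original frames by XOR-ing each difference vector
    with the previously rebuilt frame."""
    frames = []
    for d in diffs:
        frames.append(d if not frames else xor_frames(frames[-1], d))
    return frames


if __name__ == "__main__":
    frames = [b"\x0f\xf0", b"\x0f\xf1", b"\x0f\xf1"]
    diffs = difference_vectors(frames)
    assert rebuild_frames(diffs) == frames
    print(diffs)
```

Similar consecutive frames yield difference vectors that are mostly zero, which is what makes a subsequent run length or dictionary stage effective.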
The frames are organized such that the compressed bit stream contains a minimal number of difference vectors and maximal read back of configured frames, thus significantly reducing the number of compressed frames. Such complex encoding schemes tend to produce an excellent compression ratio. However, decompression is a major bottleneck and is not addressed by Pan et al.

The generic bit stream compression techniques use the complete bit stream to extract the redundancies inside a small window (usually 32 bytes) and encode the data. An advantage of these techniques is that no special FPGA support is needed for decompression. Parameterized LZSS chooses efficient parameters suitable for bit stream compression and decompression. The compression focuses on the most frequent repetition lengths within the matched strings, encoding a partial set of these lengths using fewer bits while the rest are encoded using the canonical representation. The decompression hardware is fairly simple and is able to decode at an acceptable speed. An LZ77-based algorithm has also been proposed that works in the same manner, matching the redundant symbols in a small window.

In summary, the compression techniques in the first category achieve significant compression but incur drastic decompression overhead. On the other hand, the approaches in the second category try to keep the decompression overhead within an acceptable range but compromise on compression efficiency. Our technique considers the decompression bottleneck and overhead during the compression of bit streams. The compression parameters are chosen such that the compressed bit streams are decode friendly while maintaining a good compression ratio.

II. DECODE AWARE BIT STREAM COMPRESSION
On the compression side, the FPGA configuration bit stream is analysed for the selection of profitable dictionary entries and bitmask patterns.
The compressed bit stream is then generated using bitmask-based compression and run length encoding (RLE). Next, our decode-aware placement algorithm is employed to place the compressed bit stream in the memory for efficient decompression. During run time, the compressed bit stream is transmitted from the memory to the decompression engine, and the original configuration bit stream is produced by decompression.

Algorithm 1 outlines the four important steps in our decode-aware compression framework (shown in Figure 2): 1) bitmask selection; 2) dictionary selection; 3) RLE compression; and 4) decode-aware placement. The input bit stream is first divided into a sequence of symbols of length w. Then the bitmask patterns and dictionary entries used for bitmask-based compression are selected. Next, the symbol sequence is compressed using bitmask and RLE. The bitmask-based compression itself follows an existing algorithm. Finally, the compressed bit stream is placed into a decode-friendly layout within the memory using the placement algorithm.

Algorithm 1: Decode Aware Bitstream Compression
Input: Input bitstream
Output: Compressed bitstream placed in memory
Step 1: Divide input bitstream into symbol sequence SL.
Step 2: Perform bitmask pattern selection.
Step 3: Perform dictionary selection.
Step 4: Compress symbol sequence SL into code sequence CL using bitmask and RLE (Run-Length Encoding).
Step 5: Perform decode-aware placement of CL.

Figure 2 Decode-aware bit stream compression framework.

Since memory and communication buses are designed in multiples of bytes (8 bits), storing dictionaries or transmitting data in sizes other than a multiple of a byte is not efficient. Thus, we restrict the symbol length to multiples of eight in our current implementation. Since the dictionary for bit stream compression is small compared with the size of the bit stream itself, we use d = 2^i dictionary entries to fully utilize the bits for dictionary indexing, where i is the number of indexing bits.

1. Bitmask Selection: Our bitmask-based compression is similar to [5], where three types of encoding formats are used. Fig. 3 shows the formats in these cases: no compression, compression using dictionary, and compression using bitmask. The selection of bitmasks plays an important role in bitmask-based compression. Generally, there are two types of bitmask patterns. One is the fixed bitmask, which can only be applied at fixed positions in a symbol. The other is the sliding bitmask, which can be applied at any position. For example, a 2-bit fixed bitmask (2f bitmask) is restricted to even locations, but a 2-bit sliding bitmask (2s bitmask) can be used anywhere. Clearly, fixed bitmasks require fewer bits to encode their location, but they can only match bit changes at fixed positions. On the other hand, sliding bitmasks are more flexible, but consume more bits to encode.
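The difference between fixed and sliding bitmasks can be illustrated with a short Python sketch. It assumes symbols and dictionary entries are non-negative integers of width `w` bits, counts bit positions from the least significant bit, and uses hypothetical helper names; it covers a single bitmask per symbol and is not the paper's selection algorithm.

```python
def find_bitmask(symbol, entry, w, k, sliding):
    """Return (position, mask) if symbol differs from entry only inside one
    k-bit window, else None. Fixed masks may start only at multiples of k;
    sliding masks may start at any bit position."""
    diff = symbol ^ entry
    if diff == 0:
        return None  # exact dictionary match, no bitmask needed
    step = 1 if sliding else k
    for pos in range(0, w - k + 1, step):
        window = ((1 << k) - 1) << pos
        if diff & ~window == 0:  # every differing bit lies inside the window
            return pos, diff >> pos
    return None


def apply_bitmask(entry, pos, mask):
    """Decoder side: recover the symbol from a dictionary entry and a mask."""
    return entry ^ (mask << pos)


def location_bits(w, k, sliding):
    """Bits needed to encode where the mask is applied."""
    positions = (w - k + 1) if sliding else (w // k)
    return max(1, (positions - 1).bit_length())


if __name__ == "__main__":
    entry, symbol = 0b00000000, 0b00000110  # bits 1 and 2 differ
    print(find_bitmask(symbol, entry, 8, 2, sliding=True))   # sliding 2-bit mask
    print(find_bitmask(symbol, entry, 8, 2, sliding=False))  # fixed 2-bit mask
    print(location_bits(32, 2, True), location_bits(32, 2, False))
```

In this example the two differing bits straddle an even boundary, so only the sliding mask matches; in exchange, a 2-bit sliding mask on a 32-bit symbol needs 5 location bits where the fixed mask needs 4.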
Only a few bitmask patterns or their combinations are profitable for compression. Similar to [5], in our study of bit stream compression, we only use the profitable bitmask patterns (1s, 2s, 2f, 3s, 3f, 4s, 4f).

2. Dictionary Selection: Algorithm 2 shows our dictionary selection algorithm. Compared with the dictionary selection approach proposed in [5] for instruction compression, we made an important optimization at Step 5. In the original algorithm [5], any node adjacent to the most profitable node is removed if its profit is less than a certain threshold. This mechanism is designed to reduce the dictionary size. However, if the threshold is not chosen properly, some high-frequency symbols may be incorrectly removed. Since the dictionary size in bit stream compression is usually negligible compared with the size of the bit stream, it is not beneficial to reduce the dictionary size by sacrificing the compression ratio.

Figure 3 Decompression Mechanism

Therefore, our algorithm uses a new heuristic in Step 5, which carefully removes edges instead of nodes. Experimental results show that our approach is more suitable for bit stream compression, because we ensure better dictionary coverage.

[Figure 2 flowchart: input bitstream; divide input bitstream into symbol sequence SL; perform bitmask pattern selection; perform dictionary selection; compress SL into CL using bitmasking; perform decode-aware placement of CL; compressed bitstream is placed in memory]

Compression Efficiency: We first compare our improved bitmask compression technique with the original approach proposed in [5]. To avoid the bias caused by parameter selection, we use the same bitmask parameters for both of them. Three different compression techniques are compared for compression efficiency: 1) bitmask-based compression (BMC) [5]; 2) BMC with our dictionary selection technique (pbmc); and 3) BMC with our dictionary selection technique and run length encoding (pbmc+rle). Fig. 4 shows the compression results on the Pan et al. [1] and Koch et al. [4] benchmarks. The same results are found applicable to other families and vendors too. In our experiments, the Pan et al. [1] benchmarks are compressed with 32-bit symbols, a 512-entry dictionary, and two sliding bitmasks (2-bit and 3-bit) for storing bitmask differences. The Koch et al. [4] benchmarks are compressed using 16-bit symbols, with a 16-entry dictionary and a 2-bit sliding bitmask.

Fig. 5 RLE based Compression

It can be seen that our dictionary selection algorithm outperforms the original technique. The dictionary generated by our algorithm improves the compression ratio by 4% to 5%. Since our approach does not require the threshold value to be found manually for each bit stream, our algorithm adaptively finds the most suitable dictionary entries for each bit stream. On the other hand, our method has the same performance.
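The run length encoding component (the "rle" in pbmc+rle) can be illustrated with a generic Python sketch. The paper does not give its exact RLE code format, so the (symbol, count) pair representation and the function names below are assumptions made for illustration only.

```python
def rle_encode(symbols):
    """Collapse consecutive repeated symbols into (symbol, count) pairs."""
    runs = []
    for s in symbols:
        if runs and runs[-1][0] == s:
            runs[-1] = (s, runs[-1][1] + 1)  # extend the current run
        else:
            runs.append((s, 1))              # start a new run
    return runs


def rle_decode(runs):
    """Expand (symbol, count) pairs back into the original sequence."""
    return [s for s, count in runs for _ in range(count)]


if __name__ == "__main__":
    symbols = [0x0000, 0x0000, 0x0000, 0x0000, 0x0042, 0x0042, 0x0007]
    runs = rle_encode(symbols)
    assert rle_decode(runs) == symbols
    print(runs)
```

Configuration bit streams often contain long runs of identical words, such as all-zero symbols, which is the kind of repetition this step targets.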
The experimental results also illustrate the improvement in compression ratio due to the run length encoding used in our technique.

Decompression Efficiency: We measured the decompression efficiency using the time required to reconfigure a compressed bit stream, the resource usage, and the maximum operating frequency of the decompression engine. The reconfiguration time is calculated as the product of the number of cycles required to decode the compressed bit stream and the operating clock period. We have synthesized decompression units for variable-length bitmask-based compression, difference vector-based compression (DV RLE RB), LZSS (8-bit symbols), and our proposed approach on a Xilinx Virtex II family XC2V40 device in the FG356 package using ISE 9.2.04i to measure the decompression efficiency.

III. RESULTS AND CONCLUSION
This work analysed two sets of compression algorithms: a set of algorithms that reduce the bit stream size with a better compression ratio but do not consider the decompression overhead, and another set of compression techniques that offer efficient decompression but an unacceptable compression ratio. This work proposed an efficient decode-aware compression technique that tries to balance a better compression ratio against minimal decompression overhead. The proposed compression technique analyses the effect of parameters on decompression overhead and selects compression parameters that are decode friendly. This, combined with run length encoding of consecutive repetitive patterns, improves the compression and decompression efficiency. This work proposed a strategic placement algorithm to reorganize variable-length compressed bits to obtain fixed-length compressed bit streams. The fixed encoding of the compressed words enabled the decompression engine to decode at the FPGA's high operational frequency. A novel dictionary selection algorithm is devised that produces a dictionary covering most words using the least dictionary size and the minimum number of bitmasks. The proposed technique for compressing reconfiguration bit streams is found to improve the compression ratio by around 10-15%, and the decompression engine is capable of operating at around 200 MHz.
The reconfiguration time is reduced by around 15-20% compared with the nearest decompression accelerator. Memory and communication bandwidth have been a major bottleneck in most system designs. The operational speeds of different components are diverging at an ever increasing pace. Decode-aware compression promises to bridge this gap by reducing the data size and by accelerating the decompression process. This work explored only a few problems in reconfigurable systems where decode-aware compression can improve system performance. The proposed techniques in this paper can be further explored in the following directions:

The bitmask compression technique allows better compression and a faster decompression engine. Binary tries work on the longest prefix and bit differences, drastically reducing the bits required to encode. An interesting approach is to combine these two techniques to compress audio and video data that are very hard to compress. Such a combination would provide faster decoding and better lossless data compression.

The proposed technique can be explored for compressing data sent over heterogeneous network elements. Decode-aware decompression can bridge the gap between the different bandwidths at which the existing network elements work.

Further studies can be conducted to eliminate the threshold parameter that is used to limit the exploration of word length. The input data pattern can be automatically analysed to choose the parameters for compression. This will potentially bring the compression ratio and decompression overhead closer to their optimum.

The current application of the optimal representation of n-bit differences can be further explored on systems that store bit differences. Systems that require a large number of bitmasks to encode the data will benefit from the proposed optimal encoding scheme.
Some of the systems we identified are in the areas of efficient database storage and differential data backup.

Figure 4 Simulation result for Compression
Input: Uncompressed Bitstream: 00000000000000000000000000000000000000000 000000001000010
Output: Compressed Bitstream: 0100011100000101

Figure 5 Simulation result for Compression
Input: Compressed Bitstream: 0100011100000101
Output: Uncompressed Bitstream: 000000000000000000000000000000000000000000 00000001000010

Figure 6 Technology Schematic for Compression

Figure 7 Technology Schematic Decompression

REFERENCES
1. D. E. Knuth, J. H. Morris Jr., and V. R. Pratt, "Fast pattern matching in strings," SIAM J. Comput., vol. 6, no. 2, pp. 323-350, June 1977.
2. R. S. Boyer and J. S. Moore, "A fast string matching algorithm," Commun. ACM, vol. 20, no. 10, pp. 762-772, October 1977.
3. J. H. Pan, T. Mitra, and W. F. Wong, "Configuration bitstream compression for dynamically reconfigurable FPGAs," in Proc. Int. Conf. Comput.-Aided Des., 2004, pp. 766-773.
4. L. Feinstein, D. Schnackenberg, R. Balupari, and D. Kindred, "Statistical approaches to DDoS attack detection and response," in DISCEX, 2003.
5. D. Koch, C. Beckhoff, and J. Teich, "Bitstream decompression for high speed FPGA configuration from slow memories," in Proc. Int. Conf. Field-Program. Technol., 2007, pp. 161-168.
6. L. Spitzner, Honeypots: Tracking Attackers. Addison-Wesley, 2002.
7. C. Morrow, "BlackHole Route Server and Tracking Traffic on an IP Network," http://www.secsup.org/tracking.
8. SNORT: Open-Source Network IDS/IPS, http://www.snort.org.
9. A. V. Aho and M. J. Corasick, "Efficient string matching: an aid to bibliographic search," Commun. ACM, vol. 18, no. 6, pp. 333-340, 1975.

Authors Profiles

K.Khuresh Gouse is pursuing his Master's degree (M.Tech) in VLSI & Embedded Systems Design at Gates Institute of Technology, Gooty.

N.Chitra is working as Associate Professor at Gates Institute of Technology, Gooty. Her areas of interest include communication systems and VLSI.

K. Maheswari is working as Associate Professor at Gates Institute of Technology, Gooty. Her areas of interest include mobile communication, wireless communication, and cryptography.

International Journal of Engineering Trends and Technology (IJETT) Volume 18 Number2- Dec 2014