previous designs in [1]. An 11.6% power saving is achieved for the overall RMW operation if $\alpha_{DB}$ and the burst length are assumed to be 0.5 and 1, respectively. When all data bits have to be updated ($\alpha_{DB} = 1.0$), total power consumption increases by 5.1% of the overall power consumption. However, this is a very rare case in real applications. The power consumption of the RCW scheme, ranging from 71.8% to 105.1% of the previous design, depends on the number of bits that have to be updated ($\alpha_{DB}$). Even though the compare unit and bit-wise CD control increase the die area by 3.5% over the previous design in [1], the RCW scheme is well matched to the RMW transaction without any timing penalty, because the pre-fetch of the stored data is performed automatically in the read cycle. The power saving factor of the RCW scheme is further enhanced as memory density and bandwidth increase, because larger DB load capacitance and longer burst lengths both enlarge the share of DB power in the total power consumption of the conventional design.

Conclusions: The proposed RCW scheme provides a clear solution for low-power RMW operations in mobile 3D graphics and various high-bandwidth DRAM macros. Its power consumption depends on the bit-wise update ratio. The power saving factor is further enhanced as memory density and bandwidth increase.

© IEE 2002
Electronics Letters Online No: 20020069
23 October 2001
DOI: 10.1049/el:20020069

Y.-H. Park, S. Choi and H.-J. Yoo (Dept. of EE, Korea Advanced Institute of Science and Technology (KAIST), 373-1, Kusong-dong, Yusong-gu, Taejon 305-701, Korea)

However, as each residue $x_i$ is represented as a bus of $m_i$ wires, only one of which is high at any given time, a large number of storage elements is required to implement a register for the OHRNS. For example, consider the moduli {37, 41, 43}, chosen to provide a dynamic range of 65,231, or approximately 16 bits.
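As a quick check of the figures for the moduli {37, 41, 43}, the following Python sketch computes the dynamic range, the register width of the one hot encoding and the number of storage bits that toggle when the stored value changes. The helper names (`encode`, `toggles`) are illustrative assumptions, not part of the letter, and each residue is assumed to be stored as a plain list of 0/1 wire values.

```python
from math import log2, prod

MODULI = (37, 41, 43)  # relatively prime moduli used in the example


def dynamic_range(moduli):
    """M is the product of the moduli; values 0 .. M-1 are representable."""
    return prod(moduli)


def one_hot(value, modulus):
    """A residue stored as a bus of `modulus` wires, exactly one of them high."""
    return [1 if i == value else 0 for i in range(modulus)]


def encode(x, moduli):
    """One hot RNS word: one one-hot bus for each residue x mod m_i."""
    return [one_hot(x % m, m) for m in moduli]


def toggles(a, b):
    """Number of storage bits that differ between two encoded words."""
    return sum(u != v for bus_a, bus_b in zip(a, b) for u, v in zip(bus_a, bus_b))


M = dynamic_range(MODULI)
print(M, round(log2(M), 2))  # 65231 15.99
print(sum(MODULI))  # 121 flip-flops per register
print(toggles(encode(1000, MODULI), encode(1001, MODULI)))  # 6
```

The count of 6 holds only when every residue changes; if the value moves by a multiple of 37, for instance, the first residue is unchanged and only 4 bits switch.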
Each storage register would require 121 flip-flops, of which only six would change in each clock cycle, i.e. three going high and three going low. Standard static CMOS flip-flop designs would result in significant dynamic power dissipation, owing to clock switching within each flip-flop, and would also present a significant load to the clock distribution network [3]. Such an increase in power dissipation could negate the power-delay benefits promised by the OHRNS. This Letter describes a register architecture that exploits the attributes of the OHRNS and overcomes the power dissipation problems associated with the one hot encoding of the residues.

Proposed register design: The basic flip-flop architecture is shown in Fig. 1. The storage function is provided by a level sensitive latch, which is transparent when the T/L input is high and latched otherwise. The D input is compared with the Q output by an exclusive OR gate. When they are at the same logic level, the T/L input is held at logic 0 by the PMOS pull-up and inverter, and the TRIGB input is isolated from the circuit. When the D input differs from the Q output, the TRIGB input is passed through the NMOS device to drive T/L via the inverter. The TRIGB line is driven by a short pulse to logic 0 at the rising edge of the system clock. Hence, if D and Q differ at the rising edge of the system clock, the short pulse on TRIGB is passed on as a short latching pulse to the level sensitive latch, which stores the input data bit.

References
1 PARK, Y.-H., HAN, S.-H., LEE, J.-H., and YOO, H.-J.: 'A 7.1 GB/s low power rendering engine in 2D array embedded memory logic CMOS for portable multimedia system', IEEE J. Solid-State Circuits, 2001, 36, (6), pp. 934-955
2 INOUE, K., NAKAMURA, H., and KAWAI, H.: 'A 10 Mb frame buffer memory with Z-compare and α-blend units', IEEE J. Solid-State Circuits, 1995, 30, (12), pp. 1563-1568
3 KOOK, J., and YOO, H.-J.: 'A single bit line writing scheme for low power reconfigurable I/O DRAM macro', IEEE European Solid-State Circuits Conference Digest of Technical Papers, September 2000, pp. 420-423

Static register implementation for one hot residue number systems
T. Conway

A method of implementing static registers for one-hot residue number systems is described. The method overcomes the high power dissipation problems associated with conventional flip-flops and clock distribution. The proposed design relies on the low activity factor inherent in the one hot coding structure and on a hybrid clocking system that minimises the switching capacitance associated with the clock distribution.

Introduction: The residue number system (RNS) is a method of representing a range of integers $0, \ldots, M-1$ by their residues $x_i$ modulo a series of relatively prime moduli $m_0, \ldots, m_{N-1}$, where $M = \prod_{i=0}^{N-1} m_i$. The operations of addition, subtraction and multiplication (all modulo $M$) can be completed by operating on each residue independently, thus providing the potential for high-speed arithmetic [1]. The one hot RNS (OHRNS) has been proposed as a means of accelerating the operations on each modulus by representing the individual residues in one hot encoded form. Addition, subtraction and multiplication can then be implemented by barrel shifter circuits. This encoding has been shown to achieve favourable power-delay products for the arithmetic operations, as well as providing practical solutions for the scaling operation [2].

ELECTRONICS LETTERS 17th January 2002

Fig. 1 Basic flip-flop architecture

This architecture has two valuable attributes.
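The data-dependent triggering that underlies both attributes can be modelled at the behavioural level. The Python sketch below is a hypothetical abstraction of the Fig. 1 cell, not a circuit netlist: the class `ConditionalTriggerFF` and the function `triggered_cells` are illustrative names, and a TRIGB pulse is assumed to reach the latch only when D differs from Q. Counting the cells that latch on each clock edge gives a rough proxy for the switched capacitance on the TRIGB lines of a 41 bit register.

```python
from dataclasses import dataclass


@dataclass
class ConditionalTriggerFF:
    """Hypothetical behavioural model of the Fig. 1 cell (not a netlist)."""
    q: int = 0  # value currently held by the level sensitive latch

    def clock_edge(self, d: int) -> bool:
        """Apply one TRIGB pulse; return True if it reached the latch."""
        if d ^ self.q:
            # XOR output high: the NMOS passes the TRIGB pulse, T/L goes
            # high briefly and the latch captures D.
            self.q = d
            return True
        # D == Q: T/L is held at 0 and TRIGB is isolated, so nothing switches.
        return False


def one_hot(value: int, modulus: int) -> list[int]:
    """One hot bus of `modulus` wires with only wire `value` high."""
    return [1 if i == value else 0 for i in range(modulus)]


def triggered_cells(register: list[ConditionalTriggerFF], word: list[int]) -> int:
    """Clock every cell with its data bit; count the cells that actually latched."""
    return sum(ff.clock_edge(d) for ff, d in zip(register, word))


reg = [ConditionalTriggerFF() for _ in range(41)]  # 41 bit register, all zero
for value in (0, 5, 5, 40):
    print(value, triggered_cells(reg, one_hot(value, 41)))
# prints one pair per line: 0 1, 5 2, 5 0, 40 2
```

A change of residue triggers exactly two cells, and reloading the same value triggers none, which is the behaviour the two attributes rely on.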
First, when there is no logic activity on the data input of the flip-flop, there is no switching activity within the flip-flop, and hence no dynamic power dissipation if the circuit is designed in static CMOS logic. This significantly reduces power dissipation in the one hot RNS, where the activity level of each one hot encoded residue is very low. Secondly, the TRIGB input of the flip-flop drives the drain of a single NMOS device, which can be made minimum size. When there is no new data for the flip-flop to store, the input capacitance on this line is one drain junction capacitance. Thus the clock distribution circuit has to drive only a small capacitance in all cases except when the input data has changed and needs to be stored. Even then, as only two data lines per residue can change in the one hot encoded scheme, the switched capacitance on the TRIGB line will be low, leading to low dynamic power dissipation in the clock distribution circuit (Fig. 2).

Fig. 2 Clock distribution circuit

Vol. 38 No. 2

Performance: The proposed architecture can be implemented with 24 transistors, which is comparable in area to a standard flip-flop. The performance of the proposed architecture was assessed by simulating the proposed design and a standard flip-flop using a 0.25 μm process. First, a 41 bit register to store a single residue was designed using 41 storage elements and a clock distribution circuit as described previously. This was compared with a 41 bit register comprising standard flip-flops and a suitable … Table 1 shows the simulated power dissipation with a 250 MHz clock signal and representative one hot encoded data. The proposed design reduces the power dissipation by a factor of 6, owing to the reduction in switching activity and the lower clock distribution dissipation.

Word-parallel CRC computation on VLIW DSP
D. Hubaux and J.-D. Legat

Cyclic redundancy check (CRC) is widely used for error detection.
For optimal performance, a method has been developed for bit-parallel processing, but it may not take advantage of parallel processor architectures. Here, a method is proposed for using the full power of a very long instruction word (VLIW) digital signal processor (DSP) architecture in CRC computation. The method is at least four times faster for 8, 16 and 32 bit CRCs.

Table 1: Power dissipation for 41 bit register at 250 MHz
Design | Power dissipation, mW
Standard FF | 4.25
Proposed design | 0.7

The register required to store the 121 bits for the moduli set {37, 41, 43} was designed using three registers similar to the 41 bit register just described. This was simulated with representative one hot encoded data in which six lines change in each clock cycle. For comparison purposes, a 16 bit binary register consisting of 16 standard flip-flops was also simulated, with each bit changing every second clock cycle to model random data. The resulting power dissipation at a clock frequency of 250 MHz is shown in Table 2. Both have similar power dissipation, despite the much larger number of flip-flops in the OHRNS register.

Table 2: Power dissipation for 16 bit dynamic range at 250 MHz
Design | Power dissipation, mW
16 bit binary |
{37, 41, 43} OHRNS |

Conclusions: These simulations show that the proposed design can provide a major reduction in the power dissipated by storage registers and clock distribution when used instead of standard flip-flops in a system using the OHRNS. They also show that the dynamic power dissipation of the storage registers in the OHRNS can be competitive with that of a conventional binary system, so the power-delay characteristics of the OHRNS could be exploited in a real system.

Acknowledgments: The author is grateful to Europractice for the IC design tools and to North Carolina State University for the NCSU design kit.

© IEE 2002
Electronics Letters Online No: 20020050
23 October 2001
DOI: 10.1049/el:20020050

T. Conway (ECE Department, University of Limerick, National Technological Park, Limerick, Ireland)

References
1 SODERSTRAND, M.A., JENKINS, W.K., JULLIEN, G.A., and TAYLOR, F.J. (Eds.): 'Residue number system arithmetic: modern applications in digital signal processing' (IEEE Press, 1986)
2 CHREN, W.A.: 'One-hot residue coding for low delay-power product CMOS design', IEEE Trans. Circuits Syst. II, Analog Digit. Signal Process., 1998, 45, (3), pp. 303-313
3 STOJANOVIC, V., and OKLOBDZIJA, V.: 'Comparative analysis of master-slave latches and flip-flops for high-performance and low-power systems', IEEE J. Solid-State Circuits, 1999, 34, (4), pp. 536-548