previous designs in [1]. An 11.6% power saving is achieved for the overall
RMW operation if α_DB and the burst length are assumed to be 0.5 and 1,
respectively. When all data bits have to be updated (α_DB = 1.0), total
power consumption is increased by 5.1% of the overall power consumption. However, this is a very rare case in real applications.
Power consumption of the RCW scheme, ranging from 71.8% to 105.1% of the previous design, depends on the number of bits that have
to be updated (α_DB). Even though the compare unit and bit-wise CD
control increase the die area by 3.5% of the previous design in [1], the
RCW scheme is well matched with the RMW transaction without any
timing penalty because the pre-fetch operation of the stored data is
automatically performed in read cycle. The power saving factor by the
RCW scheme is further enhanced as memory density and bandwidth
are increased. This is because the increase in both DB load capacitance
and burst length enlarges the portion of DB power consumption in the
conventional power contribution.
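The dependence on the number of updated bits can be illustrated with a small behavioural sketch. It is not the authors' circuit: the function name `rcw_write` and the 8 bit word width are assumptions made here, and the model only captures the idea that the stored word, already fetched in the read cycle, is compared with the new word so that only the differing bits need to be driven.

```python
def rcw_write(stored: int, new: int, width: int = 8):
    """Behavioural sketch of a read-compare-write update (assumed model).

    Returns the updated word, the per-bit update mask and the
    bit-wise update ratio (updated bits / word width).
    """
    word_mask = (1 << width) - 1
    # Compare step: a 1 marks a bit whose stored value differs from the new one.
    update_mask = (stored ^ new) & word_mask
    # Write step: only the masked bits take the new value.
    updated = (stored & ~update_mask) | (new & update_mask)
    ratio = bin(update_mask).count("1") / width
    return updated & word_mask, update_mask, ratio


if __name__ == "__main__":
    word, mask, ratio = rcw_write(0b1010_1010, 0b1010_0101)
    print(f"{word:08b} {mask:08b} {ratio}")
```

As a consistency check on the reported figures: if the extremes of 71.8% and 105.1% are assumed to correspond to update ratios of 0 and 1, and power is assumed to vary linearly in between, the midpoint is 88.45%, which agrees with the 11.6% saving quoted for an update ratio of 0.5.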
Conclusions: The proposed RCW scheme provides a clear solution
for low-power RMW operations for mobile 3D graphics or various
high bandwidth DRAM macros. Its power consumption depends on
the bit-wise update ratio. The power saving factor is further enhanced
as memory density and bandwidth are increased.
© IEE 2002
Electronics Letters Online No: 20020069
23 October 2001
DOI: 10.1049/el:20020069
Y.-H. Park, S. Choi and H.-J. Yoo (Dept. of EE, Korea Advanced
Institute of Science and Technology (KAIST), 373-1, Kusong-dong,
Yusong-gu, Taejon 305-701, Korea)
However, as each residue x_i is represented as a bus of m_i wires, only
one of which is high at any given time, a large number of
storage elements is required to implement a register for the OHRNS. For
example, consider use of the moduli {37,41,43}, chosen to provide a
dynamic range of 65,231 or approximately 16 bits. Each storage
register would require 121 flip-flops, of which only six would change
in each clock cycle, i.e. three going high and three going low. Using
standard static CMOS flip-flop designs would result in a significant
dynamic power dissipation owing to clock switching within each flip-flop, as well as a significant clock distribution loading effect
[3]. Such an increase in power dissipation could negate the benefits in
power delay products promised by the OHRNS.
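The figures in this example can be checked with a short sketch. The helper names `one_hot_residues` and `toggled_lines` are introduced here for illustration only; the code simply encodes a value as one one-hot vector per modulus and counts how many register lines differ between two consecutive values.

```python
from math import prod

MODULI = (37, 41, 43)  # moduli set used in the text


def one_hot_residues(value: int, moduli=MODULI):
    """Encode value as one one-hot bit vector (list of 0/1) per modulus."""
    return [[1 if i == value % m else 0 for i in range(m)] for m in moduli]


def toggled_lines(a: int, b: int, moduli=MODULI) -> int:
    """Count register lines that differ between the encodings of a and b."""
    return sum(
        bit_a != bit_b
        for vec_a, vec_b in zip(one_hot_residues(a, moduli),
                                one_hot_residues(b, moduli))
        for bit_a, bit_b in zip(vec_a, vec_b)
    )


if __name__ == "__main__":
    print(prod(MODULI))               # dynamic range M
    print(sum(MODULI))                # storage elements per register
    print(toggled_lines(1000, 1001))  # lines changing when every residue changes
```

With this moduli set the sketch gives a dynamic range of 65,231 and 121 storage elements, and a step in which all three residues change toggles six lines (three going high, three going low).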
This Letter describes a register architecture that takes advantage of
the OHRNS attributes and overcomes the power dissipation problems
associated with the one hot encoding of the residues.
Proposed register design: The basic flip-flop architecture is shown in
Fig. 1. The storage function is achieved with a level sensitive latch
which is transparent when the T/L input is high and latched otherwise. The D input is compared to the Q output by means of the
exclusive OR gate. When they are the same logic level, the level
sensitive T/L input is set to a logic 0 by means of the PMOS pull up
and inverter, and the TRIGB input is isolated from the circuit. When
the D input is not the same as the Q output, the TRIGB input is
directed through the NMOS device to drive the T/L input via the inverter.
The TRIGB line is driven by a short pulse to logic 0 at the rising edge
of the system clock. Hence, if the flip-flop D input and its Q output are
different at the rising edge of the system clock, the short pulse on
the TRIGB line is directed to provide a short latching pulse to the
level sensitive latch, which will store the input data bit.
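The behaviour described above can be summarised in a purely behavioural sketch. It is not a transistor-level model: the class name `GatedFlipFlop` and the `latch_pulses` counter are assumptions added here, and timing of the TRIGB pulse is abstracted into a single call per rising clock edge.

```python
class GatedFlipFlop:
    """Behavioural (not transistor-level) sketch of the described flip-flop."""

    def __init__(self, q: int = 0):
        self.q = q
        self.latch_pulses = 0  # number of times the internal latch was pulsed

    def clock_edge(self, d: int) -> bool:
        """Model one rising clock edge, i.e. one low-going TRIGB pulse.

        Returns True if the TRIGB pulse reached the latch.
        """
        differs = d ^ self.q  # exclusive OR comparison of D and Q
        if not differs:
            # T/L is held at logic 0 and TRIGB is isolated: no internal switching.
            return False
        # TRIGB passes through the NMOS device and is inverted into a short
        # high pulse on T/L, so the latch is briefly transparent and stores D.
        self.q = d
        self.latch_pulses += 1
        return True


if __name__ == "__main__":
    ff = GatedFlipFlop()
    for d in [0, 0, 1, 1, 1, 0]:
        ff.clock_edge(d)
    print(ff.q, ff.latch_pulses)
```

For the six clock edges in the usage example the latch is pulsed only twice, once for each change of the data input, which is the property the design relies on when the input activity is low.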
E-mail: [email protected]
1 PARK, Y.-H., HAN, S.-H., LEE, J.-H., and YOO, H.-J.: 'A 7.1 GB/s low power
rendering engine in 2D array embedded memory logic CMOS for
portable multimedia system', IEEE J. Solid-State Circuits, 2001, 36,
(6), pp. 934-955
2 INOUE, K., NAKAMURA, H., and KAWAI, H.: 'A 10 Mb frame buffer memory
with Z-compare and A-blend units', IEEE J. Solid-State Circuits, 1995,
30, (12), pp. 1563-1568
3 KOOK, J., and YOO, H.-J.: 'A single bit line writing scheme for low power
reconfigurable I/O DRAM macro'. IEEE European Solid-State Circuit
Conference, Digest of Technical Papers, September 2000, pp. 420-423
Static register implementation for one hot
residue number systems
T. Conway
A method of implementing static registers for one-hot residue number
systems is described. The method overcomes the high power dissipation problems associated with conventional flip-flops and clock
distribution. The proposed design relies on the low activity factor
inherent in the one hot coding structure and a hybrid clocking system
that minimises the switching capacitance associated with the clock.
Introduction: The residue number system (RNS) is a method of
representing a range of integers 0 ... M - 1 by using their residues
x_i modulo a series of relatively prime moduli m_0 ... m_(N-1), where M is
the product of the moduli. The operations of addition, subtraction and
multiplication (all modulo M) can be completed by operating on each
residue independently, thus providing the potential for high-speed
arithmetic [1]. The one hot RNS (OHRNS) system has been proposed
as a means of accelerating the operations on each modulus by
representing the individual residues in a one hot encoded manner.
The resulting operations of addition, subtraction and multiplication
can then be implemented by barrel shifter circuits. This encoding has
been shown to achieve favourable power delay products for the
arithmetic operations as well as providing practical solutions for the
scaling operation [2].
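A minimal sketch of these ideas follows. The small moduli set (3, 5, 7) and all function names are chosen here for illustration and are not taken from the Letter. It shows residue-wise addition and multiplication, reconstruction through the Chinese remainder theorem, and one-hot addition performed as a rotation, which is the operation a barrel shifter provides. Multiplication by shifting needs an additional index mapping that is not shown.

```python
from math import prod

MODULI = (3, 5, 7)  # small pairwise coprime moduli, chosen for illustration


def to_rns(x: int, moduli=MODULI):
    """Represent x by its residues modulo each modulus."""
    return tuple(x % m for m in moduli)


def rns_add(a, b, moduli=MODULI):
    """Add two RNS numbers by operating on each residue independently."""
    return tuple((ra + rb) % m for ra, rb, m in zip(a, b, moduli))


def rns_mul(a, b, moduli=MODULI):
    """Multiply two RNS numbers by operating on each residue independently."""
    return tuple((ra * rb) % m for ra, rb, m in zip(a, b, moduli))


def from_rns(residues, moduli=MODULI) -> int:
    """Recover the integer in 0 ... M - 1 using the Chinese remainder theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse (Python 3.8+)
    return total % big_m


def one_hot(r: int, m: int):
    """One-hot encode residue r on m wires."""
    return [1 if i == r else 0 for i in range(m)]


def one_hot_add(vec, r: int):
    """Add residue r to a one-hot encoded residue by rotating it r places."""
    m = len(vec)
    return [vec[(i - r) % m] for i in range(m)]


if __name__ == "__main__":
    a, b = to_rns(17), to_rns(23)
    print(from_rns(rns_add(a, b)), from_rns(rns_mul(a, b)))
```

The results agree with ordinary arithmetic modulo M = 105 as long as both operands lie in the range 0 ... M - 1.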
17th January 2002
Fig. 1 Basic flip-flop architecture
This architecture has two valuable attributes. First, when there is no
logic activity on the data input of the flip-flop, there is no switching
activity within the flip-flop and hence there will be no dynamic power
dissipation if the circuit is designed using CMOS static logic design.
This will significantly reduce the power dissipation in the case of the
one hot RNS system where the activity level of each one hot encoded
residue is very low.
Secondly, the TRIGB input of the flip-flop drives the drain of a single
NMOS device which can be made a minimum size device. When there
is no new data for the flip-flop to store, the input capacitance on this line
will be one drain junction capacitance. Thus the clock distribution
circuit will have to drive a small capacitance in all cases except when
the input data has changed and needs to be stored. However, as only two
data lines can change in the one hot encoded scheme, the switched
capacitance on the TRIGB line will be low, leading to a low dynamic
power dissipation in the clock distribution circuit (Fig. 2).
Fig. 2 Clock distribution circuit
Vol. 38 No. 2
Performance: The proposed architecture can be implemented with 24
transistors, which is comparable in area [...]
flip-flop. The performance of the proposed architecture was assessed
by simulating the proposed design and [...]
flop using a 0.25 µm process. First, a 41 bit register to store a single
residue was designed using 41 storage elements and a clock distribution circuit as described previously. This was compared to a 41 bit
register comprising standard flip-flops and a suitable [...]
Table 1 shows the simulated power dissipation with a 250 MHz [...]
signal and representative one hot encoded data. The proposed [...]
reduces the power dissipation by a factor of 6, owing to the reduction
in switching activity and lower clock distribution dissipation.
Word-parallel CRC computation on VLIW
D. Hubaux and J.-D. Legat
Cyclic redundancy check (CRC) is widely used for error detection. For
optimal performance, a method has been developed for bit-parallel
processing, but it may not take advantage of parallel processor
architecture. Here, a method is proposed for using the full power of
a very long instruction word (VLIW) digital signal processor (DSP)
architecture in CRC computation. The method is at least four times
faster for 8, 16 and 32 bit CRC.
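For orientation, the sketch below contrasts a bit-serial CRC with a byte-at-a-time table-driven version. It is not the word-parallel VLIW method proposed by the authors, whose details are not given in this abstract. The 16 bit polynomial 0x1021 and the initial value 0xFFFF are assumptions chosen here for illustration, as are the function names.

```python
POLY = 0x1021  # assumed 16 bit generator polynomial
INIT = 0xFFFF  # assumed initial register value


def crc16_bitwise(data: bytes, poly: int = POLY, init: int = INIT) -> int:
    """Bit-serial CRC: one shift and conditional XOR per input bit."""
    crc = init
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ poly) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def make_table(poly: int = POLY):
    """Precompute the CRC contribution of every possible input byte."""
    table = []
    for i in range(256):
        crc = i << 8
        for _ in range(8):
            crc = ((crc << 1) ^ poly) & 0xFFFF if crc & 0x8000 else (crc << 1) & 0xFFFF
        table.append(crc)
    return table


TABLE = make_table()


def crc16_table(data: bytes, init: int = INIT) -> int:
    """Byte-parallel CRC: one table lookup and XOR per input byte."""
    crc = init
    for byte in data:
        crc = ((crc << 8) & 0xFFFF) ^ TABLE[(crc >> 8) ^ byte]
    return crc


if __name__ == "__main__":
    msg = b"123456789"
    print(hex(crc16_bitwise(msg)), hex(crc16_table(msg)))
```

Both functions compute the same remainder; the table-driven form trades memory for fewer operations per input byte, which is the general direction that parallel CRC methods take.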
Table 1: Power dissipation for 41 bit register at 250 MHz
Power dissipation,
Standard FF
Proposed design
The register required to store the 121 bits required for the moduli set
{37,41,43} was designed by using three registers similar to the 41 bit
register just described. This was simulated with representative one hot
encoded data in which six lines change in each clock cycle. For
comparison purposes, a 16 bit binary register consisting of 16 standard
flip-flops was also simulated with each bit changing every second clock
cycle to model random data. The resulting power dissipation at a clock
frequency of 250 MHz is shown in Table 2. Both have a similar power
dissipation despite the large number of flip-flops in the OHRNS.
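A back-of-envelope count of data toggles helps explain why the two registers can end up with similar dissipation. The sketch below uses only the figures stated in the text; the function name `toggles_per_cycle` is introduced here, and the count deliberately ignores clock distribution and any difference in energy per flip-flop, so it is an illustration rather than a power estimate.

```python
def toggles_per_cycle(num_lines: int, activity: float) -> float:
    """Average number of lines changing state per clock cycle."""
    return num_lines * activity


# OHRNS register for the moduli set {37,41,43}: six lines change per cycle.
ohrns_lines = 37 + 41 + 43
ohrns_toggles = 6
ohrns_activity = ohrns_toggles / ohrns_lines

# 16 bit binary register: each bit changes every second clock cycle.
binary_toggles = toggles_per_cycle(16, 0.5)

if __name__ == "__main__":
    print(ohrns_lines, round(ohrns_activity, 4), binary_toggles)
```

Although the OHRNS register has 121 storage elements, only about 5% of them change in a cycle, giving six toggles compared with an average of eight for the 16 bit binary register under the random data model.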
Table 2: Power dissipation for 16 bit dynamic range at 250 MHz
16 bit
Power dissipation,
{37,41,43} OHRNS
Conclusions: These simulations show that the proposed design can
provide a major reduction in power dissipation due to storage registers
and clock distribution when used instead of standard flip-flops in a
system using the OHRNS. They also show that the dynamic power
dissipation due to the storage registers in the OHRNS can be
competitive with the conventional binary system and thus the power
delay characteristics of OHRNS could be exploited in a real system.
Acknowledgments: The author is grateful to Europractice for the IC
design tools and North Carolina State University for their NCSU
design kit.
© IEE 2002
Electronics Letters Online No: 20020050
23 October 2001
DOI: 10.1049/el:20020050
T. Conway (ECE Department, University of Limerick,
Technological Park, Limerick, Ireland)
1 SODERSTRAND, M.A., JENKINS, W.K., JULLIEN, G.A., and TAYLOR, F.J. (Eds.):
'Residue number system arithmetic: modern applications in digital signal
processing' (IEEE Press, 1986)
2 CHREN, W.A.: 'One-hot residue coding for low delay-power product
CMOS design', IEEE Trans. Circuits Syst. II, Analog Digit. Signal
Process., 1998, 45, (3), pp. 303-313
3 STOJANOVIC, V., and OKLOBDZIJA, V.: 'Comparative analysis of master-slave latches and flip-flops for high-performance and low-power
systems', IEEE J. Solid-State Circuits, 1999, 34, (4), pp. 536-548