INFORMATION TO USERS

This manuscript has been reproduced from the microfilm master. UMI films the text directly from the original or copy submitted. Thus, some thesis and dissertation copies are in typewriter face, while others may be from any type of computer printer.

The quality of this reproduction is dependent upon the quality of the copy submitted. Broken or indistinct print, colored or poor quality illustrations and photographs, print bleedthrough, substandard margins, and improper alignment can adversely affect reproduction.

In the unlikely event that the author did not send UMI a complete manuscript and there are missing pages, these will be noted. Also, if unauthorized copyright material had to be removed, a note will indicate the deletion.

Oversize materials (e.g., maps, drawings, charts) are reproduced by sectioning the original, beginning at the upper left-hand corner and continuing from left to right in equal sections with small overlaps.

Photographs included in the original manuscript have been reproduced xerographically in this copy. Higher quality 6" x 9" black and white photographic prints are available for any photographs or illustrations appearing in this copy for an additional charge. Contact UMI directly to order.

ProQuest Information and Learning
300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
800-521-0600

Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.

STEADY-STATE METHODS FOR SIMULATION OF RF AND MICROWAVE CIRCUITS

A dissertation submitted by

Madeline Kleiner

In partial fulfillment of the requirements for the degree of
Doctor of Philosophy in Electrical Engineering

TUFTS UNIVERSITY
May 2002

© 2002, Madeline Kleiner

ADVISER: Professor Mohammed Afsar
UMI Number: 3042933

Copyright 2002 by Kleiner, Madeline Ann. All rights reserved.

UMI Microform 3042933
Copyright 2002 by ProQuest Information and Learning Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company
300 North Zeeb Road, P.O. Box 1346
Ann Arbor, MI 48106-1346

Abstract

Given the growing demand for wireless communication devices, a more efficient method is needed to determine the steady-state response of the circuits frequently found in microwave and RF applications. Problems with the available methods showed the need for a low-cost and efficient alternative for steady-state determination of nonlinear circuits. The shooting-Newton method for steady-state determination has been implemented in SPICE, the most popular general-purpose circuit simulator. The new steady-state analysis option incorporates standard Gaussian elimination and several Krylov methods for solution of the linear system generated by the shooting-Newton algorithm. The steady-state analysis in SPICE is shown to be much faster than SPICE transient analysis when finding the steady-state solution of strongly nonlinear circuits. The Krylov methods show even greater improvements in determining the steady-state of larger circuits.

In addition, a mathematically consistent description of all the numerical methods involved in the shooting-Newton method as used in circuit simulation is presented. Computational issues of the algorithm are considered. The accuracy of the shooting-Newton method with each of the different solver options is examined. Some eigenvalues are examined for example circuits. A detailed comparison of the Krylov methods, along with their convergence properties, is presented.
In addition, a breakdown of the Krylov methods BiCG and BiCGSTAB has been corrected by changing the initial guess.

Acknowledgements

I wish to express my deepest appreciation to my dissertation committee, Professors Mohammed Afsar, Misha Kilmer, Karen Panetta, and Arthur Uhlir, for their wisdom, guidance, and friendship. I especially wish to thank Misha Kilmer for helping me understand and use numerical methods, and Arthur Uhlir for his wonderful questions and discussions on steady-state circuit theory.

I am grateful to Keith Nabors, Joel Phillips, Baolin Yang, Dan Feng, Ilya Yusim, Ricardo Telichevsky, Sergie Pevchin, and Ken Kundert of Cadence Design Systems for their willingness to share their knowledge of circuit simulation with me. They allowed me to make a greater contribution to this field, and their expertise will continue to inspire me. I wish to acknowledge and express my gratitude to Paul Draxler, Thomas Brazil, and K.C. Gupta from the Microwave Theory and Techniques Society for their interesting and useful discussions on various aspects of circuit simulation.

I wish to thank my husband, Scott Wurcer, and my four children, Josh, Jonah, Rachael, and Claire, for their assistance and patience. Thank you, Jonah, for all that typing and help with the word processing monster. I also want to acknowledge all the friends, especially Susan Santucci and Bobbi Tsukahara, who have believed in me. I want to thank the community where I live, Cambridge Cohousing, for all the support it has given me. Lastly, I wish to thank my mother, Betty Lawson, and my brothers and sisters, Bruce, Craig, Cynthia, and Tami, for their constant encouragement.
Table of Contents

Chapter 1  Introduction ................................................................ 2
  1.1 Problems with available methods ........................................ 5
  1.2 New method and its advantages ......................................... 6
  1.3 Organization of the thesis .............................................. 8
Chapter 2  Steady-State Determination for RF and Microwave Circuits ........ 10
  2.1 Introduction .......................................................... 10
  2.2 Transient analysis in SPICE ........................................... 12
    2.2.1 Transient analysis formulation .................................... 13
  2.3 Shooting-Newton method ................................................ 16
    2.3.1 Derivation of the shooting-Newton method .......................... 17
    2.3.2 Forming the Jacobian by numerical differentiation ................. 18
    2.3.3 Forming the Jacobian by transient analysis ........................ 19
    2.3.4 Steps in the shooting-Newton method ............................... 22
  2.4 Harmonic balance method ............................................... 23
    2.4.1 Phasor analysis ................................................... 24
    2.4.2 Simple example of the harmonic balance method ..................... 26
    2.4.3 Extensions of the harmonic balance method ......................... 30
    2.4.4 Other methods ..................................................... 31
  2.5 Conclusion ............................................................ 32
Chapter 3  Numerical Methods Background ..................................... 34
  3.1 Introduction .......................................................... 34
  3.2 How the system of equations is generated .............................. 36
  3.3 Direct versus iterative methods ....................................... 38
  3.4 Iterative methods background .......................................... 40
    3.4.1 Stationary and non-stationary iterative methods ................... 41
  3.5 Krylov-subspace methods ............................................... 43
    3.5.1 Classification of Krylov methods .................................. 45
    3.5.2 Conjugate Gradient (CG) method .................................... 48
    3.5.3 The Generalized Minimum Residual (GMRES) method ................... 51
    3.5.4 BiConjugate Gradient (BiCG) method ................................ 55
    3.5.5 Conjugate Gradient Squared (CGS) method ........................... 60
    3.5.6 Quasi-Minimal Residual (QMR) method ............................... 65
    3.5.7 BiConjugate Gradient Stabilized (BiCGSTAB) method ................. 67
  3.6 Summary ............................................................... 69
Chapter 4  Implementation of the Shooting-Newton Algorithm Using
           Krylov-Subspace Methods .......................................... 73
  4.1 Implementation in SPICE ............................................... 74
  4.2 The Jacobian matrix for the shooting-Newton method .................... 76
    4.2.1 Minimizing the number of sensitivity matrix formations ............ 76
    4.2.2 Damping the sensitivity matrix .................................... 77
    4.2.3 Importance of the integration method .............................. 78
    4.2.4 Eigenvalues and sparsity pattern of the Jacobian .................. 78
    4.2.5 Implicit matrix options ........................................... 82
  4.3 Krylov-subspace linear solver optimizations ........................... 83
    4.3.1 Stopping criterion ................................................ 83
    4.3.2 Maximum number of iterations ...................................... 84
    4.3.3 Breakdown of the method ........................................... 85
    4.3.4 Computational issues .............................................. 87
  4.4 Summary ............................................................... 89
Chapter 5  Results and Comparisons .......................................... 90
  5.1 DC supply test circuit ................................................ 91
  5.2 Buck converter ........................................................ 95
  5.3 Class AB amplifier ................................................... 101
  5.4 Class C amplifier .................................................... 104
  5.5 Comparisons .......................................................... 109
Chapter 6  Summary ......................................................... 111
Appendix A  SPICE Input Files .............................................. 114
Bibliography ............................................................... 121
Published Papers ........................................................... 129

List of Tables

Table 3.1  Computational aspects of Krylov-subspace methods ................. 71
Table 5.1  Steady-state initial conditions for the DC power supply example using different Krylov methods inside the shooting-Newton method. (The transient analysis values are at 1.98 s.) ... 92
Table 5.2  Time to calculate steady-state for the DC power supply. The size of the sensitivity matrix is 4 × 4 ... 94
Table 5.3  Steady-state initial conditions for the buck converter example using different Krylov methods inside the shooting-Newton algorithm ... 96
Table 5.4  Time to reach steady-state for the buck converter example to two decimal places. The initial guess for BiCG, BiCGSTAB, and QMR is not zero ... 99
Table 5.5  Different initial guesses for the BiCG method ................... 100
Table 5.6  Different initial guesses for the BiCGSTAB method ............... 100
Table 5.7  Steady-state initial conditions for the Class AB amplifier example using different Krylov methods inside the shooting-Newton algorithm ... 102
Table 5.8  Summary of Krylov-subspace methods and transient analysis results of steady-state determination for a Class AB amplifier ... 102
Table 5.9  Summary of Krylov-subspace methods and transient analysis results of steady-state determination for a Class C amplifier ... 105
Table 5.10 Relative residual for the Class C amplifier ..................... 105

List of Figures

Figure 2.1  Flow chart for harmonic balance simulation from the EEsof manual ... 25
Figure 2.2  Example circuit [Rodriques98] ................................... 27
Figure 2.3  Circuit divided into linear and nonlinear subnetworks ........... 28
Figure 3.1  Classification of Krylov-subspace methods ....................... 47
Figure 4.1  Sparsity pattern of the sensitivity matrix for an example multiplier circuit ... 79
Figure 4.2  Sparsity pattern of the sensitivity matrix for a buck converter ... 80
Figure 4.3  Eigenvalue spectrum of the sensitivity matrix for the Class C amplifier ... 81
Figure 4.4  Eigenvalue spectrum for the buck converter ...................... 82
Figure 5.1  DC supply circuit ............................................... 91
Figure 5.2  SPICE input file with steady-state command ...................... 92
Figure 5.3  Transient analysis plots for the DC power supply circuit ........ 93
Figure 5.4  Two periods of the transient analysis when the DC supply circuit is in steady-state. The triangle marker indicates the Krylov-subspace solutions ... 94
Figure 5.5  Circuit diagram for the buck converter .......................... 95
Figure 5.6  Results from the GMRES run on the buck converter ................ 97
Figure 5.7  Transient analysis plots for the two variables: voltage across the capacitor Cf and current through the inductor Lf ... 98
Figure 5.8  Two periods of transient analysis for the buck converter. The Krylov-subspace methods are marked with a triangle ... 98
Figure 5.9  Class AB amplifier [Meyer89] ................................... 101
Figure 5.10 Periodic time-domain response of the Class AB circuit .......... 103
Figure 5.11 Class C amplifier [Colon73] .................................... 104
Figure 5.12 Periodic time-domain response for the Class C amplifier example ... 106
Figure 5.13 Convergence behavior of the residual norm of the BiCG solution algorithm as a function of iteration number ... 106
Figure 5.14 Convergence behavior of the residual norm of the BiCGSTAB solution algorithm as a function of iteration number ... 107
Figure 5.15 Convergence behavior of the residual norm of the CGS solution algorithm as a function of iteration number ... 107
Figure 5.16 Convergence behavior of the residual norm of the GMRES solution algorithm as a function of iteration number ... 108
Figure 5.17 Convergence behavior of the residual norm of the QMR solution algorithm as a function of iteration number ... 108
Figure 5.18 Comparison of the shooting-Newton method to standard transient analysis in SPICE for determination of steady-state (G.E. is Gaussian elimination) ... 110
Figure 5.19 Comparison of Krylov methods to Gaussian elimination for determination of the steady-state ... 110
STEADY-STATE METHODS FOR SIMULATION OF RF AND MICROWAVE CIRCUITS

Chapter 1

Introduction

The digital revolution has created unprecedented growth in the size and complexity of RF and microwave circuits. Communication devices such as cellular phones, PDAs, and pagers use complex integrated circuits that operate at RF and microwave frequencies. Given this growth, circuit designers need a more efficient method for determining the steady-state response of non-linear RF and microwave circuits. When circuits are in steady-state, important quantities, including power, frequency response, noise, distortion, and transfer characteristics such as gain and impedance, can be evaluated [Steer91a].

Traditionally, the two methods used to determine the steady-state response of circuits to periodic input have been the harmonic balance method and transient analysis in SPICE [Kundert90]. Although harmonic balance is the most widely used method for these types of circuits, it has difficulty with the strongly non-linear circuits that are often needed for RF and microwave applications. Harmonic balance cannot represent the transient behavior of circuits, and, therefore, chaotic or unwanted oscillations are not observed when circuits are simulated using harmonic balance alone.

The second most popular approach is transient analysis, which is allowed to run until all transient signals have died out and only the steady-state response remains. Two disadvantages of transient analysis are that certain types of circuits take a very long time to reach steady-state, and that it is sometimes difficult to determine the exact moment when steady-state is reached.
Newer methods, first introduced for use in circuit simulation by Aprille and Trick [Aprille72a, Aprille72b, Colon73] and later used by Telichevesky [Kundert90, Telichevesky95, Telichevesky96a, Telichevesky96b], include the shooting-Newton algorithm for direct steady-state determination. In the shooting-Newton method, the values of the capacitor voltages and inductor currents are calculated such that, when used as the initial state for simulation in the time domain, the circuit is in steady-state. Information from the transient analysis in circuit simulators such as SPICE is used for the calculation. The shooting-Newton algorithm poses steady-state determination as a two-point boundary value problem [Bailey68, Press92] and iteratively constructs its solution. Newton's method is used within the shooting-Newton algorithm to construct the next guess for the initial condition. The original method of Aprille and Trick is inefficient for large circuits because it uses standard Gaussian elimination to solve for the iterate formed during the shooting-Newton method. Telichevesky et al. utilize the Generalized Minimum Residual (GMRES) method, but no other Krylov methods, as a more efficient means of determining the steady-state.

This dissertation presents an expanded and more efficient method for achieving steady-state using the shooting-Newton algorithm. The new steady-state analysis option incorporates standard Gaussian elimination and several Krylov-subspace methods, including Generalized Minimum Residual (GMRES), BiConjugate Gradient (BiCG), Conjugate Gradient Squared (CGS), BiConjugate Gradient Stabilized (BiCGSTAB), and Quasi-Minimal Residual (QMR), for solution of the system of equations generated by the shooting-Newton algorithm. The vehicle for this implementation is the popular circuit simulator SPICE [Nagle75], whose code is freely available from UC Berkeley.
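As an illustration of the shooting idea, the following is a minimal sketch, not the dissertation's implementation: a single-state RC network driven by a periodic square wave is integrated over one period with backward Euler, the sensitivity of the final state with respect to the initial state is accumulated alongside, and Newton's method is applied to the periodicity condition v(T) = v(0). All component values and names here are hypothetical.

```python
# Toy shooting-Newton iteration for the periodic steady state of a 1-state
# RC circuit: dv/dt = (u(t) - v) / (R*C), with u(t) a period-T square wave.
# Component values are illustrative, not from the dissertation's examples.

R, C, T = 1e3, 1e-6, 2e-3          # 1 kOhm, 1 uF, 2 ms period
N = 200                            # backward-Euler steps per period
h = T / N

def u(t):                          # periodic 0/5 V square-wave source
    return 5.0 if (t % T) < T / 2 else 0.0

def transient(v0):
    """Integrate one period; return v(T) and the sensitivity dv(T)/dv0."""
    v, sens = v0, 1.0
    for k in range(1, N + 1):
        t = k * h
        # backward Euler: (v_new - v)/h = (u(t) - v_new)/(R*C)
        a = 1.0 / (1.0 + h / (R * C))
        v = a * (v + h * u(t) / (R * C))
        sens *= a                  # chain rule through each time step
    return v, sens

# Newton's method on the shooting residual F(v0) = v(T) - v0 = 0
v0 = 0.0
for it in range(10):
    vT, sens = transient(v0)
    F = vT - v0
    if abs(F) < 1e-12:
        break
    v0 -= F / (sens - 1.0)         # scalar Newton update

vT, _ = transient(v0)
print(abs(vT - v0) < 1e-9)         # periodic: v(T) matches v(0)
```

For this linear example a single Newton step suffices. For the nonlinear circuits targeted here, the scalar sensitivity becomes the circuit's sensitivity matrix, and solving the resulting linear system at each Newton step is exactly where Gaussian elimination or a Krylov-subspace method enters.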
The steady-state analysis and the Krylov-subspace methods are added to SPICE 3f5, the current version of the popular analog integrated circuit simulator. In addition, a mathematically consistent description of all the numerical methods involved in the shooting-Newton method as used in circuit simulation is presented. The steady-state analysis in SPICE is shown to be much faster than SPICE transient analysis when finding the steady-state solution of strongly nonlinear circuits, with which harmonic balance simulators have difficulty. Krylov methods show even greater improvements in determining the steady-state for larger circuits. This dissertation also considers the computational issues of the algorithm and examines the accuracy of the shooting-Newton method with each of the different solver options. Some eigenvalues are examined for example circuits for the purpose of convergence analysis of the Krylov methods. A detailed comparison of the Krylov methods, along with their convergence properties, is presented. In addition, a breakdown that typically occurs when the Krylov methods BiCG and BiCGSTAB are applied to these types of matrices with a standard initial guess has been corrected by changing the initial guess.

1.1 Problems with available methods

Determining the steady-state response of integrated circuits that are lightly damped and have large time constants is very time consuming. RF circuits used in communication devices often process signals in which a low-frequency signal is embedded on a high-frequency carrier wave. This situation causes problems in transient analysis because the high-frequency wave determines how many time steps need to be taken, while the low-frequency signal determines the interval over which the simulation needs to take place. The result is long simulation times, which make it difficult to analyze these circuits thoroughly in steady-state.
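To make this cost concrete, consider illustrative numbers (not taken from this dissertation's examples): a 1 GHz carrier modulated by a 10 kHz signal. The carrier fixes the step size and the modulation fixes the simulated interval, so the step count is roughly their ratio times the resolution needed per carrier cycle:

```python
# Rough cost of brute-force transient analysis for a modulated RF signal.
# The carrier sets the step size; the modulation sets the simulated interval.
# All numbers are illustrative, not from the dissertation's test circuits.

f_carrier = 1e9        # 1 GHz carrier
f_signal = 1e4         # 10 kHz modulating signal
pts_per_cycle = 20     # time points needed to resolve one carrier cycle

dt = 1.0 / (pts_per_cycle * f_carrier)   # step size set by the carrier
t_end = 1.0 / f_signal                   # must cover one modulation period
n_steps = t_end / dt                     # about two million steps

print(f"step size : {dt:.1e} s")
print(f"interval  : {t_end:.1e} s")
print(f"steps     : {n_steps:.0f}")
```

Two million nonlinear circuit evaluations for a single envelope period, before the transients have even decayed, is why brute-force transient simulation of such circuits is so expensive.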
Also, high-Q circuits are difficult to simulate in SPICE, and these types of circuits occur regularly in communication systems.

Four methods are commonly used to determine the steady-state response [Kundert90]: transient analysis, finite difference, shooting-Newton, and harmonic balance. In SPICE, the most popular analog circuit simulator, the steady-state is determined by allowing the time-domain transient analysis to run until all the transient signals have died off and all that remains is the periodic steady-state. For many integrated circuits, this simulation involves long computer run times. A direct method computes the initial states for a circuit that, when applied, result in the steady-state response. The finite difference method and the shooting-Newton method are both so-called direct methods [Kundert90], meaning the time-domain steady-state is determined as the solution of a two-point boundary value problem and direct matrix solvers are used to update the solution guess. However, the computation time using the standard matrix methods of solution is still too expensive. Finally, the harmonic balance method is used extensively and works well for weakly nonlinear circuits, but it becomes computationally expensive for strongly nonlinear circuits. The transient analysis option in SPICE can simulate these strongly nonlinear circuits, albeit with poor computer efficiency.

1.2 New method and its advantages

An improved and expanded method has been developed to determine the steady-state response of analog and RF integrated circuits in the time domain [Kleiner00, Kleiner01]. The method uses the shooting-Newton algorithm to directly determine the steady-state. Unlike Telichevesky's implementation, several different Krylov-subspace algorithms have been coded and implemented to solve the resulting linear equations generated by this method.
The expanded steady-state approach is a more efficient method for determining the steady-state response of analog and microwave circuits that are difficult to simulate by traditional methods. This class of circuits includes strongly nonlinear communication circuits. With the implementation of the Krylov-subspace methods, greater computational efficiency is gained and larger, more complex circuits can be simulated. Different Krylov-subspace algorithms are available to compute the solution so as to take advantage of the properties of the particular matrix being solved.

The shooting-Newton method, along with the other important aspects of the approach using Krylov-subspace methods, is implemented and tested in SPICE. Quantities computed during the transient analysis in SPICE are reused in the direct determination of the steady-state response. The shooting-Newton method used to determine the steady-state response is efficient for the strongly nonlinear circuits that are frequently found in RF and microwave systems. This is due to its relationship with standard transient analysis.
This approach is shown to be much more efficient than transient analysis when finding the steady-state solution of certain classes of circuits, like self-biasing amplifiers, narrow-band filters, and circuits with high Qs, and lightly damped circuits with long time constants. Because the approach is computationally efficient, it allows for simulation of larger circuits. This method also shows greater efficiency than standard Gaussian elimination solution of the shooting-Newton iterate. Implemented in SPICE, this approach is accurate and efficient for the steady-state response for nonlinear circuits that are found in communication devices. For SPICE-based circuit simulators, the time-domain shooting-Newton method is relatively easy to implement [Ashar85, Quarles89a, Quarles89b, Quarles89c, Quarles89d, Quarles89e]. The direct shooting-Newton method of steady-state determination was proposed by Aprille and Tricke [Aprille72a, Aprille72b, Ashar89]. Because of the computation costs, this limited the use of the algorithm to relatively small circuits. The newer Krylov-subspace methods can solve these equations generated by the steady-state response determination with Reproduced with permission of the copyright owner. Further reproduction prohibited without permission. 8 much greater efficiency. The Krylov-subspace method of GMRES has been implemented with the direct shooting-Newton method and implicit matrix methods [Telichevesky96a]. However, it is not implemented in public-domain software and does not include other Krylov-subspace methods for comparison. 1.3 O rganization o f the thesis This dissertation presents a more efficient method for finding the steady-state response of nonlinear integrated circuits with periodic inputs. There are two fundamental parts to this method. First, the direct approach to steady-state determination is implemented using the shooting-Newton method in SPICE. 
Second, several different Krylov-subspace methods, including BiCG, CGS, GMRES, BiCGSTAB, and QMR, are used to solve the linear equations generated. The steady-state method results in the ability to analyze RF and microwave circuits that are difficult to examine with existing methods.

Chapter 2 presents three primary methods for determining the steady-state response: transient analysis, shooting-Newton, and harmonic balance. Their derivation, formulation, and typical implementations are described. The advantages and disadvantages of the different methods are considered. Some rarely used methods are also introduced.

In Chapter 3, the numerical linear algebra background is presented. Methods of solution for the linear system of equations generated by the shooting-Newton algorithm are detailed. Chapter 3 begins with a brief overview of the formulation of the system of equations for the determination of the steady-state response. Then background is given on direct and stationary methods. The main part of the chapter presents the derivation, computational resources, algorithms, and theory behind the Krylov-subspace iterative methods used for the efficient solution of the linear system of equations that arises in steady-state determination.

Chapter 4 describes the practical algorithms for the implementation of the shooting-Newton algorithm for steady-state determination using Krylov-subspace methods. These algorithms are implemented and tested in SPICE 3f5, the most popular freely available analog integrated circuit simulation program. Two main aspects of implementation are discussed: the implementation details for the Jacobian needed for the shooting-Newton iteration, and the use of the Krylov-subspace methods for solution of that shooting-Newton iteration.
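As a flavor of the Krylov-subspace iterations that Chapter 3 derives in full, the following is a minimal sketch of the simplest of them, the Conjugate Gradient method (Section 3.5.2), in plain Python on an arbitrary symmetric positive definite test matrix, not one of the dissertation's circuit matrices. CG applies only to symmetric positive definite systems; the shooting-Newton Jacobian is nonsymmetric in general, which is why GMRES, BiCG, CGS, BiCGSTAB, and QMR are also implemented.

```python
# Minimal Conjugate Gradient (CG) iteration, the simplest Krylov-subspace
# method.  Each iteration needs only one matrix-vector product, which is
# the property that makes Krylov methods attractive for the large (and,
# in the implicit variant, never explicitly formed) shooting-Newton system.

def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def cg(A, b, tol=1e-10, maxit=100):
    n = len(b)
    x = [0.0] * n                      # standard zero initial guess
    r = b[:]                           # residual r = b - A x
    p = r[:]                           # first search direction
    rs = sum(ri * ri for ri in r)
    for _ in range(maxit):
        Ap = matvec(A, p)
        alpha = rs / sum(pi * api for pi, api in zip(p, Ap))
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = sum(ri * ri for ri in r)
        if rs_new ** 0.5 < tol:        # stopping criterion on ||r||
            break
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]           # arbitrary SPD test matrix
b = [1.0, 2.0]
x = cg(A, b)
print(x)                               # approximately [1/11, 7/11]
```

Note the two issues flagged for Chapter 4 already visible here: the stopping criterion on the residual norm and the fixed maximum iteration count.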
The chapter concludes with a discussion of important issues for the Krylov methods, such as stopping criteria, robustness, breakdowns, non-convergence, and memory and operation counts. Finally, Chapter 5 presents the results of the implementation of the shooting-Newton method on typical examples of analog integrated circuits. These circuits were chosen as examples because their steady-state response has been difficult to determine using traditional methods. The accuracy and efficiency are demonstrated and compared to those of standard transient analysis in SPICE. The different Krylov-subspace iterative methods are implemented and their results compared and analyzed. Chapter 6 summarizes and concludes the dissertation. It also points out future work that would supplement and extend the present work, including distributed components and taking advantage of the special structure of the Jacobian.

Chapter 2 Steady-State Determination for RF and Microwave Circuits

This chapter presents three primary methods for determining the steady-state response: transient analysis as found in SPICE, the shooting-Newton method, and the harmonic balance method. Their derivation, formulation and typical implementations are also described. The transient analysis and shooting-Newton sections start with a mathematical description of their derivations and give the same derivations consistent with the circuit simulation literature. The advantages and disadvantages of the different methods are considered. Some rarely used methods are also introduced.

2.1 Introduction

The determination of the steady-state response of nonlinear RF and microwave circuits is of utmost importance. When circuits are in steady-state, important quantities
including power, frequency response, noise, distortion, and transfer characteristics such as impedance and gain can be evaluated. Further, RF and microwave circuits have become larger and more complex in response to the digital revolution, making the efficient determination of steady-state increasingly important [Maliniak95]. Traditionally, two methods have predominantly been used to determine the steady-state response of circuits to periodic input: the harmonic balance method and the transient analysis in SPICE [Gupta81, Steer91a, Maas99]. Harmonic balance is the more widely used method for these types of circuits. It assumes that the steady-state response contains a fundamental component of known frequency, plus several harmonics that are believed to be dominant. The method evaluates the linear part of the circuit in the frequency domain and the nonlinear part in the time domain. Harmonic balance can have difficulty with the strongly nonlinear circuits that are often needed for RF and microwave applications. It cannot represent the transient behavior of circuits, and, therefore, chaotic or unwanted oscillations are not observed when circuits are simulated only using harmonic balance. The second most popular method is the transient analysis in SPICE. The transient analysis is allowed to run until all the transient responses have died out and only the steady-state response remains. Unfortunately, transient analysis presents two problems. First, depending on the type of circuit simulated, it can take a very long time to reach steady-state. Second, it can be very difficult to determine the moment when steady-state is reached. Newer methods include the shooting-Newton algorithm for direct steady-state determination.
This method was first introduced by Aprille and Trick [Aprille72a, Aprille72b] and later expanded upon by Telichevesky [Telichevesky95, Telichevesky96a, Telichevesky96b, Telichevesky96c]. In the shooting-Newton method, the steady-state is determined using information from a small portion of the transient analysis data. The data is used to construct a two-point boundary value problem. The original method of Aprille and Trick used standard Gaussian elimination to solve the system of equations generated, but the method was not efficient for large circuits. Following their work, Telichevesky introduced the Krylov-subspace method GMRES as a reliable and efficient method for steady-state determination. Before the detailed explanation of the methods of steady-state determination, the term steady-state is defined. In general terms, the steady-state solution of the differential equations formed to represent the integrated circuit is that solution which is asymptotically approached as the effect of the initial conditions decays.

2.2 Transient analysis in SPICE

The transient analysis mode in SPICE calculates the time-domain response of the circuit over a user-specified interval [Nagel71, Pederson84]. SPICE does this by first forming a set of nonlinear equations that represent the components of the circuit along with its connections, power supplies, and inputs, and then solving these equations. The interconnection equations use Kirchhoff's Current Law (KCL) and Kirchhoff's Voltage Law (KVL). KCL states that the sum of all currents flowing out of a node at any instant is zero, and KVL states that the algebraic sum of all branch voltages around a loop at any instant is zero. Components such as resistors, capacitors, diodes, and inductors are represented by mathematical models that describe their electrical behavior.
The equations constructed are a set of nonlinear ordinary differential equations (ODEs) that describe the circuit. Then an input stimulus is applied along with specified initial conditions, and the equations can be solved for the time-domain response. Because of the nonlinear nature of the ODEs, a discretization must be applied to arrive at a numerical solution. Steady-state is reached when all the transient responses of the circuit have died off and only the steady-state response remains.

2.2.1 Transient analysis formulation

The equation formulation and solution for a nonlinear circuit is detailed below. For the sake of simplicity, only nodal analysis with resistors, capacitors and current sources is done [Pederson84, Nagel71]. The methods presented are easily expanded to modified nodal analysis to include other devices, such as inductors and voltage sources. The steady-state solution can be found by having the transient analysis run long enough so that all the transient responses have died out. For some circuits the time required is exceptionally long. The transient analysis used by SPICE is briefly described with an emphasis on the aspects that are relevant to steady-state determination [Banzhaf89, Quarles89c, Vladimirescu94]. In SPICE, the equations are formed (in this discussion, only capacitors, resistors and current sources are considered) using KCL. The system of differential algebraic equations generated is as follows:

f(x(t), t) = \frac{d}{dt} q(x(t)) + i(x(t)) + u(t) = 0    (2.1)

where x(t) is the vector of node voltages, i(x(t)) is the vector of resistive node currents, q(x(t)) is the vector of node charges, and u(t) is the vector of input currents. Each equation in this system applies KCL to one node in the circuit and, using the mathematical models of the elements, expresses the currents in terms of the node voltages.
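For a concrete picture of the system (2.1), the sketch below writes down q, i, and u for a hypothetical single node driven by a DC current source into a parallel RC; the element values and function names are illustrative assumptions, not taken from the text.

```python
# Hypothetical single-node circuit for illustrating (2.1): a current source
# I_S drives a parallel RC, so q(x) = C*x, i(x) = x/R, and u(t) = -I_S
# (the source current entering the node carries a minus sign).
C, R, I_S = 1e-6, 1e3, 1e-3   # illustrative element values

def q(x):      # node charge
    return C * x

def i(x):      # resistive node current
    return x / R

def u(t):      # input current (a DC source here)
    return -I_S

def f(x, dx_dt, t):
    """Residual of (2.1): d/dt q(x) + i(x) + u(t), with d/dt q(x) = C*dx_dt."""
    return C * dx_dt + i(x) + u(t)

# In DC steady state dx/dt = 0 and x = I_S*R, so the KCL residual vanishes.
print(f(I_S * R, 0.0, 0.0))
```

Each entry of the vector equation (2.1) is exactly such a residual, one per circuit node.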
Equation (2.1) states that at each node, the current from the resistors, i(x(t)), the current from the capacitors, \frac{d}{dt} q(x(t)), and the current from the source, u(t), connected to that node must sum to zero. The solution in the user-specified interval t \in [0, T] is obtained by discretizing this interval and computing a solution at each time point. At these chosen time points, a method such as backward-Euler or the trapezoidal method is used to replace the derivative. Because of the stability of the trapezoidal method and its use in the shooting-Newton method implemented in this dissertation, the trapezoidal algorithm is used to illustrate the transient analysis solution. For an ordinary differential equation (ODE) of the form:

\frac{dy}{dt} = g(y, t)    (2.2)

the trapezoid rule estimates the solution, y, at time t_k + h by [Heath97]:

y_{k+1} = y_k + \frac{h}{2}\left[ g(y_k, t_k) + g(y_{k+1}, t_{k+1}) \right]

where t_{k+1} = t_k + h and y_k, y_{k+1} denote y(t_k) and y(t_k + h), respectively. From (2.2), however, g(y_k, t_k) = y'(t_k). Rearranging and using this fact, we see:

\frac{2}{h}(y_{k+1} - y_k) - y'(t_k) - g(y_{k+1}, t_{k+1}) = 0    (2.3)

Now (2.1) can be rewritten as:

\frac{d}{dt} q(x(t)) = -i(x(t)) - u(t)    (2.4)

Applying the trapezoidal rule (2.3) to (2.4), with -i(x(t)) - u(t) playing the role of g and q playing the role of y, gives:

\frac{2}{h}\left[ q(x(t+h)) - q(x(t)) \right] - q'(x(t)) + i(x(t+h)) + u(t+h) = 0    (2.5)

(The subscript k has been dropped for simplicity.) The quantities q(x(t)) and q'(x(t)) in (2.5) are known. Thus the goal at time t + h is to solve (2.5) for the vector x(t+h). Using (2.4) to eliminate q'(x(t)), this gives:

F(x(t+h)) = q(x(t+h)) - q(x(t)) + \left[ i(x(t)) + u(t) + i(x(t+h)) + u(t+h) \right]\frac{h}{2} = 0

We have an equation of the form:

F(z) = 0    (2.6)

where F : R^n \to R^n.
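The nonlinear system (2.6) has to be solved once per time point. The sketch below carries out one trapezoidal step for a hypothetical single-node circuit with a linear capacitor and a diode-like resistive current, solving F(z) = 0 with a simple Newton iteration; all element values and names are illustrative assumptions, not taken from the text.

```python
import math

# One trapezoidal time step, (2.5)-(2.6), for a hypothetical single-node
# circuit: linear capacitor q(x) = C*x, diode-like current
# i(x) = IS*(exp(x/VT) - 1), driven by a constant input u(t) = -IDRV.
C, IS, VT, IDRV = 1e-9, 1e-14, 0.0259, 1e-3   # illustrative values

def q(x):  return C * x                           # node charge
def i(x):  return IS * (math.exp(x / VT) - 1.0)   # resistive node current
def di(x): return (IS / VT) * math.exp(x / VT)    # d i / d x
def u(t):  return -IDRV                           # input current

def F(z, x_t, t, h):
    """Residual of the trapezoidal step: F(z) = 0 determines x(t+h) = z."""
    return q(z) - q(x_t) + 0.5 * h * (i(x_t) + u(t) + i(z) + u(t + h))

def newton_trap_step(x_t, t, h, tol=1e-12, max_iter=50):
    z = x_t                                       # previous point as the guess
    for _ in range(max_iter):
        J = C + 0.5 * h * di(z)                   # dF/dz for this circuit
        dz = -F(z, x_t, t, h) / J                 # Newton update
        z += dz
        if abs(dz) < tol:
            break
    return z

x_next = newton_trap_step(x_t=0.5, t=0.0, h=1e-9)
# The node charges at roughly IDRV/C volts per second, so one 1 ns step
# raises the voltage by about a millivolt.
```

The previous time point is a good initial guess, which is why the Newton iteration typically converges in a handful of steps during transient analysis.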
Newton's method [Heath97] updates the solution estimate, z^k, to (2.6) according to:

J(z^k) \Delta z^k = -F(z^k), \quad z^{k+1} = z^k + \Delta z^k    (2.7)

where z is being used to denote the vector x(t+h), and J denotes the Jacobian matrix of the vector-valued function F. Now,

F(z) = q(z) - q(x(t)) + \left[ i(x(t)) + u(t) + i(z) + u(t+h) \right]\frac{h}{2}    (2.8)

so:

J_{jl}(z) = \frac{\partial F_j}{\partial z_l} = \frac{\partial q_j(z)}{\partial z_l} + \frac{h}{2}\frac{\partial i_j(z)}{\partial z_l}    (2.9)

The method in (2.7) is continued until

\| z^{k+1} - z^k \| < \varepsilon

where \varepsilon > 0 is a user-specified tolerance and F(z^{k+1}) is close to zero. When this step is complete, a single time step of transient analysis has been calculated. Newton's method can be shown to converge quadratically if the initial guess is close to the solution. This process of iteration must be repeated at each of the time steps. For simplicity of presentation, from now on we refer to z^k as the kth iterate. In order for the transient analysis in SPICE to calculate the steady-state response, the simulation must be continued for as long as it takes to obtain a response with no transients. This may be a very long time for circuits with high Q or slowly decaying transients. For many circuits used in communication devices, the simulation time required by SPICE's transient analysis to reach steady-state is too long to be of practical use.

2.3 Shooting-Newton method

The shooting-Newton method is an iterative method used to determine the solution of a two-point boundary value problem [Press92]. A two-point boundary value problem exists when ordinary differential equations are required to satisfy boundary conditions not just at the initial starting condition but also at some other boundary. In our steady-state analysis problem, the two boundaries are the beginning and the end of the period. Transient analysis is an example of an initial value problem.
Given an initial starting point, the solution of the circuit equations marches along from one time point to the next. By determining the initial starting values of the steady-state solution, we end up at the same point at the end of the period: the circuit is in steady-state and its output is periodic. The shooting-Newton method is one of many methods that can be used to solve two-point boundary value problems. In a shooting method the solution is started from a guessed initial condition and integrated for one period. Ideally the result matches the conditions specified by the original guess. Usually this is not the case, and the Newton method is used to make a better guess. This process of calculating an initial guess, simulating, and checking the final solution is continued until the circuit is in steady-state.

2.3.1 Derivation of the shooting-Newton method

Aprille and Trick [Aprille72a, Aprille72b] first proposed the use of the shooting-Newton method to solve the system of equations for the steady-state response. They posed the problem of finding the steady-state response as a two-point boundary value problem solved by the shooting-Newton method. If the circuit equations are represented as the system of equations

f(x(t), t) = \frac{d}{dt} q(x(t)) + i(x(t)) + u(t) = 0    (2.10)

then a constraint for achieving steady-state is that the transient effects have died off. This is represented by:

x(0) = x(T)    (2.11)

In other words, the solution at the end of the period is the same as the condition at the beginning of the period, which means that the circuit is in steady-state. The state transition function can be used to define the two-point boundary value problem according to:

x(t_0) - \phi(x(t_0), t_0, T) = 0    (2.12)

where \phi is the state transition function.
The state transition function is implicitly defined and is determined by integrating the circuit equations (2.10) over T seconds from the initial state x(t_0). It depends on the initial state, x(t_0), the period of the response, T, and the starting time, t_0. Because equation (2.12) is nonlinear, the shooting-Newton method is used to solve the boundary value problem. This results in the following iteration:

x_0^{k+1} = x_0^k - \left[ I - J_\phi \right]^{-1}\left[ x_0^k - \phi(x_0^k, t_0, T) \right]    (2.13)

where J_\phi is the Jacobian, also called the sensitivity matrix, and is represented by:

J_\phi = \frac{\partial \phi}{\partial x_0}(x(t_0), t_0, T)    (2.14)

The Jacobian is called the sensitivity matrix because it shows the sensitivity of the final state to the initial state over one period T.

2.3.2 Forming the Jacobian by numerical differentiation

It is necessary to determine the Jacobian matrix before equation (2.13) can be evaluated. The Jacobian matrix in (2.14) is given by:

J_\phi(x_0) =
\begin{bmatrix}
\frac{\partial x_1(T, x_0)}{\partial x_{01}} & \frac{\partial x_1(T, x_0)}{\partial x_{02}} & \cdots & \frac{\partial x_1(T, x_0)}{\partial x_{0n}} \\
\frac{\partial x_2(T, x_0)}{\partial x_{01}} & \frac{\partial x_2(T, x_0)}{\partial x_{02}} & \cdots & \frac{\partial x_2(T, x_0)}{\partial x_{0n}} \\
\vdots & \vdots & & \vdots \\
\frac{\partial x_n(T, x_0)}{\partial x_{01}} & \frac{\partial x_n(T, x_0)}{\partial x_{02}} & \cdots & \frac{\partial x_n(T, x_0)}{\partial x_{0n}}
\end{bmatrix}    (2.15)

To calculate the elements of this matrix with the initial condition x(t_0) = x_0, first the circuit equations in (2.10) are solved from t = 0 to t = T with initial state x_0. Next, they are solved with the kth component of the initial state perturbed by \Delta x_{0k}, a small change in the initial conditions of the circuit. This change must be as small as possible in order to increase accuracy. The ik element of the Jacobian is approximated by the following equation:

\frac{\partial x_i(T, x_0)}{\partial x_{0k}} \approx \frac{x_i(T, x_0 + \Delta x_{0k} e_k) - x_i(T, x_0)}{\Delta x_{0k}}    (2.16)

where e_k is the kth unit vector, and x_i(T, x_0) and x_i(T, x_0 + \Delta x_{0k} e_k) are solutions evaluated at the end of a period T.
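The forward-difference construction (2.15)-(2.16), combined with the update (2.13), can be sketched for a hypothetical scalar circuit: an RC low-pass driven by one sinusoid and integrated with the trapezoidal rule. The circuit, element values, and function names are illustrative assumptions; because the example is linear, a single Newton step lands on the periodic steady state up to discretization and round-off.

```python
import math

# Shooting-Newton update (2.13) with the Jacobian formed by the
# forward-difference approximation (2.16), for a hypothetical scalar
# RC low-pass x' = (u(t) - x)/RC driven by a sinusoid of period T.
RC, AMP, T, STEPS = 1e-3, 1.0, 1e-3, 200   # illustrative values

def transition(x0):
    """State transition function phi(x0, 0, T): integrate one period."""
    h = T / STEPS
    x, t = x0, 0.0
    for _ in range(STEPS):
        u0 = AMP * math.sin(2 * math.pi * t / T)
        u1 = AMP * math.sin(2 * math.pi * (t + h) / T)
        # trapezoidal step for x' = (u - x)/RC, solved in closed form
        x = (x * (1 - h / (2 * RC)) + (h / (2 * RC)) * (u0 + u1)) \
            / (1 + h / (2 * RC))
        t += h
    return x

def shooting_newton_step(x0, dx=1e-8):
    phi = transition(x0)
    J = (transition(x0 + dx) - phi) / dx       # forward difference (2.16)
    return x0 - (x0 - phi) / (1.0 - J)         # Newton update (2.13)

x0 = 0.0                          # arbitrary initial guess
x0 = shooting_newton_step(x0)
print(abs(x0 - transition(x0)))   # periodicity residual after one step
```

For n states the same idea perturbs each component of x_0 in turn, which is what makes the finite-difference Jacobian expensive for large circuits.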
This is a forward difference approximation to the first derivative [Heath97]. To form one column of the Jacobian, the circuit equations are solved twice, once with the initial condition x_0 and once with its kth component perturbed by \Delta x_{0k}, from t = 0 to t = T. There are n capacitors and inductors, and all n states of the system must be perturbed separately in order to compute the n columns of the Jacobian matrix. This requires n + 1 solutions of the circuit equations (the nominal solution plus one per perturbed state). Equation (2.13) can then be solved. A more efficient construction of the Jacobian can be achieved by using sensitivity networks [Brambilla01, D'Amore91, D'Amore93, D'Amore95].

2.3.3 Forming the Jacobian by transient analysis

The Jacobian, or sensitivity matrix, can be calculated more efficiently than by the method explained in the previous section. In fact, it can be calculated step by step along with the transient analysis, because the quantities used to form the Jacobian are already needed by the transient analysis. After one period of simulation, the state transition function and the Jacobian are formed. One can then solve the iterate in equation (2.13) and determine whether another iteration should be done. The dual formation of the state transition function and the Jacobian is best explained by deriving the equations during the actual simulation [Heath97]. At time step m of the transient analysis, the trapezoidal discretization (2.5) takes the form:

\frac{2}{h}\left[ q(x_m) - q(x_{m-1}) \right] - \frac{dq(x_{m-1})}{dt} + i(x_m) + u_m = 0    (2.17)

Again, the derivative is known from the previous time step through (2.4):

\frac{dq(x_{m-1})}{dt} = -i(x_{m-1}) - u_{m-1}    (2.18)

Substituting (2.18) into (2.17) gives:

\frac{2}{h}\left[ q(x_m) - q(x_{m-1}) \right] + i(x_{m-1}) + u_{m-1} + i(x_m) + u_m = 0
(2.19)

The derivative of the jth equation with respect to the kth component, x_{0k}, of x_0 is, by the chain rule (note that the gradients \nabla q_j and \nabla i_j are treated as row vectors and 2/h is a scalar):

\frac{2}{h}\left[ \nabla q_j(x_m)\frac{\partial x_m}{\partial x_{0k}} - \nabla q_j(x_{m-1})\frac{\partial x_{m-1}}{\partial x_{0k}} \right] + \nabla i_j(x_{m-1})\frac{\partial x_{m-1}}{\partial x_{0k}} + \nabla i_j(x_m)\frac{\partial x_m}{\partial x_{0k}} = 0    (2.20)

We can rearrange equation (2.20):

\left[ \frac{2}{h}\nabla q_j(x_m) + \nabla i_j(x_m) \right]\frac{\partial x_m}{\partial x_{0k}} = \left[ \frac{2}{h}\nabla q_j(x_{m-1}) - \nabla i_j(x_{m-1}) \right]\frac{\partial x_{m-1}}{\partial x_{0k}}    (2.21)

Looking at equation (2.21) reveals that the first term in square brackets on the left-hand side is the jth row of the Jacobian for the transient analysis solution at step m, while the bracketed term on the right involves the same quantities at the previous time step. Combining the rows for j = 1, ..., n gives:

J(x_m)\frac{\partial x_m}{\partial x_{0k}} = \left[ \frac{2}{h}\frac{\partial q}{\partial x}(x_{m-1}) - \frac{\partial i}{\partial x}(x_{m-1}) \right]\frac{\partial x_{m-1}}{\partial x_{0k}}    (2.22)

so the vector \frac{\partial x_m}{\partial x_{0k}} in (2.22) is solved for by first computing the matrix-vector product on the right and then solving the n-vector equation using the factorization of J(x_m). Note that \frac{\partial x_m}{\partial x_{0k}} is the kth column of \frac{\partial x_m}{\partial x_0} in (2.22). The above method can also be viewed using the sensitivity circuit approach [Colon73, Trick, Aprille72a]. There is a sensitivity circuit for each state variable (capacitor or inductor). The original circuit plus n (number of state variables) sensitivity circuits are simultaneously integrated to determine the derivatives with respect to initial conditions. For example, at time t, the Newton-Raphson iterations for the original circuit are completed. The sensitivity circuits have the same circuit matrix but different forcing functions. The LU decomposition of the matrix is already computed in order to obtain the solution of the original circuit. To obtain the solution of the sensitivity circuits, one applies this factored matrix to the different forcing vectors in the sensitivity circuits that result from the n different initial states. For n state variables, n right-hand-side vectors are created, and the equations are solved.
From their solutions, the columns of the sensitivity matrix are created. All past stored variables of the original and incremental circuits are updated, and the algorithm proceeds to the next time point. The process is continued until time t = T, the end of the period, is reached. Now the sensitivity matrix and the solution of the original circuit can be used to determine the next guess for the initial conditions. The initial condition that the shooting-Newton method computes is a set of capacitor voltages and inductor currents for the circuit such that, if these voltages and currents are used as the initial conditions for the transient analysis, the circuit is directly in steady-state. A comparison is made to see if the steady-state response has been reached, and the process is stopped if it has. If not, the circuit can be simulated in transient analysis for a few periods, and the shooting-Newton algorithm is applied again to search for another set of initial conditions that will put the circuit in steady-state.

2.3.4 Steps in the shooting-Newton method

The following five-step shooting-Newton method can be used to determine the steady-state response of nonlinear circuits with periodic inputs [Aprille73].

1. Compute the transient response x(t) of the circuit from t = 0 to t = T with initial state x(0) = x_0. At each time step, we solve a nonlinear system of equations by the Newton method. We use the calculated states as the initial conditions for the next time step until we reach the end of the period, where x(T), the value of the state transition function, is obtained.

2. Compute the sensitivity matrix. This is done at the same time as the transient response is calculated. We now have all the parts needed to solve (2.13).

3. Compute the Newton-Raphson iteration (2.13) using the values for the Jacobian and original circuit determined by the two previous steps.
Here we are computing the initial conditions that, when applied to the circuit, put the circuit into steady-state.

4. Return to step 1 unless \| x(T, x_0(j)) - x_0(j) \| < \varepsilon and \| x_0(j+1) - x_0(j) \| < \delta, where \varepsilon and \delta are (user-specified) arbitrary small positive numbers and j refers to the Newton iteration for computation of the iterate.

5. Stop.

The shooting-Newton algorithm for steady-state determination is accurate and efficient. Because the Jacobian used for the calculation is dense, however, when standard Gaussian elimination is used to solve the iteration the computational costs can be prohibitive. The method works well for strongly nonlinear circuits and for circuits with transients that die off slowly; SPICE transient analysis typically has difficulty with the second type.

2.4 Harmonic balance method

The harmonic balance method is an iterative method. It is based on the assumption that the steady-state solution for a circuit with a sinusoidal input can be approximated using a linear combination of sinusoids to build a solution [Gilmore91a, Gilmore91b]. If the steady-state response contains only a few sinusoids, then harmonic balance can find them very efficiently. The basic nonlinear circuit is divided into two parts: the linear part, analyzed in the frequency domain using phasor analysis, and the nonlinear part, which must be analyzed in the time domain. The time-domain voltages and currents are then transformed into a frequency representation using the forward Fourier transform. According to KCL, the currents should sum to zero at all nodes. Usually, the currents do not sum to zero, and an iteration toward the correct solution must be done. An error function is used to calculate adjustments to the voltage amplitudes and phases. When the error is small enough, the steady-state solution has been obtained.
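The loop just described, move the voltages to the time domain, evaluate the nonlinear currents, move them back, and drive the KCL error to zero, can be sketched for a hypothetical one-node circuit with a linear admittance Y(w) = G + jwC and a cubic resistive nonlinearity. The element values, the simple relaxation update, and the small hand-written DFT are illustrative assumptions, not the algorithm of any particular simulator.

```python
import cmath, math

# Harmonic-balance style iteration for a hypothetical one-node circuit:
# linear part Y(w) = G + j*w*C in the frequency domain, a cubic resistor
# i_nl(v) = g3*v**3 in the time domain, a single-tone current source.
G, C, g3 = 1.0, 1e-2, 0.1
W0, N = 2 * math.pi, 16            # fundamental frequency and time samples

def dft(x):
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * k * m / n)
                for m in range(n)) / n for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * math.pi * k * m / n)
                for k in range(n)).real for m in range(n)]

def w(k):                          # signed frequency of DFT bin k
    return W0 * (k if k <= N // 2 else k - N)

I_src = [0j] * N                   # unit-amplitude cosine at the fundamental
I_src[1] = I_src[N - 1] = 0.5

V = [0j] * N                       # initial guess: zero voltage spectrum
for _ in range(100):
    v_t = idft(V)                                # to the time domain
    I_nl = dft([g3 * v**3 for v in v_t])         # nonlinear currents back
    # KCL error in the frequency domain: Y(w)*V + I_nl - I_src
    err = [(G + 1j * w(k) * C) * V[k] + I_nl[k] - I_src[k] for k in range(N)]
    if max(abs(e) for e in err) < 1e-10:
        break
    # simple relaxation update: let the linear part absorb the error
    V = [V[k] - err[k] / (G + 1j * w(k) * C) for k in range(N)]
```

A production harmonic balance code would use the FFT and a Newton (or Krylov-accelerated) update in place of this relaxation step, but the structure of the loop is the same.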
A major advantage of harmonic balance is its ability to incorporate frequency-domain descriptions of distributed components, such as transmission lines. The flow chart shown in Figure 2.1 demonstrates the procedure used in a commercial harmonic balance simulator [HP-Eesof94].

2.4.1 Phasor analysis

Harmonic balance is a generalization of phasor analysis [Feldmann96, Filicori88, Gilmore91a, Gilmore91b, Maas99, Rodrigues98, Steer91a, Steer91b]. Phasor analysis is used in SPICE for steady-state sinusoidal analysis of linear circuits, or of circuits linearized for small-signal analysis. The waveforms are represented in the following form:

x(t) = A\cos(\omega t + \phi) = \mathrm{Re}\left[ X e^{j\omega t} \right]    (2.23)

where the complex coefficient X is called a phasor. This form is then substituted into the ODEs that describe the circuit behavior, resulting in a system of linear algebraic equations for the unknown complex phasors. In the nonlinear circuits often found in RF and microwave applications, several frequencies can exist even if all time-varying sources are sinusoids of the same frequency. Contrary to what occurs in linear circuits, different frequencies interact, producing new output frequencies in nonlinear circuits. Steady-state waveforms in a nonlinear circuit can be expressed as a truncated (generalized) Fourier series:

x(t) = X(0) + \sum_{k=1}^{K} \left( X^c(k)\cos\omega_k t + X^s(k)\sin\omega_k t \right) = \sum_{k=-K}^{K} X(k) e^{j\omega_k t}    (2.24)

where \omega_{-k} = -\omega_k, X(-k) = X^*(k), and the set of frequencies \{\omega_0 = 0, \omega_k\} is chosen where appreciable power is expected. It is assumed that all the waveforms in the nonlinear circuit have this form. This representation of the response as a sum of basis functions is characteristic of all frequency-domain methods. As in phasor analysis, the next step is to
substitute into the ODEs and obtain a system of nonlinear algebraic equations for the Fourier coefficients of the waveforms. The problem of solving a system of ODEs is transformed into solving a system of algebraic equations.

Figure 2.1 Flow chart for harmonic balance simulation from the Eesof manual.

The general harmonic balance method solves the system of equations of the form:

f(x(t), \dot{x}(t), t) = 0    (2.25)

where f is a vector of n nonlinear functions of 2n + 1 variables representing the ODEs, x(t) is a vector of n variables (state variables) to be determined, and \dot{x}(t) = \frac{dx}{dt}. The dependence on t, time, accounts for the periodic forcing terms. The steps involved in a general harmonic balance solution are:

1. Determine the frequency set (required to describe the solution).
2. Truncate the frequency set (an infinite series of frequencies cannot be used, so choose a finite number).
3. Expand the forcing terms as truncated Fourier series: u(t) \to U.
4. Express x(t) as a truncated Fourier series (as in step 3 for u(t)).
5. Substitute into the system equations.
6. Solve.

2.4.2 Simple example of harmonic balance method

The application of the harmonic balance method to integrated circuits is presented through a simple example. In Figure 2.2, v_s(t) is an independent voltage source. R1, L1, and C1 are linear elements; the voltage-controlled nonlinear capacitor C2 is described by its charge-
voltage relationship q = q(v_n), and the voltage-controlled nonlinear resistor is described by i_f = f(v_n).

Figure 2.2 Example circuit [Rodrigues98].

First the nonlinear circuit is partitioned into a linear and a nonlinear subnetwork. The currents at the interface between the two subnetworks satisfy:

i(t) = -i_L(t) = i_{NL}(t), \quad i_{NL}(t) + i_L(t) = 0    (2.26)

Independent sources may be included in the linear subnetwork. Breaking up the circuit in this manner allows the method to take advantage of linear techniques to analyze the linear subnetwork, while keeping the nonlinear subnetwork as small as possible. The problem of solving the original circuit is now equivalent to solving the two subnetworks in Figure 2.3 with the constraints on the currents shown in equation (2.26). Expressing the currents as functions of the voltage allows us to solve for the unknown voltages. We can then analyze each of the subnetworks individually. Since one subnetwork is linear, phasor analysis in the frequency domain can be used. The other subnetwork is nonlinear and must be solved in the time domain. The Fourier transform provides the bridge between the two analyses.

Figure 2.3 Circuit divided into linear and nonlinear subnetworks.

We can now proceed through the steps outlined above for the harmonic balance analysis.

1. Determine the frequency set. v_s(t) is a combination of a finite number of sinusoids.

2. Truncate the set of frequencies. The steady-state response is expressed over the truncated set of frequencies \Lambda_K = \{\omega_0 = 0, \ldots, \omega_K\}.

3.
Expand v_s(t) as a Fourier series:

v_s(t) = V_s(0) + \sum_{k=1}^{K} \left( V_s^c(k)\cos\omega_k t + V_s^s(k)\sin\omega_k t \right)

Usually only a few Fourier coefficients of the independent sources are different from zero.

4. Expand the unknown voltage as a generalized Fourier series:

v_n(t) = V_n(0) + \sum_{k=1}^{K} \left( V_n^c(k)\cos\omega_k t + V_n^s(k)\sin\omega_k t \right)

Here we must develop the equations for the linear and for the nonlinear subnetwork.

5. Substitute into the equation i_{NL}(t) + i_L(t) = 0 and solve.

In step 1 the frequency set is determined from the input signals and combinations of the signals. The truncation of the frequencies in step 2 can be done incorrectly, and an important output signal could be lost. The set of equations resulting from this expansion method is solved using Newton's method. This sparse linear system can also be solved by direct or stationary methods, or by non-stationary techniques such as the preconditioned Krylov-subspace methods (CGS, BiCGSTAB, TFQMR) [Brachtendorf95a, Brachtendorf95b]. The convergence of stationary iterative methods for the linear systems associated with circuit simulation is inefficient. The non-stationary techniques, however, usually converge in a finite number of iterations involving fewer computational resources. This convergence can be reached more easily if preconditioning is applied. Instead of solving the original system Av = b, a preconditioned system is solved. A preconditioner is an auxiliary matrix in an iterative method that approximates the coefficient matrix or its inverse [Templates94]. The preconditioner, or preconditioning matrix, is applied in every step of the iterative method and aids in convergence.
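The effect of a preconditioner can be seen even with the simplest iteration. The sketch below solves a hypothetical, badly scaled 3x3 system with damped Richardson iteration, once on A v = b directly and once on the Jacobi-preconditioned system M^{-1}A v = M^{-1}b with M = diag(A); the matrix and step sizes are illustrative assumptions. Preconditioning compresses the eigenvalue spread, which permits a much larger step and far faster convergence; the same spread governs the Krylov methods discussed here.

```python
# Jacobi (diagonal) preconditioning illustrated with Richardson iteration.
# A is a hypothetical symmetric matrix whose rows have wildly different
# scales; b is chosen so that the exact solution is v = [1, 1, 1].
A = [[1e4, 1.0, 0.0],
     [1.0, 10.0, 0.001],
     [0.0, 0.001, 0.01]]
b = [10001.0, 11.001, 0.011]

def richardson(A, b, precondition, tau, iters=200):
    n = len(b)
    v = [0.0] * n
    for _ in range(iters):
        r = [b[j] - sum(A[j][k] * v[k] for k in range(n)) for j in range(n)]
        if precondition:                   # apply M^{-1} = diag(A)^{-1}
            r = [r[j] / A[j][j] for j in range(n)]
        v = [v[j] + tau * r[j] for j in range(n)]
    return v

# Without preconditioning the step tau must stay below 2/lambda_max of A
# (about 2e-4 here), so the small-eigenvalue components barely move.
plain = richardson(A, b, precondition=False, tau=1e-4)
# With the Jacobi preconditioner the eigenvalues cluster near 1 and a
# full step tau = 1 is stable, giving rapid convergence.
jacobi = richardson(A, b, precondition=True, tau=1.0)

err = lambda v: max(abs(v[j] - 1.0) for j in range(3))
print(err(plain), err(jacobi))
```

The ratio of the largest to the smallest eigenvalue (the condition number) controls the usable step size and the convergence rate, which is exactly what a good preconditioner improves.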
The Jacobian matrix, also called the node admittance matrix, is preordered using the Markowitz preordering technique [Heath97]. A comparison on a strongly nonlinear standard operational amplifier demonstrated a five to six times faster simulation time for steady-state determination than when using the direct Gaussian elimination method [Feldman96, Filicori88, Gad00, Gourary97, Gourary00].

2.4.3 Extensions of the harmonic balance method

Another way to perform steady-state analysis of nonlinear circuits is a technique that is a modification of standard harmonic balance [Gugleilmi96]. The original algebraic-differential equations that describe the nonlinear circuit are changed to algebraic equations in the frequency domain. Instead of using the standard Newton iteration to solve these equations exactly, an inexact method was used. The node-admittance matrix was preconditioned using positional dropping strategies and numerical-dropping-based methodologies. The inexact methods such as GMRES, with and without preconditioning, produced steady-state determinations at least ten times faster. This new approach to harmonic balance simulation, which is based on inexact Newton methods and iterative system-solving techniques, avoided the storage and factorization of the Jacobian matrix [Rizzoli96]. This allows very large (more than 1000 nodes) circuits to be analyzed very efficiently. An extension of the harmonic balance method was implemented using the standard setup [Feldmann96]. Krylov-subspace methods were used to solve the resulting nonlinear set of equations. Also detailed is the use of a problem-specific preconditioner for inverting the harmonic balance Jacobian matrix.
A comparison of two Krylov-subspace methods, GMRES and QMR, as applied to the harmonic balance algorithm [Gourary97] showed higher simulation speed than standard direct methods such as Gaussian elimination. The GMRES method was clearly more efficient than the QMR method. An index method was used to generate a mixed frequency index [deCarvalho97] so that the harmonic balance method could evaluate multitone signals. However, the method still needed large amounts of memory, was very compute-intensive, and could not be used with irregular input spectra. Another harmonic balance modification [Shelkovnikov92] used a multidimensional formulation of the problem. In this approach, the circuit behavior was described with a multidimensional spectrum. Spectral components assumed to have negligible values were ignored. The system was then divided into real and imaginary parts and solved with Newton's method. Harmonic balance is an efficient and accurate method for simulating mildly nonlinear circuits. Strongly nonlinear circuits need special adaptations of the method that limit its efficiency and sometimes its accuracy. The disadvantages of the harmonic balance method include difficulty in the analysis of oscillators. In time-domain simulation the frequency of oscillation can be determined, but in harmonic balance the procedure is very complex. In addition, harmonic balance does not represent the transient response and, therefore, cannot show chaotic and other non-steady-state behavior.

2.4.4 Other methods

Another method that is rarely used is the perturbation method. After an initial guess from some linearized equations, the solution is perturbed until the correct answer is obtained. The Volterra series and the Picard iteration are examples of this method.
The drawbacks of this approach are that these methods often fail to converge and can also be inaccurate. Using parallel processing for circuit simulation can allow transient simulation to reach steady state in less time [Duff90, Buch94, Larcheveque96].

2.5 Conclusion

The three main methods of steady-state determination were presented: standard transient analysis in SPICE (also called the brute-force method), the harmonic balance method, and the shooting method [Chua81]. In transient analysis, the differential equations representing the circuit are integrated until all the transients have died out. Usually one starts with some arbitrary initial conditions and keeps solving until only the steady-state solution remains. However, there are major drawbacks to this method. In lightly damped circuits the time constants are large, so the transient component may last for an extremely long time and may involve many calculations for signals with high frequency. The widely used analog circuit simulator SPICE uses the transient analysis method to determine steady state. SPICE, however, encounters numerous problems with an input signal that contains both high-frequency and low-frequency components. The high-frequency signal determines the time step, which makes for a very short integration time step during transient analysis. The low-frequency signal determines the interval over which the solution must be carried out. This means that the solution must be computed over a long interval before all the transients have died out. It is sometimes difficult to determine when they have died out because they can change very slowly [Brachtendorf95a, Brachtendorf95b]. The most common method used to determine the steady state of communication circuits is the harmonic balance method.
Harmonic balance simulators are found in Agilent's ADS software and in Microwave Office. Harmonic balance is a frequency-domain expansion method that approaches the problem of finding the steady state through a trigonometric polynomial. With this method, the approximate solution is represented as a finite trigonometric series. The terms with the same frequency are balanced by a method such as Galerkin's. A significant disadvantage of this method is that it uses multidimensional Fourier analysis to determine the frequency components. Nonlinear signals are represented in the frequency domain as combinations of a large number of linear signals, also called harmonics. Nonlinear circuits can have many harmonics and, unfortunately, the simulation cost increases exponentially with the number of harmonics. Therefore, strongly nonlinear circuits are almost impossible to simulate using the standard harmonic balance method. For these reasons, specialized, computationally expensive methods must be used for large nonlinear circuits (in large circuits, nonlinear elements such as transistors can each generate many harmonics). The steady-state solution is slow to compute, extremely memory-intensive, and may be incorrect. The third common method for steady-state determination is the shooting-Newton method. The steady-state solution of the differential equations representing the circuit is found by posing the problem as a boundary value problem whose solution is the steady-state response. The shooting method is an iterative method in which one calculates the initial conditions that put the circuit in steady state. The initial states are calculated by several methods, including Newton's method [Aprille72a, Aprille72b] or extrapolation [Skelboe70]. This method works well for small, strongly nonlinear circuits.
For large nonlinear circuits, however, the shooting-Newton method has large computational and storage costs.

Chapter 3 Numerical Methods Background

In this chapter, an overview of the formulation of the system equations for the determination of the steady-state response is presented, followed by a brief background of direct and stationary methods [Demmel97, Dongarra96, Greenbaum97, Kelley95, Saad96, Trefethen97]. The main part of the chapter presents the derivation, computational requirements, algorithms, and theory behind the Krylov-subspace iterative methods, which are used in this thesis to find an efficient solution of the linear system of equations needed to determine steady state.

3.1 Introduction

In the previous chapter on circuit simulation, a system of equations was generated by the shooting-Newton method to determine the steady-state response of an integrated circuit. The system of equations is used to calculate a set of voltages and currents that, when used as the initial conditions for the circuit, results in the circuit's steady-state response. The determination of the steady-state response using the shooting-Newton method is iterative: the equations must be formed and solved numerous times over the course of the steady-state determination. In the original papers by Aprille and Trick [Aprille72a, Aprille72b], the equations were solved using the direct algorithm of Gaussian elimination. If a standard Gaussian elimination type direct method is used to solve the iterate formed in the shooting-Newton method, the number of equations that the system can contain is limited.
If $n$ is the number of equations describing the system, then these direct methods use $O(n^3)$ operations to solve the system of equations and $O(n^2)$ computer storage. This becomes much too costly for the larger systems of equations generated by large circuits. Using a direct method, with no round-off or other errors, the exact solution is obtained only when all the operations are completed. When round-off errors arise and large systems of equations are being solved, the errors grow, often making the results unacceptable. With iterative methods, an approximate solution is available at each step, and the iteration is stopped when the solution is deemed accurate enough. In a direct method, the entire computation must be completed before there is an answer; one cannot stop the algorithm early with a "less" accurate solution. If the matrix has a special structure, that structure is not always preserved by a direct method. For example, a sparse matrix is filled in during Gaussian elimination. An alternative approach is to use a non-direct (iterative) method. An iterative method starts with an approximate solution and uses a recurrence formula to calculate another approximate solution. The formula is applied repeatedly until a solution is deemed "good enough." Although iterative methods can solve the system of equations in a finite number of steps in exact arithmetic, they can be stopped before they reach the exact answer with a solution that is deemed good enough. Their cost in terms of computer operations and storage is typically much lower than the cost of direct methods. Depending on the iterative method used and the accuracy desired, the cost can even be lowered to $O(n)$. Further, iterative methods are less likely to propagate errors.
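The early-stopping property described above can be sketched with a toy iteration. The matrix, damping factor, and tolerance below are illustrative assumptions; Richardson iteration is used only because it is the simplest possible iterative scheme:

```python
import numpy as np

# Sketch: an iterative method produces an approximate solution at every
# step and can stop as soon as the residual is "good enough"; a direct
# factorization yields no usable answer until it runs to completion.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])        # small SPD example system
b = np.array([1.0, 0.0])

x = np.zeros(2)                   # initial guess
w, tol = 0.5, 1e-8                # damping factor and stopping tolerance

iters = 0
while np.linalg.norm(b - A @ x) > tol:
    x = x + w * (b - A @ x)       # Richardson update: one O(n^2) matvec
    iters += 1

print(iters, x)                   # stopped early, at user-chosen accuracy
```

Tightening or loosening `tol` trades accuracy for iterations, which is exactly the flexibility a direct method lacks.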
3.2 How the system of equations is generated

For a more detailed presentation of the derivation of the steady-state system of equations presented here, refer to Chapter 2. There, the equations generated for the transient response of circuits are explained along with the derivation of the shooting-Newton method of steady-state determination. The system of equations formed for the transient analysis of a circuit that includes resistors, capacitors and current sources (for simplicity of the discussion) is represented as:

$$f(x(t), t) = i(x(t)) + \frac{d}{dt} q(x(t)) + u(t) = 0 \qquad (3.1)$$

where $x(t)$ is the vector of node voltages, $i(x(t))$ is the vector of resistive node currents, $q(x(t))$ is the vector of node charges, and $u(t)$ is the vector of inputs. Steady state is achieved when the transient effects have died off. This is represented by

$$x(0) = x(T) \qquad (3.2)$$

In other words, the solution at the end of the period is the same as the condition at the beginning of the period. This means that the circuit is in steady state. The state transition function can be used to define the two-point boundary value problem:

$$x(0) - \phi(x(t_0), t_0, T) = 0 \qquad (3.3)$$

where $\phi$ is the state transition function. The state transition function is defined implicitly; it is calculated at each time point until the end of the period. It depends on the initial state $x_0$, the period of the response $T$, and the starting time $t_0$. Applying the shooting-Newton method to solve the boundary value problem results in the following iteration:

$$x_0^{k+1} = x_0^k - \left[ I - J_\phi \right]^{-1} \left( x_0^k - \phi(x(t_0), t_0, T) \right) \qquad (3.4)$$

where $J_\phi$ is the Jacobian, also called the sensitivity matrix, and is represented by

$$J_\phi = \frac{\partial \phi(x_0, t_0, T)}{\partial x_0} \qquad (3.5)$$

The sensitivity matrix is computed at the same time as the state transition function.
Quantities needed for the calculation of the sensitivity matrix are already available at each time point from the transient analysis. The matrix is formed using sensitivity circuits [Aprille72b], and its calculation is computationally expensive. The equation in (3.4) is solved, and a new initial condition is determined for the steady-state response. This result is checked to see if the circuit is in steady state. If not, the shooting-Newton procedure is repeated and another initial-condition guess is calculated [Aprille72a, Aprille72b]. This process continues until steady state is reached. The initial condition that the shooting-Newton method computes is a set of capacitor voltages and inductor currents such that, if these voltages and currents are used as the initial conditions for the transient analysis, the circuit is directly in steady state. A drawback is that the solution of this iterate equation is computationally expensive; forming and solving the dense sensitivity matrix is the most expensive part. If a method like Gaussian elimination is used, the costs are very high and the speed-up of the direct method of steady-state determination is limited. The newer Krylov-subspace iterative methods result in much greater computational savings. The cost of forming the sensitivity matrix can be avoided if, instead of forming the sensitivity matrix at each transient analysis step, the quantities used to form it are stored and then accessed when they are needed for the solution of the iterate in the Krylov-subspace method [Telichevesky95a, Telichevesky95b, Kleiner01, Kleiner00].

3.3 Direct versus iterative methods

Equation (3.4) can be viewed in the familiar form $Ax = b$, where

$$A = I - J_\phi, \qquad x = x_0^{k+1} - x_0^k, \qquad b = \phi(x(t_0), t_0, T) - x_0^k \qquad (3.6)$$
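The overall shooting-Newton loop of (3.4) can be sketched on a toy one-state RC circuit. The element values, the forward-Euler integrator, and the finite-difference sensitivity below are all illustrative assumptions; the thesis instead forms the sensitivity from quantities already available during transient analysis:

```python
import numpy as np

# Toy shooting-Newton sketch: v' = (cos(w t) - v)/tau, period T = 2*pi/w.
tau = 0.5
w = 2.0 * np.pi
T = 2.0 * np.pi / w
steps = 2000

def phi(v0):
    """State transition function: integrate one period from v(0) = v0
    with forward Euler (purely for illustration)."""
    h, v = T / steps, v0
    for i in range(steps):
        v += h * (np.cos(w * i * h) - v) / tau
    return v

v0, eps = 0.0, 1e-6
for _ in range(20):
    g = v0 - phi(v0)                     # residual of x(0) = x(T), eq (3.3)
    if abs(g) < 1e-12:
        break
    J = (phi(v0 + eps) - phi(v0)) / eps  # sensitivity d(phi)/d(v0), here by
                                         # finite difference (an assumption)
    v0 = v0 - g / (1.0 - J)              # Newton update, eq (3.4)

print(v0, abs(v0 - phi(v0)))             # periodic: v(0) matches v(T)
```

Because this toy circuit is linear, Newton converges in essentially one step; for the nonlinear circuits of interest, several iterations are needed and the sensitivity becomes a full matrix.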
The equation in (3.6) is solved for $x$, and then the next iterate is found by adding the update to the current iterate. In this thesis, the matrix $A$ is always presumed real, square, and nonsingular, so that, for each real right-hand-side vector $b$, there is a unique real solution $x$ given by

$$x = A^{-1} b \qquad (3.7)$$

For our problem the matrix $A$, also referred to as the sensitivity matrix, is not assumed to be sparse. A direct solver is a linear solver that constructs an explicit operator $v \mapsto A^{-1}v$, by computing the inverse $A^{-1}$ itself or a factorization of the matrix, and then applies this operator to $b$ to find the solution $x$ as in (3.7) [Strang76]. A common example of a direct linear solution method is Gaussian elimination. There are three distinct stages in the solution process using direct Gaussian elimination:

1. The LU factorization of $A$, where $L$ is a lower triangular matrix and $U$ an upper triangular matrix. This decomposition is performed by direct Gaussian elimination, no pivoting being used:

$$LUx = b \qquad (3.8)$$

2. Transformation of (3.8), by elimination, into

$$Lc = b \qquad (3.9)$$

Equation (3.9) is solved by forward substitution. This stage can be performed at the same time as the LU factorization.

3. Back substitution to form $x$: that is, the solution of

$$Ux = c \qquad (3.10)$$

If the matrix $A$ has a special structure, then special adaptations of Gaussian elimination can be used. This can save many operations and much storage space [Kundert88]. Unfortunately, for our problem the general structure cannot be predicted and, therefore, these special methods cannot be applied. Floating-point operations, or flops, are the basic arithmetic operations of a computer. These operations include addition/subtraction and multiplication/division.
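The three stages (3.8)-(3.10) can be sketched directly. The 3×3 system below is an illustrative assumption, and, matching the description above, no pivoting is performed:

```python
import numpy as np

# Stages (3.8)-(3.10) of a direct solve on an illustrative 3x3 system.
A = np.array([[2.0, 1.0, 1.0],
              [4.0, 3.0, 3.0],
              [8.0, 7.0, 9.0]])
b = np.array([4.0, 10.0, 24.0])
n = len(b)

# Stage 1: LU factorization A = L U by Gaussian elimination, no pivoting.
L, U = np.eye(n), A.copy()
for k in range(n - 1):
    for i in range(k + 1, n):
        L[i, k] = U[i, k] / U[k, k]       # multiplier stored in L
        U[i, k:] -= L[i, k] * U[k, k:]    # eliminate below the diagonal

# Stage 2: forward substitution, L c = b.
c = np.zeros(n)
for i in range(n):
    c[i] = b[i] - L[i, :i] @ c[:i]

# Stage 3: back substitution, U x = c.
x = np.zeros(n)
for i in range(n - 1, -1, -1):
    x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]

print(x)
```

Note how `U` fills in during stage 1: this is the fill-in that destroys sparsity, one of the drawbacks of direct methods discussed above.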
Gaussian elimination, QR factorization, and most other algorithms of dense linear algebra fit the following pattern: there are $O(m)$ steps, each requiring $O(m^2)$ flops, for a total work estimate of $O(m^3)$. For example, in solving the $n \times n$ matrix equation $Ax = b$ by Gaussian elimination, the augmented matrix $(A \,|\, b)$ is first reduced to upper triangular form. Taking into account the increasing number of zeros as the method goes row by row, the reduction of the matrix to zeros below the diagonal requires $(n-1)^2 + (n-2)^2 + \dots + 2^2 + 1^2$ multiplications and the same number of subtractions. The next step of back substitution requires $n(n-1)/2$ multiplications and the same number of additions. Combining these two operation counts, the number of flops required by Gaussian elimination is

$$2\left[ \frac{(n-1)n(2n-1)}{6} + \frac{n(n-1)}{2} \right] = \frac{2(n^3 - n)}{3}.$$

This is a minimum, because row pivoting must usually occur when small pivots are encountered; even assuming that a swap occurs at every step, the number of flops remains asymptotic to $2n^3/3$ [Strang76].

3.4 Iterative methods background

The non-direct methods are often called iterative methods. Iterative methods begin with an initial guess and compute a sequence of approximate solutions that converge to the exact solution. An iterative solver is a linear solver that produces a sequence $x_0, x_1, \dots, x_k, \dots$ of approximations to the solution. The step going from $x_{k-1}$ to $x_k$ is called the $k$th iteration of the solver. $x_0$ is the initial guess and is usually set to the zero vector. At step $k$ of an iterative method the iterate is updated as

$$x_k = x_{k-1} + \alpha_k p_k \qquad (3.11)$$

where $\alpha_k$ is a scalar and $p_k$ is a direction vector. The accuracy of an iterative method's solution depends on the number of iterations performed.
An iterative solver usually terminates when the approximation is "close" enough to the solution as determined by the user [Barrett94]. The original matrix $A$ is not transformed, meaning that if it is sparse it is not filled in, and if it has some special structure that structure is preserved. The amount of work involved in the solution using an iterative method depends on the convergence rate and the desired accuracy. Since the most expensive operation is matrix-vector multiplication, the work is proportional to the flops required to perform a matrix-vector product times the number of iterations. For some iterative methods convergence can be very slow or irregular. The convergence rate of an iterative solver depends on the spectrum of the coefficient matrix and on its condition number. The rate of convergence can be improved by the technique of preconditioning. There are many iterative methods; some are stationary and some are non-stationary. A very useful and efficient class of iterative methods is the Krylov-subspace methods. Their primary advantage is that they can substantially reduce the operation count and storage for linear system solutions.

3.4.1 Stationary and non-stationary iterative methods

Iterative methods can be more computationally efficient for the solution of the system of equations generated by the shooting-Newton steady-state determination. In iterative methods a candidate solution is calculated, checked, and recalculated until the answer is close enough. This process can involve far fewer flops and less computer storage than direct methods. Because of this efficiency, much larger circuits can be analyzed and their steady-state response determined by the shooting-Newton method. There are two general classes of iterative methods, stationary and non-stationary [Freund92, Rheinboldt98, Strang76, Steward98].
In stationary iterative methods the transition from the calculated solution $x_k$ to $x_{k+1}$ does not depend on the history of the iteration. In non-stationary methods, the present iterate depends on previous iterates. Stationary iterative methods can be expressed in the form

$$x_k = B x_{k-1} + c \qquad (3.12)$$

where $x_k$ is the guess at step $k$, $B$ and $c$ are quantities that do not depend on the iteration step, and $x_{k-1}$ is the previously computed iterate. There are four main stationary methods: 1) the Jacobi method, 2) the Gauss-Seidel method, 3) the Successive Over-relaxation (SOR) method, and 4) the Symmetric Successive Over-relaxation (SSOR) method. In the Jacobi stationary method, strong diagonal dominance is needed for convergence. A system whose equations can be ordered so that each diagonal entry is larger in magnitude than the sum of the magnitudes of the other coefficients in that row is called diagonally dominant. The Gauss-Seidel algorithm uses the most recent values of the solution approximation, $x_k$, to improve the convergence rate. The Gauss-Seidel method's convergence depends on the order in which the equations are examined and computed while obtaining the new iterate. It has faster convergence than Jacobi, but slower convergence than the non-stationary methods. SOR and SSOR are adaptations of the Gauss-Seidel method. They can be used to accelerate convergence of the Gauss-Seidel method or to converge to a solution when Gauss-Seidel fails. Although the stationary iterative methods can use fewer operations and less computer storage than non-stationary ones, they converge only for a small subset of problems where there is diagonal dominance. Since the systems of equations generated by the shooting-Newton method are not usually diagonally dominant, nor can that property be known ahead of time, the non-stationary iterative methods are a better choice.
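The Jacobi iteration in the form (3.12) can be sketched as follows; the diagonally dominant 3×3 system is an illustrative assumption (the dominance is exactly what Jacobi needs to converge):

```python
import numpy as np

# Jacobi stationary iteration x_k = B x_{k-1} + c: B and c are fixed
# for the whole iteration, i.e. the method is stationary.
A = np.array([[4.0, 1.0, 1.0],
              [1.0, 5.0, 2.0],
              [0.0, 1.0, 3.0]])   # diagonally dominant, so Jacobi converges
b = np.array([6.0, 8.0, 4.0])

D = np.diag(np.diag(A))           # diagonal part of A
R = A - D                         # off-diagonal remainder
B = -np.linalg.inv(D) @ R         # iteration matrix B = -D^{-1} R
c = np.linalg.solve(D, b)         # constant vector c = D^{-1} b

x = np.zeros(3)
for _ in range(100):
    x = B @ x + c                 # same B and c at every step

print(x, np.linalg.norm(A @ x - b))
```

Gauss-Seidel differs only in using the freshly updated entries of `x` within each sweep, which is why its convergence depends on the equation ordering.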
3.5 Krylov-subspace methods

The Krylov subspace is defined as the space spanned by successively larger groups of vectors:

$$\mathcal{K}_k(A, b) = \text{span}\{b, Ab, A^2b, \dots, A^{k-1}b\} \qquad (3.13)$$

Krylov methods are based on Krylov subspaces and form a class of iterative methods for the solution of linear systems. Krylov algorithms include the Arnoldi-based Generalized Minimal Residual (GMRES) algorithm, and the Lanczos-based Conjugate Gradient (CG), Conjugate Gradients Squared (CGS), BiConjugate Gradient Stabilized (BiCGSTAB), and Quasi-Minimal Residual (QMR) algorithms (Figure 3.1). Krylov-subspace methods are non-stationary iterative methods [Ipsen98, Shewchuk94]. This means that the solution to the system $Ax = b$ is iterated, and at each iteration the information used to calculate the iterate changes. The general form of a non-stationary iterative method is

$$x_k = x_{k-1} + \alpha_k p_k \qquad (3.14)$$

where $p_k$ is a search direction and $\alpha_k$ is a scalar found by some criterion according to the exact method used. The scalar and direction vector are usually computed by taking inner products of residuals or other vectors arising from the iterative method used. Non-stationary methods differ from stationary methods in that the computations involve information that changes from one step to another; in other words, information changes as the iteration index changes. Krylov-subspace methods start with an initial guess $x_0$ and generate successive approximations to the answer. Given a matrix $A$ and a vector $b$, the associated Krylov sequence is the set of vectors $b, Ab, A^2b, A^3b, \dots$. At step $k$, a Krylov-subspace method produces the approximate solution from the Krylov space at that step generated by the vector $b$. In all the Krylov algorithms, the solution iterates can be thought of as matrix polynomials.
For example, in GMRES, CG, BiCG, and QMR:

$$x_k \in \text{span}\{b, Ab, \dots, A^{k-1}b\} \quad \text{if } x_0 = 0. \qquad (3.15)$$

(For simplicity we will assume that $x_0 = 0$; otherwise $x_k \in x_0 + \text{span}\{r_0, Ar_0, \dots, A^{k-1}r_0\}$.) Then $x_k \in \text{span}\{b, Ab, \dots, A^{k-1}b\}$ means that $x_k$ is a linear combination of the vectors $b, Ab, A^2b, \dots, A^{k-1}b$, i.e.

$$x_k = a_0 b + a_1(Ab) + a_2(A^2b) + \dots + a_{k-1}(A^{k-1}b) \qquad (3.16)$$

where $a_0, \dots, a_{k-1}$ are scalars. Now if we factor out the $b$ on the right we observe

$$x_k = (a_0 + a_1 A + a_2 A^2 + \dots + a_{k-1}A^{k-1})\,b = p_{k-1}(A)\,b \qquad (3.17)$$

where $p_{k-1}$ is the polynomial of degree $k-1$

$$p_{k-1}(t) = a_0 + a_1 t + a_2 t^2 + \dots + a_{k-1} t^{k-1}. \qquad (3.18)$$

Since $x_k$ can be written in this form, the residual can also be written as a polynomial:

$$r_k = b - A x_k = b - A p_{k-1}(A)\,b = (I - A p_{k-1}(A))\,b = q_k(A)\,b \qquad (3.19)$$

where $q_k$ is the degree-$k$ polynomial $q_k(t) = 1 - a_0 t - a_1 t^2 - a_2 t^3 - \dots - a_{k-1} t^k$, with the property that $q_k(0) = 1$. If we had taken $x_0 \neq 0$, then $x_k = x_0 + p_{k-1}(A) r_0$ and $r_k = q_k(A) r_0$. One can think of the differences among CG, GMRES, BiCG, and QMR in terms of how the coefficients of the polynomial are chosen. For GMRES, $q_k$ is the minimal-residual polynomial:

$$q_k = \arg\min_{p \in \Pi_k,\; p(0)=1} \| p(A) r_0 \|_2 \qquad (3.20)$$

The difference between methods such as CGS and BiCGSTAB is more easily understood by thinking of the Krylov methods as using different polynomial manipulations. This chapter first presents a general background of Krylov-subspace methods, including the original Krylov method, CG, followed by GMRES and the group of similar methods BiCG, CGS, QMR, and BiCGSTAB. We note that CG is not applicable to most matrices, since they are not SPD; however, we use CG as a jumping-off point to introduce the other Krylov methods. In most cases the Krylov-subspace methods converge to a solution much faster than stationary solvers and in less time than a direct solver.
In addition, because they require much less storage, Krylov-subspace methods offer an efficient way of solving the system of equations generated by the shooting-Newton method of steady-state determination.

3.5.1 Classification of Krylov methods

This section starts by defining important terminology used in describing the Krylov methods [Strang76, Greenbaum97].

1. Span: if the vector space $V$ consists of all the linear combinations of particular vectors $w_1, \dots, w_l$, these vectors are said to span the space $V$. Every vector $v$ in $V$ can be expressed as some combination of the $w$'s: $v = c_1 w_1 + \dots + c_l w_l$ for some coefficients $c_i$.

2. Basis: a basis for a vector space is a set of vectors that are linearly independent and span the space.

3. Orthogonal: two vectors are orthogonal if their dot product is zero. They are said to be perpendicular.

4. Linear independence: if nonzero vectors $v_1, \dots, v_k$ are mutually orthogonal (every vector is orthogonal to every other vector), then they are linearly independent.

5. Orthogonal subspaces: two subspaces of the same space are orthogonal if every vector in one subspace is orthogonal to every vector in the other subspace. For subspace $V$ containing vectors $v$ and subspace $W$ containing vectors $w$: $v^T w = 0$ for all $v$ and $w$.

6. Gram-Schmidt orthogonalization: any set of vectors $a_1, \dots, a_n$ can be converted into a set of orthogonal vectors by the Gram-Schmidt process. First $v_1 = a_1$; then each $v_i$ is made orthogonal to the preceding $v_1, \dots, v_{i-1}$ by subtracting from the current vector its components in the directions of the previous vectors:

$$v_i = a_i - \sum_{j=1}^{i-1} \frac{v_j^T a_i}{v_j^T v_j}\, v_j.$$

For every choice of $i$, the subspace spanned by the original $a_1, \dots, a_i$ is also spanned by $v_1, \dots, v_i$. The final vectors are then normalized.

7. Upper Hessenberg matrix: a matrix is said to have Hessenberg form if all the diagonals below the first subdiagonal are zero.

8.
Tridiagonal matrix: a matrix is tridiagonal if all the nonzero entries lie on the main diagonal and the two adjacent diagonals.

9. Biorthogonalization: $W^T V = D = V^T W$, where $D$ is a diagonal matrix. If we let the $v_j$'s be a basis for $\text{span}\{r_0, Ar_0, \dots, A^{k-1}r_0\}$ and the $w_j$'s be a basis for $\text{span}\{\hat{r}_0, A^T\hat{r}_0, \dots, (A^T)^{k-1}\hat{r}_0\}$, then biorthogonalization means that $W_{k+1}^T V_{k+1}$ is diagonal. The reduction to tridiagonal form is related to the fact that $A V_k = V_{k+1} T_{k+1}$; therefore $W_{k+1}^T A V_k = W_{k+1}^T V_{k+1} T_{k+1} = D\, T_{k+1}$, where $T_{k+1}$ is tridiagonal, and so $D^{-1} W_{k+1}^T A V_k = T_{k+1}$.

10. A-orthogonal: for example, $x^T A y = 0$.

11. Conjugate: the complex conjugate of $a + ib$ is $a - ib$.

There are two general classifications of the Krylov methods (Figure 3.1). The original Krylov method, CG, produces a tridiagonal matrix in the process of orthogonalization. The two classifications result from the adaptation of the CG method for nonsymmetric and/or non-positive-definite matrices. GMRES is based on the Arnoldi Gram-Schmidt process and results in a matrix that is not tridiagonal but preserves the orthogonality of the basis vectors. The other Krylov-subspace methods give up the orthogonal basis: in CGS, BiCG, QMR, and BiCGSTAB, the tridiagonal matrix is formed by biorthogonalization of the system. In the following subsections, equation (3.6), $Ax = b$, is solved; the coefficient matrix $A$ being referred to is the sensitivity matrix $I - J_\phi$.

Figure 3.1: Classification of Krylov-subspace methods: CG (tridiagonal, orthogonalization); GMRES (Hessenberg, orthogonalization); CGS, BiCG, QMR, BiCGSTAB (tridiagonal, biorthogonalization).

3.5.2 Conjugate Gradient (CG) method

The CG method is the oldest and best known of the Krylov-subspace methods [Saad95, Shewchuk94].
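The Gram-Schmidt process from item 6 of the terminology list can be sketched as follows; the three input vectors are arbitrary illustrative choices:

```python
import numpy as np

# (Modified) Gram-Schmidt sketch: subtract from each vector its components
# along the previously computed directions, then normalize.
def gram_schmidt(vectors):
    basis = []
    for a in vectors:
        v = a.astype(float).copy()
        for q in basis:
            v -= (q @ v) * q              # remove component along q
        basis.append(v / np.linalg.norm(v))
    return np.array(basis)

Q = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                  np.array([1.0, 0.0, 1.0]),
                  np.array([0.0, 1.0, 1.0])])
print(Q @ Q.T)                            # near-identity: orthonormal rows
```

This same orthogonalization, applied to the Krylov vectors, is the core of the Arnoldi process used by GMRES below.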
The CG method calculates successive approximations to the solution, the residual, the direction vectors, and the scalars used in updating the approximations to the solution. It is used to solve symmetric positive definite (SPD) systems, which is usually not the case for the systems of interest here. A matrix $A$ is symmetric if

$$A = A^T. \qquad (3.21)$$

A matrix $A$ is positive definite if

$$x^T A x > 0 \quad \text{for all } x \neq 0. \qquad (3.22)$$

The CG method, described below, is the basis for the non-stationary Krylov-subspace iterative methods. It can only be used when the coefficient matrix is symmetric positive definite, a condition that is generally not met by the coefficient matrix generated by the shooting-Newton iteration. The CG method is explained here because it exemplifies the other Krylov-subspace methods that can be used in circuit simulation. The tridiagonal form of the matrix makes for a relatively easy solution of the linear system. The other methods are Krylov-subspace methods similar to CG: the CGS, QMR, BiCG, and BiCGSTAB methods abandon orthogonalization for biorthogonalization, while in the GMRES method the orthogonalization is preserved but the matrix is no longer tridiagonal, only of Hessenberg form. CG is based on the symmetric Lanczos algorithm, which is the special case of the Arnoldi algorithm for symmetric systems. In the Arnoldi method, the matrix $A$ is reduced to Hessenberg form by an orthogonalization process. When the system is symmetric, the resulting Hessenberg matrix is tridiagonal. This makes for short recurrences both in the Arnoldi method and in the solution process. The tridiagonal matrix also means that only a few vectors need to be stored. In the Lanczos method, an orthogonal basis for the Krylov subspace is constructed and the coefficient matrix undergoes a similarity transformation to tridiagonal form [Saad96].
In this process, the residuals created during the Lanczos method are orthogonal to each other. Also, the search directions (or direction vectors) generated are A-orthogonal, or conjugate. We can use these two conditions to show the derivation of the CG algorithm from the Lanczos method. The iterated approximation to the solution can be represented as

$$x_{j+1} = x_j + \alpha_j p_j \qquad (3.23)$$

Then the residual vectors must satisfy the recurrence

$$r_{j+1} = r_j - \alpha_j A p_j \qquad (3.24)$$

If the $r_j$'s are orthogonal, then

$$(r_j - \alpha_j A p_j,\; r_j) = 0 \qquad (3.25)$$

and, therefore, the following scalar relationship must hold:

$$\alpha_j = \frac{(r_j, r_j)}{(A p_j, r_j)} \qquad (3.26)$$

It is known that the next search direction $p_{j+1}$ is a linear combination of $r_{j+1}$ and $p_j$, again using a scalar to guarantee the required orthogonality and conjugacy:

$$p_{j+1} = r_{j+1} + \beta_j p_j \qquad (3.27)$$

Thus, because $A p_j$ is orthogonal to $p_{j-1}$:

$$(A p_j, r_j) = (A p_j,\; p_j - \beta_{j-1} p_{j-1}) = (A p_j, p_j) \qquad (3.28)$$

Now we can rewrite (3.26) in a more familiar form:

$$\alpha_j = \frac{(r_j, r_j)}{(A p_j, p_j)} \qquad (3.29)$$

Also, since $p_{j+1}$ is orthogonal to $A p_j$, we set the scalar $\beta_j$ as follows:

$$\beta_j = -\frac{(r_{j+1}, A p_j)}{(p_j, A p_j)} \qquad (3.30)$$

From equation (3.24) we can write

$$A p_j = -\frac{1}{\alpha_j}\,(r_{j+1} - r_j) \qquad (3.31)$$

and substitution of (3.31) into (3.30) yields

$$\beta_j = \frac{1}{\alpha_j} \cdot \frac{(r_{j+1},\; r_{j+1} - r_j)}{(A p_j, p_j)} = \frac{(r_{j+1}, r_{j+1})}{(r_j, r_j)} \qquad (3.32)$$

These relations compose the CG method.

The Conjugate Gradient Algorithm [Saad96]

1. Compute $r_0 = b - A x_0$, and set $p_0 = r_0$.
2. For $j = 0, 1, \dots$, until convergence, Do
3. $\alpha_j = (r_j, r_j) / (A p_j, p_j)$
4. $x_{j+1} = x_j + \alpha_j p_j$
5. $r_{j+1} = r_j - \alpha_j A p_j$
6. $\beta_j = (r_{j+1}, r_{j+1}) / (r_j, r_j)$
7. $p_{j+1} = r_{j+1} + \beta_j p_j$
8. EndDo

Importantly, in the CG method the $j$th iterate of the approximate solution, $x_{j+1}$, is constructed as an element of the Krylov subspace

$$x_{j+1} \in x_0 + \text{span}\{r_0, Ar_0, \dots, A^j r_0\}$$

so that $(x_{j+1} - x)^T A\, (x_{j+1} - x)$ is minimized.
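The algorithm above transcribes directly into code. The SPD 3×3 system is a stand-in assumption for the circuit matrices discussed here, which are generally not SPD:

```python
import numpy as np

# Direct transcription of the CG algorithm listed above; requires SPD A.
def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=100):
    x = x0.copy()
    r = b - A @ x                          # step 1: r_0 = b - A x_0
    p = r.copy()                           # p_0 = r_0
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        alpha = (r @ r) / (Ap @ p)         # step 3, eq (3.29)
        x = x + alpha * p                  # step 4
        r_new = r - alpha * Ap             # step 5
        beta = (r_new @ r_new) / (r @ r)   # step 6, eq (3.32)
        p = r_new + beta * p               # step 7
        r = r_new
    return x

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])            # symmetric positive definite
b = np.array([1.0, 2.0, 3.0])
x = conjugate_gradient(A, b, np.zeros(3))
print(np.linalg.norm(A @ x - b))
```

In exact arithmetic the loop finishes in at most $n$ steps (here three); the per-step cost is one matrix-vector product plus a handful of vector operations, which is where the savings over direct methods come from.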
In this minimization, x is the exact solution of Ax = b. This minimum is guaranteed to exist in general only if A is symmetric positive definite (SPD). The CG algorithm must store matrix A, with m rows and m columns, or be able to form the matrix-vector products on the fly, along with the vectors x, p, Ap, and r of length m. The computational steps are of order O(m^2) for the one dense matrix-vector product, O(m) for the three saxpy vector updates, and O(m) for the two inner products. This means that if m steps are required, the CG algorithm uses O(m^3) flops, which is no significant improvement over the direct methods. The computational savings only come into play when the number of steps involved in the solution is less than m, or if A is sparse or has some other special structure.

3.5.3 The Generalized Minimum Residual (GMRES) method

GMRES is the generalized minimum residual algorithm used to solve non-symmetric linear systems of equations. It is based on the conjugate gradient (CG) algorithm, which can only be used efficiently for coefficient matrices that are symmetric positive definite. CG uses the symmetric Lanczos method to reduce the coefficient matrix to upper Hessenberg form; because the basis vectors of the Krylov subspace are orthogonal and the system is symmetric, the Hessenberg matrix becomes tridiagonal. With methods such as BiCG, CGS, and BiCGSTAB, the CG method is modified to handle non-symmetric coefficient matrices: the orthogonalization is abandoned for the biorthogonal basis vectors produced by the non-symmetric Lanczos process (Figure 3.1), and the tridiagonal property of the Hessenberg matrix is maintained. In GMRES, the tridiagonal property is abandoned and the orthogonalization is kept. Unfortunately, this means that instead of needing to save only a few of the basis vectors, all of them must be saved.
A real positive aspect of the GMRES algorithm is that the norm of the residual is minimized at every iteration and, therefore, the convergence behavior is monotonic. In GMRES, an orthonormal basis of the Krylov subspace is constructed, and the resulting Hessenberg matrix and basis vectors are used to construct a solution that minimizes the residual of the system Ax = b [Saad86, Barrett94]. The Arnoldi method is used to construct these orthonormal vectors and reduce the coefficient matrix A to upper Hessenberg form. At the mth step of the Arnoldi method, the following equation is formed:

A V_m = V_{m+1} H̄_m    (3.33)

where V_m is a matrix whose columns are [v_1, v_2, ..., v_m] and H̄_m is an (m+1)×m matrix. V_m is an orthonormal system. The H̄_m matrix has one more row than the square Hessenberg matrix H_m; the only nonzero entry in this extra row is the element h_{m+1,m} in the ((m+1), m) position. The vectors that make up the columns of the V_m matrix and the matrix H̄_m satisfy an important relation that will be used to derive the GMRES algorithm. Any vector x in the affine Krylov subspace can be written as:

x = x_0 + V_m y    (3.34)

where y is a vector of length m and V_m is the Krylov-subspace basis described above. The residual of a system of equations is defined as r = b − Ax. We now define a norm of the residual as a function of y:

J(y) = ||b − Ax||_2 = ||b − A(x_0 + V_m y)||_2    (3.35)

We define β = ||r_0||_2 and v_1 = r_0/β. Then we can use the relationships in equations (3.33) and (3.34) to obtain:

b − Ax = b − A(x_0 + V_m y) = r_0 − A V_m y = β v_1 − V_{m+1} H̄_m y = V_{m+1}(β e_1 − H̄_m y)    (3.36)

where v_1 is the first orthonormal basis vector, which can be represented as v_1 = V_{m+1} e_1, and e_1 is the first unit vector of length (m+1). Now, since the column vectors of V_{m+1} are orthonormal by construction:

J(y) = ||b − A(x_0 + V_m y)||_2 = ||β e_1 − H̄_m y||_2    (3.37)

The GMRES approximation to the solution x is in the Krylov subspace and is chosen to minimize the norm in equation (3.35).
So we construct the approximation as in equation (3.34), choosing y at that step to minimize the norm in equation (3.37):

x_m = x_0 + V_m y_m,  where  y_m = argmin_y ||β e_1 − H̄_m y||_2.

The determination of y_m is an (m+1)×m least-squares problem with Hessenberg structure. It can be solved by a QR factorization in a specialized manner using Givens rotations, which annihilate single nonzero elements of a matrix during reduction to triangular form. The factorization of H̄_m is updated progressively as each column appears during the Arnoldi process. This allows the residual norm to be calculated without forming the approximation x_m; therefore, we can wait until this norm is small enough before performing the calculation of x_m. The following algorithm results:

GMRES Algorithm [Saad96]
1. Compute r_0 = b − A x_0, β = ||r_0||_2, and v_1 = r_0/β
2. Define the (m+1)×m matrix H̄_m = {h_{ij}}, 1 ≤ i ≤ m+1, 1 ≤ j ≤ m. Set H̄_m = 0
3. For j = 1, 2, ..., m Do:
4.   Compute w_j = A v_j
5.   For i = 1, 2, ..., j Do:
6.     h_{ij} = (w_j, v_i)
7.     w_j = w_j − h_{ij} v_i
8.   EndDo
9.   h_{j+1,j} = ||w_j||_2. If h_{j+1,j} = 0, set m = j and go to 12
10.  v_{j+1} = w_j / h_{j+1,j}
11. EndDo
12. Compute y_m, the minimizer of ||β e_1 − H̄_m y||_2, and x_m = x_0 + V_m y_m

The computational costs of GMRES depend on the number of steps involved in the solution. At step m, there is one matrix-vector product, m+1 inner products, and m+1 saxpys. The storage also increases with the step number because all of the orthogonal basis vectors must be saved. This means that for a matrix of order n, the storage requirement is the storage for the matrix plus (m+5)n. GMRES is the only Krylov method discussed here that minimizes the true residual at each step of the iteration; it therefore has monotonic convergence.
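The Arnoldi loop and the small least-squares solve in the listing above can be sketched as follows. This illustration uses a dense least-squares solve where the text uses progressively updated Givens rotations, and the test matrix is an arbitrary non-symmetric example, not one of the circuit matrices:

```python
import numpy as np

def gmres_full(A, b, x0=None, tol=1e-10):
    """Full (unrestarted) GMRES: build an orthonormal Krylov basis with
    the Arnoldi process, then minimize ||beta*e1 - Hbar_m y||_2 by a
    dense least-squares solve (Givens rotations in the text)."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n = b.size
    x0 = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float)
    m = n                              # at most n Arnoldi steps
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta < tol:
        return x0
    V = np.zeros((n, m + 1))           # orthonormal basis vectors (all stored)
    H = np.zeros((m + 1, m))           # (m+1) x m upper Hessenberg matrix
    V[:, 0] = r0 / beta
    for j in range(m):
        w = A @ V[:, j]                # the one matrix-vector product per step
        for i in range(j + 1):         # modified Gram-Schmidt orthogonalization
            H[i, j] = w @ V[:, i]
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:        # "lucky" breakdown: solution is in the span
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)
    return x0 + V[:, :m] @ y

A = np.array([[3.0, 1.0, 0.0], [0.0, 2.0, 1.0], [1.0, 0.0, 4.0]])  # non-symmetric
b = np.array([1.0, 2.0, 3.0])
x = gmres_full(A, b)
```

Storing the full basis V is exactly the storage growth the text describes; restarted GMRES trades that storage for possibly slower convergence.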
GMRES involves only one matrix-vector product with the coefficient matrix per iteration, instead of the two required by the previous algorithms. The main disadvantage of GMRES is that the amount of storage and the operation count increase as the iteration progresses. A solution to this is to restart the GMRES method after a fixed number of steps and use the final answer of the first series as the starting guess of the next. Unfortunately, when to restart GMRES is difficult to determine.

3.5.4 BiConjugate Gradient (BiCG) method

When the coefficient matrix of a system of equations is not symmetric positive definite, the CG algorithm cannot efficiently solve the system of equations. In the CG method, an orthonormal basis for the Krylov subspace is generated during the reduction of the coefficient matrix to tridiagonal form. Two basic approaches to using a conjugate-gradient-like method are possible (Figure 3.1). The first is to abandon the tridiagonal form and form an upper Hessenberg matrix, in which case all the basis vectors must be saved; this approach is used in GMRES. The second approach is to abandon the orthogonal basis for a biorthogonal one. The latter approach is used in the BiCG, QMR, CGS, and BiCGSTAB algorithms, which are all based on the CG method.

The BiConjugate Gradient (BiCG) algorithm is based on the reduction of the coefficient matrix A to tridiagonal form. The basis vectors formed to accomplish this are generated by the non-symmetric Lanczos method, which generates two sets of residual and direction vectors. In CG, the residual vectors were orthogonal to all the previous residual vectors, and the direction vectors were A-conjugate to the previous direction vectors. With a non-symmetric coefficient matrix, we cannot maintain these relationships and still produce a tridiagonal form of A.
Instead, a second set of residual and direction vectors is generated, called the "pseudo-residual" and "pseudo-direction" vectors. The following relationship is ensured by the non-symmetric Lanczos method:

(r̃_j, r_i) = 0  for i ≠ j    (3.38)

where r̃_j is the pseudo-residual vector and r_j is the residual vector. Equation (3.38) means that each pseudo-residual is orthogonal to the previous residual vectors and that each residual vector is orthogonal to the previous pseudo-residuals. This is called the biorthogonality of the residuals. Another relationship that the non-symmetric Lanczos method maintains is the biconjugacy of the search directions:

(A p_j, p̃_i) = 0  for i ≠ j    (3.39)

In other words, each pseudo-direction vector is A-conjugate to the previous direction vectors, and each direction vector is A-conjugate to the previous pseudo-direction vectors. Using the two relationships in equations (3.38) and (3.39), the BiCG method can be derived in much the same way as the CG method. The iterated approximation to the solution can be represented as:

x_{j+1} = x_j + α_j p_j    (3.40)

Although not usually solved for, there is a dual system involving A^T that is included here for the derivation:

x̃_{j+1} = x̃_j + α_j p̃_j    (3.41)

Then both of the residual vectors can be represented as:

r_{j+1} = r_j − α_j A p_j    (3.42)

r̃_{j+1} = r̃_j − α_j A^T p̃_j    (3.43)

If the two sets of residual vectors are to be biorthogonal, then:

(r_j − α_j A p_j, r̃_j) = 0  and  (r_j, r̃_j − α_j A^T p̃_j) = 0    (3.44)

Therefore, the following scalar relationship must hold:

α_j = (r_j, r̃_j) / (A p_j, r̃_j)    (3.45)

We also know that the next search directions are linear combinations of the current residual and the previous search direction.
This relationship is represented below with the proper scaling of the previous direction:

p_{j+1} = r_{j+1} + β_j p_j    (3.46)

p̃_{j+1} = r̃_{j+1} + β_j p̃_j    (3.47)

A consequence of the above relations is that:

(A p_j, r̃_j) = (A p_j, p̃_j − β_{j−1} p̃_{j−1})    (3.48)

where equation (3.47) is used to express r̃_j in terms of the pseudo-directions. Expanding this equation gives (A p_j, p̃_j) − β_{j−1}(A p_j, p̃_{j−1}), and noting from the biconjugacy condition (3.39) that (A p_j, p̃_i) = 0 for i < j:

(A p_j, r̃_j) = (A p_j, p̃_j)    (3.49)

Then:

α_j = (r_j, r̃_j) / (A p_j, p̃_j)    (3.50)

In addition, we require that p_{j+1} be orthogonal to A^T p̃_j:

(p_{j+1}, A^T p̃_j) = 0    (3.51)

where p_{j+1} = r_{j+1} + β_j p_j. Substituting into (3.51):

(r_{j+1} + β_j p_j, A^T p̃_j) = (r_{j+1}, A^T p̃_j) + β_j (p_j, A^T p̃_j) = 0

so that:

β_j = −(r_{j+1}, A^T p̃_j) / (p_j, A^T p̃_j)    (3.52)

From equation (3.43) we find:

A^T p̃_j = −(1/α_j)(r̃_{j+1} − r̃_j)    (3.53)

and, looking at the denominator of equation (3.52), with the condition in equation (3.39):

(p_j, A^T p̃_j) = (r_j + β_{j−1} p_{j−1}, A^T p̃_j) = (r_j, A^T p̃_j)    (3.54)

We can now substitute (3.53) and (3.54) into (3.52) to yield:

β_j = (r_{j+1}, r̃_{j+1}) / (r_j, r̃_j)    (3.55)

Combining these relationships results in the following algorithm for BiCG.

Biconjugate Gradient (BiCG) Algorithm [Saad96]
1. Compute r_0 := b − A x_0. Choose r̃_0 such that (r_0, r̃_0) ≠ 0.
2. Set p_0 := r_0, p̃_0 := r̃_0
3. For j = 0, 1, ..., until convergence Do:
4.   α_j = (r_j, r̃_j) / (A p_j, p̃_j)
5.   x_{j+1} = x_j + α_j p_j
6.   r_{j+1} = r_j − α_j A p_j
7.   r̃_{j+1} = r̃_j − α_j A^T p̃_j
8.   β_j = (r_{j+1}, r̃_{j+1}) / (r_j, r̃_j)
9.   p_{j+1} = r_{j+1} + β_j p_j
10.  p̃_{j+1} = r̃_{j+1} + β_j p̃_j
11. EndDo

The storage used for the BiCG algorithm includes the coefficient matrix A and its transpose, along with the vectors of length m: x, p, p̃, Ap, A^T p̃, r, r̃.
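The two coupled recurrences in the listing above, one driven by A and one by A^T, can be sketched as follows (a NumPy illustration with an arbitrary non-symmetric test matrix, not the SPICE implementation):

```python
import numpy as np

def bicg(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Biconjugate gradient as listed above: residual/direction pairs
    kept biorthogonal, with one product by A and one by A^T per step."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x
    rt = r.copy()                  # shadow residual; any rt with (r, rt) != 0 works
    p, pt = r.copy(), rt.copy()
    rho = r @ rt
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p                 # product with A
        alpha = rho / (Ap @ pt)
        x = x + alpha * p
        r = r - alpha * Ap
        rt = rt - alpha * (A.T @ pt)   # product with A^T
        rho_new = r @ rt
        beta = rho_new / rho
        p = r + beta * p
        pt = rt + beta * pt
        rho = rho_new
    return x

A = np.array([[3.0, 1.0, 0.0], [0.0, 2.0, 1.0], [1.0, 0.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = bicg(A, b)
```

The explicit `A.T @ pt` product is the practical obstacle the text notes: the transpose action must be available, which is not always the case when A is only accessible as a matrix-vector routine.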
The computational resources used are two matrix-vector products of O(n^2), five vector updates (saxpys) of cost O(n), and two inner products of O(n) flops. There are two main advantages to BiCG. First, a tridiagonal matrix is generated, so the reduced system is relatively easy to solve. Second, the tridiagonal form requires only a few of the generated vectors to be kept in storage. One disadvantage of the BiCG method, however, is that breakdowns can occur during the solution. Because no norm is minimized, as one is in the CG method, the convergence behavior is irregular. While methods like GMRES require one matrix-vector product per iteration, BiCG requires two. The biggest disadvantage is that the transpose of the coefficient matrix A is needed, and this matrix is not always readily available.

3.5.5 Conjugate Gradient Squared (CGS) method

Based on the BiCG method, the conjugate gradient squared (CGS) algorithm is used to solve systems of equations whose coefficient matrix is not required to be symmetric positive definite, as it is in the CG algorithm. The CGS method is based on the generation of a sequence of iterates whose residuals r'_j are formed by squaring the residual polynomial:

r'_j = φ_j^2(A) r_0    (3.56)

The CGS method does not require a matrix-vector product with the transpose of the coefficient matrix. It still requires two matrix-vector products per iteration step, as in BiCG, but both products are with the original coefficient matrix. Convergence with CGS is often much faster than with the BiCG method; however, the residual is not minimized, and, therefore, the convergence can be very erratic. The conjugate gradient squared algorithm is a Krylov-subspace method that was developed to overcome some of the shortcomings of the BiCG algorithm.
The shortcomings addressed include erratic convergence and the need for the transpose of the coefficient matrix. CGS is based on the non-symmetric Lanczos process, in which the residual vector at step j of the BiCG process is written as the product of r_0, the initial residual, and a jth-degree polynomial in A [Saad96]:

r_j = φ_j(A) r_0    (3.57)

where φ_j is a polynomial of degree j satisfying the constraint φ_j(0) = 1. This same polynomial generates the pseudo-residual of BiCG: r̃_j = φ_j(A^T) r̃_0. Similarly, the search direction is given by the conjugate-direction polynomial π_j(t):

p_j = π_j(A) r_0    (3.58)

in which π_j is a polynomial of degree j. We can then write the pseudo-search direction as:

p̃_j = π_j(A^T) r̃_0    (3.59)

The derivation of the CGS method relies on simple algebra applied to the BiCG method with this polynomial representation of the residual and search-direction vectors. First, the recurrences for the squared polynomials are derived. The polynomials themselves satisfy the recurrences:

φ_{j+1}(t) = φ_j(t) − α_j t π_j(t)    (3.60)

π_{j+1}(t) = φ_{j+1}(t) + β_j π_j(t)    (3.61)

We can then square equations (3.60) and (3.61) to get:

φ_{j+1}^2(t) = φ_j^2(t) − 2 α_j t π_j(t) φ_j(t) + α_j^2 t^2 π_j^2(t)    (3.62)

π_{j+1}^2(t) = φ_{j+1}^2(t) + 2 β_j φ_{j+1}(t) π_j(t) + β_j^2 π_j^2(t)    (3.63)

If we introduce one of the cross terms, φ_{j+1}(t) π_j(t), as a third recurrence, then the equations in (3.62) and (3.63) form an updatable recurrence system. For the other cross term, we use the following:

φ_j(t) π_j(t) = φ_j(t)(φ_j(t) + β_{j−1} π_{j−1}(t)) = φ_j^2(t) + β_{j−1} φ_j(t) π_{j−1}(t)    (3.64)

Then, combining all the terms (omitting the variable t for clarity):

φ_{j+1}^2 = φ_j^2 − α_j t (2 φ_j^2 + 2 β_{j−1} φ_j π_{j−1} − α_j t π_j^2)    (3.65)

φ_{j+1} π_j = φ_j^2 + β_{j−1} φ_j π_{j−1} − α_j t π_j^2    (3.66)

π_{j+1}^2 = φ_{j+1}^2 + 2 β_j φ_{j+1} π_j + β_j^2 π_j^2    (3.67)

The recurrences defined above are the basis of the CGS algorithm.
Now define:

r_j = φ_j^2(A) r_0    (3.68)

p_j = π_j^2(A) r_0    (3.69)

q_j = φ_{j+1}(A) π_j(A) r_0    (3.70)

Writing the recurrences (3.65)-(3.67) for these vectors, we have:

r_{j+1} = r_j − α_j A (2 r_j + 2 β_{j−1} q_{j−1} − α_j A p_j)    (3.71)

q_j = r_j + β_{j−1} q_{j−1} − α_j A p_j    (3.72)

p_{j+1} = r_{j+1} + 2 β_j q_j + β_j^2 p_j    (3.73)

The auxiliary vector is defined to be:

d_j = 2 r_j + 2 β_{j−1} q_{j−1} − α_j A p_j    (3.74)

We can then use the following operations to compute the approximate solution, starting with r_0 := b − A x_0, p_0 := r_0, q_0 := 0, β_0 := 0:

α_j = (r_j, r̃_0) / (A p_j, r̃_0)
d_j = 2 r_j + 2 β_{j−1} q_{j−1} − α_j A p_j
q_j = r_j + β_{j−1} q_{j−1} − α_j A p_j
x_{j+1} = x_j + α_j d_j
r_{j+1} = r_j − α_j A d_j
β_j = (r_{j+1}, r̃_0) / (r_j, r̃_0)
p_{j+1} = r_{j+1} + β_j (2 q_j + β_j p_j)

Usually, a simplification of the algorithm is constructed by using another auxiliary vector:

u_j = r_j + β_{j−1} q_{j−1}

Then the following relationships hold:

d_j = u_j + q_j
q_j = u_j − α_j A p_j
p_{j+1} = u_{j+1} + β_j (q_j + β_j p_j)

so that the vector d_j is no longer needed. The algorithm is listed below.

Conjugate Gradient Squared Algorithm [Saad96]
1. Compute r_0 := b − A x_0; r̃_0 arbitrary.
2. Set p_0 := u_0 := r_0.
3. For j = 0, 1, 2, ..., until convergence Do:
4.   α_j = (r_j, r̃_0) / (A p_j, r̃_0)
5.   q_j = u_j − α_j A p_j
6.   x_{j+1} = x_j + α_j (u_j + q_j)
7.   r_{j+1} = r_j − α_j A (u_j + q_j)
8.   β_j = (r_{j+1}, r̃_0) / (r_j, r̃_0)
9.   u_{j+1} = r_{j+1} + β_j q_j
10.  p_{j+1} = u_{j+1} + β_j (q_j + β_j p_j)
11. EndDo

A matrix-vector product with the transpose of A is not required, which is important when the transpose of A is not readily available. Instead, two matrix-vector products with A are computed at each iteration. One step of the CGS algorithm uses about the same number of arithmetic operations as one step of the BiCG algorithm. Again, as in BiCG, the residual is not minimized and, consequently, the convergence can be erratic.
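The simplified listing above can be sketched in NumPy (an illustration with an arbitrary non-symmetric test matrix; note the two products with A per step and the fixed shadow vector r̃_0):

```python
import numpy as np

def cgs(A, b, x0=None, tol=1e-10, max_iter=1000):
    """Conjugate Gradient Squared as listed above: transpose-free,
    residual polynomial squared, two matrix-vector products per step."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x
    rt0 = r.copy()                 # arbitrary fixed shadow vector
    u = r.copy()
    p = r.copy()
    rho = r @ rt0
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p                 # first matrix-vector product
        alpha = rho / (Ap @ rt0)
        q = u - alpha * Ap
        x = x + alpha * (u + q)
        r = r - alpha * (A @ (u + q))   # second matrix-vector product
        rho_new = r @ rt0
        beta = rho_new / rho
        u = r + beta * q
        p = u + beta * (q + beta * p)
        rho = rho_new
    return x

A = np.array([[3.0, 1.0, 0.0], [0.0, 2.0, 1.0], [1.0, 0.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = cgs(A, b)
```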
In some cases the error is even magnified instead of reduced. Besides the two matrix-vector products with A, there are 13n flops for vector updates and two inner products. Memory requirements are storage for x, b, r, and A, plus four additional n-vectors: r̃, p, u, q. As mentioned above, no matrix-vector product with the transpose of A is performed; instead, two matrix-vector products are done with the matrix A itself. This can be compared with the GMRES algorithm, which has only one matrix-vector product per step. The CGS algorithm converges twice as quickly as the BiCG algorithm; unfortunately, it also diverges twice as quickly. Because the residual vector is derived from the squared polynomial, the rounding errors can be greater, resulting in an inaccurate answer.

3.5.6 Quasi-Minimal Residual (QMR) method

Two essential parts of any Krylov-subspace method are the construction of a suitable basis for the Krylov subspace and the formation of the solution iterates. In QMR, the non-symmetric Lanczos method is used to form the basis (Figure 3.1) [Bucker94, Barrett94]. The Lanczos algorithm produces the matrix V_n = [v_1 v_2 ... v_n] whose columns are n basis vectors spanning K_n(r_0, A). The QMR iterates are of the form:

x_n = x_0 + V_n z_n,  z_n ∈ C^n    (3.75)

for some parameter vector z_n. From the definition of the biorthogonalization, we have A V_n = V_{n+1} H̄_n, so the corresponding QMR residual is:

r_n = V_{n+1}(β e_1 − H̄_n z_n)    (3.76)

where β = ||r_0||, e_1 = (1, 0, ..., 0)^T is the first unit vector in R^{n+1}, and H̄_n ∈ C^{(n+1)×n} denotes the upper Hessenberg matrix generated by the Lanczos algorithm.
Rather than minimizing the norm of the residuals, which is desirable but too expensive in terms of work and storage, the QMR approach merely minimizes the expression in parentheses in (3.76). The QMR iterates are then defined by (3.75), where the parameter vector z_n ∈ C^n is chosen as the solution of the least-squares problem:

z_n = argmin_z ||β e_1 − H̄_n z||_2    (3.77)

Note that a true minimization of the residual norm does not take place; instead a different, data-dependent norm is minimized, which hopefully is not much different from the true norm. The general method for QMR is shown below.

Quasi-Minimal Residual (QMR) Algorithm [Saad96]
1. Compute r_0 := b − A x_0 and γ_0 := ||r_0||_2, w_1 := v_1 := r_0/γ_0
2. For m = 1, 2, ..., until convergence Do:
3.   Compute α_m, δ_{m+1} and v_{m+1}, w_{m+1} by the Lanczos biorthogonalization
4.   Update the QR factorization of T̄_m
5.   Apply the rotation Ω_m to T̄_m and ḡ_m
6.   p_m = (v_m − Σ_{i=m−2}^{m−1} t_{im} p_i) / t_{mm}
7.   x_m = x_{m−1} + γ_m p_m
8.   If |γ_{m+1}| is small enough then Stop
9. EndDo

The choice of an adequate stopping criterion is an important aspect of any implementation of QMR or any other iterative method. Usually, the decision whether an iterate x_n is close enough to the solution is based on the residual r_n = b − A x_n. QMR does not compute the true residual at each iteration, so the quasi-residual is watched, and when it is small enough the true residual is calculated. Some disadvantages of BiCG can be overcome by using the quasi-minimal residual method: QMR shows smoother convergence and can be implemented with look-ahead to avoid the breakdown that can occur (in the underlying LU factorization) in the BiCG method. Like BiCG, each step of QMR computes two matrix-vector products, one with the coefficient matrix and the other with its transpose. It also uses two inner products and 12 saxpy operations.
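For experimentation, QMR is available prepackaged; the sketch below uses SciPy's `scipy.sparse.linalg.qmr` (an assumption that SciPy is acceptable as a stand-in; it is not part of the SPICE implementation described in this work). Wrapping the matrix in a `LinearOperator` with an explicit `rmatvec` makes the algorithm's need for products with both A and A^T visible:

```python
import numpy as np
from scipy.sparse.linalg import qmr, LinearOperator

# Illustrative non-symmetric system (arbitrary values)
A = np.array([[3.0, 1.0, 0.0],
              [0.0, 2.0, 1.0],
              [1.0, 0.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])

# QMR needs products with both A and A^T, so the operator must
# supply a matvec and an rmatvec.
op = LinearOperator(A.shape, matvec=lambda v: A @ v,
                    rmatvec=lambda v: A.T @ v)
x, info = qmr(op, b)      # info == 0 signals convergence
```

A dense or sparse matrix can also be passed directly, in which case SciPy derives the transpose action itself; the explicit operator form is the one that matters when A is only available as a routine.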
3.5.7 BiConjugate Gradient Stabilized (BiCGSTAB) method

The BiCGSTAB method is similar to CGS in that it is a transpose-free variant of the BiCG method [van der Vorst92]. In the CGS method the residual polynomial is squared, while in BiCGSTAB the residual is the product of an arbitrary polynomial and the BiCG residual polynomial applied to the initial residual:

r'_j = ψ_j(A) φ_j(A) r_0    (3.78)

where φ_j is the residual polynomial associated with the BiCG algorithm and ψ_j is the new polynomial. This new polynomial is defined to smooth out the convergence of the CGS method and describes a steepest-descent update. The new polynomial ψ_j(t) is defined by the following recurrence:

ψ_{j+1}(t) = (1 − ω_j t) ψ_j(t)    (3.79)

where the scalar ω_j is determined to achieve a steepest-descent step in the residual direction. The derivation of the recurrence relations is similar to that of the CGS method [Saad96].

BiCGSTAB Algorithm [Saad96]
1. Compute r_0 := b − A x_0; r̃_0 is arbitrary
2. p_0 := r_0
3. For j = 0, 1, ..., until convergence Do:
4.   α_j = (r_j, r̃_0) / (A p_j, r̃_0)
5.   s_j = r_j − α_j A p_j
6.   ω_j = (A s_j, s_j) / (A s_j, A s_j)
7.   x_{j+1} = x_j + α_j p_j + ω_j s_j
8.   r_{j+1} = s_j − ω_j A s_j
9.   β_j = ((r_{j+1}, r̃_0) / (r_j, r̃_0)) × (α_j / ω_j)
10.  p_{j+1} = r_{j+1} + β_j (p_j − ω_j A p_j)
11. EndDo

The BiCGSTAB method requires two matrix-vector products with the matrix A, four inner products (two more than the CGS method), and 12 linear operations on vectors (i.e., vector updates). The storage needed is the n×n matrix A plus 10n. The BiCGSTAB algorithm does not use the transpose of the matrix A in the calculation of the recurrences, which is advantageous when the transpose of A is not readily available. It is constructed to smooth out the convergence of the CGS method.
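The listing above can be sketched in NumPy (an illustration with an arbitrary non-symmetric test matrix; the early exit when s_j is negligible guards against the division by a vanishing A s_j norm):

```python
import numpy as np

def bicgstab(A, b, x0=None, tol=1e-10, max_iter=1000):
    """BiCGSTAB as listed above: transpose-free, with a local
    steepest-descent smoothing parameter omega_j at each step."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    x = np.zeros_like(b) if x0 is None else np.asarray(x0, dtype=float)
    r = b - A @ x
    rt0 = r.copy()                 # arbitrary fixed shadow vector
    p = r.copy()
    rho = r @ rt0
    for _ in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ap = A @ p
        alpha = rho / (Ap @ rt0)
        s = r - alpha * Ap
        if np.linalg.norm(s) < tol:    # alpha step alone already converged
            x = x + alpha * p
            r = s
            break
        As = A @ s
        omega = (As @ s) / (As @ As)   # local (one-step GMRES) minimization
        x = x + alpha * p + omega * s
        r = s - omega * As
        rho_new = r @ rt0
        beta = (rho_new / rho) * (alpha / omega)
        p = r + beta * (p - omega * Ap)
        rho = rho_new
    return x

A = np.array([[3.0, 1.0, 0.0], [0.0, 2.0, 1.0], [1.0, 0.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
x = bicgstab(A, b)
```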
The method can also be thought of as a product of the BiCG method and a one-step GMRES (steepest descent) method. The residual vector is minimized locally by this GMRES step. However, when the GMRES step stagnates, the residual vector is not reduced and the algorithm can break down.

3.6 Summary

If a standard Gaussian-elimination-type direct method is used to solve for the iterate formed in the shooting-Newton method, the number of equations in that system is limited. These direct methods use O(n^3) operations to solve the system of equations and O(n^2) computer storage, which becomes much too costly for the larger systems of equations generated by large circuits. Using a direct method, with no round-off or other errors, the exact solution is obtained after a finite number of operations. When round-off errors arise and large systems of equations are being solved, the errors grow and can make the results unacceptable. All the steps in a direct method such as Gaussian elimination must be completed before a solution is obtained; the algorithm cannot be stopped early with a "less accurate" solution. Further, if the matrix has a special structure, it is not always preserved by a direct method; for example, a sparse matrix fills in during Gaussian elimination.

An alternative approach is to use a non-direct (iterative) method. An iterative method starts with an approximate solution and uses a recurrence formula to calculate another approximate solution; the formula is applied repeatedly until a solution is deemed "good enough." Although the iterative methods can solve the system of equations in a finite number of steps, they can be stopped before they reach the exact answer. Their cost in computer operations and storage can be much lower than the cost of direct methods, which involve O(n^2) storage and O(n^3) flops.
Depending on the iterative method used and the accuracy desired, the cost can be lowered to O(n). Further, iterative methods are less sensitive to the propagation of errors [Greenbaum97]. The two types of iterative method are stationary and non-stationary. The stationary methods are efficient for diagonally dominant coefficient matrices, a property not generally found in the coefficient matrices arising in steady-state determination. A more attractive alternative is the Krylov-subspace methods [Saad86, Vuik92]: CGS, BiCG, QMR, GMRES, and BiCGSTAB, discussed above. The computational aspects of the Krylov-subspace methods are presented in Table 3.1. GMRES has the fewest operations when the number of iterations is low. QMR and BiCG need the transpose of the matrix, while CGS and BiCGSTAB do not, but require two multiplications by the original matrix. The storage of GMRES increases with the iteration number because the orthogonal vectors must be stored. The convergence behavior varies among the Krylov methods. For some methods, such as CGS and BiCG, the residual is not reduced at each step of the iteration. The GMRES, QMR, and BiCGSTAB algorithms can show fast convergence. CGS can also converge quickly, but its convergence can be very irregular.

METHOD     INNER PRODUCTS   SAXPYS   MATRIX-VECTOR PRODUCTS   STORAGE REQUIREMENT
CG         2                3        1                        Matrix + 6n
GMRES      k+1              k+1      1                        Matrix + (k+1)n
BICG       2                5        2                        Matrix + 10n
QMR        2                8+4      2                        Matrix + 16n
CGS        2                6        2                        Matrix + 11n
BICGSTAB   2                6        2                        Matrix + 10n

Table 3.1 Computational aspects of Krylov-subspace methods.

The Biconjugate Gradient (BiCG) method generates two sequences of vectors that are made mutually orthogonal (i.e., biorthogonal). One of the sequences is generated with the original coefficient matrix, while the other is generated with the transpose of that matrix.
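The trade-offs in Table 3.1 can be observed directly by running several of the solvers on one test system; the sketch below uses SciPy's iterative solvers on an illustrative sparse non-symmetric matrix (an arbitrary diagonally dominant example, not one of the circuit matrices discussed in this work):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres, bicg, qmr, cgs, bicgstab

# Sparse, non-symmetric tridiagonal test matrix (illustrative only)
n = 500
A = diags([-1.3, 4.0, -0.7], offsets=[-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)

# Run each Krylov solver with default tolerances and record the
# convergence flag (0 means converged) and the true residual norm.
results = {}
for name, solver in [("GMRES", gmres), ("BiCG", bicg), ("QMR", qmr),
                     ("CGS", cgs), ("BiCGSTAB", bicgstab)]:
    x, info = solver(A, b)
    results[name] = (info, np.linalg.norm(A @ x - b))
```

On a well-conditioned matrix like this one all five methods converge; the differences the table catalogs (transpose products, storage growth, erratic residuals) show up on harder, less dominant systems.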
BiCG uses much less storage than GMRES, but its convergence can be irregular; it does not minimize the residual or the error at each iteration as GMRES does. BiCG also needs two matrix-vector products at each iteration. The Quasi-Minimal Residual (QMR) method applies a least-squares solution and update to the BiCG residuals. QMR uses less storage than GMRES; it does not minimize the residual or error, but minimizes a different, data-dependent quantity that is usually not far from the residual. Typically, QMR has better convergence than BiCG, and it requires matrix-vector multiplications with both the original matrix and its transpose at each iteration step. The Conjugate Gradient Squared (CGS) algorithm is used to solve systems of equations whose coefficient matrix is not required to be symmetric positive definite, as it is in the CG algorithm. Although CGS is derived from the BiCG method, it does not need the transpose of the coefficient matrix. The CGS method is based on the generation of a sequence of iterates whose residuals r'_j are formed by squaring the residual polynomial (3.56). The CGS method does not require a matrix-vector product with the transpose of the coefficient matrix; it still requires two matrix-vector products per iteration step, as in BiCG, but both products are with the original coefficient matrix. Because of the squaring of the residual polynomial in (3.56), convergence with CGS is twice as fast (or twice as slow) as with BiCG. The residual or error is not minimized by this method and, therefore, the convergence can be very erratic. The BiCGSTAB method is similar to CGS in that it is a transpose-free variant of the BiCG method.
In the CGS method the residual polynomial is squared, while in BiCGSTAB the residual is the product of an arbitrary smoothing polynomial and the BiCG residual polynomial applied to the initial residual (3.78). This new polynomial is defined to smooth out the convergence of the CGS method and describes a steepest-descent update. The GMRES method abandons the tridiagonal matrix for true orthogonalization; because of this, the residual is minimized at each step. It requires only one matrix-vector product per iteration step, but requires the storage of all the orthogonalization vectors up to the present iteration. When the number of iterations is small, this storage is not a problem and the GMRES method is the method of choice.

Chapter 4
Implementation of the Shooting-Newton Algorithm Using Krylov-Subspace Methods

This chapter describes the practical algorithms for the implementation of the shooting-Newton algorithm for steady-state determination using Krylov-subspace methods. These algorithms have been implemented and tested in SPICE 3f5, the most popular analog integrated-circuit simulation program. Two main aspects of implementation are discussed below, along with the implementation details for the Jacobian needed for the shooting-Newton iteration and the use of the Krylov-subspace methods for the solution of the shooting-Newton iteration. The addition of an analysis to SPICE is well documented [Nagle78, Ashar89, Quarles89c, Quarles89d, Cohen] and is expedited by the modular structure of the SPICE code. The implementation of the shooting-Newton algorithm depends on the efficient formation of the Jacobian, also called the sensitivity matrix. Using quantities already available during transient analysis, the matrix can be formed easily [Colon73].
The structure of that matrix, along with its eigenvalues, can give insight into the nature of the nonlinear system and the ease of solution. The integration method used during transient analysis also impacts the steady-state determination; the importance of using a method like the trapezoidal method is shown. Another important implementation issue is the use of the Krylov-subspace methods themselves, including stopping criteria, robustness, breakdowns, non-convergence, and memory and operation counts.

4.1 Implementation in SPICE

The implementation in SPICE is relatively straightforward. An augmented version of SPICE 3f5 was created in which algorithms for the computation of the periodic steady state, including Krylov-subspace methods for the solution of the iterate, are implemented. The new analysis method is modeled after earlier work [Ashar89]. When using the steady-state analysis, one can specify the type of circuit (autonomous or nonautonomous), the time period, the minimum number of time points per period of transient analysis, and the desired accuracy with which the convergence to steady state is tested. When the steady-state analysis is called, an initialization routine sets up the data structures and initializes the variables used in the analysis. The user may specify the minimum number of time points per simulation period, the accuracy desired, the amount of damping (if any) of the Jacobian, and the method of solution of the iterate: Gaussian elimination, BiCG, QMR, BiCGSTAB, CGS, or GMRES. The program then performs a few periods of transient analysis so that the initial transients can die down; four periods is sufficient for the examples used. This also allows observation of the transient response for chaotic or unstable behavior. Next, the error, the difference between the states of the capacitors and inductors at the beginning and end of the period, is calculated.
PSEUDOCODE FOR IMPLEMENTATION IN SPICE

1.  Initialize and set up data structures; run four periods of transient analysis
2.  For j = 1 ... maximum iterations
3.    if (CalculateSensitivityMatrix = TRUE)
4.      do one period of transient analysis
5.      calculate the sensitivity matrix at the same time
6.    else
7.      do one period of transient analysis
8.    Compute error and determine if in steady state
9.    if (Check = TRUE)
10.     if (steady state) return
11.     else if (error > threshold)
12.       Check = FALSE
13.       CalculateSensitivityMatrix = FALSE
14.     else if (error < threshold)
15.       calculate new initial state using state-transition function and Jacobian
16.       apply the shooting method for the calculation
17.       use one of the Krylov methods to solve the iterate
18.       CalculateSensitivityMatrix = FALSE
19.   if (Check = FALSE) and (error < threshold)
20.     Check = TRUE
21.     CalculateSensitivityMatrix = TRUE
22. End For loop

If the error is small enough, then the next loop will calculate the sensitivity matrix at the same time the transient analysis is done. Note that the state transition function is merely the result of one period of transient analysis using a specified initial condition. The Jacobian, also called the sensitivity matrix, is computed at the same time using sensitivity circuits. Since its computation requires the same quantities as are generated by the transient analysis, the calculation proceeds step by step with the state transition function. Once the Jacobian and the state transition function are available, the functions that solve for the iterate are called. After the initial conditions are determined (conditions that may put the circuit into steady state), the circuit is simulated for another period and the error is calculated again.
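As an illustration only (not the SPICE 3f5 code), the outer shooting-Newton loop can be sketched in Python for a scalar test equation; the test circuit dx/dt = -x + cos(2*pi*t/T), the step count, and the tolerances are assumed stand-ins:

```python
import math

def transition(x0, T, steps=2000):
    """One period of trapezoidal 'transient analysis' for the scalar test
    equation dx/dt = -x + cos(2*pi*t/T), returning phi(x0) and the
    sensitivity d(phi)/d(x0), accumulated step by step alongside the
    state (a stand-in for the sensitivity-circuit computation)."""
    h = T / steps
    x, sens, t = x0, 1.0, 0.0
    for _ in range(steps):
        f0 = -x + math.cos(2 * math.pi * t / T)
        t += h
        # trapezoidal step: solve x1 = x + h/2 * (f0 + (-x1 + cos(2*pi*t1/T)))
        x = (x + 0.5 * h * (f0 + math.cos(2 * math.pi * t / T))) / (1 + 0.5 * h)
        sens *= (1 - 0.5 * h) / (1 + 0.5 * h)   # d(x1)/d(x0) for this step
    return x, sens

def shooting_newton(T, x0=0.0, tol=1e-10, maxit=20):
    """Newton iteration on g(x0) = phi(x0) - x0."""
    for _ in range(maxit):
        phi, J = transition(x0, T)
        err = phi - x0
        if abs(err) < tol:
            break
        x0 = x0 - err / (J - 1.0)   # solve (J - 1) * dx = -err
    return x0
```

Because this test equation is linear, the Newton iteration converges in a single step; the point of the sketch is only the structure of the loop: one period of integration yields both the state transition function and, step by step, its sensitivity.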
The code then checks whether the steady-state response has been reached, whether the error is too large to calculate a new iterate for a new initial guess, or whether the circuit should be simulated for another period and the error checked again. This loop continues until the iteration limit is exceeded or the circuit is in steady state.

4.2 The Jacobian matrix for the shooting-Newton method

Calculation of the Jacobian matrix, also known as the sensitivity matrix, is done using the efficient method described in Chapter 2. It is calculated step by step along with the transient analysis because the quantities used for Jacobian formation are generated by the transient analysis. After one period of simulation, the state transition function and the Jacobian have been formed. One can then solve for the iterate in equation (3.13) and determine whether another iteration should be done. Although this method is more efficient than using the integration method, the cost of forming the sensitivity matrix for the shooting-Newton iteration is still O(n^3).

4.2.1 Minimizing the number of sensitivity matrix formations

Since the formation of the sensitivity matrix is so computationally expensive, practical implementations must ensure that the sensitivity matrix is computed only when needed [Colon73]. The shooting-Newton algorithm has quadratic convergence if the initial guess is close enough to the solution. If the initial guess is not close to the solution, then overshoot may occur, or, if many solutions exist, the method can find the wrong one. The shooting-Newton algorithm implemented in SPICE has strong convergence properties because it is based on transient analysis, which can choose nonuniform time steps to control the error and hence the convergence [Kundert90]. This shields the outer iteration from the non-linearity of the problem. To ensure that the guess is close to the solution, a number of practical adaptations of the shooting-Newton algorithm are made.

First, the transient analysis is run for four periods before the difference between cycles is checked. This allows fast transients to die off. It also allows observation of the transient response for chaotic or unstable behavior. If the error is small enough, then the sensitivity matrix is calculated during the next transient analysis cycle along with the state transition function. The iteration of the shooting-Newton method can then proceed, with the error being checked and the transient analysis and sensitivity calculations repeated where dictated.

Second, the sensitivity matrix is only formed when the error between transient analysis cycles is smaller than a threshold value. The transient analysis continues until the threshold value between cycles is reached, and then, during the next transient analysis cycle, the sensitivity matrix is computed at each time step. The shooting-Newton method is then used to calculate a new set of initial values, which is checked by another transient analysis cycle to see if the circuit is in steady state.

4.2.2 Damping the sensitivity matrix

Another important practical feature of the shooting-Newton method in SPICE is damping of the shooting-Newton Jacobian so that a guess cannot overshoot the solution. Damping reduces the effect of the Jacobian on the iteration [Fan75, Colon73]. The damping coefficient η is calculated by equation (4.1), where m is user specified and e is an error measure defined as:

e = max( || φ(x(t0), t0, T) - x(t0) || )    (4.2)

The error measure gives an indication of how close the initial condition guess is to the steady-state response.
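Equation (4.1) did not survive reproduction in this copy, so the sketch below uses an assumed illustrative form, eta = 1/(1 + m*e). This is not the formula from [Fan75, Colon73], but it has the limiting behavior described in the text: a full Newton step for small error and a heavily damped step for large error.

```python
def damping_coefficient(m, err):
    # Assumed illustrative form (the actual equation (4.1) is not legible
    # in this copy): eta -> 1 as the error gets small (full Newton step),
    # eta -> 0 as the error grows (heavily damped step).
    return 1.0 / (1.0 + m * err)

def damped_shooting_update(x0, phi, J, m):
    # Scalar damped shooting-Newton update on g(x) = phi(x) - x.
    err = abs(phi - x0)
    eta = damping_coefficient(m, err)
    return x0 - eta * (phi - x0) / (J - 1.0)
```

With m = 0 the damping is disabled and the update reduces to the undamped Newton step.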
Inspection of equations (4.1) and (4.2) shows that with a large error the effect of the Jacobian is small, and with a small error the full Jacobian is used in the calculation of the iterate.

4.2.3 Importance of the integration method

An important aspect of the transient analysis and the shooting-Newton algorithm is that the method of integration determines the accuracy of the solution. If backward or forward Euler is used, the error can be unacceptable [Nagle75]. Therefore, the implementation uses the trapezoidal integration method exclusively for the transient analysis and the shooting-Newton method. The user may also set the minimum number of points to be simulated in each period. This sets the maximum time step, which can aid convergence and accuracy. Setting the time step too small can cause convergence problems because of the chance of trying to analyze a discontinuous region.

4.2.4 Eigenvalues and sparsity pattern of the Jacobian

The structure and eigenvalues of the sensitivity matrix can give insight into the difficulty involved in solving for the iterate formed at each shooting-Newton application. Sparsity patterns for many circuits were examined (Figure 4.1 and Figure 4.2).

Figure 4.1 Sparsity pattern of the sensitivity matrix for an example circuit of a multiplier.

As can be seen from the examples in Figure 4.1 and Figure 4.2, there are a variety of nonzero patterns for the circuits examined. The pattern of the sensitivity matrix is repeated for each iteration of the shooting-Newton method. Techniques to use this repeated pattern for faster matrix-vector multiplications were implemented, with the matrix entries stored in the sparse matrix format. Eigenvalues were determined for the sensitivity matrices arising from some of the typical circuits that benefit from the steady-state shooting-Newton analysis.
The eigenvalues of these matrices were found to have negative real parts (Figures 4.3 and 4.4).

Figure 4.2 Sparsity pattern of the sensitivity matrix for a buck converter (nz = 474).

Figure 4.3 describes a circuit with eleven states and a sensitivity matrix of dimension 11x11. In Figure 4.4, the buck converter has a 32x32 sensitivity matrix. The Krylov-subspace solution exists in a lower-dimensional space if the minimal polynomial of A has low degree. The minimal polynomial q(t) of A is constructed from the eigenvalues of A [Ipsen98]. The smaller the degree of the minimal polynomial, the smaller the dimension needed for the description of the inverse of A, and therefore the faster the method will converge.

Figure 4.3 Eigenvalue spectrum of the sensitivity matrix for a Class C amplifier (11 states).
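The effect of the minimal polynomial on Krylov convergence can be checked with a small numerical experiment. The sketch below is an illustrative unrestarted GMRES with modified Gram-Schmidt written in Python with NumPy (not the SPICE implementation); the 30x30 test matrix has only three distinct eigenvalues, so its minimal polynomial has degree three and GMRES converges in at most three iterations:

```python
import numpy as np

def gmres(A, b, tol=1e-10):
    """Plain (unrestarted) GMRES with modified Gram-Schmidt; returns the
    approximate solution and the number of iterations used."""
    n = len(b)
    Q = np.zeros((n, n + 1))
    H = np.zeros((n + 1, n))
    beta = np.linalg.norm(b)
    Q[:, 0] = b / beta
    y = np.zeros(0)
    for k in range(n):
        v = A @ Q[:, k]
        for i in range(k + 1):                 # modified Gram-Schmidt
            H[i, k] = Q[:, i] @ v
            v = v - H[i, k] * Q[:, i]
        H[k + 1, k] = np.linalg.norm(v)
        rhs = np.zeros(k + 2)
        rhs[0] = beta
        y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], rhs, rcond=None)
        res = np.linalg.norm(H[:k + 2, :k + 1] @ y - rhs)
        if res < tol or H[k + 1, k] < 1e-14:   # converged or invariant subspace
            return Q[:, :k + 1] @ y, k + 1
        Q[:, k + 1] = v / H[k + 1, k]
    return Q[:, :n] @ y, n

rng = np.random.default_rng(0)
eigs = np.array([1.0] * 10 + [2.0] * 10 + [3.0] * 10)   # 3 distinct eigenvalues
A = np.diag(eigs)
b = rng.standard_normal(30)
x, iters = gmres(A, b)
```

With only three distinct eigenvalues, the exact solution already lies in the three-dimensional Krylov subspace, which is the behavior the minimal-polynomial argument predicts.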
For the limited number of examples examined, the clustered spectrum of the eigenvalues indicates that a preconditioner is not needed to aid convergence. The well-behaved eigenvalues result from the shooting-Newton method being a two-level Newton algorithm: the variable time step in the inner Newton iteration protects the outer shooting-Newton iteration from the nonlinearities present in the original circuit. The circuit looks almost linear and, therefore, the shooting-Newton iteration converges very rapidly.

Figure 4.4 Eigenvalue spectrum of the sensitivity matrix for the buck converter (32 states).

4.2.5 Implicit matrix options

The sensitivity matrix is generally dense. A sparse matrix structure would make its formation cost less. An alternative method of solution is to not form the sensitivity matrix at all, but to store all the components needed for its formation: the capacitance values and the factored transient analysis Jacobian [Telichevsky96a]. Then, when the solution of the iterate is carried out, the required matrix-vector multiplications are done with the same sparse method used in transient analysis. This can reduce the operation count to O(kn), where k is the number of the iterate in the sequence, instead of O(n^2). One drawback of the method, however, is that all the needed items must be stored for each time step, and the amount of storage can be prohibitive.
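The matrix-implicit idea can be sketched as follows: the sensitivity matrix J is kept as a product of per-time-step factors and applied to a vector factor by factor, so J itself is never formed. The random dense factors below are stand-ins for the real per-step operations (scaling by the capacitances and backsolving with the factored transient Jacobian):

```python
import numpy as np

def implicit_sensitivity_matvec(step_factors, v):
    """Apply J = F_M ... F_2 F_1 to v without forming J: multiply the
    vector through the stored per-time-step factors in order."""
    for F in step_factors:          # F_1 first, F_M last
        v = F @ v
    return v

rng = np.random.default_rng(1)
factors = [rng.standard_normal((4, 4)) for _ in range(5)]
v = rng.standard_normal(4)

# Same result as forming the dense product J = F_5 F_4 ... F_1 explicitly:
J = np.linalg.multi_dot(factors[::-1])
direct = J @ v
implicit = implicit_sensitivity_matvec(factors, v)
```

When each factor application is sparse, the cost per product is linear in n, at the price of storing the factors for every time step, which is the storage drawback noted above.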
For the shooting-Newton implementation here, the sensitivity matrix is calculated at each transient analysis time step.

4.3 Krylov-subspace linear solver optimizations

A Krylov-subspace linear solver can be implemented with a relatively small amount of code; what is really needed is an efficient implementation. The main concerns of an efficient implementation are the stopping criteria for ending the iteration and the issues around convergence and non-convergence within the maximum number of iterations. It is also important to handle breakdowns of a method, if they occur. An iterative method can also stagnate, with two successive iterations the same; this must be dealt with or no progress will be made. The computational issues of memory usage and operation counts must also be addressed. This section considers these issues to arrive at an efficient implementation of the Krylov-subspace iterative methods for the determination of the steady-state response of integrated circuits with periodic inputs.

4.3.1 Stopping criterion

The choice of an adequate stopping criterion for the Krylov-subspace methods is an important aspect of efficient implementation. A good stopping criterion should [Barrett94, Green, Kelley] identify when the error is small enough to stop. It should also stop if the error is no longer decreasing, or is decreasing too slowly. Lastly, the number of iterations should be limited, for there is no guarantee that the Newton method will converge. In fact, the shooting-Newton method is only locally convergent and, in general, its convergence depends on the initial guess being close to the solution. Here, the decision whether an iterate x_n is close enough to the solution is based on the residual:

r_n = b - A x_n.    (4.3)

Unfortunately, QMR calculates only the quasi-residual.
The real residual shown in equation (4.3) must therefore be calculated separately to test for convergence, which adds another matrix-vector multiplication to the algorithm. An efficient implementation calculates the real residual only when the quasi-residual is smaller than a specified value. For all the Krylov-subspace methods implemented (BiCG, CGS, BiCGSTAB, QMR, and GMRES), the iteration is ended when the real residual is smaller than the user-specified tolerance. Importantly, for GMRES, the iterate itself is only calculated when the residual is less than the specified value, which saves the expensive operations involved in its calculation.

4.3.2 Maximum number of iterations

Another stopping criterion for the Krylov-subspace methods is a restriction on the maximum number of iterations. This was set to n, the dimension of the n x n matrix. The GMRES algorithm has been shown [Saad86] to converge in at most n steps. Since the Krylov methods were being compared, the worst case of n iterations for an n x n matrix was allowed. When restarts were implemented, the maximum number of iterations was reset to n.

4.3.3 Breakdown of the method

BiCG, CGS, BiCGSTAB, and QMR all have the possibility of a breakdown of the method. This means that some scalar quantity has been calculated that is either very close to zero or extremely large. Looking at the pseudocode for BiCGSTAB (from Chapter 2), the breakdowns occur because of a division by zero.

BiCGSTAB algorithm
1.  Compute r0 := b - A x0; r~0 arbitrary
2.  p0 := r0
3.  For j = 0, 1, ..., until convergence Do:
4.    alpha_j := (r_j, r~0) / (A p_j, r~0)
5.    s_j := r_j - alpha_j A p_j
6.    omega_j := (A s_j, s_j) / (A s_j, A s_j)
7.    x_{j+1} := x_j + alpha_j p_j + omega_j s_j
8.    r_{j+1} := s_j - omega_j A s_j
9.    beta_j := (r_{j+1}, r~0) / (r_j, r~0) * (alpha_j / omega_j)
10.   p_{j+1} := r_{j+1} + beta_j (p_j - omega_j A p_j)
11. EndDo
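For reference, the BiCGSTAB algorithm can be written as a short dense-matrix sketch in Python with NumPy (a toy version, not the SPICE implementation; the well-conditioned random test matrix is an assumption of the example). The two divisions flagged in the comments are the breakdown points discussed in the text:

```python
import numpy as np

def bicgstab(A, b, x0=None, tol=1e-10, maxit=200):
    """Toy dense BiCGSTAB; no restart or breakdown handling."""
    x = np.zeros_like(b, dtype=float) if x0 is None else np.array(x0, dtype=float)
    r = b - A @ x
    r_hat = r.copy()     # shadow residual r~0 (arbitrary; r0 is the usual choice)
    p = r.copy()
    for _ in range(maxit):
        Ap = A @ p
        alpha = (r @ r_hat) / (Ap @ r_hat)    # breakdown if (A p, r~0) ~ 0
        s = r - alpha * Ap
        As = A @ s
        omega = (As @ s) / (As @ As)          # breakdown if omega ~ 0
        x = x + alpha * p + omega * s
        r_new = s - omega * As
        if np.linalg.norm(r_new) < tol:
            return x
        beta = (r_new @ r_hat) / (r @ r_hat) * (alpha / omega)
        r = r_new
        p = r + beta * (p - omega * Ap)
    return x

rng = np.random.default_rng(2)
n = 20
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
b = rng.standard_normal(n)
x = bicgstab(A, b)
```

A production version would guard the two marked divisions and restart (possibly with a nonzero initial guess, as described below for the SPICE implementation) when either denominator is near zero.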
There are four types of breakdowns [Greenbaum97]. The first is due to an invariant subspace: the Krylov space K_m(A^T, r~0) is A^T-invariant, so that the new dual residual r~_j in the method is a linear combination of all the previous dual residuals, and thus orthogonal to r_j; the denominator of the line search parameter alpha_j is then zero. The second breakdown occurs because the dual residuals are orthogonal: the biorthogonal methods break down at a division by zero in the computation of the biorthogonalization scalar beta_j. The other two types of breakdown occur when no one-dimensional or two-dimensional minimization is needed and the line search parameter omega_j or alpha_j is zero.

The breakdowns for the examples in this dissertation presented with omega_j = 0. The breakdown was avoided in these experiments by using an initial guess not equal to zero. Once the different initial guess was implemented, the methods did not need the available restart; usually, such a restart will allow the algorithm to converge. BiCGSTAB, if interpreted as the product of BiCG and repeated GMRES, is subject to the same kind of breakdown as BiCG. CGS has the same potential for breakdowns as BiCG, and the method is restarted with the current approximation. Full QMR employs look-ahead strategies to avoid the Lanczos process breakdowns; a simpler version of QMR was implemented here and is, therefore, still susceptible to the Lanczos process breakdown. Again, the algorithm is restarted and usually converges. Another fix implemented in BiCGSTAB was a restart with an initial guess that was not the usual zero vector; on simulations where the breakdown of the method was occurring, this nonzero initial guess allowed the iteration to continue to convergence.

GMRES is not subject to the kind of breakdowns that plague the bi-orthogonal methods BiCG, CGS, BiCGSTAB, and QMR. It does, however, have a potential problem with the
orthogonalization vectors, which may become non-orthogonal because of accumulated roundoff errors [Kelley]. If this happens, the residual, and hence the solution, could be inaccurate. To remedy the problem, the modified Gram-Schmidt variant is implemented. Unrestarted GMRES was used.

4.3.4 Computational issues

The most time-consuming operation in the Krylov iterative methods is the matrix-vector multiplication. The sensitivity matrix is multiplied with a vector once for each GMRES iteration. The other Krylov-subspace methods require two matrix-vector multiplications per iteration, so the matrix-vector multiplication implementation takes on an even greater importance. The most efficient procedure would take advantage of some special structure of the sensitivity matrix; unfortunately, the specific structure of the sensitivity matrix varies and is generally unpredictable. Although the sensitivity matrices do not generally exhibit a sparse structure, some do. It was expected that the added computational cost of representing the dense matrices in a sparse format would be offset by the savings for the circuits that did produce a sparse matrix structure. Some of the tested circuits revealed a sparse structured Jacobian during the shooting-Newton method.

A common format to handle sparse matrices is the compressed row storage (CRS) scheme [Barrett94], which was implemented in the enhanced steady-state version of SPICE. The compressed row storage format makes no assumption about the sparsity structure of the matrix. No zeros are stored; instead, pointers are used to address the values needed for the matrix-vector multiplication. The nonzero entries of each row are in contiguous memory cells, and three one-dimensional arrays are created. One array
contains the nonzero values of the matrix, one array contains the integer column indices of these values, and one array of pointers indicates the beginning of each row of the original matrix. The pseudocode for this kind of matrix-vector product, c = A x for an n x n matrix, is shown below:

For i = 1, ..., n
  c(i) = 0
  For k = row_pointer(i), ..., row_pointer(i+1) - 1
    c(i) = c(i) + value(k) * x(col_index(k))

BiCG and QMR also need a multiplication with the transpose of the sensitivity matrix, d = A^T x, and the pseudocode is shown below:

For i = 1, ..., n
  d(i) = 0
For i = 1, ..., n
  For k = row_pointer(i), ..., row_pointer(i+1) - 1
    d(col_index(k)) = d(col_index(k)) + value(k) * x(i)

For both matrix-vector multiplications, the storage is much less than the n^2 locations required for the full matrix: the compressed row storage format needs only 2 * (# nonzeros) + n + 1 storage locations. This format also reduces the 2n^2 operation count of the dense matrix-vector product to roughly 2 * (# nonzeros), because there are no multiplications by zero elements.

However, the sensitivity matrix is not always sparse (see Figures 4.1 and 4.2). Implementing the sparse data structure involves creating indices to identify the locations of the nonzero entries for use in the shooting-Newton iterate matrix multiplication. For matrices that are not sparse, this imposes a complicated and much less efficient structure than a simple two-dimensional array. The sparsity of the shooting-Newton sensitivity matrix cannot be predicted; however, the structure stays the same for each shooting-Newton iteration, so the structure frame can be reused.
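The two CRS products can be written as a short runnable sketch (pure Python, square matrices only; the 3x3 example matrix is an arbitrary illustration):

```python
def crs_matvec(value, col_index, row_pointer, x):
    """c = A x for an n-by-n matrix in compressed row storage."""
    n = len(row_pointer) - 1
    c = [0.0] * n
    for i in range(n):
        for k in range(row_pointer[i], row_pointer[i + 1]):
            c[i] += value[k] * x[col_index[k]]
    return c

def crs_matvec_transpose(value, col_index, row_pointer, x):
    """d = A^T x, as needed by BiCG and QMR, from the same three arrays."""
    n = len(row_pointer) - 1
    d = [0.0] * n
    for i in range(n):
        for k in range(row_pointer[i], row_pointer[i + 1]):
            d[col_index[k]] += value[k] * x[i]
    return d

# CRS form of the example matrix A = [[10, 0, 2], [0, 3, 0], [1, 0, 4]]
value = [10.0, 2.0, 3.0, 1.0, 4.0]
col_index = [0, 2, 1, 0, 2]
row_pointer = [0, 2, 3, 5]
```

Note that the transpose product walks the rows of A but scatters into d through the column indices, so both products reuse the same three arrays with no transposed copy of the matrix.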
4.4 Summary

Efficient implementation of the shooting-Newton method for the determination of the steady-state response of integrated circuits with periodic inputs centers on the main time-consuming parts of the algorithm: the formation of the Jacobian for the iterate, and the solution of the iterate so formed. The sensitivity-circuit approach was implemented to form the sensitivity matrix at the same time as the transient analysis. Efficient implementations of the Krylov-subspace methods were also used.

Chapter 5

Results and Comparisons

This chapter presents the results of the implementation of the shooting-Newton method for finding the steady-state response of integrated circuits. The method is tested with four different circuits that have difficulty reaching the steady-state response with traditional methods: a DC power supply, a buck power converter, a Class C amplifier, and a Class AB amplifier. The shooting-Newton method is shown to be more efficient on these circuits than standard transient analysis in SPICE. The different Krylov-subspace iterative methods, including BiCG, CGS, GMRES, QMR, and BiCGSTAB, are compared to standard Gaussian elimination for solution of the iterate generated by the shooting-Newton method. An analysis of the level of accuracy is then given for the Class C amplifier. The chapter closes with a detailed analysis of the convergence behavior of the different Krylov methods for the Class C amplifier example.

The shooting-Newton method for determining the steady-state response of the simple DC supply circuit [Skelboe80] in Figure 5.1 is both efficient and accurate. The SPICE input file, along with the steady-state commands used for the simulation, is shown in Figure 5.2.
Figure 5.1 DC supply circuit.

5.1 DC supply test circuit

For this example, Gaussian elimination or one of the Krylov-subspace methods (GMRES, CGS, BiCG, or BiCGSTAB) was used to solve the system of equations generated by the shooting-Newton method. The standard SPICE transient analysis was run until steady state was achieved and all the transients had died out. Table 5.1 compares the capacitor voltage and inductor current values of the shooting-Newton variants in steady state to the values from transient analysis. The data reveal that the shooting-Newton methods and the transient analysis agree to at least four digits. Plots of the same variables from the transient analysis run in SPICE are shown in Figure 5.3. Two cycles of the transient analysis, along with plots of the Krylov-subspace variants, are shown in Figure 5.4. The results for the Krylov-subspace methods are almost identical to the transient analysis output, showing that the non-stationary iterative methods obtain initial conditions for the state variables that put the circuit directly into steady state. Table 5.2 shows the amount of time saved when the direct shooting-Newton method is used for obtaining the solution.

DC power supply from Skelboe's paper
vin 1 0 sin(0 20 50)
d1 1 2 mod1
.model mod1 d (is=1e-16 cjo=2pf)
c1 1 2 1uF
c2 3 0 1mF
c3 4 0 1mF
l1 3 4 0.1H
r1 2 3 5
r2 4 0 1k
*commands for steady-state analysis:
*steady ge non_auto 20ms 50 .01 uic 10.0
*steady gmres non_auto 20ms 50 .01 uic 3.0
*steady bicgstab non_auto 20ms 50 .01 uic 10.0
*steady cgs non_auto 20ms 50 .01 uic 10.0
.end

Figure 5.2 SPICE input file with steady-state commands.
METHOD                 V(C3) volts   V(C2) volts   V(C1) volts   I(L1) amps
Transient analysis     17.87730      17.75859      -17.79250     .01765934
Gaussian elimination   17.86965      17.74859      -17.77805     .01779229
BiCG                   17.86965      17.74859      -17.77805     .01779229
BiCGSTAB               17.86965      17.74859      -17.77805     .01779229
CGS                    17.86965      17.74859      -17.77805     .01779229
GMRES                  17.86965      17.74859      -17.77805     .01779229

Table 5.1 Steady-state initial conditions for the DC power supply example using different Krylov methods inside the shooting method. (The transient analysis values are at 1.98 s.)

Figure 5.3 Transient analysis plots for the DC power supply circuit.

With a small circuit such as the DC power supply, the Krylov-subspace methods do not show significant improvement over the regular Gaussian elimination method for solving the Newton-Raphson iterate. Importantly, the direct shooting-Newton method is much more efficient than standard transient analysis in SPICE. Because of the excellent convergence properties of the shooting-Newton method when close to the solution, higher accuracy is easily obtained at little computational cost.

METHOD                 TIME (s) TO REACH   SHOOTING-NEWTON   TOTAL TRANSIENT
                       STEADY STATE        ITERATIONS        ANALYSIS CYCLES
Transient analysis     .1833               -                 67
Gaussian elimination   .06667              2                 19
GMRES                  .0500               2                 19
CGS                    .06667              2                 19
BiCG                   .050                2                 19
BiCGSTAB               .1000               2                 19

Table 5.2 Time to calculate steady state for the DC power supply. The size of the sensitivity matrix is 4 x 4.
Figure 5.4 Two periods of the transient analysis when the DC supply circuit is in steady state. The triangle markers indicate the Krylov-subspace solutions.

Figure 5.5 Circuit diagram for the buck converter.

5.2 Buck converter

A buck converter, shown in Figure 5.5, is more complex than the DC supply circuit [Trajkovic98, Lau97]. This circuit's large number of inductors and capacitors makes it very difficult to simulate the steady-state response using standard SPICE transient analysis. Buck converters are used in many communications systems, including cellular phones and PDAs. The buck converter includes thirty-two state variables (capacitors and inductors). Figure 5.6 shows the SPICE output for a GMRES run. Values for two of the more interesting state variables for transient and iterative analysis are given in Table 5.3.

METHOD                 V(cf) volts   I(lf) amps
Transient analysis     3.367924      0.7085792
Gaussian elimination   3.364743      0.7030570
BiCG                   3.364743      0.703057
BiCGSTAB               3.370084      0.7024051
CGS                    3.35289       0.70974
GMRES                  3.350897      0.702582

Table 5.3 Steady-state initial conditions for the buck converter example using different Krylov methods inside the shooting-Newton algorithm.

The accuracy of the voltages and currents is within .001 percent. Figure 5.7 provides plots of two of the state variables, the voltage across the capacitor Cf and the current through the inductor Lf, from a transient analysis.
It is important to note that the transient responses die off slowly before a steady-state response is produced. Figure 5.8 shows two periods of the steady-state waveforms obtained from the transient analysis (the Krylov-subspace results are marked with triangles). The plots reveal that the shooting-Newton method is as accurate as the transient analysis in SPICE. Table 5.4 shows that the shooting-Newton algorithm takes significantly less time to calculate the steady state than the transient analysis found in standard SPICE. To solve for the steady state, transient analysis requires over 2000 cycles, while the Krylov methods use fewer than 100, with GMRES and BiCG being the most efficient. The sensitivity matrix used in the solution of the shooting-Newton iterate was sparse (Figure 4.2), and the eigenvalues are clustered (Figure 4.4). The biorthogonal Krylov methods CGS, BiCG, and BiCGSTAB all had breakdowns in their solution, which were fixed by restarting the iteration with an initial guess of 0.1 instead of the usual initial guess of zero. Without this adaptation these methods took over twice the amount of time to calculate a solution.

Spice 1 -> steady gmres non_auto 500ns 100 .01 uic 10.0
Warning: vsq: no DC value, transient time 0 value used
Using gmres algorithm.
this is the value of start_time : 0.000000
the ckt has 32 states.
The Steady-State has been reached
Values of the state variables in the circuit are the following
cf : 3.350897e+00 Volts
cx : 3.807514e+00 Volts
c:u9:inv:out : 4.999996e+00 Volts
c:u8:inv:out : -6.047986e-04 Volts
c:u7:b2:out : 1.582466e-06 Volts
c:u7:a2:out : 1.956966e-06 Volts
c:u7:b1:out : 4.999996e+00 Volts
c:u7:a1:out : 4.999996e+00 Volts
c:u712:inv:out : 4.999997e+00 Volts
cout71 : 3.224653e-06 Volts
c:u71:inv:out : 3.224653e-06 Volts
c:u6:b2:out : 2.999999e+00 Volts
c:u6:a2:out : 2.999999e+00 Volts
c:u6:b1:out : -9.999978e-01 Volts
c:u6:a1:out : -9.999978e-01 Volts
c:u612:inv:out : 1.689046e-06 Volts
cout61 : 4.999998e+00 Volts
c:u61:inv:out : 4.999998e+00 Volts
c:u5:inv:out : 2.762276e-06 Volts
c:u4:inv:out : 4.999125e+00 Volts
c5 : 7.278655e-01 Volts
c:u3:inv:out : 1.110857e-02 Volts
c:u2:b2:out : -9.778298e-01 Volts
c:u2:a2:out : 3.996061e+00 Volts
c:u2:b1:out : 1.981769e+00 Volts
c:u2:a1:out : 1.981769e+00 Volts
c:u13:inv:out : 4.971984e+00 Volts
cout12 : 2.398492e-06 Volts
c:u12:inv:out : 2.398492e-06 Volts
cout1 : 4.999998e+00 Volts
c:u1:inv:out : 4.999998e+00 Volts
lf : 7.165829e-01 Amps
The period of waveform: 5.000000e-07
The number of NR iterations required : 8
Number of cycles required : 37
CPU time for the steady-state analysis = 19.966667 sec

Figure 5.6 Results from a GMRES run on the buck converter.

Figure 5.7 Transient analysis plots for two state variables: the voltage across the capacitor Cf and the current through the inductor Lf.

Figure 5.8 Two periods of transient analysis for the buck converter. The Krylov-subspace methods are marked with a triangle.
Method                 Time (s) to reach steady-state   Shooting-Newton iterations   Transient analysis cycles
Transient analysis     350.0                            -                            1750
Gaussian elimination   73.6500                          63                           286
GMRES                  14.5333                          17                           60
CGS                    18.1833                          17                           60
BiCG                   18.5000                          17                           78
BiCGSTAB               18.6833                          19                           58
QMR                    18.3833                          17                           59

Table 5.4 Time to reach steady-state for the buck converter example, to an accuracy of .01. The initial guess for BiCG, BiCGSTAB, and QMR is not zero.

Originally, the Gaussian elimination method did not use pivoting, which resulted in very long simulation times. When pivoting was added, the method was able to converge to the steady-state solution. The close values of the times for the GMRES and Gaussian elimination methods may point to the need for a preconditioner; GMRES took too many steps to converge.

An important problem with the Krylov methods was investigated on the buck converter example. The BiCGSTAB and BiCG algorithms had a breakdown, a divide by zero, during the iterations. The change of initial guess for the iterative method was implemented as described in Chapter 4. Table 5.5 shows the results of the simulation using the Krylov method BiCG for solution inside the shooting-Newton method. The same information is presented in Table 5.6 for BiCGSTAB.

Initial guess (BiCG)   Time (s) to reach steady-state   Shooting-Newton iterations   Transient analysis cycles
0.0                    171.56                           100                          536
0.1                    9.83                             21                           54
0.01                   63.05                            41                           208
0.001                  18.50                            17                           78
0.0001                 79.98                            70                           310

Table 5.5 Different initial guesses for the BiCG method.
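The breakdown and initial-guess behavior above can be illustrated with a small sketch of the unpreconditioned BiCG iteration. The guard on the denominator of the step length and the small nonzero starting vector mirror the adaptation described in Chapter 4; the matrix, right-hand side, and tolerances here are illustrative assumptions, not the buck-converter sensitivity matrix:

```python
import numpy as np

def bicg(A, b, x0, tol=1e-10, maxit=50):
    """Unpreconditioned BiCG; returns (x, 'converged'|'breakdown'|'maxit')."""
    x = x0.astype(float).copy()
    r = b - A @ x
    rt = r.copy()                      # shadow residual, updated with A^T
    p, pt = r.copy(), rt.copy()
    rho = rt @ r
    for _ in range(maxit):
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, "converged"
        q, qt = A @ p, A.T @ pt
        denom = pt @ q                 # denominator of the step length alpha
        if abs(denom) < 1e-30 or abs(rho) < 1e-30:
            return x, "breakdown"      # the divide-by-zero seen in Chapter 5
        alpha = rho / denom
        x += alpha * p
        r -= alpha * q
        rt -= alpha * qt
        rho_new = rt @ r
        beta = rho_new / rho
        p = r + beta * p
        pt = rt + beta * pt
        rho = rho_new
    return x, "maxit"

A = np.array([[4.0, 1.0], [2.0, 3.0]])
b = np.array([1.0, 2.0])
x, status = bicg(A, b, x0=np.full(2, 0.001))   # small nonzero start
```

Whether a breakdown actually occurs depends on the matrix and the starting vector; the guard returns a "breakdown" flag instead of dividing by zero, and the nonzero initial guess changes the residual sequence so that the fatal orthogonality is avoided in practice.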
Initial guess (BiCGSTAB)   Time (s) to reach steady-state   Shooting-Newton iterations   Transient analysis cycles
0.0                        53.28                            40                           189
0.1                        43.25                            34                           178
0.01                       16.40                            15                           67
0.001                      18.68                            13                           83
0.0001                     15.75                            16                           63

Table 5.6 Different initial guesses for the BiCGSTAB method.

The data indicate that an initial guess of .001 allows the methods to converge efficiently to a solution without the previous breakdown problems. The breakdown in both methods occurs because the two residuals calculated in BiCG and BiCGSTAB become orthogonal; the denominator of the line-search parameter then takes the value zero, and this division by zero causes the breakdown of the method.

Figure 5.9 Class AB amplifier [Meyer89].

5.3 Class AB amplifier

To test the validity of the shooting-Newton solutions, a transient analysis was run on a Class AB monolithic power amplifier (Figure 5.9) until the steady-state was reached. The plots from this transient analysis run are shown in Figure 5.10. Table 5.7 presents the values of the capacitor voltage and inductor current from the transient analysis and compares them to the initial values calculated by the shooting-Newton method. The shooting-Newton approach to steady-state determination showed accuracy to seven digits. Table 5.8 presents the time and iterations required. It took only 1 shooting-Newton iteration with 5 periods of transient analysis to reach the steady-state. The standard transient analysis showed
a response with transients dying off very slowly and required over 1000 periods of transient analysis to reach the same accuracy. The accuracy of the steady-state solution compared to standard transient analysis is shown in Figure 5.10. All of the Krylov-subspace methods showed the same improvements in accuracy. Because the circuit has few state variables, only two, the size of the system of equations created for the shooting-Newton iteration is small (2 x 2), and more efficiency from the Krylov methods is not expected or seen for such a small system.

                 CGS        GMRES      Trans. anal.   Gaussian elim.   BiCG       BiCGSTAB
V(Cout) volts    -7.50480   -7.504807  -7.504807      -7.504807        -7.50480   -7.50480
V(Cin) volts     -8.37212   -8.372127  -8.372127      -8.372127        -8.37212   -8.37212

Table 5.7 Steady-state initial conditions for the Class AB amplifier example using different Krylov methods inside the shooting-Newton algorithm.

Method                 Time (s) to reach steady-state   Shooting-Newton iterations   Transient analysis cycles
Transient analysis     2125+                            -                            1000+
Gaussian elimination   0.41667                          1                            5
GMRES                  0.43333                          1                            5
CGS                    0.38333                          1                            5
BiCG                   0.43333                          1                            5
BiCGSTAB               0.41667                          1                            5

Table 5.8 Summary of Krylov-subspace methods and transient analysis results of steady-state determination for a Class AB amplifier.

Figure 5.10 Periodic time-domain response of the Class AB circuit (solution by transient analysis compared with shooting-Newton solutions using Gaussian elimination, GMRES, and BiCG).

Figure 5.11 Class C amplifier [Colon73].
5.4 Class C amplifier

The Class C amplifier shown in Figure 5.11 contains many state variables, so the convergence behavior of the Krylov-subspace methods can be observed. In SPICE 3f5, direct steady-state determination using the shooting-Newton method resulted in significant savings in computational time and resources (Table 5.9). In the traditional SPICE transient analysis, the circuit was simulated for over 3000 cycles and still was not in steady-state. The shooting-Newton methods required only 35 cycles of transient analysis and 26 shooting-Newton iterations. A further advantage of the shooting-Newton method is that it determines the steady-state response accurately (Figure 5.12). The Krylov-subspace methods for solution of the iterate generated by this method were slightly more efficient than the regular Gaussian elimination method (Table 5.9).

The convergence behavior of the solution within the shooting-Newton algorithm is examined for GMRES, CGS, BiCG, BiCGSTAB, and QMR. Plots of the convergence behavior and the data are provided in Figures 5.13-5.17 and Table 5.10, respectively. BiCG, BiCGSTAB, and CGS have somewhat erratic convergence, with the relative residual norm sometimes increasing instead of decreasing at a step; they are nevertheless able to proceed with the next iteration and finally to converge. QMR and GMRES demonstrate monotonic convergence.

Method                 Time (s) to reach steady-state   Shooting-Newton iterations   Transient analysis cycles
Transient analysis     0.24233                          -                            24
Gaussian elimination   0.18333                          4                            14
GMRES                  0.11667                          4                            14
CGS                    0.13333                          4                            14
BiCG                   0.15000                          4                            14
BiCGSTAB               0.15000                          4                            14
QMR                    0.13333                          4                            14

Table 5.9 Summary of Krylov-subspace methods and transient analysis results of steady-state determination for a Class C amplifier. The size of the sensitivity matrix is 11 x 11.
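The monotonic convergence observed for GMRES is a property of the algorithm itself: each iteration minimizes the residual over a growing Krylov subspace, so the relative residual norm can never increase. A minimal full (unrestarted) GMRES sketch that records the residual history (illustrative only; the dense test matrix is an assumption, not the solver used inside SPICE):

```python
import numpy as np

def gmres_residuals(A, b, x0=None, maxit=None):
    """Full (unrestarted) GMRES; returns the relative residual history,
    which is non-increasing by construction."""
    n = len(b)
    maxit = maxit or n
    x0 = np.zeros(n) if x0 is None else x0
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    Q = np.zeros((n, maxit + 1))          # Arnoldi basis of the Krylov space
    H = np.zeros((maxit + 1, maxit))      # upper-Hessenberg projection of A
    Q[:, 0] = r0 / beta
    hist = [1.0]
    for k in range(maxit):
        v = A @ Q[:, k]                   # expand the subspace by one vector
        for j in range(k + 1):
            H[j, k] = Q[:, j] @ v
            v -= H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(v)
        if H[k + 1, k] > 1e-14:
            Q[:, k + 1] = v / H[k + 1, k]
        e1 = np.zeros(k + 2); e1[0] = beta
        # Least-squares step: minimizes the residual over the current subspace.
        y, *_ = np.linalg.lstsq(H[:k + 2, :k + 1], e1, rcond=None)
        hist.append(np.linalg.norm(H[:k + 2, :k + 1] @ y - e1) / beta)
        if H[k + 1, k] <= 1e-14:          # lucky breakdown: exact solution found
            break
    return hist

np.random.seed(0)
A = np.eye(6) * 3 + np.random.rand(6, 6)  # small nonsymmetric example
b = np.random.rand(6)
hist = gmres_residuals(A, b)
```

Plotting `hist` on a log scale reproduces the qualitative shape of Figure 5.16: a strictly non-increasing staircase, in contrast to the erratic BiCG and CGS histories in Table 5.10.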
Iteration   BiCG           BiCGSTAB       CGS            GMRES          QMR
1           1.000000e+00   1.000000e+00   1.000000e+00   1.000000e+00   1.000000e+00
2           2.946002e-01   2.910341e-01   7.255579e-01   2.827648e-01   7.232270e-01
3           2.677884e-01   5.074055e-02   2.103980e-01   1.989180e-01   4.519170e-01
4           6.665791e-02   1.473556e-01   6.425751e-02   4.739450e-02   5.191843e-02
5           1.869852e-01   8.361029e-03   3.061802e-01   4.734953e-02   2.960304e-03
6           7.523472e-03   2.355485e-05   6.346682e-04   9.724925e-04   1.466368e-03
7           5.114317e-05   1.924730e-05   3.709610e-08   7.632758e-07   9.746117e-07
8           1.453166e-08   1.857176e-09   2.569748e-14   1.134104e-09   1.277015e-08
9           4.463113e-10   3.890520e-13   -              -              1.143564e-09

Table 5.10 Relative residual for the Class C amplifier.

Figure 5.12 Periodic time-domain response for the Class C amplifier example (solution by transient analysis compared with shooting-Newton solutions using BiCGSTAB, CGS, and QMR).

Figure 5.13 Convergence behavior of the residual norm of the BiCG solution algorithm as a function of iteration number.

Figure 5.14 Convergence behavior of the residual norm of the BiCGSTAB solution algorithm as a function of iteration number.

Figure 5.15 Convergence behavior of the residual norm of the CGS solution algorithm as a function of iteration number.
Figure 5.16 Convergence behavior of the residual norm of the GMRES solution algorithm as a function of iteration number.

Figure 5.17 Convergence behavior of the residual norm of the QMR solution algorithm as a function of iteration number.

5.5 Comparisons

The shooting-Newton method for the determination of the steady-state response for periodic inputs was implemented in SPICE 3f5. The iterate generated by the method could be solved using standard Gaussian elimination or the Krylov-subspace methods GMRES, BiCG, QMR, CGS, and BiCGSTAB. The method was shown to be efficient and accurate when analyzing circuits that have difficulty reaching steady-state in the standard transient analysis available in SPICE 3f5. This greater efficiency means that the method provides shorter simulation times and requires significantly less computer storage. The Krylov-subspace methods provided even greater computational efficiency. All the Krylov methods are significantly faster than the transient-analysis approach of simulating until all the transients have died out. The Krylov-subspace methods give mixed results in terms of efficiency over the standard solution method of Gaussian elimination. The GMRES, QMR, and BiCGSTAB algorithms show the fastest convergence to solution. The biorthogonal methods BiCG and BiCGSTAB must have a non-zero initial guess in the iteration in order to consistently find a solution. This is because these methods do not reduce the residual at every step and can suffer breakdowns.
The results for the shooting-Newton method of steady-state determination are summarized in Figure 5.18 and Figure 5.19. Using the GMRES, BiCG, QMR, CGS, and BiCGSTAB iterative methods produces the expected significant improvement over regular transient-analysis determination of steady-state. Using the shooting-Newton algorithm for direct calculation of the steady-state response shows improvement over the transient analysis. For small circuits the gains made by using Krylov-subspace methods over Gaussian elimination are not readily apparent; the larger the circuit, however, the more benefit is derived.

Figure 5.18 Comparison of the shooting-Newton method to standard transient analysis in SPICE for determination of steady-state, for the power supply, buck converter, Class AB amplifier, and Class C amplifier examples (G.E. is Gaussian elimination).

Figure 5.19 Comparison of Krylov methods to Gaussian elimination for determination of the steady-state.

Chapter 6
Summary

This dissertation presented an improved and expanded method for determining the steady-state response of analog RF and microwave integrated circuits in the time domain. The approach, which is more efficient than previous methods, used the standard shooting-Newton algorithm to directly determine the steady-state response of a circuit with a periodic input. It is implemented by adding a steady-state analysis option to the popular circuit simulator SPICE. The method also involves the implementation of different Krylov-subspace algorithms that can be used to solve the resulting iterative equations generated by this method. Importantly, the steady-state solver in SPICE is freely available to the general public.
The improved steady-state method was developed to efficiently determine the steady-state response for specific analog and microwave circuits that have difficulty with the standard methods presently available. This class of circuits includes strongly nonlinear communication circuits. Standard simulators, such as SPICE, used transient analysis to determine the steady-state response by simulating until all the transients died out. Unfortunately, for the type of circuits used in communication devices, this takes much too long for a detailed analysis to be made. Harmonic balance methods are widely used for steady-state determination but are not optimal for strongly nonlinear circuits. This dissertation found the shooting-Newton method a good choice for steady-state determination.

The shooting-Newton algorithm is a direct approach that was used to determine the steady-state of integrated circuits. It posed the determination of the steady-state response as a two-point boundary value problem. The solution of the two-point boundary value problem was the set of capacitor voltages and inductor currents that, when used as initial conditions for the circuit being analyzed, resulted in the steady-state response. The calculation involved the solution of an iterative equation generated by applying the Newton-Raphson approach to the nonlinear system. Historically, this system has been solved by methods such as Gaussian elimination, which can use as much as O(n²) computer storage and O(n³) computation cost. This limited the method to relatively small circuits. Implementation of the Krylov-subspace methods was found to produce greater computational efficiency and the ability to simulate larger, more complex circuits.
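As a toy illustration of the two-point boundary value formulation, the sketch below applies a shooting-Newton iteration to a single-state circuit: the state-transition function integrates one period of a driven nonlinear RC equation, and Newton-Raphson solves x(T, x0) - x0 = 0 for the periodic initial condition. The circuit equation, step counts, and finite-difference sensitivity are illustrative assumptions, not the SPICE implementation:

```python
import numpy as np

T = 1.0                                 # period of the drive
def f(x, t):
    # Illustrative nonlinear RC: dx/dt = -g(x) + u(t), with a cubic conductance.
    return -x - 0.5 * x**3 + np.sin(2 * np.pi * t / T)

def propagate(x0, steps=2000):
    """State-transition function: integrate one period starting from x0."""
    x, h = x0, T / steps
    for i in range(steps):
        x = x + h * f(x, i * h)         # forward Euler, adequate for a sketch
    return x

def shooting_newton(x0=0.0, tol=1e-10, maxit=20):
    """Solve x(T, x0) - x0 = 0 for the periodic initial condition."""
    for _ in range(maxit):
        g = propagate(x0) - x0          # periodicity defect
        if abs(g) < tol:
            return x0
        eps = 1e-6                      # finite-difference sensitivity of the map
        dg = (propagate(x0 + eps) - (x0 + eps) - g) / eps
        x0 -= g / dg                    # Newton-Raphson update
    return x0

xs = shooting_newton()
```

A handful of Newton iterations replaces the hundreds or thousands of periods that plain transient analysis would need to let the transients die out; in the full circuit problem the scalar derivative `dg` becomes the sensitivity matrix whose solution dominates the cost.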
Readily available Krylov-subspace algorithms were used to compute the solution so as to take advantage of the particular matrix being solved. Reuse of the structure of the matrix was also implemented to take advantage of any special structure that was present. With the Krylov-subspace methods and special structure, the storage can be reduced to O(n) and the computation cost to O(n²) or even O(n). The shooting-Newton method, along with the other aspects of the approach, was implemented and tested using the most popular analog circuit simulator, SPICE. Quantities computed during the transient analysis in SPICE were reused in the direct calculation of the steady-state response. The shooting-Newton method was efficient for the strongly nonlinear circuits that are often encountered in RF systems.

Results from simulations were compared with standard transient simulation, standard shooting-Newton with Gaussian elimination, and shooting-Newton with solution by the Krylov-subspace methods. The results showed that the shooting-Newton method has considerable time and computation savings compared to the transient analysis method in standard SPICE. Further, the results showed that the Krylov-subspace methods are usually more efficient than solution by Gaussian elimination. The convergence behavior of the different Krylov methods BiCG, CGS, BiCGSTAB, GMRES, and QMR was presented along with matrix characteristics. The sensitivity matrix was shown to be well conditioned and needed few Krylov-subspace iterations to reach solution.

This dissertation presented a more efficient and expanded approach for finding the steady-state solution of certain classes of circuits, including self-biasing amplifiers, narrow-band filters, circuits with high Qs, and lightly damped circuits with long time constants.
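The reduction to O(n) storage rests on the fact that a Krylov method needs only matrix-vector products with the sensitivity matrix, never the matrix itself. One standard way to supply such products is a directional finite difference of the state-transition function, costing one extra transient pass per product. This is a matrix-free sketch under illustrative assumptions; `phi` here is a stand-in linear map, not one period of SPICE transient analysis:

```python
import numpy as np

def jf_matvec(phi, x0, v, eps=1e-7):
    """Approximate (d phi / d x0) @ v by a directional finite difference,
    so a Krylov solver never needs the dense sensitivity matrix."""
    nv = np.linalg.norm(v)
    if nv == 0.0:
        return np.zeros_like(v)
    h = eps / nv
    return (phi(x0 + h * v) - phi(x0)) / h

# Illustrative state-transition map; its Jacobian is the matrix M below.
M = np.array([[0.9, 0.1], [0.0, 0.8]])
phi = lambda x: M @ x
x0 = np.array([1.0, 2.0])
v = np.array([0.3, -0.4])
print(jf_matvec(phi, x0, v))   # approximately M @ v = [0.23, -0.32]
```

Because only vectors of length n are ever stored, the memory cost of the linear solve is O(n) plus the Krylov basis, which is the property exploited above to reach larger circuits.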
However, a drawback of the method is that it cannot include transmission lines, which are critical for microwave and RF applications. The inclusion of transmission lines in this method is a significant direction for future work. Another important direction would be an investigation into the structure and numerical properties of the Jacobian (sensitivity) matrix used during the shooting-Newton method.

Appendix A
SPICE Input Files

DC power supply from skelboe's paper
vin 1 0 sin(0 20 50)
d1 1 2 mod1
.model mod1 d (is=1e-16 cjo=2pf)
c1 1 2 1uF
c2 3 0 1mF
c3 4 0 1mF
l1 3 4 0.1H
r1 2 3 5
r2 4 0 1k
.options reltol=1e-9
.tran 1ms 2
*my attempts
*plot v(4)
*steady horn non_auto 20ms 50 1e-3 duic 1e-2
*plot v(4)
*commands for steady-state analysis:
*steady act non_auto 20ms 50 .01 uic 10.0
*steady ext non_auto 20ms 50 .01 uic 3.0
*steady tran non_auto 20ms 50 .01 uic 10.0
.end

Figure A.1 SPICE input file for the DC Power Supply.
Wai's buck converter with MOSFET switch models
* This model does not have the vco
* Period of input is 500ns
.model d d
.model npn NPN (TR=10p TF=10p)
.model pnp PNP (TR=10p TF=10p)
.model swm1 SW (VT=2.5 RON=1g ROFF=1u)
.model swm2 SW (VT=2.5 RON=1u ROFF=1g)
.model swm12 SW (VT=0.995 RON=1g ROFF=1u)
.model nmos NMOS (LEVEL=1 VTO=0.2 KP=1.20)
.subckt inv in out
xinv in out 0 nand1
.ends inv
.subckt nand1 in outp outn
cout outp outn .01p
rout outp outn 1.0k
b_gn outn outp i=0.0025-0.0015929*atan(300*v(in)-750)
.ends nand1
.subckt and1 in outp outn
cout outp outn .01p
rout outp outn 1.0k
b_gn outn outp i=0.0025+0.0015929*atan(300*v(in)-750)
.ends and1
.subckt nand2 ina inb out
vdd vdd 0 dc 5
xa1 ina vdd out and1
xb1 inb vdd out and1
xa2 ina out nab nand1
xb2 inb nab 0 nand1
.ends nand2
vdd vdd 0 dc 5
vsq sq 0 pulse(0 5 0 2ns 2ns 248ns 500ns)
** narrow pulse gen
xu1 sq out1 inv
cout1 out1 0 5p
xu12 out1 out12 inv
cout12 out12 0 5p
xu13 out12 out13 inv
xu2 sq out13 out2 nand2
* 175ns 1-shot
xu3 out2 out3 inv
q5 vdd out3 e5 npn
r5 out3 e5 253.6k
c5 e5 0 1p
xu4 e5 out4 inv
xu5 out4 out5 inv
* deadtime controllers
xu61 out5 out61 inv
cout61 out61 0 6p
xu612 out61 out612 inv
xu6 out612 out5 out6 nand2
xu71 out4 out71 inv
cout71 out71 0 16p
xu712 out71 out712 inv
xu7 out712 out4 out7 nand2
xu8 out6 ls inv
xu9 out7 hs inv
* buck converter
m_gsw3 vin hs x x nmos L=1 W=1
dsw3 x vin d
m_gsw4 x ls 0 0 nmos L=1 W=1
dsw4 0 x d
cx x 0 400p
lf x out 292n
cf out cf 10u
ref cf 0 10m
rl1 out 0 1k
vin vin 0 dc 8
.tran 1n 10u
.end

Figure A.2 SPICE input file for the Buck converter.

* A DC to 1GHz class AB monolithic power amplifier
* designed by R.G. Meyer, U.C. Berkeley
* Sept. 1988
* input = 101
* output = 100
* vcc = 99
*
* R. G. Meyer and W. D. Mack.
* "A wide-band class AB monolithic power amplifier".
* IEEE Journal of Solid-State Circuits, vol. 24, no. 1, Feb 1989.
.opt acct lvltim=2 itl5=5000
.width out=80
.model nb220ew npn is=3.38e-17 bf=205 nf=0.978 vaf=22
+ ikf=2.05e-2 ise=0 ne=1.5 br=62 nr=1 var=2.2 isc=0 nc=1.5
+ rb=115 rbm=28.3 re=1 rc=30.5 cje=1.08e-13 vje=.995 mje=0.46
+ tf=1e-11 xtf=1 itf=1.5e-2 ptf=0 cjc=2.2e-13 vjc=0.42 mjc=0.22
+ xcjc=0.1 tr=4e-10 cjs=1.29e-13 vjs=0.65 mjs=0.31
+ xtb=1.5 eg=1.232 xti=2.148 fc=0.875
* irb=4.8e-5
* Sources
Vin 102 0 sin 0 1.0 100MegHz
Rin 101 102 50
Rout 100 0 50
Vpos 99 0 9.5
* input and output matching
Cin 101 1 1uF
Cout 100 4 1uF
R3 4 3 35
R4 2 3 525
* Level shifter, input stage, lower path
R13 1 5 35
Q18 5 5 6 0 nb220ew
Q17 6 6 7 0 nb220ew
R12 7 0 2k
R14 1 8 35
Q16 8 7 0 0 nb220ew
* amplifier, input stage, lower path
Q11 99 99 15 0 nb220ew
Q12 Q13 Q14 Q15 R11 Q4 R6 15 14 13 15 14 13 12 11 10 12 10 8 0 9 14 13 12 11 40 9 60 0 0 0 0 nb220ew nb220ew nb220ew nb220ew 0 nb220ew
* amplifier, output stage, lower path
Q2 9 26 3 0 nb220ew
R2 4.5 26
* amplifier, input stage, upper path
2 R3 1 35 24 R5 99 400 2 nb220ew 3 24 23 0 Q3 nb220ew 23 0 23 0 Q5
* amplifier, output stage, upper path
Q1 24 25 99 0 nb220ew
R15 3 25 2
* bias network
R10 Q10 Q9 R9 Q8 Q7 R8 R16 Q6 R7 16 99 16 16 17 17 21 99 21 18 20 18 0 19 20 2 22 0 20 0 3 3 3 3 6 6 6 2.8k 17 18 9k 0 0 nb220ew nb220ew 20 0 0 nb220ew nb220ew 0 nb220ew 19 240 3.4k 22 80
.op
.tran .1ns 100ns
.plot tran v(100) v(101) v(102)
.end

Figure A.3 SPICE input file for the AB amplifier.
class C amplifier from trick's paper
r1 1 0 50
r2 6 0 50
c1 1 2 100pf
c2 2 0 10pf
c3 3 0 10pf
c4 4 0 10pf
c5 5 0 10pf
c6 5 6 100pf
c7 7 0 100pf
l8 2 3 0.025uh
l9 3 0 1.2uh
l10 4 5 .025uh
l11 5 7 1.2uh
q1 4 0 3 mod1
.model mod1 NPN(IS=1e-16 BF=80 RB=100 VA=50 tf=.1ns)
vdc 7 0 30v
iin 1 0 sin(0 0.1 100MEGHz)
.options reltol=1e-9
*plot v(6)
*commands for steady-state analysis:
*steady act non_auto .01us 50 .01 uic 10.0
*steady ext non_auto .01us 50 .01 uic 3.0
*steady tran non_auto .01us 50 .01 duic 10.0
.end

Figure A.4 SPICE input file for the Class C amplifier.

Bibliography

[Antognetti84] P. Antognetti, D. O. Pederson, and H. de Man, editors. Computer Design Aids for VLSI Circuits. Martinus Nijhoff Publishers, 1984, pp. 29-112.
[Aprille72a] Thomas J. Aprille, Jr. and Timothy N. Trick. Steady-state analysis of nonlinear circuits with periodic inputs. Proceedings of the IEEE, vol. 60, no. 1, January 1972, pp. 108-114.
[Aprille72b] Thomas J. Aprille, Jr. and Timothy N. Trick. A computer algorithm to determine the steady-state response of nonlinear oscillators. IEEE Transactions on Circuit Theory, vol. CT-19, no. 4, July 1972, pp. 354-360.
[Ashar89] P. N. Ashar. Implementation of algorithms for the periodic steady-state analysis of nonlinear circuits. Masters Thesis, University of California, Berkeley, March 1989.
[Bailey68] Paul B. Bailey, Lawrence F. Shampine, and Paul E. Waltman. Nonlinear Two Point Boundary Value Problems. Academic Press, 1968.
[Banzhaf89] Walter Banzhaf. Computer-Aided Circuit Analysis Using Spice. Prentice Hall Press, 1989.
[Barmada00] Sami Barmada and Marco Raugi. Transient Numerical Solutions of Nonuniform MTL Equations with Nonlinear Loads by Wavelet Expansion in Time or Space Domain. IEEE Transactions on Circuits and Systems, vol. 47, no. 8, August 2000.
[Barrett94] Richard Barrett, Michael Berry, Tony F.
Chan, James Demmel, June Donato, Jack Dongarra, Victor Eijkhout, Roldan Pozo, Charles Romine, and Henk van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Society for Industrial and Applied Mathematics, 1994.
[Borchers96] Carsten Borchers, Lars Hedrich, and Erich Barke. Equation-Based Behavioral Model Generation for Nonlinear Analog Circuits. University of Hanover, Germany, 1996.
[Brachtendorf95a] H. G. Brachtendorf, G. Welsch, and R. Laur. A simulation tool for the analysis and verification of the steady state of circuit designs. In International Journal of Circuit Theory and Applications, vol. 23. John Wiley & Sons, 1995, pp. 311-323.
[Brachtendorf95b] H. G. Brachtendorf, G. Welsch, and R. Laur. Simulation of the steady-state of circuits using an expansion technique and Krylov subspace methods for solving the resulting linear indefinite systems. In SAMS, vol. 18-19. Overseas Publishers Association, 1995, pp. 603-606.
[Brambilla01] Angelo Brambilla and Dario D'Amore. Method for steady-state simulation of strongly nonlinear circuits in the time domain. IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 48, no. 7, July 2001, pp. 885-889.
[Brown90] Peter N. Brown and Youcef Saad. Hybrid Krylov methods for nonlinear systems of equations. SIAM J. Sci. Stat. Comput., vol. 11, no. 3, May 1990, pp. 450-481.
[Buch94] Premal Buch, Shen Lin, Vijay Nagasamy, and Ernest S. Kuh. Techniques for Fast Circuit Simulation Applied to Power Estimation of CMOS Circuits. Electronics Research Laboratory, University of California, Berkeley, 1994.
[Bucker94] Martin Bucker and Achim Baserman. A Comparison of QMR, CGS and TFQMR on a Distributed Memory Machine. Central Institute for Applied Mathematics, Julich, 1994.
[Chua75] Leon O. Chua and Pen-Min Lin.
Computer-Aided Analysis of Electronic Circuits: Algorithms & Computational Techniques. Prentice Hall Press, 1975.
[Colon73] Francis Robert Colon and Timothy N. Trick. Fast periodic steady-state analysis for large-signal electronic circuits. IEEE Journal of Solid-State Circuits, vol. SC-8, no. 4, August 1973, pp. 260-269.
[D'Amore91] Dario D'Amore and Mauro Santomauro. Steady state analysis of non-linear circuits using SPICE. IEEE International Symposium on Circuits and Systems, vol. 5, 1991, pp. 2733-2736.
[D'Amore93] Dario D'Amore and William Fornaciari. A SPICE-based approach to steady state circuit analysis. In International Journal of Circuit Theory and Applications, vol. 21, pp. 437-442. John Wiley & Sons, Ltd., 1993.
[D'Amore95] Dario D'Amore and Margherita Pillan. Efficient steady state simulation of nonlinear integrated circuits. In Proc. Midwest Symposium on Circuits and Systems, pp. 724-727. IEEE Circuits and Systems Society, 1995.
[Demmel97] James W. Demmel. Applied Numerical Linear Algebra. Society for Industrial & Applied Mathematics, 1997.
[Dongarra96] Jack Dongarra, Andrew Lumsdaine, Roldan Pozo, and Karin A. Remington. Iterative Methods Library IML++, version 1.2, April 1996.
[Duff90] Iain S. Duff and David K. Kahaner. Two Japanese Approaches to Circuit Simulation. Asian Science and Technology, 1990.
[Feldmann96] Peter Feldmann, Bob Melville, and David Long. Efficient frequency domain analysis of large nonlinear analog circuits. IEEE 1996 Custom Integrated Circuits Conference, pp. 461-464.
[Filicori88] Fabio Filicori and Vito A. Monaco. Computer-aided design of non-linear microwave circuits. In Alta Frequenza, vol. LVII, no. 7, September 1988, pp. 355-378.
[Flueck96] Alexander J. Flueck and Hsiao-Dong Chiang. Solving the nonlinear power flow equations with a Newton process and GMRES.
Proceedings of the IEEE International Symposium on Circuits and Systems, 1996, pp. 657-660.
[Freund92] Roland W. Freund, Gene H. Golub, and Noel M. Nachtigal. Iterative solution of linear systems. In Acta Numerica, 1992, pp. 1-44.
[Gad00] Emad Gad, Roni Khazaka, Michel S. Nakhla, and Richard Griffith. A Circuit Reduction Technique for Finding the Steady-State Solution of Nonlinear Circuits. IEEE Transactions on Microwave Theory and Techniques, vol. 48, no. 12, December 2000.
[Gilmore91a] Rowan J. Gilmore and Michael B. Steer. Nonlinear circuit analysis using the method of harmonic balance: a review of the art. Part I. Introductory concepts. In International Journal of Microwave and Millimeter-Wave Computer-Aided Engineering, vol. 1, no. 1, pp. 22-37. John Wiley & Sons, Inc., 1991.
[Gilmore91b] Rowan J. Gilmore and Michael B. Steer. Nonlinear circuit analysis using the method of harmonic balance: a review of the art. Part II. Advanced concepts. In International Journal of Microwave and Millimeter-Wave Computer-Aided Engineering, vol. 1, no. 2, pp. 159-180. John Wiley & Sons, Inc., 1991.
[Gourary00] M. M. Gourary, S. G. Rusakov, S. L. Ulyanov, M. M. Zharov, S. J. Hamm, and B. J. Mulvaney. Computational method of stability investigation for large analog circuits. IEEE International Symposium on Circuits and Systems, May 28-31, 2000, pp. II-168-II-171.
[Gourary97] M. M. Gourary, S. G. Rusakov, S. L. Ulyanov, M. M. Zharov, K. K. Gullapalli, and B. J. Mulvaney. Iterative solution of linear systems in harmonic balance analysis. IEEE MTT-S Digest, 1997, pp. 1507-1510.
[Greenbaum97] Anne Greenbaum. Iterative Methods for Solving Linear Systems. Society for Industrial and Applied Mathematics, 1997.
[Guglielmi96] Nicola Guglielmi. Inexact Newton methods for the steady state analysis of nonlinear circuits. In Mathematical Models and Methods in Applied Sciences, vol.
6, no. 1. World Scientific Publishing Company, 1996, pp. 43-57.
[Gunupudi98?] Pavan K. Gunupudi and Michel S. Nakhla. Model-reduction of nonlinear circuits using Krylov-space techniques.
[Gupta81] K. C. Gupta, Ramesh Garg, and Rakesh Chadha. Computer-Aided Design of Microwave Circuits. Artech House, 1981.
[Hanselman98] Duane Hanselman and Bruce Littlefield. Mastering Matlab 5: A Comprehensive Tutorial and Reference. New Jersey, 1998.
[Heath97] Michael T. Heath. Scientific Computing: An Introductory Survey. McGraw-Hill, Boston, 1997.
[Ipsen98] Ilse C. F. Ipsen and Carl D. Meyer. The idea behind Krylov methods. The American Mathematical Monthly, vol. 105, no. 10, December 1998, pp. 889-899.
[Itoh00] Tatsuo Itoh. Full Wave Time-Domain CAD Tools for Distributed Non-Linear Microwave Circuits. Final Report for MICRO Project 99-053, 2000.
[Kelley95] C. T. Kelley. Iterative Methods for Linear and Nonlinear Equations. Society for Industrial and Applied Mathematics, 1995.
[Kleiner00] Madeline A. Kleiner and Mohammed N. Afsar. Determining the steady-state responses in RF circuits using GMRES, CGS, and BiCGSTAB solution in SPICE for Linux. IEEE MTT-S Int. Microwave Symposium Dig., June 2000, pp. 87-90.
[Kleiner01] Madeline A. Kleiner and Mohammed N. Afsar. Steady-state determination for RF circuits using Krylov-subspace methods in SPICE. 2001.
[Kundert88] Kenneth S. Kundert and Alberto Sangiovanni-Vincentelli. Finding the steady-state response of analog and microwave circuits. Proc. IEEE Custom Integrated Circuits Conference, vol. LVII, no. 7, September 1988, pp. 379-387.
[Kundert90] Kenneth S. Kundert, Jacob K. White, and Alberto Sangiovanni-Vincentelli. Steady-State Methods for Simulating Analog and Microwave Circuits. Kluwer Academic Publishers, 1990.
[Kundert95] Kenneth S. Kundert. The Designer's Guide to Spice & Spectre. Kluwer Academic Publishers, 1995.
[Kundert97] Ken Kundert. Simulation methods for RF integrated circuits. Proc.
1997 IEEE/ACM International Conference on Computer-Aided Design, November 1997, pp. 752-765.
[Kundert99] Kenneth S. Kundert. Introduction to RF simulation and its application. IEEE Journal of Solid-State Circuits, vol. 34, no. 9, September 1999, pp. 1298-1319.
[Larcheveque96] R. Larcheveque and E. Ngoya. Compressed transient analysis speeds up the periodic steady state analysis of nonlinear microwave circuits. 1996 IEEE MTT-S Digest, pp. 1369-1372.
[Lau97] W. Lau. An integrated controller for a high frequency buck converter. Master's thesis, University of California at Berkeley, 1997.
[Maas99] Stephen A. Maas. An Essay on Nonlinear Analysis in RF Design. Applied Wave Research, Inc., 1999.
[Maliniak95] Lisa Maliniak. RF simulator overcomes speed, capacity obstacles. Penton Publishing, Inc., 1995.
[McCalla71] William J. McCalla and Donald O. Pederson. Elements of computer-aided circuit analysis. IEEE Transactions on Circuit Theory, vol. CT-18, no. 1, January 1971, pp. 14-26.
[Meyer89] Robert G. Meyer and William D. Mack. A wide-band class AB monolithic power amplifier. IEEE Journal of Solid-State Circuits, vol. 24, no. 1, Feb 1989, pp. 7-12.
[Nagel71] Laurence Nagel and Ronald Rohrer. Computer analysis of nonlinear circuits, excluding radiation (CANCER). IEEE Journal of Solid-State Circuits, August 1971, pp. 166-181.
[Nagel75] Laurence Nagel. SPICE2: A Computer Program to Simulate Semiconductor Circuits. University of California, Berkeley, May 1975. Available through Electronics Research Laboratory Publications, U.C.B., 94720, Memorandum No. UCB/ERL M75/520.
[Nichols94] K. G. Nichols, T. J. Kazmierski, M. Zwolinski, and A. D. Brown. Overview of SPICE-like circuit simulation algorithms. In IEE Proc.-Circuits Devices Syst., vol. 141, no. 4, August 1994. Institution of Electrical Engineers, pp. 242-250.
[Ogrodzki] Jan Ogrodzki.
Circuit Simulation Methods and Algorithms. CRC Press, 1994.
[Part-Enander96] Eva Part-Enander et al. The MATLAB Handbook. Addison-Wesley, 1996.
[Pederson84] Donald O. Pederson. A historical review of circuit simulation. IEEE Transactions on Circuits and Systems, vol. CAS-31, no. 1, January 1984, pp. 103-111.
[Press92] William H. Press, Brian P. Flannery, Saul A. Teukolsky, and William T. Vetterling. Numerical Recipes in C: The Art of Scientific Computing. Cambridge University Press, 1992.
[Quarles89a] Thomas L. Quarles. Analysis of Performance and Convergence Issues for Circuit Simulation. Ph.D. dissertation, University of California, Berkeley, April 1989. Available through Electronics Research Laboratory Publications, U.C.B., 94720, Memorandum No. UCB/ERL M89/42.
[Quarles89b] Thomas L. Quarles. The front end to simulator interface. Part of Ph.D. dissertation Analysis of Performance and Convergence Issues for Circuit Simulation, University of California, Berkeley, April 1989.
[Quarles89c] Thomas L. Quarles. The SPICE3 implementation guide. Part of Ph.D. dissertation Analysis of Performance and Convergence Issues for Circuit Simulation, University of California, Berkeley, April 1989.
[Quarles89d] Thomas L. Quarles. Adding devices to SPICE3. Part of Ph.D. dissertation Analysis of Performance and Convergence Issues for Circuit Simulation, University of California, Berkeley, April 1989.
[Quarles89e] Thomas L. Quarles. SPICE3 version 3C1 users guide. Part of Ph.D. dissertation Analysis of Performance and Convergence Issues for Circuit Simulation, University of California, Berkeley, April 1989.
[Quarles89f] Thomas L. Quarles. Benchmark circuits: Results for SPICE3. Part of Ph.D. dissertation Analysis of Performance and Convergence Issues for Circuit Simulation, University of California, Berkeley, April 1989.
[Razavi98] Behzad Razavi. RF IC design challenges. Proc.
Design Automation Conference 1998, pp. 408-413. Association for Computing Machinery.
[Rheinboldt98] Werner C. Rheinboldt. Methods for Solving Systems of Nonlinear Equations. Society for Industrial and Applied Mathematics, Philadelphia, 1998.
[Rodrigues98] Paulo J. C. Rodrigues. Computer-Aided Analysis of Nonlinear Microwave Circuits. Artech House, Inc., 1998.
[Saad81] Y. Saad. Krylov subspace methods for solving large unsymmetric linear systems. Mathematics of Computation, vol. 37, no. 155, July 1981. American Mathematical Society, pp. 105-126.
[Saad96] Yousef Saad. Iterative Methods for Sparse Linear Systems. PWS Publishing Co., 1996.
[Saad99] Yousef Saad and Henk A. van der Vorst. Iterative solution of linear systems in the 20th century. University of Minnesota Supercomputing Institute Research Report UMSI 99/152, September 1999.
[Sangiovanni98] Alberto L. Sangiovanni-Vincentelli. Design and verification of analog and RF integrated circuits. Final Report 1998-99 for MICRO Project 98-134.
[Simon93] C. Simon, M. Sadkane, and S. Mottet. Newton-GMRES method for coupled nonlinear systems arising in semiconductor device simulation. Simulation of Semiconductor Devices and Processes, vol. 5, September 1993, pp. 81-84.
[Skelboe80] Stig Skelboe. Computation of the periodic steady-state response of nonlinear networks by extrapolation methods. IEEE Transactions on Circuits and Systems, vol. CAS-27, no. 3, March 1980, pp. 161-175.
[Shewchuk94] Jonathan Richard Shewchuk. An Introduction to the Conjugate Gradient Method Without the Agonizing Pain. CMU-CS-94-125.
[Sleijpen93] Gerard L. G. Sleijpen and Diederik R. Fokkema. BiCGSTAB(L) for linear equations involving unsymmetric matrices with complex spectrum. Electronic Transactions on Numerical Analysis. Kent State University, vol. 1, September 1993, pp. 11-32.
[Strang76] Gilbert Strang.
Linear Algebra and Its Applications. Academic Press, 1976.
[Steer91a] Michael B. Steer. Simulation of nonlinear microwave circuits: an historical perspective and comparisons. IEEE MTT-S Digest, 1991, pp. 599-602.
[Steer91b] Michael B. Steer, Chao-Ren Chang, and George W. Rhyne. Computer-aided analysis of nonlinear microwave circuits using frequency-domain nonlinear analysis techniques: The state of the art. International Journal of Microwave and Millimeter-Wave Computer-Aided Engineering, vol. 1, no. 2, pp. 181-200. John Wiley & Sons, Inc., 1991.
[Stewart98] G. W. Stewart. Afternotes Goes to Graduate School: Lectures on Advanced Numerical Analysis. Society for Industrial and Applied Mathematics, 1998.
[Telichevesky95] Ricardo Telichevesky, Kenneth S. Kundert, and Jacob K. White. Efficient steady-state analysis based on matrix-free Krylov-subspace methods. Proc. 1995 Design Automation Conference, June 1995. Association for Computing Machinery, pp. 480-484.
[Telichevesky96a] R. Telichevesky, K. Kundert, I. Elfadel, and J. White. Fast Simulation Algorithms for RF Circuits. Proceedings of IEEE Custom Integrated Circuits Conference, May 1996, pp. 437-444.
[Telichevesky96b] Ricardo Telichevesky, Ken Kundert, and Jacob White. Receiver characterization using periodic small-signal analysis. Proc. IEEE Custom Integrated Circuits Conference, 1996, pp. 449-452.
[Telichevesky96c] Ricardo Telichevesky, Ken Kundert, and Jacob White. Efficient AC and noise analysis of two-tone RF circuits. Proc. Design Automation Conference, Association for Computing Machinery, June 1996, pp. 292-297.
[Trajkovic98] Ljiljana Trajkovic, Eula Fung, and Seth Sanders. HomSPICE: simulator with homotopy algorithms for finding DC and steady-state solutions of nonlinear circuits. Proc. IEEE Int. Symposium on Circuits and Systems, June 1998.
[Trefethen97] Lloyd N. Trefethen and David Bau.
Numerical Linear Algebra. SIAM, 1997.
[Trick75] Timothy N. Trick and Francis Robert Colon. Computation of capacitor voltage and inductor current sensitivities with respect to initial conditions for the steady-state analysis of nonlinear periodic circuits. IEEE Transactions on Circuits and Systems, vol. CAS-22, no. 5, May 1975, pp. 391-396.
[Valtonen94] Martti Valtonen. Microwave Circuit Simulation Using Higher-Order Dynamic Elements. Circuit Theory Laboratory Report CT-20, October 1994.
[van der Vorst92] H. A. van der Vorst. Bi-CGSTAB: a fast and smoothly converging variant of Bi-CG for the solution of nonsymmetric linear systems. SIAM Journal on Scientific and Statistical Computing, vol. 13, no. 2, March 1992, pp. 631-644.
[Vladimirescu94] Andrei Vladimirescu. The Spice Book. John Wiley and Sons, 1994.
[Vuik92] C. Vuik and H. A. van der Vorst. A comparison of some GMRES-like methods. Linear Algebra and its Applications. Elsevier Science Publishing Co., Inc., 1992, pp. 131-162.
[Yang00a] Baolin Yang and Joel Phillips. A multi-interval Chebyshev collocation method for efficient high-accuracy RF circuit simulation. Design Automation Conference 2000, Los Angeles, CA. Association for Computing Machinery, 2000, pp. 178-183.
[Yang00b] Baolin Yang and Dan Feng. Efficient finite-difference method for quasi-periodic steady-state and small signal analyses. IEEE, 2000, pp. 272-276.

Published Papers

Steady-State Determination for RF Circuits Using Krylov-subspace Methods in SPICE

Madeline A. Kleiner and Mohammed N. Afsar
Department of Electrical Engineering and Computer Science, Tufts University, Medford, MA 02155

Abstract -- A SPICE-based direct shooting-Newton method for the determination of the steady-state response of RF circuits has been developed.
Different Krylov-subspace methods, including GMRES, CGS, BiCG, QMR, and BiCGSTAB, were used to solve the iterative equations generated by the shooting-Newton algorithm. The public-domain circuit simulator SPICE was used for the implementation of the new steady-state analysis. Compared to standard transient analysis for the determination of the steady-state response of non-linear circuits encountered in RF design, this new method is much more efficient. RF circuits that are difficult to simulate are evaluated. For larger circuits, the GMRES, QMR, and BiCGSTAB algorithms show the most improvement in the time to calculate the steady-state.

I. INTRODUCTION

Given today's increasing demand for communication devices, including cellular telephones and other cordless devices, more effective methods are needed to examine the integrated circuits that make up these devices in the steady-state mode. Quantities such as power, distortion, and noise are evaluated in the steady-state and need to be studied in detail [1]. Standard simulators, such as SPICE, can use transient analysis to determine the steady-state response by simulating until all the transients have died out. Unfortunately, for the type of circuits used in communication devices, this takes much too long for a detailed analysis to be made. Harmonic balance methods are not optimal for the strongly non-linear circuits that we need to evaluate. Direct time-domain methods are a good choice for steady-state determination [2]-[3]. For SPICE-based circuit simulators, the time-domain shooting-Newton method is relatively easy to implement [4]-[5].

The direct shooting-Newton method of steady-state determination was proposed by Aprille and Trick [2], and when implemented in an older version of SPICE [4], it used traditional Gaussian elimination to solve the iterate. Because of the computation costs, this limited the use of the algorithm to relatively small circuits. The newer Krylov-subspace methods can solve the equations generated by the steady-state response determination with much greater efficiency. The Krylov-subspace method GMRES has been implemented with the direct shooting-Newton method [6]. However, that implementation is not available in public-domain software and does not include other Krylov-subspace methods for comparison. There are two parts to this new method. First, the direct approach to steady-state determination is implemented in SPICE; second, several different Krylov-subspace methods, including BiCG and QMR, are used to solve the iterate generated by the direct method [5]. Examples from microwave circuits are used to illustrate the new method.

0-7803-6540-2/01/$10.00 (C) 2001 IEEE

II. STEADY-STATE DETERMINATION

The shooting-Newton direct method of steady-state determination for circuits with periodic input was implemented in SPICE 3f5. In addition, the Krylov-subspace methods were implemented in SPICE 3f5 for the solution of the iterate generated by that method. Direct methods, such as the shooting-Newton method presented here, find the initial state needed to put the circuit directly in steady-state [2]. If the circuit equations are represented as the system

    \dot{x} = f(x, t)    (1)

where x and f are n-vectors and the vector f is periodic in time t with period T, then a constraint for achieving steady-state is that the transient effects have died off. This is represented by

    x(0) = x(T)    (2)

In other words, the solution at the end of the period is the same as the condition at the beginning of the period; the circuit is in steady-state. The state transition function can be used to define the two-point boundary value problem:

    x(0) - \phi(x(t_0), t_0, T) = 0    (3)

where \phi is the state transition function.
The state transition function was implicitly derived; it was calculated at each timepoint until the end of the period. It depends on the initial state, x_0, the period of the response, T, and the starting time, t_0. We applied the shooting-Newton method to solve the boundary value problem, which results in the following iteration:

    x_0^{k+1} = x_0^k - [1 - J_\phi]^{-1} [x_0^k - \phi(x(t_0), t_0, T)]    (4)

where k is the iteration index and J_\phi is the sensitivity matrix:

    J_\phi = \frac{\partial}{\partial x_0} \phi(x(t_0), t_0, T)    (5)

The sensitivity matrix was computed at the same time as the state transition function. Quantities needed for the calculation of the sensitivity matrix were already available at each timepoint from the transient analysis. Forming the sensitivity matrix is computationally expensive. The iterate was solved and, using a user-defined limit, was tested for convergence. If it had not converged, the circuit was resimulated and another initial guess was used [2]-[4]. This process was continued until the steady-state was reached. The shooting-Newton method computed a set of capacitor voltages and inductor currents for the circuit such that if these voltages and currents are used as the initial conditions for the transient analysis, the circuit is directly in steady-state.

The accuracy of the direct steady-state determination and the accuracy of the Krylov-subspace methods are shown in Fig. 1. The circuit simulated is a DC to 1 GHz class AB amplifier [7]. This circuit is difficult to simulate because the bias circuitry transients take a long time to settle out.

III. KRYLOV-SUBSPACE METHODS

The Krylov-subspace methods GMRES, BiCG, QMR, CGS, and BiCGSTAB were incorporated into SPICE 3f5 for the solution of the linear system of equations generated by the shooting-Newton method to determine the steady-state response [8]. These methods are for general matrices, including the non-symmetric type generated by the shooting-Newton iteration [8]. In the following subsections, equation (4) was solved.
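As an illustration of equations (3)-(5), the shooting-Newton loop can be sketched for a hypothetical two-state nonlinear circuit. This is a minimal sketch, not the SPICE implementation: the element values and the names f, dfdx, and transition are illustrative assumptions, and the sensitivity matrix is propagated with the variational equation J' = (df/dx) J alongside the forward-Euler transient integration.

```python
import numpy as np

T, N = 1.0, 2000          # drive period and timesteps per period (illustrative)
dt = T / N

def f(x, t):
    # dx/dt = f(x, t): a toy two-state circuit with a weak cubic
    # nonlinearity and a periodic source (stand-in for the circuit equations)
    return np.array([-x[0] + 0.5 * x[1],
                     -x[1] - 0.1 * x[1] ** 3 + np.sin(2 * np.pi * t)])

def dfdx(x, t):
    # Jacobian of f with respect to x, needed for the variational equation
    return np.array([[-1.0, 0.5],
                     [0.0, -1.0 - 0.3 * x[1] ** 2]])

def transition(x0):
    # Integrate one period (forward Euler): returns phi(x0, 0, T) and the
    # sensitivity matrix J_phi = d phi / d x0 of eq. (5)
    x, J = x0.copy(), np.eye(2)
    for n in range(N):
        t = n * dt
        J = J + dt * dfdx(x, t) @ J   # J' = (df/dx) J, J(0) = I
        x = x + dt * f(x, t)
    return x, J

x0 = np.zeros(2)                       # initial guess for the initial state
for k in range(20):
    phiT, J = transition(x0)
    r = x0 - phiT                      # shooting residual of eq. (3)
    if np.linalg.norm(r) < 1e-10:
        break
    # Newton update of eq. (4); a Krylov method would solve this same system
    x0 = x0 - np.linalg.solve(np.eye(2) - J, r)

phiT, _ = transition(x0)
assert np.linalg.norm(x0 - phiT) < 1e-8   # x(0) = x(T): steady state reached
```

Starting a transient analysis from the converged x0 places the toy circuit directly in steady-state; the loop structure mirrors the method described above, with the dense Newton solve standing in for the Gaussian-elimination or Krylov options.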
The matrix being referred to is the sensitivity matrix J_\phi. The convergence figures were generated using a common-base Class C amplifier [9] operated at microwave frequencies. Appropriate changes were made to the original circuit for microwave operation. The circuit contained a total of 11 capacitors and inductors. The residual error was calculated within the Krylov-subspace method. The circuit was difficult to simulate using transient analysis because the biasing circuitry takes thousands of cycles for the transients to settle out.

A. GMRES

The Generalized Minimal Residual (GMRES) method generated a sequence of orthogonal vectors. The vectors were generated using a special method for Krylov subspaces called the Arnoldi method. It used these vectors to compute a least-squares solution. One drawback of this method is that all the orthogonal vectors must be stored, so for large circuits this storage need could be very large. It used the actual matrix, and not its transpose, for solution. The convergence behavior for the Class C amplifier is shown in Fig. 2.

Fig. 1. Two periods of the transient analysis of the Class AB amplifier [7] in steady-state. The curves for all the methods lie on top of each other, as they should.

Fig. 2. Convergence behavior for Krylov-subspace method GMRES.

B. BiCG

BiCG is the Biconjugate Gradient method. It generated two sequences of vectors that are made mutually orthogonal to each other, called bi-orthogonal. One of the sequences was generated by the original coefficient matrix, and the other by the transpose of that matrix. It used much less storage than GMRES, but had problems with convergence, and had two matrix-vector products at each iteration. Fig. 3 shows its convergence behavior for the microwave example circuit.

Fig. 3. Convergence behavior for Krylov-subspace method BiCG.

C. QMR

The Quasi-Minimal Residual (QMR) method applied a least-squares solve and update to the BiCG residuals. QMR uses less storage than GMRES. It required matrix-vector multiplications of the original matrix and its transpose at each iteration step. Fig. 4 shows the convergence behavior for the QMR solution.

Fig. 4. Convergence behavior for the Krylov-subspace method QMR.

D. CGS

The Conjugate Gradient Squared (CGS) method is a variant of BiCG. It reformulated BiCG so that only the original matrix was needed and avoided transpose vector operations. Fig. 5 shows the convergence behavior for the CGS solution.

Fig. 5. Convergence behavior for Krylov-subspace method CGS.

E. BiCGSTAB

The Biconjugate Gradient Stabilized (BiCGSTAB) method is a variant of BiCG that used different updates to avoid using the transpose of the matrix. The convergence behavior for the example is shown in Fig. 6.

Fig. 6. Convergence behavior for the Krylov-subspace method BiCGSTAB.

IV. RESULTS

In SPICE 3f5, the direct steady-state determination using the shooting-Newton method resulted in significant savings in computational time and resources (Table 1). The shooting-Newton method is also an accurate way to determine the steady-state response (Fig. 1). The system is in steady-state when the error is less than the user-specified value.
The error measured was obtained using the maximum difference between the state of the circuit after a cycle of transient analysis and the initial state used for that cycle. Even greater efficiency was found using the Krylov-subspace methods for solution of the iterate generated by this method (Table 1).

Table 1. Summary of Krylov-subspace methods and transient analysis results of steady-state determination for a Class C amplifier.

    Method Used for Steady-     Time (Seconds) to    Shooting-Newton   Transient
    State Determination         Reach Steady-State   Iterations        Analysis Cycles
    Transient analysis          67.8                 -                 >3000
    Gaussian elimination        1.76                 26                35
    GMRES                       1.53                 26                35
    CGS                         1.68                 26                35
    BiCG                        1.79                 26                35
    BiCGSTAB                    1.53                 26                35
    QMR                         1.53                 26                35

The convergence behavior (Fig. 2 through Fig. 6) for the solution of the iterate generated by the shooting-Newton method pointed out the difficulty in convergence for some of the Krylov-subspace methods. For some methods, such as CGS and BiCG, the residual is not reduced at each step of the iteration. The GMRES, QMR, and BiCGSTAB algorithms converged the fastest. CGS also had fast convergence, but its convergence was very irregular.

V. CONCLUSION

The shooting-Newton direct steady-state determination was implemented in SPICE 3f5. The iterate generated by the method can be solved using standard Gaussian elimination or the Krylov-subspace methods GMRES, BiCG, QMR, CGS, and BiCGSTAB. The direct method of steady-state determination is shown to be efficient and accurate (Fig. 1) when analyzing RF circuits that are difficult for the standard transient analysis available in SPICE 3f5 (Table 1). This efficiency means shorter simulation times and smaller computer storage needs. The Krylov-subspace methods make for even greater computer efficiency (Table 1). All the Krylov methods are much faster than the transient analysis approach of simulating until all the transients have died out. The Krylov-subspace methods give mixed results in terms of efficiency (Table 1) over the standard solution method of Gaussian elimination. The GMRES, QMR, and BiCGSTAB algorithms show the fastest convergence to solution. Analysis with larger RF circuits is needed in order to investigate whether one Krylov-subspace method would be more computationally efficient in terms of storage and operation count for RF circuits. The direct steady-state determination using Krylov-subspace methods was shown to be an efficient and accurate method. Its application to larger RF circuits should result in even greater computer resource savings.

ACKNOWLEDGEMENT

The authors wish to acknowledge the assistance of Kenneth Kundert at Cadence Design for the microwave example circuit used for analysis, and Misha Kilmer of Tufts University for many discussions on Krylov methods.

REFERENCES

[1] K. S. Kundert, J. K. White, and A. Sangiovanni-Vincentelli, Steady-State Methods for Simulating Analog and Microwave Circuits. Boston, MA: Kluwer Academic Publishers, 1990.
[2] T. J. Aprille and T. N. Trick, "Steady-state analysis of nonlinear circuits with periodic inputs," Proc. IEEE, vol. 60, pp. 108-114, January 1972.
[3] R. Telichevesky, K. Kundert, I. Elfadel, and J. White, "Fast simulation algorithms for RF circuits," Proc. of the Custom Integrated Circuits Conference, May 1996.
[4] P. N. Ashar, "Implementation of algorithms for the periodic steady-state analysis of nonlinear circuits," Masters Thesis, University of California, Berkeley, March 1989.
[5] M. A. Kleiner and M. N. Afsar, "Determining the steady-state responses in RF circuits using GMRES, CGS, and BiCGSTAB solution in sSPICE for Linux," 2000 IEEE MTT-S Int. Microwave Symp. Dig., pp. 87-90, June 2000.
[6] R. Telichevesky, K. S. Kundert, and J. K. White, "Efficient steady-state analysis based on matrix-free Krylov-subspace methods," Proc. 1995 Design Automation Conference, Santa Clara, California,
June 1995.
[7] R. G. Meyer and W. D. Mack, "A wide-band class AB monolithic power amplifier," IEEE Journal of Solid-State Circuits, vol. 24, no. 1, pp. 7-12, Feb 1989.
[8] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Philadelphia: SIAM, 1994.
[9] F. R. Colon and T. N. Trick, "Fast periodic steady-state analysis for large-signal electronic circuits," IEEE Journal of Solid-State Circuits, vol. 8, no. 4, pp. 260-269, August 1973.

DETERMINING THE STEADY-STATE RESPONSES IN RF CIRCUITS USING GMRES, CGS, AND BICGSTAB SOLUTION IN sSPICE FOR LINUX

Madeline A. Kleiner, Mohammed N. Afsar, Fellow IEEE
Department of Electrical Engineering and Computer Science, Tufts University, Medford, MA

ABSTRACT

sSPICE, an extension of SPICE 3f5 for Linux, is under development to determine the steady-state responses of RF integrated circuits. The shooting-Newton algorithm for steady-state determination is incorporated with solution of the resulting iterates using the Krylov-subspace methods GMRES, CGS, and BICGSTAB. The direct method shows much greater computer efficiency than the regular transient analysis in SPICE. Only small improvements are seen using the explicit matrix instead of Gaussian elimination methods. Using the implicit form of GMRES resulted in improvements over the explicit GMRES and over the shooting-Newton Gaussian elimination method.

INTRODUCTION

Determining the steady-state response of integrated circuits that are lightly damped and have large time constants is computationally expensive. Simulating high-Q circuits is also computationally expensive, and these types of circuits occur regularly in communication systems.
Four methods are commonly used to determine the steady-state response [1]: transient analysis, finite difference, shooting-Newton, and harmonic balance. In SPICE, the most popular analog circuit simulator, the steady-state is determined by allowing the time-domain transient analysis to run until all the transient signals have died off and all that remains is the periodic steady-state. With many integrated circuits this simulation takes a long time. The use of SPICE, therefore, is limited to integrated circuits having few components. The finite difference method and the shooting-Newton method both use the direct approach [2]. The direct approach treats the determination of the steady-state as the solution of a two-point boundary value problem. But using the standard matrix methods of solution, this approach is still too expensive. Lastly, the harmonic balance method is used extensively and works well for weakly nonlinear circuits, but becomes computationally expensive for strongly nonlinear circuits. SPICE can simulate these strongly nonlinear circuits, but with poor computer efficiency.

Much work has explored the use of newer numerical methods to take advantage of matrix sparsity and the freedom from having to explicitly form the matrix needed for solution. GMRES, a robust non-stationary iterative method, has been extensively reported as a speed-up for steady-state determination in the time domain [3,4]. Both GMRES and QMR have been used in the harmonic balance method to speed up steady-state solution of linear circuits [5]. Numerical methods outside the non-stationary iterative methods have also been explored. Homotopy methods, for example, have been used to find the steady-state solution [6], but, unfortunately, they were found to be computationally expensive.

sSPICE as presented here solves some of these problems. It is an extension of SPICE 3f5, the analog circuit simulator developed at the University of California, Berkeley.
It uses the same direct methods for steady-state determination as incorporated into the SPICE 3c1 version [6,7]. GMRES, CGS, and BICGSTAB, all non-stationary iterative methods, are incorporated into the code and compiled on Linux. A version of GMRES is available that uses the implicit matrix technique [4]. Users may choose either the standard transient analysis in SPICE or the shooting-Newton method in combination with the GMRES, CGS, or BICGSTAB method to solve the equations formed in the solution. Results from the comparison of the different methods are presented below.

0-7803-5687-X/00/$10.00 (c) 2000 IEEE

1. ITERATIVE METHODS

Non-stationary iterative methods are an efficient means to solve non-symmetric linear systems, such as the large system of equations generated during the shooting-Newton method of steady-state determination. They are Krylov-subspace methods and variations on the Conjugate Gradient method for symmetric, positive definite systems. Iterative methods are used to solve the system represented as Ax = b. The main differences between the non-symmetric iterative methods lie in (1) the method of constructing the basis vectors for the Krylov subspaces and (2) the choice of the iterate [9]. For GMRES, an orthogonalization method, the matrix A itself is used. BICGSTAB and CGS use a biorthogonalization method. The sensitivity matrix J_\phi is used to construct the Krylov subspace.

Direct solvers such as Gaussian elimination have a workload of O(m) steps, each requiring O(m^2) work, for a total work estimate of O(m^3) [8]. This severely limits the number of equations that can make up the system being solved. The iterative methods theoretically require the same amount of work, but this applies only to the worst case.
By using the sparsity and special structure of the large matrices generated by the shooting-Newton method of solution for the steady-state response, the amount of work can be reduced to O(m) at best, or at least O(m^2). The Krylov-subspace algorithms also allow one to avoid the formation of the matrix A above. Instead of actually forming this matrix, its parts can be used in the matrix-vector multiplications needed for the calculations in GMRES, CGS, and BICGSTAB. This approach allows much larger circuits to be analyzed. The sSPICE code implementing these methods was written in C and incorporated into the University of California, Berkeley SPICE 3f5 [9]. Users can specify the direct steady-state determination using either the standard Gaussian elimination or the non-stationary iterative methods GMRES, CGS, or BICGSTAB.

2. STEADY-STATE

The determination of the steady-state response of many wireless communication devices driven by periodic inputs is of great interest to designers. If an analog circuit simulator such as SPICE is used for analysis, the differential equations generated by the system are integrated over a time interval long enough for transient waveforms to die out. With rapidly decaying transients, SPICE can determine the steady-state response with great efficiency. But many RF circuits have transients that decay very slowly, and it becomes almost impossible to integrate the differential equations over the entire transient response needed. Direct methods, such as the shooting-Newton method presented here, will find the initial state needed to put the circuit directly in steady-state [2]. If the circuit equations are represented as the system

    \dot{x} = f(x, t)    (1)

where x and f are n-vectors and the vector f is periodic in time t with period T, then a constraint for achieving steady-state is that the transient effects have died off. This is represented by

    x(0) = x(T)    (2)
In other words, the solution at the end of the period is the same as the condition at the beginning of the period. This means that the circuit is in steady-state. The state transition function can be used to define the two-point boundary value problem:

    x(0) - \phi(x(t_0), t_0, T) = 0    (3)

where \phi is the state transition function. The state transition function is implicitly derived; it is calculated at each timepoint until the end of the period. It depends on the initial state, x_0, the period of the response, T, and the starting time, t_0. Applying the shooting-Newton method to solve the boundary value problem results in the following iteration:

    x_0^{k+1} = x_0^k - [1 - J_\phi]^{-1} [x_0^k - \phi(x(t_0), t_0, T)]    (4)

where k is the iteration index and J_\phi is the sensitivity matrix:

    J_\phi = \frac{\partial}{\partial x_0} \phi(x(t_0), t_0, T)    (5)

The sensitivity matrix can be computed at the same time as the state transition function. Quantities needed for the calculation of the sensitivity matrix are already available at each timepoint from the transient analysis; one simply uses these quantities and calculates the sensitivity matrix. The iterate is solved and, using a user-defined limit, either is considered converged or the circuit is simulated again with another initial guess [2,7]. This process continues until the steady-state is reached. The shooting-Newton method computes a set of capacitor voltages and inductor currents for the circuit so that if these voltages and currents are used as the initial conditions for the transient analysis, the circuit is directly in steady-state.

The solution of this iterate equation is computationally expensive. Forming and solving the sensitivity matrix, because it is dense, is the most expensive.
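Krylov solvers need only the product of the system matrix with a vector, so the dense sensitivity matrix never has to be formed: the per-timestep quantities saved during transient analysis can be applied to the vector one at a time. A minimal matrix-free sketch using SciPy's LinearOperator follows; the contractive random factors are illustrative stand-ins for the stored timestep quantities, not the simulator's actual data.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(1)
n, steps = 40, 20
# Stand-ins for the per-timestep factors whose ordered product is J_phi;
# each is contractive, as for a dissipative circuit integrated over time.
factors = [0.8 * np.eye(n) + 0.01 * rng.standard_normal((n, n))
           for _ in range(steps)]

def matvec(v):
    # (1 - J_phi) v computed without ever forming J_phi explicitly:
    # apply the chain of timestep factors to v, then subtract from v
    w = v.copy()
    for F in factors:
        w = F @ w
    return v - w

A = LinearOperator((n, n), matvec=matvec, dtype=np.float64)
b = rng.standard_normal(n)
x, info = gmres(A, b)                 # only matrix-vector products are used
assert info == 0                      # converged
assert np.linalg.norm(matvec(x) - b) < 1e-3 * np.linalg.norm(b)
```

The storage requirement drops from the dense n x n sensitivity matrix to whatever the per-timestep products need, which is what makes the matrix-free approach attractive for larger circuits.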
If a method like Gaussian elimination is used, then the costs are very high and the speed-up of the direct method of steady-state determination is limited. The newer iterative methods result in much greater computational savings. An even greater savings in computation time can be realized if, instead of forming the sensitivity matrix at each transient analysis step, the quantities are stored and then accessed when needed for the solution of the iterate in the Krylov-subspace method [1,3].

Table 1: Test Circuits

    Circuit #   Circuit                            # States
    1           DC Power Supply                    4
    2           Class C amplifier - low Q          5
    3           Class C amplifier - high Q         5
    4           Class C amplifier                  11
    5           Colpitts Oscillator - Q=50         3
    6           Colpitts Oscillator - Q=100        3
    7           Colpitts Oscillator - high freq.   3
    8           EC-Colpitts Oscillator             2

3. RESULTS

Table 1 lists the circuits used for simulation. Using SPICE 3f5, determining their steady-state responses produces very long transient simulation times. The results are summarized in Figure 1 and Figure 2 for the shooting-Newton method of steady-state determination. The relative CPU time is the time of the iterative method divided by the time for Gaussian elimination. The GMRES, CGS, and BICGSTAB iterative methods produce the expected significant improvement over regular transient analysis determination of steady-state. However, BICGSTAB seems to have difficulty converging to solution for some circuits, and for circuit 7 it converged to an incorrect solution. Using the shooting-Newton algorithm for direct calculation of the steady-state response shows improvement over transient analysis. However, the large improvements expected from the iterative methods used to solve the shooting-Newton iterate are not evident. Using the implicit matrix method with GMRES resulted in more significant time savings. Because of memory overflow problems, simulation of larger example circuits was not possible.
The next round of examples will be run on a PC with more swap space available for Linux. Other non-stationary Krylov-subspace methods will be implemented and compared.

4. REFERENCES

[1] K. S. Kundert, J. K. White, and A. Sangiovanni-Vincentelli, Steady-State Methods for Simulating Analog and Microwave Circuits. Boston, MA: Kluwer Academic Publishers, 1990.
[2] T. J. Aprille and T. N. Trick, "Steady-state analysis of nonlinear circuits with periodic inputs," Proc. IEEE, vol. 60, pp. 108-114, January 1972.
[3] R. Telichevesky, K. S. Kundert, and J. K. White, "Efficient steady-state analysis based on matrix-free Krylov-subspace methods," Proc. 1995 Design Automation Conference, Santa Clara, California, June 1995.
[4] R. Telichevesky, K. Kundert, I. Elfadel, and J. White, "Fast simulation algorithms for RF circuits," Proc. of the Custom Integrated Circuits Conference, May 1996.
[5] M. M. Gourary, S. G. Rusakov, S. L. Ulyanov, M. M. Zharov, K. K. Gullapalli, and B. J. Mulvaney, "Iterative solution of linear systems in harmonic balance analysis," 1997 IEEE MTT-S Digest, pp. 1507-1510.
[6] L. Trajkovic, E. Fung, and S. Sanders, "HomSPICE: simulator with homotopy algorithms for finding DC and steady-state solutions of nonlinear circuits," Proc. IEEE Symposium on Circuits and Systems, Monterey, California, June 1998.
[7] P. N. Ashar, "Implementation of algorithms for the periodic steady-state analysis of nonlinear circuits," Master's Thesis, UC Berkeley, March 1989.
[8] L. N. Trefethen and D. Bau, III, Numerical Linear Algebra, Philadelphia: SIAM, 1997.
[9] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, Philadelphia: SIAM, 1994.

Figure 1. Comparison of GMRES, CGS, and BICGSTAB methods. [Bar chart of relative CPU time vs. circuit (1-8); not reproduced.]

Figure 2. Comparison of explicit and implicit GMRES. [Bar chart of relative CPU time vs. circuit; legend: IMPLICIT GMRES, GMRES; not reproduced.]
