вход по аккаунту


Patterns of protein-fold usage in eight microbial genomes A comprehensive structural census

код для вставкиСкачать
PROTEINS: Structure, Function, and Genetics 33:475–495 (1998)
A Hierarchical Method for Generating Low-Energy
Conformers of a Protein-Ligand Complex
James A. Given and Michael K. Gilson*
Center for Advanced Research in Biotechnology, Rockville, Maryland
A novel dynamical protocol
for finding the low-energy conformations of a
protein-ligand complex is described. The energy functions examined consist of an empirical force field with four different dielectric
screening models; the generalized Born/surface area model also is examined. Application
of the method to three complexes of known
crystal structure provides insights into the
energy functions used for selecting low-energy
docked conformations and into the structure
of the binding-energy surface. Evidence is presented that the local energy minima of a ligand
in a binding site are arranged in a hierarchical
fashion. This observation motivates the construction of a hierarchical docking algorithm
that substantially enriches the population of
ligand conformations close to the crystal conformation. The algorithm is also adapted to
permit docking into a flexible binding site and
preliminary tests of this method are presented.
Proteins 33:475–495, 1998. r 1998 Wiley-Liss, Inc.
Key Words: docking; landscape; dynamics; dielectric; affinity; binding; methotrexate; thermolysin; dihydrofolate reductase; HIV protease;
generalized Born
The specific, non-covalent binding of small, organic ligands by proteins is of central physiologic
and pharmacologic importance. As a consequence,
computational methods for predicting how tightly
and in what conformation a small molecule and a
protein will bind to each other have received considerable attention. Great progress has been made in
the development of these methods, as recently reviewed,2–9 and work on such methods continues
today. Existing methods for predicting binding affinities generally include ‘‘docking’’ algorithms that seek
the most stable bound conformation of the complex.
Ideally, this conformation will match the bound
conformation that is observed crystallographically.
It should be noted however, that low-energy conformations other than the global energy minimum of
the complex could, at least in principle, contribute
appreciably to the binding affinity. The contribution
of such conformations depends upon their energy
level, their width, and their number. Also, there is
some looseness associated with the concept of a
‘‘conformation’’ here. Thus, should the ‘‘bound conformation’’ be regarded as a single point in conformational space, as a single energy well, or perhaps as an
ensemble of similar conformations separated by small
energy barriers? These issues are relevant to the
kinetics and thermodynamics of binding. They also
may be relevant to the developing of docking algorithms and algorithms for computing binding affinities. This is because the shape of the energy surface
for binding will determine which type of docking
algorithm will most rapidly identify the global energy
minimum, as recently emphasized.10 In addition, the
shape of the energy surface is directly related to the
number of energy wells that must accounted for
when estimating the chemical potential of the complex with a predominant states method.11,12
It is reasonable to imagine that the low-energy
conformations of a stable protein-ligand complex are
distributed hierarchically in space and in energy.
Intuitively, this view follows from the fact that many
flexible molecules are composed of several separately
mobile chains or subunits, each of which has its own
subunits or side chains. Mathematical studies of
model hierarchical structures of this kind, such as
random dendritic, or tree-like, structures, show hierarchical clustering of both resonant energies and
relaxation time-constants.13–15 If this picture is correct, one expects to observe only a finite number of
levels to this hierarchy, because there are only three
or four levels to the geometry of most ligands of
interest. Thus, the energy surface near a tightly
bound conformation is likely to possess a hierarchical structure like that shown in Figure 1. Indeed,
previous studies of other molecular systems have
reported just such hierarchical arrangements of
energy-minima for isolated molecules.16–18 Note that
this picture contrasts with the ‘‘rugged landscape’’19,20
picture of ligand-binding that was recently proposed.10 It should be noted that the analysis of
Grant sponsor: The National Institutes of Health; Grant
number: GM54053; Grant sponsor: The National Institute of
Standards and Technology.
*Correspondence to: Michael K. Gilson, Center for Advanced
Research in Biotechnology, 9600 Gudelsky Drive, Rockville,
MD 20850.
Received 19 February 1998; Accepted 31 July 1998
In addition, the hierarchical nature of the energy
surface is exploited by assembly of a four-phase
docking protocol that involves a series of searches
and energy screens. Each search explores the vicinity of the lowest-energy structures found in the
previous phase. Finally, a preliminary application of
the present protocol to docking in a flexible binding
site is described. The insights provided by these
studies are reviewed in the Discussion and Conclusions section.
Phases of a Hierarchical Docking Protocol
Fig. 1. Diagram of a hierarchically-structured energy surface.
At low resolution, there is only one energy well. Closer examination shows that this is subdivided into three shallower energy
wells, each of which is further subdivided.
multidimensional energy surfaces in terms of local
energy minima dates back at least to Stillinger and
Weber’s introduction of the concept of inherent structures in liquids.21
The present paper uses a novel docking algorithm
to explore these and related issues. Unlike most
other docking algorithms (but see References 22–24),
this one is based upon molecular dynamics. It treats
all ligand degrees of freedom as flexible, including
bonds and angles, and can readily be used with a
flexible representation of the binding site. The present version of the algorithm is not designed to drive
toward the global energy minimum of a complex as
quickly as possible, but rather to explore the conformational possibilities of a ligand in a binding site in
a relatively thorough manner. In addition, because
we wish to study something resembling the actual
energy surface for binding, the energy functions used
here are variants of a force field designed for use in
molecular simulations, rather than a heuristic or
fully empirical ‘‘scoring function.’’
The paper is organized as follows. The Methods
section describes the docking algorithm. The Results
section then examines the properties of this algorithm and assesses the ability of five energy functions, differing in their treatment of the solvent, to
pick out the crystal conformation of three complexes
from a wide range of test conformations generated by
the algorithm. The most successful of these energy
functions is then used to examine the structure of
the low-energy sector of the binding-energy landscape. Several basic questions are addressed: How
many distinct local minima are there? Do the lowenergy states cluster in a hierarchical manner? How
‘‘rugged’’ is the low-energy sector of such a landscape?; i.e., how many levels of structure are there?
In the present protocol, low-energy docked structures are generated in four phases, termed randomization, predocking, docking, and redocking. The
randomization phase generates a variety of starting
conformations and orientations of the isolated ligand. The predocking phase looks for reasonable
docking modes. The docking phase studies the sensitivity of these modes to changes in the energy
function by seeking nearby energy-minimized structures using several different energy functions. The
redocking phase explores local minima near the very
lowest-energy docked structures, seeking structures
of still lower energy. The implementation of these
four phases is now described.
Randomization—In the randomization phase, the
isolated ligand is subject to a novel cyclic process of
heating and cooling that generates a wide variety of
starting orientations and conformations. Except as
otherwise noted, the randomization phase uses a
dielectric constant of 80 in order to promote the
generation of a wide distribution of initial conformations of the ligand that is not strongly biased by
Coulombic interactions. These are stored in a file to
be used sequentially in the predocking phase of the
Predocking—In the predocking phase, randomized
ligand structures are docked into the binding site of
the protein and their energy is minimized. Both the
initial center-of-coordinates (COC) of the ligand, and
the initial conformers of the ligand are random. As a
consequence, the probability is low that any single
predocking event will discover a realistic docked
conformation. As in the randomization phase, and
for the same reason, a dielectric constant of 80 is
used during predocking, except as otherwise noted.
Docking—In the docking phase, each conformation generated by predocking is optimized several
times starting from the same initial conformation,
each time with a different energy function. For
comparison, the crystal conformation of the complex
also is energy-minimized with the various energy
functions. The chief purpose of the docking phase in
this study is to determine which energy function
most reliably selects the experimental conformation
of the complex as the most stable from among the
varied alternatives. In the present study, the energy
functions vary in their treatments of the solvent and
of long-ranged nonbonded interactions.
Redocking—The redocking phase explores in more
detail the most stable conformers identified in the
previous phases. Here, the 25–30 conformers from
the docking phase that are of lowest energy are
subjected to serial heat-cool cycles within the binding site to generate, from each, a set of new local
energy-minima. Results obtained in the docking
phase (see below) indicate that the best electrostatic
screening function of those tested here is the distancedependent dielectric constant ⑀ij ⫽ 4rij. Therefore,
this screening function is used in the redocking
Implementation of the Protocol
A randomization dynamics for producing
diverse ligand conformers
It is common practice to use molecular dynamics
(MD) simulations at artificially high temperatures to
generate a range of different molecular conformations for a small molecule. Sampling can be further
enhanced by the use of Langevin Dynamics (LD),25
which adds two new forces to the simulation: a
stochastic force, updated at each time-step of the
simulation, that simulates the bombardment of the
molecule by molecules of solvent; and a deterministic
force, proportional to atomic velocity, that accounts
for viscous drag. For a given temperature, LD is
more effective at generating conformational transitions than energy-conserving MD, as shown below;
the rates of conformational transitions in LD simulations have been examined in more detail elsewhere.26 Even LD, however, is not particularly efficient at generating conformational transitions. The
problem is that the forces simulating solventbombardment are randomized at every time-step
and thus cannot provide the sustained impetus
needed to drive the system over an energy barrier.
However, if these stochastic forces are updated less
frequently, even fairly weak forces will drive the
system across energy barriers.
The use of such sustained stochastic forces yields
the present ‘‘randomization dynamics’’ (RD) method.
Specifically, RD is implemented here by modifying a
normal LD algorithm25 so that the stochastic forces
are updated less frequently. The force-update cycles
systematically through the N atoms of the solute, so
that the stochastic force on each atom is updated
every N time-steps. It should be noted that RD does
not yield a Boltzmann distribution of conformers,
but this is not necessary in the present application.
The RD method is used as the heating part of a
heat-cool cycle to generate the starting conformations of the ligand that will be fitted into the binding
site in the predocking phase of the protocol. Heating
involves five hundred steps of RD with stochastic
forces appropriate to normal LD at a temperature of
500 K, and with a coefficient of viscous drag of 10
ps⫺1. Harmonic restraints on improper dihedrals are
used to prevent racemization of chiral centers. ‘‘Cooling’’ involves energy minimization by the BroydenFletcher-Goldfarb-Shanno method.27 The resulting
energy-minimized structures are stored for subsequent use.
Initial positioning of the ligand
in the binding site
To start the predocking phase, the center of coordinates of one conformation of the ligand generated in
the randomization phase is translated to a randomly
selected member of a set of candidate starting positions in the binding site. The candidate positions lie
on a cubic lattice whose boundaries are established
by the user, based upon graphical examination of the
binding site. In the present study, the lattice spacing
is 1 Å, and the boundaries are set to encompass all
apparently reasonable placement positions. Lattice
dimensions of 10–15 Å are typical. Only those lattice
points furthest from atoms of the protein and from
the edge of the lattice are actually used as starting
positions for the ligand. These candidate starting
positions are established as follows.
Each lattice point is associated with one lattice
cube, and also with an integer that is, initially, not
specified. To begin, the integer associated with each
point whose cube contains an atom is set to zero. The
integer associated with each point at the boundary of
the lattice also is set to zero. Next, each lattice point
whose integer has not yet been set is examined. If it
neighbors a point already set to zero, it is assigned
an integer value of 1. Next, neighbors of points set to
1 are set to 2, and so on, until every lattice point has
been assigned an integer that indicates the shortest
distance from that point to either a protein atom or a
lattice edge. Upon completion of this process, the
points with the highest integer values are those at
the center of the binding site. Each set of points with
the same integer value is termed a ‘level’. Levels of
points are accumulated until the total number of
points is at least 40, starting with the level whose
integer value is the highest and proceeding in descending order. The complete set of accumulated
points constitutes the candidate lattice points. The
number of initial placement positions that results
from this procedure typically varies between 50 and
350, depending upon the geometry of the binding site
and the dimensions of the user-defined lattice (see
Accommodation or ‘‘growing-in’’ of the ligand
to the binding site
A ligand placed at random in the binding site of a
protein will usually overlap with protein atoms and
the resulting steric clashes must be relieved by the
docking protocol. One obvious approach to this problem is to gradually scale up the nonbonded interactions between the ligand and the protein from zero to
jected to 300-K Langevin Dynamics in order to allow
it to adapt to the contours of the binding site as it
grows. With this procedure, the ligand never overlaps with the protein. Occasional trajectories still
fail because the ligand is wedged tightly in an
conformation that leads to steric clashes. Such trajectories are discarded.
The protein is kept rigid during ‘‘growing-in’’ to
protect it from distortion due to the strong forces
exerted by the growing ligand. In calculations where
the protein is treated as flexible, the protein is
allowed to move only after ‘‘growing-in’’ is complete.
Fig. 2. Scaling of ligand coordinates. Outer line: protein binding site. Molecule ABCD: ligand at full size. Molecule abcd: scaled
ligand. Dot: center of coordinates of ligand. Arrows exemplify the
vectors used in scaling the coordinates of atom A to a, B to b, etc.
their full values. However, this causes parts of the
mobile ligand to become trapped among fixed protein
atoms. The forces on these trapped atoms become
enormous as the nonbonded forces continue to increase, causing gross structural distortions and eventual failure of the dynamics integration. That interlocking ligand-protein conformations are often not
relieved by gradual scaling of non-bonded interactions has been noted recently.28
A novel method for relieving such ligand-protein
overlaps is used here. The method involves a simple
transformation of the nonbonded interactions between the ligand and the protein. As illustrated in
Figure 2, the pairwise interactions between protein
atoms and ligand atoms are evaluated as if the
ligand atoms were shifted toward the center of
coordinates of the ligand. A scaling factor ␣ determines the amount of the shift. The transformation
changes the effective position of atom i from xi to a
new position x8i, given by
x8i ⫽ [(1 ⫺ ␣)xcoc ⫹ ␣xi],
where xcoc is the position of the center of coordinates
of the ligand. When ␣ is small, no steric clash can
occur so long as long as the center of coordinates of
the ligand is greater than a few Ångstroms from any
protein atom. As ␣ increases to unity, the ligand
‘‘grows’’ to its full size. Note that nonbonded interactions internal to the ligand are computed with
untransformed coordinates. Preserving atom-atom
distances within the ligand in this way avoids the
creation of steric overlaps within the ligand itself
when ␣ is much less than unity. The present approach differs in this respect from the otherwise
similar ‘‘bond scaling’’ method of loop closure described by Zheng et al.29
The predocking phase starts with ␣ ⫽ 0.1 and ␣
then increases linearly with time over a period of 500
femtoseconds. During this period, the ligand is sub-
Heat-cool redocking protocol
The conformations neighboring each of the thirty
lowest-energy structures located in the docking phase
are explored further with twelve iterations of a
redocking protocol. In each iteration, the structure is
‘‘grown-in’’ again by setting ␣ ⫽ 0.1 and increasing it
to unity over 1 picosecond of Langevin dynamics at
350 K. Then the energy is minimized with zerotemperature LD, the final minimized structure is
stored, and the next heating cycle is started.
Including long-ranged nonbonded forces
Including long-ranged nonbonded interactions in
molecular simulations is time-consuming, because
these interactions are numerous. One way to save
computer time is to make the approximation that
each atom interacts only with other atoms lying
within some cutoff distance. With this approximation, atoms at one end of the ligand may interact
with a difference set of protein atoms than atoms at
the other end of the ligand. In addition, the ligand
interacts with different protein atoms depending
upon its own location within the binding site. We find
that this approach, when used with cutoffs of 10–12
Å, disrupts whatever correlation may exist between
computed energy and the deviation from the crystal
Alternatively, one can treat long-ranged interactions by consistently neglecting all protein atoms
that are far from the binding site, in effect reducing
the size of the protein. In this commonly used
approach, all ligand atoms interact with all of the
protein atoms that are retained, and the nonbonded
pair list of a ligand atom does not depend upon its
location. This approach is examined in the present
The fact that much of the protein is fixed in typical
docking calculations also permits long-ranged interactions to be treated more precisely at modest additional computational cost. This is done by computing
in advance the potentials generated by some or all of
the fixed atoms of the protein, and storing these
potentials on a grid of positions that encompasses all
regions that mobile atoms might enter. Several
forms of this approach have been used previously.23,30–32 In most previous implementations, all
fixed atoms, including those in direct contact with
the ligand, contribute to the gridded potentials and
no protein atoms are treated explicitly. However, we
find that this approach yields inaccurate potentials
for grid points close to protein atoms, unless very
fine grids are used. In addition, molecular dynamics
simulations in which the atoms lining the binding
site are not treated explicitly require very small time
steps. Greater accuracy is achieved here by treating
a shell of fixed atoms lining the binding site explicitly, and using the potential grids to account only for
the influence of more distance atoms. A similar
procedure has been used previously.23 Three potential grids having lattice spacings of 1 Å are used
here. One grid bears electrostatic potentials, and two
bear Lennard-Jones ‘‘potentials.’’30 Energies are computed by trilinear interpolation, and forces are then
computed by finite-differences. As shown in the
Results section, this procedure can yield energies
and forces that agree extremely well with those
computed with a complete atom-atom non-bonded
This method requires that each atom of the system
be assigned to one of three categories. The ‘‘live’’
atoms are those which are mobile. The ‘‘real’’ atoms
are all those in the nonbonded pair list; this set
comprises the live atoms and the shell of fixed
protein atoms. The ‘‘grid’’ atoms are those whose
influence on the live atoms is accounted for only via
their contributions to the potential; these do not
appear in the pair list. Note that this categorization
is used only in computing nonbonded interactions.
All bonded interactions are currently accounted for,
including those involving atoms that are not ‘‘live.’’
In the present study, all ligand atoms are live.
Except in a limited study of a flexible binding site
(below), no protein atoms are live. Atoms are allocated to the real or the grid set as follows. First, the
lowest and highest x, y, and z coordinates of the
ligands among the predocked ligand conformations
are found. These values are termed (xmin, ymin, zmin)
and (xmax, ymax, zmax). The real atoms are then defined
as those contained by a box with lower and upper
corners (xmin ⫺ 4, ymin ⫺ 4, zmin ⫺ 4) and (xmax ⫹ 4,
ymax ⫹ 4, zmax ⫹ 4), where the coordinates are in
Ångstroms. All atoms outside this box are grid
atoms. The potential grids extend over a box with
lower and upper corners (xmin ⫺ 2, ymin ⫺ 2, zmin ⫺ 2)
and (xmax ⫹ 2, ymax ⫹ 2, zmax ⫹ 2). This suffices to
accomodate the excursions of the live atoms in the
docking and redocking phases. Table I lists the
actual bounds used in the present calculations.
Correcting ligand RMS deviations
for symmetry and thermal factors
The deviation of a test conformer of the ligand
from the crystal conformation is commonly reported
TABLE I. Start-Site Lattices and Real-Atom Boxes
for the Three Complexes, Specified by Their
Span in Each Dimension
xmin, xmax
ymin, ymax
zmin, zmax
Real box
xmin, xmax
ymin, ymax
zmin, zmax
19.0, 31.0
⫺25.0, ⫺13.0
⫺17.0, ⫺5.0
⫺7.0, 1.0
28.0, 40.0
20.0, 28.
45.0, 55.0
11.0, 21.0
⫺11.5, ⫺1.5
8.8, 38.1
⫺27.4, ⫺3.5
⫺30.3, 7.4
⫺19.0, 10.0
20.0, 53.0
8.0, 38.0
32.5, 63.8
⫺0.50, 29.3
⫺24.2, 7.1
N: number of lattice points retained for use as predocking
as the root-mean-square deviation (RMSD) of atomic
coordinates. However, the RMSD can be misleading.
One reason is that a ligand with symmetry can
assume a single conformation in more than one way.
For example, a 180° rotation of a phenyl group does
not generate a new conformation. Also, interchanging the indices of all the atoms on the right and left of
the HIV protease (HIVP) inhibitor XK263 does not
create a new conformer, because this molecule possesses an external rotational symmetry. Here, we
correct for such symmetries by recalculating the
RMSD with all possible combinations of symmetryrelated exchanges of atomic indices. Only the lowest
value found is reported. A similar approach has been
reported previously.33 No least-squares superposition algorithm is used in calculating RMSD in the
present study because because the rigid protein
establishes a fixed frame of reference.
Another reason that the RMSD can be misleading
is that a crystallographic study provides different
uncertainties for the positions of different atoms, as
expressed in the crystallographic B-factors. Also,
hydrogen positions are not determined by protein
crystallography, so is not appropriate to include
them when comparing ligand conformers with the
crystal conformer. In the present study, the contribution of each atom to the RMSD is weighted according
to its B-factor and hydrogen atoms are omitted from
the calculation. The formula for RMSD becomes:
0 ri ⫺ r°i 0 2
(Bi / B )
where the sum extends over all N8 non-hydrogen
atoms of the ligand; ri and ri° are respectively the
test and crystal coordinates of atom i; Bi is the
B-factor of atom i; and B is the mean B-factor
averaged over all atoms in the crystal structure.
Normalizing all B-factors by this mean corrects for
the fact that the overall scaling of protein B-factors
depends upon the details of the refinement process
and does not provide information on the actual
positional uncertainties of the atoms.
Docking Into a Flexible Protein
So far, discussion has been restricted to the docking of a ligand into a rigid binding site. It is also of
interest to determine whether the present protocol
yields reasonable conformations when part of the
binding site is treated as flexible. One would like,
ideally, to treat a large shell of atoms lining the
binding site as mobile, or ‘‘live.’’ However, a large live
set markedly increases the computational time. Here,
the computational demands are limited by mobilizing only those protein atoms within 6 Å of any ligand
atom in its initial conformation. Thus, the live set
varies according to the initial location and conformation of the ligand in the predocking phase. The live
set is not updated as different protein atoms approach or recede from the ligand; this restriction
improves the stability of the calculations. The set of
live atoms for each predocking event is stored for
future use, along with the final coordinates generated by that event. This makes it possible to use the
same set of live atoms in subsequent docking or
redocking calculations.
The use of different live sets for different ligand
conformations requires two modifications of the docking procedure. First, it is necessary to energyminimize the binding site in advance of the entire
docking procedure. Otherwise, each different ligand
position would cause a different part of the binding
site to be mobilized and therefore energy-minimized.
As a consequence, different ligand locations would
generate different changes in the intraprotein energy, making it impossible to compare the final
energies of the various docked conformations on an
equal footing. Note that such changes in intraprotein energy would occur even if protein-ligand
interactions were neglected, because they result
largely from the relief of stresses in the crystal
conformation of the protein itself. Energy-minimizing the empty binding site in advance prevents such
relaxations during docking, and makes it easier to
compare the energies of different ligand conformations with each other.
Second, it is necessary to report energies in a
different way. This is because the live set determines
the entries in the nonbonded pair list. Therefore,
docking events with different live sets have different
nonbonded lists so their energies are not directly
comparable. In fact, if two docking runs with different live sets happened to generate the same final
conformer, these might be assigned very different
energies only because their live sets differ. (Note that
this could happen only if the positions of some atoms
remained fixed in their initial positions.) One way to
solve this problem would be to compute energies
with a live set that was the union of all relevant live
sets. However, this would generate a large non-
bonded pair list, and would therefore be computationally costly. A more efficient approach is used here.
The absolute nonbonded energies used so far, which
are sums of all nonbonded energy terms involving at
least one live atom, are replaced by relative energies,
which are defined with respect to the energyminimized configuration of the crystal conformation.
The relative energy of a complex is defined as the
difference of two absolute energy calculations using
the same live set: one of the conformation of interest
and one of the reference state. Thus, the energy
differences we report are of the form
Nlive Nreal
⌬E ⫽
兺 兺 [E
i⫽1 j⫽i
ij (rij )
⫺ Eij (r°ij )]
where Eij is the nonbonded interaction between
atoms i and j as a function of their distance rij, rij° is
the distance between atoms i and j in the reference
structure, and Nlive and Nreal are the numbers of live
and real atoms, respectively (see above). For a given
conformation, these relative energies are independent of the choice of live set because all atoms that
are not live are in their reference position and
therefore contribute nothing to ⌬E. Live atoms that
have not moved also contribute nothing. Therefore,
the energies computed with different live sets can be
compared on an equal footing. Returning to the
example given above, it should be clear that the
relative energies of two identical conformations will
be equal even if they are computed with different live
sets. This is because any atom that has not moved
will contribute nothing to the relative energy whether
or not it is included in the live set.
Coordinates, Energy Models, and Parameters
The present study uses the crystal structures of
three protein-ligand complexes from the Protein
Data Bank.34 These are methotrexate-dihydrofolate
reductase (MTX/DHFR; 3DFR35 ); XK263 with HIVP
(1HVR36 ); and the phosphonamidate peptide inhibitor carbobenzoxy-GlyP-L-Leu-L-Leu with thermolysin (ZLL1/TL; 5TMN37 ). The CHARMm force field,
version 23.1 is used.1 Certain assignments of charge
states must be made to important functional groups
in the complexes. Aspartate residues 25 and 125 in
the HIVP dimer are crucial to binding inhibitors of
this protein. Both are treated as deprotonated, in
accordance with the ‘‘null model’’ for protein pKas.38
This choice is reasonable because in a typical initial
ligand-design study, the protonation states of such
residues would not be known. The protonation states
of these critical aspartyl residues are analyzed in
detail elsewhere.39 The charge state of the phosphate
group in the ligand ZLL1 is taken to be ⫺1, as is that
of its C-terminal Leu residue. The NADPH molecule
is included as part of DHFR. Proteins are described
Fig. 3. Comparison of the efficiency with
which three dynamics explore a ‘‘configuration space’’ defined by two dihedrals in the
central seven-membered ring of XK263.
Each dot gives the dihedral angles of one
conformation extracted from a trajectory.
in the ‘‘polar hydrogen only’’ representation; ligands
are described in the ‘‘all hydrogen’’ representation.
In the first four energy models examined here, the
solvent is accounted for only by rescaling all Coulomb interactions by 1/⑀ where the dielectric constant ⑀ takes values 1, 4, or 80 during the docking
phase. The distance-dependent dielectric constant
⑀ij ⫽ 4rij is also tested, where ⑀ij is the dielectric
constant specific to the interaction of atoms i and j
and rij is the interatomic distance.
The fifth energy model uses CHARMm with the
generalized Born/surface area (GB/SA) 40 implicit
model of water, as implemented in a local version41 of
the program UHBD.42,43 The solute dielectric constant is set to 1 and the solvent dielectric constant is
set to that of water; i.e., 80. The surface area is
computed for the locus of points swept out by a
probe-sphere of radius 1.4 Å in contact with the
solute, and the coefficient of the surface areadependent energy term is 0.025 kJ/mol Å2. The
radius of each atom is set to the Rmin value from
CHARMm, except that hydrogen radii are set to
1.2 Å.
When running molecular dynamics with the GB/SA
model, the effective Born radius44 of each atom is
updated along with the complete GB electrostatic
forces at every fifth time-step. However, the energy
derivatives associated with the dependence of these
radii upon conformation are neglected in order to
save computer time. The derivatives of the molecular
surface area with respect to atomic positions are
computed by a previously described method.45
Of these five energy models, the distance-dependent screening model proves to be the most successful at selecting the correct ligand conformations of
the three complexes examined here (see below). Therefore, it is the only one used in the redocking phase.
The lattices of initial placement sites and the
dimensions of the real atom boxes (see above) are
specified in Table I. Note that the coordinates of the
XK263/HIVP structure are rotated relative to their
published crystal structures. The orientation used
here can be recovered by superimposing three noncolinear atoms. The following may be used for this
purpose: Arg 41A CZ (48.781, ⫺16.910, ⫺20.536);
Arg 41B CZ (⫺4.871, ⫺15.825, ⫺9.989); Asp 25 CG
(27.757, ⫺23.096, ⫺12.090).
Performance of the Algorithms
Randomization Dynamics
The effectiveness of Randomization Dynamics at
generating conformational transitions is compared
with that of two other dynamics algorithms: Langevin Dynamics and an energy-conserving leap-frog
MD algorithm. The test system is the 7-membered
cyclic urea ring that forms the core of XK623 and the
other cyclic urea inhibitors.36 This ring possesses two
boat and two chair conformations36 that are separated from each other by energy barriers. Two dihedral angles internal to the ring suffice to specify
these four conformations the ring. The graphs in
Figure 3 plot these dihedral angles at every fifth step
Fig. 4. Histograms of errors showing
accuracy of the grid treatment of longranged nonbonded interactions for the
XK263/HIVP complex. Top row: ⑀ ⫽ 1. Bottom row: ⑀ij ⫽ 4rij. Left column: potential
energy. Right column: total force on ligand.
of a series of 50,000-step trajectories. The result is a
series of Ramachandran-like diagrams that show
the effectiveness of the conformational searches. The
results of the leap-frog dynamics trajectory at 3,500
K provides a clear picture of the locations of the
energy-minima corresponding to the four ring conformations.
Each dynamics algorithm is run at three temperatures, as indicated in the graphs. The highest temperature used is just below the point at which the
integrator failed. The temperatures listed for RD are
only ‘‘nominal,’’ because this dynamics does not yield
a Boltzmann ensemble: the random forces used in
the RD are drawn from distributions that would be
appropriate to the listed temperatures if standard
LD were used with the same coefficient of viscous
It is clear from Figure 3 that LD is more effective
than routine high-temperature MD at randomizing
the ring, and that RD in turn is much more effective
than LD. Even when the random forces used in RD
are those appropriate to LD at 100 K, the ring is
randomized much more effectively than with LD at
much higher temperatures.
RD also generates less strained conformations
than LD. For example, the mean potential energy of
the conformations generated by RD at a nominal
temperature of 100 K is three times lower than that
of the conformations generated by LD at 2,000 K,
even though RD samples conformations more rapidly. This demonstrates that weak, persistent forces
generate conformational transitions more effectively
and with less strain than do strong, short-lived
In summary, the sustained, stochastic forces that
result are extremely effective at randomizing conformations without distorting bonds and angles. The
sequence of conformations resembles that which
would be generated by a child playing with a molecular model.
Grid calculation of long-ranged nonbonded
The accuracy of the method employed here for
treating nonbonded interactions is tested with a set
of 463 randomly docked conformations of XK263 in
the active site of HIVP. Energies and forces are
compared to accurate reference calculations that
explicitly include all nonbonded interactions in the
system except for the constant interactions among
the fixed atoms of the protein. The net forces reported here are not zero because these mis-docked
conformations are not fully energy-minimized. Results are presented for a dielectric constant of 1 and
a distance dependent dielectric constant ⑀ij ⫽ 4rij.
Figure 4 presents histograms of the absolute errors in the energies and forces relative to the reference results. The errors are very small, especially for
⑀ij ⫽ 4rij. It is worth noting that, although the
electrostatic part of the grid energies is much larger
than the Lennard-Jones part, Lennard-Jones energies computed with the grids still vary by about 4
kJ/mol from one conformation to another. This variation in the long-ranged Lennard-Jones energies might
significantly influence the relative energies of the
Fig. 5. Scatter plot of symmetry-corrected versus uncorrected
RMSD values for various conformations of XK263.
Fig. 7. Initial energy of randomized ligand versus predocked
ligand energy for complex XK263/HIVP.
are significantly lower than the uncorrected values.
This is chiefly because the phosphate group has a
very low B-factor and the docking algorithm is quite
good at placing this anionic group at its correct site
near the cationic zinc of the binding site. The Bfactor corrections for MTX and XK263 also tend to be
favorable, but less so than for ZLL1. Correcting for
B-factors can either raise or lower the corrected
RMSD, depending upon the B-factors of the atoms
whose positions are compared.
Results of the Separate Phases
of the Docking Protocol
Randomization phase
Fig. 6. Scatter plot of B factor-corrected versus uncorrected
RMSD values for various conformations of ZLL1.
Symmetry and B-factor corrections
to the RMSD
The symmetry correction to the RMSD can be
substantial. This is shown in Figure 5, which compares symmetry-corrected and uncorrected RMSD
values for 463 predocked conformations of XK263.
On the other hand, correcting RMSD values for
symmetry is essential only for highly symmetric
molecules, such as XK263. This ligand possesses an
overall rotational symmetry, as well as two internal
rotational symmetries associated with its phenyl
rings. It is worth noting that correcting for symmetry
can only reduce the RMSD or leave it unchanged.
B-factor corrections to the RMSD can also be
significant. The largest corrections here are found
for ZLL1. Figure 6 compares B-factor-corrected and
uncorrected RMSD values for 1000 predocked conformations of this ligand. The corrected values of RMSD
The randomization phase generates a wide variety
of conformations of the free ligand that serve as
initial states for predocking. It is expected that
initial conformations that are high in energy are less
likely to yield docked conformations that are low in
energy. Figure 7 examines the correlation between
the energy of the initial randomized conformation of
the ligand and the energy of the resulting pre-docked
conformation for XK263/HIVP. The energy function
here uses ⑀ij ⫽ 4rij and the gridded long-range
potentials. It is clear that the initial conformations of
highest energy never yield low-energy docked conformations. However, the correlation between initial
ligand energy and final docked energy is remarkably
weak for the main cluster of conformers in the graph.
In particular, the docked conformations of lowest
energy do not result from the ligand conformers of
lowest energy. Thus, it would be a mistake to discard
any initial ligand conformers other than those of
very high energy.
Predocking phase
Characterization of conformations from the
predocking phase. The distributions of corrected
Fig. 8. Cumulative probability distributions of RMSD values for docked conformations. Light line: conformations generated
by simple predocking. Heavy line: conformations generated by complete protocol. (See
text for details.)
RMSD values for the three complexes are shown in
Figure 8 (thin line). About 2% of the docked conformations have RMSD values less than 3 Å. Further
energy-minimization with various energy functions
produces only small changes in these RMSD values.
Correlation of final energy with initial position. An interesting result of the present study is
that a relatively compact cluster of initial positions—
i.e., of the candidate lattice sites described in Methods—is responsible for the conformations with the
lowest energy. This cluster, or ‘‘hotspot,’’ is close to
the center of coordinates of the ligand in the crystal
structure. This clustering is apparent from the data
in Tables II–IV.
This observation suggests that a docking method
can be made more efficient by early discovery of the
initial cluster of positions of the ligand that lead to
the lowest energy structures, and then restricting
future docking trials to this hotspot. This strategy is
used in the complete protocol, described below.
Docking phase
In the docking phase, the ligand conformations
generated by predocking are re-minimized with the
five different energy models detailed in the last part
of Methods. For the dielectric screening models, the
calculations are also done with and without the
long-ranged grids. The crystal conformations also
are re-minimized under the same conditions so that
they relax slightly into the local minimum dictated
by the current energy model. This allows the minimized ligand conformations to be compared with an
TABLE II. XK263/HIVP Hotspots. Cartesian
Coordinates of Start Sites That Yielded the Lowest
Energies (kJ/mol) in the Predocking Phase
Predock numbera
number: sequence number, out of ⬃500 predocks. The
center of coordinates of XK263 in the crystal complex is (25.2,
⫺18.0, ⫺12.8).
appropriate reference crystal conformation on an
equal footing.
Figure 9 shows scatter plots of energy versus
corrected RMSD relative to the crystal conformation
Frame number
†The center of coordinates of MTX in the crystal complex is
(⫺2.4, 34.9, 22.6). See Table II for definitions.
TABLE IV. ZLL1/TL Hotspots†
Frame number
†The center of coordinates of ZLL1 in the crystal complex is
(51.2, 17.2, ⫺5.4). See Table II for definitions.
for the three different complexes and for the five
different energy models. These calculations do not
use the long-ranged nonbonded grids. The results for
the dielectric screening models are arranged so that
dielectric screening increases from left to right. The
scatter plots show that all five energy models yield a
strong correlation between energy and RMSD for
XK263/HIVP, particularly for conformations that are
similar to the crystal structure. Picking out the
crystal conformation from all the rest seems to be
fairly straightforward for this system. This might
result from the electrostatic affinity of the aspartyl
dyad of HIVP—here assumed to be doubly ionized—
for the hydroxyl groups of XK263. Also, there are
probably few ways this bulky ligand can fit into the
tunnel-like binding site of HIVP.
In contrast, MTX and ZLL1 are long and flexible,
so they can fit into their receptors in more conformations that are incorrect yet unstrained. This makes
the treatment of electrostatics more important. Thus,
for MTX/DHFR, the correlation between energy and
the RMSD from the crystal conformation improves
monotonically with increasing dielectric screening,
and both ⑀ij ⫽ 4rij and ⑀ ⫽ 80 succeed in picking out
the crystal conformation as the most stable, for the
conformations examined here. However, the GB/SA
model yields a rather poor correlation between RMSD
and energy. For ZLL1/TL, energy models with intermediate levels of electrostatic screening pick out the
crystal structure from the misdocked alternatives.
The models with least screening (⑀ ⫽ 1) and most
screening (⑀ ⫽ 80) fail to pick out the crystal conformation. The electrostatic attraction between the zinc
of the protein and the phosphate of the inhibitor
probably is important in selecting the correct conformation in this case. Completely neglecting electrostatic interactions for ZLL1/TL yields results even
worse than those obtained with ⑀ ⫽ 80. (Data not
shown.) It is also worth mentioning that calculations
with GB electrostatics but no surface-area dependent energy term yield results virtually indistinguishable from those obtained with the full GB/SA model
for all three systems examined here.
Figure 10 shows shows the corresponding results
for the four dielectric screening models, now including the long-ranged interactions provided by the
present grid method. The only substantive change is
that ⑀ ⫽ 1 now successfully picks out the crystal
conformation of ZLL1/TL.
The results in Figures 9 and 10 show that considerable electrostatic screening is essential for detecting
the bound conformation of the ligand. However, ⑀ ⫽
80 is excessive, for it reduces the ability to discriminate correct from incorrect conformations. The most
successful of the five energy models at this stage
appears to be ⑀ij ⫽ 4 rij.
It is somewhat surprising that so simple an energy
function performs this well in picking out the crystal
conformations of three protein-ligand complexes. A
trivial explanation of these results would be that the
growing-in method yields strained conformations of
the ligand, making it easy to select the unstrained
crystal conformation as the most stable one. If ligand
strain accounted for the success of the docking
calculations, then RMSD and ligand energy would be
correlated. However, Figure 11 shows that RMSD
and ligand energy are, if anything, anticorrelated,
disproving the hypothesis. Evidently, the success of
the energy function results from an appropriate
Fig. 9. Energy versus corrected RMSD from crystal for about 500 docked conformers of each complex. Energies are computed with four
different dielectric screening functions or the GB/SA solvation model. Long-ranged nonbonded interactions are excluded.
Fig. 10.
Same as Figure 9, but including long-ranged nonbonded interactions via the grid method.
Fig. 11. Ligand strain versus corrected RMSD for the docked
conformations of Figures 9 and 10. Energies are for the ligand
alone and are computed with ⑀ij ⫽ 4rij and no cutoff.
Fig. 12. Consequences of redocking for XK263/HIVP. The
lowest energy conformations from a predocking calculation (dots)
are ranked by energy and serially redocked twelve times each
balancing of the ligand energy with the ligandprotein interaction energy. The anticorrelation of
RMSD and ligand energy suggests that docking only
the lowest-energy conformations of the free ligand
would risk missing the most stable conformations of
the complex.
Based on the results of this section, only the
distance-dependent dielectric model ⑀ij ⫽ 4rij is used
in the redocking phase. The nonbonded grids are
retained only because they are not computationally
expensive; it is not clear that they improve the
Redocking phase
The intent of the redocking phase is to find new
local minima near the most stable conformations
generated in the previous phase. Here, the thirty
predocked structures of lowest energy are ranked
from lowest to highest energy and then subjected to
twelve heat-cool cycles apiece. The resulting sequences of final cooled energies are shown in Figures
12–14. Each filled circle represents one of the thirty
starting structures. The figures show that the redocking procedure usually leads to new conformations of
dramatically lower energy. However, occasional increases in energy also are observed.
Figures 15–17 show the consequences of redocking
on the scatter plots of energy versus RMSD. Each of
the thirty starting conformations is plotted with an
‘‘x’’, and the conformations generated by the redocking procedure are plotted with filled circles. (The
data set used here is different from that in the
preceding set of figures.) These scatter plots show
that the marked drops in energy due to redocking are
associated with only small drops in RMSD from the
Fig. 13. Consequences of redocking for MTX/DHFR. (See Fig.
12 for details.)
crystal conformations. Nonetheless, the correlation
between energy and RMSD is fairly well preserved
for XK263 and ZLL1.
The situation is different for MTX, where redocking uncovers a cluster of conformations of lower
energy than the crystal conformation but with RMSD
values of about 6 Å. (Higher-energy representatives
of this cluster can be seen in Figure 9.) In this cluster
of mis-docked conformations, the pteridine ring is
not in the binding site but lies at the surface of the
protein between His 18 and His 22. However, the
benzyl and carboxyl groups are near their crystallographic positions. The energetic dominance of this
mis-docked class of conformations points to a weakness in the energy function used here, as discussed
below. It is worth mentioning that these mis-docked
Fig. 14. Consequences of redocking for ZLL1/TL. (See Fig. 12
for details)
Fig. 16.
Fig. 17.
Fig. 15. Scatter plots of energy versus corrected RMSD for the
thirty lowest docked structures of XK263/HIVP (X’s), and for 12
redocked conformations of each low-energy conformation (dots).
(The data set is different from but similar to that in Fig. 12.)
conformations contribute to the cluster of conformations with RMSD of about 7 Å that are found by
GB/SA to be more stable than the crystal conformation (see Figure 9).
The Structure of the Energy Landscape
This section uses the energy-minima discovered
for the complexes to study the distribution of energies and conformations for these protein-ligand systems. This analysis should be viewed only as a first
approximation, however, because of the simplifications used in representing these systems. Thus, the
⑀ij ⫽ 4 rij energy model leaves out the energy penalty
associated with desolvation of polar groups. In addition, we have so far treated the protein as rigid.
Therefore, the energy surface examined here is only
Same as Figure 15 for MTX/DHFR.
Same as Figure 15 for ZLL1/TL.
one particularly interesting ‘‘slice’’ of the higherdimensional energy surface that includes protein
degrees of freedom. Nonetheless, the energy surfaces
examined here do account for many of the important
forces. In addition, most other docking algorithms
use much more simplified energy or scoring functions; the energy function here is comparatively
detailed. It is therefore of interest to examine the
energy surface defined by the present representation.
The energy-minima generated by the present algorithm can be used to construct an approximation to
the low-energy density of states. This is done by
ranking the conformations generated in the redocking phase by energy, and then plotting energy versus
rank. Plots of this type are presented in Figures
18–20 for the three complexes. The XK263 and MTX
complexes show a narrow distribution of low energy
structures, well separated from the rest. This feature is particularly evident in the detailed inserts to
Figure 18.
Figure 19.
Figure 20.
Fig. 21. Sequence of energies for a long serial redocking
analysis of XK263/HIVP, showing hierarchical arrangement of
energy levels of the local minima.
the figures. In contrast, the ZLL1 complex shows a
rather broad distribution of low energy structures.
This suggests that the entropic component of the free
energy is more important for the ZLL1 complex than
for the other two. The broader distribution of states
for ZLL1 may be a consequence of the fact that its
binding site is relatively exposed to solvent.
The present calculations also show that the energy
minima of the complexes cluster hierarchically. Thus,
as documented in Tables II–IV, the initial center-ofcoordinate placement sites that yield low docking
energies form clusters, or ‘‘hotspots.’’ Each hotspot in
turn corresponds to a broad class of binding modes
that are discovered by multiple predocks with the
same start-site. Each predocked conformation, when
redocked, gives rise to a set of structures differing in
fine-scale details. This description is consistent with
the hierarchical energy diagram in Figure 1.
The hierarchical structure of the energy surface
can also be discerned in the final energies generated
by a long sequence of redocks of XK263 in the HIVP
binding site (Fig. 21). The structure of the graph is
suggestive of a series of energy steps of various sizes
that can occur alone or in combinations. Magnification of the graph (not shown) shows superimposed
energy-variations on a smaller scale. The theme of
multiple scales of conformational variation is explored further in Figure 22. This plots pairwise
energy differences against pairwise RMSD values for
Figs. 18–20. Final energy levels attained in the redocking
phase studies, ordered by size, to provide an approximation to the
low-energy density of states. The inset shows a magnification of
the low-energy portion of the graph. Note that ZLL1/TL differs from
the other complexes studied in that it lacks a small, well-defined
set of lowest-energy structures.
Fig. 22. Pairwise RMSD versus pairwise energy difference for each pair of XK263/HIVP structures generated in a single cyclic redocking
procedure. The graph is ‘‘web-like’’ and highly correlated on all scales. The central graph shows all data points; the others show successive
magnifications of the boxed regions. Energies calculated with ⑀ij ⫽ 4rij.
the redocked conformations. The graph shown in the
center is at full scale; those around it show successive magnifications that reveal striking, multilevel
structure. Similar structure is found for the other
two complexes.
A Hierarchical Protocol for Docking
Here, a complete protocol based upon the results
described above is assembled and used to generate
low-energy bound conformations for each of the
protein-ligand complexes studied in this paper. To
begin, random ligand conformers are generated by
the RD method described above. Conformers with
energies 25 percent higher than the running ligand
energy minimum are rejected without being docked.
This uses the observation that high-energy ligands
do not dock to give low-energy complexes; see Figure
7. Next, fifty of the resulting ligand conformers are
predocked using a lattice of initial placement sites as
already described. The coordinates of the 10 placement sites that yield the lowest energies are then
averaged to give their center of coordinates. This
represents a docking ‘‘hotspot,’’ as discussed above.
The initial placement sites of subsequent predocks
are now restricted to the 3 ⫻ 3 ⫻ 3 lattice of initial
placement sites whose center is closest to the hotspot.
The predocking phase is continued until a total of
500 predocked conformations is generated. Finally,
the thirty lowest-energy structures from this predocking phase are each redocked twelve times. All steps
of the protocol use ⑀ij ⫽ 4rij .
The energy-based selection steps in this protocol
result in a marked increase in the yield of low-RMSD
structures. This is illustrated in Figure 8, where the
Fig. 23. Energy versus RMSD for predocked conformers of the
XK263/HIVP complex with a 6 Å shell of protein atoms mobile.
Energies are computed with ⑀ij ⫽ 4rij.
heavy line plots the cumulative distributions of
RMSD values for the complete protocol. The fraction
of conformations with RMSD less than 3 Å is 20–
40%, about ten-fold higher than with the simple
predocking protocol. There are also significant populations of conformations with RMSD under 2 Å. This
enrichment of low-RMSD conformations was
achieved entirely by the hierarchical selection of
low-energy regions of conformational space.
Preliminary Study of Docking
Into a Flexible Protein
The flexible-protein docking procedure described
in Methods was applied to the XK263/HIVP system.
The calculations use the distance-dependent dielectric model ⑀ij ⫽ 4 rij and a 14 Å atom-based nonbonded cutoff. The reported RMSD values are based
only upon the coordinates of the ligand relative to its
crystal conformation. In this preliminary study, the
predocking phase placed the randomized conformations of the ligand with their center of coordinates at
the crystallographic center of coordinates of the
Figure 23 shows a scatter plot of energy versus
RMSD for 786 predocked conformers of the XK263/
HIVP complex. There is little if any correlation,
despite the restricted conformational search, possibly because the available configuration space is so
expansive. However, when the thirty lowest energy
conformations are subjected to twelve redocking
cycles each, a distinct correlation between energy
and RMSD emerges, as shown in Figure 24. The high
degree of correlation observed results in part from
the relaxation of the binding site around the ligand.
This optimizes the geometry of the polar interactions
between the two molecules and thus creates a more
pronounced energy-minimum.
Fig. 24. Results of redocking the thirty lowest-energy conformations in Figure 23 with the same set of mobile atoms.
This paper describes a novel dynamical protocol
for finding the low-energy conformations of a proteinligand complex. Application of the method to three
complexes of known crystal structure provides insights concerning dynamical algorithms for docking
ligands into binding sites, the ability of empirical
energy functions to select the crystallographic conformation of a complex, and the structure of the energy
surface for binding.
A docking protocol requires a method for generating a wide range of plausible conformations of the
ligand. This is often done with quenched hightemperature dynamics simulations. The novel
method used here is based instead upon the application of weak but persistent random forces during a
dynamical simulation. Such forces prove to be highly
effective at altering conformation, as detailed in
Another requirement for a docking protocol is a
method of discovering the low-energy conformations
of the ligand within the binding site. In this paper,
we describe the use of MD within a hierarchical
protocol to find such conformations. Conformations
are evolved in stages, with the lowest energy conformations from one stage serving as starting points for
the next. The use of MD algorithms makes it straightforward to mobilize any part of the protein-ligand
system, including ring structures, anchored loops,
bond-stretches, and angle-bends.
One likely reason that MD has not been used
extensively in previous docking protocols is that the
initial placement of the ligand in the binding site can
be difficult, due to the extensive steric constraints.
One approach to this problem has been to associate
an elevated temperature with motions of the center
of mass of the ligand.22 This helps to drive the ligand
over energy barriers. Another approach is to artificially weaken nonbonded interactions initially, and
then gradually scale them up again during the
simulation. However, existing methods of scaling
nonbonded interactions can leave the ligand interlocked with the protein,28 causing failure of the
docking run. The present paper uses a novel ‘‘growingin’’ method of placing the ligand in the binding site.
In this method, the ligand is transformed to an
atom-sized object within the binding cavity, then is
gradually grown to its full size during a dynamical
simulation. This allows the ligand to adapt to its
surroundings and find a sterically acceptable final
conformation. The transformed ligand is always
within the binding site yet never overlaps with the
A third requirement for a successful proteinligand docking protocol is an energy model that is
computationally efficient, yet distinguishes between
correctly docked and mis-docked conformations. That
such energy models may exist is suggested by the
‘‘specificity versus stability’’ concept which notes
that, for structure prediction, one does not need an
energy function that yields actual binding energies;
a correct ranking of energies suffices. Thus, although
a truly accurate energy function must correctly rank
the stabilities of various docked conformations, it
could also happen that an energy function that does
not yield reliable binding affinities might nonetheless rank the stabilities of docked conformations
correctly. This point was first made explicit46 in the
context of theoretical protein folding, a problem that
has much in common with docking but which is more
complex because of the very large number of degrees
of freedom. The present study assesses the ability of
a series of dielectric screening functions, combined
with the CHARMm force-field, to rank artificially
generated conformations of three ligands according
to their deviations from their crystal conformations.
It is found that a moderate to high degree of screening is necessary: ⑀ij ⫽ 4rij or ⑀ ⫽ 80, but not ⑀ ⱕ 4.
One might not have expected this result, because
all of the conformations examined here are within
the binding site, and one tends to think of binding
sites as relatively desolvated environments in which
solvent screening would be modest. Thus, one might
imagine that a low dielectric constant of 2–4 would
be optimal for docking. In fact, a low dielectric
constant does perform well at selecting the correct
conformation for XK263/HIVP, and this binding site
is the least solvent-exposed of the three. The complete failure of the low dielectric constant models for
the MTX/DHFR case appears to result, at least in
part, from the need to screen otherwise excessive
electrostatic interactions between the carboxylate
groups of MTX and cationic side-chains at the protein surface, such as that of Arg 31. For ZLL1/TL, the
crystal conformation is found only with intermediate
levels of dielectric screening; decreasing the dielectric constant to 1 or increasing it to 80 destroys the
correlation between energy and RMSD. However,
the failures are not so dramatic as that observed for
MTX/DHFR with ⑀ ⫽ 1, so it is difficult to point to a
specific basis. Further analysis of these results may
offer a useful route to developing improved energy
The GB/SA model is more sophisticated than the
other solvent models used here. It has been shown to
yield good agreement with experiment in systems
involving small molecules in solution.40,41,47–50 On
the other hand, a previous application to HIVP,41
suggested that some modification of the GB/SA model
is needed for it to yield reliable results for complex
molecules like proteins. Here, GB/SA yields good
results for XK263/HIVP and ZLL1/TL, but relatively
poor results for MTX/DHFR. Examination of the
scatter plot for GB/SA (Fig. 9) reveals an anticorrelation between RMSD and energy that is reminiscent
of, though less severe than, the results obtained with
⑀ ⫽ 1. Perhaps, then, the GB electrostatics model,
like ⑀ ⫽ 1, is underestimating the degree of solvent
screening. Thus, including ionic (Debye-Huckel)
screening, or increasing the solute dielectric constant from 1 to 2, say, might improve the results.
Further work is warranted on this promising solvent
The other side of the specificity versus stability
concept is that an energy function that can correctly
rank the conformations of a receptor-ligand pair by
stability may not succeed at ranking different ligands according to their affinity for the receptor. If
nothing else, the solvation energies and conformational entropies of the uncomplexed ligands in solution are likely to be important in such relative
binding affinities. Thus, the present study does not
necessarily imply that the CHARMm/4rij model will
be useful for ranking different ligands according to
When assessing the effectiveness of a docking
energy function, it is important that the RMSD of
test conformations from the crystal conformation be
evaluated appropriately. This study highlights the
need33 to account for symmetry when computing the
RMSD. It is also noted that a docking protocol should
not be expected to yield high-precision coordinates
for atoms that are not well localized in crystallographic studies. We believe this is the first docking
study in which crystallographic B-factors are used to
account for the relative precision of of atomic coordinates when computing the RMSD of a test conformation.
Having identified the CHARMm/4rij energy function as a reasonable working model, the present
study examines the resulting energy surface for a
ligand in a binding site. It is found that local
energetic minima show strong hierarchical cluster-
ing, especially in the lowest-energy sector of configuration space. This contrasts with the suggestion10
that the landscape of binding energy is highly irregular and even ‘‘frustrated.’’19 Although there is indeed
a fairly wide range of barrier heights, the energy
minima are arranged in a hierarchy with only a few
levels. This structure is directly exploited by our
multistage protocol. A similar two-stage procedure
with an intermediate energy screen was found useful
in a previous study.51 A limitation of the present
study of the energy landscape is that we have kept
the receptor conformation rigid for the most part.
Therefore, the energy surface examined is only a
particularly interesting ‘‘slice’’ through a higher dimensional energy surface.
We do, however, report preliminary docking studies of XK263/HIVP in which the binding site is
treated as flexible. It is important to note that
obtaining a positive correlation between energy and
RMSD in this case required several important modifications of the protocol. First, we have found that
stresses in the protein must be relieved before the
docking run is initiated in order to avoid energy
differences that result entirely from local relaxation
of the part of the binding site into which the ligand is
introduced. Second, nonbonded cutoffs and the nonbonded pair list must be handled in a consistent
manner. Otherwise, the energies reported for different positions of the ligand contain substantial, arbitrary offsets that make energy comparisons meaningless.
It will be of interest in the future to pursue studies
of protein-ligand binding that use this flexible representation of the receptor, along more sophisticated
energy functions and additional methods of mapping
the energy surface.11,17,18,52,53 Such studies should
further elucidate the physical chemistry of proteinligand binding, and thus facilitate the development
of effective docking protocols.
Comparison With Other Docking Algorithms
The present docking protocol is not designed to be
particularly computationally efficient. Rather, it is
focused on generating a picture of the ligand-binding
energy surface that includes all solute degrees of
freedom. However, it would be possible to make it
faster by taking steps such as substituting efficient
energy-minimization routines for some of the Langevin dynamics steps and narrowing the search for
low-energy conformations earlier in the protocol. An
appealing feature of the present protocol is that it
readily allows for mobilization of additional degrees
of freedom belonging to the receptor. On the other
hand, this protocol would not permit the use of those
‘‘scoring’’ functions for which energy gradients are
not readily calculated.
It is of interest to compare the predictive performance of our approach with the results of other
docking methods that have been applied to the same
protein-ligand systems. To review, the present protocol, used with ⑀ ⫽ 4rij, is relatively successful for
XK263/HIVP and ZLL1/TL: the lowest energy conformations generated via the predocking/redocking protocol have corrected RMSD values of about 2 Å for
both of these systems (see Figs. 15 and 17). On the
other hand, for MTX/DHFR, a cluster of conformations with a corrected RMSD of about 6Å is found to
be significantly more stable than the crystal conformation. Several other conformations with RMSD
values of about 3 Å, found at the redocking stage,
also are predicted to be more stable than the lowest
RMSD conformations generated (Fig. 16).
We are not aware of any previous docking studies
that have examined the XK263/HIVP system. However, a previous study did use a genetic algorithm to
dock ZLL1 to TL.54 When the plain CHARMm force
field was used, the lowest energy conformation generated had an RMSD of 1.8 Å. This is similar to the
result presented here. The GA results became worse
when an atomic surface parameter solvation model
was added.
Detailed comparisons with previous docking studies of MTX to DHFR is not possible. This is because
previous studies have focussed on an Eschericha coli
enzyme without NADPH (4DFR35 ), while we examine an Lactobacillus casei structure with bound
NADPH (3DFR35 ). In addition, most previous reports do not specify how large a region of the receptor
was searched for low-energy conformations of the
ligand.33,55–57 If the searches were not wide enough to
encompass our incorrect MTX conformation, then a
comparison of methods would not be valid. This is a
significant issue. Thus, at least one study clearly did
not include our incorrect conformation in its search
range54: the largest allowed translation of the ‘‘pivot
atom’’ in the pteridine ring from its crystal position
was 2 Å.54 This would disallow our alternate energy
minimum conformation, which requires a 10 Å translation of the pivot atom.
However, we are aware of one study in which the
search range for MTX in 4DFR was wide enough to
have a chance of finding our alternate conformation.
In that study, the GA docking method DIVALI58 was
used with an AMBER-type potential function and
⑀ij ⫽ 4rij. A conformation with RMSD 2.3 Å was the
most stable one found. It is interesting that our
incorrect 6 Å RMSD conformation was not identified
as the most stable, especially given that the DIVALI
study used a similar energy function. One possible
explanation is that the DIVALI study kept the bond
lengths and angles of MTX fixed in their crystallographic values; only torsion angles were allowed to
vary. This restriction should make it easier to find
the crystal conformation of the complex. In our
method, all ligand coordinates vary, so bonds and
angles are reoptimized for each docked conformation
that is examined. This helps to stabilize noncrystallographic conformations and thus provides a
more stringent test of the energy model. Another
possible source of discrepancy, of course, is the
difference between 4DFR and 3DFR.
We are grateful to J.A. McCammon, C. Nicholas
Hodge, and J. Moult for insightful and stimulating
discussions. We thank M. Head both for important
discussions and for assistance with computer graphics. We thank J. Pedersen for providing software for
comparing molecular conformations. We thank L.
David for providing the software for computing
forces in the GB model. This work was supported by
the National Institutes of Health (GM54053) and the
National Institute of Standards and Technology.
Certain commercial equipment or materials are
identified in this paper in order to specify the methods adequately. Such identification does not imply
recommendation or endorsement by the National
Institute of Standards and Technology, nor does it
imply that the materials or equipment identified are
necessarily the best available for the purpose.
1. Brooks, B.R., Bruccoleri, R.E., Olafson, B.D., States, D.J.,
Swaminathan, S., Karplus, M. CHARMm: A program for
macromolecular energy, minimization and dynamics calculations. J. Comput. Chem. 4:187–217, 1983.
2. Kuntz, I.D., Meng, E.C., Shoichet, B.K. Structure-based
molecular design. Acc. Chem. Res. 27:117–123, 1994.
3. Ajay, Murcko, M.A. Computational methods to predict
binding free energy in ligand-receptor complexes. J. Med.
Chem. 38:4953–4967, 1995.
4. Rosenfeld, R., Vajda, S., DeLisi, C. Flexible docking and
design. Annu. Rev. Biophys. Biomol. Struct. 24:677–700,
5. Clark, D.E., Westhead, D.R. Evolutionary algorithms in
computer-aided molecular design. J. Comput. Aided Mol.
Des. 10:337–358, 1996.
6. Böhm, H.-J., Klebe, G. What can we learn from molecular
recognition in protein-ligand complexes for the design of
new drugs? Angew. Chem. Int. Ed. Engl. 35:2588–2614,
7. Böhm, H.-J. Computational tools for structure-based ligand design. Prog. Biophys. Mol. Biol. 66:197–210, 1996.
8. Parrill, A. Recent advances in computer-aided drug design
methods. Expert Opin. Ther. Pat. 7:937–945, 1997.
9. Marrone, T.J., Briggs, J.M., McCammon, J.A. Structurebased drug design: Computational advances. Annu. Rev.
Pharm. Tox. 37:71–90, 1997.
10. Verkhivker, G.M., Rejto, P.A., Gehlhaar, D.K., Freer, S.T.
Exploring the energy landscapes of molecular recognition
by a genetic algorithm: Analysis of the requirements for
robust docking of HIV-1 protease and FKBP-12 complexes.
Proteins 25:342–353, 1996.
11. Head, M.S., Given, J.A., Gilson, M.K. ‘‘Mining minima’’:
Direct computation of conformational free energy. J. Phys.
Chem. 101:1609–1618, 1997.
12. Gilson, M.K., Given, J.A., Head, M.S. A new class of models
for computing receptor-ligand binding affinities. Chem. &
Biol. 4:87–92, 1997.
13. Aomoto, K. Point spectrum on a quasi-homogeneous tree.
Pac. J. Math. 147:231–242, 1992.
14. Richert, R., Blumen, A., (eds.) ‘‘Disorder Effects in Relaxation Processes: Glasses, Proteins, Polymers.’’ New York:
Springer-Verlag, 1994.
15. Nakayama, T., Yakubo, K., Orbach, R.L. Dynamical properties of fractal networks—Scaling, numerical simulations,
and physical realizations. Rev. Mod. Phys. 66:381–443,
Czerminski, R., Elber, R. Reaction-path study of conformational transitions and helix formation in a tetrapeptide.
Proc. Natl. Acad. Sci. USA 86:6963–6967, 1989.
Czerminski, R., Elber, R. Reaction path study of conformational transitions in flexible systems: Applications to peptides. J. Chem. Phys. 92:5580–5601, 1990.
Becker, O.M., Karplus, M. The topology of multidimensional potential energy surfaces: Theory and application to
peptide structure and kinetics. J. Chem. Phys. 106:1495–
1517, 1997.
Toulouse, G., Rammal, R., Vannieminus, J. ‘‘Spin Glass
Theory and Beyond.’’ New York: Springer-Verlag, 1994.
Wolynes, P. Folding funnels and energy landscapes of
larger proteins within the capillarity approximation. Proc.
Natl. Acad. Sci. USA 94:6170–6175, 1997.
Stillinger, F.H., Weber, T.A. The inherent structure in
water. J. Phys. Chem. 87:2833–2840, 1983.
Di Nola, A., Roccatano, D., Berendsen, H.J.C. Molecular
dynamics simulation of the docking of substrates to proteins. Proteins 19:174–182, 1994.
Luty, B.A., Wasserman, Z.R., Stouten, P.F.W., Hodge, C.N.,
Zacharias, M., McCammon, J.A. A molecular mechanics/
grid method of evaluation of ligand-receptor interactions.
J. Comput. Chem. 16:454–464, 1995.
Nakajima, N., Higo, J., Kidera, A., Nakamura, H. Flexible
docking of a ligand peptide to a receptor protein by
multicanonical molecular dynamics simulation. Chem.
Phys. Lett. 278:297–301, 1997.
van Gunsteren, W.F., Berendsen, H.J.C. A leap-frog algorithm for stochastic dynamics. Mol. Simulat. 1:173–185,
Loncharich, R.J., Brooks, B.R., Pastor, R.W. Langevin
dynamics of peptides: The frictional dependence of isomerization rates of N-acetylalanyl-N8-methylamide. Biopolymers 32:523–535, 1992.
Press, W.H., Flannery, B.P., Teukolsky, S.A., Vetterling,
W.T. ‘‘Numerical Recipes. The Art of Scientific Computing.’’
Cambridge: Cambridge University Press: Chapter 10, 1989.
Apostolakis, J., Plückthun, A., Caflisch, A. Docking small
ligands in flexible binding sites. J. Comput. Chem. 19:21–
37, 1998.
Zheng, Q., Rosenfeld, R., Vajda, S., DeLisi, C. Loop closure
via bond scaling and relaxation. J. Comput. Chem. 14:556–
565, 1993.
Pattabiraman, N., Levitt, M., Ferring, T.E., Langridge, R.
Computer graphics in real-time docking with energy calculation and minimization. J. Comput. Chem. 6:432–436,
Goodsell, D.S., Olson, A.J. Automated docking of substrates to proteins by simulated annealing. Proteins 8:195–
202, 1990.
Meng, E.C., Schoichet, B.K., Kuntz, I.D. Automated docking with grid-based energy evaluation. J. Comput. Chem.
13:505–524, 1992.
Rarey, M., Kramer, B., Lengauer, T., Klebe, G. A fast
flexible docking method using an incremental construction
algorithm. J. Mol. Biol. 261:470–489, 1996.
Bernstein, F.C., Koetzle, T.F., Williams, T.F. et al. The
Protein Data Bank: A computer-based archival file for
macromolecular structures. J. Mol. Biol. 112:535–542, 1977.
Bolin, J.T., Filman, D.J., Matthews, D.A., Hamlin, R.C.,
Kraut, J. Crystal structures of Eschericha coli and Lactobacillus casei dihydrofolate reductase refined at 1.7 Angstroms resolution. I. General features and binding of
methotrexate. J. Biol. Chem. 257:13650–13662, 1982.
Lam, P.Y.S., Jadhav, P.K., Eyermann, C.J. et al. Rational
design of potent, bioavailable, nonpeptide cyclic ureas as
HIV protease inhibitors. Science 263:380–384, 1994.
Holden, H.M., Tronrud, D.E., Monzingo, A.F., Weaver, L.H.,
Matthews, B.W. Slow- and fast-binding inhibitors of thermolysin display different modes of binding. Crystallographic analysis of extended phosphonamidate transitionstate analogues. Biochemistry 26:8542–8553, 1987.
Antosiewicz, J., McCammon, J.A., Gilson, M.K. Prediction
of pH-dependent properties of proteins. J. Mol. Biol. 238:
415–436, 1994.
Trylska, J., Antosiewicz, J., Geller, M., Hodge, C.N., Klabe,
R.M., Head, M.S., Gilson, M.K. Thermodynamic linkage
between protonation and binding of inhibitors to HIV
protease. Prot. Sci., in press.
Qiu, D., Shenkin, P.S., Hollinger, F.P., Still, W.C. The
GB/SA continuum model for solvation. A fast analytical
method for the calculation of approximate Born radii. J.
Phys. Chem. 101:3005–3014, 1997.
Luo, R., Head, M.S., Moult, J., Gilson, M.K. pKa shifts in
small molecules and HIV protease: Electrostatics and
conformation. J. Am. Chem. Soc. 120:6138–6146, 1998.
Davis, M.E., Madura, J.D., Luty, B.A., McCammon, J.A.
Electrostatics and diffusion of molecules in solution: Simulations with the University of Houston Brownian Dynamics program. Comput. Phys. Commun. 62:187–197, 1991.
Madura, J.D., Davis, M.E., Gilson, M.K., Wade, R.C., Luty,
B.A., McCammon, J.A. Biological applications of electrostatic calculations and Brownian dynamics simulations.
Rev. Comput. Chem. 5:229–267, 1994.
Still, W.C., Tempczyk, A., Hawley, R.C., Hendrickson, T.
Semianalytical treatment of solvation for molecular mechanics and dynamics. J. Am. Chem. Soc. 112:6127–6129,
Sridharan, S., Nicholls, A., Sharp, K.A. A rapid method for
calculating derivatives of solvent accessible surface areas
of molecules. J. Comput. Chem. 16:1038–1044, 1995.
Srinivasan, R., Rose, G.D. LINUS—a hierarchic procedure
to predict the fold of a protein. Proteins 22:81–99, 1995.
Chambers, C.C., Hawkins, G.D., Cramer, C.J., Truhlar,
D.G. Model for aqueous solvation based on class IV atomic
charges and first solvation shell effects. J. Phys. Chem.
100:16385–16398, 1996.
Best, S.A., Merz, K.M., Reynolds, C.H. GB/SA-based continuum solvation model for octanol. J. Phys. Chem. B
101:10479–10487, 1997.
49. Watanabe, T., Hashimoto, K., Takase, H., Kikuchi, O.
Monte Carlo simulation study on the conformation and
interaction of the glycine zwitterion in aqueous solution.
Theochem.—J. Mol. Struct. 397:113–119, 1997.
50. Scarsi, M., Apostolakis, J., Caflisch, A. Comparison of a GB
solvation model with explicit solvent simulations: Potentials of mean force and conformational preferences of
alanine dipeptide and 1,2-dichloroethane. J. Phys. Chem.
B. 102:3637–3641, 1998.
51. Caflisch, A., Niederer, P., Anliker, M. Monte Carlo docking
of oligopeptides to proteins. Proteins 13:223–230, 1992.
52. Kolossvary, I. Evaluation of the molecular configuration
integral in all degrees of freedom for the direct calculation
of conformational free energies: Prediction of the anomeric
free energy of monosaccharides. J. Phys. Chem. A 101:9900–
9905, 1997.
53. Guo, Z.Y., Brooks, C.L. Thermodynamics of protein folding:
A statistical mechanical study of a small all-beta protein.
Biopolymers 42:745–757, 1997.
54. Judson, R.S., Tan, Y.T., Mori, E., Melius, C., Jaeger, E.P.,
Treasurywala, A.M., Mathiowetz, A. Docking flexible molecules: A case study of three proteins. J. Comput. Chem.
16:1405–1419, 1995.
55. Jones, G., Willett, P., Glen, R.C., Leach, A.R., Taylor, R.
Development and validation of a genetic algorithm for
flexible docking. J. Mol. Biol. 267:727–748, 1997.
56. Makino, S., Kuntz, I.D. Automated flexible ligand docking
method and its application for database search. J. Comput.
Chem. 18:1812–1825, 1997.
57. Mizutani, M.Y., Tomioka, N., Itai, A. Rational automatic
search method for stable docking models of protein and
ligand. J. Mol. Biol. 243:310–326, 1994.
58. Clark, K.P., Ajay. Flexible ligand docking without parameter adjustment across four ligand-receptor complexes. J.
Comput. Chem. 16:1210–1226, 1995.
Без категории
Размер файла
304 Кб
structure, comprehension, fold, patterns, genome, protein, microbial, usage, eighth, census
Пожаловаться на содержимое документа