APPLIED STOCHASTIC MODELS AND DATA ANALYSIS, VOL. 10, 91-102 (1994)
INFORMATION THEORY APPROACH IN EFFICIENCY MEASUREMENT
JATI K. SENGUPTA
Department of Economics, University of California, Santa Barbara, CA 93106-9210, U.S.A.
SUMMARY
The current non-parametric method of measuring productive efficiency of input-output systems is
generalized here in the stochastic case in terms of an information theory approach based on the concept
of entropy. Use of maximum entropy as a method of finding the most probable distribution of the
input-output data set and as a predictive criterion is illustrated for production systems with multiple
inputs and outputs.
KEY WORDS: Principle of maximum entropy; Efficiency measurement by data envelopment analysis; Use of conditional entropy
1. INTRODUCTION
The non-parametric approach of data envelopment analysis (DEA) has been frequently used in the recent literature [1-3] to specify and measure productive efficiency. The two most attractive features of this approach are its flexibility and its emphasis on a data-based method of estimating efficiency. In the stochastic case, however, the statistical distribution of the input-output data plays an important role in the specification and estimation of the production frontier, but the DEA model fails to incorporate it in any significant way. It is clearly plausible that the estimate of the efficiency surface by the DEA method would differ according to whether the data set follows one distribution or another. Given a sample set of observed data points, one may therefore ask: 'What is the most plausible form of the true distribution that generates the given data?'
Information theory and the associated probability distribution, called the maximum entropy
(ME) distribution, seek to provide an answer to this question. The ME principle states that if
the statistician’s decision problem is to fit a distribution as an estimator of some true
distribution, he should formulate his prior knowledge on the latter distribution in the form of
constraints on the distribution to be fitted; then he should choose the most uninformative
distribution subject to these constraints, with entropy used as a measure of uninformativeness.
Clearly the DEA model, which basically computes a sequence of n linear programming (LP)
problems to test if each of the n data points is on the production frontier, is quite flexible in
terms of allowing prior knowledge as additional constraints.
There is a second way of incorporating the data distribution aspect into a DEA model. This
is through a predictive criterion adjoined to the original model. The information theory
approach can be used here to generalize the predictive power criterion, when the underlying
distribution need not be normal.
Finally, when the DEA model is considered in its dynamic version with a production
function involving lagged inputs, the question arises: what should be the order of the lags? For
time series data, the technology is frequently represented in the production function by lagged
capital inputs and the question of how many lags there should be in the frontier production
assumes special importance. The information theory approach can be profitably used here to
determine the optimal number of time lags in a dynamic production frontier.
Our object here is to develop and apply two basic concepts of information theory: the
entropy and the mutual information statistic in the DEA framework, thereby generalizing the
scope of applicability of the latter. These applications are intended to develop a joint method
of modelling and estimation of productive efficiency in a stochastic view of the DEA model,
where the set of sample observations is divided into two subsets: one containing efficient units
and the other non-efficient ones.
2. ENTROPY IN DEA MODELS
Lack of complete knowledge about the random state of nature has been a pervasive
characteristic of most decision models. For the DEA model this is due to the imprecise
knowledge of the probability distributions of inputs and outputs. In the standard DEA model
we observe the m inputs x_j = (x_{ij}) and the single output y_j for each decision-making unit (DMU) j ∈ I_n, I_n = {1, 2, ..., n}, and formulate the following LP model for unit k:

min g_k = x_k'β = Σ_{i=1}^{m} x_{ki} β_i    (1)
subject to  β ∈ C(β) = {β | Xβ ≥ y; β ≥ 0}

Then the unit k ∈ I_n is efficient if it lies on the convex hull of the convex set, i.e. if it holds that x_k'β*(k) = y_k and s_k = y_k* − y_k = 0, where s_k is the slack variable, which is equal to zero in non-degenerate cases, and y_k* = x_k'β*(k) is the potential or maximum output. Then, by varying k in the index set I_n in the objective function, the whole efficiency surface can be traced out from the optimal solutions of the family of LP models of the form (1).
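The family of LP problems in (1) is straightforward to set up with an off-the-shelf solver. The sketch below, in Python with purely illustrative data (not taken from the paper), solves min x_k'β subject to Xβ ≥ y, β ≥ 0 for each unit k and checks whether the slack y_k* − y_k vanishes.

    # Minimal sketch of the DEA test in (1); X, y and the unit count are illustrative.
    import numpy as np
    from scipy.optimize import linprog

    X = np.array([[2.0, 4.0],
                  [3.0, 2.0],
                  [4.0, 5.0],
                  [5.0, 3.0]])          # n = 4 units, m = 2 inputs
    y = np.array([1.0, 1.2, 1.5, 1.1])  # single output per unit

    n, m = X.shape
    for k in range(n):
        # min g_k = x_k' beta  s.t.  X beta >= y,  beta >= 0
        res = linprog(c=X[k],            # objective coefficients x_k
                      A_ub=-X, b_ub=-y,  # X beta >= y rewritten as -X beta <= -y
                      bounds=[(0, None)] * m, method="highs")
        potential = X[k] @ res.x         # y_k* = x_k' beta*(k)
        slack = potential - y[k]         # s_k = y_k* - y_k
        print(f"unit {k}: efficient = {np.isclose(slack, 0.0)}, slack = {slack:.4f}")

Units whose slack is numerically zero form the efficient subset; the remaining units belong to the non-efficient subset mentioned above.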
In the case where the data set (X, y) is subject to a stochastic generating mechanism, the objective function of (1) may be written more generally as a convex loss function L(q, p), where q = Xβ − y is the potential loss of output and p = (p_j) is the probability of occurrence of the jth state of nature, assuming a discrete distribution framework. In the production frontier literature, the following specifications of the loss function have been frequently used:

L(q, p) = Σ_{j=1}^{n} p_j q_j = Σ_j p_j (x_j'β − y_j)    (2a)
L(q, p) = q_k = x_k'β − y_k,  with p_k = 1    (2b)
L(q, p) = Σ_{j=1}^{n} p_j q_j²    (2c)
Both Farrell [2] and the DEA approach [3] used the formulation (2b) to minimize the loss function L(q, p) = x_k'β − y_k, assuming that the reference unit k has the highest probability of realizing a zero loss level. Timmer [4] and Johansen [5] used the formulation (2a) to minimize the average loss function ḡ = x̄'β − ȳ, where x̄, ȳ denote the respective averages of inputs and output. The formulation (2c) is based on the least squares norm, which is appropriate if the loss vector q follows a normal distribution.
Note that one common assumption underlying all three specifications above is that the probability p = (p_j) of the random state of nature is known. In reality, however, this is rarely the case. Timmer equated the sample means of inputs and output as estimators of the population means and minimized the mean loss function ḡ = x̄'β − ȳ. For each objective function g_k = x_k'β for the kth reference unit in (1), the DEA model assumes the probability p_k to be one; hence this yields the result that if unit k is not efficient for the model having p_k = 1, it cannot be efficient for any other model having an objective function min g_r = x_r'β, where r ≠ k. The case of incomplete knowledge about the random states of nature can be handled in two different ways. One is to introduce partial information as a compromise between complete ignorance and complete knowledge about the probabilities p_j and thereby assume that the DM can rank the random states of nature in terms of their probabilities, e.g. p_1 ≥ p_2 ≥ ... ≥ p_n. This line has been developed by Fishburn [6] and Kmietowicz and Pearman [7],
who have derived optimal sets of decision rules incorporating the incomplete information in
a partial order as above. A second method of incorporating partial information is to allow it
as prior information before deriving the estimates of the unknown probabilities p j of the
random states of nature. Note that Timmer [4] does precisely this in his method of moments approach when he replaces the population means of inputs and output by their sample counterparts. Recently Kopp and Mullahy [8] have introduced higher-order sample moment restrictions (e.g. sample moments of orders two to four) as a basis for identifying and estimating the technical efficiency parameters. In particular, they have empirically tested the reasonableness of the assumption of a half-normal distribution for the error terms e_j = x_j'β − y_j in the cost data for the U.S. electric power industry. Their empirical results failed
to support the hypothesis of a half-normal specification. This suggests the need to approximate
the most appropriate distribution generating the input-output data set. Information theory
with its entropy approach provides a most convenient method for approximating the true
distribution underlying the data set.
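Kopp and Mullahy's moment-based idea can be illustrated informally: a half-normal error has a fixed standardized skewness (about 0.995), so a sample skewness far from that value already casts doubt on the specification, and a formal goodness-of-fit test can follow. The sketch below uses simulated residuals and is only a loose illustration of that idea, not their estimator.

    # Rough moment-based check of a half-normal error specification
    # (simulated residuals; not the Kopp-Mullahy estimator itself).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    e = np.abs(rng.normal(size=200)) + 0.3 * rng.gamma(2.0, size=200)  # hypothetical residuals

    skew_sample = stats.skew(e)
    # Theoretical skewness of a half-normal: sqrt(2)(4 - pi)/(pi - 2)^{3/2} ~ 0.995
    skew_halfnormal = np.sqrt(2) * (4 - np.pi) / (np.pi - 2) ** 1.5
    print(f"sample skewness {skew_sample:.3f} vs half-normal skewness {skew_halfnormal:.3f}")

    # A formal check: Kolmogorov-Smirnov test against a fitted half-normal
    loc, scale = stats.halfnorm.fit(e)
    print(stats.kstest(e, "halfnorm", args=(loc, scale)))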
Now consider Timmer's transformation (2a) of the DEA model, where the probabilities p_j of the random states of nature generating the output data are not completely known. We assume, however, that sample observations on output are available and that the sample information is given by the sample moments, as in the method of moments. For simplicity, we assume that the first moment, i.e. the sample mean, is given by

Σ_{j=1}^{n} p_j y_j = ȳ

where ȳ is the sample mean output. Given this prior information ȳ, one may now ask: 'Which way of assigning the prior probabilities p_j in the output distribution makes the fewest assumptions?' Since the estimated distribution should be widely applicable, statistical decision theory suggests that we choose the most uninformative distribution. As a measure of uninformativeness the concept of entropy has been frequently applied, where the entropy associated with a distribution, for example, is defined as follows:

H = − Σ_{j=1}^{n} p_j ln p_j    (discrete case)

H = − ∫ p(y) ln p(y) dy    (continuous case)
Here entropy H can be interpreted in two ways: either as a measure of average uninformativeness (i.e. prior uncertainty) in the random states of nature, or as the average information obtained when a given realization of the state of nature is observed. In either case, we maximize entropy H under the prior information summarized by the first sample moment condition ȳ. This yields the maximum entropy (ME) principle for determining the probabilities p_j:

max H = − Σ_j p_j ln p_j   subject to   Σ_j p_j y_j = ȳ,   Σ_j p_j = 1,   p_j ≥ 0

On solving this nonlinear programming problem (NLP) one obtains the exponential density

p_j = (1/ȳ) exp(−y_j/ȳ),   y_j ≥ 0    (4)
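The closed form in (4) can be checked numerically: maximizing −Σ p_j ln p_j over a discretized output support, subject to the mean condition and Σ p_j = 1, should reproduce the exponential shape exp(−y_j/ȳ) up to discretization error. A minimal sketch, with an assumed grid and sample mean:

    # Numerical check that maximum entropy under a first-moment constraint
    # yields an exponential-shaped density (grid and mean are assumed).
    import numpy as np
    from scipy.optimize import minimize

    y_grid = np.linspace(0.0, 15.0, 60)   # discretized output support
    y_bar = 2.5                           # assumed sample mean output

    def neg_entropy(p):
        p = np.clip(p, 1e-12, None)
        return np.sum(p * np.log(p))

    constraints = [
        {"type": "eq", "fun": lambda p: np.sum(p) - 1.0},            # probabilities sum to one
        {"type": "eq", "fun": lambda p: np.sum(p * y_grid) - y_bar}  # first-moment condition
    ]
    p0 = np.full(y_grid.size, 1.0 / y_grid.size)
    res = minimize(neg_entropy, p0, constraints=constraints,
                   bounds=[(0.0, 1.0)] * y_grid.size,
                   method="SLSQP", options={"maxiter": 500})

    p_exp = np.exp(-y_grid / y_bar)       # exponential form (4), normalized on the grid
    p_exp /= p_exp.sum()
    print("max abs deviation from the exponential shape:",
          float(np.max(np.abs(res.x - p_exp))))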
This may be called an optimal density, since it maximizes entropy subject to the given prior
knowledge. Several interesting extensions of this approach may easily be conceived. First, prior
knowledge in the form of sample moments of orders higher than one may be assumed to be
known and in this case the DM can specify a sequential method of revising the optimal density
estimates. Secondly, the sample moment constraints are used only as summary statistics for
specifying inadequate prior knowledge. One may have to apply a criterion of goodness of fit
when several competing statistics seem plausible. Thus if several values ȳ(i) of ȳ seem plausible, we have to apply the chi-squared test to determine which of the optimal exponential
densities fits the observed data best. Finally, once we have the prior densities optimally
determined by the ME principle, we may proceed to estimate the parameters of the production
frontier either by a parametric procedure (i.e. maximum likelihood method) or by a nonparametric method. Taking the non-parametric case and the first sample moment as the only
prior knowledge, we obtain the following transformation of the DEA model:
min_{β ∈ C(β)}  max_{p ∈ C(p)}  φ(β, p) = (β'X − y')p + H(p)    (5)

where C(p) denotes the corresponding constraint set on the probability vector p. This model has a number of interesting features. For example, it can easily be proved that if the observed data set (X, y) has all positive elements, then there must exist a vector point (β°, p°) that solves the maximum entropy model (5). Furthermore, if the probability vector p is known or preassigned, the resulting model always yields an optimal solution of the efficiency parameters that minimizes the sum of the absolute values of errors.
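One convenient way to read the min-max structure of (5), assuming C(p) is simply the probability simplex, is to note that the inner maximization of Σ_j p_j(x_j'β − y_j) + H(p) has the closed form ln Σ_j exp(x_j'β − y_j), so the outer problem reduces to a smooth convex program in β alone. The sketch below uses this reduction with illustrative data; the simplex assumption and the solver details belong to the illustration, not the paper.

    # Sketch of one reading of model (5): if C(p) is the probability simplex,
    # max_p (beta'X - y')p + H(p) = logsumexp(X beta - y), so solve a convex
    # program in beta. Data and solver choice are illustrative only.
    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import logsumexp

    X = np.array([[2.0, 4.0],
                  [3.0, 2.0],
                  [4.0, 5.0],
                  [5.0, 3.0]])
    y = np.array([1.0, 1.2, 1.5, 1.1])
    n, m = X.shape

    def outer_objective(beta):
        q = X @ beta - y                 # potential output losses q_j
        return logsumexp(q)              # value of the inner maximization over p

    cons = [{"type": "ineq", "fun": lambda b: X @ b - y}]   # X beta >= y
    res = minimize(outer_objective, x0=np.full(m, 1.0),
                   bounds=[(0.0, None)] * m, constraints=cons, method="SLSQP")

    beta_star = res.x
    q_star = X @ beta_star - y
    p_star = np.exp(q_star - logsumexp(q_star))   # inner maximizer: softmax of q
    print("beta* =", beta_star)
    print("p*    =", p_star)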
Note that Timmer's transformation, which minimizes the objective function ḡ = Σ_{i=1}^{m} x̄_i β_i, is a special case of this entropy model (5) when it is assumed that the probabilities (p_j) are known or preassigned. Secondly, if we assume the prior information to be such that ȳ lies in the interval 0 ≤ a ≤ ȳ ≤ b, then the entropy H(p) is maximized with respect to both p_j and ȳ, and this density estimator p_j possesses a number of desirable features. First of all, as Akaike [9] has shown, the standard maximum likelihood principle of estimation may be viewed as a special case of maximizing the entropy, where entropy is defined by the log likelihood function. Secondly, the density estimator {p_j} belongs to the class of non-parametric estimates increasingly applied in modern times [10]. Note however that the prior information need not be specified by the first moment condition alone. For instance, a second moment condition may be preassigned by including in the constraint set C(p) the following condition:

Σ_{j=1}^{n} p_j (β'x_j − y_j − q̄)² = s²

where q̄ denotes the mean error and s² the preassigned second moment.
Thus, other distributions like the normal, gamma, etc. can be derived. For any specific
empirical case, however, one has to choose between the alternative densities for the error q_j = β'x_j − y_j, and this choice problem can be resolved by means of a chi-squared test of
goodness of fit.
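Such a choice between candidate error densities is routine with scipy.stats; the sketch below fits exponential, normal and gamma densities to a vector of simulated errors q_j and reports a goodness-of-fit statistic for each (the paper suggests a chi-squared test; a Kolmogorov-Smirnov test, which the empirical section also uses, is shown here). The simulated errors and the list of candidates are assumptions for illustration.

    # Choosing between candidate densities for the errors q_j = beta'x_j - y_j
    # by goodness of fit (simulated errors, illustrative candidate families).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    q = rng.exponential(scale=0.8, size=120)   # hypothetical non-negative errors

    candidates = {"exponential": stats.expon, "normal": stats.norm, "gamma": stats.gamma}
    for name, dist in candidates.items():
        params = dist.fit(q)                   # maximum likelihood fit
        ks = stats.kstest(q, dist.name, args=params)
        print(f"{name:12s} KS statistic = {ks.statistic:.3f}  p-value = {ks.pvalue:.3f}")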
3. MEASURING DYNAMIC EFFICIENCY
The concept of productive efficiency used by Farrell and the DEA models does not make a
distinction between the current and the capital inputs, and in this sense this is static since the
production function does not introduce any dynamic considerations either through time lags
between the inputs and output or through the presence of capital inputs which may generate output
beyond the current period. One general way to incorporate the dynamic elements into a static
frontier is to introduce time lags between the capital inputs and the output as:
m
Yt=Po+
C
i=l
r
pixi+
C Yizt-i-ct;
is0
where x_{it} is the current input and z_{t−i} is a proxy for various dynamic inputs. One could, of
course, replace the proxy variable by a vector of dynamic inputs such as capital of different
types (i.e. its rates of utilization) or different vintages, provided such data are available. The
proxy variable can also be interpreted in terms of the theory of adjustment cost, where it is
postulated that the firms tend to adjust to a long run dynamic production frontier.
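As a rough illustration, once the lag order is fixed the lagged specification above can be estimated by ordinary least squares on the (efficient) sample; the series, the coefficients and the choice of r = 3 below are hypothetical.

    # Building a lagged design matrix and fitting
    # y_t = b0 + sum_i b_i x_it + sum_i g_i z_{t-i} by least squares.
    import numpy as np

    rng = np.random.default_rng(2)
    T, r = 40, 3
    x = rng.normal(size=(T, 2))            # two current inputs
    z = rng.normal(size=T)                 # dynamic (e.g. capital) input
    y = (1.0 + x @ np.array([0.5, 0.3]) + 0.4 * z
         + 0.2 * np.roll(z, 1) + rng.normal(scale=0.1, size=T))

    rows = []
    for t in range(r, T):
        lagged_z = [z[t - i] for i in range(r + 1)]      # z_t, z_{t-1}, ..., z_{t-r}
        rows.append(np.concatenate(([1.0], x[t], lagged_z)))
    design = np.array(rows)

    coef, *_ = np.linalg.lstsq(design, y[r:], rcond=None)
    print("intercept, current-input and lag coefficients:", np.round(coef, 3))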
Two types of empirical problems arise in estimating the dynamic production frontier above.
The first problem is one of optimally determining the order r of the maximum lag associated
with the dynamic inputs z_{t−i}. The second is the case when the input coefficients γ_i follow an
adjustment process of a distributed lag model, where the marginal impact of lagged capital
inputs declines over time so that the distant inputs have negligible effects on the current output.
In both cases one could apply the maximum entropy principle to determine the optimal lag, say r₀, and the optimal lag distribution. The optimal order of the lag in the first case may be
determined by maximizing the correlation determinant associated with the problem, where the
latter is related to the entropy concept. In the second case, one may rewrite the model as
y_t = β₀ + Σ_{i=1}^{m} β_i x_{it} + Σ_{i=0}^{r} p_i(γ) z_{t−i} − ε_t
where p_i(γ) is the probability density of the lag distribution. If we assume that the mean lag is μ̄ and it is given by prior information with a range (0, a), then the maximum entropy principle
yields the optimal density as

p_i(γ) = (1/(1 + μ̄)) (μ̄/(1 + μ̄))^i,   i = 0, 1, 2, ...
This has the property that the more distant the lag, the less important is its influence on output.
The optimal value r₀ of r, the maximum time lag, can also be determined by the ME principle, since any value higher than r₀ would not increase the value of entropy. Sengupta [11] has discussed elsewhere several applications of the ME principle in recent economic models.
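For the discrete lag support i = 0, 1, 2, ..., maximizing entropy subject to a preassigned mean lag μ̄ yields the geometric form p_i = (1/(1 + μ̄))(μ̄/(1 + μ̄))^i, which is the shape reported for the empirical application in Section 5. The sketch below computes these weights for an assumed mean lag and uses a simple tail-mass rule (the cut-off rule is the illustration's, not the paper's) as a proxy for the maximum useful lag r₀.

    # Maximum-entropy lag distribution on i = 0, 1, 2, ... with a given mean lag:
    # p_i = (1/(1 + mu)) * (mu/(1 + mu))**i  (geometric form).
    import numpy as np

    mu = 1.5                                   # assumed prior mean lag
    theta = mu / (1.0 + mu)
    i = np.arange(0, 50)
    p = (1.0 - theta) * theta ** i             # geometric lag weights

    print("implied mean lag:", float(np.sum(i * p)))   # reproduces mu up to truncation
    cumulative = np.cumsum(p)
    r0 = int(np.argmax(cumulative >= 0.99))            # smallest r with 99% of the mass
    print("weights up to r0:", np.round(p[: r0 + 1], 3), " r0 =", r0)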
4. PREDICTIVE USE OF INFORMATION THEORY
We now consider the use of information theory in terms of its predictive power and how it
can be incorporated in the multiple output case of a DEA model.
In the case of multiple outputs, the specification and estimation of the production frontier raise additional problems not considered by the current econometric applications. If several outputs are combined into a single weighted output as in the theory of canonical correlation, the choice
of weights becomes an important issue. We consider here a prediction criterion and show how
this correlation-based criterion can be used on the basis of the entropy principle.
Consider the case where each unit j ∈ I_n has s outputs (y_{jh}) and m inputs (x_{ji}). Then we would test each point k ∈ I_n for efficiency by solving the following LP model:

min_{α, β}  g_k = Σ_{i=1}^{m} x_{ki} β_i
s.t.  Σ_{i=1}^{m} x_{ji} β_i − Σ_{h=1}^{s} y_{jh} α_h ≥ 0,  j ∈ I_n;   Σ_{h=1}^{s} y_{kh} α_h = 1;   α, β ≥ 0    (6a)
Clearly if the elements α_h were given or known, then the composite output y_j^c = Σ_h y_{jh} α_h can be used as a single output and our earlier discussions can be readily applied. However, the weights α = (α_h) for defining the composite output are not generally known, although prior information may sometimes be subjectively available. The standard DEA model (6a) determines the optimal weights (α, β) without giving any consideration to the correlation between the composite input x_j^c = x_j'β and the composite output y_j^c = y_j'α. This is in sharp contrast with the regression approach to the response function, where these weights are so determined as to maximize the correlation R^c between x_j^c, y_j^c; j ∈ I_n:

R^c = (α'V_yx β) / [(α'V_yy α)(β'V_xx β)]^{1/2}    (6b)

where V_pq denotes the variance-covariance matrix of the two vectors p and q. It is clear that the DEA model would improve considerably in terms of predictive power if this correlation measure R^c could be incorporated. Note that the equality relation Σ_h y_{kh} α_h = 1 in (6a) is used as a normalization condition. If this relation is dropped then the objective function should be reformulated as
reformulated as
Min
gk
=
XkiPi
i=l
-
2
h- I
YjhClh f P O
2
- 010 = i =O
XkiPi
- h=O
f:
Yjhah
INFORMATION THEORY AND EFFICIENCY MEASUREMENT
97
where β₀, α₀ are intercept terms that are also incorporated in the constraints for each fixed j. Here the observed data set D = (x_j, y_j; j ∈ I_n) comprises input (x_j) and output (y_j) vectors of dimensions m + 1 and s + 1, respectively, for each unit j = 1, 2, ..., n. Let α*(k), β*(k) be the optimal solutions of the extended model. Then the unit or firm k is efficient by the DEA efficiency criterion if β*'(k)x_k − α*'(k)y_k = 0 and the solution vector is non-degenerate. The latter condition is needed for uniqueness. By varying k in the index set I_n = {1, 2, ..., n} and solving n LP models of the form (6a), all the efficient units can be determined. Let D₁ be the subset of the entire data set D which contains efficient points only. For all j ∈ D₁, define two random variables y^c = α'y and x^c = β'x as the composite output and the composite input, respectively. Now we define a measure of mutual information in the random variable y^c relative to the other variable x^c by
I(y^c, x^c) = H(y^c) + H(x^c) − H(y^c, x^c)    (7)

where H(·) is the entropy defined before. Thus, if the probability densities of y^c, x^c and (y^c, x^c) are denoted by f(y^c), f(x^c) and f(y^c, x^c) and assumed to be continuous, then the mutual information statistic I(y^c, x^c) can also be expressed in terms of the conditional entropy of y^c given x^c:

I(y^c, x^c) = H(y^c) − H(y^c | x^c)    (8)

where

H(y^c) = E[− ln f(y^c)] = − ∫ f(y^c) ln f(y^c) dy^c
H(y^c | x^c) = E[− ln f(y^c | x^c)]

Now suppose (y^c, x^c) have a joint bivariate normal density with a correlation coefficient ρ; then the information about y^c contained in the random variable x^c can easily be computed from (8) as

I(y^c, x^c) = −(1/2) ln(1 − ρ²)    (9)

It is clear from (8) that if y^c and x^c are statistically independent, then I(y^c, x^c) = 0. Thus Parzen [12] has shown that the maximum likelihood estimator of I(y^c, x^c), i.e.

Î(y^c, x^c) = −(1/2) ln(1 − ρ̂²)

where ρ̂ is the sample correlation coefficient, can be used to test the statistical independence of y^c and x^c. We may employ this statistic in our framework in two different ways. One is to choose the vectors α and β so as to maximize the mutual information I(y^c, x^c) between the composite output and the composite input. This is equivalent to maximizing their squared correlation (R^c)² defined in (6b). In the case of a single output, this amounts to choosing the parameter vector β so as to maximize the predictive power of x^c in explaining the variations in output. Note that this result would hold asymptotically if the variables (y^c, x^c) are not normal but tend to joint normality only asymptotically. Furthermore, prior information may be allowed as additional constraints before one maximizes the information statistic I(y^c, x^c). Sengupta [13] has discussed several types of transformation which may convert the bivariate non-normal density to approximate normality.
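Under (approximate) bivariate normality the statistic in (9) is a monotone function of the sample correlation, so choosing α and β to maximize Î(y^c, x^c) is the same as maximizing (R^c)². The sketch below, with simulated composite variables, computes Î from ρ̂ and uses the standard asymptotic χ²(1) calibration of the likelihood-ratio statistic 2nÎ to screen for independence; the data are purely illustrative.

    # Mutual information of approximately bivariate-normal composites from the
    # sample correlation, as in (9); simulated composite output and input.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    x_c = rng.normal(size=100)                          # composite input  beta'x
    y_c = 0.6 * x_c + rng.normal(scale=0.8, size=100)   # composite output alpha'y

    rho_hat = np.corrcoef(y_c, x_c)[0, 1]
    I_hat = -0.5 * np.log(1.0 - rho_hat ** 2)           # estimator of I(y^c, x^c)
    print(f"rho_hat = {rho_hat:.3f}, I_hat = {I_hat:.3f}")

    # Under independence, 2*n*I_hat = n*ln(1/(1 - rho_hat^2)) is asymptotically chi-squared(1).
    n = y_c.size
    lr_stat = 2.0 * n * I_hat
    print(f"LR statistic = {lr_stat:.2f}, p-value = {stats.chi2.sf(lr_stat, df=1):.4f}")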
A second way to incorporate the predictive power is to reformulate the multiple output model (6a) so that the objective function reflects the maximum of mutual information I(y^c, x^c). This yields the following model:

max_{α, β}  z = α'ȳ − β'x̄ + I(y^c, x^c)
s.t.  Xβ − Yα ≥ 0;   α'V_yy α = 1;   β'V_xx β = 1    (10)

where (x̄, ȳ) are the mean input and output vectors and the last two equalities specify the
normalization conditions. This generalized model has a number of flexible features. First of all, if the composite variables are normally distributed, then the objective function can be reformulated as

max_{α, β}  z = α'ȳ − β'x̄ + α'V_yx β

where V_yx is the covariance matrix of the output and input vectors. This model is no longer linear and hence its optimal solutions (α*, β*) are more diversified than the LP solutions used in the DEA model. Secondly, since the errors ε_j are additive in the formulation

α'y_j = β'x_j − ε_j

one can apply a simple transformation to obtain a least-squares-like formulation as

α'y_j = β'x_j + β₀ + u_j;   β₀ = −μ,   u_j = μ − ε_j
where μ is the unknown mean of the error term ε_j and the new disturbance term u_j satisfies most of the least squares conditions, e.g. zero mean and constant variance. Note that if approximate normality holds, then this method can be improved further by utilizing any prior information about μ, e.g. that it belongs to the interval a ≤ μ ≤ b where a, b are fixed constants (e.g. a truncated normal). Clearly the information theory approach can handle such prior information by adjoining additional constraints to the generalized model (10). Finally, one can evaluate alternative ways of aggregating outputs into a composite variable. For instance one may preassign equal weights α_h = 1 for all h, so that y^c is merely the sum of all types of output, and then compare it with an optimal composite output y*^c where the optimal weights α* = (α_h*) are used. Clearly, the predictive power of the optimal weight model would be much higher, since this objective is already built into the model.
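A numerical sketch of the generalized model (10), under the reading adopted above (normalizations α'V_yy α = 1, β'V_xx β = 1 and the bivariate-normal form of the information term): maximize α'ȳ − β'x̄ − ½ ln(1 − ρ(Yα, Xβ)²) subject to Xβ − Yα ≥ 0. The data, starting values and solver below are all illustrative assumptions, not the author's computation.

    # Illustrative sketch of model (10): maximize alpha'y_bar - beta'x_bar + I(y^c, x^c)
    # s.t. X beta >= Y alpha, alpha'Vyy alpha = 1, beta'Vxx beta = 1.
    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(4)
    n, m, s = 25, 3, 2
    X = np.abs(rng.normal(1.0, 0.3, size=(n, m)))   # hypothetical input data
    Y = np.abs(rng.normal(1.0, 0.3, size=(n, s)))   # hypothetical output data
    x_bar, y_bar = X.mean(axis=0), Y.mean(axis=0)
    Vxx, Vyy = np.cov(X, rowvar=False), np.cov(Y, rowvar=False)

    def split(theta):
        return theta[:s], theta[s:]                 # (alpha, beta)

    def neg_objective(theta):
        a, b = split(theta)
        rho = np.corrcoef(Y @ a, X @ b)[0, 1]
        info = -0.5 * np.log(max(1.0 - rho ** 2, 1e-9))
        return -(a @ y_bar - b @ x_bar + info)

    cons = [
        {"type": "ineq", "fun": lambda t: X @ split(t)[1] - Y @ split(t)[0]},
        {"type": "eq", "fun": lambda t: split(t)[0] @ Vyy @ split(t)[0] - 1.0},
        {"type": "eq", "fun": lambda t: split(t)[1] @ Vxx @ split(t)[1] - 1.0},
    ]
    theta0 = np.concatenate([np.full(s, 0.1), np.full(m, 1.0)])
    res = minimize(neg_objective, theta0, constraints=cons,
                   method="SLSQP", options={"maxiter": 500})
    alpha_star, beta_star = split(res.x)
    print("alpha* =", np.round(alpha_star, 3), " beta* =", np.round(beta_star, 3))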
5. EMPIRICAL APPLICATIONS
Our empirical applications consider input-output data, all in logarithmic units, for selected
public elementary school districts in California for the years 1976-77 and 1977-78 over which
separate regression estimates of educational production functions were made in earlier studies
by the present author. Statistics of enrollment, average teacher salary, and standardized test
scores are all obtained from the published official statistics. Out of a larger set of 35 school
districts we selected 25 in three contiguous counties of Santa Barbara, Ventura and San Luis
Obispo on the basis of separate homogeneity tests based on the Goldfeld-Quandt statistic.
Four output variables were the standardized test scores in reading (y₁), writing (y₂), spelling (y₃) and mathematics (y₄). Two measures of aggregate output can then be defined. One is the composite output y^c = α'y based on the optimal weights and the other is the average output ȳ = (y₁ + y₂ + y₃ + y₄)/4 with equal weight on each output. As input variables we had a choice of eight variables, of which the following four were utilized in our LP estimates: x₁, the average instructional expenditure; x₂, the proportion of minority enrollment; x₃, the average class size; and x₄, a proxy variable representing the parental socioeconomic background. Again, one can define a composite input x^c = β'x, and an average input, though the latter may not be very meaningful owing to the diverse nature of the four inputs.
In our previous studies [14], we applied the DEA model for the multiple output and multiple input case, but since the results are not much different from those of the single composite output case we report here only the latter results. Also, the input-output data set may be divided into two groups according to whether the LP solutions of the DEA model are degenerate or non-degenerate. The first sample group with n₁ = 16 contains only the non-degenerate cases, while the remaining samples (n₂ = 9) comprise the degenerate cases where some of the parameter estimates are zero.
On the basis of these statistical data three types of illustrative applications are made. One
is the application of the DEA model (1) by which the data set of 25 school districts is
decomposed into a subset D₁ of efficient units containing 35% of the districts and the remaining subset D₂ containing 65% of the districts. The overall frequency distribution of the efficiency ratio e_j, defined as e_j = 1 − (y_j/y_j*) where y_j* is the efficient output for unit j, appears as
shown in Table I. Normality testing by the Shapiro-Wilk statistic strongly rejected the null
hypothesis that the underlying distribution is normal. The exponential density derived under
the ME principle with the first moment restriction is then tested by the Kolmogorov-Smirnov
statistic and it was not rejected at the 5% level of significance.
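Both tests mentioned here are available in scipy.stats; a minimal sketch with simulated efficiency ratios (not the school-district data) shows the pattern of the comparison:

    # Shapiro-Wilk test of normality and Kolmogorov-Smirnov test against the
    # ME exponential density, applied to simulated efficiency ratios e_j.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    e = rng.exponential(scale=0.06, size=25)      # hypothetical efficiency ratios

    print("Shapiro-Wilk:", stats.shapiro(e))                    # normality
    print("KS vs exponential:",
          stats.kstest(e, "expon", args=(0.0, e.mean())))       # exponential with mean e_bar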
The second application compares the multiple output situation in three cases in terms of the mean square error (MSE) of prediction. The first case uses the average of the four observed outputs (ȳ); the second uses the composite output (y^S) from the mean LP model when the terms I(y^c, x^c), β'V_xx β and α'V_yy α are dropped; and finally the third case uses the composite output (y^R) which maximizes the sample correlation between the inputs and outputs. The results are given in Table II.
Table I

Domain of e_j     0-0.04    0.05-0.07    0.08-0.11    0.12-0.15    >0.15
Frequency (%)       36          20           28            4         12

Table II. MSE over LP solutions

Output        Non-degenerate cases    Degenerate cases    Total
measure       n₁ = 16                 n₂ = 9              n = 25
ȳ             0.141                   0.172               0.164
y^S           0.008                   0.045               0.032
y^R           0.002                   0.027               0.011

It is clear that by incorporating the measure of mutual information defined in (10), the predictive power of the DEA model improves considerably. Since the composite output y^R may be interpreted as a canonical variate, one may conclude that the use of the canonical variates that maximize the correlation between the inputs and the outputs helps improve the
predictive power of the DEA model, as it does in a multiple regression model. This is observed
more strongly when the data points are suitably clustered around cone domains of the mean,
e.g. for data points close to the mean or the median LP model the predictive power is observed
to be the highest.
It is of some interest here to report briefly some previous calculations of canonical correlation coefficients of different orders, analysed elsewhere [14], that confirm that the two-stage procedure of estimation of the production frontier considerably improves the goodness of fit of the estimated model. Denoting the composite output by y^c(k) for the kth canonical correlation, k = 1, 2, 3, 4, and the squared correlation coefficient by r²(k), the results are given in Table III.
It is clear that the DEA model outperforms the ordinary canonical approach in terms of the predictive r² criterion. Thus, if the efficient units are first identified and screened and then the ordinary canonical correlation applied, it tends to improve the predictive power. Secondly, the first two squared canonical correlations r²(1), r²(2) are found to be statistically significant at the 5% level in terms of Fisher's z-statistic. By the same test, the third- and fourth-order
canonical correlations are not significant. This raises a strong case for ignoring the third- and
fourth-order canonical variates in our composite output transformation applied to the DEA
models.
Finally we consider the third application, which considers time series data for the 10 years 1976-86 to evaluate the marginal impact of one of the most important inputs, x₃ₜ = z_t, denoting the average class size. Over the years the changes in class size have reflected the rising trend of minority enrollment and the declining trend of budget allocations for school districts. The regression estimates based on the subset D₁ of efficient units for each year yield the results given in Table IV for the production frontier

y_t = β₀ + Σ_{i=1}^{4} β_i x_{it} + Σ_{i=1}^{3} γ_i z_{t−i}

over the period 1976-86.
Table III

Method                                       k = 1    k = 2    k = 3    k = 4
A. Ordinary canonical approach: r²(k)        0.83     0.50     0.20     0.0004
B. Two-stage DEA approach: r²(k)             0.70     0.49     0.09     0.01
Table IV

β₀         β₁        β₂        β₃        β₄        γ₁        γ₂       γ₃       R²
−11.4*     −0.11*    4.9**     5.3**     −1.6**    −0.01     0.02     0.004    0.96
(−9.1)     (−2.5)    (10.2)    (10.2)    (−10.3)   (0.85)    (0.21)   (0.1)

t-values are in parentheses.
When we include all the data containing both the efficient and inefficient units, the class size coefficients β₃ and γ₁, γ₂, γ₃ tend to be reduced considerably, with β₃ turning out to be statistically insignificant at the 5% level of the t-test.
Also, the maximum lag turned out to be r₀ = 2.0, suggesting that this input may not have played a critical role in a dynamic sense. The optimal lag distribution determined by the ME principle shows the following geometric form:

p_i(γ) = (0.14)(0.86)^i,   i = 0, 1, 2, ...
This suggests that the regression coefficient of 0.86 associated with the static model, which
ignores the time effects, probably overestimates the marginal impact of the class size on student
performance.
6. CONCLUDING REMARKS
Our methods of modelling the maximum entropy approach in the current theory of estimation of production frontiers have emphasized three of the most important aspects of data-based
modelling of economic systems. First, this approach seeks to utilize the observed data
characteristics in the form of mean and other higher moments to estimate a best approximation
of the true distribution underlying the data. Then it incorporates the distribution of the data
set in specifying a two-stage process of estimation. The first stage determines the distribution
most consistent with the observed data and the decision maker's prior knowledge about them,
while the second stage uses the distribution to estimate the parameters of the production
frontier. For example, if the observed data is generated by a gamma distribution, as specified
by the maximum entropy approach, then the maximum likelihood method can be applied at
the second stage to estimate the parameters of the gamma model. Secondly, by relating this
approach to the method of moments and the case of canonical correlation for multiple outputs
we have attempted to show its generality and usefulness in situations where significant
departures from normality are suspected. Thus it provides a more general framework which is an alternative to traditional least squares. Finally, the maximum entropy principle can obviously be applied in other areas of operations research, such as demand and cost studies, and reliability and replacement models, although we have applied it here in the context of a production frontier. Modelling complex systems that have subsystems provides a very natural framework for applying the maximum entropy approach, as various applications [15] in image processing and game theory models show.
ACKNOWLEDGEMENT
The author would like to thank an anonymous referee for helpful suggestions.
REFERENCES
1. L. M. Seiford and R. M. Thrall, 'Recent developments in DEA: the mathematical programming approach to frontier analysis', Journal of Econometrics, 46(1), 7-38 (1990).
2. M. J. Farrell, 'The measurement of productive efficiency', Journal of the Royal Statistical Society, Series A, 120, 253-290 (1957).
3. A. Charnes, W. W. Cooper and E. Rhodes, 'Measuring the efficiency of decision-making units', European Journal of Operational Research, 2, 429-444 (1978).
4. C. P. Timmer, 'Using a probabilistic frontier production function to measure technical efficiency', Journal of Political Economy, 79, 776-794 (1971).
5. L. Johansen, Production Functions, North-Holland, Amsterdam, 1972.
6. P. C. Fishburn, Decision and Value Theory, Wiley, New York, 1964.
7. Z. W. Kmietowicz and A. D. Pearman, Decision Theory and Incomplete Knowledge, Gower, Aldershot, U.K., 1981.
8. R. J. Kopp and J. Mullahy, 'Moment-based estimation and testing of stochastic frontier models', Journal of Econometrics, 46, 165-184 (1990).
9. H. Akaike, 'On entropy maximization principle', in Applications of Statistics, North-Holland, Amsterdam, 1977.
10. R. L. Eubank, Spline Smoothing and Nonparametric Regression, Marcel Dekker, New York, 1988.
11. J. K. Sengupta, 'Maximum entropy in applied econometric research', International Journal of Systems Science, 22, 1941-1951 (1991).
12. E. Parzen, 'Time series model identification by estimating information', in Studies in Econometrics, Time Series and Multivariate Statistics, Academic Press, New York, 1983.
13. J. K. Sengupta, 'Transformations in stochastic DEA models', Journal of Econometrics, 46, 109-124 (1990).
14. J. K. Sengupta, 'Data envelopment with maximum correlation', International Journal of Systems Science, 20, 2085-2093 (1989).
15. J. Skilling and R. K. Bryan, 'Maximum entropy image construction', Monthly Notices of the Royal Astronomical Society, 211, 111-124 (1984).