Estimating Expenditure Distribution in Indonesia: A Bayesian Approach 1996-2008

2. Statement of the Research Problem

2.1. Introduction

The distribution of income across individuals and households has always been a main concern of governments that, like Indonesia's, have experienced positive economic development. Specifically, the income distribution facilitates the derivation of the main development indicators, namely poverty and inequality indices. An accurate measure of these key indicators allows policy makers to better assess the impact of policies initiated over time on the magnitude of poverty and inequality. An appropriate estimate of the income distribution therefore plays a major role: a poorly specified model would imply inaccurate measures of the key indicators and may lead to unsuitable economic policy. Moreover, development and change in Indonesia over the last ten years have made it especially important to evaluate the movement of the income distribution. For these reasons, research on income distribution in Indonesia is fundamental, as it helps to create more accurate poverty and inequality measures, and it has become a prime concern for policy makers.

A diverse nation geographically and economically, Indonesia has experienced rapid economic growth over the last three decades. Under President Suharto's government (1966-1998), the economy developed from a per capita GDP of $70 in 1967 to more than $1,000 by 1996. With annual real GDP growth averaging nearly 7% over 1987-1997, the country was also regarded by most analysts as a newly industrialized economy and emerging market. This rapid growth was accompanied by fast poverty reduction. Just prior to the regional financial problems in 1997, the share of people below the poverty line had dropped to below 1/8 of the population, from around 2/3 of the population in 1967.

The Asian financial crisis, which began to affect Indonesia in late 1997, quickly generated a major transformation in Indonesia's economy. The crisis caused a huge economic contraction and a significant decline in public spending.
In 1998, real GDP contracted by an estimated 13.7 percent and inflation reached 77 percent. As a result, debt and subsidies increased noticeably, while development expenditure was sharply reduced. This economic instability affected much of the country, in the form of increased prices for staple foods and goods and lowered standards of living and quality of life. It is also believed that the effects of this economic hardship were more severe for the poor (Friedman & Levinsohn, 2002; Suharyadi & Sumarto, 2003).

The crisis also triggered reformation in the country. The reformation is the name commonly used for the post-1998 era in the history of Indonesia. After the revolution of 1998, President Suharto, who had governed the New Order period for three decades, resigned, and the political and social climate in Indonesia became more open and liberal. Despite a more positive economic outlook in the few years after the reformation, Indonesia continued to experience natural disasters and political reform, which proved to be the country's most persistent development problems. Additionally, as a result of rising international oil prices and imports in late 2005, the country met another crisis, which required the government to reduce fuel subsidies. Because the price of consumer fuels more than doubled, the economy once more confronted double-digit inflation. Nevertheless, economic growth remained positive at 5.7, 5.5, 6.3 and 6.1 percent in 2005, 2006, 2007 and 2008 respectively. While other countries may experience negative growth due to the impact of the global financial crisis that emerged in the last quarter of 2008, Indonesia's economy is still expected to grow at an average of 4.5 percent throughout 2009.

Against the development and changes described above, many studies of income distribution in Indonesia have been conducted.
Starting from 1996, this research mainly focused on the transformation of the income distribution over various time periods as an impact of the Asian financial crisis of 1997. Most of the studies approximated the income distribution with per capita expenditure data collected from the National Socio-Economic Survey (Susenas) and used different classical inference methods. In general, the reported studies agreed that during 1996-1999 there was a substantial shift in the expenditure distribution, especially at the ends, with the upper end narrowing and the lower end expanding (Beegle, Frankenberg, & Thomas, 1999; Skoufias, 2001). Furthermore, the studies concluded that during the crisis period income inequality, as represented by the inequality index, fell considerably, whereas the calculated poverty rate rose notably. The research also confirmed that, both before and after the financial crisis, inequality was increasing and urban areas experienced higher inequality than rural areas, widening the disparity between the two (L. Cameron, 2002; Kadarmanto & Kamiya, 2005; Skoufias, 2001; Suharyadi & Sumarto, 2003).

Unfortunately, nearly all of the earlier studies reported only descriptive measures of the magnitude of poverty and inequality to explain the change in the income distribution over time. In particular, not enough research has been done to picture movements in the entire shape of the income distribution over a much longer period following the crisis. Cowell, Jenkins and Litchfield (1994) argue that picturing the shape of the income distribution allows us to examine directly, and all together, changes in three main distributional features: the location of income levels, the spread of income (income inequality) and the concentration of income (modality). A recent work by Sakamoto (2007) attempts to address this issue using kernel density estimation.
However, he does not examine inequality and poverty, as his aim is to investigate how the distribution converges in the long run.

Commonly, income distributions have been investigated with parametric models selected to describe the distribution accurately. Uni-modal, right-skewed parametric families are usually employed in fitting income distributions. The majority of previous parametric income distribution studies have used classical inference methods to estimate and evaluate the suggested functional form, including the method of moments, maximum likelihood and least squares (Bandourian, McDonald, & Turley, 2003; J. B. McDonald, 1984; J.B. McDonald & Jensen, 1979; J.B. McDonald & Ransom, 1979; Singh & Maddala, 1976). Among three-parameter functions, the Singh-Maddala and the Dagum distributions have been shown to perform well in different situations and on different data sets (Majumder & Chakravarty, 1990; J.B. McDonald & Ransom, 1979; Singh & Maddala, 1976). Furthermore, they have been shown to perform about as well as the more complex functional forms, namely the generalized beta of the first kind (GB1) and of the second kind (GB2) (Bandourian, et al., 2003; J. B. McDonald, 1984; J.B. McDonald & Mantrala, 1995).

Recent literature on income distribution shows a growing application of Bayesian inference to parametric models. Griffiths, Chotikapanich and Rao (2005), for example, explicitly incorporate uncertainty over competing models into the inference and estimate posterior densities of economic quantities of interest in a Bayesian way. Chotikapanich and Griffiths (2006) extend the Bayesian analysis to compare two income distributions with regard to Lorenz and stochastic dominance. Using posterior probabilities, they investigate whether or not one distribution dominates the other.
Bayesian inference offers a promising way of analysing the transformation of the income distribution accurately. With Bayesian inference we not only obtain compact summary measures but also gain the ability to express uncertainty about those measures in terms of a probability distribution. Bayesian inference updates the probability statement made before sampling (the prior density) with the sample information to form a probability statement about uncertainty after sampling (the posterior density). The resulting posterior density for the parameters of the income distribution model can then be used to draw inferences about quantities of interest, such as inequality and poverty measures, which are functions of the parameters. Accordingly, one can easily draw distributional plots to learn about changes in the level, spread and concentration of the distributions simultaneously, and summarize the whole distribution compactly. The Bayesian approach also allows several proposed models to be evaluated by contrasting posterior model probabilities. It is therefore worthwhile to apply Bayesian inference to the income distribution in Indonesia. Moreover, there are still few works in this area, and none on the case of Indonesia.

2.2. Research Question

This paper aims to track the transformation of the expenditure distribution in Indonesia over the period 1996-2008, which was marked by the severe financial crisis of 1997 as well as economic and institutional reforms. The evolution of the whole shape of the distribution in terms of location, modality and spread is analysed using Bayesian inference. Two candidate models, the Singh-Maddala and the Dagum distributions, are proposed as the functional form and their performance is investigated. To give a clearer picture of the impact of shifts in the distribution, we use the mean and mode of the distribution together with the Gini coefficient, headcount ratio, poverty gap index and poverty severity index to quantify inequality and poverty. Subsequently, the best model is selected by contrasting the corresponding posterior model probabilities. The procedure is applied to per adult equivalent expenditure data obtained from the National Socio-Economic Survey (Susenas) over the period 1996 to 2008.

In Section 3, the country profile, the data and the adjustments made to them are explained in more detail. An overview of income distribution, inequality and poverty measures is given in Section 4. Section 5 describes the Bayesian methodology, including the prior specification, the posterior probability distribution and the posterior model probabilities. Empirical results from applying the method to the Susenas consumption expenditure data for 1996-2008 are reported and discussed in Section 6, and Section 7 concludes.

3. Data

This section starts by providing the background of the country in terms of its population, economy and social indicators. Then the survey is introduced and the data limitations are explained. Price adjustment and the use of equivalence scales are discussed in the final two subsections, and the features of the adjusted data are described at the end.

3.1. Country Profile

Indonesia is an archipelagic nation located in South East Asia between the Indian and Pacific Oceans. It has more than 17,000 islands, of which 6,000 are permanently inhabited. The total land area is about 1.9 million square kilometres and in 2008 the country had a population of about 240 million. Administratively, the Republic of Indonesia is divided into 33 provinces. The World Bank classifies it as a lower middle income country, with per capita GDP at PPP of US$3,979 in 2008. According to the United Nations Development Programme (UNDP, 2009), the Human Development Index (HDI) for Indonesia in 2007 was 0.734, ranking it 111th. The official figures for some indicators related to poverty and inequality from 1996 to 2008 are shown in Table 3.1. Inequality generally trends upward while the poverty index trends downward, except in 1999, the period affected by the crisis, when inequality decreased and the headcount index increased.

Table 3.1 Trend in Poverty and Inequality Related Indicators 1996-2008

In the present study, we approximate the income distribution with household expenditure data obtained from the National Socio-Economic Survey (Susenas). Susenas is a cross-sectional household survey for Indonesia that provides national coverage and is available over an extensive time period. One part of Susenas, known as the core Susenas, is conducted annually and collects information on the characteristics of over 200,000 households and over 800,000 individuals. Another part, popularly known as the consumption module Susenas, is conducted every three years and collects very detailed information on food and non-food consumption expenditure from approximately 65,000 households. The dataset is created by merging the core and the module for 1996, 1999, 2002, 2005 and 2008. It therefore combines household consumption information from the consumption module with household characteristics from the core.

The variable analysed is monthly household expenditure on food and non-food consumption. Naturally, the expenditure level may vary with relative prices, demographic factors and preferences. Urban and rural areas differ greatly in these respects, and this environment shapes the interpretation of wellbeing in Indonesia. For this reason, we analyse urban and rural areas separately. The total sample size is around 60,000 each year. The variation across survey years results from limitations of the data. For example, in 2002, owing to political instability, the survey did not cover 4 provinces, which reduced the sample size at the national level. The treatment of missing and extreme values when merging the Susenas core and module datasets also reduced the amount of data processed.

3.3. Price Adjustment

To make the expenditure data comparable across survey years, the data were corrected for inflation using the consumer price index (CPI). The CPI for food and non-food groups reported by Badan Pusat Statistik (BPS) Statistics Indonesia was constructed from urban prices collected in 27 cities in 1996, 44 cities in 1999 and 45 cities in 2005 and 2008. Owing to limited data availability, we used urban price indices as proxies for price changes in the rural areas of each province. For urban and rural areas not covered in the CPI series, we approximated the index with the CPI values of neighbouring cities, which we expect to have quite similar price characteristics. In this paper, household expenditures were adjusted to real expenditure at 2002 prices.
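As a minimal sketch of this price adjustment, nominal expenditure in a survey year can be converted to 2002 prices by multiplying it by the ratio of the 2002 CPI to that year's CPI. The function name and the CPI levels below are hypothetical placeholders, not BPS figures, and a single index is used for simplicity, whereas the adjustment described above uses separate food and non-food indices.

```python
def to_real_2002(nominal, cpi_year, cpi_2002):
    """Deflate a nominal expenditure to 2002 prices (hypothetical helper)."""
    if cpi_year <= 0 or cpi_2002 <= 0:
        raise ValueError("CPI values must be positive")
    return nominal * cpi_2002 / cpi_year


# Hypothetical CPI levels for one city, for illustration only.
cpi = {1996: 40.0, 2002: 100.0, 2008: 160.0}

# Prices in 1996 were lower than in 2002, so 1996 spending is scaled up.
real_1996 = to_real_2002(200.0, cpi[1996], cpi[2002])
# Prices in 2008 were higher than in 2002, so 2008 spending is scaled down.
real_2008 = to_real_2002(400.0, cpi[2008], cpi[2002])
```

With these illustrative numbers, the 1996 amount is scaled up and the 2008 amount is scaled down, so that both are expressed in 2002 rupiah.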

3.4. Equivalence Scale

Household expenditure also needs to be adjusted for demographic differences, to account for adult-child variation in household composition as well as economies of scale as household size increases. Thus, instead of using per capita expenditure, we deflated household expenditure by an equivalence scale, which consists of an adult equivalent scale and economies of scale. There is an extensive literature on equivalence scales. As setting the equivalence scale was not our main research question, we opted for the formulation used by Banks and Johnson (1994) and Jenkins and Cowell (1994), which is recommended by Deaton and Zaidi (2002) for developing countries such as Indonesia:

$$ m_i = \left(n_{a,i} + \phi\, n_{c,i}\right)^{\lambda} \tag{3.1} $$

where $m_i$ is the number of adult equivalents in household $i$, while $n_{a,i}$ and $n_{c,i}$ denote the number of adults and children in household $i$ respectively. Parameter $\phi$ is the cost of a child relative to that of an adult, while parameter $\lambda$ represents the economies of scale in the costs of equivalent adults. In this study, adults were defined as household members aged 15 years and over, because 15 is taken as the start of working age. Further disaggregation by age group and gender was not considered.

For poor economies, Deaton and Zaidi (2002) suggest setting $\phi$ as low as 0.3 and $\lambda$ close to 1. The recommended figures are motivated by the fact that child costs in poor countries are relatively low and that households in poor countries devote a larger share of their expenditure to food, which leaves little room for economies of scale. However, these figures may not be applicable to Indonesia, as the expenditure share of food has declined over the years, from about 0.65 in 1996 to around 0.50 in 2008. Moreover, the shift in expenditure shares would considerably affect the magnitude of child cost as well.

In searching for appropriate values of the economies of scale, $\lambda$, and the size of children relative to adults, $\phi$, we started by fixing bounds for the equivalence scales based on the recent study by Lancaster and Ray (2002). The values of $\phi$ and $\lambda$ were then verified using a simple generalization of the Engel methodology developed by Valenzuela (1996). In summary, we arrived at the conclusion that $\phi$ = 0.85 and $\lambda$ = 0.8 are realistic values of the child cost and economies of scale for Indonesia. The characteristics of the size-adjusted expenditure can be seen in Table 3.2, while the associated histograms are reported in Figures 3.1-3.5.
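The adjustment can be illustrated with a short sketch. It assumes that the scale in (3.1) takes the Deaton and Zaidi form, (adults + child cost x children) raised to the economies-of-scale exponent, with the values 0.85 and 0.8 chosen above as defaults. The function and argument names are hypothetical.

```python
def adult_equivalents(n_adults, n_children, child_cost=0.85, scale=0.8):
    """Adult equivalents: (n_adults + child_cost * n_children) ** scale.

    Assumed form of equation (3.1); child_cost and scale default to the
    values selected for Indonesia in the text.
    """
    if n_adults < 0 or n_children < 0 or n_adults + n_children == 0:
        raise ValueError("household must have at least one member")
    return (n_adults + child_cost * n_children) ** scale


def per_adult_equivalent(expenditure, n_adults, n_children, **kwargs):
    """Household expenditure divided by its number of adult equivalents."""
    return expenditure / adult_equivalents(n_adults, n_children, **kwargs)


# A household of two adults and two children (members aged 15+ are adults).
m = adult_equivalents(2, 2)          # about 2.85 adult equivalents
pae = per_adult_equivalent(1200.0, 2, 2)

# Setting both parameters to 1 reproduces the per capita measure.
per_capita = per_adult_equivalent(1200.0, 2, 2, child_cost=1.0, scale=1.0)
```

Because the household counts as fewer than four adult equivalents, the per adult equivalent figure is higher than the per capita figure for the same household.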

Table 3.2 Summary Statistics of Per Adult Equivalent Expenditure (Rp'000), 1996-2008

Region  Statistics              1996         1999         2002         2005         2008
Urban   Mean                370.9706     332.6298     432.3625     477.0349     454.0787
        Median              292.0631     270.4114     337.8042     357.9471     355.2715
        Minimum              34.0048      44.1204      57.6498      38.3182      59.8378
        Maximum           9,388.0860   5,973.9160  24,902.6700  30,216.5100  13,181.9900
        Std. Deviation      321.9729     249.3224     477.2953     511.1888     401.2329
        Observation           23,875       25,175       29,279       24,687       26,648
Rural   Mean                204.3220     199.2879     220.7423     236.2176     251.8224
        Median              173.9769     175.1857     191.8688     198.9731     210.0325
        Minimum              40.7979      38.2171      37.7086      24.6667      38.4507
        Maximum           5,123.6520   6,286.8730   3,595.8790   4,165.3190  23,635.2300
        Std. Deviation      139.4953     111.9059     126.0481     153.6882     223.5371
        Observation           35,977       35,426       35,143       35,320       40,076

The mean, median and standard deviation in 1999 decreased to some extent compared with the 1996 figures owing to the impact of the Asian monetary crisis. These statistics in general increased gradually after 1999, except for urban areas in 2008, where the figures decreased slightly again. The mean, median and standard deviation in urban areas were also almost twice as high as those in rural areas. The histograms generally show the uni-modal, right-skewed pattern typical of income distributions, with urban areas apparently having a more skewed and dispersed distribution than rural areas.

4. Income Distribution, Poverty and Inequality Measures

4.1. Parametric Income Distribution

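The three-parameter distributions considered in this paper are the Singh-Maddala and the Dagum models. The Singh-Maddala density, introduced by Singh and Maddala (1976), is

$$ f(x) = \frac{a q\, x^{a-1}}{b^{a}\left[1 + (x/b)^{a}\right]^{1+q}}, \qquad x > 0 \tag{4.1} $$

where a, b, q > 0. The cumulative distribution function (cdf) of the Singh-Maddala distribution is available in closed form:

$$ F(x) = 1 - \left[1 + (x/b)^{a}\right]^{-q} \tag{4.2} $$

The mean and mode of the Singh-Maddala model can be expressed as

$$ \mu = \frac{b\,\Gamma(1 + 1/a)\,\Gamma(q - 1/a)}{\Gamma(q)} \tag{4.3} $$

$$ m_o = b\left(\frac{a - 1}{aq + 1}\right)^{1/a} \tag{4.4} $$

where a > 1 is required for the mode to exist.

The Dagum distribution was proposed as an income model by Camilo Dagum (1977). The density is

$$ f(x) = \frac{a p\, x^{ap-1}}{b^{ap}\left[1 + (x/b)^{a}\right]^{p+1}}, \qquad x > 0 \tag{4.5} $$

where a, b, p > 0. The cdf of the Dagum distribution is

$$ F(x) = \left[1 + (x/b)^{-a}\right]^{-p} \tag{4.6} $$

The expressions for the mean and mode are

$$ \mu = \frac{b\,\Gamma(p + 1/a)\,\Gamma(1 - 1/a)}{\Gamma(p)} \tag{4.7} $$

$$ m_o = b\left(\frac{ap - 1}{a + 1}\right)^{1/a} \tag{4.8} $$

where a > 1/p is necessary for the mode to exist.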
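The two models can be sketched in a few lines of Python using the standard textbook forms of the Singh-Maddala and Dagum densities, cdfs, means and modes. The function names and parameter values are illustrative assumptions, and the existence conditions (q > 1/a for the Singh-Maddala mean, a > 1 for the Dagum mean) are checked explicitly.

```python
import math


def sm_pdf(x, a, b, q):
    """Singh-Maddala density for x > 0."""
    return a * q * x ** (a - 1) / (b ** a * (1 + (x / b) ** a) ** (1 + q))


def sm_cdf(x, a, b, q):
    """Singh-Maddala cumulative distribution function."""
    return 1 - (1 + (x / b) ** a) ** (-q)


def sm_mean(a, b, q):
    if q <= 1 / a:
        raise ValueError("mean exists only for q > 1/a")
    return b * math.gamma(1 + 1 / a) * math.gamma(q - 1 / a) / math.gamma(q)


def sm_mode(a, b, q):
    if a <= 1:
        raise ValueError("interior mode exists only for a > 1")
    return b * ((a - 1) / (a * q + 1)) ** (1 / a)


def dagum_pdf(x, a, b, p):
    """Dagum density for x > 0."""
    return a * p * x ** (a * p - 1) / (b ** (a * p) * (1 + (x / b) ** a) ** (p + 1))


def dagum_cdf(x, a, b, p):
    """Dagum cumulative distribution function."""
    return (1 + (x / b) ** (-a)) ** (-p)


def dagum_mean(a, b, p):
    if a <= 1:
        raise ValueError("mean exists only for a > 1")
    return b * math.gamma(p + 1 / a) * math.gamma(1 - 1 / a) / math.gamma(p)


def dagum_mode(a, b, p):
    if a * p <= 1:
        raise ValueError("interior mode exists only for a > 1/p")
    return b * ((a * p - 1) / (a + 1)) ** (1 / a)


# Illustrative parameter values only (not estimates from the paper).
sm = (3.0, 300.0, 1.5)
dg = (3.0, 300.0, 1.2)
sm_summary = (sm_mode(*sm), sm_mean(*sm))
dagum_summary = (dagum_mode(*dg), dagum_mean(*dg))
```

For right-skewed parameter values such as these, the mode lies below the mean in both models, which matches the shape expected of an expenditure distribution.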

4.2. The Gini Coefficient

Various numerical indices are commonly applied in measuring inequality. The Gini coefficient is the most commonly used in the literature. It is derived from the Lorenz curve, which provides a visual way of measuring the degree of inequality. The curve plots the cumulative proportion of the population, ranked by increasing income, on the horizontal axis against the corresponding cumulative proportion of income on the vertical axis. The income share of any selected cumulative proportion of the population can then be read from the graph; for instance, the bottom x per cent of the population receives y per cent of total income (Figure 4.1). If incomes are equally distributed across the entire population, so that every individual holds an identical share of income, inequality is at its lowest level and the Lorenz curve is the 45-degree line, also called the line of perfect equality. Conversely, if all income is held by one member of the population and everyone else has zero income, inequality is at its highest level, that is, there is complete inequality (Kakwani, 1980). Therefore, the further the Lorenz curve lies from the diagonal, the more uneven the distribution of income.

The Lorenz curve was developed by Max Otto Lorenz in 1905 (Kakwani, 1980). The Gini coefficient is measured as the ratio of the area between the actual Lorenz curve and the line of perfect equality to the total area under the diagonal (Figure 4.1). As a result, the Gini index ranges between 0 and 1 inclusive. Under perfect equality, the area between the curve and the diagonal is zero, so the Gini coefficient is also zero. In contrast, under complete inequality, that area coincides with the whole triangle, so the Gini coefficient equals 1 (Kakwani, 1980). Consequently, the closer the Gini coefficient is to zero, the more equally income is distributed.

Numerous mathematical expressions have been suggested for the Gini coefficient. With some manipulation, the Gini index can be written as

$$ G = \sum_{i=1}^{n-1} \left(p_i h_{i+1} - p_{i+1} h_i\right) \tag{4.10} $$

(Duangkamon Chotikapanich, 1994), where $p_i$ is the cumulative proportion of units that receive income up to $i$ and $h_i$ is the cumulative proportion of total income received by the same units.
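To make the link between the Lorenz curve and the Gini coefficient concrete, the sketch below builds the cumulative population shares p_i and income shares h_i from a list of incomes and sums p_i h_{i+1} - p_{i+1} h_i over adjacent points. This is one common way of writing the area-based definition and is assumed here for illustration. The function names are hypothetical.

```python
def lorenz_points(incomes):
    """Return cumulative population shares p and income shares h."""
    xs = sorted(incomes)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total <= 0:
        raise ValueError("need at least one income and a positive total")
    p, h, running = [], [], 0.0
    for i, x in enumerate(xs, start=1):
        running += x
        p.append(i / n)          # share of units with income up to the i-th
        h.append(running / total)  # share of total income they receive
    return p, h


def gini(incomes):
    """Gini coefficient from adjacent Lorenz points."""
    p, h = lorenz_points(incomes)
    return sum(p[i] * h[i + 1] - p[i + 1] * h[i] for i in range(len(p) - 1))


equal = gini([5, 5, 5, 5])       # everyone holds the same income
spread = gini([1, 2, 3, 4])
extreme = gini([0, 0, 0, 1])     # one unit holds all income
```

With a finite sample of n units, complete inequality gives (n - 1)/n rather than exactly 1, which is why the last example falls short of the upper bound.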
For the Singh-Maddala distribution, the Gini coefficient can be expressed as

$$ G = 1 - \frac{\Gamma(q)\,\Gamma(2q - 1/a)}{\Gamma(q - 1/a)\,\Gamma(2q)} \tag{4.11} $$

where $\Gamma(\cdot)$ denotes the gamma function (J. B. McDonald, 1984). McDonald (1984) also gives a corresponding expression for the beta-2 distribution in terms of the beta function B(.). Dagum (1977) shows that the Gini coefficient corresponding to the Dagum distribution is

$$ G = \frac{\Gamma(p)\,\Gamma(2p + 1/a)}{\Gamma(2p)\,\Gamma(p + 1/a)} - 1 \tag{4.12} $$

The Gini coefficient, developed by the Italian statistician Corrado Gini in 1912, is the most widely used indicator of inequality. It benefits from an intuitive geometric interpretation through the Lorenz curve, and it allows direct comparison between units with different population sizes. Because it compares each person's income with every other person's income, it is a function of the differences between all pairs of individual incomes. Mathematically, it can be defined as half of the average absolute difference between all pairs of incomes in a population, normalized by mean income:

$$ G = \frac{1}{2 n^{2} \mu(y)} \sum_{i=1}^{n} \sum_{j=1}^{n} \left| y_i - y_j \right| $$

where $n$ is the total number of persons, $\mu(y)$ is the mean income, and $y_i$ and $y_j$ are the incomes of persons $i$ and $j$ respectively. The value of the Gini coefficient thus represents the expected difference in incomes of two individuals or households randomly selected from the population as a whole (World Bank Institute, 2005). Moreover, the Gini coefficient is relatively insensitive to the extremely low incomes that can be reported, compared with other inequality measures (Trewin, 2006).

4.3. Poverty Indices

A poverty measure quantifies the degree of deprivation in the population. In this paper, three poverty measures are used to measure the incidence of poverty in Indonesia: the headcount index, the poverty gap index and the poverty severity index. All three are members of the class of poverty measures proposed by Foster, Greer and Thorbecke (the FGT index) (Foster, Greer, & Thorbecke, 1984):

$$ P_{\alpha} = \frac{1}{N} \sum_{i=1}^{H} \left(\frac{z - x_i}{z}\right)^{\alpha} \tag{4.13} $$

where $z$ is the poverty line, $H$ is the number of poor with incomes below $z$, $x_i$ are individual incomes, $N$ is the number of people in the economy and $\alpha$ is a sensitivity parameter.

When $\alpha$ = 0, the FGT index reduces to the headcount index, the proportion of the population with a standard of living below the poverty line. Although the headcount ratio is easy to interpret, it does not indicate the degree of relative poverty among poor income recipients, so it cannot differentiate the poorest from the less poor. The headcount index is also insensitive to changes in the distribution among the poor: when a poor person becomes poorer, the headcount index is unaffected.

When $\alpha$ = 1, the FGT index becomes the average poverty gap in the population. The poverty gap measures the depth of poverty: it evaluates the mean income or consumption shortfall relative to the poverty line across the whole population. Based on this measure, we can estimate the total resources needed to bring all the poor up to the poverty line.

When $\alpha$ = 2, the FGT index corresponds to the poverty severity measure. This index combines information on the incidence of poverty, the depth of poverty and income inequality among the poor, which is very useful when the policy aim is to eradicate extreme poverty.

Headcount ratio

The most critical point in measuring poverty is the determination of the poverty line, which largely determines the poverty measure. The poverty line is defined as the level of income or expenditure that is sufficient to obtain the minimum necessities of life, including both food and non-food items. A person is considered poor if his or her income falls below the poverty line. Once the poverty threshold is specified, various poverty measures can be calculated. In this study, we used the official Indonesian poverty line published by BPS Statistics Indonesia.
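Once a poverty line is fixed, the FGT measures of (4.13) can be computed directly from expenditure data. The sketch below assumes the usual FGT form, the average over the whole population of the normalized shortfall raised to the sensitivity parameter, with non-poor units contributing zero. The poverty line and incomes are made-up numbers, not the BPS line or Susenas data.

```python
def fgt(incomes, z, alpha):
    """Foster-Greer-Thorbecke index for poverty line z and sensitivity alpha."""
    if z <= 0 or alpha < 0 or not incomes:
        raise ValueError("need z > 0, alpha >= 0 and a non-empty sample")
    n = len(incomes)
    # Only units strictly below the line contribute a shortfall.
    return sum(((z - x) / z) ** alpha for x in incomes if x < z) / n


# Hypothetical expenditures and poverty line (illustration only).
sample = [50.0, 80.0, 120.0, 200.0]
line = 100.0

headcount = fgt(sample, line, 0)   # share of units below the line
gap = fgt(sample, line, 1)         # average normalized shortfall
severity = fgt(sample, line, 2)    # squared shortfalls weight the poorest more
```

Moving income from the poorest unit to the less poor unit below the line would leave the headcount and the gap unchanged but raise the severity index, which is the sensitivity that the squared term adds.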

5. Bayesian Methodology

5.1. Bayesian Approach

Principally, Bayesian method assumes probability as a subjective approach to uncertainty. For that reason, under Bayesian framework probability distribution on a parameter is utilized to specify uncertainties about the parameter's true value. Differ from the classical approach, the inference about parameter vector ? is made in terms of probability statements which are conditional on the observed value of x, p(?|x). That is, inference about ? is drawn based on probability density function (pdf) of ? after seeing the data. This density is called posterior density. In Bayesian concept, the posterior density, p(?|x), is derived from the joint distribution of likelihood function, f(x|?), and prior density, p(?), according to Bayes' rule (5.1) where . Since p(?|x) is a density function of ?, the f(x) term which does not depend on ? could then be regarded as a constant. Thus, the Bayes' theorem can be written as (5.2) So the posterior distribution, p(?|x), combines the information in the likelihood function, f(x|?), with that of the prior distribution, p(?). For the income distribution model considered in this paper, f(x|?) is defined as either (4.1) for the case of the Singh-Maddala, or (4.5) for the case of the Dagum distribution. The ? is defined as (a, b, q)' for the Singh-Maddala or (a, b, p)' for the Dagum distribution. 5.2. Prior Specification The prior distribution, p(?) summarizes information about ? and represents how likely different values of ? are, before seeing the data. Specification of suitable prior distribution for parameters of selected model is the most important part of Bayesian analysis that differentiates it from the classical analysis. In the case of the parameters ? = (a, b, q) for the Singh-Maddala or ? = (a, b, p) for the Dagum distribution, it is difficult to formulate the prior distribution for these parameters conceptually. Griffiths, et al. 
(2005) demonstrate the feasibility of using prior information on economic quantities of interest such as the mean, the mode and the Gini coefficient to define the prior information for the parameters of the income distributions. We will follow their procedure in this paper. Let ? be a k-dimensional vector of the unknown parameters of the income distribution, where k = 3 for our case. Also, let ? be a k-dimensional vector of the quantities of interest. We will assume that ? contains the mean ?, mode mo and the Gini coefficient g. Then the prior density function for the parameters of the income distribution, p(?) can be expressed as (5.3) where p(d) is the joint prior density on quantities of interest and is the Jacobian term. The Jacobian technique requires a symmetric matrix to operate. Thus, if we specify three parameter functions for the income distribution, three quantities of interest are also needed to be identified. The expressions for derivatives used in computing the Jacobian term , reproduced from Griffiths, et al. (2005), is provided in the Appendix. The joint prior density on quantities of interest is written as , mo<µ (5.4) The restriction that the mode is less than the mean is required for the case of income distribution, where it is normally skewed to the right. With this restriction, a constant value, c, needs to be added in (5.4) to make the joint pdf integrate to one. In this paper, gamma distributions are chosen for the prior pdfs for ? and mo and a beta distribution is chosen for g. Their expressions are (5.5) (5.6) (5.7) The next step is to assign values for the parameters of these distributions to complete the specifications of the prior pdfs. We attempt to select the parameter values so that they produce the prior information for ?, mo and g that are consistent with general expectation and relatively non informative. Selected parameter values are summarized in the Table 5.1. 
These parameter values produce prior pdfs with wide 80% and 95% probability intervals. The priors therefore cover an extensive range of plausible values and are relatively uninformative.

Table 5.1. Prior Density functions and the Associated Parameter Values

Prior density         Region  1996          1999          2002          2005          2008
                              a      b      a      b      a      b      a      b      a      b
Mean Expenditure      Urban   350    1.06   300    1.11   400    1.08   450    1.06   450    1.01
                      Rural   200    1.02   190    1.05   210    1.05   230    1.03   250    1.01
Modal Expenditure     Urban   170    1.03   190    1.02   190    1.04   270    1.01   195    1.03
                      Rural   140    1.04   140    1.02   170    1.01   150    1.06   190    1.03
The Gini Coefficient  Urban   1.10   2.00   1.10   2.35   1.10   2.05   1.10   1.80   1.10   1.90
                      Rural   1.10   2.90   1.20   3.70   1.10   3.15   1.10   2.75   1.10   2.60
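As a rough plausibility check on Table 5.1, the following sketch computes the prior means implied by the 1996 urban values. It rests on an assumption that the text does not confirm: that a is the scale and b the shape of the gamma priors (so the prior mean is ab), and that the beta prior on g has mean a/(a+b). The function names are illustrative only.

```python
# Assumed parameterization (not confirmed by the text):
#   gamma prior with scale a and shape b  -> prior mean a * b
#   beta prior with parameters (a, b)     -> prior mean a / (a + b)

def gamma_prior_mean(a, b):
    """Mean of a gamma density with scale a and shape b."""
    return a * b

def beta_prior_mean(a, b):
    """Mean of a beta density with parameters a and b."""
    return a / (a + b)

# 1996 urban entries of Table 5.1
prior_mean_of_mean = gamma_prior_mean(350, 1.06)   # mean expenditure
prior_mean_of_mode = gamma_prior_mean(170, 1.03)   # modal expenditure
prior_mean_of_gini = beta_prior_mean(1.10, 2.00)   # Gini coefficient

print(round(prior_mean_of_mean, 1),
      round(prior_mean_of_mode, 1),
      round(prior_mean_of_gini, 3))
```

Under this reading the prior for the 1996 urban mean is centred at about 371 and the prior for the Gini coefficient at about 0.355, both close to the corresponding sample values of 370.97 and 0.3467, which is what makes the assumed parameterization plausible.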

5.3. Posterior Probability Distribution

Combining (5.2) and (5.3), the posterior pdf for the parameters of each income distribution is obtained as

$$p(\theta|x) \propto f(x|\theta)\,p(\delta)\left|\frac{\partial \delta}{\partial \theta'}\right| \qquad (5.8)$$

As these pdfs have no tractable closed form, we use a random walk Metropolis-Hastings algorithm to draw observations θ(t) (t = 1, 2, …, T) from each of them. The algorithm steps are similar to those employed by Griffiths and Chotikapanich (1997) and are as follows.

1. Select initial values for the elements of θ, say θ0. We take θ0 from the maximum likelihood estimates. Perform the remaining steps with t set equal to 0.
2. Compute a value for log p(θ(t)|x).
3. Generate ε from N(0, sS), where S is an adjusted covariance matrix of the maximum likelihood estimates and s is chosen by experimentation.
4. Compute θ* = θ(t) + ε.
5. If θ* falls outside the feasible region, set θ(t+1) = θ(t) and return to step 2; otherwise, proceed with step 6.
6. Compute a value for log p(θ*|x) and the ratio of the pdfs, r = exp[log p(θ*|x) - log p(θ(t)|x)].
7. If r ≥ 1, set θ(t+1) = θ* and return to step 2; otherwise, proceed with step 8.
8. Generate a uniform random variable, say u, from the interval (0,1). If u ≤ r, set θ(t+1) = θ*; otherwise set θ(t+1) = θ(t). Return to step 2.

Stata version 9 software was used to run the algorithm. The simulation was run for 25,000 iterations and the first 5,000 draws were discarded as a burn-in period, leaving 20,000 parameter draws for each model. Plots of the draws were then inspected to confirm the convergence of the Markov chain. For each draw of θ, we computed the mean, the mode, the Gini coefficient and the poverty indices using the expressions in Section 4. Posterior means and posterior standard deviations for each indicator were then computed from the retained draws.
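The eight steps can be sketched in a few lines of code. The following is a minimal illustration in Python, not the Stata program used in the paper. It makes two simplifying assumptions: the proposal covariance sS is replaced by independent normal increments with hypothetical standard deviations `step_sd`, and the target is a toy one-parameter gamma-shaped log density rather than the posterior in (5.8). The names `random_walk_mh`, `log_post` and `feasible` are illustrative.

```python
import math
import random

def random_walk_mh(log_post, theta0, step_sd, n_iter, feasible, seed=0):
    """Random-walk Metropolis-Hastings following steps 1-8.

    log_post : returns log p(theta | x) up to an additive constant
    theta0   : list of starting values (the paper uses ML estimates)
    step_sd  : proposal standard deviations, a diagonal stand-in for sS
    feasible : returns True when theta lies inside the feasible region
    """
    rng = random.Random(seed)
    theta = list(theta0)                                   # step 1
    log_p = log_post(theta)                                # step 2
    draws = []
    for _ in range(n_iter):
        eps = [rng.gauss(0.0, sd) for sd in step_sd]       # step 3
        cand = [t + e for t, e in zip(theta, eps)]         # step 4
        if feasible(cand):                                 # step 5
            log_p_cand = log_post(cand)                    # step 6
            log_r = log_p_cand - log_p
            # steps 7-8: accept if r >= 1, otherwise with probability r
            if log_r >= 0.0 or rng.random() <= math.exp(log_r):
                theta, log_p = cand, log_p_cand
        draws.append(list(theta))
    return draws

def log_gamma3(theta):
    """Toy target: gamma density with shape 3 and scale 1 (mean 3)."""
    t = theta[0]
    return 2.0 * math.log(t) - t

draws = random_walk_mh(log_gamma3, [3.0], [1.5], 25000,
                       lambda th: th[0] > 0.0, seed=1)
kept = draws[5000:]                     # discard burn-in, as in the text
post_mean = sum(d[0] for d in kept) / len(kept)
print(len(kept), post_mean)
```

Working with log r rather than r keeps the comparison stable when the log posterior values are large in magnitude. A rejected or infeasible candidate simply repeats the current value in the chain.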

6. Empirical Analysis

In this section, the Bayesian methodology is applied to per adult equivalent expenditure data and the empirical results are discussed in more detail. The discussion starts by evaluating the posterior means and posterior standard deviations of the parameter draws, before describing the predictive densities and comparing them over different years and across different models. 
Next, we examine the posterior densities of the mean and modal expenditure, the Gini coefficient, the headcount index, the poverty gap index and the poverty severity index.

6.1. Predictive Densities for Expenditure

Following the procedure described in the methodology, we obtain posterior densities of the model parameters for each of the five survey years, separately for urban and rural areas. The posterior means and standard deviations of the 20,000 parameter draws for each model are presented in Table 6.1. Overall, the posterior standard deviations for the Singh-Maddala and Dagum models are relatively small, which suggests that the parameters of these two models are quite well estimated.

Table 6.1. Posterior Means and Standard Deviation (in the brackets) of the Parameter Draws

Model          Parameter  Urban                                                  Rural
                          1996       1999       2002       2005       2008       1996       1999       2002       2005       2008
Singh-Maddala  b          230.2102   219.2610   268.7203   279.0893   289.9738   144.4132   152.4475   161.9263   165.1343   173.2325
                          (2.2426)   (2.0474)   (2.4238)   (2.9277)   (3.2443)   (0.8650)   (0.9354)   (0.8981)   (1.0382)   (1.1725)
               a          3.9185     4.1333     3.8782     3.5535     3.4734     4.9867     5.0708     5.3634     4.7375     4.4625
                          (0.0488)   (0.0508)   (0.0433)   (0.0425)   (0.0412)   (0.0493)   (0.0496)   (0.0527)   (0.0468)   (0.0428)
               q          0.5678     0.5823     0.5789     0.5778     0.6443     0.5638     0.6350     0.5656     0.5728     0.5865
                          (0.0131)   (0.0135)   (0.0121)   (0.0130)   (0.0154)   (0.0103)   (0.0120)   (0.0103)   (0.0103)   (0.0107)
Dagum          b          167.9710   161.9604   199.4407   204.5580   211.0664   118.9092   133.1617   137.7839   143.1555   137.6364
                          (4.7745)   (4.0827)   (4.6494)   (5.6011)   (5.7196)   (1.8528)   (1.6991)   (1.8702)   (1.9627)   (2.2114)
               a          2.4091     2.5775     2.4259     2.2273     2.3135     3.1065     3.4666     3.3746     3.0458     2.8356
                          (0.0227)   (0.0233)   (0.0198)   (0.0198)   (0.0215)   (0.0233)   (0.0267)   (0.0253)   (0.0220)   (0.0198)
               p          2.8614     2.8571     2.7378     2.6768     2.5343     2.5415     2.0907     2.4233     2.2101     2.5642
                          (0.1399)   (0.1325)   (0.1090)   (0.1143)   (0.1112)   (0.0861)   (0.0632)   (0.0773)   (0.0635)   (0.0815)

The proposed models portray the changes in the expenditure distribution over the years through predictive densities. Figures 6.1 and 6.2 depict the movement of the predictive densities for urban and rural areas under the Singh-Maddala and the Dagum specifications, respectively. These densities are generated from the posterior means of the parameter draws for each model in Table 6.1. The predictive densities are truncated at a certain level of expenditure to show the pattern of the changes more clearly. Overall, the predictive densities are unimodal and right-skewed, which is typical of income or expenditure distributions. The Singh-Maddala and the Dagum predictive densities also show similar changes in the distribution over the years. 
In particular, the location of the mode, the spread of expenditure and the thickness of the lower and upper tails change gradually. Consider urban areas first. Between 1996 and 1999, the pdf shifts to the left with a higher mode, suggesting that more people were poor in 1999 than in 1996. From 1999 through 2002 to 2005, the pdfs move back to the right with lower modes and fatter right tails. This suggests that from 1999 to 2005 more people reached higher levels of expenditure. There is not much change in the pdf between 2005 and 2008. The obvious explanation for the backward shift in the distribution between 1996 and 1999 is the Asian financial crisis. The general movement in the distribution between 1996 and 1999 is consistent with the previous findings of Beegle, Frankenberg and Thomas (1999) and Skoufias (2001). The pattern of shifts in the expenditure distribution in rural areas during 1996-2008 is quite different from that in urban areas. Between 1996 and 1999 the pdf shifts to the right with a higher mode. From 1999 to 2002 the pdf shifts slightly further to the right but with a lower mode. From 2002 to 2005 and then 2008, however, the distributions shift only very slightly to the right and the location of the mode stays the same across these three years.

6.2. Posterior Densities of the Mean and Modal Expenditure, the Gini Coefficient and Poverty Indices.

Using the expressions outlined in Section 4, the simulated parameter draws are used to obtain the estimates of mean and mode expenditure as well as the Gini and poverty indices. The characteristics of posterior densities are then summarized by posterior means (point estimates) and standard deviations reported in Table 6.2.
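To make the mapping from parameter draws to indicators concrete, the sketch below evaluates the mode and the Gini coefficient for both distributions. Section 4 is not reproduced here, so the code uses the standard textbook expressions for the Singh-Maddala and Dagum distributions and assumes they match (4.1)-(4.5). For brevity it evaluates them at the 1996 urban posterior means from Table 6.1 rather than at each of the 20,000 draws; the helper `summarize` shows how draws would be reduced to a posterior mean and standard deviation.

```python
import math

def sm_mode(a, b, q):
    """Mode of the Singh-Maddala distribution (requires a > 1)."""
    return b * ((a - 1.0) / (a * q + 1.0)) ** (1.0 / a)

def sm_gini(a, q):
    """Gini coefficient of the Singh-Maddala distribution (requires q > 1/a)."""
    log_ratio = (math.lgamma(q) + math.lgamma(2.0 * q - 1.0 / a)
                 - math.lgamma(q - 1.0 / a) - math.lgamma(2.0 * q))
    return 1.0 - math.exp(log_ratio)

def dagum_mode(a, b, p):
    """Mode of the Dagum distribution (requires a * p > 1)."""
    return b * ((a * p - 1.0) / (a + 1.0)) ** (1.0 / a)

def dagum_gini(a, p):
    """Gini coefficient of the Dagum distribution."""
    log_ratio = (math.lgamma(p) + math.lgamma(2.0 * p + 1.0 / a)
                 - math.lgamma(2.0 * p) - math.lgamma(p + 1.0 / a))
    return math.exp(log_ratio) - 1.0

def summarize(values):
    """Posterior mean and standard deviation of an indicator over the draws."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(var)

# Posterior means of the parameters for urban areas in 1996 (Table 6.1)
sm = {"a": 3.9185, "b": 230.2102, "q": 0.5678}
dg = {"a": 2.4091, "b": 167.9710, "p": 2.8614}

print(sm_mode(**sm), sm_gini(sm["a"], sm["q"]))
print(dagum_mode(**dg), dagum_gini(dg["a"], dg["p"]))
```

Evaluating an indicator at the posterior mean of the parameters only approximates the posterior mean of the indicator, so small differences from the entries in Table 6.2 are expected.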

Table 6.2. The Actual Sample Values, Posterior means and Standard Deviation (in the Brackets) of Statistics of Interest

Survey  Statistic               Urban                                        Rural
year                            Sample value   Singh-Maddala  Dagum          Sample value   Singh-Maddala  Dagum
1996    Mean                    370.9706       380.9622       379.4622       204.3220       205.7849       204.9931
                                               (2.6079)       (2.2643)                      (0.7446)       (0.6749)
        Mode                    -              224.3966       210.6409       -              145.7175       140.4592
                                               (1.0733)       (1.2055)                      (0.4533)       (0.5007)
        Gini coefficient        0.3467         0.3671         0.3622         0.2715         0.2825         0.2778
                                               (0.0033)       (0.0027)                      (0.0019)       (0.0015)
        Headcount index         0.1432         0.1234         0.1253         0.2095         0.1867         0.1937
                                               (0.0050)       (0.0050)                      (0.0058)       (0.0059)
        Poverty gap index       0.0272         0.0272         0.0240         0.0378         0.0354         0.0342
                                               (0.0015)       (0.0013)                      (0.0015)       (0.0014)
        Poverty severity index  0.0077         0.0096         0.0072         0.0103         0.0108         0.0094
                                               (0.0007)       (0.0005)                      (0.0006)       (0.0005)
1999    Mean                    332.6298       340.9449       341.0964       199.2879       200.4719       200.4968
                                               (1.9379)       (1.7729)                      (0.6045)       (0.5724)
        Mode                    -              214.8472       202.3798       -              151.3635       146.6685
                                               (0.9781)       (1.0400)                      (0.4581)       (0.4991)
        Gini coefficient        0.3169         0.3388         0.3364         0.2451         0.2544         0.2528
                                               (0.0029)       (0.0024)                      (0.0015)       (0.0014)
        Headcount index         0.2151         0.1766         0.1824         0.2723         0.2487         0.2558
                                               (0.0057)       (0.0058)                      (0.0064)       (0.0065)
        Poverty gap index       0.0428         0.0386         0.0358         0.0513         0.0482         0.0472
                                               (0.0017)       (0.0015)                      (0.0017)       (0.0016)
        Poverty severity index  0.0126         0.0134         0.0108         0.0145         0.0148         0.0134
                                               (0.0008)       (0.0006)                      (0.0007)       (0.0006)
2002    Mean                    432.3625       440.1354       438.3898       220.7423       222.3354       221.4634
                                               (2.6888)       (2.2796)                      (0.7296)       (0.6511)
        Mode                    -              260.5147       244.8307       -              164.3126       159.5259
                                               (1.1454)       (1.2332)                      (0.4788)       (0.5191)
        Gini coefficient        0.3550         0.3655         0.3608         0.2510         0.2603         0.2556
                                               (0.0030)       (0.0023)                      (0.0017)       (0.0014)
        Headcount index         0.1535         0.1339         0.1366         0.2227         0.1986         0.2056
                                               (0.0051)       (0.0051)                      (0.0059)       (0.0060)
        Poverty gap index       0.0295         0.0299         0.0268         0.0380         0.0358         0.0349
                                               (0.0015)       (0.0013)                      (0.0014)       (0.0014)
        Poverty severity index  0.0086         0.0106         0.0082         0.0100         0.0104         0.0093
                                               (0.0007)       (0.0006)                      (0.0006)       (0.0005)
2005    Mean                    477.0349       493.8840       489.9874       236.2176       239.2102       237.4441
                                               (3.8739)       (3.2397)                      (0.9137)       (0.8100)
        Mode                    -              265.3697       247.9469       -              165.3536       160.4663
                                               (1.3459)       (1.4939)                      (0.5374)       (0.5832)
        Gini coefficient        0.3804         0.4041         0.3971         0.2822         0.2952         0.2881
                                               (0.0038)       (0.0029)                      (0.0019)       (0.0016)
        Headcount index         0.1231         0.1106         0.1111         0.2071         0.1926         0.1990
                                               (0.0047)       (0.0047)                      (0.0059)       (0.0059)
        Poverty gap index       0.0240         0.0260         0.0227         0.0393         0.0382         0.0377
                                               (0.0015)       (0.0013)                      (0.0016)       (0.0015)
        Poverty severity index  0.0073         0.0097         0.0072         0.0116         0.0121         0.0111
                                               (0.0007)       (0.0006)                      (0.0007)       (0.0006)
2008    Mean                    454.0787       467.0608       471.0123       251.8224       255.1404       254.8222
                                               (3.1785)       (2.8821)                      (0.9960)       (0.8820)
        Mode                    -              268.3062       248.9518       -              171.5346       163.6459
                                               (1.3126)       (1.4766)                      (0.5743)       (0.6069)
        Gini coefficient        0.3618         0.3793         0.3826         0.2939         0.3095         0.3062
                                               (0.0033)       (0.0028)                      (0.0020)       (0.0016)
        Headcount index         0.1275         0.1565         0.1598         0.2044         0.1743         0.1797
                                               (0.0054)       (0.0055)                      (0.0056)       (0.0057)
        Poverty gap index       0.0239         0.0384         0.0345         0.0381         0.0357         0.0336
                                               (0.0018)       (0.0016)                      (0.0015)       (0.0014)
        Poverty severity index  0.0068         0.0147         0.0114         0.0108         0.0117         0.0097
                                               (0.0009)       (0.0007)                      (0.0007)       (0.0006)

The posterior means are similar to the actual sample values. Comparing the two distributions, the Dagum specification produces the posterior means closest to the observed values in most cases. The posterior standard deviations are also relatively small, with the smallest mostly given by the Dagum model. As no sample estimate is available for the mode of expenditure, we can only compare the posterior means of the mode between the two competing models. In all cases the point estimates of the mode under the Dagum model are lower than those under the Singh-Maddala model, while the corresponding posterior standard deviations are slightly higher.

6.2.1. The Immediate Impact of the Crisis

Changes in the posterior means of the indicators of interest between consecutive surveys provide a useful indication of changes in the expenditure distribution. As the Singh-Maddala and the Dagum models give similar patterns of shifts, we use the Dagum estimates to describe the changes over time. Figures 6.3-6.5 show the trends over the survey years for the mean, mode, inequality and poverty indices. As indicated by the graphs, the 1997 monetary crisis considerably shifts the mean and mode of expenditure and, by 1999, markedly reduces inequality but worsens poverty. For urban areas, the estimated mean and mode of expenditure decrease by around 10 and 4 percent, respectively, while the Gini coefficient declines by about 8 percent. Poverty is much higher in 1999 than in 1996: the posterior means of the headcount, poverty gap and poverty severity indices grow by approximately 40-50 percent. For rural areas, on the other hand, the impact of the crisis is not as severe. Posterior estimates of the mean expenditure and of inequality in 1999 are slightly lower than those in 1996. Interestingly, unlike in the urban regions, the mode increases by roughly 4 percent over 1996-1999. Although poverty incidence in rural areas is much higher than in urban areas, the rise in the posterior means of the rural poverty indices (32-43 percent) is smaller than in the urban regions. All these findings support the conclusions of previous research, which finds that in the period affected by the crisis (1999) the inequality index fell notably whereas the poverty rate rose sharply, and that urban regions suffered more than rural areas (Cameron, 2002; Kadarmanto & Kamiya, 2005; Skoufias, 2001; Suharyadi & Sumarto, 2003).
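The percentage changes quoted here follow directly from the posterior means in Table 6.2. As a worked illustration, the sketch below recomputes the 1996 to 1999 changes for urban areas under the Dagum model; the helper name `pct_change` and the dictionary keys are illustrative.

```python
def pct_change(old, new):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old

# Dagum posterior means for urban areas (Table 6.2)
urban_1996 = {"mean": 379.4622, "mode": 210.6409, "gini": 0.3622,
              "headcount": 0.1253, "poverty_gap": 0.0240, "severity": 0.0072}
urban_1999 = {"mean": 341.0964, "mode": 202.3798, "gini": 0.3364,
              "headcount": 0.1824, "poverty_gap": 0.0358, "severity": 0.0108}

changes = {k: pct_change(urban_1996[k], urban_1999[k]) for k in urban_1996}
for name, value in changes.items():
    print(f"{name:12s} {value:+.1f}%")
```

The mean falls by about 10 percent and the mode by about 4 percent, while the three poverty indices rise by roughly 46 to 50 percent.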

6.2.2. The Period after 1999

The Means and Modes

After 1999, the mean expenditure in urban areas grows quickly from around Rp 340,000 in 1999 to Rp 490,000 in 2005, roughly a 44 percent increase, before declining slightly to about Rp 470,000 in 2008 (Figure 6.3). The rise in the mean expenditure is accompanied by an increase of almost 23 percent in the estimated mode between 1999 and 2005, followed by a further increase of less than 1 percent by 2008. A rather different trend is noticeable for the rural areas, where the changes in the estimated mean and mode of expenditure are not as large as those in the urban regions. Between 1999 and 2008, the mean expenditure rises steadily from around Rp 200,000 to about Rp 255,000. However, it remains far below the mean expenditure in urban areas, which is almost twice as large. Like the mean, the mode of expenditure also rises, by around 9 and 2 percent in 1999-2005 and 2005-2008 respectively.

The Inequality and Poverty

In 2002, an upward trend in inequality for urban areas starts at the same time as a downward trend in the poverty measures. The posterior mean of the Gini coefficient rises by roughly 18 percent during 1999-2005, but in 2008 it falls by nearly 4 percent (Figure 6.4). As shown in Figures 6.4 and 6.5, the posterior estimates of the poverty indices in 2002 drop in the region of 32-39 percent, which brings the estimates close to those of 1996. After decreasing slightly in 2005, the estimated poverty measures increase again by approximately 43-58 percent in 2008. Unlike in the urban region, the posterior estimate of the rural inequality index is still increasing in 2008, by approximately 6 percent, after a 14 percent increase between 1999 and 2005 (Figure 6.4). The estimated rural headcount index declines by about 25 percent between 1999 and 2005 and by a further 10 percent or so in 2008. In 2002, the poverty gap and poverty severity indices decrease by about 26 and 31 percent respectively. There is not much change in either index between 2005 and 2008 (Figure 6.5). Therefore, although individuals seem to be wealthier than before, as the mean and concentration of expenditure increase substantially, inequality after 1999 is also increasing, with much higher disparity in urban regions. This reasoning is consistent with what has been indicated previously by Cameron (2002) and Kadarmanto and Kamiya (2005).

6.3. Posterior Densities Comparison

To compare the posterior densities under the Singh-Maddala and Dagum models, Figures 6.6-6.8 show examples of paired posterior densities for the mean, the Gini coefficient and the headcount index respectively, for urban areas in 1996, 2002 and 2008. The graphs suggest that the posterior densities of the mean and the headcount index are relatively similar under both models. However, the posterior densities of the Gini coefficient are quite sensitive to the specification of the expenditure distribution model (Figure 6.7). The posterior densities of the mode, the poverty gap index and the poverty severity index are also found to depend quite strongly on the model. Therefore, it is necessary to assess the performance of both models further using posterior model probabilities.

7. Model Selection

In this section, the performance of the Singh-Maddala and Dagum models is compared using posterior model probabilities. The first subsection explains how the posterior model probability is calculated, while the next subsection describes the results.

7.1. Posterior Model Probability

As we work with two competing income distribution models, we may want to investigate which model fits best. In Bayesian analysis, posterior model probabilities are used to evaluate the models, and the density function with the highest posterior probability is deemed to be the best one. According to Bayes' rule, the posterior probability of model i, P(Mi|x), can be derived as

$$P(M_i|x) = \frac{f(x|M_i)\,p(M_i)}{f(x)} \qquad (5.9)$$

where p(Mi) is the prior probability for model i and f(x|Mi) is the marginal likelihood, which is the likelihood of the data under model Mi, marginalized over the parameter vector with respect to the prior:

$$f(x|M_i) = \int f(x|\theta, M_i)\,p(\theta|M_i)\,d\theta \qquad (5.10)$$

To compare two models i and j we use the posterior odds ratio, which is the ratio of the corresponding posterior model probabilities:

$$\frac{P(M_i|x)}{P(M_j|x)} = \frac{f(x|M_i)\,p(M_i)}{f(x|M_j)\,p(M_j)} \qquad (5.11)$$

As we are looking for the model with the highest posterior probability, model i is favoured over model j when the posterior odds ratio is greater than one, and the reverse is true when the posterior odds ratio is less than one. In practice, the prior probabilities of the competing models, p(Mi), are treated as identical under the uninformative approach. That is, each proposed density function is assumed to have an equal chance of being the correct model. Consequently, the posterior odds ratio is simply the ratio of marginal likelihoods, which is referred to as the Bayes factor. These marginal likelihoods measure how well the proposed models predict the observed data.

Since the marginal likelihood, f(x|Mi), is analytically difficult to compute, several techniques have been suggested in the literature. Gelfand and Dey (1994) proposed an approach using the modified harmonic mean to obtain the marginal likelihood. The main advantage of the technique is that it uses the posterior parameter draws to calculate the marginal likelihood of the model. Specifically, Gelfand and Dey (1994) express the estimate of the inverse of the marginal likelihood as

$$\frac{1}{\hat{f}(x|M_i)} = \frac{1}{T}\sum_{t=1}^{T}\frac{h(\theta^{(t)})}{f(x|\theta^{(t)}, M_i)\,p(\theta^{(t)}|M_i)} \qquad (5.12)$$

where h(θ) is a density function that approximates the posterior density, with support contained in Θ. Here f(x|θ,Mi) represents the likelihood function for the data under model Mi, and p(θ|Mi) denotes the prior density function of the parameters under the same model. The summand is evaluated at each retained parameter draw θ(t), t = 1, 2, …, T. To prevent the estimate of the inverse marginal likelihood from being unbounded in the tails, Geweke (1999) improves the technique by suggesting a truncated multivariate Normal density for h(θ),

$$h(\theta) = \frac{1}{(1-r)(2\pi)^{k/2}}\,|\hat{\Sigma}|^{-1/2}\exp\left[-\tfrac{1}{2}(\theta-\hat{\theta})'\hat{\Sigma}^{-1}(\theta-\hat{\theta})\right] I(\theta \in \hat{\Theta}) \qquad (5.13)$$

The density is truncated such that

$$\hat{\Theta} = \left\{\theta : (\theta-\hat{\theta})'\hat{\Sigma}^{-1}(\theta-\hat{\theta}) \le \chi^2_{1-r}(k)\right\} \qquad (5.14)$$

where $\chi^2_{1-r}(k)$ denotes the (1-r)th percentile of the chi-squared distribution with k degrees of freedom, and k is the number of parameters contained in the vector θ. $\hat{\theta}$ and $\hat{\Sigma}$ are the vector of posterior means and the covariance matrix of the retained draws, respectively. As recommended by Geweke (1999), the inverse marginal likelihood for each model is evaluated at different values of r to monitor its performance carefully.

The exact form of the posterior distribution for the expenditure distributions considered in this paper is unknown, so we were unable to obtain the exact marginal likelihood for each model. We therefore followed the uninformative approach for the prior model probabilities and implemented the marginal likelihood calculation suggested by Gelfand and Dey (1994) and Geweke (1999). The inverse marginal likelihood for each model was evaluated for r = 0.1, 0.2, …, 0.9. As the marginal likelihood is computed on the natural logarithm scale and the associated exponential value became infinite, we used the log of the marginal likelihood directly instead of calculating the Bayes factor. The model with the largest log marginal likelihood is therefore deemed to be the best fitting model. 
If model comparison is inconclusive, that is no single best model appears, we can then proceed to represent model uncertainties through averaging these competing specifications, with the weights being the posterior model probabilities.
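The Gelfand-Dey estimator with Geweke's truncated normal can be checked on a problem where the answer is known. The sketch below is an illustration under stated assumptions, not the paper's calculation: it uses a hypothetical one-parameter model (normal data with unit variance and a normal prior), for which the posterior and the marginal likelihood are available exactly, and it works on the log scale throughout so that no exponential overflows or underflows. With k = 1 the chi-squared cut-off is the square of a standard normal quantile. All function names and the log marginal likelihood values passed to `posterior_model_probs` are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, variance

def log_norm_pdf(x, mu, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mu) ** 2 / var)

def log_sum_exp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def gelfand_dey_log_ml(draws, log_lik, log_prior, r):
    """Log marginal likelihood from posterior draws of a scalar parameter.

    h is a normal density fitted to the draws and truncated to the region
    that holds probability 1 - r (the k = 1 case of Geweke's proposal).
    """
    m, v = mean(draws), variance(draws)
    cutoff = NormalDist().inv_cdf(1.0 - r / 2.0) ** 2   # chi-squared(1) quantile
    terms = []
    for th in draws:
        if (th - m) ** 2 / v <= cutoff:                 # draw lies in the region
            log_h = log_norm_pdf(th, m, v) - math.log(1.0 - r)
            terms.append(log_h - log_lik(th) - log_prior(th))
    # average of h / (likelihood * prior) over all T draws estimates 1 / f(x)
    return -(log_sum_exp(terms) - math.log(len(draws)))

def posterior_model_probs(log_mls):
    """Posterior model probabilities under equal prior model probabilities."""
    total = log_sum_exp(log_mls)
    return [math.exp(l - total) for l in log_mls]

# Toy model: x_i ~ N(theta, 1), prior theta ~ N(0, tau2)
rng = random.Random(0)
x = [rng.gauss(1.0, 1.0) for _ in range(50)]
tau2 = 100.0
v_n = 1.0 / (len(x) + 1.0 / tau2)          # exact posterior variance
m_n = v_n * sum(x)                         # exact posterior mean
log_lik = lambda th: sum(log_norm_pdf(xi, th, 1.0) for xi in x)
log_prior = lambda th: log_norm_pdf(th, 0.0, tau2)

draws = [rng.gauss(m_n, math.sqrt(v_n)) for _ in range(4000)]
exact = log_lik(m_n) + log_prior(m_n) - log_norm_pdf(m_n, m_n, v_n)
estimates = {r: gelfand_dey_log_ml(draws, log_lik, log_prior, r)
             for r in (0.1, 0.5, 0.9)}
print(exact, estimates)
print(posterior_model_probs([-1000.0, -1010.0]))   # hypothetical log MLs
```

Subtracting the largest log value before exponentiating is what allows posterior model probabilities, and hence model-averaging weights, to be formed even when the marginal likelihoods themselves cannot be represented as floating-point numbers.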

7.2. Results

Of the two candidate models, the Singh-Maddala and the Dagum distributions, the best is selected by comparing the corresponding posterior model probabilities. Table 6.3 presents the log marginal likelihood values for each competing model under different values of r. Since the log marginal likelihood values under the Dagum model are consistently larger than those under the Singh-Maddala model, the Singh-Maddala distribution is concluded to be inferior to the Dagum specification for both urban and rural areas over the whole period of interest.

Table 6.3 The Log of Marginal Likelihood under The Singh-Maddala and Dagum Specification over Different r and Survey Years.

Bayesian inference provides a valuable method for investigating changes in the expenditure distribution by producing posterior distributions for the model parameters as well as for the associated indicators of interest. Consequently, the Bayesian technique makes it possible to portray movements in the entire shape of the distribution. Bayesian interval estimates are also easier to interpret, and model evaluation can be carried out more directly by contrasting posterior model probabilities. In this paper, Bayesian analysis is used to examine empirically, from the perspective of the complete distribution, the changes in the Indonesian expenditure distribution over the last 15 years, a period characterized by the severe financial crisis of 1997 as well as by economic and institutional reforms. Two three-parameter functions, the Singh-Maddala and the Dagum distributions, are proposed to model the expenditure distribution obtained from the National Socio-Economic Survey (Susenas) over the period 1996-2008. To give a clearer picture of the impact of these changes, several indicators of interest that are functions of the parameters are also evaluated. The proposed models are found to represent the histogram of the expenditure distribution very well. They mostly produce posterior means of both the parameters and the indicators of interest that are relatively close to the corresponding sample estimates, with fairly small posterior standard deviations. As expected, the non-informative prior that we assumed allows the sample information to dominate the posterior density. In most cases, the Dagum model gives the posterior means closest to the observed sample estimates, with the smallest posterior standard deviations. The Singh-Maddala and Dagum distributions also show similar changes in expenditure over the survey years. 
The empirical results suggest that the Indonesian expenditure distribution has changed noticeably over time, with a marked increase in the spread of expenditure as well as in the mean and the concentration of expenditure. The Bayesian inference has illustrated visually the impact of the 1997 Asian financial crisis on the structure of the expenditure distribution. The crisis altered the location and shape of the distribution in 1999 substantially, moving the density slightly towards lower levels than those identified in 1996 and reducing the spread of expenditure to some extent. Although expenditure disparity appears to have been reduced, in terms of consumption expenditure this shift negatively affected the welfare of a large percentage of the population at that time. After 1999, the increase in the mean and concentration of expenditure, as well as in its dispersion, becomes more apparent. Consequently, the shift gradually improves the wellbeing of a large proportion of individuals in the distribution. However, the shift leads to an increase in inequality as well. With several exceptions, rural areas are found to follow broadly the same path as urban areas, but the extent of the changes is not as large. All these findings are consistent with those reported in previous studies. The posterior densities of the statistics of interest further support the preceding inferences. They not only make shifts in the indicators much clearer, but also enable us to analyse the sensitivity of each indicator to the model assumed. The mode of expenditure is found to be the indicator most sensitive to the model. Consequently, the measured concentration of expenditure depends heavily on the specification of the expenditure distribution model. 
In contrast, the headcount index and mean expenditure in both urban and rural areas, as well as the poverty gap index in rural areas, are found to depend least on the choice of model. Inferences drawn from the Gini coefficient and the poverty severity index for both urban and rural areas, along with the poverty gap index in urban areas, are shown to depend strongly on the model on which they are conditioned. The model comparison has confirmed that the Dagum function is the best fitting parametric density for modelling the expenditure distribution in both urban and rural regions. Its log marginal likelihood values, which determine the posterior model probabilities, consistently exceed those of the Singh-Maddala model.

9. Future Plan 9.1. Mixture of the Dagum Densities

The previous findings reveal that the Dagum model performs better than the Singh-Maddala model. Figure 9.1 shows an example of the Dagum predictive density fitted to the associated empirical histogram of expenditure in 2008 for both urban and rural areas. The Dagum predictive densities are obtained from Section 6.1, while the histograms are drawn from the empirical data. The predictive density of the Dagum distribution does not fit the empirical expenditure histogram very well, especially at the peak and in the upper-middle part of the expenditure distribution. Therefore, in this section we investigate how the performance of the Dagum model can be improved by combining several similar functions within a mixture model framework. Additionally, the data are drawn from a heterogeneous population, so we can expect the expenditure pattern of low income earners, for instance, to be very different from that of high income earners. Consequently, when the population can be divided into multiple subpopulations and a mixture model that combines the samples from each subpopulation is built, better estimates of the quantities of interest are likely to be achieved. In a mixture model, a probability model for each subpopulation is specified through a probability density function (pdf). In practice, it is common to assume that the density functions of the mixture components all come from the same parametric family, with different parameter vectors. The probability that an observation comes from a given component is the mixture weight of that component. Together with the component densities, these mixture weights form the mixture model. Often the degree of heterogeneity of expenditure is unknown, so the number of components in the mixture model, c, is also unknown. In this case, the number of subpopulations that the data support should be examined. 
The use of a mixture model combines the merits of both parametric and nonparametric approaches to density estimation. The parametric density functions in an expenditure mixture model not only allow for direct inference on inequality and poverty measures, but also simplify the associated interpretation. Combining these parametric functions also makes the model less restrictive and very flexible, much like a nonparametric model. Thus, under suitable conditions the mixture model can approximate any shape of distribution. For instance, a mixture of two probability density functions from the same family with different parameter vectors may yield a bimodal density, which cannot be formed by any single standard parametric function. Therefore, by managing the number of components included in a mixture model, the best features of parametric methods and the flexibility of nonparametric techniques can both be retained. Work on mixture models under the Bayesian framework was developed by Diebolt and Robert (1994), who assume that the number of components, c, is known. They show that the Normal mixture offers an easy and useful foundation for Bayesian density estimation. Richardson and Green (1997), on the other hand, consider the case where the number of components is unknown. They use the reversible jump method, which requires the complex calculation of a Jacobian matrix, to evaluate the posterior distribution. Finite mixture models within the Bayesian context have been developed further by other researchers such as Escobar and West (1995) and Stephens (2000). Recently, Chotikapanich and Griffiths (2008) used a mixture of two Gamma densities to estimate the predictive density for income as well as the Lorenz curve and the posterior density of the Gini coefficient. 
Using Gibbs sampling algorithm developed by Wiper, Insua and Ruggeri (2001), the Gamma mixture is found to fit income data points much better than the well-known two and three-parameter models under the root mean squared error criteria. In this paper, the performance of the Dagum model as the best fitting model for Indonesian expenditure data will be improved through employing a mixture density estimation procedure under the Bayesian approach. The mixture of Dagum densities will be utilized to find a suitable approximation for posterior densities of quantities of interest. Using per adult equivalent expenditure data obtained from the National Socio-Economic Survey (Susenas) over the period of 1996 to 2008, the parameter of the model will be estimated.
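To illustrate what a mixture of Dagum densities looks like, the sketch below combines two Dagum components with fixed weights. It uses the standard Dagum pdf and cdf, assumed here to agree with (4.5). The component parameters are borrowed from the 2008 urban and rural posterior means in Table 6.1 purely to have plausible numbers, and the weights 0.4 and 0.6 are hypothetical; in the planned analysis the weights and component parameters would be estimated jointly rather than assembled this way.

```python
def dagum_pdf(x, a, b, p):
    """Dagum density at x > 0 with shape parameters a, p and scale b."""
    y = x / b
    return (a * p / b) * y ** (a * p - 1.0) / (1.0 + y ** a) ** (p + 1.0)

def dagum_cdf(x, a, b, p):
    """Dagum distribution function at x > 0."""
    return (1.0 + (x / b) ** (-a)) ** (-p)

def mixture_pdf(x, weights, components):
    """Weighted sum of Dagum densities; weights must sum to one."""
    return sum(w * dagum_pdf(x, *c) for w, c in zip(weights, components))

def mixture_cdf(x, weights, components):
    """Weighted sum of Dagum distribution functions."""
    return sum(w * dagum_cdf(x, *c) for w, c in zip(weights, components))

# (a, b, p) for two illustrative components; weights are hypothetical
components = [(2.3135, 211.0664, 2.5343),    # 2008 urban posterior means
              (2.8356, 137.6364, 2.5642)]    # 2008 rural posterior means
weights = [0.4, 0.6]

for x in (100.0, 200.0, 400.0, 800.0):
    print(x, mixture_pdf(x, weights, components),
          mixture_cdf(x, weights, components))
```

Because the mixture distribution function is simply the weighted sum of the component distribution functions, a headcount index at a given poverty line can be read off the mixture directly as `mixture_cdf` evaluated at that line.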

9.2. Density Based Decomposition Analysis under Bayesian Framework

To be of use to policy makers, the study will next examine how changes in the spread of expenditure, and in inequality and poverty, are linked to underlying changes in the economy. This will be achieved by decomposing those changes into components associated with changes in demographic and economic structures. A number of decomposition methods are offered in the inequality decomposition literature. Shorrocks (1982, 1984), for instance, studied inequality decomposition by income source and by population subgroup, respectively. However, these techniques apply only to inequality measures that are monotonic transformations of additively decomposable indices (Shorrocks, 1984). Moreover, Shorrocks (1982) indicated that the decomposition result is quite sensitive to the choice of inequality measure. Fields (2003) and Morduch and Sicular (2002) later developed regression based methods of decomposition, in which the results of an income generating regression are used to quantify the contribution of each explanatory variable to total inequality. While this type of decomposition is simple to implement and interpret, it is evaluated only in terms of changes in the mean of the distribution. Such an evaluation may exclude other important information about the dynamics of the income distribution, such as differences in the variance of income within each subgroup.

Growing interest in better inequality decomposition has prompted the development of alternative approaches. DiNardo, Fortin and Lemieux (1996) proposed a semiparametric method that presents the decomposition in terms of a probability density function rather than a summary statistic. This method can therefore measure the effects of changes in the determinants considered on the entire distribution.
More specifically, the technique can indicate exactly where in the distribution these changes have the greatest influence and so contribute most noticeably to greater inequality. In the DiNardo, Fortin and Lemieux (1996) methodology, a counterfactual income distribution is constructed by posing a what-if question: what density would have prevailed in the period of interest had the characteristics of the determinants remained the same as in the previous period? The counterfactual distributions can then be estimated using standard techniques such as kernel density estimation, and the distributional effects of changes in the indicated factors over time are analysed using these counterfactual densities.

Adaptations of the method have been applied in several later papers. Cameron (2000), for example, modified the DiNardo et al. (1996) technique to decompose the changes in the cumulative distribution functions, Lorenz curves and generalized Lorenz curves of per capita income in Java between 1984 and 1990. Another extension is due to D'Ambrosio (2001), who developed the technique using adaptive kernel density estimates together with a decomposition into between-group and within-group components. A more recent application is given by Hyslop and Mare (2005), who adapted the DiNardo et al. (1996) method to examine the contribution of changes in social and demographic factors to the rise in inequality in New Zealand during 1983-1998.

Using a similar nonparametric method, Jenkins and Van Kerm (2004) disaggregate changes in the income density by taking advantage of the additive decomposability property of density functions. In their approach, the change in the density is separated into two components, namely subgroup shares and subgroup densities. These components summarize the effects of changes in subgroup population shares and in subgroup distributions, respectively.
In particular, subgroup densities are further disaggregated to account for changes in subgroup income location, variation and modality. The calculated components are then applied to a counterfactual distribution, which captures the change in the density between a base and a final period. Whereas the DiNardo et al. (1996) method uses changes in individual sample weights to characterize the specific changes examined, the Jenkins and Van Kerm (2004) method uses changes in sample weights for groups of individuals to characterize changes in subgroup shares. However, the two decomposition methods are complements rather than substitutes: the DiNardo et al. (1996) method is practical for examining specific changes in determinant factors, while the Jenkins and Van Kerm (2004) technique is informative when investigating how the influence of the determinants on the income distribution develops.

In this paper, we will adapt the density based decomposition methods proposed by DiNardo et al. (1996) and Jenkins and Van Kerm (2004) to a Bayesian perspective. Because we work within the Bayesian framework, the decomposition of changes in the complete distribution is applicable not only to the predictive density of expenditure, but also to the posterior densities of the mean and mode of expenditure and of the Gini and poverty indices. Here, the predictive densities and their associated posterior densities are derived from the posterior distributions of the mixture of Dagum densities. We intend to identify the factors that have the greatest influence on changes in the predictive densities of expenditure and in the posterior densities of the indicators of interest. We plan to relate the changes to the age, educational attainment and industry of occupation of the household head, and to the number of household members.
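The share and density components discussed above can be sketched numerically. The Python example below is only an illustration under stated assumptions: it uses synthetic log expenditure data for two hypothetical subgroups, a Gaussian kernel density estimate in place of the Bayesian predictive density, and a simple two-term split of the density change built on a counterfactual that combines base-period subgroup shares with final-period subgroup densities. The group names, sample sizes, bandwidth and function names are all hypothetical, and the split is a simplification rather than the full procedure of either cited method.

```python
import math
import random


def kde(y, sample, bandwidth):
    """Gaussian kernel density estimate of `sample`, evaluated at y."""
    scale = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return scale * sum(
        math.exp(-0.5 * ((y - v) / bandwidth) ** 2) for v in sample
    )


def overall_density(y, shares, samples, bandwidth):
    """Population density at y as a share-weighted sum of subgroup densities."""
    return sum(shares[g] * kde(y, samples[g], bandwidth) for g in shares)


def decompose_change(y, shares0, samples0, shares1, samples1, bandwidth):
    """Split the density change at y into a share effect and a density effect."""
    f_base = overall_density(y, shares0, samples0, bandwidth)
    f_final = overall_density(y, shares1, samples1, bandwidth)
    # Counterfactual: final-period subgroup densities with base-period shares,
    # i.e. the density had the subgroup composition not changed.
    f_counter = overall_density(y, shares0, samples1, bandwidth)
    return {
        "total": f_final - f_base,
        "share_effect": f_final - f_counter,    # due to changing shares
        "density_effect": f_counter - f_base,   # due to changing subgroup densities
        "counterfactual": f_counter,
    }


# Synthetic (hypothetical) log expenditure samples for a base and a final period.
rng = random.Random(0)
SAMPLES0 = {
    "rural": [rng.gauss(0.0, 0.5) for _ in range(200)],
    "urban": [rng.gauss(1.0, 0.6) for _ in range(200)],
}
SAMPLES1 = {
    "rural": [rng.gauss(0.2, 0.5) for _ in range(200)],
    "urban": [rng.gauss(1.3, 0.7) for _ in range(200)],
}
SHARES0 = {"rural": 0.6, "urban": 0.4}  # assumed base-period population shares
SHARES1 = {"rural": 0.5, "urban": 0.5}  # assumed final-period population shares
BANDWIDTH = 0.3                         # assumed fixed bandwidth
STEP = 0.05
GRID = [-4.0 + STEP * i for i in range(241)]  # -4 to 8 on the log scale

RESULTS = [
    decompose_change(y, SHARES0, SAMPLES0, SHARES1, SAMPLES1, BANDWIDTH)
    for y in GRID
]
```

Evaluating the two effects over the whole grid, rather than at a single summary statistic, shows where in the distribution the change in composition and the change in subgroup densities each matter most. By construction the two effects sum to the total change at every grid point.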

10. Timetable

10.1. Progress to date

March 2009 - May 2009: The raw data from the core and consumption modules of Susenas 1996-2008 have been merged and adjusted for the price index and the equivalence scale.
April 2009 - September 2009: Following training in basic STATA programming, a STATA program implementing the Metropolis-Hastings algorithm to obtain posterior densities for the indicators of interest has been written and developed.
July 2009 - November 2009: Attended a class on Bayesian Econometrics in the second semester.
August 2009 - February 2010: An introduction to Indonesia has been written, and initial work on the literature review has been carried out and written up. For the first paper, the methodology has been written, and tables, graphs and results have been produced and analysed; this material is currently being compiled into the thesis format. For the second and third topics, a brief introduction and methodology have been outlined and written.

10.2. Timetable for Completion of Thesis

March 2010 - April 2010: Revise and complete the first paper, fitting the proper poverty line and adding the Beta-2 model to the analysis.
May 2010 - November 2010: Start work on the second topic. Develop STATA programs for the posterior distributions of the mixture of Dagum densities. Complete the literature review and methodology. Produce tables, graphs and results for analysis. Write the second paper and finish it by the end of November 2010, before inserting the completed paper into the thesis.
December 2010 - February 2011: Start work on the third topic and complete the literature review and methodology. Collect additional raw data for the decomposition analysis and match them to the current per adult equivalent expenditure data.
March 2011 - September 2011: Develop STATA programs for density based decomposition analysis using the Bayesian approach. Generate tables, graphs and results for analysis. Write the third paper and complete it by the end of September 2011, to be inserted into the thesis.
October 2011 - February 2012: Compile all the papers into chapters of the thesis and complete the conclusion to the thesis. Revise and edit the thesis and submit the final version at the end of February 2012.