What is a Hedonic Model in Economics? – Real Estate


Given the available data would it be possible to build a hedonic model?


In real estate and property studies, the hedonic regression P = f(x1, x2, x3, …, xn), where x1, …, xn are the characteristics of a property, is often used to study the impact of the factors that affect housing prices [1]. In economics, hedonic regression, or hedonic demand theory, is a revealed preference method of estimating demand or value. It breaks down the item being researched into its constituent characteristics and obtains estimates of the contributory value of each characteristic. This requires that the composite good being valued can be reduced to its constituent parts and that the market values those constituent parts. Hedonic models are most commonly estimated using regression analysis, although more generalized models, such as sales adjustment grids, are special cases of hedonic models. Hedonic models are commonly used in real estate appraisal, real estate economics, and consumer price index (CPI) calculations. In CPI calculations, hedonic regression is used to control for the effect of changes in product quality. Price changes that are due to substitution effects are subject to hedonic quality adjustments.

In short, hedonic regression is a revealed-preference method used in economics to determine the relative importance of the variables that affect the price of a good or service. The contribution of each factor is estimated using regression analysis.
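To make the decomposition concrete, here is a minimal sketch in Python, assuming NumPy is available. The data are synthetic, and the attribute names (log floor area, age) and the "true" implicit prices are hypothetical choices made only for illustration; they are not taken from the HousePriceXYZ dataset.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical attributes (synthetic data): log floor area and age in decades.
log_area = rng.normal(7.0, 0.4, n)
age = rng.uniform(0.0, 8.0, n)

# Assumed implicit prices used only to simulate log price.
true_beta = np.array([10.0, 0.8, -0.05])
X = np.column_stack([np.ones(n), log_area, age])
log_price = X @ true_beta + rng.normal(0.0, 0.2, n)

# OLS estimates the contribution of each attribute to log price.
beta_hat, *_ = np.linalg.lstsq(X, log_price, rcond=None)
print(dict(zip(["const", "log_area", "age"], beta_hat.round(3))))
```

Each estimated coefficient plays the role of the contributory value of one characteristic, holding the others fixed.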


Discuss whether HousePriceXYZ should use a hedonic model to explain and predict the determinants of housing values. [Hint: discuss considerations such as missing data, measurement error, endogenous variables and the pros and cons of using a hedonic model to predict housing values.]

Most real datasets contain missing values, so one must identify them, determine the extent and type of missingness, and choose a course of action accordingly. Although a rich literature exists on data imputation, it is dominated by an explanatory context. In predictive modeling, the solution depends strongly on whether the missing values are in the training data, the data to be predicted, or both. One proposed solution is to estimate multiple ‘reduced’ models, each excluding some predictors. When predicting an observation with missing values on a certain set of predictors, the model that excludes those predictors is used, so different reduced models are applied to different observations. Although useful for prediction, this approach is clearly inappropriate for causal explanation. It does, however, work in a hedonic model used for prediction.
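The reduced-model idea can be sketched as follows. This is an illustrative Python example on synthetic data, assuming NumPy; the helper names `fit_ols` and `predict_with_reduced_model` are hypothetical, and refitting a model per observation is done here only for clarity, not efficiency.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = rng.normal(size=(n, 3))                       # three synthetic predictors
y = 1.0 + X @ np.array([0.5, -0.3, 0.2]) + rng.normal(0, 0.1, n)

def fit_ols(X, y):
    """Return OLS coefficients, intercept first."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_with_reduced_model(x_new, X_train, y_train):
    """Predict one observation using only the predictors it has observed."""
    observed = ~np.isnan(x_new)                    # mask of available predictors
    coef = fit_ols(X_train[:, observed], y_train)  # reduced model for this pattern
    return coef[0] + x_new[observed] @ coef[1:]

x_new = np.array([0.4, np.nan, -1.0])              # second predictor is missing
pred = predict_with_reduced_model(x_new, X, y)
print(pred)
```

In practice one would fit and cache one reduced model per missingness pattern rather than refitting for every observation.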

Endogeneity, which leads to biased parameter estimates, can occur for different reasons. One is incorrectly omitting an input variable, say Z, from f when Z causes both X and Y. In a regression of Y on X, the omission of Z results in X being correlated with the error term. Endogeneity can also arise for other reasons, such as measurement error in X. Methods for dealing with endogeneity include constructing instrumental variables and using models such as two-stage least squares (2SLS).
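A small simulation can illustrate omitted-variable bias and the 2SLS correction. This is a sketch in Python on synthetic data, assuming NumPy; the instrument `w` and all coefficient values are hypothetical, and the two stages are written out by hand rather than taken from any econometrics package.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20000
z = rng.normal(size=n)               # omitted confounder Z (treated as unobserved)
w = rng.normal(size=n)               # hypothetical instrument: moves X, not Y directly
x = 0.8 * w + z + rng.normal(size=n)
y = 1.0 + 2.0 * x + 1.5 * z + rng.normal(size=n)   # assumed true effect of X is 2.0

def ols(A, b):
    coef, *_ = np.linalg.lstsq(A, b, rcond=None)
    return coef

ones = np.ones(n)
# Naive OLS of Y on X: biased because X is correlated with the omitted Z.
beta_ols = ols(np.column_stack([ones, x]), y)[1]

# Stage 1: project X on the instrument. Stage 2: regress Y on the fitted X.
W = np.column_stack([ones, w])
x_hat = W @ ols(W, x)
beta_2sls = ols(np.column_stack([ones, x_hat]), y)[1]
print(round(beta_ols, 3), round(beta_2sls, 3))
```

The manual second stage gives the correct point estimate but not correct standard errors, so a dedicated routine would be needed for inference.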

Pros: The hedonic pricing model has many advantages, although this method of valuation can require a strong degree of statistical expertise and careful model specification following a period of data collection. Its main strength is the ability to estimate values based on concrete choices, particularly when applied to property markets with readily available, accurate data. At the same time, the method is flexible enough to be adapted to relationships among other market goods and external factors. The values of properties are broken down into components. For valuing properties, a hedonic pricing model is relatively straightforward, as it relies on actual market prices and comprehensive, available data sets.

Thus, I believe the company should use the hedonic model.

Cons: The model captures only consumers’ willingness to pay for the environmental differences they perceive and the resulting consequences.

After all, when the characteristics of a commodity change, its price changes accordingly. Taking the partial derivative of the hedonic function with respect to each characteristic variable gives the effect of a change in that characteristic on the commodity’s price, and this relationship is assumed to be fixed over a certain period. So, in the absence of homogeneous goods, one can compare non-homogeneous properties between the base period and the reporting period, remove from the total price change, item by item, the part due to changes in characteristics, and attribute what remains purely to price changes caused by supply and demand. A real estate price index calculated this way is based on the characteristic price approach.

The dataset contains the variable ‘logvalue’. Describe the distribution of house values by Census region (variable ‘region’). Plot the distribution for each region. Conduct pair-wise t-tests for the difference in mean house values for ‘West’ vs Northeast, Midwest, South.

HousePriceXYZ would like to have an econometric model of house values using 10 variables.

Using the available data choose the 10 variables you think are most relevant to explaining house values.

Explain why you chose those variables.

Lot, Unitsf, Climb, Numair, Age, Busper, Exclus, Howh, Numdry, Numsew.

The most significant problem is the identification of the estimated variables. In estimations using a hedonic model, the price is determined as a result of locational activity. In this case, data specific to the individual property, such as the external environment and housing conditions, is required. In addition, when a hedonic model is constructed for a wider area, neighborhood variables that can explicitly handle the differences among neighborhoods must be incorporated as explanatory variables.

Also, most of them took a higher display format.

Write down your preferred econometric model and explain any variable transformation you chose to employ and why. [Hint: you may choose to transform variables using any of the methods we studied such as logs or polynomials.]


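$$\log(\text{value}_i) = \beta_1 + \beta_2\,\text{Lot}_i + \beta_3 \log(\text{Unitsf}_i) + \beta_4 \log(\text{Climb}_i) + \beta_5 \log(\text{Numair}_i) + \beta_6 \log(\text{Age}_i) + \beta_7 \log(\text{Busper}_i) + \beta_8 \log(\text{Exclus}_i) + \beta_9 \log(\text{Howh}_i) + \beta_{10} \log(\text{Numdry}_i) + \beta_{11} \log(\text{Numsew}_i) + u_i$$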

I change logvalue to value because we need the actual value to see the hedonic model’s accuracy. Logs turn multiplicative models into additive ones and neutralize exponentials, so the resulting model is ‘linear in logs.’ In addition to turning certain non-linear models linear, logs can be used to enforce nonnegativity of a left-hand-side variable and to stabilize the disturbance variance. I transform variables to logs, for example ‘unitsf’ into log(unitsf), because I think logs make the significant factors stable enough to determine value, and the model specification calls for log(x).

Estimate your preferred econometric model using STATA. Report the results in a table. Interpret the coefficient estimates and/or marginal effects and explain whether HousePriceXYZ can use these results to better understand what drives house values in the US.

As the table shows, at the 1% significance level, lnunitsf, lnclimb, busper, exclus and lnhowh have a positive effect on logvalue, whereas lnage and lnnumdry have a negative effect. Lnnumair and lnnumsew have little effect on logvalue.

HousePriceXYZ would now like to develop a model which better predicts house values using a larger set of variables because it thinks that a model with more variables will predict house values better.

Split the data at random into a ‘training dataset’ containing 80% of the observations and a ‘validation dataset’ containing the remaining 20% of the observations. Create a new indicator variable ‘TD’ which is 1 for observations in the training dataset and 0 for observations in the validation dataset.

Run a linear probability model for TD using the 10 variables you have used before. Show the resulting regression table. Conduct an F-test for regression significance and argue that the randomization procedure you employed did indeed produce a random 80/20 split of the data.

After the sample is randomly split into two parts, lnunitsf, busper and lnhowh have a positive effect on logvalue and lnage has a negative effect on logvalue at the 1% significance level. Lnlot, lnnumair and lnnumsew have little effect on logvalue. The three variables lnclimb, lnnumdry and exclus have the same direction of influence on logvalue in both parts, but the coefficient of lnclimb is not strongly significant in the training dataset and is significant only at the 10% level in the validation dataset. The coefficients of exclus and lnnumdry are strongly significant in the training dataset but not in the validation dataset. In general, the estimation results of the HPM model in 3.1 change little, indicating that the split is random and that the estimation results are reasonably reliable.
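The randomization check described in the task (an 80/20 split, an indicator TD, and a linear probability model with an F-test) can be sketched as follows. This is an illustrative Python example assuming NumPy; the covariates are synthetic stand-ins for the 10 chosen variables, not the actual data.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5000, 10
X = rng.normal(size=(n, k))               # stand-in for the 10 chosen covariates

# TD = 1 for a random 80% of observations (training), 0 otherwise (validation).
TD = np.zeros(n)
TD[rng.permutation(n)[: int(0.8 * n)]] = 1.0

# Linear probability model: regress TD on the covariates.
A = np.column_stack([np.ones(n), X])
coef, *_ = np.linalg.lstsq(A, TD, rcond=None)
resid = TD - A @ coef
r2 = 1.0 - resid @ resid / ((TD - TD.mean()) @ (TD - TD.mean()))

# F-statistic for H0: all slope coefficients are zero.
F = (r2 / k) / ((1.0 - r2) / (n - k - 1))
print(f"share in training = {TD.mean():.2f}, R2 = {r2:.4f}, F = {F:.2f}")
```

If the split is truly random, the covariates should not predict TD, so the F-statistic should be small; with 10 regressors and a large sample, the 5% critical value is roughly 1.83.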

Define the mean squared prediction error in the validation data as

$$\text{MSEV} = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

where n is the number of observations in the validation dataset, and $\hat{y}_i$ and $y_i$ are the predicted and actual log house values in the validation data.
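As a small illustration of this definition, the following Python sketch computes the mean squared prediction error for a handful of made-up values. The function name `msev` and the numbers are hypothetical.

```python
def msev(predicted, actual):
    """Mean squared prediction error over the validation observations."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    n = len(actual)
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n

# Hypothetical predicted and actual log house values.
y_hat = [11.8, 12.4, 12.0]
y = [12.0, 12.1, 12.0]
print(msev(y_hat, y))
```

A lower value on the validation data indicates better out-of-sample prediction, which is the basis for comparing the 10-variable and 30-variable models.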

HousePriceXYZ would like you to start by estimating a model with at least 30 variables on the training dataset and then apply the model to the validation dataset. Compute the resulting MSEV. How does this new model compare with the earlier model with 10 variables? Does it predict house values better?

The R-squared of the model rose from 0.1427 for the model without the 20 additional variables to 0.3168, so the explanatory power of the explanatory variables over the explained variable was strengthened.

It has a positive effect on logvalue, and a negative effect on logvalue. Lnnumair and lnnumsew had little effect on logvalue.

At the 5% significance level, lnlot, lnunitsf, lnage, busper, lnhowh, LNFPLWK, lnregion, lnbath, lnbedroom, lnfloor, lneroach, phone_1, newc_1, dens, dining, famrm, cracks and WHN have significant positive effects on logvalue. Lnclimb, lnnumdry, lnmetro, LNFPLWK, dish and dry have significant negative effects on logvalue, while lnnumair, exclus, lnnumsew, lnkitchen, wash and nowire have insignificant effects on logvalue.

STATA uses the command stepwise to conduct model selection. Explain the difference between backward and forward model selection. What role do the options pr(.) and pe(.) play in obtaining a parsimonious model which also predicts house values?

Stata 15 automatically selects 25 of the 30 explanatory variables: the stepwise regression removes the five variables lnnumair, exclus, lnnumsew, lnkitchen and wash. Forward stepwise and backward stepwise selection retain the same variables, so the stepwise regression results are more reliable.
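In Stata's stepwise command, pr(#) sets the significance level at which a term is removed from the model and pe(#) sets the level at which a term is added. Backward selection starts from the full model and drops the least significant term while its p-value is at least pr; forward selection starts from the empty model and adds the most significant term while its p-value is below pe. The following Python sketch, assuming NumPy, illustrates that logic on synthetic data. It is not Stata's implementation: the function names are hypothetical, and the p-values use a normal approximation to the t distribution.

```python
import math
import numpy as np

def p_values(X, y, cols):
    """Two-sided p-values for the slopes of an OLS fit on the given columns.

    Uses a normal approximation to the t distribution (reasonable for large n).
    """
    A = np.column_stack([np.ones(len(y))] + [X[:, j] for j in cols])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    sigma2 = resid @ resid / (len(y) - A.shape[1])
    se = np.sqrt(sigma2 * np.diag(np.linalg.inv(A.T @ A)))
    t = coef / se
    return {j: math.erfc(abs(t[i + 1]) / math.sqrt(2)) for i, j in enumerate(cols)}

def backward(X, y, pr=0.05):
    """Start from the full model; drop the least significant term while p >= pr."""
    cols = list(range(X.shape[1]))
    while cols:
        p = p_values(X, y, cols)
        worst = max(p, key=p.get)
        if p[worst] < pr:
            break
        cols.remove(worst)
    return cols

def forward(X, y, pe=0.05):
    """Start from the empty model; add the most significant term while p < pe."""
    cols, rest = [], list(range(X.shape[1]))
    while rest:
        p = {j: p_values(X, y, cols + [j])[j] for j in rest}
        best = min(p, key=p.get)
        if p[best] >= pe:
            break
        cols.append(best)
        rest.remove(best)
    return sorted(cols)

rng = np.random.default_rng(4)
X = rng.normal(size=(1000, 6))                    # synthetic candidate predictors
y = 2.0 * X[:, 0] - 1.0 * X[:, 3] + rng.normal(size=1000)
kept_backward, kept_forward = backward(X, y), forward(X, y)
print(kept_backward, kept_forward)
```

Smaller values of pr and pe yield a more parsimonious model; whether that model also predicts better is judged by the MSEV on the validation data.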

Starting from your model with at least 30 variables use the stepwise command in STATA to perform both backward and forward model selection. Estimate the models on the training data. Compute the MSEV on the validation data. [Hint: you can compute a number of different models using different options which you are free to choose in the stepwise command. However, in your report, you should explain which options you chose and why. Report the best models you were able to obtain using this procedure.]

The sample is zero-truncated, so this paper uses zero-truncated Poisson regression to analyze the model:

10. In class we also discussed other model selection procedures (e.g. LASSO). You don’t have to estimate these additional models. In your report to HousePriceXYZ, discuss additional strategies that could be employed to derive even better econometric models to predict home values. Based on the work you have completed so far, do you think it would be beneficial for HousePriceXYZ to invest in building additional prediction models for house values? What are the costs and benefits of this decision?

Additional strategies include LASSO, a classical linear model estimated with OLS, a linear OLS model including geographical coordinates, the spatial expansion model, spatial lag and spatial error models, and geographically weighted regression.
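Of these strategies, LASSO is the easiest to illustrate briefly. The sketch below, in Python and assuming NumPy, fits a LASSO by cyclic coordinate descent on synthetic data. The function name `lasso_cd`, the penalty value and the data are hypothetical, and a production analysis would normally choose the penalty by cross-validation.

```python
import numpy as np

def lasso_cd(X, y, lam, n_iter=200):
    """LASSO via cyclic coordinate descent.

    Minimizes (1/2n)||y - Xb||^2 + lam * ||b||_1, assuming the columns of X
    are standardized and y is centered (so no intercept is needed).
    """
    n, p = X.shape
    b = np.zeros(p)
    for _ in range(n_iter):
        for j in range(p):
            r_j = y - X @ b + X[:, j] * b[j]       # partial residual without x_j
            rho = X[:, j] @ r_j / n
            # Soft-thresholding update: small effects are set exactly to zero.
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / (X[:, j] @ X[:, j] / n)
    return b

rng = np.random.default_rng(5)
n, p = 2000, 10
X = rng.normal(size=(n, p))
X = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize predictors
y = 2.0 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(size=n)
y = y - y.mean()                                   # center the response

coef = lasso_cd(X, y, lam=0.2)
print(coef.round(2))
```

Unlike stepwise selection, the penalty shrinks all coefficients and drops irrelevant predictors in a single fit, at the cost of some bias in the retained coefficients.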

Additional and better strategies that describe the marginal effect of each variable more thoroughly and predict home values more precisely would be beneficial. However, I do not mean that the hedonic model is not good enough; it is just not very concise, and logvalue would be hard for customers to read. The costs of this decision would be the time and money spent on the new prediction models, while the benefits would be more precise estimates, which can reduce unnecessary risk and lower prices to attract customers. In addition, the company could see more clearly how the variables affect housing value and which ones are most significant, and could focus on building housing estates according to the models.


Cite this page

What Is a Hedonic Model in Economics? - Real Estate. (2022, Sep 02). Retrieved February 1, 2023 , from
