Many factors contribute to the process of determining the value of a residential property. Two of the key components are quality and condition. For example, a newly renovated home that has a gourmet kitchen and an updated bathroom may be valued more highly than a similar home that possesses only above-average amenities and is in average condition.
Unfortunately, property-level information about these two factors is hard to come by short of an interior physical inspection. This is why most automated valuation models (AVMs) default to assuming “average” quality and condition. As a result, it is often difficult to differentiate (and accurately value) properties at the above-average and below-average ends of the spectrum.
However, there is now a way to identify quality and condition differences by searching for the words used to describe these seemingly hard-to-distinguish homes. And, as lenders and other AVM users will be pleased to note, they are not the ones who need to do the detective work: CoreLogic’s automated tools do it for them.
Difficult, But Not Impossible
It is feasible to obtain details on a property’s quality and condition; Fannie Mae and Freddie Mac have even established specific definitions for rating and standardizing these factors. The GSEs (government-sponsored enterprises) require appraisers to rate condition and quality on a scale from C1/Q1 (best) to C6/Q6 (worst). Detailed definitions of these rating points can be viewed in the Fannie Mae and Freddie Mac Uniform Appraisal Dataset Specification.
However, outside of the GSEs, which can access large stores of appraisal data, it is difficult to get a full picture of property quality and condition. Thanks to advances in machine learning and cloud computing, it is now possible to extract this information from nontraditional data sources, namely the text and images in Multiple Listing Service (MLS) data.
Realtor Comments Hold Vital Info
Below is part of a real estate agent’s comment for a property that was recently sold and appraised at a C5 rating:
“Great opportunity to own a large home on a large lot with tons of possibilities!”
That’s a typical comment one might find for a property appraised at a C5 rating. By the GSE definition, a C5 is a home in need of some significant repairs, and the comment suggests the agent hinted as much when the property was listed for sale.
CoreLogic has developed a model that combines various machine learning techniques with its rich appraisal and MLS data assets to extract property quality and condition information from agents’ listing comments. These agents write their descriptions with knowledge of the subject property and the surrounding neighborhood. A variety of words are commonly used when listing properties for sale: some terms frequently align with highly rated homes (C1s and C2s), while others repeatedly appear for lower-rated homes (C5s and C6s).
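To make the idea of terms aligning with ratings concrete, here is a minimal sketch in Python. It is not CoreLogic’s model; it is a simple stand-in that averages the appraised condition rating (1 for C1 through 6 for C6) over every comment in which a word appears. The function name `mean_rating_by_term` and the four sample comments are made up for illustration.

```python
import re
from collections import defaultdict

def mean_rating_by_term(labeled_comments, min_count=1):
    """Average condition rating (1 = C1 ... 6 = C6) of the comments containing each word.

    labeled_comments: iterable of (comment_text, rating) pairs.
    Words seen in fewer than min_count comments are dropped.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for comment, rating in labeled_comments:
        # set() so a word repeated within one comment is counted once
        for word in set(re.findall(r"[a-z][a-z\-]*", comment.lower())):
            totals[word] += rating
            counts[word] += 1
    return {w: totals[w] / counts[w] for w in counts if counts[w] >= min_count}

# Made-up comments and ratings, for illustration only (not real appraisal data)
sample = [
    ("Newly renovated with gourmet kitchen", 2),
    ("Renovated kitchen and updated bathroom", 1),
    ("Needs repair, bring your TLC", 5),
    ("Fixer-upper needs rehab", 6),
]

scores = mean_rating_by_term(sample)
print(scores["renovated"], scores["needs"])
```

In this toy data, a low average marks a word that tends to appear in listings for well-rated homes and a high average marks the opposite. A production model would need far more data and would have to account for context, negation and sample size.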
A Cloud Provides a Clearer Picture
According to the GSEs, a property rated as C6 has substantial damage and requires substantial repairs and rehabilitation. So, one is likely to see a listing with keywords and terms such as TLC…needs repair…fixer-upper…fantastic opportunity…and rehab. Counting the appearance of negative and positive words is a simple way to arrive at indicators of quality and condition. And with the assistance of a sophisticated machine learning model, it becomes even easier to derive meaning from text. Think of it like a “word cloud” – the words used most often in listings for a given rating appear larger in the cloud and are therefore the strongest signals of above- or below-average property condition.
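The counting approach can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the negative terms are the examples named above, the positive terms are assumed from the renovated-home example at the start of this article, and the names `count_terms` and `condition_signal` are hypothetical.

```python
import re

# Negative examples come from the article; the positive list is an assumption
NEGATIVE_TERMS = ["tlc", "needs repair", "fixer-upper", "fantastic opportunity", "rehab"]
POSITIVE_TERMS = ["newly renovated", "gourmet kitchen", "updated bathroom"]

def count_terms(comment, terms):
    """Count whole-term, case-insensitive occurrences of each term in the comment."""
    text = comment.lower()
    total = 0
    for term in terms:
        # Lookarounds stop "rehab" from matching inside "rehabilitation"
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        total += len(re.findall(pattern, text))
    return total

def condition_signal(comment):
    """Positive count minus negative count; below zero hints at below-average condition."""
    return count_terms(comment, POSITIVE_TERMS) - count_terms(comment, NEGATIVE_TERMS)

print(condition_signal("Fixer-upper that needs repair, bring your TLC!"))
print(condition_signal("Newly renovated home with a gourmet kitchen"))
```

A raw count like this treats every term as equally informative, which is one reason a trained model that weights terms by how strongly they align with appraised ratings can do better.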
Predictive learning techniques such as these will be essential to future automated valuation models, which must deliver the most accurate valuation possible when an in-person inspection or appraisal is not feasible or desirable. This property condition predictor is just one of several technology innovations underlying CoreLogic’s new Total Home ValueX AVM, which aims to provide the most accurate AVMs with the highest hit rate in the industry.