Credit scoring models that include alternative data have proven their predictive power across the financial services industry. We generally define alternative data as data that is differentiated from, and incremental to, the tradeline data reported by the three national credit bureaus. Models driven by alternative data have a variety of use cases, such as:
- Standalone scores that are often used alongside traditional scores like FICO or VantageScore to provide additional segmentation.
- Standalone alternative data scores for consumers lacking sufficient credit bureau history to be traditionally scored.
- Blended scores that combine alternative data with traditional credit tradeline data at the attribute level to create a more predictive solution.
Alternative data generally captures information not contained within traditional tradeline data sources. When alternative and traditional data is combined at the score or attribute level, the combination creates a broader information set that can be used to generate a more robust score. This combined set can fill in consumer behavioral gaps that may be exposed when considering only credit tradeline data.
The benefit of a combined data set is obvious in a thin-file scenario, where the consumer has little or no credit tradeline data. However, for a consumer with a robust, scorable credit file, it is easy to form the false impression that alternative data shows little to no lift over purely tradeline data.
A closer look will show that alternative data combined with traditional tradeline data can more accurately predict delinquency and charge-off. By grouping poorly performing loans more appropriately in the lower score bands, these blended models can give the lender a competitive advantage when it comes to pricing across all credit tiers.
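One way to check this kind of claim on a lender's own portfolio is to compare bad rates by score band for a tradeline-only score and a blended score. The sketch below does so on synthetic data only: the two risk drivers, the coefficients, and the helper `bad_rate_by_band` are all hypothetical, and the blended score is better by construction, so the example shows the mechanics of the comparison rather than evidence of lift.

```python
import numpy as np

def bad_rate_by_band(score, bad, n_bands=5):
    """Bad rate in each equal-population score band, lowest scores first."""
    order = np.argsort(score)  # sort loans from lowest to highest score
    bands = np.array_split(bad[order], n_bands)
    return np.array([band.mean() for band in bands])

rng = np.random.default_rng(42)
n = 50_000

# Hypothetical risk drivers: one visible in tradeline data,
# one visible only in alternative data.
tradeline_risk = rng.normal(size=n)
alternative_risk = rng.normal(size=n)

# Synthetic outcome (1 = loan went bad), driven by both sources of risk.
p_bad = 1.0 / (1.0 + np.exp(-(-2.0 + tradeline_risk + alternative_risk)))
bad = (rng.random(n) < p_bad).astype(float)

# Higher score = lower risk, as with conventional credit scores.
tradeline_score = -tradeline_risk
blended_score = -(tradeline_risk + alternative_risk)

tradeline_bands = bad_rate_by_band(tradeline_score, bad)
blended_bands = bad_rate_by_band(blended_score, bad)

for i, (t, b) in enumerate(zip(tradeline_bands, blended_bands), start=1):
    print(f"band {i}: tradeline-only {t:.3f}   blended {b:.3f}")
```

In this toy setup the blended score concentrates more of the bad loans in the lowest band and leaves fewer in the highest band. That sharper separation is what would support finer pricing across tiers.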
To achieve this additional segmentation for consumers with a full credit file, the modeler must take care when selecting variables. If the modeler ranks variables by their individual correlation with the dependent variable and keeps only those above a certain threshold, the model can leave out variables that are uniquely predictive and lose overall predictive power.
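The pitfall can be illustrated with a deliberately stylized toy example. It is a sketch under stated assumptions, not a description of any production model: the data is synthetic, the names `tradeline_attr` and `alt_attr` and the 0.10 threshold are hypothetical, and an ordinary least-squares fit stands in for whatever modeling technique is actually used. Here the alternative attribute has almost no correlation with the outcome on its own, so a univariate filter drops it, yet it sharply improves the fit once combined with the tradeline attribute.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept added)."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return 1.0 - resid.var() / y.var()

rng = np.random.default_rng(0)
n = 20_000

signal = rng.normal(size=n)    # true risk driver, not directly observed
nuisance = rng.normal(size=n)  # variation unrelated to credit risk

# Hypothetical observed attributes (assumptions for illustration only):
tradeline_attr = signal + nuisance  # risk signal contaminated by the nuisance
alt_attr = nuisance                 # tracks only the nuisance component

# Synthetic outcome: 1 = delinquent, driven by the true signal plus noise.
bad = (signal + 0.5 * rng.normal(size=n) > 1.0).astype(float)

# Univariate filter: keep variables whose |correlation| with the outcome
# clears a fixed threshold.
correlations = {
    "tradeline_attr": np.corrcoef(tradeline_attr, bad)[0, 1],
    "alt_attr": np.corrcoef(alt_attr, bad)[0, 1],
}
threshold = 0.10
kept = [name for name, c in correlations.items() if abs(c) >= threshold]

# Compare the fit with and without the variable the filter would drop.
r2_tradeline_only = r_squared(tradeline_attr.reshape(-1, 1), bad)
r2_both = r_squared(np.column_stack([tradeline_attr, alt_attr]), bad)

print("kept by univariate filter:", kept)
print(f"R^2 tradeline only: {r2_tradeline_only:.3f}")
print(f"R^2 with alt_attr:  {r2_both:.3f}")
```

The filter keeps only `tradeline_attr`, but the two-variable fit explains substantially more of the outcome, because `alt_attr` lets the model cancel the non-risk variation in `tradeline_attr`. Evaluating candidates jointly, for example by measuring the incremental fit each one adds to the variables already selected, avoids discarding variables of this kind.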