A while ago, together with our colleagues from Adastra Czech, we finished a challenging and fun project: estimating the income of a bank's clients.
Working with Dagmar Binova, Ksenya Degtyar, Nainika Medhi, and Tomas Wolf, and with help from Alisa Stasevich and Barbora Gaislerova, we achieved over 80% accuracy (explained later).
The client, an established bank on the Czech market, wanted to simplify the loan application process for New-to-Bank clients. Among many improvements, income verification had to be automated; until recently, clients had to deliver paper proof of income.
We were allowed to use only secondary data, which meant no transactional data and no paid external data. Consequently, we turned to ML techniques to deliver over 80% accuracy.
As for data, we tapped into three internal and three external sources. We kept only reliable and predictive variables. In the end, we had over 150 predictors (simple, complex, and everything in between).
Because of how income is distributed, we had to find the right balance between prediction error and stability. Hence, we ended up with a two-step model: first a Logit model that splits clients into “standard” and “high” income groups, then an Extreme Gradient Boosting (XGBoost) model that estimates the actual income.
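To make the two-step idea concrete, here is a minimal sketch of how such a pipeline could be wired together. It is not the project's code: the class name `TwoStepIncomeModel`, the 0.5 cut-off, the use of one regressor per income group, and the toy stand-in models are all assumptions made for illustration. In practice, `p_high` would be a fitted Logit model and the regressors would be gradient-boosted models.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple

Features = Sequence[float]


@dataclass
class TwoStepIncomeModel:
    """Hypothetical wrapper: classify the income group, then estimate income."""

    # Step 1: returns P(group == "high") for one applicant (e.g. a fitted logit).
    p_high: Callable[[Features], float]
    # Step 2: one income regressor per group (an assumption, not from the article).
    regressors: Dict[str, Callable[[Features], float]]
    # Hypothetical cut-off for the "high" group.
    threshold: float = 0.5

    def segment(self, x: Features) -> str:
        return "high" if self.p_high(x) >= self.threshold else "standard"

    def predict(self, x: Features) -> Tuple[str, float]:
        group = self.segment(x)
        return group, self.regressors[group](x)


# Toy stand-ins so the sketch runs without any ML library.
def toy_p_high(x: Features) -> float:
    return 1.0 / (1.0 + math.exp(-(x[0] - 5.0)))


model = TwoStepIncomeModel(
    p_high=toy_p_high,
    regressors={
        "standard": lambda x: 30000 + 2000 * x[0],
        "high": lambda x: 80000 + 10000 * x[0],
    },
)

print(model.predict([2.0]))  # routed to the "standard" regressor
print(model.predict([8.0]))  # routed to the "high" regressor
```

Separating the classification step from the estimation step lets each regressor focus on a narrower income range, which is one plausible way to trade prediction error against stability.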

The Logit model validated at a Gini of 0.82; see the figure below.
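For readers less familiar with the Gini figure, the sketch below assumes the definition commonly used in credit scoring, Gini = 2 × AUC − 1, where AUC is the area under the ROC curve. Under that assumption, a Gini of 0.82 corresponds to an AUC of 0.91. The function names and the sample data are illustrative only.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Ties count as half a win. Quadratic in sample size, which is fine
    for a sketch but not for large validation sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def gini(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Gini coefficient under the assumed convention Gini = 2 * AUC - 1."""
    return 2.0 * auc(labels, scores) - 1.0


# Made-up example: 1 = "high" income group, 0 = "standard".
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))
print(gini(labels, scores))
```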

How did we measure a successful estimate?
Using WPE (“Windowed Percentage Error”), we achieved an accuracy of over 80% for the “standard” income group and 60% for the “high” income group. In both cases, an estimate counts as a hit when it falls within one 20% band of the validated value.
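The exact WPE formula is not spelled out here, so the sketch below shows one plausible reading: an estimate is a hit when its absolute error is at most 20% of the validated income, and accuracy is the share of hits. The function names and the sample figures are hypothetical.

```python
from typing import Sequence


def within_window(predicted: float, actual: float, window: float = 0.20) -> bool:
    """True when the estimate lies within +/- window of the validated value."""
    if actual <= 0:
        raise ValueError("validated income must be positive")
    return abs(predicted - actual) <= window * actual


def window_accuracy(
    predicted: Sequence[float],
    actual: Sequence[float],
    window: float = 0.20,
) -> float:
    """Share of estimates that fall inside the window (assumed WPE accuracy)."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(within_window(p, a, window) for p, a in zip(predicted, actual))
    return hits / len(actual)


# Made-up monthly incomes, for illustration only.
validated = [30000, 40000, 50000, 100000, 120000]
estimated = [33000, 50000, 45000, 85000, 119000]
print(window_accuracy(estimated, validated))  # 4 of 5 estimates are within 20%
```

A relative window like this treats a given percentage miss the same way at every income level, which is why the metric can be reported separately for the “standard” and “high” groups.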
To achieve higher accuracy, we recommend using PSD2 transactional downloads combined with our in-house transaction categorization engine. However, if no transaction data are available, we recommend utilizing external data sources to improve accuracy further.