Freddie Mac is one of the largest actors in the secondary mortgage market. Financial institutions sell their mortgages to Freddie Mac, which stimulates the market: banks know that, provided they evaluate their customers correctly, they can reduce their risk by having a larger institution bear the burden of a default were it to occur. Freddie Mac, in turn, evaluates the portfolios it buys to make informed decisions on which mortgages should enter its own portfolio. Freddie Mac was chartered by the US Congress, and its mission is to “provide liquidity, stability and affordability to the nation”. As part of its transparency push, it makes available the Single-Family Loan Dataset, which, as the name implies, covers mortgages to single-family households.
In this coursework, you will develop a fully compliant probability of default (PD) model using the data Freddie Mac makes available, going from the raw data through to the level 2 calibration and applying what you have learned in the lectures. The objective of the coursework is to estimate the capital requirements for Freddie Mac as if it were a bank.
You are given information on approximately two million loans, corresponding to operations that originated between 2014 and 2018. The data includes information from the origination of the loan (variables listed in the table “Origination Data File” in the user guide), which can be used to predict the performance of the loan, and information from the last time the loan was observed in the dataset (variables listed in the table “Monthly Performance File” in the user guide), which is NOT for prediction but can be used to understand the costs and benefits of each loan (questions 5 and 6). You are also given a default flag (variable “Default”), which marks whether the mortgage has defaulted for the purposes of the coursework.
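The separation described above (origination variables as predictors, performance variables held back) can be sketched as follows. This is only a minimal illustration on a toy frame: every column name except “Default” is a hypothetical placeholder, not a real header from the dataset, and the helper `split_predictors` is invented for this example.

```python
import pandas as pd

# Toy frame standing in for the coursework file. All column names except
# "Default" are hypothetical placeholders, not the real dataset headers.
loans = pd.DataFrame({
    "credit_score": [720, 650, 780, 600],
    "orig_ltv": [80, 95, 60, 97],
    "current_upb": [150000.0, 0.0, 90000.0, 120000.0],
    "Default": [0, 1, 0, 1],
})

# Columns assumed (for illustration) to come from the Monthly Performance
# File: kept for the cost/benefit analysis, excluded from the predictors.
performance_cols = ["current_upb"]
target_col = "Default"


def split_predictors(df, performance_cols, target_col):
    """Return (X, y, performance), keeping performance columns out of X."""
    X = df.drop(columns=performance_cols + [target_col])
    y = df[target_col]
    performance = df[performance_cols]
    return X, y, performance


X, y, performance = split_predictors(loans, performance_cols, target_col)
default_rate = y.mean()  # share of loans flagged as defaulted
```

Keeping the performance columns in a separate frame, rather than dropping them, is one way to make sure they stay available for the later questions without leaking into model training.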
With this information, the dataset, and your knowledge from the course, answer the following questions:
1. (25%) Clean the dataset so it is ready to apply models to it. Discuss all your decisions.
2. (15%) Calculate the WoE and perform the variable selection procedures you see fit. Explain your decisions.
3. (20%) Construct a scorecard which can model the probability of default for the loans. Discuss your choice of variables, any embedded selection methods you use and their parameters, and your final performance in terms of AUC. How many variables do you recommend using?
4. (20%) Compare your scoring model with an XGBoost model and a Random Forest model trained over the data without the WoE transformation. Use cross-validation to determine your optimal parameters if necessary, and discuss the accuracy metrics you deem relevant. Compare the performance of the three models and discuss your findings.
5. (10%) Discuss the variable importance for all models. Do they agree? Why? Design a two cut-off point strategy for your scorecard and discuss its results.
6. (Extra credit, 20%. See the extra submission tab in OWL.) Using the monthly macroeconomic information you consider relevant (see for example https://stats.oecd.org/Index.aspx), calibrate a long-run PD for the loans granted. For this, first segment your scorecard curve into 7 to 15 groups, then regress your monthly Freddie Mac PDs (grouped from your objective variable) against the macroeconomic variables and the past PDs, as discussed in the additional material left in OWL. Use the long-term forecasts you can find online from reputable sources (for example the OECD) for your long-term calibrated values. If you cannot find them, assume a value which makes sense to you and explain why. Analyse your results.
FM 9528 – Banking Analytics Coursework 2
The remaining 10% is awarded for format and style, as discussed in the rubric.
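As an illustration of the WoE calculation requested in question 2, the sketch below computes the weight of evidence and the information value for one binned variable. It assumes the convention WoE = ln(share of goods / share of bads); the opposite sign convention is also in use, so check which one the lectures follow. The function name `woe_table`, the `smoothing` constant, the bin edges and the toy data are all assumptions made for this example, not part of the coursework material.

```python
import numpy as np
import pandas as pd


def woe_table(feature_bins, default_flag, smoothing=0.5):
    """WoE and IV per bin, using ln(share of goods / share of bads).

    `smoothing` is added to every count so that empty cells do not give
    infinite values; it is an illustrative choice, not a prescribed one.
    """
    df = pd.DataFrame({"bin": feature_bins, "bad": default_flag})
    grouped = df.groupby("bin", observed=True)["bad"].agg(["sum", "count"])
    bad = grouped["sum"] + smoothing
    good = grouped["count"] - grouped["sum"] + smoothing
    dist_bad = bad / bad.sum()      # share of all bads falling in each bin
    dist_good = good / good.sum()   # share of all goods falling in each bin
    woe = np.log(dist_good / dist_bad)
    iv = (dist_good - dist_bad) * woe  # per-bin contribution to the IV
    return pd.DataFrame({"good": good, "bad": bad, "woe": woe, "iv": iv})


# Toy data: a hypothetical credit score and a default flag (1 = default).
scores = pd.Series([580, 610, 640, 700, 720, 760, 790, 800])
default = [1, 1, 0, 1, 0, 0, 0, 0]
bins = pd.cut(scores, bins=[300, 650, 750, 850])

table = woe_table(bins, default)
total_iv = table["iv"].sum()
```

Under this convention, a negative WoE marks a bin with a higher than average concentration of defaults, and the total IV can be used as one input to variable selection. On the full dataset, the binning itself (number and position of cut points) is a modelling decision that should be justified separately.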
Conditions of the coursework
Software: You must use Python to run the numerical calculations over your portfolio. A copy of your Jupyter notebook must be attached to the coursework as an appendix in a readable format, and a link to the notebook must also be included. Instructions on how to export to PDF can be found here: https://stackoverflow.com/questions/52588552/google-co-laboratory-notebook-pdf-download. The notebook text MUST be machine readable (so no screenshots of the notebook, please); otherwise a 25% deduction will apply.
Word Limit: 2000 words. A margin of +/-10% either side of the word count is deemed acceptable. Any text beyond the additional 10% will not attract any marks. The relevant word count includes items such as the cover page, executive summary, title page, table of contents, tables, figures, in-text citations and section headings, if used. The relevant word count excludes your list of references and any appendices at the end of your coursework submission (including the code).