Coursework Brief: 

Question 1 (65 marks)

Banks play a crucial role in market economies. They decide who can get finance and on what terms and can make or break investment decisions. For markets and society to function, individuals and companies need access to credit.

Credit scoring algorithms, which estimate the probability of default, are the method banks use to determine whether or not a loan should be granted. Banks therefore need to improve on the state of the art in credit scoring by predicting the probability that a borrower will experience financial distress in the next two years.

Historical data (cs-training.csv) are provided on 150,000 borrowers. The following variables are available to you:

Variable Name | Description | Type
SeriousDlqin2yrs | Person experienced 90 days past due delinquency or worse | Y/N
RevolvingUtilizationOfUnsecuredLines | Total balance on credit cards and personal lines of credit (excluding real estate and installment debt such as car loans) divided by the sum of credit limits | percentage
age | Age of borrower in years | integer
NumberOfTime30-59DaysPastDueNotWorse | Number of times borrower has been 30-59 days past due but no worse in the last 2 years | integer
DebtRatio | Monthly debt payments, alimony and living costs divided by monthly gross income | percentage
MonthlyIncome | Monthly income | real
NumberOfOpenCreditLinesAndLoans | Number of open loans (installment loans such as a car loan or mortgage) and lines of credit (e.g. credit cards) | integer
NumberOfTimes90DaysLate | Number of times borrower has been 90 days or more past due | integer
NumberRealEstateLoansOrLines | Number of mortgage and real estate loans, including home equity lines of credit | integer
NumberOfTime60-89DaysPastDueNotWorse | Number of times borrower has been 60-89 days past due but no worse in the last 2 years | integer
NumberOfDependents | Number of dependents in family, excluding the borrower (spouse, children etc.) | integer

The goal of Question 1 is to build a model from the training dataset that banks can use to help make the best financial decisions about the borrowers in the testing dataset (cs-test.csv).

1.1 Carefully pre-process the dataset by considering the following activities:

•       Exploratory data analysis.

•       Missing value handling (if any). Marks will be deducted for simply replacing missing entries with a single value; a proper study of the missing values is required.

•       Outlier detection and treatment (if any). Marks will be deducted for simply eliminating or replacing outliers without justification; a proper study of the outliers is required.
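As a starting point for the missing-value and outlier studies listed above, the following is a minimal sketch in plain Python (standard library only). It is not a prescribed method. The helper names missing_report and iqr_fences are hypothetical, the toy records are invented for illustration and are not taken from cs-training.csv, and representing a missing entry as None is an assumption. The idea is to check whether missingness is related to the target before choosing a treatment, and to flag (not delete) values outside Tukey's IQR fences so that each can be justified.

```python
from statistics import quantiles


def missing_report(rows, target):
    """Per column: share of missing values, and the default rate
    among records where the value is missing vs. present."""
    report = {}
    columns = [c for c in rows[0] if c != target]
    for col in columns:
        missing = [r[target] for r in rows if r[col] is None]
        present = [r[target] for r in rows if r[col] is not None]
        report[col] = {
            "share_missing": len(missing) / len(rows),
            "default_rate_if_missing": sum(missing) / len(missing) if missing else None,
            "default_rate_if_present": sum(present) / len(present) if present else None,
        }
    return report


def iqr_fences(values, k=1.5):
    """Tukey fences [Q1 - k*IQR, Q3 + k*IQR]; values outside are flagged for study."""
    observed = [v for v in values if v is not None]
    q1, _, q3 = quantiles(observed, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


# Toy records: invented for illustration only (assumption, not real data).
rows = [
    {"SeriousDlqin2yrs": 0, "age": 45, "MonthlyIncome": 5000.0},
    {"SeriousDlqin2yrs": 1, "age": 30, "MonthlyIncome": None},
    {"SeriousDlqin2yrs": 0, "age": 52, "MonthlyIncome": 6200.0},
    {"SeriousDlqin2yrs": 0, "age": 38, "MonthlyIncome": 4100.0},
    {"SeriousDlqin2yrs": 1, "age": 27, "MonthlyIncome": None},
    {"SeriousDlqin2yrs": 0, "age": 61, "MonthlyIncome": 250000.0},
    {"SeriousDlqin2yrs": 0, "age": 44, "MonthlyIncome": 4800.0},
    {"SeriousDlqin2yrs": 1, "age": 33, "MonthlyIncome": 3900.0},
]

report = missing_report(rows, "SeriousDlqin2yrs")

# Flag incomes outside the fences instead of dropping them.
low, high = iqr_fences([r["MonthlyIncome"] for r in rows])
flagged = [
    r["MonthlyIncome"]
    for r in rows
    if r["MonthlyIncome"] is not None and not low <= r["MonthlyIncome"] <= high
]

print(report["MonthlyIncome"])
print("flagged incomes:", flagged)
```

If the default rate differs noticeably between records with and without a value, the missingness itself may carry information, which would argue against replacing it with a single constant. The IQR rule is only one possible detector; on heavily skewed variables it may flag many legitimate values, so each flagged case still needs a justification.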


1.2 Build a credit scoring model in which SeriousDlqin2yrs is used as a target (default) and report the following:

•       What method do you use?

•       Why do you use this method?

•       Discuss your results.

•       The most important variables

•       The impact of the variables on the target

•       The performance of the model. Use several performance metrics and discuss how they relate to one another, if at all.

•       What do banks win and lose by doing this?
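The points on performance metrics and on what banks win and lose can be explored with a small sketch like the one below, written in plain Python (standard library only). Everything in it is illustrative: the labels and scores are invented toy values, the helper names confusion, auc and portfolio_profit are hypothetical, and the gain of 100 per repaid loan and loss of 1000 per default are assumed figures, not values from the brief. It treats default (1) as the positive class and rejects an applicant when the predicted probability of default reaches the threshold.

```python
def confusion(y_true, scores, threshold):
    """Return (tp, fp, fn, tn) with default = 1 as the positive class.
    A score at or above the threshold means 'predict default' (reject)."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < threshold)
    return tp, fp, fn, tn


def auc(y_true, scores):
    """Probability that a random defaulter is scored above a random
    non-defaulter (ties count one half). Threshold-independent."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def portfolio_profit(tn, fn, gain_per_good_loan, loss_per_default):
    """Accepted applicants are tn (repay) and fn (default); rejected ones earn nothing."""
    return tn * gain_per_good_loan - fn * loss_per_default


# Toy labels and predicted default probabilities (invented for illustration).
y_true = [0, 0, 1, 0, 1, 0, 0, 1]
scores = [0.05, 0.10, 0.80, 0.30, 0.40, 0.20, 0.60, 0.90]

for threshold in (0.5, 0.35):
    tp, fp, fn, tn = confusion(y_true, scores, threshold)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Assumed economics: +100 per repaid loan, -1000 per default (hypothetical).
    profit = portfolio_profit(tn, fn, gain_per_good_loan=100, loss_per_default=1000)
    print(threshold, (tp, fp, fn, tn), round(precision, 3), round(recall, 3), profit)

print("AUC:", round(auc(y_true, scores), 4))
```

On these toy values, lowering the threshold raises recall because the bank catches one more defaulter, while AUC does not change because it does not depend on the threshold. Under the assumed gain and loss figures the same change also turns a loss into a profit, which is one way to frame the discussion of what banks win and lose; with different assumed costs the preferred threshold would differ.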


In terms of software, use SAS Enterprise Miner or anything else (e.g., Python, R and so on).  Carefully report the various steps of your methodology and discuss your results in a rigorous way!


Question 2 (35 marks)


Find an academic or business paper published in 2019 or later discussing a real-life application of data mining or credit scoring. It is important that the case considered is a real-life case and not an artificial one.  Some suggested journals are:


•       Management Science

•       Operations Research

•       INFORMS Journal on Computing

•       INFORMS Journal on Applied Analytics

•       Journal of Machine Learning Research

•       European Journal of Operational Research

•       ICDM (IEEE International Conference on Data Mining)

•       NeurIPS (Conference on Neural Information Processing Systems)

•       KDD (ACM SIGKDD Conference on Knowledge Discovery and Data Mining)


Once you have found an appropriate paper, report the following in separate sections:

•       Title, authors and complete citation (journal name, book title, issue, year, …)

•       The data mining problem considered

•       The data mining techniques used

•       The results reported

•       A critical discussion of the model and results (assumptions made, shortcomings, limitations, …)

Make sure you demonstrate that you understand what the article is all about!
