Jobs

Openings

PhD Student Position: Data Scientist for Fraud Detection

Title: Active Learning and Autoencoders in Bank Fraud Detection

The researcher will participate in a research project in collaboration with NetGuardians, which regularly ranks among the most innovative Fintech start-ups in Europe. Traditionally, static rule-based controls have been effective at uncovering known fraud patterns. However, with increasing anti-fraud requirements, it is essential to take fraud detection to the next level and detect emerging fraud types proactively. Unlike traditional solutions on the market, we use advanced analytics, dynamic profiling and machine learning to build highly accurate customer profiles. Machine-learning algorithms discover new fraud schemes, helping banks stay on top of emerging threats.
One of our challenges is the following: we receive large amounts of data from banks (customer transactions, e-banking activity), which we use to perform fraud detection and to provide scores to the banks. However, banks generally do not tell us whether the algorithm's predictions were correct or which fraud cases were identified, because these checks are done manually by other bank departments. More advanced machine-learning tools require transactions labelled as fraud or non-fraud by bank staff. Labelling, however, is time-consuming and has to be done manually, so it is practically impossible to label all the data, and the number of labelling requests must be limited. Active learning algorithms aim to determine automatically which transactions in the dataset need to be labelled and which do not. This choice is based, among other criteria, on the similarity between transactions and on the frequency of certain features. The goal is to label as little of the data as possible while retaining as much information and variety as possible, so that a good machine-learning model can be developed with supervised learning.
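The selection step described above can be illustrated with a minimal sketch. It is an assumption, not the project's method: it uses greedy farthest-point (k-center) selection, a standard diversity-based active learning heuristic, to pick a labelling budget of transactions that are as dissimilar from one another as possible, so that a small labelled set still covers the variety of the data. It addresses only the similarity criterion (not the feature-frequency one) and assumes transactions are already encoded as scaled numeric feature vectors; the function name `select_for_labelling` is hypothetical.

```python
import numpy as np


def select_for_labelling(features: np.ndarray, budget: int) -> list[int]:
    """Pick up to `budget` transactions to send to bank staff for labelling.

    Hypothetical helper: greedy farthest-point (k-center) selection. Each new
    pick is the transaction least similar (largest Euclidean distance) to all
    transactions already picked, so a small labelled set spans the data.
    `features` is assumed to be an (n_transactions, n_features) array of
    numeric, already-scaled transaction features.
    """
    n = features.shape[0]
    budget = min(budget, n)
    if budget <= 0:
        return []

    # Seed with the most "typical" transaction: the one closest to the mean profile.
    first = int(np.argmin(np.linalg.norm(features - features.mean(axis=0), axis=1)))
    selected = [first]

    # nearest[i] = distance from transaction i to its closest selected transaction;
    # already-selected transactions are set to -inf so they are never picked twice.
    nearest = np.linalg.norm(features - features[first], axis=1)
    nearest[first] = -np.inf

    while len(selected) < budget:
        nxt = int(np.argmax(nearest))  # least similar to everything chosen so far
        selected.append(nxt)
        nearest = np.minimum(nearest, np.linalg.norm(features - features[nxt], axis=1))
        nearest[nxt] = -np.inf

    return selected


if __name__ == "__main__":
    # Toy example: two tight clusters plus one isolated transaction.
    feats = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [-5.0, 5.0]])
    print(select_for_labelling(feats, 3))  # prints [1, 4, 3]
```

With a budget of 3, the sketch requests one label from each cluster and one for the isolated transaction instead of spending requests on near-duplicates. In practice, such a diversity criterion is often combined with the current model's uncertainty about each transaction.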
Another challenge concerns fraud detection with new methods. The most advanced companies in the financial industry have been pioneering machine-learning approaches such as deep neural networks. In this project we would like to use a particular form of them, autoencoders, which compress an input into its core features and then reverse the process to reconstruct the input from those key features alone. In other words, an autoencoder learns an approximation to the identity function, h(x) ≈ x, so that the output is similar to the input. The network can be trained by minimizing the reconstruction error, which measures the difference between the original input and its reconstruction. Consequently, when the autoencoder is given an input unlike anything it has learned, it cannot reconstruct that input's main features. We therefore expect frauds and other anomalies to produce a high reconstruction error, which should lead to a good fraud detection rate.
This subject requires advanced mathematical methods such as complex optimization methods, regularization techniques, probability distributions, Kullback-Leibler (KL) divergence, etc. Deep learning (including autoencoders) is becoming a mature technology, with results in many fields such as image and video recognition, text processing, and complex robotic tasks such as autonomous driving. In the fraud detection domain, deep learning is still a work in progress, and the specific setting (imbalanced, unlabelled data) is a real challenge. However, the modeling capacity of neural networks is very large and would make it possible to combine transactional and internal audit data sources while reaching a high true positive rate and a low false positive rate. These methods will be implemented and tested in collaboration with NetGuardians. PhD students at HEIG-Vd are enrolled in a university doctoral school (University of Neuchâtel).
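The reconstruction-error approach to fraud detection described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the project's model: a one-hidden-layer autoencoder (tanh encoder, linear decoder) is trained by full-batch gradient descent on the mean squared reconstruction error, and transactions whose error exceeds a percentile threshold of the training errors are flagged. The function names, the synthetic data, the layer sizes, the learning rate and the 99th-percentile threshold are all hypothetical choices.

```python
import numpy as np


def fit_autoencoder(X, n_hidden=2, lr=0.01, epochs=3000, seed=0):
    """Hypothetical helper: train a tanh-encoder / linear-decoder autoencoder
    by full-batch gradient descent on mean((h(x) - x)**2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W1 = rng.normal(0.0, 0.1, size=(d, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, size=(n_hidden, d))
    b2 = np.zeros(d)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)            # encoder: compress to n_hidden core features
        R = H @ W2 + b2                     # decoder: reconstruct the input
        dR = 2.0 * (R - X) / X.size         # gradient of the mean squared error w.r.t. R
        dW2 = H.T @ dR
        db2 = dR.sum(axis=0)
        dZ = (dR @ W2.T) * (1.0 - H ** 2)   # backpropagate through tanh
        dW1 = X.T @ dZ
        db1 = dZ.sum(axis=0)
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2


def reconstruction_error(params, X):
    """Per-transaction squared reconstruction error ||x - h(x)||^2."""
    W1, b1, W2, b2 = params
    R = np.tanh(X @ W1 + b1) @ W2 + b2
    return np.sum((X - R) ** 2, axis=1)


if __name__ == "__main__":
    # Synthetic stand-in data (illustration only): "normal" transactions lie
    # near a 2-D subspace of a 6-D feature space; the anomalies do not.
    rng = np.random.default_rng(42)
    A = rng.normal(size=(2, 6))
    normal = rng.normal(size=(500, 2)) @ A + 0.05 * rng.normal(size=(500, 6))
    anomalies = rng.normal(0.0, 3.0, size=(20, 6))

    params = fit_autoencoder(normal)
    threshold = np.percentile(reconstruction_error(params, normal), 99)
    flagged = reconstruction_error(params, anomalies) > threshold
    print(f"flagged {flagged.sum()} of {len(anomalies)} synthetic anomalies")
```

The narrow hidden layer is what forces the network to keep only the key features: inputs resembling the training data are reconstructed well, while inputs off that learned structure are not. A real system would more likely use a deep-learning framework, several layers, and regularization such as a KL-divergence sparsity penalty.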

Requirements

  • MSc in mathematics (ideally related to data science and machine-learning algorithms), or possibly in statistics or theoretical physics, with excellent academic and publication records (for postdocs)
  • Very good knowledge of Python, Java or a similar programming language
  • Excellent writing and verbal communication skills, as well as presentation skills. Besides proficiency in English, creativity and innovative, independent thinking are a must. The candidate is motivated to collaborate in an interdisciplinary international team and to participate in training programs, and is willing to travel to present their work at international conferences

Activity: 100%

Salary range: 50-60 kCHF
More information: Stephan Robert, stephan.robert@heig-vd.ch
Starting date: when available
Contract: 12 months, renewable up to 4 years.

Applications, including a resume, a list of publications (if any), and the names and contact details (postal and email addresses, phone numbers) of at least three references, should be sent as soon as possible to stephan.robert@heig-vd.ch and stephanr@illinois.edu (MS Word, .pdf, .ps or plain text). Applications will be handled confidentially.

Internships and Master Projects

I am happy to offer internships and Master projects (minimum 6 months; preferably to mathematicians with very good computer-science skills) to motivated students from outside HEIG-Vd. If you would like to do an internship or a Master project with me, please send me your resume with your academic records and work or project experience, and explain your motivation.