All articles by stephan

Active Learning and Autoencoders in Banking Fraud Detection (ALEA)

Fraud detection and operational risk management are two main challenges faced by banks and financial institutions worldwide. Traditionally, static rule-based controls have been effective for uncovering known fraud patterns. However, with rising anti-fraud requirements, it is essential to take fraud detection to the next level and detect emerging fraud types proactively. Unlike traditional solutions on the market, our solution uses advanced analytics, dynamic profiling and machine learning to build highly accurate customer profiles. All transactions linked to an account are continuously monitored across all channels and compared against the customer profile. The result is a massive reduction in the number of false positives, which maintains an excellent customer and user experience. Machine-learning algorithms discover new fraud schemes, helping banks stay on top of emerging threats.

One of the challenges is the following: we receive a lot of data from the banks (customer transactions, e-banking activity), which the software uses to perform fraud detection and provide the banks with a score. However, the banks do not generally tell us whether the algorithm's prediction was correct or which fraud cases were identified, as these checks are done manually by other bank departments. We are keen to develop more advanced machine-learning tools for the banks, and we realise the importance of having data (transactions) labelled as fraud or non-fraud by bank staff. However, labelling is time-consuming and has to be done manually by bank staff. It is therefore practically impossible to label all the data, and we need to limit the number of labelling requests.
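One common way to limit labelling requests is uncertainty sampling: only the transactions whose score is least decisive are sent to bank staff. The sketch below is a minimal illustration under that assumption; the function name, the scores and the budget are hypothetical and are not taken from the project.

```python
def select_for_labelling(scores, budget):
    """Return the indices of the `budget` transactions whose fraud score
    is closest to 0.5, i.e. where the model is least certain.

    `scores` are assumed to be fraud probabilities in [0, 1].
    """
    ranked = sorted(range(len(scores)), key=lambda i: abs(scores[i] - 0.5))
    return ranked[:budget]


# Hypothetical fraud scores for six unlabelled transactions.
scores = [0.02, 0.48, 0.91, 0.55, 0.10, 0.70]

# Only two labelling requests are allowed in this round.
queries = select_for_labelling(scores, budget=2)
print(queries)
```

Once the selected transactions have been labelled, the model would be retrained and the selection repeated, so that each round of manual work is spent where it is expected to be most informative.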

Another challenge concerns fraud detection with new methods. The most advanced companies in the financial industry (PayPal, for instance) have been pioneering advanced machine learning approaches such as deep neural networks. In this project we would like to use a particular form of them, autoencoders, which compress an input to its core features and then reverse the process to recreate the input, keeping only the key features. This means that if the autoencoder tries to reconstruct something it does not recognize, it won’t be able to reconstruct the main features. We therefore assume that frauds and anomalies passed through an autoencoder will suffer a high reconstruction error, which should lead to a good fraud detection rate.
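The reconstruction-error idea can be illustrated with a deliberately small stand-in: a linear autoencoder with a single hidden unit and tied weights, fitted by power iteration (which makes it equivalent to PCA). This is not the deep autoencoder envisaged in the project, and the two-feature data are invented for the example.

```python
import math


def fit_linear_autoencoder(rows, iterations=100):
    """Fit a 1-unit, tied-weight linear autoencoder on 2-D rows.

    Power iteration on the covariance matrix finds the direction that
    minimises the mean squared reconstruction error.
    """
    n = len(rows)
    mean = [sum(r[j] for r in rows) / n for j in range(2)]
    centred = [[r[0] - mean[0], r[1] - mean[1]] for r in rows]
    cov = [[sum(c[a] * c[b] for c in centred) / n for b in range(2)]
           for a in range(2)]
    w = [1.0, 1.0]
    for _ in range(iterations):
        w = [cov[0][0] * w[0] + cov[0][1] * w[1],
             cov[1][0] * w[0] + cov[1][1] * w[1]]
        norm = math.hypot(w[0], w[1])
        w = [w[0] / norm, w[1] / norm]
    return mean, w


def reconstruction_error(row, mean, w):
    """Encode the row to one number, decode it, and measure the residual."""
    c = [row[0] - mean[0], row[1] - mean[1]]
    z = c[0] * w[0] + c[1] * w[1]        # encoder: project onto w
    recon = [z * w[0], z * w[1]]         # decoder: map back to 2-D
    return math.hypot(c[0] - recon[0], c[1] - recon[1])


# Invented "normal" behaviour: the second feature is roughly twice the first.
normal = [(1, 2), (2, 4), (3, 6), (4, 8), (1.5, 3.1), (2.5, 4.9)]
mean, w = fit_linear_autoencoder(normal)

normal_errors = [reconstruction_error(r, mean, w) for r in normal]
anomaly_error = reconstruction_error((4, 1), mean, w)  # breaks the pattern
print(max(normal_errors), anomaly_error)
```

A transaction would be flagged when its error exceeds a threshold chosen from the errors observed on normal data; how to set that threshold is left open here.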

Deep Learning (incl. autoencoders) is becoming a mature technology, showing results in many fields, such as image and video recognition, text processing or complex robotic tasks such as autonomous car driving. In the fraud detection domain, Deep Learning is still a work in progress, and the specific setup (unbalanced, unlabelled data) is a real challenge. However, the modelling capacity of Neural Networks is almost unlimited and would allow combining both transactional and internal audit data sources, while reaching high true positive and low false positive rates.

Collaboration: NetGuardians SA

Optimizing Operating Rooms and Care Services using Deep Reinforcement Learning (OPERATE)

Nowadays the method for scheduling appointments relies on the availability of the medics, the care providers, the patients and the required resources (rooms, material, etc.) at a particular time. Once everything is available, a surgery reservation is confirmed. However, experience shows that the closer the date of the intervention, the higher the risk of unavailability or of some event that disrupts the schedule. These factors of uncertainty are currently managed by human experts: based solely on their experience, they are able to handle exceptions whenever they occur and, for the best of them, even anticipate them.

The complexity of the surgical scheduling problem has often led researchers to focus on one aspect of the problem at a time. The main advantage we have in this project is that we have access to data, so we will be able to study these records and estimate the variability in time or cost for each task. We will base our work on the best current research and improve it with recent research on deep learning. We aim to create a dynamic model which takes uncertainties into account. First, one has to select the features that are most significant for maximizing the operating room (OR) occupancy. According to the existing literature, only a few people actually use real data to optimize and adjust scheduling techniques.

We can observe that dynamic learning techniques have been used, but the models are concerned with the booking of patients into ORs rather than (also) the scheduling of the ORs themselves. Other authors have developed a batch scheduling framework to book a set of surgeries into an ordered set of available ORs. OR booking is mainly concerned with the balance between OR utilization and OR overtime. They approximate the sum of procedure durations by a normal distribution, provide near-optimal solutions for stochastic scheduling, and show that batch scheduling performs better than open booking (sequential booking). Open booking assigns the first surgery case to arrive to the first available and appropriate slot (in terms of specialty, available time, etc.). What we propose here is to optimize not only statically but also dynamically, based on the data at our disposal, and to set up a schedule that has inputs and outputs, like an inventory system, the aim being to minimize the unused or wasted minutes.
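The normal approximation mentioned above can be made concrete with a short sketch. It assumes that the procedure durations are independent, so that the means and variances add up; the function name and the numbers are hypothetical and do not come from the cited work or from project data.

```python
import math


def overtime_probability(procedures, capacity_minutes):
    """Probability that the booked procedures exceed the OR capacity.

    `procedures` is a list of (mean_minutes, std_minutes) pairs. The total
    duration is approximated by a normal distribution whose mean and
    variance are the sums of the individual means and variances
    (independence between procedures is an assumption of this sketch).
    """
    mean = sum(m for m, _ in procedures)
    variance = sum(s * s for _, s in procedures)
    z = (capacity_minutes - mean) / math.sqrt(variance)
    # P(total > capacity) = 1 - Phi(z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2))


# Hypothetical batch of three surgeries booked into an 8-hour OR day.
batch = [(120, 20), (90, 15), (150, 30)]
p_overtime = overtime_probability(batch, capacity_minutes=480)
print(p_overtime)
```

Adding a further surgery to the batch raises utilization but also raises this probability, which is the trade-off between OR utilization and OR overtime described in the text.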

In effect, we will create a new scheduler modelled as a directed acyclic graph (DAG), which is a modern way to represent scheduling problems and can include complex dependencies and heterogeneous demands, in addition to being flexible and efficient. Note that relaxing one or the other of the constraints (i.e. scheduling an independent set of heterogeneous tasks) already leads to NP-complete problems. Furthermore, it is possible to automate the scheduler, for example when adding a new constraint, which would require a “re-design” with standard methods. Algorithms over graphs are generally designed by human experts, but achieving very strong performance is becoming more and more challenging as the algorithmic literature is limited. What we aim to do here is to apply deep learning methods to challenging graph-based optimization problems.

The first problem is that graph-structured data needs to be “transformed” into a Euclidean space before deep learning methods can be used (instead of using vectors, as is usually done). There is a family of representations called graph convolutional neural networks, but they are quite specific to particular graphs and are essentially inspired by images. This area needs further understanding and an adaptation to our scheduling problem (DAG). Once this step is done, we would like to apply deep neural network techniques or reinforcement learning methods to the problem of obtaining optimal schedules. The novelty here is to combine neural network architectures and reinforcement learning methods for downstream graph optimization, resulting in new state-of-the-art performance for scheduling.

The second step concerns the learning part, from the sequence to the schedule. As stated, the “timeseries” format is very well adapted to neural networks, which is why this intermediate step is necessary. An advantage of this representation is that it is not limited. The learning part has to be investigated as well. We suggest comparing recurrent neural networks with reinforcement learning techniques to maximize the reward over time. Applying neural-network-based learning to graphs can provide much more flexibility for optimizing complex schedules over time.
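To make the DAG representation concrete, the sketch below encodes a handful of tasks and their dependencies and derives the earliest possible start time of each task with a topological sort. This is a classical baseline with unlimited resources, not the learned scheduler proposed here; the task names and durations are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical task durations in minutes.
durations = {
    "prep_room": 20,
    "anesthesia": 30,
    "surgery": 120,
    "recovery": 60,
    "clean_room": 25,
}

# Edges of the DAG: each task maps to the tasks it must wait for.
depends_on = {
    "anesthesia": {"prep_room"},
    "surgery": {"anesthesia"},
    "recovery": {"surgery"},
    "clean_room": {"surgery"},
}


def earliest_start_times(durations, depends_on):
    """Earliest start of each task, assuming unlimited resources."""
    graph = {task: depends_on.get(task, set()) for task in durations}
    start = {}
    # static_order() yields every task after all of its predecessors.
    for task in TopologicalSorter(graph).static_order():
        start[task] = max(
            (start[dep] + durations[dep] for dep in graph[task]),
            default=0,
        )
    return start


start = earliest_start_times(durations, depends_on)
makespan = max(start[t] + durations[t] for t in durations)
print(start, makespan)
```

Adding a new constraint amounts to adding an edge to `depends_on`. Resource limits (a finite number of rooms or staff) are what make the real problem hard and are not modelled in this sketch.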

Collaboration: Calyps SA

Automated Dynamic Machine Learning for time based forecasts

Research in efficient data exploitation continues to demonstrate major breakthroughs worldwide, especially with the advent of large volumes of available data. Unfortunately, most of this research is still restricted to researchers, and small and medium-sized companies are therefore unable to take advantage of these advanced technologies, owing to a combination of a lack of internal skills and the large upfront costs and time necessary to master these new data technologies.

We aim to design innovative services and products that help companies by facilitating their access to efficient data exploitation technologies, and therefore help them stay ahead of their competitors by adopting the right, cost-effective, data-driven approach. The data analysis domain addressed by Predictive Layer is Time Series analytics. In this respect, impressive results have already been achieved in the case of energy consumption prediction.

We shall base this approach on the automated analysis of the topology of the problem being treated. In other words, it will create, dynamically during the learning process, the best features for representing the time series, based on previously explored and tested features.
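As a rough illustration of what testing candidate features can mean for a time series, the sketch below scores a set of lag features by their absolute correlation with the value to predict and ranks them. The scoring rule, the function names and the synthetic series are assumptions made for the example, not the method used by Predictive Layer.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rank_lag_features(series, candidate_lags):
    """Score each lag by |correlation| between the series and its lagged copy."""
    scores = {
        lag: abs(pearson(series[:-lag], series[lag:]))
        for lag in candidate_lags
    }
    ranking = sorted(scores, key=scores.get, reverse=True)
    return ranking, scores


# Synthetic series with a repeating pattern of length 4.
series = [10, 12, 30, 11] * 6

ranking, scores = rank_lag_features(series, range(1, 7))
print(ranking[0], scores[ranking[0]])
```

In a dynamic setting, the scores of features tested earlier would guide which new candidates (other lags, rolling statistics, calendar effects) are generated and tested next.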

Collaboration: Predictive Layer SA