Post written by Erin McMahon:
The Ai4Finance conference took place on August 22nd at the Williamsburg hotel, bringing together technical and business thought leaders in all areas of finance. Special thanks to Ai4Finance and Women in Machine Learning & Data Science for sponsoring my ticket.
The conference spanned a variety of topics, including cybersecurity, robo-advising, and using alternative data for trading strategies. Many of the presentations are available here.
Here are my top takeaways:
From “Lessons Learned (so far) Applying ML to Fraud Detection”: Sue-Wie Chen, Managing Director, Barclays
Data science is a cycle consisting of:
1) business understanding
2) data understanding
3) data preparation
While the majority of academic research focuses on 4&5, the most vital part of a robust data science process is framing the question (1&2): understanding the task the ML model has to perform, and understanding the data well enough to limit the scope of models considered. Finally, and most importantly, data scientists need to grasp the real-world implementation (5) and make sure ML models work within the constraints of the system, including time limits, data storage, and the backend pipeline.
From “Modeling Speech & Language to Modeling Financial Markets”: Li Deng, Chief AI Officer, Citadel
The three biggest challenges to applying ML to finance are:
Low signal-to-noise ratio
Other fields like speech and text have clear, distinguishable signals and a large variety of training samples. Although there is a ton of financial data out there, the low signal-to-noise ratio means we have to treat it like “small” data in a lot of ways. Most of the successful methods for dealing with financial data are Bayesian in spirit: exploiting structure in the data, using prior knowledge, and regularizing.
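To make the regularization point concrete, here is a small sketch of my own (it is not from the talk). It assumes a toy one-feature model with no intercept, a weak true slope, and heavy noise. The function names `ols_slope`, `ridge_slope`, and `simulate`, and all parameter values, are illustrative assumptions. The ridge penalty corresponds to a zero-mean Gaussian prior on the slope, so it is one simple instance of using prior knowledge to shrink a noisy estimate.

```python
import random


def ols_slope(xs, ys):
    """Least-squares slope for y = beta * x (no intercept)."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / sxx


def ridge_slope(xs, ys, lam):
    """Ridge slope: the MAP estimate under a zero-mean Gaussian prior.

    A larger `lam` means a tighter prior around zero, hence more shrinkage.
    """
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def simulate(true_beta=0.05, noise_sd=1.0, n=20, lam=20.0, trials=2000, seed=0):
    """Average squared estimation error of both estimators.

    All defaults are assumed values chosen to mimic a weak signal
    buried in noise with few samples.
    """
    rng = random.Random(seed)
    ols_err = 0.0
    ridge_err = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0, 1) for _ in range(n)]
        ys = [true_beta * x + rng.gauss(0, noise_sd) for x in xs]
        ols_err += (ols_slope(xs, ys) - true_beta) ** 2
        ridge_err += (ridge_slope(xs, ys, lam) - true_beta) ** 2
    return ols_err / trials, ridge_err / trials


ols_mse, ridge_mse = simulate()
print(f"OLS   mean squared error: {ols_mse:.4f}")
print(f"Ridge mean squared error: {ridge_mse:.4f}")
```

In this setup the shrunken estimate should land closer to the true slope on average, because the penalty removes more noise than it adds bias. With a strong signal, the same penalty would hurt instead, which is why the prior has to reflect what we actually believe about the data.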
Strong non-stationarity with adversarial nature
To put this in layman’s terms, the financial world is continuously changing, and it is changing in ways that defy past experience. This is probably the hardest problem to solve in finance. I like to conceptualize this as a series of ever-evolving positive and negative feedback loops. Because we build models in a marketplace of changing dynamics, we have to keep updating those models as they gain and lose traction. In a speech model, what works today for a given population will still work 100 years from now. But in the world of finance, a model can lose efficacy within a year.
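Here is a toy illustration of that decay, again my own sketch rather than anything shown in the talk. It assumes a single feature whose relationship to the target flips sign halfway through the series, then compares a model fit once with a model refit on a trailing window. The names `make_series`, `fit_slope`, and `compare`, the break point, and the window length are all hypothetical choices. It only captures the changing-dynamics part; the adversarial part (other participants reacting to your model) is not simulated.

```python
import random


def fit_slope(xs, ys):
    """Least-squares slope for y = beta * x (no intercept)."""
    sxx = sum(x * x for x in xs)
    return sum(x * y for x, y in zip(xs, ys)) / sxx


def make_series(n=600, break_at=300, noise_sd=0.5, seed=1):
    """Synthetic data whose true slope flips from +1 to -1 at `break_at`."""
    rng = random.Random(seed)
    xs, ys = [], []
    for t in range(n):
        beta = 1.0 if t < break_at else -1.0
        x = rng.gauss(0, 1)
        xs.append(x)
        ys.append(beta * x + rng.gauss(0, noise_sd))
    return xs, ys


def compare(train_end=200, window=100):
    """Out-of-sample error: fit-once model vs. rolling refit."""
    xs, ys = make_series()
    # Static model: fit on the early data and never update it.
    static_beta = fit_slope(xs[:train_end], ys[:train_end])
    static_se = 0.0
    rolling_se = 0.0
    for t in range(train_end, len(xs)):
        # Rolling model: refit on the most recent `window` points only.
        rolling_beta = fit_slope(xs[t - window:t], ys[t - window:t])
        static_se += (ys[t] - static_beta * xs[t]) ** 2
        rolling_se += (ys[t] - rolling_beta * xs[t]) ** 2
    n_test = len(xs) - train_end
    return static_se / n_test, rolling_se / n_test


static_mse, rolling_mse = compare()
print(f"Fit-once model error:      {static_mse:.3f}")
print(f"Rolling-refit model error: {rolling_mse:.3f}")
```

The fit-once model keeps predicting with the old relationship after the break, while the rolling model recovers once its window fills with post-break data. The trade-off is that a shorter window adapts faster but estimates from fewer points, which runs straight back into the signal-to-noise problem.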
Heterogeneity of (big) alternative data
As I said before, there is a lot of financial data out there, but it is hard to prioritize all the different sources. Twitter vs. financial markets data vs. fundamental analysis: how do we put this together in a comprehensive model? In some ways, the markets are the best example of the messiness of human nature. So, it’s important to include all these heterogeneous data sources, but hard to combine them in one model.