Interview
Data and Machine Learning Operations applied to Loan Data
Interview with Luca Borella, CEO of Algoritmica GmbH
We are on the verge of a revolution. Numerous changes are coming our way, making it difficult to keep track. To keep up with the latest innovations in big data and artificial intelligence, we sat down with Luca Borella, CEO of Algoritmica GmbH.
Algoritmica has spent the last three years building data infrastructure to enable data and machine learning operations (Data and MLOps) within the lending domain. Luca shared the key lessons he has learned from designing loan data architectures and building data products.
BFI: What was the inspiration for Algoritmica and what problem are you trying to solve?
Luca: Although data is the primary input of financial products and services, we were very surprised to find that, while data is abundant, very few organizations actually use it, let alone apply AI paradigms to it!
For the past 30 years or so, analytics vendors have disintermediated data providers and data users by re-packaging datasets from various data providers and providing data users with turnkey solutions, which include data, models and analytics. Such packaged solutions have technical limitations, foster lock-in and hinder innovation.
Algoritmica aims to address this by creating a middle-layer that empowers data users to construct their own analytics solutions, allowing them to train models on the datasets most relevant to their specific use case.
BFI: What kind of organizations can benefit from Algoritmica’s middle-layer? In what way do they benefit from it?
Luca: The middle-layer, aptly named “deeploans”, specializes in assisting data providers and data users in the credit domain.
Let me start with the data providers. The total universe of datasets is huge: Neudata, a data broker, scouted circa 7,000 alternative datasets, and among the 1,643 datasets tracked, the predominant delivery channels are websites (66%), APIs (57%), FTP, also known as "bulk download" (39%), and email (32%). A dataset can be delivered through more than one channel, which is why these shares add up to more than 100%. Notably, only 22% of datasets are delivered via AWS (i.e., AWS cloud marketplace). Our experience suggests that even when data providers offer APIs, these are often not treated as actual "products" and lack a comprehensive developer experience. For data providers, deeploans offers a self-managed solution that removes the need to deal with delivery, so they can stay focused on content creation and aggregation.
When it comes to data users, Cognilytica indicated that they allocate the majority of their time not to model training or interpreting model outputs but rather to essential activities like data cleansing (15%), labeling (25%), augmentation (15%), and aggregation (10%). When a data provider's data is processed through deeploans, it goes through comprehensive cleaning, enrichment, and integration. This ensures that data users like banks, mortgage and speciality finance providers, insurance companies, and asset managers can seamlessly integrate their preferred BI and AI tools, thereby significantly improving decision-making at scale.
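As a rough illustration of what cleaning, enrichment, and aggregation can mean for loan-level records, here is a minimal Python sketch. It is not deeploans' actual pipeline: the field names, the sample records, and the helper functions are all hypothetical and chosen only to show the kind of work that tends to absorb data users' time.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical raw loan-level records with typical quality problems:
# mixed date formats, thousands separators, missing values, untidy codes.
RAW_LOANS = [
    {"loan_id": "A1", "origination_date": "2021-03-15",
     "balance": "250,000.00", "property_value": "400000", "country": "it"},
    {"loan_id": "A2", "origination_date": "15/04/2021",
     "balance": "120000", "property_value": "", "country": "IT"},
    {"loan_id": "A3", "origination_date": "2020-11-01",
     "balance": "n/a", "property_value": "300000", "country": "de"},
    {"loan_id": "A4", "origination_date": "2022-01-20",
     "balance": "90000", "property_value": "180000", "country": "DE "},
]


def parse_amount(text):
    """Return the amount as a float, or None if it cannot be parsed."""
    try:
        return float(text.replace(",", "").strip())
    except ValueError:
        return None


def parse_date(text):
    """Try the two date formats assumed for this example; None if neither fits."""
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):
        try:
            return datetime.strptime(text.strip(), fmt).date()
        except ValueError:
            continue
    return None


def clean_record(raw):
    """Cleaning: normalise types and codes; drop records without a usable balance or date."""
    balance = parse_amount(raw["balance"])
    originated = parse_date(raw["origination_date"])
    if balance is None or originated is None:
        return None
    return {
        "loan_id": raw["loan_id"],
        "origination_date": originated,
        "balance": balance,
        "property_value": parse_amount(raw["property_value"]),
        "country": raw["country"].strip().upper(),
    }


def enrich(loan):
    """Enrichment: derive a loan-to-value ratio when the property value is known."""
    value = loan["property_value"]
    ltv = loan["balance"] / value if value else None
    return {**loan, "ltv": ltv}


def balance_by_country(loans):
    """Aggregation: total outstanding balance per country."""
    totals = defaultdict(float)
    for loan in loans:
        totals[loan["country"]] += loan["balance"]
    return dict(totals)


cleaned = [c for c in map(clean_record, RAW_LOANS) if c is not None]
enriched = [enrich(loan) for loan in cleaned]
totals = balance_by_country(enriched)
print(totals)
```

In this toy example one of the four records is dropped because its balance is unusable, and another is kept without a loan-to-value ratio because its property value is missing. Deciding when to drop, keep, or repair a record is exactly the kind of judgment that makes this work time-consuming.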
BFI: What’s the key innovation in deeploans?
Luca: Deeploans revolves around the concept of a “lakehouse”, which combines the strengths of traditional databases, the reliability of data warehouses and the flexibility of data lakes. This intelligent lakehouse, through advanced feature engineering and time-series analysis, recovers up to 80% of the data typically lost due to quality issues. Notably, deeploans can be deployed anywhere.
BFI: What was the biggest challenge for you when setting up shop?
Luca: Like many startups, our foremost challenge was fundraising. Venture capitalists and investors seek traction and scalability, which we lack at the moment. Thankfully, the accessibility of cloud resources, particularly through the Google Cloud Startup program, allowed us to validate key proof points.
To navigate the sluggish fundraising environment, we deployed deeploans on top of a loan-level data provider and targeted two groups: users who had dropped off from that provider and users who wanted to upgrade their data experience.
This strategy slowed product development but kept us afloat. We are currently seeking a pre-seed round to increase our sales outreach, build more product features, and onboard more data providers.
BFI: By how much is the US ahead of Europe when it comes to data and AI?
Luca: The data and AI market is currently dynamic, particularly in the United States, where big tech companies and established financial analytics providers like Bloomberg, S&P Global, and Moody's have their headquarters and production hubs. These entities enjoy easier access to both financial resources and a skilled workforce.
Notably, strategic collaborations are taking shape across the Atlantic. For instance, the London Stock Exchange Group (LSEG), owner of Refinitiv, a prominent financial data provider, is joining forces with Microsoft to enhance AI-based analytics for its existing LSEG clients.
To highlight some leading US startups: Crux, founded in 2017, secured an impressive $50 million in Series B funding in 2023. Similarly, Bobsled, a rising star, raised $7 million in a Seed round in January and followed up with a $17 million Series A in April.
While Europe excels in regulatory frameworks and standards, these advantages may not be as pronounced when launching a startup.
BFI: What are the key takeaways to keep in mind for professionals who are building loan data operations?
Luca: I would say that a data architecture needs to follow the structure of the data it processes. If your aim is to process loan data, make sure you have an architecture that’s cloud-based (i.e., scalable and flexible). If you're trying to do this in an Excel spreadsheet, you're probably on the wrong path.
Moreover, leveraging big data and AI takes commitment; it is definitely not an “install and forget” system. It needs to be looked after by experts. They can be internal, meaning you hire a team of data professionals, or you can outsource part of the expertise and use a software-as-a-service solution.
Lastly, there is no artificial intelligence versus human intelligence. It's a combination of both!