🎥 Recording
At Leboncoin, we have close to 200 Kafka topics produced by microservices covering all domains of the organisation (ad publication, ad validation, authentication, transactions, messaging...).
These topics are used by business intelligence, analytics, and machine learning teams, but first they need to be transferred to a more "offline-friendly" storage with a proper query engine such as Spark or Athena.
Their schemas also tend to evolve quickly, beyond what the central data-engineering team can keep up with.
This talk will delve into the challenges we faced and the solutions we implemented to fully automate, using Kafka Connect, the process of discovering, normalizing, extracting, storing, and exposing those topics in our Hive metastore, without any human intervention along the way. The goal is to enable greater agility for our downstream teams and make the data engineering team less of a bottleneck when it comes to accessing data.
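To give a flavour of the kind of automation involved, here is a minimal sketch of registering a sink connector for a newly discovered topic via the Kafka Connect REST API. This is not the talk's actual pipeline: the Connect endpoint, bucket, region, and topic name are all hypothetical, it assumes the Confluent S3 sink connector is installed, and the discovery and Hive-registration steps are omitted.

```python
import requests

# Assumed Kafka Connect REST endpoint; adjust for your cluster.
CONNECT_URL = "http://localhost:8083/connectors"


def register_s3_sink(topic: str) -> None:
    """Register an S3 sink connector for one discovered topic."""
    payload = {
        "name": f"s3-sink-{topic}",
        "config": {
            "connector.class": "io.confluent.connect.s3.S3SinkConnector",
            "tasks.max": "1",
            "topics": topic,
            "s3.bucket.name": "datalake-raw",  # hypothetical bucket
            "s3.region": "eu-west-1",          # hypothetical region
            "storage.class": "io.confluent.connect.s3.storage.S3Storage",
            "format.class": "io.confluent.connect.s3.format.parquet.ParquetFormat",
            "flush.size": "10000",
            # Hive-style daily partitioning, so the resulting layout can be
            # exposed as a partitioned table in the metastore.
            "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
            "path.format": "'day'=YYYY-MM-dd",
            "partition.duration.ms": "86400000",
            "locale": "en",
            "timezone": "UTC",
        },
    }
    resp = requests.post(CONNECT_URL, json=payload)
    resp.raise_for_status()


if __name__ == "__main__":
    register_s3_sink("ad-publication-events")  # hypothetical topic name
```

In a fully automated setup, a loop like this would run for every topic matching a naming convention, which is the kind of hands-off operation the talk describes.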