Ioannis Koumarelas

Senior Data Scientist
PhD graduate in Data Quality

Veeva Systems

Professional Summary

Specializing in Data Science, Data Engineering, and Machine Learning to convert messy data into high-quality, usable information. Exploring LLMs and agentic AI to enhance automation and workflows.

Education

PhD in Information Systems

Hasso Plattner Institute

MSc Computer Science

Aristotle University of Thessaloniki

BSc Computer Science

Aristotle University of Thessaloniki

Interests

Data Matching

Machine & Deep Learning

Natural Language Processing

Duplicate Detection

Entity Resolution

Data Quality

Artificial Intelligence

Data Engineering

Large Language Models

Skills

Entity Resolution

Machine Learning

Data Cleaning

Data Engineering

Research & Development

Data Science

Experience

  1. Data Scientist / Senior Data Scientist

    Veeva Systems

    Creating and updating profiles of medical experts (Health Care Professionals – HCPs) while continuously monitoring and ensuring high data quality.

    • Processing millions of HCP activities (e.g., publications) to create and update millions of professional profiles.
    • Analyzing data to assess quality and support decisions, training machine learning models on large-scale curated datasets, and deploying them to production.
    • Turning ideas into experiments in Jupyter Notebooks and building production-grade, resilient PySpark pipelines. Tech stack includes AWS, Apache Airflow, Kubernetes, Docker, and more.
  2. Data Engineer / Full-Stack Engineer

    HPI Schul-Cloud

    Managed data workflows and improved content quality for the HPI Schul-Cloud platform, which delivers content to schools in multiple states across Germany.

    • Imported and scraped data using Scrapy; managed the platform’s database and content infrastructure based on Tomcat, PostgreSQL, and Elasticsearch.
    • Improved data quality through validation, enrichment, and consistency checks.
    • Supported multiple teams, broadening my full-stack expertise across DevOps (Docker, Kubernetes), backend development (JavaScript, later TypeScript), and front-end development (Vue.js, Next.js), and wrote unit and end-to-end tests using Cucumber (Gherkin).
  3. Research Consultant

    SAP Concur
    During the first three years of my PhD, in collaboration with SAP, and SAP Concur in particular, we developed approaches for data cleaning and deduplication on hotel datasets provided by our partners at Concur. Of the several methodologies developed, the most notable resulted in two publications. Doing an applied PhD was a valuable way to gain hands-on experience.

Education

  1. PhD in Information Systems

    Hasso Plattner Institute
    Thesis on Data Preparation and Domain-Agnostic Duplicate Detection. Supervised by Prof. Felix Naumann. Published 7 papers in top-tier journals and conferences.
    Read dissertation
  2. MSc Computer Science

    Aristotle University of Thessaloniki
    Specialized in Data Engineering and efficient calculation of Theta-Joins on large-scale data using Apache MapReduce.
    Read thesis
  3. BSc Computer Science

    Aristotle University of Thessaloniki
    Thesis on Recommender Systems on large-scale data using Apache MapReduce.
    Read thesis (in Greek)
Certificates
AI & LLM Engineering (Udemy)
Udemy ∙ October 2025

Between July and October 2025, completed a comprehensive series of courses covering modern AI engineering practices.

Courses completed:

Generative AI with Large Language Models
Coursera ∙ July 2025
Three-week course covering the complete LLM lifecycle: Transformer architecture and pretraining, fine-tuning techniques including Parameter-Efficient Fine-Tuning (PEFT) with LoRA and soft prompts, and Reinforcement Learning from Human Feedback (RLHF). Explored Chain-of-Thought reasoning and the ReAct framework that underlies modern agentic AI systems.
Deep Learning Specialization
Coursera ∙ January 2021

Foundational Coursera specialization on Deep Learning, comprising the following courses:

  1. Neural Networks and Deep Learning
  2. Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization
  3. Structuring Machine Learning Projects
  4. Convolutional Neural Networks
  5. Sequence Models

It gave me a holistic refresher and further expanded my knowledge of core Deep Learning fundamentals and models.

Languages
English: 100%
Greek: 100%
German: 25%