Machine learning researcher-engineer

Generative AI, interpretability, and adaptive systems.

EPFL MSc Data Science, OpenAI contractor, ETH research work, publications at NeurIPS/TMLR, and ML systems experience at Oracle Labs.

I like work where careful modeling, experiments, and engineering all matter: understanding model behavior, building robust learning systems, and turning research ideas into working artifacts.

Generative AI · Mechanistic Interpretability · Probabilistic Models · Reinforcement Learning · ML Systems

Background

My background

I move between academic ML research and practical systems work: understanding how models behave, then building tools and experiments that make that understanding useful.

Current work

Contractor at OpenAI

Contract work connected to frontier AI systems, adding production-facing model experience to my research background.

Graduate training

EPFL MSc Data Science

My MSc at EPFL gave me a broad ML foundation: NLP, diffusion models, RL, probability, and data visualization. GPA 5.83/6; Teaching Assistant for Modern NLP.

Adaptive systems

ETH Research Assistant

At ETH, I work on adaptive learning systems, especially test-time meta reinforcement learning for policies that adapt during deployment.

Publications

NeurIPS and TMLR publications

My publications span diffusion interpretability, sparse autoencoders, probabilistic circuits, and tensor factorization theory.

Systems experience

Oracle Labs Intern

At Oracle Labs, I worked on ML systems for cloud-security defense inside a large-scale Spark pipeline processing billions of events daily.

Questions

Research

I am interested in machine learning problems where empirical behavior, theory, and engineering constraints all matter.

How can we understand and control what neural networks compute?

How can systems learn to learn?

How can reinforcement learning become more sample-efficient and reliable?

Conference

[Image: Feature interventions in SDXL Turbo, SDXL Base, and Flux-schnell image generation]

Conference NeurIPS 2025 · Published

One-Step is Enough: Sparse Autoencoders for Text-to-Image Diffusion Models

Applies sparse autoencoders to analyze text-to-image diffusion models, contributing to interpretability methods for modern generative systems.

Co-author · Diffusion Models · Sparse Autoencoders · Interpretability

Paper

Journal

Journal Transactions on Machine Learning Research · 🏅 Featured Certification, 2025

What is the Relationship between Tensor Factorizations and Circuits (and How Can We Exploit it)?

Establishes a rigorous connection between tensor factorizations and probabilistic circuits, unifying model families and exposing new architecture-search opportunities.

Co-first author · Probabilistic Circuits · Tensor Factorizations · Tractable Inference

Paper

Workshop

Workshop 6th Workshop on Tractable Probabilistic Modeling, UAI 2023

Unifying and Understanding Overparameterized Circuit Representations via Low-Rank Tensor Decompositions

Unifies overparameterized probabilistic circuit architectures and studies low-rank decompositions as a way to understand and compress expressive layers.

First author · Probabilistic Circuits · Low-Rank Models · Tensor Decompositions

Paper

Workshop NeurIPS 2025 ResponsibleFM Workshop

Liminal Training: Characterizing and Mitigating Subliminal Learning in Large Language Models

Supervised student research on characterizing and mitigating subliminal learning behavior in large language models.

Supervisor · LLMs · Responsible AI · Supervision

Paper

Workshop AAAI 2026 XAI4Science Workshop

Diffusion Transformers use Sink Registers

Supervised student research investigating sink-register behavior in diffusion transformers.

Supervisor · Diffusion Transformers · XAI · Mechanistic Interpretability

Paper

Experience

Selected roles

Jul 2024 - Jan 2025

Oracle Labs

Research Assistant

Developed machine learning systems for cloud security defense in a large-scale Spark pipeline.

Enhanced a pipeline processing billions of events daily.
Implemented anomaly detection models using PyOD and custom approaches.
Engineered data-loading mechanisms for large-scale model training.

Sep 2025 - Feb 2026

ETH Zurich

MSc Thesis

Investigating test-time meta reinforcement learning for policies that adapt during deployment.

Designing a meta-learning framework for rapid policy adaptation.
Studying how to reduce gradient steps needed at test time.
Bridging reinforcement learning, meta-learning, and adaptive systems.

Jul 2025 - Present

Algoverse LLC

Research Mentor

Mentoring students on mechanistic interpretability research for LLMs and diffusion transformers.

Mentored 12 students on frontier model-interpretability projects.
Supervised workshop publications on LLM and diffusion-transformer behavior.
Helped students turn early research ideas into concrete experiments and submissions.

Sep 2023 - Mar 2026

EPFL

MSc Data Science

Graduate training in machine learning, NLP, diffusion models, reinforcement learning, and probability.

GPA: 5.83/6.
Teaching Assistant for CS-552 Modern Natural Language Processing.
Coursework includes ML, NLP, RL, visual intelligence, probability, and data visualization.

Writing

Technical notes

[Image: Collage preview for a Flux architecture article]

July 2025

Understanding Flux: Diffusion and Flow Models

A high-level technical walkthrough of diffusion models, rectified flows, diffusion transformers, and the Flux architecture.

June 2025

Wrapped up my final semester of courses and launched this website as a research-engineering profile hub.

January 2025

Our work on tensor factorizations and probabilistic circuits was accepted to TMLR and received Featured Certification.

Contact

Open to research and ML engineering conversations.

Best way to reach me: antonio.mari02 [at] outlook [dot] com

LinkedIn · GitHub · Scholar