Profile photo

John Peter Flynn

📍 San Francisco, CA · ✈️

I'm a Senior Machine Learning Researcher and founding employee at Pipio, where my team and I build video editing tools powered by diffusion transformers. We just released a new paper called EditYourself.

I completed my MSc in Computer Science at the Technical University of Munich, where I wrote my thesis on graph neural networks under the supervision of Prof. Matthias Nießner in the Visual Computing and AI Lab. Prior to that, I studied at the University of California, Davis (go Ags 🥳), where I earned a BS in Electrical Engineering and Computer Engineering.

I'm currently interested in video diffusion and 3D computer vision, with emerging interests in world models and AI safety. I'm particularly drawn to research that bridges digital models and physical reality.

Publications & Projects

EditYourself teaser

EditYourself: Audio-Driven Generation and Manipulation of Talking Head Videos with Diffusion Transformers

2026

John Peter Flynn*, Wolfgang Paier*, Dimitar Dinev, Sam Nhut Nguyen, Hayk Poghosyan, Manuel Toribio, Sandipan Banerjee, Guy Gafni

A diffusion-based video editing model for talking heads, enabling transcript-driven lip-syncing, insertion, removal, and retiming of video while preserving identity and visual fidelity. Please refer to the Ethical Considerations (Section 5.1) in our paper.

STINet output animation

STINet: Surface Texture Inpainting Using Graph Neural Networks

Master's Thesis, 2022

John Peter Flynn, Matthias Nie脽ner

Master's thesis exploring graph neural networks for completing the texture of partially textured 3D meshes. Unlike traditional 2D image inpainting, STINet operates directly on mesh surfaces, leveraging vertex positions, normals, and connectivity to predict vertex colors. To our knowledge, this is the first application of GNNs to surface texture completion.
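The underlying idea, propagating color across the mesh graph using vertex positions and normals, can be illustrated with a deliberately simple, non-learned sketch. The function name and the hand-picked weighting scheme below are illustrative assumptions, not STINet's actual learned architecture:

```python
import numpy as np

def inpaint_vertex_colors(positions, normals, colors, adjacency, known, iters=20):
    """Fill in colors for unknown vertices by iteratively averaging the
    colors of graph neighbors, weighted by normal agreement and inverse
    distance. A hand-rolled stand-in for a learned GNN, for illustration."""
    colors = colors.astype(float).copy()
    for _ in range(iters):
        updated = colors.copy()
        for v, nbrs in adjacency.items():
            if known[v]:
                continue  # observed vertex colors stay fixed
            nbrs = np.asarray(nbrs)
            # favor neighbors whose normals agree with v's and which lie nearby
            cos = np.clip(normals[nbrs] @ normals[v], 0.0, None)
            dist = np.linalg.norm(positions[nbrs] - positions[v], axis=1)
            w = (cos + 1e-6) / (dist + 1e-6)
            updated[v] = (w[:, None] * colors[nbrs]).sum(axis=0) / w.sum()
        colors = updated
    return colors
```

A learned model replaces the fixed weighting with message-passing layers, but the input signal is the same: per-vertex geometry plus mesh connectivity.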

ORB-SLAM demo

Stereo-Camera ORB-SLAM for Indoor 3D Reconstruction

Course Project at TU Munich, 2019

John Peter Flynn, Parika Goel

A real-time stereo-camera SLAM system implementing sparse, indirect feature-correspondence methods, with a covisibility graph storing spatially related keyframes and landmarks. It runs tracking, local-mapping, and loop-closing threads for robust camera localization and 3D reconstruction.
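The covisibility graph at the heart of this design can be sketched as a small data structure: keyframes become connected when they observe enough of the same landmarks, and the other threads query it for the most strongly covisible keyframes. The class and method names here are illustrative, not the actual ORB-SLAM API:

```python
from collections import defaultdict

class CovisibilityGraph:
    """Minimal covisibility-graph sketch: edge weight between two
    keyframes is the number of landmarks they both observe."""

    def __init__(self):
        self.observations = defaultdict(set)  # keyframe id -> set of landmark ids

    def add_observation(self, keyframe, landmark):
        self.observations[keyframe].add(landmark)

    def covisible(self, keyframe, min_shared=1):
        """Return (other_keyframe, shared_count) pairs sharing at least
        min_shared landmarks with `keyframe`, strongest first."""
        own = self.observations[keyframe]
        shared = {}
        for other, landmarks in self.observations.items():
            if other == keyframe:
                continue
            n = len(own & landmarks)
            if n >= min_shared:
                shared[other] = n
        return sorted(shared.items(), key=lambda kv: -kv[1])
```

Local mapping can then restrict bundle adjustment to a keyframe's strongest covisible neighbors, and loop closing can skip them when searching for loop candidates.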