Motion Tokenization & Gesture Synthesis

Work developed at Fraunhofer HHI as part of my research on human motion synthesis.

Motion Tokenizer

A discrete motion tokenizer for human body sequences, designed to produce compact, semantically meaningful representations of motion. The tokenizer serves as a reusable backbone for downstream generative tasks.
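
To make the idea of a discrete motion representation concrete, here is a minimal sketch. It assumes nearest-neighbour lookup in a fixed codebook (vector quantization), which is only one common way of discretizing continuous features and is not necessarily how this tokenizer works. The codebook values, the 2D pose features and the function names are all hypothetical.

```python
def quantize(frames, codebook):
    """Map each pose feature vector to the index of its nearest codebook entry."""
    tokens = []
    for frame in frames:
        # Squared Euclidean distance from this frame to every codebook entry.
        distances = [
            sum((a - b) ** 2 for a, b in zip(frame, code)) for code in codebook
        ]
        tokens.append(distances.index(min(distances)))
    return tokens


def dequantize(tokens, codebook):
    """Recover an approximate motion sequence from its tokens."""
    return [codebook[t] for t in tokens]


# Hypothetical toy data: three codebook entries and four 2D pose features.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
frames = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (0.95, 0.05)]
tokens = quantize(frames, codebook)
```

A downstream generative model would then operate on the integer sequence in `tokens` rather than on the raw poses.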


Co-speech Gesture Synthesis (Supervised thesis project)

Built on top of the motion tokenizer above, this work was carried out by a student I supervised. The system synthesizes full-body gestures conditioned on speech audio, leveraging the discrete motion representation as its generative backbone.


Human Motion Synthesis (GANs)

Earlier work on generating human motion from limited training data using a sequential GAN architecture. Results include full walking cycles synthesized entirely by the model. This work was published at CVMP 2024 and received the Runner-Up Best Paper Award.


Augmented and Virtual Reality

Head-coupled Perspective

A fun project where we created a window to a virtual 3D world :-) The algorithm detects the user’s face and estimates its 3D position relative to the camera. With this information, the program simulates a head-coupled perspective. The application is based mainly on this article, and our repository can be found here.
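
Once the head position is known, the "window" effect is usually obtained with an asymmetric (off-axis) view frustum. The sketch below is an illustration under stated assumptions and not the code from our repository: the screen is centred at the origin in the plane z = 0, the eye sits at a positive z distance in front of it, and all lengths share the same unit. The function name and parameters are hypothetical.

```python
def off_axis_frustum(eye, screen_w, screen_h, near):
    """Return (left, right, bottom, top) frustum bounds at the near plane.

    eye      -- (x, y, z) head position relative to the screen centre, z > 0
    screen_w -- physical screen width
    screen_h -- physical screen height
    near     -- distance of the near clipping plane from the eye
    """
    ex, ey, ez = eye
    if ez <= 0:
        raise ValueError("the eye must be in front of the screen (z > 0)")
    # Similar triangles: project the screen edges onto the near plane.
    scale = near / ez
    left = (-screen_w / 2 - ex) * scale
    right = (screen_w / 2 - ex) * scale
    bottom = (-screen_h / 2 - ey) * scale
    top = (screen_h / 2 - ey) * scale
    return left, right, bottom, top


# Head centred in front of the screen: the frustum is symmetric.
centred = off_axis_frustum((0.0, 0.0, 0.6), 0.4, 0.3, 0.1)
# Head moved 10 cm to the right: the frustum skews towards the left.
shifted = off_axis_frustum((0.1, 0.0, 0.6), 0.4, 0.3, 0.1)
```

The four values can be passed to any API that accepts explicit frustum bounds, with the virtual camera translated to the eye position.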


Video Analysis

Mean Shift Tracking with Corrected Background

A simple mean shift tracking implementation that uses background information, based on the paper “Robust mean-shift tracking with corrected background-weighted histogram” by Ning et al. It attempts to reduce the interference of background information in kernel-based tracking. Please refer to the repository for the C++ code.
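
As a rough illustration of the background correction, the sketch below down-weights the histogram bins of the target model that are also prominent in the surrounding background, and leaves the candidate model untouched. This is my reading of the corrected background-weighted histogram idea, written in Python for brevity; it is not the C++ code from the repository, and the function names and toy histograms are hypothetical.

```python
def background_weights(background_hist):
    """Per-bin weights v_u = min(o_min / o_u, 1), with o_min the smallest
    non-zero background bin. Bins absent from the background keep weight 1."""
    nonzero = [b for b in background_hist if b > 0]
    if not nonzero:
        return [1.0] * len(background_hist)
    smallest = min(nonzero)
    return [min(smallest / b, 1.0) if b > 0 else 1.0 for b in background_hist]


def corrected_target_model(target_hist, background_hist):
    """Reweight the target histogram by the background weights and renormalise."""
    weights = background_weights(background_hist)
    weighted = [q * v for q, v in zip(target_hist, weights)]
    total = sum(weighted)
    if total == 0:
        raise ValueError("target model vanished after background weighting")
    return [w / total for w in weighted]


# Hypothetical 3-bin colour histograms (each sums to 1).
target_hist = [0.5, 0.3, 0.2]
background_hist = [0.6, 0.1, 0.3]
model = corrected_target_model(target_hist, background_hist)
```

In this toy case the first bin dominates the background, so its share of the target model drops, while the second bin, which is rare in the background, gains importance.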

Examples of mean shift tracking.

Abandoned Object Detection

Classification of stationary objects into abandoned or stolen, based on the paper Robust unattended and stolen object detection by fusing simple algorithms by San Miguel and Martinez (reference).


Tomography and 3D Imaging

3D Image Reconstruction

Reconstruction of a mouse volume using filtered backprojection. The MATLAB implementation can be found in the repository.
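
For readers unfamiliar with the method, here is a minimal sketch of filtered backprojection for a single 2D slice. It assumes parallel-beam geometry, unit detector spacing and a plain Ram-Lak (ramp) filter applied in the spatial domain; the actual MATLAB implementation may differ in all of these respects. It is written in Python with hypothetical function names, and favours readability over speed.

```python
import math


def ram_lak(n):
    """Spatial-domain ramp (Ram-Lak) filter tap for unit detector spacing."""
    if n == 0:
        return 0.25
    if n % 2 == 0:
        return 0.0
    return -1.0 / (math.pi * n) ** 2


def filter_projection(projection):
    """Convolve one projection with the ramp filter."""
    n = len(projection)
    return [
        sum(projection[j] * ram_lak(i - j) for j in range(n)) for i in range(n)
    ]


def filtered_backprojection(sinogram, angles, size):
    """Reconstruct a size x size slice from parallel-beam projections."""
    n_det = len(sinogram[0])
    det_centre = (n_det - 1) / 2.0
    img_centre = (size - 1) / 2.0
    image = [[0.0] * size for _ in range(size)]
    for projection, theta in zip(sinogram, angles):
        filtered = filter_projection(projection)
        c, s = math.cos(theta), math.sin(theta)
        for row in range(size):
            y = img_centre - row
            for col in range(size):
                x = col - img_centre
                # Detector coordinate hit by the ray through this pixel.
                t = x * c + y * s + det_centre
                i0 = math.floor(t)
                frac = t - i0
                if 0 <= i0 < n_det - 1:
                    # Linear interpolation between neighbouring detector bins.
                    image[row][col] += (1 - frac) * filtered[i0] + frac * filtered[i0 + 1]
    scale = math.pi / len(angles)
    return [[value * scale for value in row] for row in image]


# Toy example: the sinogram of a single point at the centre of the slice.
size = 9
angles = [k * math.pi / 18 for k in range(18)]
point_sinogram = [[1.0 if d == 4 else 0.0 for d in range(size)] for _ in angles]
recon = filtered_backprojection(point_sinogram, angles, size)
```

The reconstruction of the point sinogram peaks at the central pixel, as expected. A volume is obtained by repeating the procedure slice by slice.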


Vision for Multiple/Moving Cameras

3D Reconstruction

Reconstruction of a scene by extracting and matching interest points. The points were detected using the KAZE detector and described with the DSP-SIFT descriptor. The reconstruction process consists of estimating the fundamental matrix, applying projective bundle adjustment and finally obtaining a Euclidean reconstruction.
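
The fundamental matrix F ties matched points together through the epipolar constraint x2^T F x1 = 0. The sketch below only checks this constraint on synthetic data; it does not estimate F from matches, and it is not the project code. It assumes two cameras with identity intrinsics that differ by a pure translation t, in which case F reduces to the skew-symmetric matrix of t. All names and values are hypothetical.

```python
def skew(t):
    """Skew-symmetric matrix [t]_x such that [t]_x v = t x v."""
    tx, ty, tz = t
    return [[0.0, -tz, ty], [tz, 0.0, -tx], [-ty, tx, 0.0]]


def project(point):
    """Pinhole projection with identity intrinsics, in homogeneous form."""
    x, y, z = point
    return (x / z, y / z, 1.0)


def epipolar_residual(F, x1, x2):
    """Evaluate x2^T F x1, which is zero for a perfect correspondence."""
    Fx1 = [sum(F[r][c] * x1[c] for c in range(3)) for r in range(3)]
    return sum(x2[r] * Fx1[r] for r in range(3))


# Second camera: same orientation, translated by t (assumed setup).
t = (1.0, 0.0, 0.0)
F = skew(t)

X = (0.5, 0.2, 4.0)                       # 3D point in the first camera frame
x1 = project(X)                           # its image in the first view
x2 = project((X[0] + t[0], X[1] + t[1], X[2] + t[2]))  # and in the second

good = epipolar_residual(F, x1, x2)                    # true match
bad = epipolar_residual(F, x1, (x2[0], 0.2, 1.0))      # wrong match
```

The residual vanishes for the true correspondence and is clearly non-zero for the mismatched one, which is what robust estimators exploit to reject outlier matches.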

Scene and its point cloud.

Natural Scene Statistics of Fused Long Wave Infrared and Visible Light Images

This is an image processing work on quality assessment, publicly available in IEEE Transactions on Image Processing: “Predicting the Quality of Fused Long Wave Infrared and Visible Light Images”, David-Moreno D.E., Benítez-Restrepo H.D., Bovik A.C. This work proposes fused image quality metrics and presents a subjective human study for their construction and validation. Please refer to the repository for more details about the implementation.

Examples of distortions (additive white Gaussian noise, JPEG compression, non-uniformity, blur) applied to fused LWIR-visible light images.
Scatter plot of features extracted from distorted images, and scatter plot of our metric's predicted scores versus subjective scores.

Complex Analysis

Tinkering with Julia and Mandelbrot sets

Visualisation of the filled-in Julia set and the Mandelbrot set. A simple C# implementation can be found here.
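
Both sets come from iterating z -> z^2 + c and checking whether the orbit stays bounded. The sketch below uses the standard escape-time test; it is written in Python for brevity and is not the C# code linked above. The iteration limit of 100 is an arbitrary choice, and the function names are hypothetical.

```python
def escape_time(z, c, max_iter=100, radius=2.0):
    """Number of iterations of z -> z*z + c before |z| exceeds the radius.

    Returns max_iter if the orbit has not escaped by then.
    """
    for i in range(max_iter):
        if abs(z) > radius:
            return i
        z = z * z + c
    return max_iter


def in_mandelbrot(c, max_iter=100):
    """c belongs to the Mandelbrot set if the orbit of 0 stays bounded."""
    return escape_time(0j, c, max_iter) == max_iter


def in_filled_julia(z, c, max_iter=100):
    """z belongs to the filled-in Julia set of c if its orbit stays bounded."""
    return escape_time(z, c, max_iter) == max_iter


def render_julia(c, width=40, height=20):
    """Coarse ASCII rendering of the filled-in Julia set on [-2, 2] x [-2, 2]."""
    rows = []
    for j in range(height):
        y = 2.0 - 4.0 * j / (height - 1)
        row = ""
        for i in range(width):
            x = -2.0 + 4.0 * i / (width - 1)
            row += "#" if in_filled_julia(complex(x, y), c) else "."
        rows.append(row)
    return "\n".join(rows)
```

Calling `print(render_julia(-1 + 0j))` gives a quick text preview; colouring each pixel by the value returned from `escape_time` produces the usual banded images.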

r.5 r.75 r1.25 r1
Representations of the filled-in Julia set.