Visual comparison of text-to-motion generation results. This figure shows motions generated by the existing methods MotionDiffuse, MDM, MLD, and MotionLCM. Their outputs frequently mismatch the input text and omit motion details. ReAlign enhances these models so that the generated motions are more consistent with the text inputs.
Comparison of text-to-motion generation performance on the HumanML3D dataset. Percentages in subscripts indicate improvements over respective baselines.
Comparison of text-to-motion generation performance on the KIT-ML dataset. Since the MLD and MLD++ models for KIT-ML have not been released, we adopt the widely used MDM as the baseline.
Plug-and-play functionality of ReAlign. Performance gains of motion generation methods when combined with step-aware reward guidance on HumanML3D. MLCM1 and MLCM4 denote the 1-step and 4-step MotionLCM models, respectively. MDiff denotes MotionDiffuse.