2026
Pixel Motion Diffusion is What We Need for Robot Control
CVPR 2026
E-Ro Nguyen*, Yichi Zhang*, Kanchana Ranasinghe, Xiang Li, Michael S. Ryoo
DAWN is a unified diffusion framework for language-conditioned robotic manipulation that uses structured pixel motion as the bridge between high-level intent and low-level action.
2025
Instance-Aware Generalized Referring Expression Segmentation
arXiv 2025
E-Ro Nguyen, Hieu Le, Dimitris Samaras, Michael S. Ryoo
Generalized referring segmentation with explicit instance-level alignment for stronger cross-category robustness.
Improving Contrastive Learning for Referring Expression Counting
arXiv 2025
Kostas Triaridis, Panagiotis Kaliosis, E-Ro Nguyen, Jingyi Xu, Hieu Le, Dimitris Samaras
Improves counting performance under referring-language supervision via stronger contrastive representation learning.
Pixel Motion as Universal Representation for Robot Control
arXiv 2025
Kanchana Ranasinghe, Xiang Li, E-Ro Nguyen, Cristina Mata, Jongwoo Park, Michael S. Ryoo
Explores dense pixel motion as a transferable intermediate space for language-conditioned robot control.
Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding
WACV 2025
Hai Nguyen-Truong*, E-Ro Nguyen*, Tuan-Anh Vu, Minh-Triet Tran, Binh-Son Hua, Sai-Kit Yeung
Introduces vision-aware text features for stronger object/context reasoning in referring image segmentation.
2023
V-FIRST 2.0: Video Event Retrieval with Flexible Textual-Visual Intermediary for VBS 2023
MMM 2023
Nhat Hoang-Xuan, E-Ro Nguyen, Thang-Long Nguyen-Ho, Minh-Khoi Pham, Quang-Thuc Nguyen, Hoang-Phuc Trang-Trung, Van-Tu Ninh, Tu-Khiem Le, Cathal Gurrin, Minh-Triet Tran
Video event retrieval pipeline for the Video Browser Showdown with flexible textual-visual intermediary representations.
2022
Flexible Interactive Retrieval SysTem 3.0 for Visual Lifelog Exploration
ICMR 2022
Nhat Hoang-Xuan, Hoang-Phuc Trang-Trung, E-Ro Nguyen, Thanh-Cong Le, Mai-Khiem Tran, Tu-Khiem Le, Van-Tu Ninh, Cathal Gurrin, Minh-Triet Tran
Interactive lifelog retrieval system for complex multimodal search in the Lifelog Search Challenge track.
Visual-Language Transformer for Referring Video Object Segmentation
CVPRW 2022
E-Ro Nguyen, Nhat Hoang-Xuan, Minh-Triet Tran
VLFormer for referring video object segmentation in the YouTube-VOS Challenge at CVPR Workshops.
2021
PointRend with Attention Fusion Refinement for Polyps Segmentation
MediaEval 2021
E-Ro Nguyen, Hai-Dang Nguyen, Minh-Triet Tran
Attention fusion and refinement strategies for robust polyp segmentation in the MediaEval Medico task.
Attention-based Hierarchical Fusion Network for Predicting Media Memorability
MediaEval 2021
E-Ro Nguyen, Hai-Dang Huynh-Lam, Hai-Dang Nguyen, Minh-Triet Tran
Hierarchical multimodal fusion for media memorability prediction under benchmark constraints.