Publications


10 publications
2026

Pixel Motion Diffusion is What We Need for Robot Control

CVPR 2026

E-Ro Nguyen*, Yichi Zhang*, Kanchana Ranasinghe, Xiang Li, Michael S. Ryoo

DAWN is a unified diffusion framework for language-conditioned robotic manipulation that uses structured pixel motion to bridge intent and action.

2025

Instance-Aware Generalized Referring Expression Segmentation

arXiv 2025

E-Ro Nguyen, Hieu Le, Dimitris Samaras, Michael S. Ryoo

Generalized referring segmentation with explicit instance-level alignment for stronger cross-category robustness.

Improving Contrastive Learning for Referring Expression Counting

arXiv 2025

Kostas Triaridis, Panagiotis Kaliosis, E-Ro Nguyen, Jingyi Xu, Hieu Le, Dimitris Samaras

Improves counting performance under referring-language supervision via stronger contrastive representation learning.

Pixel Motion as Universal Representation for Robot Control

arXiv 2025

Kanchana Ranasinghe, Xiang Li, E-Ro Nguyen, Cristina Mata, Jongwoo Park, Michael S. Ryoo

Explores dense pixel motion as a transferable intermediate space for language-conditioned robot control.

Vision-Aware Text Features in Referring Image Segmentation: From Object Understanding to Context Understanding

WACV 2025

Hai Nguyen-Truong*, E-Ro Nguyen*, Tuan-Anh Vu, Minh-Triet Tran, Binh-Son Hua, Sai-Kit Yeung

Introduces vision-aware text features for stronger object/context reasoning in referring image segmentation.

2023

V-FIRST 2.0: Video Event Retrieval with Flexible Textual-Visual Intermediary for VBS 2023

MMM 2023

Nhat Hoang-Xuan, E-Ro Nguyen, Thang-Long Nguyen-Ho, Minh-Khoi Pham, Quang-Thuc Nguyen, Hoang-Phuc Trang-Trung, Van-Tu Ninh, Tu-Khiem Le, Cathal Gurrin, Minh-Triet Tran

Video event retrieval pipeline for the Video Browser Showdown (VBS) with flexible textual-visual intermediary representations.

2022

Flexible Interactive Retrieval SysTem 3.0 for Visual Lifelog Exploration

ICMR 2022

Nhat Hoang-Xuan, Hoang-Phuc Trang-Trung, E-Ro Nguyen, Thanh-Cong Le, Mai-Khiem Tran, Tu-Khiem Le, Van-Tu Ninh, Cathal Gurrin, Minh-Triet Tran

Interactive lifelog retrieval system for complex multimodal search in the Lifelog Search Challenge track.

Visual-Language Transformer for Referring Video Object Segmentation

CVPRW 2022

E-Ro Nguyen, Nhat Hoang-Xuan, Minh-Triet Tran

VLFormer for referring video object segmentation in the YouTube-VOS Challenge at CVPR Workshops.

2021

PointRend with Attention Fusion Refinement for Polyps Segmentation

MediaEval 2021

E-Ro Nguyen, Hai-Dang Nguyen, Minh-Triet Tran

Attention fusion and refinement strategies for robust polyp segmentation in the MediaEval Medico task.

Attention-based Hierarchical Fusion Network for Predicting Media Memorability

MediaEval 2021

E-Ro Nguyen, Hai-Dang Huynh-Lam, Hai-Dang Nguyen, Minh-Triet Tran

Hierarchical multimodal fusion for media memorability prediction under benchmark constraints.
