Yuta Oshima
I’m a PhD student at The University of Tokyo, mentored by Professor Yutaka Matsuo.
My research goal is to develop interactive vision generation systems that translate human imagination into reality, enabling anyone to create and shape visual worlds with intuitive, flexible, and precise control. Toward this goal, I currently focus on improving the controllability of vision foundation models, particularly diffusion models, through alignment and instruction following for fine-grained visual generation.
selected publications
- CVPR 2026 Main
MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation
In the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026