publications

Conference

  1. CVPR 2026 Main
    MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation
    Yuta Oshima, Daiki Miyake, Kohsei Matsutani, Yusuke Iwasawa, Masahiro Suzuki, Yutaka Matsuo, and Hiroki Furuta
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2026
  2. NeurIPS 2025
    Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search
    Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo, and Hiroki Furuta
    In Advances in Neural Information Processing Systems (NeurIPS), 2025
  3. NeurIPS 2024
    ADOPT: Modified Adam Can Converge with Any $\beta_2$ with the Optimal Rate
    Shohei Taniguchi, Keno Harada, Gouki Minegishi, Yuta Oshima, Seong Cheol Jeong, Go Nagahara, Tomoshi Iiyama, Masahiro Suzuki, Yusuke Iwasawa, and Yutaka Matsuo
    In Advances in Neural Information Processing Systems (NeurIPS), 2024

Conference Workshop

  1. MMAsia 2025 WS
    AKITalk: Audio-Implicit Keypoints for Identity-Preserving Talking-Head Video Synthesis
    Riku Takahashi, Rongzhi Li, Yuta Oshima, Sho Kuno, Ryugo Morita, and Issey Sukeda
    In Proceedings of the 7th ACM International Conference on Multimedia in Asia, 2025
  2. ICLR 2024 WS
    SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces
    Yuta Oshima, Shohei Taniguchi, Masahiro Suzuki, and Yutaka Matsuo
    In 5th Workshop on Practical ML for Limited/Low Resource Settings (ICLR Workshop), 2024
  3. IROS 2023 WS
    Tactile In-Hand Pose Estimation through Perceptual Inference
    Tatsuya Kamijo, Tomoshi Iiyama, Yuta Oshima, Gentiane Venture, Tatsuya Matsushima, Yutaka Matsuo, and Yusuke Iwasawa
    In IROS 2023 Workshop on World Models and Predictive Coding in Cognitive Robotics, 2023

Preprint

  1. Preprint
    WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling
    Yuta Oshima, Yusuke Iwasawa, Masahiro Suzuki, Yutaka Matsuo, and Hiroki Furuta
    2025
  2. Preprint
    Enhancing Unimodal Latent Representations in Multimodal VAEs through Iterative Amortized Inference
    Yuta Oshima, Masahiro Suzuki, and Yutaka Matsuo
    2024