2025

Preprint · MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation
Yuta Oshima, Daiki Miyake, Kohsei Matsutani, Yusuke Iwasawa, Masahiro Suzuki, Yutaka Matsuo, and Hiroki Furuta
2025. arXiv | Code | Website

Preprint · WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling
Yuta Oshima, Yusuke Iwasawa, Masahiro Suzuki, Yutaka Matsuo, and Hiroki Furuta
2025. arXiv

NeurIPS · Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search
Yuta Oshima, Masahiro Suzuki, Yutaka Matsuo, and Hiroki Furuta
In Advances in Neural Information Processing Systems (NeurIPS), 2025. arXiv | Code | Website

2024

NeurIPS · ADOPT: Modified Adam Can Converge with Any β₂ with the Optimal Rate
Shohei Taniguchi, Keno Harada, Gouki Minegishi, Yuta Oshima, Seong Cheol Jeong, Go Nagahara, Tomoshi Iiyama, Masahiro Suzuki, Yusuke Iwasawa, and Yutaka Matsuo
In Advances in Neural Information Processing Systems (NeurIPS), 2024. arXiv | Code

ICLR WS · SSM Meets Video Diffusion Models: Efficient Video Generation with Structured State Spaces
Yuta Oshima, Shohei Taniguchi, Masahiro Suzuki, and Yutaka Matsuo
In 5th Workshop on Practical ML for Limited/Low Resource Settings (ICLR Workshop), 2024. arXiv | Code

Preprint · Enhancing Unimodal Latent Representations in Multimodal VAEs through Iterative Amortized Inference
Yuta Oshima, Masahiro Suzuki, and Yutaka Matsuo
2024. arXiv