Selected Publications

(2026). Thinking in Frames: How Visual Context and Test-Time Scaling Empower Video Reasoning. arXiv preprint arXiv:2601.21037.

(2025). Agentic Policy Optimization via Instruction-Policy Co-Evolution. arXiv preprint arXiv:2512.01945.

(2025). Maestro: Self-Improving Text-to-Image Generation via Agent Orchestration. arXiv preprint arXiv:2509.10704.

(2025). Visual Planning: Let's Think Only with Images. International Conference on Learning Representations (ICLR 2026 Oral).

(2025). Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies. International Conference on Learning Representations (ICLR).

(2025). From Few to Many: Self-Improving Many-Shot Reasoners Through Iterative Optimization and Generation. International Conference on Learning Representations (ICLR).

(2024). Fairer Preferences Elicit Improved Human-Aligned Large Language Model Judgments. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP).

(2024). TopViewRS: Vision-Language Models as Top-View Spatial Reasoners. The 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP).

(2024). Aligning with Human Judgement: The Role of Pairwise Preference in Large Language Model Evaluators. The First Conference on Language Modeling (COLM).

(2024). Batch Calibration: Rethinking Calibration for In-Context Learning and Prompt Engineering. International Conference on Learning Representations (ICLR).

(2024). AutoPEFT: Automatic Configuration Search for Parameter-Efficient Fine-Tuning. Transactions of the Association for Computational Linguistics (TACL).

(2023). Large Language Models are Miscalibrated In-Context Learners. Findings of the Association for Computational Linguistics (ACL 2025).

(2023). Survival of the Most Influential Prompts: Efficient Black-Box Prompt Search via Clustering and Pruning. Findings of the Association for Computational Linguistics (EMNLP).

(2023). A Systematic Study of Performance Disparities in Multilingual Task-Oriented Dialogue Systems. The 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP).

(2023). Multi3WOZ: A Multilingual, Multi-Domain, Multi-Parallel Dataset for Training and Evaluating Culturally Adapted Task-Oriented Dialog Systems. Transactions of the Association for Computational Linguistics (TACL).

(2023). XQA-DST: Multi-Domain and Multi-Lingual Dialogue State Tracking. Findings of the Association for Computational Linguistics (EACL).
