Han Zhou
Reinforcement Learning
Reasoning Arena: Trace Tournaments When Verifiable Rewards Fall Short
Trace Tournaments When Verifiable Rewards Fall Short.
Han Zhou, Adam Yang, Laurence Aitchison, Anna Korhonen, Albert Jiang
Agentic Policy Optimization via Instruction-Policy Co-Evolution
Dynamic instruction optimization in reinforcement learning for agents.
Han Zhou, Xingchen Wan, Ivan Vulić, Anna Korhonen