Abstract: The LLM-as-a-Judge paradigm, in which large language models serve as evaluators, has shown promise across multiple tasks but has not been fully explored in tool-invocation scenarios, especially for ...