Several Universities Jointly Release AgentBench, a Benchmark for Testing Large Language Model Capabilities
Published: 2023-09-01 15:19
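PingWest, August 9: According to an arXiv page, a team of researchers from Tsinghua University, The Ohio State University, the University of California, Berkeley, and other institutions has recently released AgentBench, a testing tool for evaluating the capabilities of large language models.
AgentBench currently comprises 8 distinct tasks that test the reasoning and decision-making abilities of large language models in multi-turn, open-ended generation settings. Experimental results show that GPT-4 currently performs best, with Claude and GPT-3.5 ranking second and third, respectively.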

The AgentBench datasets, environments, and integrated evaluation package have been released at https://github.com/THUDM/AgentBench.