✨CL-bench: A Benchmark for Context Learning
📝 Summary:
Current LMs struggle with context learning, which requires new knowledge and reasoning beyond what pre-training provides. CL-bench, a new real-world benchmark, shows that models solve only 17.2 percent of tasks, exposing a critical bottleneck for complex real-world applications.
🔹 Publication Date: Feb 3
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.03587
• PDF: https://arxiv.org/pdf/2602.03587
• Project Page: https://www.clbench.com
• Github: https://github.com/Tencent-Hunyuan/CL-bench
✨ Datasets citing this paper:
• https://huggingface.co/datasets/tencent/CL-bench
==================================
For more data science resources:
✓ https://xn--r1a.website/DataScienceT
#ContextLearning #LanguageModels #AIBenchmark #NLP #AIResearch
✨CL4SE: A Context Learning Benchmark For Software Engineering Tasks
📝 Summary:
CL4SE is a benchmark for evaluating context learning in software engineering tasks, and it defines four SE-specific context types. It reports an average 24.7% performance improvement for LLMs across tasks such as code generation and code review, and establishes a standardized evaluation framework.
🔹 Publication Date: Feb 26
🔹 Paper Links:
• arXiv Page: https://arxiv.org/abs/2602.23047
• PDF: https://arxiv.org/pdf/2602.23047
• Project Page: https://huggingface.co/papers?q=project-specific%20context
• Github: https://github.com/Tomsawyerhu/CodeCL
✨ Datasets citing this paper:
• https://huggingface.co/datasets/tomhu/codecl
#ContextLearning #SoftwareEngineering #LLMs #CodeGeneration #Benchmarks