stalkermustang/llm-bulls-and-cows-benchmark
A mini-framework for evaluating LLM performance on the Bulls and Cows number guessing game, supporting multiple LLM providers.
Language: HTML
#benchmark #benchmarking #chatgpt #games #llm #openai #python #reasoning
Stars: 205 Issues: 0 Forks: 1
https://github.com/stalkermustang/llm-bulls-and-cows-benchmark
Kodezi/Chronos
Kodezi Chronos: a debugging-first language model achieving 65.3% autonomous bug fixing (6-7x better than GPT-4). Research, benchmarks, and evaluation framework. Model available Q1 2026 via Kodezi OS.
Language: Python
#artificial_intelligence #autonomous_debugging #benchmark #benchmark_report #bug_fixing #chronos #code #code_analysis #code_analysis_tool #code_debugger #code_understanding #debugging #developer_tools #kodezi #language_model #machine_learning #program_repair #software_engineering
Stars: 240 Issues: 0 Forks: 5
https://github.com/Kodezi/Chronos