
energybench

Benchmark LLMs for energy consumption and task performance.

Requirements

  • Ollama. The Ollama server must be running in the background, and the models you want to benchmark must already be downloaded (see the example commands below).
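
For example, assuming a standard Ollama setup (the model name below is only illustrative):

ollama serve
ollama pull llama3.2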

Installation

pip install -e .

Usage

python3 src/energybench.py <model> <benchmark> [options]

Arguments

  • model: LLM to use (e.g., llama3.2)
  • benchmark: Benchmark to run (gsm8k, mmlu, arc_easy, arc_challenge, boolq)

Options

  • -t, --temperature: Sampling temperature (default: 0.0)
  • -s, --samples: Number of samples to run (default: all)
  • -i, --iterations: Iterations per prompt (default: 1)
  • -e, --evaluate: Enable evaluation of responses
  • --evaluator: Evaluator to use: numeric, exact_match, multiple_choice, or boolean (default: none)
  • -p, --prompt-template: Prompt template name (see prompt_templates.py)
  • -sp, --system-prompt: System prompt name (see prompt_templates.py)
  • --no-energy: Disable energy measurement
  • --offline: Use locally downloaded datasets (first download them by running python3 src/download_datasets.py)
  • --subset: Run on a subset of the dataset
  • --subset-size: Size of the subset (default: 200)

Example

Run llama3.2 on 100 ARC-Easy samples, with 5 iterations per prompt and evaluation enabled:

python3 src/energybench.py llama3.2 arc_easy -e -s 100 -i 5
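
The flags can be combined; for example (model name and values are illustrative, using only the options listed above), to run MMLU offline on a 200-sample subset with the multiple_choice evaluator, first download the datasets and then run the benchmark:

python3 src/download_datasets.py
python3 src/energybench.py llama3.2 mmlu -e --evaluator multiple_choice --offline --subset --subset-size 200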