
energybench

Benchmark LLMs for energy consumption and task performance.

Requirements

  • Ollama. The Ollama server must be running in the background, and the models you want to benchmark must already be downloaded (see the example commands below).
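
For example, assuming a standard Ollama setup (the model name below is only illustrative):

ollama serve
ollama pull llama3.2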

Installation

pip install -e .

Usage

python3 src/energybench.py <model> <benchmark> [options]

Arguments

  • model: LLM to use (e.g., llama3.2)
  • benchmark: Benchmark to run (gsm8k, mmlu, arc_easy, arc_challenge, boolq)

Options

  • -t, --temperature: Sampling temperature (default: 0.0)
  • -s, --samples: Number of samples to run (default: all)
  • -i, --iterations: Iterations per prompt (default: 1)
  • -e, --evaluate: Enable evaluation of responses
  • --evaluator: Evaluator to use: numeric, exact_match, multiple_choice, or boolean (default: none)
  • -p, --prompt-template: Prompt template name (see prompt_templates.py)
  • -sp, --system-prompt: System prompt name (see prompt_templates.py)
  • --no-energy: Disable energy measurement
  • --offline: Use locally downloaded datasets (first download them by running python3 src/download_datasets.py)
  • --subset: Run on a subset of the dataset
  • --subset-size: Size of the subset (default: 200)

Example

Run llama3.2 on 100 ARC-Easy samples, with 5 iterations per prompt and evaluation enabled:

python3 src/energybench.py llama3.2 arc_easy -e -s 100 -i 5
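
The flags can be combined; for example (model name and values are illustrative, using only the options listed above), to run MMLU offline on a 200-sample subset with the multiple_choice evaluator, first download the datasets and then run the benchmark:

python3 src/download_datasets.py
python3 src/energybench.py llama3.2 mmlu -e --evaluator multiple_choice --offline --subset --subset-size 200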