Core Concepts

Keys & Caches (kandc) is an experiment-tracking and profiling library. With the help of AI, it surfaces bottlenecks, errors, and optimization opportunities that are otherwise hidden or hard to find anywhere in your ML stack, from PyTorch down to the GPU.

Projects and Runs

Project

A collection of related experiments (e.g., “image-classification”, “llm-finetuning”)

Run

A single execution of your code with specific configuration and results

Config

Hyperparameters and settings tracked for each run

Metrics

Values logged during execution (loss, accuracy, custom metrics)

Traces

Model execution profiles showing layer-by-layer performance

Timings

Function execution times captured by timing decorators

Artifacts

Generated files like model traces, timing data, and code snapshots
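
To make the relationships between these terms concrete, here is a minimal sketch of how a project, its runs, and each run's config, metrics, and artifacts could be modeled. The `Project` and `Run` classes are hypothetical and exist only for illustration; they are not kandc's internal data model.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Run:
    """One execution of your code: a config plus what was recorded (illustrative only)."""
    name: str
    config: dict[str, Any] = field(default_factory=dict)
    metrics: list[dict[str, float]] = field(default_factory=list)
    artifacts: list[str] = field(default_factory=list)

    def log(self, values: dict[str, float]) -> None:
        # Each call appends one metrics record to the run's history.
        self.metrics.append(dict(values))


@dataclass
class Project:
    """A named collection of related runs (illustrative only)."""
    name: str
    runs: list[Run] = field(default_factory=list)

    def new_run(self, name: str, config: dict[str, Any]) -> Run:
        run = Run(name=name, config=dict(config))
        self.runs.append(run)
        return run


project = Project("image-classification")
run = project.new_run("baseline", {"lr": 1e-3, "batch_size": 32})
run.log({"loss": 0.25, "accuracy": 0.92})
run.artifacts.append("trace.json")
```

The point of the sketch is the containment: a project holds many runs, and each run carries its own config, metric history, and artifact list.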

Initialization Modes

kandc supports three modes (online, offline, and disabled), depending on your needs. Online is the default:

Full Cloud Experience

kandc.init(project="my-project")  # mode="online" is default
What happens:
  • 🔐 Authentication: Browser opens for sign-in (first time only)
  • 🌐 Dashboard: Automatically opens your run dashboard
  • ☁️ Cloud sync: All data synced in real-time
  • 📊 Live metrics: Charts update as your code runs
Requirements: Internet connection and authentication
Best for: Production experiments, team collaboration, sharing results

Mode Comparison

Feature             Online     Offline    Disabled
Metrics Logging
Dashboard
Cloud Sync
Authentication      Required   None       None
Internet Required   Yes        No         No
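
If you switch modes between environments (for example, online on a workstation and offline on an air-gapped cluster), one option is to pick the mode string from an environment variable. The sketch below is an assumption-heavy illustration: only `mode="online"` appears in this page, the strings `"offline"` and `"disabled"` are inferred from the column names above, and the `EXPERIMENT_MODE` variable and `choose_mode` helper are hypothetical.

```python
import os

# Mode names inferred from the comparison above; only "online" is confirmed here.
VALID_MODES = ("online", "offline", "disabled")


def choose_mode(env=None):
    """Return the mode named by EXPERIMENT_MODE (hypothetical), defaulting to online."""
    env = os.environ if env is None else env
    mode = env.get("EXPERIMENT_MODE", "online").strip().lower()
    if mode not in VALID_MODES:
        raise ValueError(f"unknown mode {mode!r}; expected one of {VALID_MODES}")
    return mode


default_mode = choose_mode({})
cluster_mode = choose_mode({"EXPERIMENT_MODE": " Offline "})
```

The result would then be passed along as `kandc.init(project="my-project", mode=choose_mode())`, assuming the `mode` argument accepts these strings.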

Basic Usage

Here’s a complete transformer example showing the core kandc workflow:
import time
import random
import torch
import torch.nn as nn
import kandc

@kandc.capture_model_class(model_name="SimpleTransformer")
class SimpleTransformer(nn.Module):
    def __init__(self, input_dim=32, seq_len=16, d_model=64, nhead=4, num_layers=2, num_classes=10):
        super().__init__()
        self.input_dim = input_dim
        self.seq_len = seq_len
        self.d_model = d_model

        # Project input to d_model
        self.input_proj = nn.Linear(input_dim, d_model)

        # Positional encoding (learnable)
        self.pos_embedding = nn.Parameter(torch.zeros(1, seq_len, d_model))

        # Transformer encoder
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

        # Output head
        self.head = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, num_classes)
        )

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        x = self.input_proj(x)  # (batch, seq_len, d_model)
        x = x + self.pos_embedding  # Add positional encoding
        x = self.transformer(x)  # (batch, seq_len, d_model)
        x = x.mean(dim=1)  # Pool over sequence
        x = self.head(x)   # (batch, num_classes)
        return x

def main():
    # Initialize experiment tracking
    kandc.init(
        project="optimize-transformer",
        name="test-run-1",
        config={"d_model": 64, "nhead": 4, "num_layers": 2, "seq_len": 16},
        tags=["transformer", "pytorch"]
    )

    # Create model and data
    model = SimpleTransformer()
    # Simulate a batch of 32 sequences, each of length 16, with 32 features
    data = torch.randn(32, 16, 32)

    # Run model (automatically profiled due to decorator)
    output = model(data)
    loss = output.mean()

    @kandc.timed(name="random_wait")
    def random_wait():
        time.sleep(random.random() * 2)
        return "processing_complete"

    processing_result = random_wait()

    # Log metrics with custom x values
    for i in range(10):
        time.sleep(0.1)  # Simulate training time
        x_value = i * 0.5
        kandc.log({
            "loss": loss.item(),
            "accuracy": random.random(),
            "model_params": sum(p.numel() for p in model.parameters())
        }, x=x_value)

    # Finish the run
    kandc.finish()

if __name__ == "__main__":
    main()

PyTorch Profiling & Performance Analysis

kandc integrates with PyTorch’s built-in profiler to capture detailed GPU/CPU performance metrics, memory usage, CUDA kernel execution, and more. All profiling data is automatically exported as Chrome traces viewable in Perfetto UI.

Model Class Profiling

The most convenient way to profile models is using the class decorator:
import torch
import torch.nn as nn
import kandc

@kandc.capture_model_class(
    model_name="SimpleTransformer",
    record_shapes=True,      # Record tensor shapes
    profile_memory=True      # Profile memory usage
)
class SimpleTransformer(nn.Module):
    def __init__(self, input_dim=32, seq_len=16, d_model=64, nhead=4, num_layers=2, num_classes=10):
        super().__init__()
        self.input_dim = input_dim
        self.seq_len = seq_len
        self.d_model = d_model

        # Project input to d_model
        self.input_proj = nn.Linear(input_dim, d_model)

        # Positional encoding (learnable)
        self.pos_embedding = nn.Parameter(torch.zeros(1, seq_len, d_model))

        # Transformer encoder
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)

        # Output head
        self.head = nn.Sequential(
            nn.LayerNorm(d_model),
            nn.Linear(d_model, num_classes)
        )

    def forward(self, x):
        # x: (batch, seq_len, input_dim)
        x = self.input_proj(x)  # (batch, seq_len, d_model)
        x = x + self.pos_embedding  # Add positional encoding
        x = self.transformer(x)  # (batch, seq_len, d_model)
        x = x.mean(dim=1)  # Pool over sequence
        x = self.head(x)   # (batch, num_classes)
        return x

# Usage
kandc.init(project="model-profiling")
model = SimpleTransformer()
# Simulate a batch of 32 sequences, each of length 16, with 32 features
data = torch.randn(32, 16, 32)
output = model(data)  # Automatically profiled!
kandc.finish()

Model Instance Profiling

For existing models, wrap them with capture_model_instance:
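import torch
import torchvision
import kandc

# Existing model
model = torchvision.models.resnet18()

# Wrap for profiling
model = kandc.capture_model_instance(
    model,
    model_name="ResNet18_Pretrained",
    record_shapes=True,
    profile_memory=True
)

# Example input: a batch of 8 RGB images at 224x224
data = torch.randn(8, 3, 224, 224)

# Now all forward passes are profiled
output = model(data)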

Advanced Profiling Options

For more control over profiling, use the wrapper and decorator classes directly:
import torch
import kandc
from kandc.annotators import ProfilerWrapper, ProfilerDecorator

# Example input used by the calls below
data = torch.randn(4, 8)

# Wrap any object with detailed profiling
class MyModel:
    def forward(self, x):
        return x * 2
    
    def predict(self, x):
        return self.forward(x)

model = MyModel()
profiled_model = ProfilerWrapper(
    model, 
    name="MyModel",
    activities=['cpu', 'cuda'],  # Profile both CPU and CUDA
    record_shapes=True,          # Record tensor shapes
    profile_memory=True,         # Profile memory usage
    with_stack=True             # Include call stacks
)

# All method calls are now profiled with PyTorch profiler
result = profiled_model.forward(data)
result = profiled_model.predict(data)

# Or use as a decorator
@ProfilerDecorator(name="OptimizedModel", record_shapes=True)
class OptimizedModel:
    def predict(self, x):
        return x * 2

# Convenience functions (my_object and expensive_computation are placeholders for your own code)
profiled_obj = kandc.profile(my_object, name="MyObject")

@kandc.profiler(name="MyFunction")
def my_function(x):
    return expensive_computation(x)

Environment Control

Disable profiling globally without changing your code:
# Disable profiling
export KANDC_PROFILER_DISABLED=1
python my_script.py

# Enable profiling (default)
unset KANDC_PROFILER_DISABLED
python my_script.py
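
The same flag can be honored by your own wrappers. The sketch below shows one way an environment check can gate a decorator so that a disabled run executes the original function untouched. It illustrates the pattern and is not kandc's implementation: `profiling_disabled`, `maybe_profile`, and `counting_profiler` are hypothetical, and the set of values treated as "disabled" is an assumption (this page only shows `1`).

```python
import functools
import os


def profiling_disabled(env=None):
    """True when KANDC_PROFILER_DISABLED is set to a truthy value (assumed set of values)."""
    env = os.environ if env is None else env
    return env.get("KANDC_PROFILER_DISABLED", "").strip().lower() in {"1", "true", "yes"}


def maybe_profile(profile_fn, env=None):
    """Apply profile_fn as a decorator unless profiling is disabled."""
    def decorator(func):
        if profiling_disabled(env):
            return func  # leave the function untouched
        return profile_fn(func)
    return decorator


calls = []


def counting_profiler(func):
    """Stand-in for a real profiling decorator: records each call by name."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        calls.append(func.__name__)
        return func(*args, **kwargs)
    return wrapper


def double(x):
    return x * 2


plain = maybe_profile(counting_profiler, env={"KANDC_PROFILER_DISABLED": "1"})(double)
wrapped = maybe_profile(counting_profiler, env={})(double)
```

Because the check happens at decoration time, a disabled run pays no per-call overhead: `plain` is the original `double` function.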

Function-Level Profiling

Profile any function with the capture_trace decorator:
@kandc.capture_trace(
    trace_name="data_preprocessing",
    record_shapes=True
)
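def preprocess_batch(images, labels):
    # Your preprocessing code; `transforms` is a callable you define elsewhere
    processed_images = transforms(images)
    return processed_images, labels

# Usage: the function returns an (images, labels) tuple
processed_images, labels = preprocess_batch(raw_images, labels)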

Timing Functions

Capture execution times for any function:
@kandc.timed(name="model_inference")
def run_inference(model, batch):
    with torch.no_grad():
        return model(batch)

# Or time existing functions
result = kandc.timed_call("data_loading", load_batch, batch_size=32)
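
For intuition about what a timing decorator records, here is a minimal standard-library sketch of the same idea. It is not kandc's implementation: `simple_timed`, `simple_timed_call`, and the in-memory `timings` dictionary are hypothetical stand-ins for the library's decorator, helper, and timing artifacts.

```python
import functools
import time

timings = {}  # name -> list of elapsed seconds, one entry per call


def simple_timed(name):
    """Decorator that records the wall-clock duration of every call under `name`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                # Recorded even when func raises, so failed calls are timed too.
                timings.setdefault(name, []).append(time.perf_counter() - start)
        return wrapper
    return decorator


def simple_timed_call(name, func, *args, **kwargs):
    """Time a single call to an existing function without decorating it."""
    return simple_timed(name)(func)(*args, **kwargs)


@simple_timed("short_sleep")
def short_sleep():
    time.sleep(0.01)
    return "done"


result = short_sleep()
total = simple_timed_call("sum", sum, [1, 2, 3])
```

The decorator form suits functions you own; the call form suits library functions you cannot annotate.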

Viewing Performance Data

All profiling data is automatically saved as trace artifacts that you can view in multiple ways:
In your Keys & Caches dashboard:
  1. Navigate to your run
  2. Click the Artifacts tab
  3. Select any trace artifact
  4. Click “Open in Viewer” to view in embedded Perfetto UI
What you’ll see in traces:
  • Layer-by-layer execution times
  • GPU kernel execution details
  • Memory allocation patterns
  • Tensor shapes and operations
  • Call stacks and function relationships
  • CPU vs GPU time breakdown
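
If you download a trace artifact, you can also summarize it offline. The sketch below assumes the artifact is a Chrome trace JSON file (the format PyTorch's profiler exports), in which complete events have `"ph": "X"` and a `"dur"` field in microseconds. The `load_trace` and `summarize_trace` helpers and the sample events are hypothetical.

```python
import json
from collections import defaultdict


def load_trace(path):
    """Read a Chrome trace JSON file from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize_trace(trace, top=5):
    """Return the `top` event names by total duration in microseconds."""
    # A Chrome trace is either an object with a "traceEvents" list or a bare list.
    events = trace["traceEvents"] if isinstance(trace, dict) else trace
    totals = defaultdict(float)
    for event in events:
        # Only complete events ("X") carry a duration; metadata events are skipped.
        if event.get("ph") == "X" and "dur" in event:
            totals[event["name"]] += event["dur"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:top]


sample = {
    "traceEvents": [
        {"name": "aten::linear", "ph": "X", "ts": 0, "dur": 120},
        {"name": "aten::softmax", "ph": "X", "ts": 130, "dur": 40},
        {"name": "aten::linear", "ph": "X", "ts": 200, "dur": 80},
        {"name": "process_name", "ph": "M"},
    ]
}
summary = summarize_trace(sample)
```

For a real file, `summarize_trace(load_trace("trace.json"))` would give a quick ranking before opening the full timeline in Perfetto; the file name here is a placeholder.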

Logging Metrics

Track any metrics during your experiments:

Basic Logging

kandc.init(project="training")

# Log single values
kandc.log({"loss": 0.25})
kandc.log({"accuracy": 0.92, "f1_score": 0.89})

# Log with step numbers (useful for training loops)
for epoch in range(100):
    loss = train_epoch()
    kandc.log({"epoch_loss": loss}, step=epoch)
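
In a real training loop you usually have per-batch values and want one point per epoch. A minimal sketch of that aggregation is below; the `epoch_summary` helper and the sample numbers are hypothetical, and the resulting dictionary is the kind of payload passed to `kandc.log(..., step=epoch)` above.

```python
def epoch_summary(batch_losses, batch_correct, batch_sizes):
    """Combine per-batch results into one metrics dict for the epoch."""
    total = sum(batch_sizes)
    # Weight each batch loss by its size so a short final batch does not skew the mean.
    loss = sum(l * n for l, n in zip(batch_losses, batch_sizes)) / total
    accuracy = sum(batch_correct) / total
    return {"epoch_loss": loss, "accuracy": accuracy}


# Three batches: mean loss per batch, number of correct predictions, batch size.
metrics = epoch_summary(
    batch_losses=[0.9, 0.7, 0.5],
    batch_correct=[20, 24, 12],
    batch_sizes=[32, 32, 16],
)
```

Logging one aggregated dictionary per epoch keeps the charts readable and makes runs with different batch sizes comparable.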

Code Snapshot Configuration

kandc automatically captures your source code for reproducibility. You can control this behavior:

Disable Code Capture

# Disable code snapshot completely
kandc.init(
    project="my-project",
    capture_code=False  # No code will be captured or uploaded
)

Custom Exclude Patterns

# Exclude specific files/directories from code capture
kandc.init(
    project="my-project",
    capture_code=True,
    code_exclude_patterns=[
        "*.pth",           # Model files
        "data/",           # Data directory
        "experiments/",    # Experiment outputs
        "*.log",           # Log files
        "temp_*"           # Temporary files
    ]
)
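
To check a pattern list before a run, you can test paths against it locally. The sketch below assumes gitignore-like semantics (a trailing `/` matches a directory at any depth, other patterns are shell globs matched against each path component). This page does not specify kandc's actual matching rules, so treat `is_excluded` as a hypothetical approximation.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

PATTERNS = ["*.pth", "data/", "experiments/", "*.log", "temp_*"]


def is_excluded(path, patterns=PATTERNS):
    """Return True if `path` matches any exclude pattern (assumed gitignore-like rules)."""
    parts = PurePosixPath(path).parts
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: compare against directory components only.
            if pattern.rstrip("/") in parts[:-1]:
                return True
        elif any(fnmatch(part, pattern) for part in parts):
            # Glob pattern: any component (directory or file name) may match.
            return True
    return False


checks = {
    path: is_excluded(path)
    for path in [
        "checkpoints/model.pth",
        "data/train.csv",
        "src/train.py",
        "temp_scratch.py",
        "src/database.py",
    ]
}
```

Running a check like this on your repository's file list is a cheap way to confirm that large or sensitive files are covered before anything is uploaded.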

What Gets Captured by Default

When capture_code=True (default), kandc captures:
Source code files:
  • .py, .js, .ts, .jsx, .tsx
  • .java, .cpp, .c, .h, .hpp
  • .cs, .go, .rs, .rb, .php
  • .swift, .kt, .scala, .r
  • .sql, .sh, .bash, .zsh
  • .yaml, .yml, .json, .toml
  • .md, .rst, .txt
  • .html, .css, .scss
Configuration files:
  • requirements.txt, pyproject.toml
  • package.json, Dockerfile
  • .gitignore, .env.example

File Handling

kandc respects your .gitignore file when uploading code snapshots and traces. Add large files to .gitignore to avoid uploading them.
# Large model files
*.pth
*.safetensors
*.bin

# Data files
data/
datasets/
*.csv

# Environment
.env
venv/
Best practice: Download large models in your script rather than uploading them:
# Good: Download at runtime
from transformers import AutoModel
model = AutoModel.from_pretrained("bert-large-uncased")

# Avoid: Uploading large local files
# model = torch.load("my_5gb_model.pth")  # This would be uploaded

Error Handling

kandc is designed to fail gracefully:
try:
    kandc.init(project="my-project")
    # Your code here
    kandc.log({"metric": value})
finally:
    kandc.finish()  # Always finish, even if there's an error
If authentication fails or the backend is unavailable, kandc automatically falls back to offline mode and continues working locally.
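
The try/finally pattern above can be wrapped in a context manager so that every script finishes its run the same way. This is a sketch, not part of kandc: `tracked_run` is a hypothetical helper, and `RecordingTracker` is a stand-in for the `kandc` module so the example runs without the library installed.

```python
from contextlib import contextmanager


@contextmanager
def tracked_run(tracker, **init_kwargs):
    """Call tracker.init(...) on entry and tracker.finish() on exit, even on error."""
    try:
        tracker.init(**init_kwargs)
        yield tracker
    finally:
        tracker.finish()


class RecordingTracker:
    """Stand-in exposing the init/log/finish calls used in this page."""

    def __init__(self):
        self.events = []

    def init(self, **kwargs):
        self.events.append(("init", kwargs))

    def log(self, values):
        self.events.append(("log", values))

    def finish(self):
        self.events.append(("finish",))


tracker = RecordingTracker()
try:
    with tracked_run(tracker, project="my-project") as run:
        run.log({"metric": 1.0})
        raise RuntimeError("training failed")
except RuntimeError:
    pass  # the run was still finished before the error propagated
```

With the real library, the equivalent usage would be `with tracked_run(kandc, project="my-project"):`, since the helper only relies on the `init` and `finish` functions shown above.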
