Core concepts and advanced features of Keys & Caches - find the true bottlenecks, errors, and performance optimizations previously hidden in your ML stack
Keys & Caches is an experiment tracking and profiling library that uses AI to help you find the true bottlenecks, errors, and performance optimizations that were previously hidden - or hard to discover - in your ML stack, from PyTorch down to the GPU.
kandc integrates with PyTorch’s built-in profiler to capture detailed GPU/CPU performance metrics, memory usage, CUDA kernel execution, and more. All profiling data is automatically exported as Chrome traces viewable in the Perfetto UI.
For more control over profiling, use the wrapper and decorator classes directly:
```python
import kandc
from kandc.annotators import ProfilerWrapper, ProfilerDecorator

# Wrap any object with detailed profiling
class MyModel:
    def forward(self, x):
        return x * 2

    def predict(self, x):
        return self.forward(x)

model = MyModel()
profiled_model = ProfilerWrapper(
    model,
    name="MyModel",
    activities=['cpu', 'cuda'],  # Profile both CPU and CUDA
    record_shapes=True,          # Record tensor shapes
    profile_memory=True,         # Profile memory usage
    with_stack=True              # Include call stacks
)

# All method calls are now profiled with PyTorch profiler
result = profiled_model.forward(data)
result = profiled_model.predict(data)

# Or use as a decorator
@ProfilerDecorator(name="OptimizedModel", record_shapes=True)
class OptimizedModel:
    def predict(self, x):
        return x * 2

# Convenience functions
profiled_obj = kandc.profile(my_object, name="MyObject")

@kandc.profiler(name="MyFunction")
def my_function(x):
    return expensive_computation(x)
```
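The wrapping pattern that lets `ProfilerWrapper` instrument arbitrary objects can be sketched in plain Python via attribute delegation. `TimingWrapper` below is a hypothetical illustration that only times method calls; it is not kandc's actual profiler integration:

```python
import functools
import time

class TimingWrapper:
    """Delegate attribute access to a wrapped object, timing each method call."""

    def __init__(self, wrapped, name="wrapped"):
        self._wrapped = wrapped
        self._name = name

    def __getattr__(self, attr):
        # Only called for attributes not found on the wrapper itself,
        # so every lookup is forwarded to the wrapped object.
        value = getattr(self._wrapped, attr)
        if not callable(value):
            return value

        @functools.wraps(value)
        def timed(*args, **kwargs):
            start = time.perf_counter()
            try:
                return value(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                print(f"[{self._name}.{attr}] {elapsed * 1000:.3f} ms")

        return timed

class MyModel:
    def forward(self, x):
        return x * 2

model = TimingWrapper(MyModel(), name="MyModel")
result = model.forward(21)  # prints a timing line, returns 42
```

Because delegation happens in `__getattr__`, the wrapped object's public interface is preserved without subclassing, which is why this pattern works on "any object".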
```python
@kandc.timed(name="model_inference")
def run_inference(model, batch):
    with torch.no_grad():
        return model(batch)

# Or time existing functions
result = kandc.timed_call("data_loading", load_batch, batch_size=32)
```
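For intuition, a decorator factory like `timed` and a helper like `timed_call` can be sketched with the standard library alone. This is an illustrative sketch, not kandc's implementation, which also records the timings to your tracked run:

```python
import functools
import time

def timed(name):
    """Decorator factory: time each call to the wrapped function."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed = time.perf_counter() - start
                print(f"[{name}] {elapsed * 1000:.3f} ms")
        return wrapper
    return decorate

def timed_call(name, func, *args, **kwargs):
    """Time a single call to an existing function without decorating it."""
    return timed(name)(func)(*args, **kwargs)

@timed(name="square")
def square(x):
    return x * x

value = square(4)
total = timed_call("add", lambda a, b: a + b, 2, 3)
```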
```python
kandc.init(project="training")

# Log single values
kandc.log({"loss": 0.25})
kandc.log({"accuracy": 0.92, "f1_score": 0.89})

# Log with step numbers (useful for training loops)
for epoch in range(100):
    loss = train_epoch()
    kandc.log({"epoch_loss": loss}, step=epoch)
```
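For intuition, the step-keyed logging interface can be mimicked with a small in-memory class. `RunLogger` here is a hypothetical stand-in, not kandc's implementation - kandc records metrics for your tracked run rather than keeping them in a local list:

```python
class RunLogger:
    """Minimal in-memory metric logger mimicking a log(metrics, step=...) interface."""

    def __init__(self, project):
        self.project = project
        self.history = []    # list of (step, metrics) records
        self._auto_step = 0  # auto-incremented when no step is given

    def log(self, metrics, step=None):
        if step is None:
            step = self._auto_step
            self._auto_step += 1
        self.history.append((step, dict(metrics)))

logger = RunLogger(project="training")
logger.log({"loss": 0.25})                    # auto step 0
logger.log({"accuracy": 0.92}, step=10)       # explicit step
```

Passing an explicit `step` lets several metrics logged at different times line up on the same x-axis point, which is why it is useful in training loops.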
kandc respects your .gitignore file when uploading code snapshots and
traces. Add large files to .gitignore to avoid uploading them.
```gitignore
# Large model files
*.pth
*.safetensors
*.bin

# Data files
data/
datasets/
*.csv

# Environment
.env
venv/
```
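For intuition, gitignore-style filtering can be approximated with the standard library's `fnmatch`. This is a rough hypothetical sketch - real .gitignore semantics (negation, anchoring, `**`) are richer than what it handles, and it is not how kandc does the matching:

```python
import fnmatch
from pathlib import PurePosixPath

def is_ignored(path, patterns):
    """Rough gitignore-style check: match the filename or any directory component."""
    p = PurePosixPath(path)
    for pattern in patterns:
        if pattern.endswith("/"):
            # Directory pattern: ignore anything under a matching directory
            if pattern.rstrip("/") in p.parts:
                return True
        elif fnmatch.fnmatch(p.name, pattern):
            return True
    return False

patterns = ["*.pth", "data/", "venv/"]
ignored = is_ignored("checkpoints/model.pth", patterns)   # True: matches *.pth
kept = is_ignored("train.py", patterns)                   # False: no pattern matches
```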
Best practice: Download large models in your script rather than uploading them:
```python
# Good: Download at runtime
from transformers import AutoModel
model = AutoModel.from_pretrained("bert-large-uncased")

# Avoid: Uploading large large local files
# model = torch.load("my_5gb_model.pth")  # This would be uploaded
```