The ChiselApp class is the primary interface for creating GPU-accelerated applications with the Chisel CLI. It handles code packaging, upload, job submission, and function decoration.

Constructor

from chisel import ChiselApp, GPUType

app = ChiselApp(name, upload_dir=".", gpu=None)
name (string, required)
Application name used for job tracking and identification in the Herdora backend.

upload_dir (string, default ".")
Directory to upload to the cloud. Defaults to the current directory and should contain all necessary code and dependencies.

gpu (GPUType | string | None, default None)
GPU configuration for the application. Accepts a GPUType enum value or a string such as "A100-80GB:2".

Constructor Examples

from chisel import ChiselApp, GPUType

# Recommended: Using GPUType enum
app = ChiselApp("my-app", gpu=GPUType.A100_80GB_2)

Methods

@capture_trace()

Decorator that marks a function for GPU execution and performance tracing.
@app.capture_trace(
    trace_name=None,
    record_shapes=False,
    profile_memory=False
)
def my_function():
    pass
trace_name (string, default None)
Identifier for the operation used in tracing and monitoring. Helps identify functions in job logs and performance analysis.

record_shapes (boolean, default False)
Records tensor shapes and dimensions for debugging purposes. Useful for identifying shape mismatches and tensor operations.

profile_memory (boolean, default False)
Profiles memory allocation and usage during execution. Helps identify memory bottlenecks and optimization opportunities.

Decorator Examples

@app.capture_trace(trace_name="matrix_operations")
def matrix_multiply(a, b):
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
    
    a_tensor = torch.tensor(a, device=device)
    b_tensor = torch.tensor(b, device=device)
    result = torch.mm(a_tensor, b_tensor)
    
    return result.cpu().numpy()
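Both profiling flags can be enabled together when debugging. A sketch using the parameters documented above (the trace name and function body here are illustrative, and the extra tracing adds some overhead):

@app.capture_trace(
    trace_name="debug_attention",
    record_shapes=True,
    profile_memory=True
)
def attention_block(q, k, v):
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"

    q_t = torch.tensor(q, device=device)
    k_t = torch.tensor(k, device=device)
    v_t = torch.tensor(v, device=device)

    # Shapes and memory use of these operations appear in the trace
    scores = torch.softmax(q_t @ k_t.transpose(-2, -1), dim=-1)
    return (scores @ v_t).cpu().numpy()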

Properties

activated

activated (boolean)
Indicates whether the ChiselApp is in active GPU mode. Returns True when the CHISEL_ACTIVATED=1 environment variable is set.
import os

app = ChiselApp("my-app")

if app.activated:
    print("🚀 Running on cloud GPU")
else:
    print("💻 Running locally")

# Alternative check
if os.environ.get("CHISEL_ACTIVATED") == "1":
    print("🚀 Chisel is activated")

gpu

gpu (string | None)
Current GPU configuration in string format (e.g., "A100-80GB:2"). Automatically converted from a GPUType enum if one was passed to the constructor.
app = ChiselApp("my-app", gpu=GPUType.A100_80GB_4)
print(app.gpu)  # Output: "A100-80GB:4"

Advanced Usage

Multiple Applications

Create separate ChiselApp instances for different workloads:
from chisel import ChiselApp, GPUType

# Lightweight preprocessing
prep_app = ChiselApp("preprocessing", gpu=GPUType.A100_80GB_1)

# Heavy training workload
train_app = ChiselApp("training", gpu=GPUType.A100_80GB_8)

@prep_app.capture_trace(trace_name="data_cleaning")
def clean_data(raw_data):
    cleaned_data = ...  # light GPU work goes here
    return cleaned_data

@train_app.capture_trace(trace_name="heavy_training")
def train_large_model(data):
    trained_model = ...  # heavy GPU work requiring 8 GPUs
    return trained_model

Conditional Activation

Check whether Chisel is activated and branch between execution paths:
import os
from chisel import ChiselApp, GPUType

app = ChiselApp("conditional-app", gpu=GPUType.A100_80GB_2)

def smart_processing(data):
    if os.environ.get("CHISEL_ACTIVATED") == "1":
        # Running on cloud GPU
        return gpu_accelerated_processing(data)
    else:
        # Running locally - use CPU fallback
        return cpu_processing(data)

@app.capture_trace(trace_name="gpu_processing")
def gpu_accelerated_processing(data):
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
    # GPU processing logic
    return result

def cpu_processing(data):
    # CPU fallback logic
    return result

Error Handling Best Practices

@app.capture_trace(trace_name="robust_function")
def robust_gpu_function(data):
    try:
        import torch
        
        if not torch.cuda.is_available():
            raise RuntimeError("CUDA not available")
            
        device = "cuda"
        tensor = torch.tensor(data, device=device)
        
        # Your GPU operations (process_on_gpu is a user-defined helper)
        result = process_on_gpu(tensor)
        
        return result.cpu().numpy()
        
    except RuntimeError as e:
        if "CUDA" in str(e) or "out of memory" in str(e):
            print(f"GPU error: {e}")
            # Fall back to CPU (process_on_cpu is a user-defined helper)
            return process_on_cpu(data)
        else:
            # Re-raise other errors
            raise
    except Exception as e:
        print(f"Unexpected error: {e}")
        raise

Environment Integration

ChiselApp behavior changes based on environment variables:
Environment Variable    Effect                        Set By
CHISEL_ACTIVATED=1      Enables GPU functionality     chisel command
CHISEL_BACKEND_RUN=1    Indicates backend execution   Backend system
CHISEL_JOB_ID           Current job identifier        Backend system
import os

def get_execution_context():
    if os.environ.get("CHISEL_BACKEND_RUN") == "1":
        return "cloud_gpu"
    elif os.environ.get("CHISEL_ACTIVATED") == "1":
        return "chisel_mode"
    else:
        return "local_mode"

context = get_execution_context()
print(f"Execution context: {context}")

Performance Considerations

Follow these guidelines for optimal ChiselApp performance:
  1. Upload Directory Size: Keep under 100MB for faster uploads
  2. Function Granularity: Group related operations in single functions
  3. Memory Management: Process large datasets in chunks
  4. Device Placement: Always check CUDA availability in functions
# Optimized example
@app.capture_trace(trace_name="optimized_processing", profile_memory=True)
def optimized_batch_processing(large_dataset):
    import torch
    import gc
    
    device = "cuda" if torch.cuda.is_available() else "cpu"
    batch_size = 1000  # Adjust based on GPU memory
    results = []
    
    for i in range(0, len(large_dataset), batch_size):
        batch = large_dataset[i:i+batch_size]
        batch_tensor = torch.tensor(batch, device=device)
        
        # Process batch (process_batch is a user-defined helper)
        batch_result = process_batch(batch_tensor)
        results.append(batch_result.cpu())  # Move to CPU immediately
        
        # Clean up GPU memory periodically
        del batch_tensor, batch_result
        if device == "cuda" and i % (batch_size * 10) == 0:
            torch.cuda.empty_cache()
            gc.collect()
    
    return torch.cat(results)