Daily Kernel

The Daily Kernel is a CUDA programming challenge, published once a day, that helps you practice GPU programming skills. Each day brings a new problem, and the difficulty varies from one day to the next.

What is it?

Similar to daily coding challenges you might find on other platforms, the Daily Kernel presents GPU-specific problems:
  • Kernel optimization — Make a kernel faster
  • Algorithm implementation — Implement GPU-friendly algorithms
  • Memory patterns — Work with different memory types
  • Parallel primitives — Reductions, scans, and more
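To give a feel for the "parallel primitives" and "memory patterns" categories, here is a minimal sketch of a sum reduction that stages data in shared memory. It is not taken from an actual Daily Kernel problem: the names `block_sum`, `gpu_sum`, and `kBlockSize` are made up for illustration, and error checking on the CUDA calls is left out for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Illustrative block size (an assumption, not a Daily Kernel requirement).
// The tree reduction below requires it to be a power of two.
constexpr int kBlockSize = 256;

// Each block sums its slice of `in` in shared memory, then adds its
// partial result to *out with a single atomicAdd.
__global__ void block_sum(const float* in, float* out, int n) {
    __shared__ float tile[kBlockSize];
    int tid = threadIdx.x;
    int idx = blockIdx.x * blockDim.x + tid;

    // Out-of-range threads contribute zero so the last block stays correct.
    tile[tid] = (idx < n) ? in[idx] : 0.0f;
    __syncthreads();

    // Tree reduction: halve the number of active threads each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) tile[tid] += tile[tid + stride];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(out, tile[0]);
}

// Hypothetical host helper: copies the input to the GPU, launches the
// kernel, and returns the sum.
float gpu_sum(const std::vector<float>& host_in) {
    int n = static_cast<int>(host_in.size());
    if (n == 0) return 0.0f;

    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemset(d_out, 0, sizeof(float));

    int blocks = (n + kBlockSize - 1) / kBlockSize;
    block_sum<<<blocks, kBlockSize>>>(d_in, d_out, n);

    // This blocking copy also waits for the kernel to finish.
    float result = 0.0f;
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

Calling `gpu_sum` on a vector of 1000 ones should return 1000. A "kernel optimization" challenge might start from a kernel like this and ask for fewer synchronizations or warp-level shuffles.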

Accessing Daily Kernel

  1. Click the puzzle icon (⚡) in the Wafer top bar, or
  2. Select Daily Kernel from the tool dropdown

Challenge Structure

Each challenge includes:

Problem Statement

A description of what you need to implement or optimize, including:
  • Input/output specifications
  • Performance requirements
  • Constraints

Examples

Concrete examples showing:
  • Sample inputs
  • Expected outputs
  • Explanations of the expected behavior

Framework Selection

Choose your preferred implementation framework:
  • CuTe DSL — Modern C++ DSL for tensor operations
  • CUDA — Standard CUDA C++
Starter code and hints are tailored to the framework you select, so they may differ between the two.

Starter Code

Template code to get you started:
  • Function signatures
  • Memory setup
  • Basic structure

Kernel Signature

For kernels, you’ll see:
  • Input tensors (names, types, shapes)
  • Output tensors
  • Any scalar parameters
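As a rough illustration of how such a signature might read in CUDA C++, the sketch below uses a hypothetical element-wise problem. The kernel name `scaled_add`, its parameters, and the host helper `run_scaled_add` are assumptions made for this example, not the starter code of any real challenge. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical challenge signature (not from a real Daily Kernel problem):
//   inputs : x, y  -> float32, shape [n]
//   output : out   -> float32, shape [n]
//   scalars: alpha (float), n (int)
__global__ void scaled_add(const float* __restrict__ x,
                           const float* __restrict__ y,
                           float* __restrict__ out,
                           float alpha,
                           int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = alpha * x[i] + y[i];  // guard the partial last block
}

// Hypothetical host wrapper showing the memory setup a template might
// provide. Assumes x and y have the same length.
std::vector<float> run_scaled_add(const std::vector<float>& x,
                                  const std::vector<float>& y,
                                  float alpha) {
    int n = static_cast<int>(x.size());
    std::vector<float> out(n);
    if (n == 0) return out;

    size_t bytes = n * sizeof(float);
    float* d_x = nullptr;
    float* d_y = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_x, bytes);
    cudaMalloc(&d_y, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_x, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_y, y.data(), bytes, cudaMemcpyHostToDevice);

    int threads = 256;                         // illustrative launch shape
    int blocks = (n + threads - 1) / threads;  // round up to cover all n
    scaled_add<<<blocks, threads>>>(d_x, d_y, d_out, alpha, n);

    cudaMemcpy(out.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_x);
    cudaFree(d_y);
    cudaFree(d_out);
    return out;
}
```

Reading the signature this way (pointers for tensors, plain values for scalars, an explicit element count) makes it easier to map the listed shapes onto a launch configuration.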

Constraints

Problem constraints to keep in mind:
  • Input sizes
  • Performance targets
  • Memory limits

Hints

Collapsible hints if you get stuck:
  • Algorithmic approaches
  • Framework-specific tips
  • Common pitfalls to avoid

Starting a Challenge

  1. Read the Problem: Understand what you need to implement. Pay attention to input/output specs and constraints.
  2. Choose a Framework: Select CuTe DSL or CUDA based on your preference and the problem type.
  3. Review Starter Code: Look at the provided template to understand the expected structure.
  4. Click Start Challenge: This creates a new file in your workspace with the starter code.
  5. Implement Your Solution: Write your kernel implementation in the created file.

Difficulty Levels

Level    Description
Easy     Straightforward implementations, good for learning basics
Medium   Requires optimization or non-trivial algorithms
Hard     Complex problems requiring advanced techniques

Tips for Success

  • Get a correct solution first, then optimize. Don’t try to write the fastest solution immediately.
  • Use the NCU Profiler to identify bottlenecks in your solution.
  • After profiling your kernel with NCU, view the PTX/SASS assembly in the NCU Profiler’s Source tab to understand what’s happening at the instruction level.
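One way to follow the "correct first" tip is to keep a deliberately simple baseline kernel and compare it against a CPU reference before optimizing. The sketch below assumes a hypothetical row-sum problem. The names `row_sum_naive`, `row_sum_cpu`, `row_sum_gpu`, and `all_close`, as well as the tolerance values, are illustrative choices and not part of the Daily Kernel tooling. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <cstddef>
#include <vector>

// Naive baseline: one thread sums one row of a row-major matrix.
// Neighboring threads read addresses `cols` floats apart, so loads are
// uncoalesced, but the kernel is easy to reason about and verify.
__global__ void row_sum_naive(const float* m, float* out, int rows, int cols) {
    int r = blockIdx.x * blockDim.x + threadIdx.x;
    if (r >= rows) return;
    float acc = 0.0f;
    for (int c = 0; c < cols; ++c) acc += m[r * cols + c];
    out[r] = acc;
}

// CPU reference used as ground truth.
std::vector<float> row_sum_cpu(const std::vector<float>& m, int rows, int cols) {
    std::vector<float> out(rows, 0.0f);
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c) out[r] += m[r * cols + c];
    return out;
}

// Element-wise comparison with absolute and relative tolerance
// (tolerance defaults are illustrative).
bool all_close(const std::vector<float>& a, const std::vector<float>& b,
               float rtol = 1e-5f, float atol = 1e-6f) {
    if (a.size() != b.size()) return false;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (std::fabs(a[i] - b[i]) > atol + rtol * std::fabs(b[i])) return false;
    return true;
}

// Hypothetical host wrapper: runs the baseline kernel and returns its output.
std::vector<float> row_sum_gpu(const std::vector<float>& m, int rows, int cols) {
    std::vector<float> out(rows);
    if (rows == 0 || cols == 0) return std::vector<float>(rows, 0.0f);

    float* d_m = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_m, m.size() * sizeof(float));
    cudaMalloc(&d_out, rows * sizeof(float));
    cudaMemcpy(d_m, m.data(), m.size() * sizeof(float), cudaMemcpyHostToDevice);

    int threads = 128;
    int blocks = (rows + threads - 1) / threads;
    row_sum_naive<<<blocks, threads>>>(d_m, d_out, rows, cols);

    cudaMemcpy(out.data(), d_out, rows * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_m);
    cudaFree(d_out);
    return out;
}
```

Once `all_close(row_sum_gpu(m, rows, cols), row_sum_cpu(m, rows, cols))` holds, the same check can be rerun after every optimization, so a faster variant that breaks correctness is caught immediately.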

Challenge History

Previous challenges remain accessible through the challenge archive. Practice old challenges to build your skills before tackling new ones.