ML Projects

Building PyTorch from Source (2025)

Built PyTorch from source on Windows 10 using Microsoft Visual Studio 2022 and CMake, configuring the build for CUDA 12.4 support and optimizing GPU utilization for deep learning tasks

CUDA TCP Project (2025)

Built a multithreaded C++ TCP client-server system on Windows using Winsock2, enabling concurrent handling of multiple clients via per-connection threads
Integrated CUDA kernels to offload vector addition and matrix multiplication to the GPU, demonstrating basic heterogeneous computing
Designed a command-based protocol allowing clients to trigger GPU computations and receive real-time results over network sockets
Implemented thread-safe tracking of active client connections using atomic operations in a concurrent TCP server

Repo Link


CUDA TCP Project Demo (Windows)

This project demonstrates how to simulate a CUDA-based computational workload over TCP/IP on Windows. The server runs the computations on the GPU with CUDA and handles multiple client requests concurrently.

Overview

This is a single-file project that combines both server and client functionality into one .cpp file, utilizing multithreading to concurrently handle multiple client requests. The server processes the commands sent by the client, performs CUDA-based operations (array addition, matrix multiplication), and sends the results back to the client.

Server: Handles multiple client connections, processes various commands (like array addition and matrix multiplication), and offloads the computational work to the GPU using CUDA.
Client: Sends commands to the server (like add, matmul, or exit), receives the results, and displays them.

The server communicates with clients over TCP/IP and handles multiple requests at the same time by creating a new thread for each client.

Key Features

Multithreaded TCP Server: Handles each client connection in a separate thread using C++ std::thread
CUDA Integration: Offloads simple vector addition and matrix multiplication to GPU using CUDA kernels
Client-Server Communication: Uses Windows Sockets (Winsock2) for TCP-based command exchange
Command-Based Interface: Supports add, matmul, and exit commands for triggering GPU computations
Active Connection Tracking: Uses std::atomic<int> to track connected clients in real time
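
The thread-per-client and connection-tracking features above can be sketched without any sockets. This is a minimal, portable illustration and not the project's actual code: `active_clients`, `ConnectionGuard`, `handle_client`, and `serve_simulated_clients` are hypothetical names, the Winsock2 calls are left out, and the threads are joined (a real server might detach them instead) so the counts can be checked afterwards.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical counter of clients currently being served.
std::atomic<int> active_clients{0};

// RAII guard: increments the counter on construction and decrements it on
// destruction, so the count stays correct even if a handler returns early.
class ConnectionGuard {
public:
    ConnectionGuard() { ++active_clients; }
    ~ConnectionGuard() { --active_clients; }
    ConnectionGuard(const ConnectionGuard&) = delete;
    ConnectionGuard& operator=(const ConnectionGuard&) = delete;
};

// Stand-in for a per-client handler. A real handler would own the client's
// socket and loop on receive/send; this one only records that it ran.
void handle_client(std::atomic<int>& served) {
    ConnectionGuard guard;
    ++served;
}

// Spawns one thread per simulated client, waits for all of them, and
// returns how many clients were served.
int serve_simulated_clients(int client_count) {
    assert(client_count >= 0);
    std::atomic<int> served{0};
    std::vector<std::thread> workers;
    workers.reserve(static_cast<std::size_t>(client_count));
    for (int i = 0; i < client_count; ++i) {
        workers.emplace_back(handle_client, std::ref(served));
    }
    for (auto& worker : workers) {
        worker.join();
    }
    return served.load();
}
```

Calling `serve_simulated_clients(8)` returns 8, and `active_clients` is back to 0 once every handler has finished, because each guard's destructor undoes its own increment.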

Requirements

To build and run this project, ensure you have the following installed:

CUDA Toolkit: Required to compile and run CUDA-based functions. This project uses CUDA 12.6 (the latest version at the time).
Windows Operating System: This project is designed to run on Windows-based platforms.
Visual Studio (or other IDE for CUDA & C++): Visual Studio is commonly used for CUDA development, but any IDE or build environment that supports C++ and CUDA should work.

Dependencies

CUDA Toolkit: For GPU-accelerated operations such as array addition and matrix multiplication.
Winsock2: Windows-specific library for socket communication (built into Windows).

Commands

Command Prefix:

None: No prefix is required for the commands.

Command Suffixes:

add: Triggers the CUDA-based array addition operation.
matmul: Triggers the CUDA-based matrix multiplication operation.
exit: Terminates the client-server connection.
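
The command handling described above can be sketched as a plain string dispatch. This is a CPU-only reference under stated assumptions, not the project's implementation: the function names are hypothetical, the input arrays and matrices are assumed values chosen to be consistent with the results in this README's example interaction, and the `Unknown command` reply is invented for illustration. The loops compute what a one-thread-per-element CUDA kernel would compute, with the loop indices standing in for thread indices.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// CPU reference for element-wise addition; index i plays the role of one GPU thread.
std::vector<int> add_arrays(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// CPU reference for an n x n row-major matrix product; each (row, col) pair
// plays the role of one GPU thread.
std::vector<int> matmul_square(const std::vector<int>& a,
                               const std::vector<int>& b,
                               std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n);
    std::vector<int> out(n * n, 0);
    for (std::size_t row = 0; row < n; ++row) {
        for (std::size_t col = 0; col < n; ++col) {
            int sum = 0;
            for (std::size_t k = 0; k < n; ++k) {
                sum += a[row * n + k] * b[k * n + col];
            }
            out[row * n + col] = sum;
        }
    }
    return out;
}

// Joins integers into one string with the given separator.
std::string join_values(const std::vector<int>& values, const std::string& separator) {
    std::string text;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0) text += separator;
        text += std::to_string(values[i]);
    }
    return text;
}

// Maps a command string to a response string.
std::string handle_command(const std::string& command) {
    if (command == "add") {
        // Assumed inputs, not taken from the project source.
        const std::vector<int> a{1, 2, 3, 4, 5};
        const std::vector<int> b{10, 20, 30, 40, 50};
        return "Result: {" + join_values(add_arrays(a, b), ", ") + "}";
    }
    if (command == "matmul") {
        // Assumed 3 x 3 inputs in row-major order.
        const std::vector<int> a{1, 2, 3, 4, 5, 6, 7, 8, 9};
        const std::vector<int> b{9, 8, 7, 6, 5, 4, 3, 2, 1};
        return "Result: " + join_values(matmul_square(a, b, 3), " ");
    }
    if (command == "exit") {
        return "Server exiting...";
    }
    return "Unknown command";  // hypothetical fallback reply
}
```

In a server, a function like `handle_command` would be called with the text received from the client socket, and its return value would be sent back over the same connection.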

Example Client Interaction:

```sh
Enter command (add, matmul, exit): add
Response from server: Result: {11, 22, 33, 44, 55}

Enter command (add, matmul, exit): matmul
Response from server: Result: 30 24 18 84 69 54 138 114 90

Enter command (add, matmul, exit): exit
Server exiting...
```

Conditional PixelCNN (2025)

Converted an unconditional PixelCNN into a conditional generative model in PyTorch by integrating class embeddings and implementing middle fusion for label conditioning
Modified core training and inference pipelines (loss function, sampling, and evaluation) to support conditional image generation and classification
Achieved 81.3% test accuracy and an FID score of 26.8, improving on a baseline embedding-only model (~75% accuracy, FID ~30)
Implemented likelihood-based classification using discretized mixture logistic loss, enabling joint generation and inference from the same model
Evaluated model performance using Weights & Biases and external benchmarks via Hugging Face, selecting optimal checkpoints to mitigate overfitting
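
One common way to do likelihood-based classification with a conditional generative model is to score the same image once per candidate label and pick the label with the lowest negative log-likelihood. The project itself is written in PyTorch; the C++ sketch below only illustrates that decision rule, assuming the per-class losses have already been computed. The function name and the example values are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Returns the index of the class whose conditional negative log-likelihood
// is lowest, i.e. the label under which the model finds the image most likely.
std::size_t classify_by_likelihood(const std::vector<double>& nll_per_class) {
    assert(!nll_per_class.empty());
    std::size_t best = 0;
    for (std::size_t i = 1; i < nll_per_class.size(); ++i) {
        if (nll_per_class[i] < nll_per_class[best]) {
            best = i;
        }
    }
    return best;
}
```

For example, with made-up losses `{3.2, 2.7, 4.1}` for three classes, the function returns index 1.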