ML Projects
Building PyTorch from Source (2025)
Built PyTorch from source on Windows 10 using Microsoft Visual Studio 2022 and CMake, configuring the build for CUDA 12.4 support and optimizing GPU utilization for deep learning tasks
CUDA TCP Project (2025)
- Built a multithreaded C++ TCP client-server system on Windows using Winsock2, enabling concurrent handling of multiple clients via per-connection threads
- Integrated CUDA kernels to offload vector addition and matrix multiplication to the GPU, demonstrating basic heterogeneous computing
- Designed a command-based protocol allowing clients to trigger GPU computations and receive real-time results over network sockets
- Implemented thread-safe tracking of active client connections using atomic operations in a concurrent TCP server
CUDA TCP Project Demo (Windows)
This project demonstrates how to serve a CUDA-based computational workload over TCP/IP on Windows. The server processes data on the GPU and handles multiple client requests concurrently.
Overview
This single-file project combines server and client functionality in one .cpp file and uses multithreading to handle multiple client requests concurrently. The server processes the commands sent by the client, performs CUDA-based operations (array addition, matrix multiplication), and sends the results back to the client.
- Server: Handles multiple client connections, processes various commands (like array addition and matrix multiplication), and offloads the computational work to the GPU using CUDA.
- Client: Sends commands to the server (such as `add`, `matmul`, or `exit`), receives the results, and displays them.
The server communicates with clients over TCP/IP, and the server is designed to handle multiple requests at the same time by creating a new thread for each client.
Key Features
- Multithreaded TCP Server: Handles each client connection in a separate thread using C++ `std::thread`
- CUDA Integration: Offloads simple vector addition and matrix multiplication to the GPU using CUDA kernels
- Client-Server Communication: Uses Windows Sockets (Winsock2) for TCP-based command exchange
- Command-Based Interface: Supports `add`, `matmul`, and `exit` commands for triggering GPU computations
- Active Connection Tracking: Uses `std::atomic<int>` to track connected clients in real time
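The thread-per-client model and the atomic connection counter listed above can be sketched in portable C++. This is a minimal illustration, not the project's code: the sockets are left out, and `ConnectionGuard`, `handle_client`, and `run_simulated_server` are hypothetical names introduced here. A real handler would own a Winsock `SOCKET` and loop on `recv()`.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Number of currently connected clients, shared by all handler threads.
std::atomic<int> active_clients{0};

// Hypothetical RAII helper: counts a client in on construction and out on
// destruction, so the counter stays correct even if a handler returns early.
struct ConnectionGuard {
    ConnectionGuard()  { active_clients.fetch_add(1, std::memory_order_relaxed); }
    ~ConnectionGuard() { active_clients.fetch_sub(1, std::memory_order_relaxed); }
};

// Stand-in for the per-client handler (assumption: no real socket I/O here).
// It records the highest number of simultaneous clients it observes.
void handle_client(int client_id, std::atomic<int>& peak) {
    ConnectionGuard guard;
    (void)client_id;  // a real handler would use this for logging

    int now = active_clients.load();
    int prev = peak.load();
    // Lock-free "store max": retry until peak >= now.
    while (now > prev && !peak.compare_exchange_weak(prev, now)) {
    }
}

// Spawns one thread per simulated client, waits for all of them,
// and returns the peak number of concurrently active clients.
int run_simulated_server(int n_clients) {
    assert(n_clients > 0);
    std::atomic<int> peak{0};
    std::vector<std::thread> workers;
    for (int i = 0; i < n_clients; ++i) {
        workers.emplace_back(handle_client, i, std::ref(peak));
    }
    for (auto& t : workers) {
        t.join();
    }
    return peak.load();
}
```

Calling `run_simulated_server(8)` returns a peak between 1 and 8 depending on scheduling, and `active_clients` is back to 0 once every thread has been joined. The counter needs no mutex because each update is a single atomic read-modify-write.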
Requirements
To build and run this project, ensure you have the following installed:
- CUDA Toolkit: Required to compile and run CUDA-based functions. This project uses CUDA 12.6 (the latest version at the time of development).
- Windows Operating System: This project is designed to run on Windows-based platforms.
- Visual Studio (or other IDE for CUDA & C++): Visual Studio is commonly used for CUDA development, but any IDE or build environment that supports C++ and CUDA should work.
Dependencies
- CUDA Toolkit: For GPU-accelerated operations such as array addition and matrix multiplication.
- Winsock2: Windows-specific library for socket communication (built into Windows).
Commands
Command Prefix:
- None: No prefix is required for the commands.
Command Suffixes:
- `add`: Triggers the CUDA-based array addition operation.
- `matmul`: Triggers the CUDA-based matrix multiplication operation.
- `exit`: Terminates the client-server connection.
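The command handling described above could be dispatched roughly as follows. This is a sketch under stated assumptions: the CUDA kernels are replaced by CPU stand-ins so the example runs anywhere, the function names are hypothetical, and the input data is chosen here so that the responses match the sample interaction in this README (the project's real inputs are not documented). The reply for an unrecognized command is also an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// CPU stand-in for the vector-addition kernel (assumption: the real project
// launches a CUDA kernel that computes the same element-wise sum).
std::vector<int> add_vectors(const std::vector<int>& a, const std::vector<int>& b) {
    assert(a.size() == b.size());
    std::vector<int> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// CPU stand-in for the matrix-multiplication kernel: C = A * B for
// row-major n x n matrices.
std::vector<int> matmul(const std::vector<int>& a, const std::vector<int>& b, int n) {
    std::vector<int> c(static_cast<std::size_t>(n) * n, 0);
    for (int row = 0; row < n; ++row) {
        for (int col = 0; col < n; ++col) {
            for (int k = 0; k < n; ++k) {
                c[row * n + col] += a[row * n + k] * b[k * n + col];
            }
        }
    }
    return c;
}

// Maps one command string to the text sent back to the client.
std::string handle_command(const std::string& cmd) {
    std::ostringstream out;
    if (cmd == "add") {
        // Illustrative inputs (assumed, not taken from the project).
        std::vector<int> r = add_vectors({1, 2, 3, 4, 5}, {10, 20, 30, 40, 50});
        out << "Result: {";
        for (std::size_t i = 0; i < r.size(); ++i) {
            if (i > 0) out << ", ";
            out << r[i];
        }
        out << "}";
    } else if (cmd == "matmul") {
        // Illustrative 3x3 inputs (assumed, not taken from the project).
        std::vector<int> r = matmul({1, 2, 3, 4, 5, 6, 7, 8, 9},
                                    {9, 8, 7, 6, 5, 4, 3, 2, 1}, 3);
        out << "Result:";
        for (int v : r) out << ' ' << v;
    } else if (cmd == "exit") {
        out << "Server exiting...";
    } else {
        out << "Unknown command: " << cmd;  // assumed fallback behavior
    }
    return out.str();
}
```

Keeping the dispatch in a function that returns a string separates protocol handling from socket I/O, so the same logic can be unit-tested without opening a connection.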
Example Client Interaction:
```sh
Enter command (add, matmul, exit): add
Response from server: Result: {11, 22, 33, 44, 55}

Enter command (add, matmul, exit): matmul
Response from server: Result: 30 24 18 84 69 54 138 114 90

Enter command (add, matmul, exit): exit
Server exiting...
```
Conditional PixelCNN (2025)
- Converted an unconditional PixelCNN into a conditional generative model in PyTorch by integrating class embeddings and implementing middle fusion for label conditioning
- Modified core training and inference pipelines (loss function, sampling, and evaluation) to support conditional image generation and classification
- Achieved 81.3% test accuracy and an FID of 26.8, outperforming a baseline embedding-only model (~75% accuracy, FID ~30)
- Implemented likelihood-based classification using discretized mixture logistic loss, enabling joint generation and inference from the same model
- Evaluated model performance using Weights & Biases and external benchmarks via Hugging Face, selecting optimal checkpoints to mitigate overfitting
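The likelihood-based classification mentioned above can be written compactly. This is a sketch rather than the project's exact formulation: it assumes a uniform prior over classes, and the notation is introduced here, with $\mathcal{L}_{\text{DMLL}}(x, c)$ standing for the discretized mixture logistic negative log-likelihood of image $x$ conditioned on label $c$.

```latex
\hat{y}(x) \;=\; \arg\max_{c} \, \log p_\theta(x \mid c)
           \;=\; \arg\min_{c} \, \mathcal{L}_{\text{DMLL}}(x, c)
```

Under these assumptions, the image is scored once per candidate class, and the class with the lowest loss is the prediction, which is how a single conditional model can serve both generation and classification.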