
Software Engineer | Data Scientist | Computational Biology Enthusiast  

Get to Know Me

  • LinkedIn

“I’m a software engineer passionate about using data science and machine learning to solve problems in genomics, healthcare, and biological systems. Currently exploring projects in bioinformatics and preparing for graduate research in computational biology.”



2022-2026

B.S. in Software Engineering

University of Texas at Dallas (Core GPA: 3.7)

My Projects

RoveMiles — ML Engineering + Financial Modeling

End-to-end predictive analytics and financial modeling system.

Approach: Churn prediction, anomaly detection, KPI tracking, and a comprehensive revenue forecasting model integrated with an automated ML pipeline.

Result: Delivered a revenue forecasting financial model and identified the churn features with the highest impact.

Stack: Python, Pandas, Excel, Scikit-Learn, AWS S3, Jupyter

GitHub | View Report
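The card lists anomaly detection but does not name a method. Here is a minimal sketch, assuming a rolling z-score over a daily KPI series. The function name, the 7-day window, the threshold of 3, and the sample values are illustrative choices, not details from the report:

```python
import statistics
from typing import List, Sequence


def flag_kpi_anomalies(values: Sequence[float], window: int = 7,
                       threshold: float = 3.0) -> List[int]:
    """Return the indices of points whose rolling z-score exceeds `threshold`.

    Each point is compared with the mean and sample standard deviation of the
    `window` points before it, so a spike never inflates its own baseline.
    (Hypothetical helper; the project's actual detector is not documented.)
    """
    if window < 2:
        raise ValueError("window must be at least 2 to estimate a standard deviation")
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mean = statistics.fmean(history)
        stdev = statistics.stdev(history)
        if stdev == 0:
            # A perfectly flat history: any deviation at all is unusual.
            if values[i] != mean:
                anomalies.append(i)
            continue
        if abs(values[i] - mean) / stdev > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily KPI values (not RoveMiles data) with one spike at index 8.
daily_kpi = [120, 118, 125, 122, 119, 121, 124, 123, 310, 120]
print(flag_kpi_anomalies(daily_kpi))  # [8]
```

Because the window only looks backward, each new value can be scored as soon as it arrives. This also means the spike at index 8 widens the baseline for the following days instead of masking itself.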

Real-Time Customer Behavior Analytics Pipeline (AWS)

Scalable real-time data pipeline for continuous event processing and insights.

Approach: Designed the full architecture: Kinesis → Lambda → S3 → Glue Crawler → Glue ETL → Glue DQ → Redshift → QuickSight

Result: Cut event-to-dashboard latency to under 2 minutes.

Stack: AWS Kinesis, AWS Lambda, AWS Glue, Glue DQ, Redshift, S3, Athena, CloudWatch, PySpark, VPC, IAM

View Report
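The first hop of that architecture (Kinesis → Lambda → S3) can be sketched as a Lambda handler. Kinesis delivers records to Lambda with base64-encoded payloads under `Records[].kinesis.data`. Everything else below is an assumption, not the project's code: that events are JSON, that they land in S3 as newline-delimited JSON, and that keys use Hive-style `year=/month=/day=/hour=` prefixes so the Glue crawler can register partitions. The S3 write itself is only described in a comment so the sketch runs without AWS:

```python
import base64
import json
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Dict, List


def partition_prefix(epoch_seconds: float) -> str:
    """Hive-style S3 prefix (assumed layout) that a Glue crawler can treat as partitions."""
    ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return f"year={ts:%Y}/month={ts:%m}/day={ts:%d}/hour={ts:%H}"


def group_records(event: Dict[str, Any]) -> Dict[str, List[dict]]:
    """Decode each base64 Kinesis payload (assumed to be JSON) and group by arrival hour."""
    groups: Dict[str, List[dict]] = defaultdict(list)
    for record in event.get("Records", []):
        kinesis = record["kinesis"]
        payload = json.loads(base64.b64decode(kinesis["data"]))
        groups[partition_prefix(kinesis["approximateArrivalTimestamp"])].append(payload)
    return dict(groups)


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, str]:
    """Turn one Kinesis batch into newline-delimited JSON bodies keyed by S3 prefix.

    A deployed version would write each body to S3, for example with
    boto3.client("s3").put_object(Bucket=..., Key=..., Body=body); that call is
    omitted here so the sketch runs without AWS credentials.
    """
    return {
        prefix: "\n".join(json.dumps(e, separators=(",", ":")) for e in events) + "\n"
        for prefix, events in group_records(event).items()
    }


# Hypothetical single-record event in the shape Lambda receives from Kinesis.
sample_event = {"Records": [{"kinesis": {
    "data": base64.b64encode(b'{"user_id": "u1", "action": "click"}').decode(),
    "approximateArrivalTimestamp": 0.0,
}}]}
print(lambda_handler(sample_event, None))
# {'year=1970/month=01/day=01/hour=00': '{"user_id":"u1","action":"click"}\n'}
```

Grouping by arrival hour keeps each S3 object inside one partition. A crawler run then only has to add new partitions rather than rescan old ones.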

BiblioHub - Library Management System 

Digital tools for managing books, members, and checkouts.

Approach: CRUD operations → SQL schema → Dashboard

Result: A clean, working database application for library operations.

Stack: PHP, MySQL, HTML, JS, SQL

GitHub | View Report
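BiblioHub itself runs on PHP and MySQL. To show the schema-plus-CRUD idea in a form that runs without a database server, the sketch below uses Python's built-in `sqlite3` module as a stand-in. The table names, columns, and the `check_out`/`check_in` helpers are hypothetical, not BiblioHub's actual design:

```python
import sqlite3
from datetime import date
from typing import Optional

# Hypothetical schema; BiblioHub's real MySQL tables may differ.
SCHEMA = """
CREATE TABLE books (
    book_id   INTEGER PRIMARY KEY,
    title     TEXT NOT NULL,
    author    TEXT NOT NULL,
    copies    INTEGER NOT NULL CHECK (copies >= 0)
);
CREATE TABLE members (
    member_id INTEGER PRIMARY KEY,
    name      TEXT NOT NULL,
    email     TEXT NOT NULL UNIQUE
);
CREATE TABLE checkouts (
    checkout_id INTEGER PRIMARY KEY,
    book_id     INTEGER NOT NULL REFERENCES books(book_id),
    member_id   INTEGER NOT NULL REFERENCES members(member_id),
    checked_out TEXT NOT NULL,
    returned    TEXT              -- NULL while the book is still out
);
"""


def available_copies(conn: sqlite3.Connection, book_id: int) -> int:
    """Copies owned minus checkouts that have not been returned yet."""
    (total,) = conn.execute(
        "SELECT copies FROM books WHERE book_id = ?", (book_id,)).fetchone()
    (out,) = conn.execute(
        "SELECT COUNT(*) FROM checkouts WHERE book_id = ? AND returned IS NULL",
        (book_id,)).fetchone()
    return total - out


def check_out(conn: sqlite3.Connection, book_id: int, member_id: int,
              on: Optional[date] = None) -> int:
    """Create a checkout if a copy is free and return its id."""
    with conn:  # commits on success, rolls back if an exception escapes
        if available_copies(conn, book_id) <= 0:
            raise ValueError(f"no copies of book {book_id} are available")
        cur = conn.execute(
            "INSERT INTO checkouts (book_id, member_id, checked_out) VALUES (?, ?, ?)",
            (book_id, member_id, (on or date.today()).isoformat()))
        return cur.lastrowid


def check_in(conn: sqlite3.Connection, checkout_id: int,
             on: Optional[date] = None) -> None:
    """Mark an open checkout as returned."""
    with conn:
        conn.execute(
            "UPDATE checkouts SET returned = ? WHERE checkout_id = ? AND returned IS NULL",
            ((on or date.today()).isoformat(), checkout_id))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
with conn:
    conn.execute("INSERT INTO books (title, author, copies) VALUES (?, ?, ?)",
                 ("Dune", "Frank Herbert", 1))
    conn.execute("INSERT INTO members (name, email) VALUES (?, ?)",
                 ("Ada", "ada@example.com"))
loan = check_out(conn, book_id=1, member_id=1)
print(available_copies(conn, 1))  # 0
check_in(conn, loan)
print(available_copies(conn, 1))  # 1
```

Availability is derived from open checkout rows rather than stored as a counter. Checkouts and returns therefore cannot drift out of sync with the copy count.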

AI-Powered Retrieval-Augmented Generation System

An offline AI system that keeps data local for secure, collaborative insight.

Approach: Built a full RAG architecture using Microsoft GraphRAG for structured retrieval, with all model inference running locally.

Result: Eliminated external API costs (a 100% reduction) by running inference fully offline.

Stack: Python, Chainlit, Microsoft GraphRAG, AutoGen, LiteLLM, Ollama, Mistral, Nomic Embeddings, FAISS

GitHub
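GraphRAG's own query engine works over an extracted entity graph. The sketch below covers only the dense-vector retrieval step that FAISS handles in this stack, written in plain Python so it runs anywhere. The 3-dimensional vectors, chunk texts, and the `top_k`/`build_prompt` helpers are hypothetical stand-ins. In the real system the vectors would come from Nomic embeddings, and how the components are wired together is an assumption:

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def top_k(query: Sequence[float], index: Dict[str, Sequence[float]],
          k: int = 2) -> List[Tuple[str, float]]:
    """Exact cosine-similarity search: inner products of L2-normalized vectors.

    This is the same brute-force search a FAISS IndexFlatIP performs over
    normalized embeddings, written in plain Python for readability.
    """
    q = l2_normalize(query)
    scored = [(chunk_id, sum(a * b for a, b in zip(q, l2_normalize(vec))))
              for chunk_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


def build_prompt(question: str, chunks: Dict[str, str],
                 hits: List[Tuple[str, float]]) -> str:
    """Ground the local model's answer in the retrieved chunks only (hypothetical template)."""
    context = "\n\n".join(chunks[chunk_id] for chunk_id, _ in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


# Toy chunks and 3-dimensional "embeddings"; real embeddings have far more dimensions.
chunks = {"c1": "Doc A: onboarding checklist", "c2": "Doc B: onboarding FAQ",
          "c3": "Doc C: holiday schedule"}
index = {"c1": [0.9, 0.1, 0.0], "c2": [0.8, 0.3, 0.1], "c3": [0.0, 0.1, 0.9]}
hits = top_k([1.0, 0.2, 0.0], index)
print([chunk_id for chunk_id, _ in hits])  # ['c1', 'c2']
print(build_prompt("How do I onboard?", chunks, hits))
```

Normalizing both sides turns the inner product into cosine similarity. That is why a flat inner-product index is a common exact baseline before switching to an approximate FAISS index for large corpora.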

ChurnGuard — Customer Churn Prediction ML Platform

Forecast customer churn and proactively address retention weaknesses.

Approach: Feature engineering + outlier detection → Logistic Regression, Random Forest, and XGBoost models → hyperparameter tuning to maximize predictive performance → evaluation with ROC-AUC and precision-recall

Result: ML-based churn prediction with a +15% accuracy improvement.

Stack: Python, Pandas, Scikit-Learn, Matplotlib, Jupyter

GitHub | View Report
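The evaluation step above relies on ROC-AUC and precision-recall. As a minimal sketch of what those metrics measure, the functions below compute ROC-AUC through its rank interpretation, plus precision and recall at a single threshold. The labels and scores are toy values, not ChurnGuard results. In practice scikit-learn's `roc_auc_score` and `precision_recall_curve` would be used, and `roc_auc_score` should agree with `roc_auc` here:

```python
from typing import Sequence, Tuple


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via its rank interpretation: the probability
    that a randomly chosen churner (label 1) is scored above a randomly chosen
    non-churner (label 0), counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one churner and one non-churner")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_recall_at(labels: Sequence[int], scores: Sequence[float],
                        threshold: float) -> Tuple[float, float]:
    """Precision and recall when every score >= threshold is flagged as churn."""
    flagged = [s >= threshold for s in scores]
    tp = sum(1 for y, f in zip(labels, flagged) if f and y == 1)
    fp = sum(1 for y, f in zip(labels, flagged) if f and y == 0)
    fn = sum(1 for y, f in zip(labels, flagged) if not f and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels (1 = churned) and model scores, not ChurnGuard results.
y_true = [1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.7, 0.2, 0.4, 0.1]
print(round(roc_auc(y_true, y_score), 3))          # 0.889
print(precision_recall_at(y_true, y_score, 0.5))   # (0.6666666666666666, 0.6666666666666666)
print(precision_recall_at(y_true, y_score, 0.3))   # (0.75, 1.0)
```

ROC-AUC is threshold-free, while precision and recall depend on where the cutoff sits. Lowering it from 0.5 to 0.3 catches every churner in this toy set at the cost of one extra false alarm, which is the trade-off a retention campaign has to choose.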

Cancer Gene Classifier

Machine learning model for tumor subtype prediction using TCGA gene expression data.

Approach: Random Forest classifier trained on TCGA gene expression profiles

Result: 91% accuracy, ROC-AUC 0.94

Stack: Python, Pandas, Scikit-Learn

GitHub | View Report
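The card does not describe preprocessing. A common first step for RNA-seq expression matrices before a Random Forest is a log2(x + 1) transform followed by keeping only the most variable genes. The sketch below assumes that step. The gene names, values, and the `log2_transform`/`top_variance_genes` helpers are hypothetical. The filtered matrix would then be passed to scikit-learn's `RandomForestClassifier`:

```python
import math
import statistics
from typing import Dict, List


def log2_transform(expr: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """log2(x + 1) compresses the heavy right tail of expression values."""
    return {gene: [math.log2(x + 1) for x in values] for gene, values in expr.items()}


def top_variance_genes(expr: Dict[str, List[float]], n: int) -> List[str]:
    """Keep the n genes whose (log-scale) expression varies most across samples."""
    ranked = sorted(expr, key=lambda gene: statistics.pvariance(expr[gene]), reverse=True)
    return ranked[:n]


# Toy matrix as {gene: [value per sample]}; placeholder names, not TCGA data.
toy_expr = {
    "GENE_A": [0, 3, 15, 255],
    "GENE_B": [1023, 1023, 1023, 1023],
    "GENE_C": [1, 3, 7, 1],
}
logged = log2_transform(toy_expr)
print(logged["GENE_A"])               # [0.0, 2.0, 4.0, 8.0]
print(top_variance_genes(logged, 2))  # ['GENE_A', 'GENE_C']
```

Variance filtering ignores the subtype labels, so it does not leak them into evaluation. Any label-aware feature selection would instead need to run inside each cross-validation fold.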

Bioinformatics Interest

“I’m fascinated by how algorithms and data pipelines can accelerate genomic research and precision medicine. I’m particularly drawn to machine learning applications in cancer genomics and protein function prediction.”
