Software Engineer | Data Scientist | Computational Biology Enthusiast


2022 - 2026
B.S. in Software Engineering
University of Texas at Dallas (Core GPA 3.7)
My Projects
RoveMiles — ML Engineering + Financial Modeling
End-to-end predictive analytics and financial modeling system.

Approach: Churn prediction, anomaly detection, KPI tracking, and a comprehensive revenue forecasting model with automated ML pipeline integration.

Result: Delivered a financial model and identified high-impact churn features.

Stack: Python, Pandas, Excel, Scikit-Learn, AWS S3, Jupyter

GitHub | View Report
Real-Time Customer Behavior Analytics Pipeline (AWS)
Scalable real-time data pipeline for continuous event processing and insights.

Approach: Designed the full architecture: Kinesis → Lambda → S3 → Glue Crawler → Glue ETL → Glue DQ → Redshift → QuickSight

Result: Cut event-to-dashboard latency to under 2 minutes.

Stack: AWS Kinesis, AWS Lambda, AWS Glue, Glue DQ, Redshift, S3, Athena, CloudWatch, PySpark, VPC, IAM
View Report
BiblioHub - Library Management System
Digital tools for managing books, members, and checkouts.

Approach: CRUD operations → SQL schema → Dashboard

Result: Clean, working DBMS for library operations.

Stack: PHP, MySQL, HTML, JS, SQL

GitHub | View Report
AI-Powered Retrieval-Augmented Generation System
Offline AI assistant for secure, collaborative insight.

Approach: Built a full RAG architecture using Microsoft GraphRAG for structured retrieval.

Result: Eliminated external API costs by running inference fully offline.

Stack: Python, Chainlit, Microsoft GraphRAG, AutoGen, LiteLLM, Ollama, Mistral, Nomic Embeddings, FAISS
GitHub
ChurnGuard — Customer Churn Prediction ML Platform
Forecast customer churn and proactively address retention weaknesses.

Approach: Feature engineering + outlier detection → Logistic Regression, Random Forests, XGBoost → hyperparameter tuning to maximize predictive score → evaluation with ROC-AUC and precision-recall

Result: ML-based churn prediction with a 15% accuracy improvement.

Stack: Python, Pandas, Scikit-Learn, Matplotlib, Jupyter
GitHub | View Report
Cancer Gene Classifier
Machine learning model for tumor subtype prediction using TCGA gene expression data.

Approach: Random Forest trained on the TCGA dataset

Result: 91% accuracy, ROC-AUC 0.94

Stack: Python, Pandas, Scikit-Learn

GitHub | View Report
Bioinformatics Interest
“I’m fascinated by how algorithms and data pipelines can accelerate genomic research and precision medicine. I’m particularly drawn to machine learning applications in cancer genomics and protein function prediction.”

