Software Engineer | Data Science & Bioinformatics Enthusiast

Get to Know Me

  • LinkedIn

“I’m a software engineer passionate about using data science and machine learning to solve problems in genomics, healthcare, and biological systems. Currently exploring projects in bioinformatics and preparing for graduate research in computational biology.”

2022 - 2026

B.S. in Software Engineering

University of Texas at Dallas | Major Core GPA: 3.6

Work Experience

Data Science and Machine Learning Intern

Rove Miles

Jan 2025 - Aug 2025

  • Delivered an end-to-end predictive analytics and financial modeling system (GMV, take rate, redemption cost, partner payouts) to identify high-impact churn features.

  • Built revenue forecasting models, partner-pricing models, and profitability simulations.

  • Analyzed partner and redemption datasets to compute revenue/cost per mile and produced data-driven insights that guided pricing and strategy decisions.

  • Designed scenario-analysis tools and dashboards that modeled how changes in redemption rate, partner pricing, and take rate impact profitability and overall business performance.

Technical Skills: Excel financial modeling, revenue/cost analysis, partner-pricing modeling, scenario forecasting, dashboard creation.

API & Platform Team Member

UTD Nebula Lab

Jan 2024 - Aug 2024

  • Participated in Agile practices including daily stand-ups, sprint planning, and peer code reviews to drive project success.

  • Automated extraction and transformation of UTD course data using web scraping, structuring data with an optimized MongoDB schema for efficient querying.

  • Developed Go-based RESTful APIs with features like advanced filtering, caching, and secure authentication, enhancing API performance and security.

Technical Skills: Go APIs, MongoDB, React.js, JSON, Agile Scrum

My Projects

ChurnGuard — Customer Churn Prediction ML Platform

Goal: Forecast customer churn and proactively address retention weaknesses.

Approach: Feature engineering + outlier detection → Logistic Regression, Random Forests, XGBoost → hyperparameter tuning to maximize predictive score → evaluation with ROC-AUC and precision-recall

Result: ML-based churn prediction with a 15% accuracy improvement.

Skills: Python, Pandas, Scikit-Learn, Matplotlib, Jupyter

GitHub 

Real-Time Customer Behavior Analytics Pipeline (AWS)

Goal: Scalable real-time data pipeline for continuous event processing and insights.

Approach: Designed the full architecture: Kinesis → Lambda → S3 → Glue Crawler → Glue ETL → Glue DQ → Redshift → QuickSight

Result: Cut event-to-dashboard latency to under 2 minutes.

Skills: AWS Kinesis, AWS Lambda, AWS Glue, Glue DQ, Redshift, S3, Athena, CloudWatch, PySpark, VPC, IAM

AI-Powered Retrieval-Augmented Generation System

Goal: Build an offline AI system for secure, collaborative insight.

Approach: Designed a full RAG architecture using Microsoft GraphRAG, integrated AutoGen agents, ran local LLMs through Ollama and LiteLLM, and built an interactive Chainlit UI.

Result: Eliminated API costs by running inference fully offline and improved retrieval accuracy using hybrid graph + vector search.

Skills: Python, Chainlit, Microsoft GraphRAG, AutoGen, LiteLLM, Ollama, Mistral, Nomic Embeddings, FAISS

GitHub

BLAST Result Parser — Automated Sequence Alignment Analysis

Goal: Automate BLAST alignment parsing and filtering for faster biological insight.

Approach: Parsed XML/tabular BLAST output with Biopython → chainable filtering pipeline → statistics computation

Result: Clean filtering by e-value, identity %, and bit score; summary statistics for alignment quality.

Skills: Python, Biopython, Pandas, JSON/CSV

GitHub
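
The chainable filtering idea described above can be illustrated with a minimal sketch. This is not the project's actual code: it uses only the Python standard library rather than Biopython, assumes BLAST tabular output in the standard 12-column `-outfmt 6` layout, and the names `Hit`, `HitFilter`, and `parse_tabular` are hypothetical.

```python
import csv
import io
from dataclasses import dataclass
from statistics import mean

# Standard column order of BLAST tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass(frozen=True)
class Hit:
    query: str
    subject: str
    identity: float  # percent identity
    evalue: float
    bitscore: float


def parse_tabular(handle):
    """Yield one Hit per row of tab-separated BLAST output."""
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        yield Hit(row["qseqid"], row["sseqid"], float(row["pident"]),
                  float(row["evalue"]), float(row["bitscore"]))


class HitFilter:
    """Hypothetical chainable filter: every method returns a new HitFilter."""

    def __init__(self, hits):
        self.hits = list(hits)

    def max_evalue(self, cutoff):
        return HitFilter(h for h in self.hits if h.evalue <= cutoff)

    def min_identity(self, percent):
        return HitFilter(h for h in self.hits if h.identity >= percent)

    def min_bitscore(self, score):
        return HitFilter(h for h in self.hits if h.bitscore >= score)

    def summary(self):
        """Basic alignment-quality statistics for the remaining hits."""
        if not self.hits:
            return {"count": 0}
        return {
            "count": len(self.hits),
            "mean_identity": mean(h.identity for h in self.hits),
            "best_bitscore": max(h.bitscore for h in self.hits),
        }


# Made-up sample rows, for illustration only.
SAMPLE = (
    "q1\ts1\t98.5\t200\t3\t0\t1\t200\t5\t204\t1e-100\t350\n"
    "q1\ts2\t75.0\t180\t45\t2\t1\t180\t9\t188\t2e-10\t90.5\n"
    "q2\ts3\t91.2\t150\t13\t0\t1\t150\t1\t150\t0.5\t30.1\n"
)

kept = (HitFilter(parse_tabular(io.StringIO(SAMPLE)))
        .max_evalue(1e-5)
        .min_identity(90.0))
print(kept.summary())
```

Returning a new `HitFilter` from each method keeps every step side-effect free, so thresholds can be combined in any order.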

DNA Analysis Toolkit — Comprehensive Sequence Analysis 

Goal: Provide an all-in-one DNA analysis tool for students and researchers.

Approach: FASTA parsing & standardized structure → GC, ORF, codon usage, translation pipelines → memory-efficient batch workflows

Result: Generated GC-content profiles, ORF maps, and codon-usage heat-maps, and supported batch analysis for 1,000+ FASTA sequences.

Skills: Python, Biopython, Pandas, Matplotlib, Seaborn

GitHub 
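
As a rough illustration of the FASTA parsing and GC-content steps, here is a minimal standard-library sketch. It is an assumption about how such a step could look, not the toolkit's actual implementation (which uses Biopython); the function names `read_fasta` and `gc_content` are hypothetical. The parser is a generator, so records are processed one at a time instead of loading a whole batch into memory.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs one record at a time."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Fraction of G and C among A/C/G/T bases; 0.0 if none are present."""
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / counted


# Made-up sample records, for illustration only.
SAMPLE = ">seq1 demo\nATGC\nGGCC\n>seq2\nATATAT\n"

profile = {header: gc_content(seq)
           for header, seq in read_fasta(io.StringIO(SAMPLE))}
print(profile)
```

Ambiguous bases such as `N` are excluded from the denominator here; that is one possible convention, and a real tool may choose differently.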

Gene Expression Visualizer

Goal: Simplify gene-expression visualization for high-dimensional datasets.

Approach: PCA & hierarchical clustering → correlation and QC plots → automated report creation via unified API

Result: Clustered heat-maps, correlation matrices, expression-distribution box plots, and automated reporting.

Skills: Python, Pandas, NumPy, Matplotlib, Seaborn, Scikit-Learn

GitHub 

Bioinformatics Interest

“I’m fascinated by how algorithms and data pipelines can accelerate genomic research and precision medicine. I’m particularly drawn to machine learning applications in cancer genomics.”
