Software Engineer | Data Science & Bioinformatics Enthusiast


2022 - 2026
B.S. in Software Engineering
University of Texas at Dallas | Major Core GPA: 3.6
Work Experience
Data Science and Machine Learning Intern
Rove Miles
Jan 2025 - Aug 2025
- Delivered an end-to-end predictive analytics and financial modeling system (GMV, take rate, redemption cost, partner payouts) to identify high-impact churn features.
- Built revenue forecasting models, partner-pricing models, and profitability simulations.
- Analyzed partner and redemption datasets to compute revenue and cost per mile, producing data-driven insights that guided pricing and strategy decisions.
- Designed scenario-analysis tools and dashboards that modeled how changes in redemption rate, partner pricing, and take rate affect profitability and overall business performance.
Technical Skills: Excel financial modeling, revenue/cost analysis, partner-pricing modeling, scenario forecasting, dashboard creation.
API & Platform Team Member
UTD Nebula Lab
Jan 2024 - Aug 2024
- Participated in Agile practices, including daily stand-ups, sprint planning, and peer code reviews, to drive project success.
- Automated the extraction and transformation of UTD course data via web scraping, storing it in an optimized MongoDB schema for efficient querying.
- Developed Go-based RESTful APIs with advanced filtering, caching, and secure authentication, improving API performance and security.
Technical Skills: Go APIs, MongoDB, React.js, JSON, Agile Scrum
My Projects
ChurnGuard — Customer Churn Prediction ML Platform
Goal: Forecast customer churn and proactively address retention weaknesses.

Approach: Feature engineering and outlier detection → Logistic Regression, Random Forest, and XGBoost models → hyperparameter tuning to maximize predictive performance → evaluation with ROC-AUC and precision-recall

Result: ML-based churn prediction with a 15% improvement in accuracy.

Skills: Python, Pandas, Scikit-Learn, Matplotlib, Jupyter
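ROC-AUC, one of the evaluation metrics listed above, has a simple interpretation: the probability that a randomly chosen churner receives a higher score than a randomly chosen non-churner. The minimal sketch below computes it directly from that pairwise definition, counting ties as half; the `roc_auc` function and the toy labels and scores are hypothetical illustrations, not ChurnGuard code.

```python
# Hypothetical sketch: ROC-AUC from its pairwise (Mann-Whitney) definition.
def roc_auc(labels, scores):
    """Fraction of (churner, non-churner) pairs ranked correctly; ties count as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical churn labels (1 = churned) and model scores
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.3, 0.6, 0.65, 0.1, 0.8]
print(round(roc_auc(labels, scores), 3))  # → 0.889
```

The pairwise loop is O(P·N), which is fine for illustration; scikit-learn's `sklearn.metrics.roc_auc_score` computes the same quantity efficiently for binary labels.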
Real-Time Customer Behavior Analytics Pipeline (AWS)
Goal: Build a scalable real-time data pipeline for continuous event processing and insights.

Approach: Designed the full architecture: Kinesis → Lambda → S3 → Glue Crawler → Glue ETL → Glue DQ → Redshift → QuickSight

Result: Cut event-to-dashboard latency to under 2 minutes.

Skills: AWS Kinesis, AWS Lambda, AWS Glue, Glue DQ, Redshift, S3, Athena, CloudWatch, PySpark, VPC, IAM
AI-Powered Retrieval-Augmented Generation System
Goal: Build an offline AI system for secure, collaborative insight.

Approach: Designed a full RAG architecture using Microsoft GraphRAG, integrated AutoGen agents, ran local LLMs through Ollama and LiteLLM, and built an interactive Chainlit UI.

Result: Eliminated external API costs through offline inference and improved retrieval accuracy using hybrid graph and vector search.

Skills: Python, Chainlit, Microsoft GraphRAG, AutoGen, LiteLLM, Ollama, Mistral, Nomic Embeddings, FAISS
BLAST Result Parser — Automated Sequence Alignment Analysis
Goal: Automate BLAST alignment parsing and filtering for faster biological insight.

Approach: Parsed XML and tabular BLAST output with Biopython → chainable filtering pipeline → summary statistics computation

Result: Clean filtering by e-value, percent identity, and bit score, plus summary statistics for alignment quality.

Skills: Python, Biopython, Pandas, JSON/CSV
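The chainable filtering step can be sketched with the standard library alone. This is a minimal sketch, assuming BLAST+ tabular output (`-outfmt 6`) with its twelve standard columns (query ID, subject ID, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, e-value, bit score); the project itself used Biopython for parsing, and the `Hit`, `parse_tabular`, and `HitFilter` names, thresholds, and sample rows are hypothetical.

```python
# Hypothetical sketch: chainable filters over BLAST+ -outfmt 6 rows.
from dataclasses import dataclass
from statistics import mean

# The twelve standard -outfmt 6 columns, in order
FIELDS = ["query_id", "subject_id", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


@dataclass
class Hit:
    query_id: str
    subject_id: str
    pident: float
    length: int
    evalue: float
    bitscore: float


def parse_tabular(lines):
    """Parse tab-separated BLAST rows into Hit records, skipping blank and comment lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        hits.append(Hit(row["query_id"], row["subject_id"], float(row["pident"]),
                        int(row["length"]), float(row["evalue"]), float(row["bitscore"])))
    return hits


class HitFilter:
    """Each filter returns a new HitFilter, so calls can be chained."""

    def __init__(self, hits):
        self.hits = list(hits)

    def max_evalue(self, cutoff):
        return HitFilter(h for h in self.hits if h.evalue <= cutoff)

    def min_identity(self, pct):
        return HitFilter(h for h in self.hits if h.pident >= pct)

    def min_bitscore(self, score):
        return HitFilter(h for h in self.hits if h.bitscore >= score)

    def summary(self):
        """Simple alignment-quality statistics for the hits that survived filtering."""
        if not self.hits:
            return {"n_hits": 0}
        return {"n_hits": len(self.hits),
                "mean_identity": mean(h.pident for h in self.hits),
                "mean_bitscore": mean(h.bitscore for h in self.hits)}


# Hypothetical BLAST output rows
blast_output = [
    "q1\tsA\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-50\t350",
    "q1\tsB\t75.0\t180\t45\t2\t5\t185\t10\t190\t1e-5\t90",
    "q2\tsC\t92.0\t150\t12\t1\t1\t150\t3\t152\t0.01\t60",
]
result = (HitFilter(parse_tabular(blast_output))
          .max_evalue(1e-3)
          .min_identity(90.0)
          .summary())
print(result)
```

Returning a fresh `HitFilter` from every method keeps each stage side-effect free, so intermediate results can be reused or inspected without re-parsing the file.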
DNA Analysis Toolkit — Comprehensive Sequence Analysis
Goal: Provide an all-in-one DNA analysis tool for students and researchers.

Approach: FASTA parsing into a standardized structure → GC-content, ORF, codon-usage, and translation pipelines → memory-efficient batch workflows

Result: Generated GC-content profiles, ORF maps, and codon-usage heatmaps, and supported batch analysis of 1,000+ FASTA sequences.

Skills: Python, Biopython, Pandas, Matplotlib, Seaborn
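The core DNA computations can be sketched in plain Python: a generator-based FASTA reader that holds one record in memory at a time, GC content, codon usage, and a simple ORF scan. This is a minimal sketch under stated assumptions: ORFs are taken as ATG-to-stop on the forward strand only (reverse-complement frames are ignored), and all function names and sample sequences are hypothetical rather than the toolkit's own API.

```python
# Hypothetical sketch: streaming FASTA reader plus GC, codon-usage, and ORF helpers.
from collections import Counter

STOP_CODONS = {"TAA", "TAG", "TGA"}


def read_fasta(lines):
    """Yield (header, sequence) pairs one record at a time."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_content(seq):
    """Percentage of G and C bases."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def codon_usage(seq):
    """Count codons in reading frame 0, ignoring a trailing partial codon."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def find_orfs(seq, min_codons=2):
    """Forward-strand ORFs as (start, end, frame); end is just past the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs


# Hypothetical multi-line FASTA input
fasta_lines = [">seq1", "ATGAAA", "TAGGC", ">seq2", "GGCC"]
for name, seq in read_fasta(fasta_lines):
    print(name, round(gc_content(seq), 1), find_orfs(seq), dict(codon_usage(seq)))
```

Because `read_fasta` is a generator, passing it an open file handle instead of a list processes large batches without loading every sequence at once.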
Gene Expression Visualizer
Goal: Simplify gene-expression visualization for high-dimensional datasets.

Approach: PCA and hierarchical clustering → correlation and QC plots → automated report creation via a unified API

Result: Clustered heatmaps, correlation matrices, expression-distribution box plots, and automated reports.

Skills: Python, Pandas, NumPy, Matplotlib, Seaborn, Scikit-Learn
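A sample-by-sample correlation matrix, the basis of the correlation plots above, reduces to pairwise Pearson coefficients. A minimal stdlib sketch, assuming expression values are already normalized and stored per sample in the same gene order; the `pearson` and `correlation_matrix` names and the toy values are hypothetical.

```python
# Hypothetical sketch: Pearson correlation matrix between expression samples.
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return cov / (sx * sy)


def correlation_matrix(expression):
    """expression maps sample name -> expression values in a shared gene order."""
    names = list(expression)
    return {a: {b: pearson(expression[a], expression[b]) for b in names} for a in names}


# Hypothetical normalized expression for four genes in three samples
expr = {
    "ctrl_1": [10.0, 200.0, 35.0, 4.0],
    "ctrl_2": [12.0, 190.0, 30.0, 5.0],
    "treat_1": [80.0, 20.0, 150.0, 60.0],
}
m = correlation_matrix(expr)
for sample, row in m.items():
    print(sample, [round(v, 3) for v in row.values()])
```

In practice, pandas `DataFrame.corr()` (Pearson by default) computes the same matrix on a genes-by-samples table, and the result can feed directly into a clustered heatmap.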

