Derek

Hi, I'm Derek!

Data Scientist | Problem Solver | Tinkerer


About Me

I am a Data Scientist with a B.S. in Pure Mathematics and an M.S. in Data Analytics Engineering. My work is centered on bridging the gap between complex datasets and practical tools that help teams make better decisions. I view data challenges through a mathematical lens, focusing on statistical rigor and the development of reliable, scalable pipelines.

What I Do

In my current role, I design and deploy production-grade LLM-powered systems that help users navigate multi-billion-record datasets by translating natural language into structured search queries. I also own scalable Python ETL pipelines that process millions of records on a custom workflow-manager platform, reducing data delivery time by 93% and cloud costs by 15%. This work involves:

  • Pipeline Engineering: Architecting transformation layers for complex data types so that staged data is high-fidelity and ready for model training and deployment.
  • System Visibility: Building real-time monitoring dashboards using the ELK Stack and Grafana to track operational metrics across hundreds of servers.
  • Reliability: Implementing CI/CD pipelines via Jenkins to ensure that analytical systems are reproducible and perform consistently.

My Foundation

Before my current role, I worked as a Junior Mathematician and Data Scientist. In that position, I connected theory to measured performance by benchmarking ML models against statistical baselines to validate real-world improvements. I modernized legacy analytics into interactive web dashboards, enabling stakeholders to derive insights on their own. In high-stakes cyber analytics, I engineered models that achieved a 10% gain in accuracy and a 50% increase in coverage. I applied the same rigor to Generative Adversarial Networks (GANs), using performance metrics and visualizations to improve the quality of the synthetic data they generated.

I am an AWS Certified Developer and proficient in a technical stack that includes:

  • Python
  • SQL
  • PyTorch
  • scikit-learn
  • the ELK Stack
  • Apache Airflow
  • Dagster

Beyond the Screen

Outside of work, I am a tinkerer and a builder at heart. You will usually find me learning a new song on the piano, a practice that has grounded my approach to technical problem-solving since childhood. I also thrive on challenges that demand long-term endurance, whether I am training for a marathon or hiking a scenic trail of 10 miles or more. The same drive to find my way through unfamiliar terrain fuels my passion for travel. To me, exploring a new culture is much like exploring a new dataset: it is an opportunity to learn, adapt, and find clarity in a new environment.

My Projects

TestGPT

A from-scratch implementation of the GPT transformer architecture, focusing on its internal mechanics and generative AI performance.

PyTorch | Transformers | LLM Architecture | Generative AI | GPU Computing
View on GitHub →

GMU Faculty and Researcher Experts List

A research-expertise mapping system built with Python NLP, a Neo4j graph database, and NeoDash visualizations.

Python | NLP | Neo4j | Web Scraping | NeoDash
View on GitHub →

U.S. Patent Phrase to Phrase Matching

Semantic similarity modeling for U.S. patent phrase-to-phrase matching using deep learning and NLP.

Deep Learning | NLP | LSTM/RNN | Semantic Similarity | Pearson Correlation
Ask me more!

An Application of the Mapper Algorithm to Sports Analytics in College Basketball

Python-driven basketball analytics and performance modeling for player efficiency tracking.

Python | Pandas | Sports Analytics | Data Visualization | Performance Modeling
View on GitHub →

Donation Record Analysis for the Baltimore Humane Society

Data-driven donation analysis and growth strategy for the Baltimore Humane Society, built in Python.

Python | Exploratory Data Analysis (EDA) | Fundraising Strategy | Predictive Modeling | Non-Profit Analytics
View on GitHub →