Hi, I'm Derek!
Data Scientist | Problem Solver | Tinkerer
About Me
I am a Data Scientist with a B.S. in Pure Mathematics and an M.S. in Data Analytics Engineering. My work is centered on bridging the gap between complex datasets and practical tools that help teams make better decisions. I view data challenges through a mathematical lens, focusing on statistical rigor and the development of reliable, scalable pipelines.
What I Do
In my current role, I design and deploy production-grade LLM-powered systems that help users navigate multi-billion-record datasets by translating natural language into structured search queries. I also own scalable Python ETL pipelines, running on a custom workflow manager platform, that process millions of records and have reduced data delivery time by 93% and cloud costs by 15%. This work involves:
- Pipeline Engineering: Architecting transformation layers for complex data types to ensure high-fidelity staging for seamless model training and deployment.
- System Visibility: Building real-time monitoring dashboards using the ELK Stack and Grafana to track operational metrics across hundreds of servers.
- Reliability: Implementing CI/CD pipelines via Jenkins to ensure that analytical systems are reproducible and perform consistently.
My Foundation
Prior to my current role, I served as a Junior Mathematician and Data Scientist. During this time, I bridged the gap between theory and performance by benchmarking ML models against statistical baselines to validate real-world improvements. I modernized legacy analytics into interactive web dashboards, enabling stakeholders to derive insights independently. In high-stakes cyber analytics, I engineered models that achieved a 10% gain in accuracy and a 50% increase in coverage. I applied this same rigor to Generative Adversarial Networks (GANs), using performance metrics and visualizations to drive higher-quality synthetic data generation.
I am an AWS Certified Developer and proficient in a technical stack that includes:
- Python,
- SQL,
- PyTorch,
- scikit-learn,
- the ELK Stack,
- Apache Airflow, and
- Dagster
Beyond the Screen
In my free time, I am still a tinkerer and a builder at heart. You will usually find me learning a new song on the piano, a practice that has grounded my approach to technical problem-solving since childhood. I also thrive on challenges that require long-term endurance, whether I am training for a marathon or navigating a 10+ mile scenic hiking trail. This same drive to find my way through unfamiliar terrain fuels my passion for travel; to me, exploring a new culture is much like exploring a new dataset—it is an opportunity to learn, adapt, and find clarity in a new environment.
My Projects
TestGPT
From-scratch implementation of the GPT transformer architecture focusing on deep-level mechanics and generative AI performance.
View on GitHub →
GMU Faculty and Researcher Experts List
Research expertise mapping system using Python NLP, Neo4j graph databases, and NeoDash visualization.
View on GitHub →
U.S. Patent Phrase to Phrase Matching
Semantic similarity modeling for U.S. patent phrase-to-phrase matching using deep learning and NLP.
Ask me more!
An Application of the Mapper Algorithm to Sports Analytics in College Basketball
Python-driven basketball analytics and performance modeling for player efficiency tracking.
View on GitHub →
Donation Record Analysis for the Baltimore Humane Society
Data-driven donation analysis and growth strategy for the Baltimore Humane Society using Python.
View on GitHub →
Let's Connect
I am always open to discussing new opportunities or collaborations.