Derrick Karake

Full-Stack Data Scientist · ML · MLOps · Systems Architect

Senior-level data scientist and sole architect of two production systems at Devon Energy — a real-time frac data acquisition platform with AI-integrated agents and an ML-driven forecasting platform (decision trees + logistic regression, full Azure MLOps) supporting a $350M procurement program. Broad ML coverage across classical models, deep learning, and LLMs — with the communication chops to bridge data science and business stakeholders.

Selected Outcomes

Impact in production

$350M

Procurement program powered by ML forecasting model (decision trees + logistic regression)

$2.5M

Projected annual savings from the frac data acquisition platform

$40K/mo

Recurring savings from eliminating third-party vendor software

97%+

Field data reliability after pipeline + device overhaul (from <80%)

Systems I've Built

Production platforms, sole architect

The systems below are production platforms I designed and shipped end-to-end — ML models, cloud architecture, data pipelines, authentication, APIs, frontend, testing, and deployment. Each is in daily use by engineering and operations teams at Devon Energy.

Flagship · AI-Integrated · Production

Real-Time Frac Data Acquisition Platform

Sole architect of an end-to-end ingestion, validation, and AI-assisted backfill platform replacing a third-party vendor pipeline. Supports live frac fleet operations across Devon's field footprint — built on integrated agents, automated fallback, an LLM-powered backfill engine, and a React monitoring app used daily by completions engineers.
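
The gap/spike validation the ingestion agents run can be sketched as a simple robust-statistics check — a minimal stand-in for the production rules, with illustrative thresholds (the real agents run in PySpark against streaming telemetry):

```python
def flag_gaps_and_spikes(times, values, max_gap_s=5.0, z_thresh=4.0):
    """Flag timestamp gaps larger than max_gap_s seconds and values more
    than z_thresh robust z-scores from the median (MAD-based spike check)."""
    # Gap check: index i is flagged when the interval before it is too long
    gaps = [i for i in range(1, len(times)) if times[i] - times[i - 1] > max_gap_s]
    # Spike check: median/MAD are robust to the spikes we want to catch
    med = sorted(values)[len(values) // 2]
    mad = sorted(abs(v - med) for v in values)[len(values) // 2] or 1e-9
    spikes = [i for i, v in enumerate(values)
              if abs(v - med) / (1.4826 * mad) > z_thresh]
    return gaps, spikes
```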

Agents Python/PySpark ingestion agents pull ~80 high-frequency telemetry files per frac fleet daily, run gap/spike validation, and stream clean data into the OSIsoft PI historian in near-real-time.
Fallback Primary/secondary ingestion paths with health checks, automatic failover on stream drop, exponential backoff retries, and self-healing recovery on node or device loss — keeps ingestion flowing when upstream sources degrade.
AI Backfill LLM-based agents (OpenAI SDK) parse unstructured vendor PDF reports and heterogeneous CSV exports, extract telemetry fields, and map varied service-provider schemas into PI-compatible format for historical gap recovery.
Frontend React + Node.js operator-facing monitoring app — live ingestion status, data-quality inspection, manual backfill triggers, and fallback/failover visibility. Used daily by completions engineers.
Infra Partnered with IT to replace third-party cellular links with direct VPN; data reliability <80% → >97%.
Impact $40K/month recurring savings · $2.5M/year projected acquisition-fee savings · replaced an entire vendor pipeline
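
The fallback behavior above — exponential-backoff retries on the primary path, then failover to the secondary — follows a standard pattern. A minimal sketch with illustrative names and delays; `primary` and `secondary` stand in for the real ingestion callables:

```python
import time

def fetch_with_failover(primary, secondary, max_retries=4, base_delay=1.0):
    """Try the primary ingestion path with exponential-backoff retries,
    then fail over to the secondary path; raise only when both are down."""
    for source in (primary, secondary):
        for attempt in range(max_retries):
            try:
                return source()
            except ConnectionError:
                # Back off 1s, 2s, 4s, 8s before retrying this source
                time.sleep(base_delay * 2 ** attempt)
    raise RuntimeError("both ingestion paths exhausted")
```
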
ML + MLOps · Azure · $350M Program

Casing Procurement & Demand Forecast Platform

React/TypeScript + Node.js platform on Azure supporting a $350M casing procurement program. Built a production forecasting model combining decision trees and logistic regression with end-to-end MLOps — continuously re-tuning against drilling and completion schedule changes to produce exact per-SKU counts and delivery windows for the forward year.
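
A soft-voting pairing of a decision tree and logistic regression is one plausible reading of the ensemble described above; this sketch uses synthetic features and makes no claim about the production feature set:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import VotingClassifier

rng = np.random.default_rng(0)
# Toy features standing in for schedule signals, e.g.
# [well_depth, lateral_count, days_to_spud] — illustrative only
X = rng.normal(size=(200, 3))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # 1 = SKU needed in window

ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=4, random_state=0)),
        ("logit", LogisticRegression()),
    ],
    voting="soft",  # average class probabilities from both models
)
ensemble.fit(X, y)
proba = ensemble.predict_proba(X[:5])  # per-well probability a SKU is needed
```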

ML Model Decision trees + logistic regression ensemble. Re-scores casing needs by SKU (OD, weight, grade, connection) across 30–180 day and full-year horizons as drilling and completion operations shift day-to-day.
MLOps Fully deployed on Azure with automated retraining pipelines, drift monitoring, model observability, and production-grade error handling — continuously keeps model outputs aligned with real field conditions.
Stakeholder Inputs Drilling engineers submit per-well casing plans, supply chain teams feed in-flight inventory and buffer thresholds, and casing providers contribute lead-time and shipment commitments — all rolled into each forecast run.
Delivery Layer Year-ahead delivery projection schedules rig-level casing deliveries across the full forward program, auto-rebalancing whenever drilling order, completion pace, or provider lead times shift.
Azure Stack App Service, Container Registry, Azure AD SSO (MSAL + Graph API, 3-tier RBAC), Blob Storage (SAS URLs), Logic Apps + Graph API cron pipeline.
Data 15+ Snowflake tables across 4 enterprise databases, parameterized SQL/CTEs, in-memory caching with stampede protection, full audit trail (before/after diffs, user attribution) on every CRUD op.
Quality Playwright E2E (admin, forecasting, audit, mobile) + Vitest unit tests — production-grade testing at enterprise scale.
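
Stampede protection, as noted under Data, is typically a per-key lock so concurrent cache misses trigger only one recomputation while the other callers wait and reuse the result. A minimal in-memory sketch (the production version may differ):

```python
import threading
import time

class StampedeSafeCache:
    """TTL cache where concurrent misses on one key serialize behind a
    per-key lock, so the expensive compute runs once per expiry."""
    def __init__(self, ttl=60.0):
        self._ttl = ttl
        self._data = {}              # key -> (value, expires_at)
        self._locks = {}             # key -> lock guarding recomputation
        self._meta = threading.Lock()

    def get_or_compute(self, key, compute):
        entry = self._data.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]          # fresh hit, no locking needed
        with self._meta:
            lock = self._locks.setdefault(key, threading.Lock())
        with lock:                   # one thread recomputes; the rest wait
            entry = self._data.get(key)
            if entry and entry[1] > time.monotonic():
                return entry[0]      # another thread already refilled it
            value = compute()
            self._data[key] = (value, time.monotonic() + self._ttl)
            return value
```
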
Deep Learning · NLP · Production

BERT Document Intelligence System

Fine-tuned and deployed a BERT-based question-answering pipeline extracting structured fields from 5,000+ scanned TIFF images per day at RMS. Replaced a manual review workflow, reducing turnaround and human error while achieving an F1 score of 0.90.

Stack Python, PyTorch, Hugging Face Transformers, OpenCV, Tesseract OCR
Scale 5,000+ images processed daily · F1 = 0.90 · full manual process automated
Role Model fine-tuning, dataset curation, pipeline architecture, production deployment, monitoring
AI Agents · Developer Tooling

Agentic Validation & Developer Toolkit

Built AI-powered validation agents using Pydantic AI for casing pressure analysis with structured input/output models and custom tool functions. Authored an agentic coding toolkit with Snowflake integration plugins, query-validation hooks, and developer productivity automations. Integrated OpenAI SDK for intelligent data enrichment.

Stack Python, Pydantic AI, OpenAI SDK, Snowflake, Fuse.js
Highlights Structured agent I/O, custom tool functions, query-validation hooks, dev-loop automations
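
The structured-I/O, tool-function pattern can be sketched in plain Python — a stand-in for the Pydantic AI agent wiring, with hypothetical field names and pressure thresholds:

```python
from dataclasses import dataclass

TOOLS = {}

def tool(fn):
    """Register a function so an agent can invoke it by name."""
    TOOLS[fn.__name__] = fn
    return fn

@dataclass
class PressureReading:
    """Structured input the agent passes to tools (illustrative fields)."""
    well_id: str
    psi: float

@tool
def validate_pressure(reading: PressureReading, max_psi: float = 10_000.0) -> dict:
    """Structured output the agent can reason over; threshold is illustrative."""
    return {"well_id": reading.well_id, "ok": 0.0 <= reading.psi <= max_psi}
```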

ML Coverage

Broad, not narrow

I've shipped across classical ML, deep learning, and LLM agents — production, not just notebooks. My view: LLMs are one tool in the box. The right model depends on the problem, the data, and what you can operate at scale.

Classical ML

Tree-based models (decision trees, random forest, gradient boosting), logistic regression, linear regression, clustering (k-means, hierarchical), time-series forecasting, scikit-learn, XGBoost

Deep Learning / NLP

BERT fine-tuning, PyTorch, Hugging Face Transformers, CNN architectures, OpenCV, Tesseract OCR

LLMs & Agents

OpenAI SDK, Pydantic AI, LLM-based agents for structured extraction (PDF/CSV → schemas), tool-using agents, prompt engineering, structured I/O
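
Downstream of the LLM extraction step, mapping heterogeneous provider schemas onto canonical fields reduces to an alias lookup. A sketch with a hypothetical alias table (the production mapping is LLM-assisted; these names are illustrative):

```python
# Hypothetical alias table: each service provider names the same
# telemetry channel differently
FIELD_ALIASES = {
    "treating_pressure": {"tp_psi", "treat_press", "pressure_treating"},
    "slurry_rate": {"sr_bpm", "slurry_rt", "rate_slurry"},
}

def normalize_field(provider_name):
    """Map a provider-specific column name onto its canonical field,
    or None when no alias matches."""
    key = provider_name.strip().lower()
    for canonical, aliases in FIELD_ALIASES.items():
        if key == canonical or key in aliases:
            return canonical
    return None
```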

MLOps / Production

Automated retraining, drift monitoring, model observability, containerized deployment on Azure, production error handling, A/B-style model comparison
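
Drift monitoring of the kind listed above is often implemented as a population stability index (PSI) between training-time and recent production feature distributions. A self-contained sketch using the conventional PSI thresholds, not details from any specific platform:

```python
import math

def population_stability_index(expected, actual, bins=10):
    """PSI between a training-time sample (expected) and a recent
    production sample (actual). Conventional rule of thumb:
    <0.1 stable, 0.1-0.25 moderate drift, >0.25 consider retraining."""
    lo, hi = min(expected), max(expected)
    edges = [lo + (hi - lo) * i / bins for i in range(1, bins)]

    def fractions(sample):
        counts = [0] * bins
        for x in sample:
            counts[sum(x > e for e in edges)] += 1   # bin index
        n = len(sample)
        return [max(c / n, 1e-6) for c in counts]    # avoid log(0)

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```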

Statistics & Analysis

Regression analysis, hypothesis testing, distributional analysis, exploratory data analysis, time-series decomposition, feature engineering

Visualization

Power BI, Sisense, Recharts, Plotly.js, MUI Data Grid — dashboards and operator-facing UIs for non-technical stakeholders

Experience

Where I've shipped

Full-Stack Data Scientist

Devon Energy

January 2023 – Present

Oklahoma City, OK

  • Sole architect of two production platforms used daily by drilling, completions, and supply chain teams.
  • Shipped a decision-tree + logistic-regression forecasting model with end-to-end Azure MLOps (automated retraining, drift monitoring, model observability) powering a $350M casing procurement program.
  • Designed a multi-stakeholder input pipeline aggregating signals from drilling engineers, supply chain, and casing providers into every forecast run, coordinating requirements across technical and non-technical teams.
  • Built a real-time frac data acquisition platform with integrated AI agents, automated fallback, and an LLM-based backfill engine parsing unstructured vendor PDFs and CSVs into PI-compatible schemas.
  • Containerized React + Node.js applications on Azure App Service (multi-stage Docker, ACR, Azure AD SSO, Blob Storage SAS URLs, Logic Apps, Microsoft Graph API).
  • Delivered $2.5M/year projected savings, $40K/month recurring, field data reliability lifted from <80% to >97%.

Data Scientist

RMS

April 2022 – November 2023

Oklahoma City, OK

  • Fine-tuned and deployed a BERT question-answering model extracting structured fields from 5,000+ TIFF images per day (F1 = 0.90), replacing a manual review process.
  • Implemented time-series forecasting for daily inbound document volume; built Power BI and Sisense dashboards surfacing operational performance to leadership.
  • Owned the dataset curation, evaluation harness, and post-deployment monitoring — translating model metrics into business terms for ops stakeholders.

Skills

Technical depth

Production-grade ML, cloud, and full-stack — enterprise code with error handling, testing, and observability.

Languages & Data

Python, SQL, TypeScript, JavaScript, Bash · Snowflake, OSIsoft PI, PySpark, Pandas, SQL-on-big-data patterns (Hive/Hadoop)

ML / AI

Decision trees, logistic regression, random forest, XGBoost, clustering, BERT, PyTorch, Hugging Face, LLM agents, OpenAI SDK, Pydantic AI, scikit-learn, time-series forecasting

MLOps

Automated retraining, drift monitoring, model observability, containerized deployment, production error handling, regression testing on model outputs

Azure Cloud

App Service · Blob Storage (SAS URLs) · Container Registry · Azure AD / MSAL · Logic Apps · Microsoft Graph API

Big Data / Spark

PySpark pipelines, high-frequency ingestion (~80 files/fleet/day), gap/spike validation, stream processing into time-series historians

Frontend

React 18, TypeScript, Vite, Material-UI, Recharts, Plotly.js, React Query, React Hook Form, Zod

Backend

Node.js, Express, Flask, REST APIs, node-cron, multi-agent pipeline design

DevOps & Testing

Docker (multi-stage), Nginx, Azure Container Registry, CI/CD, Playwright (E2E), Vitest, Testing Library

Visualization

Power BI, Sisense, Recharts, Plotly.js, MUI Data Grid — dashboards for technical and non-technical audiences

Statistics

Regression, hypothesis testing, distributional analysis, exploratory analysis, feature engineering

Education

Foundation

M.S. Artificial Intelligence

In Progress · Started October 2025

Deepening expertise in advanced ML, neural network architectures, and applied AI research.

Oklahoma Christian University

M.S. Data Science

Edmond, OK · 2021 – 2022 · GPA 3.78

Oklahoma Christian University

B.S. Computer Science

Edmond, OK · 2016 – 2021

Love's Entrepreneurs Cup — 1st Place

$12,000 prize · 2020

Co-designed an ML + motion-capture system predicting workplace injuries; won Oklahoma's top student entrepreneurship competition.

Contact

Open to senior data science & ML engineering roles

Looking for teams solving real production problems with ML — predictive, prescriptive, or agentic. Happy to walk through system architecture, talk through specific problems you're solving, or share more detail on any of the work above.