Wrote several Retrieval-Augmented Generation (RAG) models to support chat questions related to documentation. The AI models significantly reduced hallucinations and improved retrieval over generic OpenAI models. Systems were templated to allow an easy transition from OpenAI to Llama or most other open-source models.
At multiple companies, developed pre-interview take-home challenges to filter top candidates. Created a unique onsite interview process that combined questions with a challenge and a presentation. The hire success rate was high, and the results were helpful in post-hire management.
Architected and helped build a fraud identification system for a leading travel company. The CI/CD system deployed version-controlled models to a real-time RESTful API. Billions of dollars in transactions went through this platform. Fraud fell, and false positives fell significantly. The system saved the company headcount, improved customer outcomes, and decreased issues with public relations. Models could completely change type (e.g., Random Forest or Regression) without a change to application deployment.
Led the development of a CI/CD data science platform that standardized data cleaning, transformation, and versioned deployment of models. The platform saved the company hundreds of person-hours per week, reduced delivery time, and improved model performance by more than 2X. It later enabled a new pricing model that increased the company’s Total Addressable Market. The system was agnostic to model type and could combine different models in an ensemble for maximum effectiveness.
Designed and led the development of many reporting platforms at several companies and collaborated with executives and product teams to determine relevant KPIs. Architected data pipelines that delivered decision-supporting reports. Each platform met specific requirements, including data ingestion, cleansing, enrichment, and reporting as files or via API deployment. Many platforms included modeling steps.
Inherited a customer acquisition system that required weekly manual updates. Rather than update the model, led the development of a data pipeline that tied together new data sources from across the company. The newly added data was more valuable and more recent, so the model could be simplified to run daily. This increased customer acquisition by more than 2X.
Inherited an advertising auction system based on SAS that could process only a subset of the US market and took 20+ hours to run. Moved the system to a database plus Python/R system that could score over 30 markets worldwide in an hour. The bidding platform handled traffic for the fourth-largest customer of Google’s paid search. Model deployment was agnostic to model type.
Was tasked with a project many other data scientists had attempted and failed at because it was too complicated for a non-deterministic solution. Worked with engineering to develop a simulator that realistically mimicked relevant demographic details like interests, geo-location, and page traversal. The simulation solved the formerly intractable problem, helped the business make many more informed advertiser decisions, and was later used for many other previously unanticipated tasks.
Developed and deployed algorithms that scaled to hundreds of millions of users and delivered tens of millions of dollars in revenue each month. The core algorithm was a customized collaborative filter translated into C to handle the throughput.
Identified inefficiencies and incorrect assumptions in a campaign testing system. Made a series of improvements and put the changes into production within two weeks. The changes identified poor campaigns more accurately and quickly. They led to an overall 8% improvement in total revenue for the product, yielding a $5 million per year increase at the time. With the growth of the product, they continue to improve revenue by well over $10 million per year.
In a one-day competition, developed a scalable algorithm that determined which people were likely to churn based on online activity. The algorithm was later added to the primary revenue-generating system for the company.
Led the development of A/B testing infrastructure. The infrastructure worked for all online products (website and mobile) and allowed for multiple treatments with the ability to control for dozens of potentially confounding factors. This system was used for nearly every product change and provided insights into the effectiveness of the changes.