About me.
I'm passionate about building robust data pipelines and deriving insights from complex datasets. I love tackling new challenges and giving my all to every project.
My Story.
My journey into the world of data began not in a computer lab, but in a civil engineering classroom. I was fascinated by the power of MATLAB, modeling complex systems and finding elegant solutions to challenging problems. This passion for problem-solving led me to get my first MacBook and dive into the world of programming, starting with iOS development.
The thrill of building something from scratch, of seeing my ideas come to life, was intoxicating. But I soon realized that what truly captivated me was the data that powered these applications. I became obsessed with understanding how data was stored, processed, and leveraged to create meaningful experiences. This newfound passion led me to a career as a Data Engineer at Experian, where I could immerse myself in the world of big data.
I've had the privilege of working with a diverse range of technologies: programming languages like Python, Java, and JavaScript; relational databases and SQL; and distributed computing with big data frameworks. I'm proficient in building and maintaining robust CI/CD pipelines using tools like Docker and Jenkins, ensuring that data flows seamlessly and reliably from source to destination.
I'm a firm believer in the power of open source and am always exploring new tools and technologies to expand my skillset. I'm excited to see what the future holds and am always open to new challenges and opportunities.
2018 - Present
Senior Data Engineer
Experian
Led the development of high-performance data APIs and automated CI/CD pipelines, significantly improving data processing speeds and client onboarding efficiency.
2016 - 2017
Data Engineer
Thames Water
Built and maintained scalable data pipelines for processing smart meter data, and developed monitoring dashboards for system health and water usage analytics.
Experience.
API Development & Performance Optimisation
- Engineered and deployed a high-performance Python 3 FastAPI service to replace a legacy Python 2 CLI tool for address data correction. Implemented parallel processing with multiple workers, achieving a 3x improvement in performance.
- Developed a Java Spring Boot RESTful API and a serverless Python AWS Lambda (deployed with Terraform) for large-scale data aggregation from a PostgreSQL database with tables containing over a billion records. Optimised query performance by leveraging native database functions to pre-aggregate data, significantly reducing latency.
- Processed and maintained large-scale CAIS/CATO data feeds for credit reporting, ensuring data accuracy and compliance within a high-security environment.
- Built a custom Python API wrapper to interface with third-party C libraries, ensuring seamless integration with an external address validation service.
- Designed and delivered a bespoke, type-safe Python API for a key client, working around the limitations of an older Python 3.6 environment to meet stringent requirements.
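The aggregation work above relied on one core idea: push the `GROUP BY` into the database so only summary rows cross the wire. A minimal sketch of that pattern, using an in-memory SQLite database as a stand-in (the production system was PostgreSQL with billion-row tables, and the table and column names here are illustrative):

```python
import sqlite3

# Stand-in for the production PostgreSQL instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (account_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [(1, 10.0), (1, 20.0), (2, 5.0)],
)

# Anti-pattern: fetch every row into Python and aggregate there.
# rows = conn.execute("SELECT account_id, amount FROM readings").fetchall()

# Pre-aggregate inside the database instead: the engine does the heavy
# lifting, and only one small summary row per account is returned,
# which is what cuts latency at scale.
totals = dict(conn.execute(
    "SELECT account_id, SUM(amount) FROM readings GROUP BY account_id"
))
print(totals)  # {1: 30.0, 2: 5.0}
```

The same `GROUP BY`/`SUM` statement runs unchanged against PostgreSQL, where native aggregate functions and indexes make the difference dramatic on large tables.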
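The C-library wrapper mentioned above follows a standard `ctypes` pattern: load the shared library, declare each function's C signature, and expose a typed Python facade. The actual address-validation library is proprietary, so this sketch applies the same pattern to libc's `strlen` purely so it runs anywhere; the function and names are stand-ins, not the real integration:

```python
import ctypes
import ctypes.util

# Load a shared C library. In the real project this was the vendor's
# address-validation library; libc is a runnable stand-in.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature explicitly so ctypes marshals arguments
# correctly -- this is what makes the wrapper safe to call.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

def c_strlen(text: str) -> int:
    """Typed Python facade over the raw C call."""
    return libc.strlen(text.encode("utf-8"))

print(c_strlen("221B Baker Street"))  # 17
```

Keeping the `argtypes`/`restype` declarations next to the load site gives callers a plain Python function and confines all the FFI detail to one module.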