I am Lerato Matamela, a Data Engineer & AI and Low-Code Developer based in Johannesburg.
About Me
I am a final-year BSc IT student at Richfield Graduate Institute of Technology, specialising in Data Science and Machine Learning. My journey into technology began in high school with Excel and Access and gradually evolved into a strong passion for analytics, automation, and solving complex technical challenges.
Currently, I work as a Data Engineer Associate at CAPACITI, where I contribute to building scalable data infrastructure, while also undergoing intensive technical training through Project Y. As the technology landscape continues to evolve at a rapid pace, I am committed to ensuring that Africa does not merely adopt global innovation, but actively drives it through locally built solutions and African-led technical excellence.
I thrive in highly technical environments and specialise in data engineering, the development of scalable ETL pipelines, AI integration, and low-code automation solutions. I am particularly passionate about leveraging cloud technologies and machine learning to deliver innovative, production-ready systems that solve real-world business challenges.
Expertise
Data & Analytics
- Data Engineering & SQL
- ETL Pipeline Design
- Data Modelling & Analysis
Programming & Automation
- Python & Automation
- Git & Version Control
- CI/CD & DevOps
Cloud & AI
- Cloud Technologies (Azure, AWS)
- AI & Machine Learning
- Generative AI & LLMs
Business Solutions
- Power Apps & Low-Code Solutions
- Business Intelligence & Power BI
- Agile & Project Management
Experience
Designing and implementing scalable ETL pipelines and data workflows in cloud environments to support enterprise analytics and reporting. Supporting data infrastructure development, optimisation, and automation initiatives while collaborating with cross-functional teams to deliver reliable, data-driven solutions.
Part of Project Y, a highly selective, first-of-its-kind programme for exceptional South African tech talent. I was selected through a rigorous multi-month evaluation process to receive comprehensive support, including financial backing, professional equipment, workspace access, and specialised technical training.
Through Y-Academy, I am completing employer-aligned training focused on data engineering, cloud technologies, and modern software development practices. The programme emphasises hands-on, industry-relevant projects designed to build production-ready technical skills.
As a Project Y member, I am gaining direct exposure to global employment pathways through Y-Staffing & Recruiting's employer network, while positioning myself to contribute meaningfully to South Africa's growing presence in the global technology ecosystem.
Soweto Rising Stars
Volunteer Head Coach (Under 13 Team)
2019 - 2020
Organised training schedules, logistics, and team workflows. Mentored young athletes and maintained a disciplined, motivated team environment. Developed leadership, mentorship, and team coordination skills.
Education
Richfield Graduate Institute of Technology
Bachelor of Science in Information Technology
2nd Year - Present | 2024
Pursuing a degree in Information Technology with a focus on Data Science and Machine Learning.
Completed second-year studies in Accounting before transitioning to Information Technology.
My Projects
Here are some bootcamp and personal projects showcasing my data engineering, cloud, and development skills.
Healthcare Insurance Cost Prediction
A machine learning application that uses Gradient Boosting regression to predict healthcare insurance costs. The model explains about 83.8% of the variance in costs (R² = 0.8383). The analysis shows that smoking status is the dominant cost driver (68.8% feature importance), followed by BMI (17.8%) and age (11.9%). The Streamlit dashboard includes user authentication and a 4-tab interface covering real-time prediction, scenario analysis (a what-if calculator), model insights, and per-user prediction history with statistical analysis. Built with scikit-learn, Streamlit, and SQLite, and deployed on Streamlit Cloud.
- Machine Learning
- Gradient Boosting
- Streamlit
- Python
- Jupyter Notebook
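The R² figure quoted for the insurance cost model is a coefficient of determination, meaning the share of variance in costs that the predictions explain, rather than a classification-style accuracy. The snippet below is a minimal sketch of how that metric is computed. The function name `r_squared` and the sample costs are hypothetical and are not taken from the project or its dataset.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    mean_actual = sum(actual) / len(actual)
    # Residual sum of squares: error left over after the model's predictions.
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical annual charges for illustration only.
actual_costs = [1000.0, 2000.0, 3000.0, 4000.0]
predicted_costs = [1100.0, 1900.0, 3200.0, 3800.0]
score = r_squared(actual_costs, predicted_costs)
print(round(score, 4))
```

A score of 1.0 means the predictions match the actual values exactly, while a model that always predicts the mean cost scores 0.0.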
Student Records Management System
End-to-end Student Records Management System built with PostgreSQL and Python. The design covers a normalised relational schema and ETL pipelines with data validation, with advanced SQL queries, stored procedures, and a CLI for managing students, courses, grades, and attendance planned for later phases. The project follows a phased data engineering approach aligned with the IBM Data Engineering Professional Certificate.
Completed Phases (1-4): Schema Design, Database Setup with PostgreSQL, Sample Data Generation using Faker, and an ETL pipeline with separate extract, transform, and load stages. The ETL pipeline includes data quality checks, foreign key validation, and batch insert optimisation. Generated 100 students, 15 courses, 200 enrollments, 400 grades, and 1,000 attendance records with 97%+ data validity.
Upcoming Phases (5-7): Advanced SQL queries and stored procedures, Python CLI application interface for CRUD operations, deployment to cloud (Render or Railway), and comprehensive testing and documentation.
- PostgreSQL & SQL
- Python & ETL
- Data Modelling
- Relational Databases
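The data quality checks, foreign key validation, and batch insert steps mentioned for the Student Records ETL pipeline could look roughly like the sketch below. It is an illustration under stated assumptions, not the project's code: it uses Python's built-in `sqlite3` with an in-memory database instead of PostgreSQL so that it runs anywhere, and the table layout, the 0 to 100 score range, and the helper names `validate_grades` and `load_grades` are all hypothetical.

```python
import sqlite3


def validate_grades(rows, valid_student_ids, valid_course_ids):
    """Split raw grade rows into (clean, rejected) using simple quality checks."""
    clean, rejected = [], []
    for row in rows:
        student_id, course_id, score = row
        ok = (
            student_id in valid_student_ids      # foreign key check
            and course_id in valid_course_ids    # foreign key check
            and isinstance(score, (int, float))  # type check
            and 0 <= score <= 100                # assumed valid score range
        )
        (clean if ok else rejected).append(row)
    return clean, rejected


def load_grades(conn, rows):
    """Insert all validated rows as one batch inside a single transaction."""
    with conn:
        conn.executemany(
            "INSERT INTO grades (student_id, course_id, score) VALUES (?, ?, ?)",
            rows,
        )


# Hypothetical schema and sample data, standing in for the PostgreSQL database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (student_id INTEGER PRIMARY KEY);
    CREATE TABLE courses (course_id INTEGER PRIMARY KEY);
    CREATE TABLE grades (
        student_id INTEGER NOT NULL REFERENCES students(student_id),
        course_id INTEGER NOT NULL REFERENCES courses(course_id),
        score REAL NOT NULL
    );
    INSERT INTO students VALUES (1), (2);
    INSERT INTO courses VALUES (10);
""")

# Two good rows, one unknown student, one out-of-range score.
raw_rows = [(1, 10, 78.5), (2, 10, 64.0), (3, 10, 90.0), (1, 10, 140.0)]
student_ids = {r[0] for r in conn.execute("SELECT student_id FROM students")}
course_ids = {r[0] for r in conn.execute("SELECT course_id FROM courses")}

clean_rows, rejected_rows = validate_grades(raw_rows, student_ids, course_ids)
load_grades(conn, clean_rows)
loaded = conn.execute("SELECT COUNT(*) FROM grades").fetchone()[0]
validity = len(clean_rows) / len(raw_rows)
print(loaded, len(rejected_rows), f"{validity:.0%}")
```

Validating rows in Python before the load keeps rejected records available for reporting a validity rate, and `executemany` sends the clean rows as one batch rather than one statement per row.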
SyntaxNova Smart Data Manager
A data-driven solution designed to simplify how organizations collect, manage, visualize, and interact with business data. The system integrates Azure SQL Database, Power BI, and Power Apps to deliver a seamless workflow for storing, analyzing, and presenting operational data. This project showcases end-to-end data engineering, from ETL pipelines to interactive dashboards and user-facing applications.
- Python & ETL
- Azure SQL
- Power Apps
- Power BI
AI Portfolio Assistant
A Node.js web application demonstrating multi-cloud deployment: a single CI/CD pipeline builds once and deploys to both AWS and Azure. The AI Portfolio Assistant helps users generate professional portfolio content using the Gemini generative AI API. Features include professional bio generation, project summary creation, and learning reflection generation. The project showcases DevOps practices including automated testing, secure secret management, and cloud-native hosting on AWS Elastic Beanstalk and Azure App Service.
- Node.js
- Multi-Cloud
- CI/CD
- Gemini API
Certificates & Credentials
Professional certifications and credentials showcasing my continuous learning and expertise.
Python Project for Data Engineering
Issued by IBM
2025
Non-credit online course authorized by IBM and offered through Coursera. Demonstrates practical application of Python programming in data engineering projects and workflows.
Microsoft Power Platform
Professional Development
2025
Certification demonstrating expertise in Microsoft Power Platform, including Power Apps, Power BI, and Power Automate for building low-code solutions and business applications.
Introduction to Data Engineering
Data Engineering
2025
Foundational certification in data engineering principles, covering data pipelines, data processing, and best practices for building scalable data infrastructure.
Generative AI with LLMs
AI & Machine Learning
2025
Advanced certification in generative AI and large language models. Covers LLM architectures, fine-tuning, prompt engineering, and production deployment of generative AI applications.
Google Data Analytics Professional Certificate
Issued by Google
Completed December 2023
Comprehensive certification in data analysis, SQL, R programming, data visualization, and spreadsheet analysis. Enhanced foundational skills in analytical problem-solving and business intelligence.
Agile with Atlassian Jira
Professional Development
2025
Certification in agile methodologies and Jira project management. Demonstrates proficiency in sprint planning, issue tracking, team collaboration, and agile workflows for software development teams.
Introduction to Generative AI
AI & Machine Learning
2025
Foundational certification in generative AI technologies, covering AI fundamentals, large language models, prompt engineering, and practical applications of generative AI in business contexts.
Version Control with Git
Development Tools
2025
Certification demonstrating expertise in Git version control, repository management, branching strategies, collaboration workflows, and CI/CD integration for software development projects.