Building scalable data pipelines and cloud solutions that turn raw data into business insights, backed by 10+ years of experience in AWS, big data, and machine learning integration.
Extensive experience with AWS, Azure, and GCP cloud platforms for data solutions.
Designed and optimized ETL/ELT processes for high-volume data processing.
Integrated machine learning models into production data pipelines to deliver actionable insights.
Senior Data Engineer with a decade of experience building robust data solutions
I'm a Senior Data Engineer specializing in building and optimizing end-to-end data pipelines, ETL processes, and large-scale data solutions on cloud platforms.
With extensive experience in AWS services (Glue, Lambda, Redshift, SageMaker, EMR) and big data technologies (Spark, Hadoop, Kafka), I've helped organizations transform their data infrastructure for better decision-making.
My expertise includes designing complex data models, implementing real-time data streaming solutions, and integrating machine learning models into production pipelines.
I hold a Master's degree in Information Systems with a focus on Business Analytics from Marist College and a Bachelor's degree in Computer Science Engineering.
My professional journey through leading organizations
• Leveraged AWS Kinesis for real-time data streaming, simplifying data ingestion processes
• Built predictive analytics models using AWS SageMaker, improving decision-making accuracy
• Developed and optimized ETL pipelines using AWS Glue, improving data integration efficiency
• Designed ETL processes in AWS Glue to migrate historical product purchase data from on-premises systems to Amazon Redshift
• Automated data analytics pipelines using Apache Airflow and optimized Tableau dashboards
• Implemented a data security strategy using AWS IAM policies, KMS encryption, and S3 bucket policies
• Designed and implemented ETL/ELT pipelines using AWS Glue, Lambda, and Step Functions
• Automated cloud infrastructure setup with Terraform, reducing deployment time
• Implemented data governance practices including data quality checks and lineage tracking
• Managed cloud-based infrastructure on AWS (EC2, S3, Lambda) adhering to data governance policies
• Applied machine learning techniques to analyze large datasets for strategic decision-making
• Designed and implemented scalable data pipelines using Hadoop and HDFS
My expertise across various technologies and platforms
Some of my notable data engineering implementations
Implemented a real-time data streaming solution for payment transaction analytics at Visa, reducing insight latency from hours to seconds.
Migrated multi-terabyte healthcare data from on-prem MySQL to AWS RDS with zero downtime, improving query performance by 300%.
Built an end-to-end ML pipeline for fraud detection with automated model retraining and monitoring, reducing false positives by 40%.
Feel free to reach out for collaborations or opportunities
vinaykumarsurabhi190@gmail.com
+1 (469) 712-7243
Austin, TX, USA