Portfolio

Vinesh Reddy

Senior Data Engineer · Dallas, TX · 9+ years in multi-cloud data solutions

Professional Summary

Senior Data Engineer with 9+ years architecting and optimizing cloud-scale data platforms across Azure, AWS, and Google Cloud. Expert in building resilient ETL and streaming pipelines with Apache Spark, Kafka, Talend, Informatica, and modern orchestration frameworks. Skilled in crafting performant data warehouses (Amazon Redshift, BigQuery, Azure Synapse, Snowflake) and dimensional models that enable high-impact analytics.

Proficient in Java, Python, SQL, and PowerShell for automating end-to-end data workflows, including real-time ingestion via Kafka, Pub/Sub, and Kinesis. Adept at cross-cloud integration of AWS Glue, Azure Data Factory, and Google Cloud Composer while enforcing governance with AWS IAM, Azure Purview, and OAuth 2.0.

Recognized for improving pipeline performance by up to 40%, designing petabyte-scale ecosystems, and enabling ML-ready data products integrated with Amazon SageMaker, Azure Machine Learning, and TensorFlow. Collaborative leader who closes the gap between data engineering, analytics, and business stakeholders.

Highlighted Capabilities

Multi-Cloud Architecture

Hands-on delivery across Azure, AWS, and GCP with IaC, CI/CD, and cost-optimized deployments tailored to enterprise governance.

Streaming & Real-Time Analytics

Event-driven pipelines with Kafka, Kinesis, and Pub/Sub that enable sub-minute insights, anomaly detection, and operational dashboards.

Data Governance & Security

Compliant frameworks combining Lake Formation, Purview, IAM, and Key Vault to uphold data privacy, lineage, and least-privilege access.

Technical Toolkit

Cloud Platforms

Azure, AWS, Google Cloud

ADLS · Synapse Analytics · Databricks · Cosmos DB · ADF · Purview · AWS Glue · Lambda · Redshift · DynamoDB · S3 · Athena · CloudFormation · Kinesis · EMR · BigQuery · Cloud Functions · Cloud Dataflow · Cloud Pub/Sub

Programming & Analytics

Languages, Libraries, Statistics

Java · Python · PowerShell · SQL · Pandas · NumPy · TensorFlow · Scikit-Learn · Seaborn · Regression Analysis · Hypothesis Testing

Data Ecosystems

Databases, ETL, Warehousing

SQL Server · MySQL · PostgreSQL · MongoDB · Cosmos DB · DynamoDB · Kafka · Airflow · Beam · Hadoop · Talend · Informatica · Hive · Pig · Google BigQuery · Amazon Redshift · Azure Synapse · Snowflake · Star Schema · Snowflake Schema

Delivery & Operations

DevOps, Monitoring, Collaboration

Agile · Scrum · Kanban · Git · GitHub · BitBucket · JIRA · Confluence · Azure DevOps · Jenkins · Terraform · Docker · Kubernetes · AWS CloudWatch · AWS X-Ray · ELK Stack · Azure Key Vault · AWS IAM · OAuth 2.0

Experience

American Express · Sr. Data Engineer

January 2024 – Present · Texas, USA

AWS Ecosystem
  • Architected Amazon RDS and DynamoDB deployments supporting high-availability transactional and analytical workloads.
  • Leveraged Apache Spark and PySpark for distributed data processing and real-time analytics on petabyte-scale datasets.
  • Engineered AWS Glue ETL pipelines, automated orchestration with AWS Data Pipeline, and tuned Spark SQL for cost-efficient performance.
  • Implemented streaming analytics via AWS Kinesis and built event-driven workflows with Lambda and Step Functions.
  • Optimized Amazon Redshift warehousing through sort keys, distribution keys, compression encodings, and workload management (WLM).
  • Delivered governance with AWS Lake Formation, IAM policy design, and cross-platform security compliance.
  • Produced Amazon QuickSight dashboards for business stakeholders and managed EMR clusters for large-scale analytics.
  • Enabled ML pipelines using Amazon SageMaker and enforced observability with AWS CloudWatch and X-Ray.
  • Automated infrastructure provisioning via AWS CloudFormation and CDK, integrating CI/CD with Jenkins.
  • Led cloud migration initiatives from on-premises sources to AWS, ensuring secure data transfer and minimal downtime.

Environment

AWS · Amazon RDS · DynamoDB · Apache Spark · PySpark · AWS Glue · AWS Kinesis · Jenkins · AWS Data Pipeline · Amazon Redshift · Amazon QuickSight · AWS Lake Formation · Amazon EMR · AWS CloudWatch · Jupyter Notebooks · Amazon SageMaker · AWS Lambda · AWS X-Ray · AWS CloudFormation · AWS CDK · AWS Step Functions · AWS IAM

State Street · Data Engineer

April 2021 – December 2023 · Texas, USA

Azure Platform
  • Optimized Azure Synapse Analytics workloads with robust data models, monitoring, and query tuning.
  • Automated infrastructure via PowerShell, Azure CLI, and Terraform to scale data services across regions.
  • Delivered low-latency applications with Azure Cosmos DB and streaming ingestion via Apache Kafka.
  • Developed automated ETL pipelines in Azure Data Factory and implemented transformations with PySpark, Pandas, and NumPy.
  • Created interactive Power BI dashboards and enforced governance with Azure Purview and Key Vault.
  • Managed CI/CD through Azure DevOps and BitBucket, ensuring reliable deployments of data solutions.
  • Leveraged Azure Databricks, HDInsight, and AKS for collaborative analytics and containerized workflows.
  • Implemented observability with the ELK Stack and structured storage via ADLS, Blob Storage, and SQL Server.

Environment

Azure Synapse Analytics · PowerShell · Azure CLI · Azure Cosmos DB · Power BI · Apache Spark · Azure HDInsight · Python · Pandas · NumPy · PySpark · Apache Kafka · Azure Machine Learning · Azure Data Factory · ADLS · Azure Key Vault · Azure Purview · Azure DevOps · BitBucket · T-SQL · SQL Server · Azure Blob Storage · Azure Databricks · ELK Stack · Terraform · Azure Kubernetes Service

Target · Data Engineer

September 2019 – March 2021 · California, USA

AWS Analytics
  • Administered Amazon RDS and DynamoDB clusters achieving 99.99% uptime for hybrid workloads.
  • Built petabyte-scale ETL frameworks via AWS Glue and PySpark, reducing ingestion latency to meet SLAs.
  • Delivered distributed processing on Amazon EMR and real-time streaming with AWS Kinesis.
  • Optimized Amazon Redshift storage and query strategies, integrating QuickSight dashboards.
  • Strengthened governance with AWS Lake Formation and IAM-based least-privilege controls.
  • Automated infrastructure with AWS CloudFormation, CDK, and CI/CD pipelines through Jenkins.
  • Executed on-premises to AWS migrations, ensuring compliance, security, and minimal downtime.
  • Integrated SageMaker models into analytics pipelines and instrumented observability using CloudWatch and X-Ray.

Environment

AWS · Amazon RDS · DynamoDB · AWS Glue · PySpark · Apache Spark · Amazon EMR · AWS Kinesis · Amazon Redshift · AWS Lake Formation · AWS Lambda · AWS Step Functions · AWS CloudFormation · AWS CDK · Jenkins · Amazon QuickSight · AWS Data Pipeline · AWS IAM · AWS CloudWatch · AWS X-Ray · Hive · Amazon S3 · AWS Glue Data Catalog · Amazon SageMaker · Git · JIRA

Unified Healthcare Group · Data Engineer

May 2017 – August 2019 · Minnesota, USA

Hybrid Cloud
  • Designed ETL pipelines funneling heterogeneous data into Snowflake and AWS-based data lakes.
  • Integrated Talend, Apache Spark, and PySpark to deliver scalable batch transformations.
  • Implemented Kafka streaming for real-time ingestion and predictive analytics readiness.
  • Employed MySQL, MongoDB, and SQL optimizations to ensure data quality and accessibility.
  • Embedded Scikit-Learn models within data pipelines to support forecasting and insights.
  • Managed Hadoop ecosystems (HDFS, MapReduce, Hive, Pig) for large-scale processing.
  • Curated analytics artifacts in Jupyter Notebooks and maintained version control with Git.

Environment

Snowflake · AWS S3 · EC2 · Redshift · Talend · Apache Spark · PySpark · Apache Kafka · MySQL · MongoDB · Scikit-Learn · Apache Hadoop · HDFS · MapReduce · Hive · Pig · Python · Pandas · NumPy · Jupyter Notebooks · Git

Sequent Global Technologies Inc · Jr. Data Engineer

June 2014 – December 2015 · Hyderabad, India

Foundational Data Ops
  • Built ETL pipelines across PostgreSQL, Redshift, and AWS RDS ensuring data reliability and consistency.
  • Authored advanced SQL queries and Python scripts using Pandas and NumPy for data wrangling.
  • Produced interactive Excel-based dashboards leveraging PivotTables, VLOOKUP, and analysis toolkits.
  • Managed data ingestion via AWS S3 and documented workflows in Jupyter Notebooks.
  • Worked within Agile practices, collaborating across teams and managing version control with Git.
  • Delivered statistical analyses and imputations to maintain data completeness and accuracy.

Environment

PostgreSQL · Redshift · AWS RDS · SQL · Pandas · NumPy · Excel · PivotTables · VLOOKUP · Data Analysis Toolpak · AWS S3 · Jupyter Notebooks · Git · Agile · Scrum

Let’s build your next data platform.

Available for senior data engineering engagements, multi-cloud architecture leadership, and real-time analytics initiatives.