Data Engineering and Communication Technology: Proceedings of ICDECT 2020 (Lecture Notes on Data Engineering and Communications Technologies #63)
by K. Srujan Raju, Boby George, K. Ashoka Reddy, B. Rama Devi
This book includes selected papers presented at the 4th International Conference on Data Engineering and Communication Technology (ICDECT 2020), held at Kakatiya Institute of Technology & Science, Warangal, India, during 25–26 September 2020. It features advanced, multidisciplinary research towards the design of smart computing, information systems and electronic systems. It also focuses on various innovation paradigms in system knowledge, intelligence and sustainability which can be applied to provide viable solutions to diverse problems related to society, the environment and industry.
Data Engineering and Intelligent Computing: Proceedings of 5th ICICC 2021, Volume 1 (Lecture Notes in Networks and Systems #446)
by Suresh Chandra Satapathy, Vikrant Bhateja, Jerry Chun-Wei Lin, Lai Khin Wee, T. M. Rajesh
This book features a collection of high-quality, peer-reviewed papers presented at the Fifth International Conference on Intelligent Computing and Communication (ICICC 2021) organized by the Department of Computer Science and Engineering and the Department of Computer Science and Technology, Dayananda Sagar University, Bengaluru, India, on 26–27 November 2021. The book is organized in two volumes and discusses advanced and multi-disciplinary research regarding the design of smart computing and informatics. It focuses on innovation paradigms in system knowledge, intelligence and sustainability that can be applied to provide practical solutions to a number of problems in society, the environment and industry. Further, the book also addresses the deployment of emerging computational and knowledge transfer approaches, optimizing solutions in various disciplines of science, technology and health care.
Data Engineering and Intelligent Computing: Proceedings of IC3T 2016 (Advances in Intelligent Systems and Computing #542)
by Suresh Chandra Satapathy, Vikrant Bhateja, K. Srujan Raju, B. Janakiramaiah
The book is a compilation of high-quality scientific papers presented at the 3rd International Conference on Computer & Communication Technologies (IC3T 2016). The individual papers address cutting-edge technologies and applications of soft computing, artificial intelligence and communication. In addition, a variety of further topics are discussed, including data mining, machine intelligence, fuzzy computing, sensor networks, signal and image processing, human-computer interaction and web intelligence. As such, it offers readers a valuable and unique resource.
Data Engineering and Intelligent Computing: Proceedings of ICICC 2020 (Advances in Intelligent Systems and Computing #1)
by Suresh Chandra Satapathy, Vikrant Bhateja, V. N. Manjunath Aradhya, Carlos M. Travieso-González
This book features a collection of high-quality, peer-reviewed papers presented at the Fourth International Conference on Intelligent Computing and Communication (ICICC 2020) organized by the Department of Computer Science and Engineering and the Department of Computer Science and Technology, Dayananda Sagar University, Bengaluru, India, on 18–20 September 2020. The book is organized in two volumes and discusses advanced and multi-disciplinary research regarding the design of smart computing and informatics. It focuses on innovation paradigms in system knowledge, intelligence and sustainability that can be applied to provide practical solutions to a number of problems in society, the environment and industry. Further, the book also addresses the deployment of emerging computational and knowledge transfer approaches, optimizing solutions in various disciplines of science, technology and health care.
Data Engineering for Beginners (Tech Today)
by Chisom Nwokwu
A hands-on technical and industry roadmap for aspiring data engineers. In Data Engineering for Beginners, big data expert Chisom Nwokwu delivers a beginner-friendly handbook for everyone interested in the fundamentals of data engineering. Whether you're interested in starting a rewarding new career as a data analyst, data engineer, or data scientist, or seeking to expand your skillset in an existing engineering role, Nwokwu offers the technical and industry knowledge you need to succeed.
The book explains:
- Database fundamentals, including relational and NoSQL databases
- Data warehouses and data lakes
- Data pipelines, including batch and stream processing
- Data quality dimensions
- Data security principles, including data encryption
- Data governance principles and frameworks
- Big data and distributed systems concepts
- Data engineering on the cloud
- Essential skills and tools for data engineering interviews and jobs
Data Engineering for Beginners offers an easy-to-read roadmap through a seemingly complicated and intimidating subject. It addresses the topics most likely to cause a beginning data engineer to stumble, clearly explaining key concepts in an accessible way. You'll also find:
- A comprehensive glossary of data engineering terms
- Common and practical career paths in the data engineering industry
- An introduction to key cloud technologies and services you may encounter early in your data engineering career
Perfect for practicing and aspiring data analysts, data scientists, and data engineers, Data Engineering for Beginners is an effective and reliable starting point for learning an in-demand skill. It's a powerful resource for everyone hoping to expand their data engineering skillset and upskill in the big data era.
Data Engineering for Cybersecurity: Build Secure Data Pipelines with Free and Open-Source Tools
by James Bonifield
Turn raw logs into real intelligence. Security teams rely on telemetry: the continuous stream of logs, events, metrics, and signals that reveal what's happening across systems, endpoints, and cloud services. But that data doesn't organize itself. It has to be collected, normalized, enriched, and secured before it becomes useful. That's where data engineering comes in.
In this hands-on guide, cybersecurity engineer James Bonifield teaches you how to design and build scalable, secure data pipelines using free, open source tools such as Filebeat, Logstash, Redis, Kafka, and Elasticsearch. You'll learn how to collect telemetry from Windows (including Sysmon and PowerShell events), from Linux files and syslog, and from streaming network and security appliances. You'll then transform it into structured formats, secure it in transit, and automate your deployments using Ansible.
You'll also learn how to:
- Encrypt and secure data in transit using TLS and SSH
- Centrally manage code and configuration files using Git
- Transform messy logs into structured events (a small illustrative sketch follows this entry)
- Enrich data with threat intelligence using Redis and Memcached
- Stream and centralize data at scale with Kafka
- Automate with Ansible for repeatable deployments
Whether you're building a pipeline on a tight budget or deploying an enterprise-scale system, this book shows you how to centralize your security data, support real-time detection, and lay the groundwork for incident response and long-term forensics.
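To make the "messy logs into structured events" idea concrete, here is a minimal Python sketch of that kind of normalization step. It is written for this catalog rather than taken from the book; the log line, regex, and field names are illustrative assumptions, and a real pipeline would read from Filebeat or Kafka rather than a constant.

```python
import json
import re

# Hypothetical raw syslog-style line; real pipelines would stream these in.
RAW = "Jan 12 03:14:07 web01 sshd[912]: Failed password for root from 203.0.113.9 port 52144"

# Illustrative pattern: timestamp, host, process[pid], then a free-text message.
PATTERN = re.compile(
    r"(?P<timestamp>\w{3} +\d+ [\d:]+) "
    r"(?P<host>\S+) "
    r"(?P<process>\w+)\[(?P<pid>\d+)\]: "
    r"(?P<message>.*)"
)

def normalize(line: str) -> dict:
    """Turn one raw log line into a structured event, or tag it as unparsed."""
    match = PATTERN.match(line)
    if not match:
        return {"raw": line, "parse_error": True}
    event = match.groupdict()
    # Simple enrichment: flag failed SSH logins for downstream detection rules.
    event["is_failed_login"] = "Failed password" in event["message"]
    return event

print(json.dumps(normalize(RAW), indent=2))
```

The parse-error fallback matters in practice: unparsed lines are kept (tagged, not dropped) so the pipeline never silently loses telemetry.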
Data Engineering for Machine Learning Pipelines: From Python Libraries to ML Pipelines and Cloud Platforms
by Pavan Kumar Narayanan
This book covers modern data engineering functions and important Python libraries, to help you develop state-of-the-art ML pipelines and integration code. The book begins by explaining data analytics and transformation, delving into the Pandas library, its capabilities, and its nuances. It then explores emerging libraries such as Polars and cuDF, providing insights into GPU-based computing and cutting-edge data manipulation techniques. The text discusses the importance of data validation in engineering processes, introducing tools such as Great Expectations and Pandera to ensure data quality and reliability (a small validation sketch follows this entry). The book delves into API design and development, with a specific focus on leveraging the power of FastAPI. It covers authentication, authorization, and real-world applications, enabling you to construct efficient and secure APIs using FastAPI. Also explored is concurrency in data engineering, examining Dask's capabilities from basic setup to crafting advanced machine learning pipelines. The book includes development and delivery of data engineering pipelines using leading cloud platforms such as AWS, Google Cloud, and Microsoft Azure. The concluding chapters concentrate on real-time and streaming data engineering pipelines, emphasizing Apache Kafka and workflow orchestration. Workflow tools such as Airflow and Prefect are introduced to seamlessly manage and automate complex data workflows. What sets this book apart is its blend of theoretical knowledge and practical application, a structured path from basic to advanced concepts, and insights into using state-of-the-art tools. This book is not just an educational tool; it is a career catalyst and an investment in your future as a data engineering expert, poised to meet the challenges of today's data-driven world.
What You Will Learn:
- Elevate your data wrangling jobs by utilizing the power of both CPU and GPU computing, and learn to process data using Pandas 2.0, Polars, and cuDF at unprecedented speeds
- Design data validation pipelines, construct efficient data service APIs, develop real-time streaming pipelines, and master the art of workflow orchestration to streamline your engineering projects
- Leverage concurrent programming to develop machine learning pipelines and get hands-on experience in development and deployment of machine learning pipelines across AWS, GCP, and Azure
Who This Book Is For:
Data analysts, data engineers, data scientists, machine learning engineers, and MLOps specialists
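As a flavor of the schema-based data validation the book covers with tools like Pandera, here is a minimal sketch (not from the book; the column names and constraints are made-up assumptions, and it requires the third-party pandera and pandas packages):

```python
import pandas as pd
import pandera as pa

# Illustrative schema: every column name and check here is an assumption.
schema = pa.DataFrameSchema({
    "order_id": pa.Column(int, pa.Check.ge(1)),
    "amount": pa.Column(float, pa.Check.ge(0.0)),
    "currency": pa.Column(str, pa.Check.isin(["USD", "EUR"])),
})

df = pd.DataFrame({
    "order_id": [1, 2],
    "amount": [9.99, 150.0],
    "currency": ["USD", "EUR"],
})

# validate() returns the DataFrame on success and raises SchemaError on violations,
# so a pipeline can fail fast before bad rows reach downstream consumers.
validated = schema.validate(df)
print(validated.shape)
```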
Data Engineering for Smart Systems: Proceedings of SSIC 2021 (Lecture Notes in Networks and Systems #238)
by Sumit Srivastava, Vivek Kumar Verma, Priyadarsi Nanda, Rohit Kumar Gupta, Arka Prokash Mazumdar
This book features original papers from the 3rd International Conference on Smart IoT Systems: Innovations and Computing (SSIC 2021), organized by Manipal University, Jaipur, India, during January 22–23, 2021. It discusses scientific work on data engineering in the context of computational collective intelligence, which arises from the interaction of smart devices in smart environments. Thanks to its high-quality content and the broad range of topics covered, the book appeals to researchers pursuing advanced studies.
Data Engineering in Medical Imaging: First MICCAI Workshop, DEMI 2023, Held in Conjunction with MICCAI 2023, Vancouver, BC, Canada, October 8, 2023, Proceedings (Lecture Notes in Computer Science #14314)
by Danail Stoyanov, Sharib Ali, Binod Bhattarai, Anita Rau, Anh Nguyen, Ana Namburete, Razvan Caramalau
This volume, LNCS 14314, constitutes the refereed proceedings of the First MICCAI Workshop on Data Engineering in Medical Imaging, DEMI 2023, held in conjunction with the 26th International Conference on Medical Image Computing and Computer-Assisted Intervention, MICCAI 2023, in Vancouver, Canada, in October 2023. The DEMI 2023 proceedings contain 11 high-quality papers of 9 to 15 pages, pre-selected through a rigorous peer review process (with an average of three reviews per paper). All submissions were peer-reviewed in a double-blind process by at least three members of the scientific review committee, which comprises 16 experts in the field of medical imaging. The accepted manuscripts cover various medical image analysis methods and applications.
Data Engineering in Medical Imaging: Second MICCAI Workshop, DEMI 2024, Held in Conjunction with MICCAI 2024, Marrakesh, Morocco, October 10, 2024, Proceedings (Lecture Notes in Computer Science #15265)
by Danail Stoyanov, Sharib Ali, Binod Bhattarai, Anita Rau, Anh Nguyen, Ana Namburete, Razvan Caramalau, Prashnna Gyawali
This book constitutes the proceedings of the Second MICCAI Workshop on Data Engineering in Medical Imaging, DEMI 2024, held in conjunction with the 27th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2024, in Marrakesh, Morocco, on October 10, 2024. The 18 papers presented in this book were carefully reviewed and selected. They focus on the application of various data engineering techniques in the field of medical imaging.
Data Engineering in Medical Imaging: Third MICCAI Workshop, DEMI 2025, Held in Conjunction with MICCAI 2025, Daejeon, South Korea, September 27, 2025, Proceedings (Lecture Notes in Computer Science #16191)
by Danail Stoyanov, Binod Bhattarai, Anita Rau, Anh Nguyen, Ana Namburete, Razvan Caramalau, Annika Reinke, Prashnna Gyawali
This book constitutes the proceedings of the Third MICCAI Workshop on Data Engineering in Medical Imaging, DEMI 2025, held in conjunction with the 28th International Conference on Medical Image Computing and Computer Assisted Intervention, MICCAI 2025, in Daejeon, South Korea, on September 27, 2025. The 24 full papers included in this book were carefully reviewed and selected from 33 submissions. They focus on data engineering in medical imaging and address open questions in the field. The workshop welcomes approaches such as data and label augmentation, active learning and active synthesis, federated learning, multimodal learning, self-supervised learning, large-scale data management, and data quality assessment.
Data Engineering on Azure
by Vlad Riscutia
Build a data platform to the industry-leading standards set by Microsoft's own infrastructure.
Summary: In Data Engineering on Azure you will learn how to:
- Pick the right Azure services for different data scenarios
- Manage data inventory
- Implement production-quality data modeling, analytics, and machine learning workloads
- Handle data governance
- Use DevOps to increase reliability
- Ingest, store, and distribute data
- Apply best practices for compliance and access control
Data Engineering on Azure reveals the data management patterns and techniques that support Microsoft's own massive data infrastructure. Author Vlad Riscutia, a data engineer at Microsoft, teaches you to bring an engineering rigor to your data platform and ensure that your data prototypes function just as well under the pressures of production. You'll implement common data modeling patterns, stand up cloud-native data platforms on Azure, and get to grips with DevOps for both analytics and machine learning. Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the technology: Build secure, stable data platforms that can scale to loads of any size. When a project moves from the lab into production, you need confidence that it can stand up to real-world challenges. This book teaches you to design and implement cloud-based data infrastructure that you can easily monitor, scale, and modify.
About the book: In Data Engineering on Azure you'll learn the skills you need to build and maintain big data platforms in massive enterprises. This invaluable guide includes clear, practical guidance for setting up infrastructure, orchestration, workloads, and governance. As you go, you'll set up efficient machine learning pipelines, and then master time-saving automation and DevOps solutions. The Azure-based examples are easy to reproduce on other cloud platforms.
What's inside:
- Data inventory and data governance
- Assuring data quality, compliance, and distribution
- Building automated pipelines to increase reliability
- Ingesting, storing, and distributing data
- Production-quality data modeling, analytics, and machine learning
About the reader: For data engineers familiar with cloud computing and DevOps.
About the author: Vlad Riscutia is a software architect at Microsoft.
Table of Contents:
1. Introduction
Part 1: Infrastructure. 2. Storage; 3. DevOps; 4. Orchestration
Part 2: Workloads. 5. Processing; 6. Analytics; 7. Machine learning
Part 3: Governance. 8. Metadata; 9. Data quality; 10. Compliance; 11. Distributing data
Data Engineering with AWS: Acquire the skills to design and build AWS-based data transformation pipelines like a pro
by Gareth Eagar
Looking to revolutionize your data transformation game with AWS? Look no further! From strong foundations to hands-on building of data engineering pipelines, this expert-led manual has got you covered.
Key Features:
- Delve into robust AWS tools for ingesting, transforming, and consuming data, and for orchestrating pipelines
- Stay up to date with a comprehensively revised chapter on data governance
- Build modern data platforms with a new section covering transactional data lakes and data mesh
Book Description:
This book, authored by a seasoned Senior Data Architect with 25 years of experience, aims to help you achieve proficiency in using the AWS ecosystem for data engineering. This revised edition provides updates in every chapter to cover the latest AWS services and features, takes a refreshed look at data governance, and includes a brand-new section on building modern data platforms, covering a data mesh approach, open-table formats (such as Apache Iceberg), and the use of DataOps for automation and observability. You'll begin by reviewing the key concepts and essential AWS tools in a data engineer's toolkit and getting acquainted with modern data management approaches. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how that transformed data is used by various data consumers. You'll learn how to ensure strong data governance, and about populating data marts and data warehouses, along with how a data lakehouse fits into the picture. After that, you'll be introduced to AWS tools for analyzing data, including those for ad-hoc SQL queries and creating visualizations. Then, you'll explore how the power of machine learning and artificial intelligence can be used to draw new insights from data. In the final chapters, you'll discover transactional data lakes, data meshes, and how to build a cutting-edge data platform on AWS. By the end of this AWS book, you'll be able to execute data engineering tasks and implement a data pipeline on AWS like a pro!
What you will learn:
- Seamlessly ingest streaming data with Amazon Kinesis Data Firehose
- Optimize, denormalize, and join datasets with AWS Glue Studio
- Use Amazon S3 events to trigger a Lambda process to transform a file (see the sketch after this entry)
- Load data into a Redshift data warehouse and run queries with ease
- Visualize and explore data using Amazon QuickSight
- Extract sentiment data from a dataset using Amazon Comprehend
- Build transactional data lakes using Apache Iceberg with Amazon Athena
- Learn how a data mesh approach can be implemented on AWS
Who this book is for:
This book is for data engineers, data analysts, and data architects who are new to AWS and looking to extend their skills to the AWS cloud. Anyone new to data engineering who wants to learn the foundational concepts while gaining practical experience with common data engineering services on AWS will also find this book useful. A basic understanding of big data-related topics and Python coding will help you get the most out of this book, but it's not a prerequisite. Familiarity with the AWS console and core services will also help you follow along.
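The "S3 event triggers a Lambda to transform a file" pattern mentioned above, sketched in Python. This is illustrative only, not code from the book: the handler shape follows AWS's standard S3 event-notification format, but the bucket-naming convention and the transform itself are hypothetical.

```python
import urllib.parse

import boto3

s3 = boto3.client("s3")

def handler(event, context):
    """Triggered by an S3 ObjectCreated event; writes a transformed copy."""
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = urllib.parse.unquote_plus(record["object"]["key"])

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    # Hypothetical transform: uppercase every line of the input file.
    transformed = "\n".join(line.upper() for line in body.splitlines())

    s3.put_object(
        Bucket=bucket.replace("-raw", "-curated"),  # assumed naming convention
        Key=key,
        Body=transformed.encode("utf-8"),
    )
    return {"status": "ok", "key": key}
```

Writing to a separate curated bucket (rather than back to the source bucket) avoids the classic pitfall of the Lambda re-triggering itself on its own output.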
Data Engineering with AWS: Learn how to design and build cloud-based data transformation pipelines using AWS
by Gareth Eagar, Rafael Pecora, Marcos Amorim
The missing expert-led manual for the AWS ecosystem: go from foundations to building data engineering pipelines effortlessly. Purchase of the print or Kindle book includes a free eBook in PDF format.
Key Features:
- Learn about common data architectures and modern approaches to generating value from big data
- Explore AWS tools for ingesting, transforming, and consuming data, and for orchestrating pipelines
- Learn how to architect and implement data lakes and data lakehouses for big data analytics from a data lakes expert
Book Description:
Written by a Senior Data Architect with over twenty-five years of experience in the business, Data Engineering with AWS is a book whose sole aim is to make you proficient in using the AWS ecosystem. Using a thorough and hands-on approach to data, this book will give aspiring and new data engineers a solid theoretical and practical foundation to succeed with AWS. As you progress, you'll be taken through the services and the skills you need to architect and implement data pipelines on AWS. You'll begin by reviewing important data engineering concepts and some of the core AWS services that form a part of the data engineer's toolkit. You'll then architect a data pipeline, review raw data sources, transform the data, and learn how the transformed data is used by various data consumers. You'll also learn about populating data marts and data warehouses, along with how a data lakehouse fits into the picture. Later, you'll be introduced to AWS tools for analyzing data, including those for ad-hoc SQL queries and creating visualizations. In the final chapters, you'll understand how the power of machine learning and artificial intelligence can be used to draw new insights from data. By the end of this AWS book, you'll be able to carry out data engineering tasks and implement a data pipeline on AWS independently.
What you will learn:
- Understand data engineering concepts and emerging technologies
- Ingest streaming data with Amazon Kinesis Data Firehose
- Optimize, denormalize, and join datasets with AWS Glue Studio
- Use Amazon S3 events to trigger a Lambda process to transform a file
- Run complex SQL queries on data lake data using Amazon Athena
- Load data into a Redshift data warehouse and run queries
- Create a visualization of your data using Amazon QuickSight
- Extract sentiment data from a dataset using Amazon Comprehend
Who this book is for:
This book is for data engineers, data analysts, and data architects who are new to AWS and looking to extend their skills to the AWS cloud. Anyone new to data engineering who wants to learn the foundational concepts while gaining practical experience with common data engineering services on AWS will also find this book useful. A basic understanding of big data-related topics and Python coding will help you get the most out of this book, but it's not a prerequisite. Familiarity with the AWS console and core services will also help you follow along.
Data Engineering with Alteryx: Helping data engineers apply DataOps practices with Alteryx
by Paul Houghton
Build and deploy data pipelines with Alteryx by applying practical DataOps principles.
Key Features:
- Learn DataOps principles to build data pipelines with Alteryx
- Build robust data pipelines with Alteryx Designer
- Use Alteryx Server and Alteryx Connect to share and deploy your data pipelines
Book Description:
Alteryx is a GUI-based development platform for data analytic applications. Data Engineering with Alteryx will help you leverage Alteryx's code-free aspects, which increase development speed, while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You'll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you'll take the data pipeline from raw data, transform it into a robust dataset, and publish it to Alteryx Server following a continuous integration process. By the end of this Alteryx book, you'll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources.
What you will learn:
- Build a working pipeline to integrate an external data source
- Develop monitoring processes for the pipeline example
- Understand and apply DataOps principles to an Alteryx data pipeline
- Gain skills for data engineering with the Alteryx software stack
- Work with spatial analytics and machine learning techniques in an Alteryx workflow
- Explore Alteryx workflow deployment strategies using metadata validation and continuous integration
- Organize content on Alteryx Server and secure user access
Who this book is for:
If you're a data engineer, data scientist, or data analyst who wants to set up a reliable process for developing data pipelines using Alteryx, this book is for you. You'll also find this book useful if you are trying to make the development and deployment of datasets more robust by following DataOps principles. Familiarity with Alteryx products will be helpful but is not necessary.
Data Engineering with Apache Spark, Delta Lake, and Lakehouse: Create scalable pipelines that ingest, curate, and aggregate complex data in a timely and secure way
by Danil Zburivsky, Manoj Kukreja
Understand the complexities of modern-day data engineering platforms and explore strategies to deal with them, with the help of use-case scenarios led by an industry expert in big data.
Key Features:
- Become well-versed with the core concepts of Apache Spark and Delta Lake for building data platforms
- Learn how to ingest, process, and analyze data that can later be used for training machine learning models
- Understand how to operationalize data models in production using curated data
Book Description:
In the world of ever-changing data and schemas, it is important to build data pipelines that can auto-adjust to changes. This book will help you build scalable data platforms that managers, data scientists, and data analysts can rely on. Starting with an introduction to data engineering, along with its key concepts and architectures, this book will show you how to use Microsoft Azure cloud services effectively for data engineering. You'll cover data lake design patterns and the different stages through which the data needs to flow in a typical data lake. Once you've explored the main features of Delta Lake for building data lakes with fast performance and governance in mind, you'll advance to implementing the lambda architecture using Delta Lake. Packed with practical examples and code snippets, this book takes you through real-world examples based on production scenarios faced by the author in his 10 years of experience working with big data. Finally, you'll cover data lake deployment strategies that play an important role in provisioning the cloud resources and deploying the data pipelines in a repeatable and continuous way. By the end of this data engineering book, you'll know how to effectively deal with ever-changing data and create scalable data pipelines to streamline data science, ML, and artificial intelligence (AI) tasks.
What you will learn:
- Discover the challenges you may face in the data engineering world
- Add ACID transactions to Apache Spark using Delta Lake
- Understand effective design strategies to build enterprise-grade data lakes
- Explore architectural and design patterns for building efficient data ingestion pipelines
- Orchestrate a data pipeline for preprocessing data using Apache Spark and Delta Lake APIs
- Automate deployment and monitoring of data pipelines in production
- Get to grips with securing, monitoring, and managing data pipeline models efficiently
Who this book is for:
This book is for aspiring data engineers and data analysts who are new to the world of data engineering and are looking for a practical guide to building scalable data platforms. If you already work with PySpark and want to use Delta Lake for data engineering, you'll find this book useful. Basic knowledge of Python, Spark, and SQL is expected.
Data Engineering with Databricks Cookbook: Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake
by Pulkit Chadha
Work through 70 recipes for implementing reliable data pipelines with Apache Spark, optimally storing and processing structured and unstructured data in Delta Lake, and using Databricks to orchestrate and govern your data.
Key Features:
- Learn data ingestion, data transformation, and data management techniques using Apache Spark and Delta Lake
- Gain practical guidance on using Delta Lake tables and orchestrating data pipelines
- Implement reliable DataOps and DevOps practices, and enforce data governance policies on Databricks
- Purchase of the print or Kindle book includes a free PDF eBook
Book Description:
Written by a Senior Solutions Architect at Databricks, Data Engineering with Databricks Cookbook will show you how to effectively use Apache Spark, Delta Lake, and Databricks for data engineering, starting with a comprehensive introduction to data ingestion and loading with Apache Spark. What makes this book unique is its recipe-based approach, which will help you put your knowledge to use straight away and tackle common problems. You'll be introduced to various data manipulation and data transformation solutions that can be applied to data, find out how to manage and optimize Delta tables, and get to grips with ingesting and processing streaming data. The book will also show you how to improve the performance of Apache Spark apps and Delta Lake. Advanced recipes later in the book will teach you how to use Databricks to implement DataOps and DevOps practices, as well as how to orchestrate and schedule data pipelines using Databricks Workflows. You'll also go through the full process of setting up and configuring the Unity Catalog for data governance. By the end of this book, you'll be well-versed in building reliable and scalable data pipelines using modern data engineering technologies. (A minimal Delta Lake round-trip sketch follows this entry.)
What you will learn:
- Perform data loading, ingestion, and processing with Apache Spark
- Discover data transformation techniques and custom user-defined functions (UDFs) in Apache Spark
- Manage and optimize Delta tables with Apache Spark and Delta Lake APIs
- Use Spark Structured Streaming for real-time data processing
- Optimize Apache Spark application and Delta table query performance
- Implement DataOps and DevOps practices on Databricks
- Orchestrate data pipelines with Delta Live Tables and Databricks Workflows
- Implement data governance policies with Unity Catalog
Who this book is for:
This book is for data engineers, data scientists, and data practitioners who want to learn how to build efficient and scalable data pipelines using Apache Spark, Delta Lake, and Databricks. To get the most out of this book, you should have basic knowledge of data architecture, SQL, and Python programming.
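For readers new to the stack, here is a minimal sketch of the Delta Lake write/read round trip that recipes like these build on. It is illustrative, not from the book, and assumes a local Spark session with the delta-spark package available; the data, app name, and path are made up.

```python
from pyspark.sql import SparkSession

# Assumes the delta-spark package (and its JARs) are on the classpath.
spark = (
    SparkSession.builder.appName("delta-demo")
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config(
        "spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog",
    )
    .getOrCreate()
)

df = spark.createDataFrame(
    [(1, "click"), (2, "view")], ["event_id", "event_type"]
)

# Delta layers ACID transactions and a transaction log on top of Parquet files.
df.write.format("delta").mode("overwrite").save("/tmp/events_delta")

events = spark.read.format("delta").load("/tmp/events_delta")
events.show()
```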
Data Engineering with Google Cloud Platform: A practical guide to operationalizing scalable data analytics systems on GCP
by Adi Wijaya
Build and deploy your own data pipelines on GCP, make key architectural decisions, and gain the confidence to boost your career as a data engineer.
Key Features:
- Understand data engineering concepts, the role of a data engineer, and the benefits of using GCP for building your solution
- Learn how to use the various GCP products to ingest, consume, and transform data and orchestrate pipelines
- Discover tips to prepare for and pass the Professional Data Engineer exam
Book Description:
With this book, you'll understand how the highly scalable Google Cloud Platform (GCP) enables data engineers to create end-to-end data pipelines, right from storing and processing data and workflow orchestration to presenting data through visualization dashboards. Starting with a quick overview of the fundamental concepts of data engineering, you'll learn the various responsibilities of a data engineer and how GCP plays a vital role in fulfilling those responsibilities. As you progress through the chapters, you'll be able to leverage GCP products to build a sample data warehouse using Cloud Storage and BigQuery and a data lake using Dataproc. The book gradually takes you through operations such as data ingestion, data cleansing, transformation, and integrating data with other sources. You'll learn how to design IAM for data governance, deploy ML pipelines with Vertex AI, leverage pre-built GCP models as a service, and visualize data with Google Data Studio to build compelling reports. Finally, you'll find tips on how to boost your career as a data engineer, take the Professional Data Engineer certification exam, and get ready to become an expert in data engineering with GCP. By the end of this data engineering book, you'll have developed the skills to perform core data engineering tasks and build efficient ETL data pipelines with GCP.
What you will learn:
- Load data into BigQuery and materialize its output for downstream consumption (a minimal load-job sketch follows this entry)
- Build data pipeline orchestration using Cloud Composer
- Develop Airflow jobs to orchestrate and automate a data warehouse
- Build a Hadoop data lake, create ephemeral clusters, and run jobs on the Dataproc cluster
- Leverage Pub/Sub for messaging and ingestion for event-driven systems
- Use Dataflow to perform ETL on streaming data
- Unlock the power of your data with Data Studio
- Calculate the GCP cost estimation for your end-to-end data solutions
Who this book is for:
This book is for data engineers, data analysts, and anyone looking to design and manage data processing pipelines using GCP. You'll find this book useful if you are preparing to take Google's Professional Data Engineer exam. A beginner-level understanding of data science, the Python programming language, and Linux commands is necessary. A basic understanding of data processing and cloud computing in general will help you make the most out of this book.
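The BigQuery load step mentioned above, as a minimal Python sketch using the google-cloud-bigquery client library. Not from the book: the project, dataset, table, and Cloud Storage URI are hypothetical placeholders, and it assumes application-default credentials are configured.

```python
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # skip the header row
    autodetect=True,       # let BigQuery infer the schema
)

load_job = client.load_table_from_uri(
    "gs://example-bucket/sales/2024-01.csv",  # hypothetical source file
    "example-project.analytics.sales",        # hypothetical destination table
    job_config=job_config,
)
load_job.result()  # block until the load job completes

table = client.get_table("example-project.analytics.sales")
print(f"Loaded {table.num_rows} rows")
```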
Data Engineering with Python: Work with massive datasets to design data models and automate data pipelines using Python
by Paul Crickard
Build, monitor, and manage real-time data pipelines to create data engineering infrastructure efficiently using open-source Apache projects.
Key Features:
- Become well-versed in data architectures, data preparation, and data optimization skills with the help of practical examples
- Design data models and learn how to extract, transform, and load (ETL) data using Python
- Schedule, automate, and monitor complex data pipelines in production
Book Description:
Data engineering provides the foundation for data science and analytics, and forms an important part of all businesses. This book will help you to explore various tools and methods that are used for understanding the data engineering process using Python. The book will show you how to tackle challenges commonly faced in different aspects of data engineering. You'll start with an introduction to the basics of data engineering, along with the technologies and frameworks required to build data pipelines to work with large datasets. You'll learn how to transform and clean data and perform analytics to get the most out of your data. As you advance, you'll discover how to work with big data of varying complexity and production databases, and build data pipelines. Using real-world examples, you'll build architectures on which you'll learn how to deploy data pipelines. By the end of this Python book, you'll have gained a clear understanding of data modeling techniques, and will be able to confidently build data engineering pipelines for tracking data, running quality checks, and making necessary changes in production. (A generic extract-transform-load sketch in Python follows this entry.)
What you will learn:
- Understand how data engineering supports data science workflows
- Discover how to extract data from files and databases and then clean, transform, and enrich it
- Configure processors for handling different file formats as well as both relational and NoSQL databases
- Find out how to implement a data pipeline and dashboard to visualize results
- Use staging and validation to check data before landing in the warehouse
- Build real-time pipelines with staging areas that perform validation and handle failures
- Get to grips with deploying pipelines in the production environment
Who this book is for:
This book is for data analysts, ETL developers, and anyone looking to get started with or transition to the field of data engineering, or refresh their knowledge of data engineering using Python. This book will also be useful for students planning to build a career in data engineering or IT professionals preparing for a transition. No previous knowledge of data engineering is required.
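As a taste of the extract-transform-load workflow this book teaches, here is a generic Python sketch (illustrative only, not the book's code, which centers on Apache tooling; the file, columns, and table names are assumptions, with SQLite standing in for a warehouse):

```python
import sqlite3

import pandas as pd

# Extract: read a raw CSV export (path and columns are hypothetical).
raw = pd.read_csv("sales_raw.csv")

# Transform: clean and enrich before loading.
raw = raw.dropna(subset=["order_id"])                   # drop incomplete rows
raw["order_date"] = pd.to_datetime(raw["order_date"])   # normalize dates
raw["total"] = raw["quantity"] * raw["unit_price"]      # derive a column

# Load: land the cleaned data in a warehouse table.
with sqlite3.connect("warehouse.db") as conn:
    raw.to_sql("sales_clean", conn, if_exists="replace", index=False)
    count = conn.execute("SELECT COUNT(*) FROM sales_clean").fetchone()[0]
    print(f"Loaded {count} rows")
```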
Data Engineering with dbt: A practical guide to building a cloud-based, pragmatic, and dependable data platform with SQL
by Roberto Zagni
Use easy-to-apply patterns in SQL and Python to adopt modern analytics engineering and build agile platforms with dbt that are well-tested and simple to extend and run. Purchase of the print or Kindle book includes a free PDF eBook.
Key Features:
- Build a solid dbt base and learn data modeling and the modern data stack to become an analytics engineer
- Build automated and reliable pipelines to deploy, test, run, and monitor ELTs with dbt Cloud
- Follow a guided dbt + Snowflake project to build a pattern-based architecture that delivers reliable datasets
Book Description:
dbt Cloud helps professional analytics engineers automate the application of powerful and proven patterns to transform data from ingestion to delivery, enabling real DataOps. This book begins by introducing you to dbt and its role in the data stack, along with how it uses simple SQL to build your data platform, helping you and your team work better together. You'll find out how to leverage data modeling, data quality, master data management, and more to build a simple-to-understand and future-proof solution. As you advance, you'll explore the modern data stack, understand how data-related careers are changing, and see how dbt enables this transition into the emerging role of an analytics engineer. The chapters help you build a sample project using the free version of dbt Cloud, Snowflake, and GitHub to create a professional DevOps setup with continuous integration, automated deployment, ELT runs, scheduling, and monitoring, solving practical cases you encounter in your daily work. By the end of this dbt book, you'll be able to build an end-to-end pragmatic data platform by ingesting data exported from your source systems, coding the needed transformations, including master data and the desired business rules, and building well-formed dimensional models or wide tables that'll enable you to build reports with the BI tool of your choice.
What you will learn:
- Create a dbt Cloud account and understand the ELT workflow
- Combine Snowflake and dbt for building modern data engineering pipelines
- Use SQL to transform raw data into usable data, and test its accuracy
- Write dbt macros and use Jinja to apply software engineering principles
- Test data and transformations to ensure reliability and data quality
- Build a lightweight pragmatic data platform using proven patterns
- Write easy-to-maintain idempotent code using dbt materializations
Who this book is for:
This book is for data engineers, analytics engineers, BI professionals, and data analysts who want to learn how to build simple, future-proof, and maintainable data platforms in an agile way. Project managers, data team managers, and decision makers looking to understand the importance of building a data platform and foster a culture of high-performing data teams will also find this book useful. Basic knowledge of SQL and data modeling will help you get the most out of the many layers of this book. The book also includes primers on many data-related subjects to help juniors get started.
Data Engineering: Mining, Information and Intelligence (International Series in Operations Research & Management Science #132)
by John Talburt, Terry M. Talley, Yupo Chan
Data Engineering: Mining, Information, and Intelligence describes applied research aimed at the task of collecting data and distilling useful information from that data. Most of the work presented emanates from research completed through collaborations between Acxiom Corporation and its academic research partners under the aegis of the Acxiom Laboratory for Applied Research (ALAR). Chapters are roughly ordered to follow the logical sequence of the transformation of data from raw input data streams to refined information. Four discrete sections cover data integration and information quality, grid computing, data mining, and visualization. Additionally, there are exercises at the end of each chapter. The primary audience for this book is the broad base of anyone interested in data engineering, whether from academia, market research firms, or business-intelligence companies, and the volume is ideally suited for researchers, practitioners, and postgraduate students alike. With its focus on problems arising from industry rather than from a basic research perspective, combined with its clear organization, extensive references, and subject and author indices, it serves academic, research, and industrial audiences.
Data Envelopment Analysis with R (Studies in Fuzziness and Soft Computing #386)
by Ali Ebrahimnejad, Farhad Hosseinzadeh Lotfi, Mohsen Vaez-Ghasemi, Zohreh Moghaddas
This book introduces readers to the use of R code for optimization problems. First, it provides the necessary background to understand data envelopment analysis (DEA), with a special emphasis on fuzzy DEA. It then describes DEA models, including fuzzy DEA models, and shows how to use them to solve optimization problems with R. Further, it discusses the main advantages of R in optimization problems, and provides R code based on real-world data sets throughout. Offering a comprehensive review of DEA and fuzzy DEA models and the corresponding R code, this practice-oriented reference guide is intended for master's and Ph.D. students in various disciplines, as well as practitioners and researchers.
Data Envelopment Analysis: A Handbook of Modeling Internal Structure and Network (International Series in Operations Research & Management Science #208)
by Joe Zhu, Wade D. Cook
This handbook serves as a complement to the Handbook on Data Envelopment Analysis (eds. W. W. Cooper, L. M. Seiford, and J. Zhu, 2011, Springer) in an effort to extend the frontier of DEA research. It provides a comprehensive source for state-of-the-art DEA modeling of internal structures and network DEA.
- Chapter 1 provides a survey of two-stage network performance decomposition and modeling techniques.
- Chapter 2 discusses pitfalls in network DEA modeling.
- Chapter 3 discusses efficiency decompositions in network DEA under three types of structures: series, parallel, and dynamic.
- Chapter 4 studies the determination of the network DEA frontier.
- Chapter 5 discusses additive efficiency decomposition in network DEA.
- Chapter 6 presents an approach to scale efficiency measurement in two-stage networks.
- Chapter 7 further discusses scale efficiency decomposition in two-stage networks.
- Chapter 8 offers a bargaining-game approach to modeling two-stage networks.
- Chapter 9 studies shared resources and efficiency decomposition in two-stage networks.
- Chapter 10 introduces an approach to computing the technical efficiency scores for a dynamic production network and its sub-processes.
- Chapter 11 presents a slacks-based network DEA.
- Chapter 12 discusses a DEA modeling technique for a two-stage network process where the inputs of the second stage include both the outputs from the first stage and additional inputs to the second stage.
- Chapter 13 presents an efficiency measurement methodology for multi-stage production systems.
- Chapter 14 discusses network DEA models, both static and dynamic, and explores various useful objective functions that can be applied to the models to find the optimal allocation of resources for processes within the black box that are normally invisible to DEA.
- Chapter 15 provides a comprehensive review of various types of network DEA modeling techniques.
- Chapter 16 presents shared-resources models for deriving aggregate measures of bank-branch performance, with accompanying component measures that make up that aggregate value.
- Chapter 17 examines a set of manufacturing plants operating under a single umbrella, with the objective of using the component or function measures to decide what might be considered each plant's core business.
- Chapter 18 considers problem settings where there may be clusters or groups of DMUs that form a hierarchy; the specific case of a set of electric power plants is examined in this context.
- Chapter 19 models bad outputs in two-stage network DEA.
- Chapter 20 presents an application of network DEA to performance measurement of Major League Baseball (MLB) teams.
- Chapter 21 presents an application of a two-stage network DEA model for examining the performance of 30 U.S. airline companies.
- Chapter 22 presents two distinct network efficiency models that are applied to engineering systems.