Browse Results

Showing 15,526 through 15,550 of 61,974 results

Data Engineering with Databricks Cookbook: Build effective data and AI solutions using Apache Spark, Databricks, and Delta Lake

by Pulkit Chadha

Work through 70 recipes for implementing reliable data pipelines with Apache Spark, optimally store and process structured and unstructured data in Delta Lake, and use Databricks to orchestrate and govern your data.
Key Features: Learn data ingestion, data transformation, and data management techniques using Apache Spark and Delta Lake; gain practical guidance on using Delta Lake tables and orchestrating data pipelines; implement reliable DataOps and DevOps practices, and enforce data governance policies on Databricks; purchase of the print or Kindle book includes a free PDF eBook.
Book Description: Written by a Senior Solutions Architect at Databricks, Data Engineering with Databricks Cookbook will show you how to effectively use Apache Spark, Delta Lake, and Databricks for data engineering, starting with a comprehensive introduction to data ingestion and loading with Apache Spark. What makes this book unique is its recipe-based approach, which will help you put your knowledge to use straight away and tackle common problems. You’ll be introduced to various data manipulation and data transformation solutions that can be applied to data, find out how to manage and optimize Delta tables, and get to grips with ingesting and processing streaming data. The book will also show you how to resolve performance problems in Apache Spark apps and Delta Lake. Advanced recipes later in the book will teach you how to use Databricks to implement DataOps and DevOps practices, as well as how to orchestrate and schedule data pipelines using Databricks Workflows. You’ll also go through the full process of setting up and configuring Unity Catalog for data governance. By the end of this book, you’ll be well-versed in building reliable and scalable data pipelines using modern data engineering technologies.
What you will learn: Perform data loading, ingestion, and processing with Apache Spark; discover data transformation techniques and custom user-defined functions (UDFs) in Apache Spark; manage and optimize Delta tables with Apache Spark and Delta Lake APIs; use Spark Structured Streaming for real-time data processing; optimize Apache Spark application and Delta table query performance; implement DataOps and DevOps practices on Databricks; orchestrate data pipelines with Delta Live Tables and Databricks Workflows; implement data governance policies with Unity Catalog.
Who this book is for: This book is for data engineers, data scientists, and data practitioners who want to learn how to build efficient and scalable data pipelines using Apache Spark, Delta Lake, and Databricks. To get the most out of this book, you should have basic knowledge of data architecture, SQL, and Python programming.

Data Engineering with Google Cloud Platform: A practical guide to operationalizing scalable data analytics systems on GCP

by Adi Wijaya

Build and deploy your own data pipelines on GCP, make key architectural decisions, and gain the confidence to boost your career as a data engineer.
Key Features: Understand data engineering concepts, the role of a data engineer, and the benefits of using GCP for building your solution; learn how to use the various GCP products to ingest, consume, and transform data and orchestrate pipelines; discover tips to prepare for and pass the Professional Data Engineer exam.
Book Description: With this book, you'll understand how the highly scalable Google Cloud Platform (GCP) enables data engineers to create end-to-end data pipelines, from storing and processing data and workflow orchestration to presenting data through visualization dashboards. Starting with a quick overview of the fundamental concepts of data engineering, you'll learn the various responsibilities of a data engineer and how GCP plays a vital role in fulfilling those responsibilities. As you progress through the chapters, you'll be able to leverage GCP products to build a sample data warehouse using Cloud Storage and BigQuery and a data lake using Dataproc. The book gradually takes you through operations such as data ingestion, data cleansing, transformation, and integrating data with other sources. You'll learn how to design IAM for data governance, deploy ML pipelines with Vertex AI, leverage pre-built GCP models as a service, and visualize data with Google Data Studio to build compelling reports. Finally, you'll find tips on how to boost your career as a data engineer, take the Professional Data Engineer certification exam, and get ready to become an expert in data engineering with GCP. By the end of this data engineering book, you'll have developed the skills to perform core data engineering tasks and build efficient ETL data pipelines with GCP.
What you will learn: Load data into BigQuery and materialize its output for downstream consumption; build data pipeline orchestration using Cloud Composer; develop Airflow jobs to orchestrate and automate a data warehouse; build a Hadoop data lake, create ephemeral clusters, and run jobs on the Dataproc cluster; leverage Pub/Sub for messaging and ingestion for event-driven systems; use Dataflow to perform ETL on streaming data; unlock the power of your data with Data Studio; calculate the GCP cost estimation for your end-to-end data solutions.
Who this book is for: This book is for data engineers, data analysts, and anyone looking to design and manage data processing pipelines using GCP. You'll find this book useful if you are preparing to take Google's Professional Data Engineer exam. A beginner-level understanding of data science, the Python programming language, and Linux commands is necessary. A basic understanding of data processing and cloud computing in general will help you make the most of this book.

Data Engineering with Python: Work with massive datasets to design data models and automate data pipelines using Python

by Paul Crickard

Build, monitor, and manage real-time data pipelines to create data engineering infrastructure efficiently using open-source Apache projects.
Key Features: Become well-versed in data architectures, data preparation, and data optimization skills with the help of practical examples; design data models and learn how to extract, transform, and load (ETL) data using Python; schedule, automate, and monitor complex data pipelines in production.
Book Description: Data engineering provides the foundation for data science and analytics, and forms an important part of all businesses. This book will help you to explore various tools and methods that are used for understanding the data engineering process using Python. The book will show you how to tackle challenges commonly faced in different aspects of data engineering. You'll start with an introduction to the basics of data engineering, along with the technologies and frameworks required to build data pipelines to work with large datasets. You'll learn how to transform and clean data and perform analytics to get the most out of your data. As you advance, you'll discover how to work with big data of varying complexity and production databases, and build data pipelines. Using real-world examples, you'll build architectures on which you'll learn how to deploy data pipelines. By the end of this Python book, you'll have gained a clear understanding of data modeling techniques, and will be able to confidently build data engineering pipelines for tracking data, running quality checks, and making necessary changes in production.
What you will learn: Understand how data engineering supports data science workflows; discover how to extract data from files and databases and then clean, transform, and enrich it; configure processors for handling different file formats as well as both relational and NoSQL databases; find out how to implement a data pipeline and dashboard to visualize results; use staging and validation to check data before landing in the warehouse; build real-time pipelines with staging areas that perform validation and handle failures; get to grips with deploying pipelines in the production environment.
Who this book is for: This book is for data analysts, ETL developers, and anyone looking to get started with or transition to the field of data engineering or refresh their knowledge of data engineering using Python. This book will also be useful for students planning to build a career in data engineering or IT professionals preparing for a transition. No previous knowledge of data engineering is required.

Data Engineering with dbt: A practical guide to building a cloud-based, pragmatic, and dependable data platform with SQL

by Roberto Zagni

Use easy-to-apply patterns in SQL and Python to adopt modern analytics engineering to build agile platforms with dbt that are well-tested and simple to extend and run. Purchase of the print or Kindle book includes a free PDF eBook.
Key Features: Build a solid dbt base and learn data modeling and the modern data stack to become an analytics engineer; build automated and reliable pipelines to deploy, test, run, and monitor ELTs with dbt Cloud; guided dbt + Snowflake project to build a pattern-based architecture that delivers reliable datasets.
Book Description: dbt Cloud helps professional analytics engineers automate the application of powerful and proven patterns to transform data from ingestion to delivery, enabling real DataOps. This book begins by introducing you to dbt and its role in the data stack, along with how it uses simple SQL to build your data platform, helping you and your team work better together. You’ll find out how to leverage data modeling, data quality, master data management, and more to build a simple-to-understand and future-proof solution. As you advance, you’ll explore the modern data stack, understand how data-related careers are changing, and see how dbt enables this transition into the emerging role of an analytics engineer. The chapters help you build a sample project using the free version of dbt Cloud, Snowflake, and GitHub to create a professional DevOps setup with continuous integration, automated deployment, ELT run, scheduling, and monitoring, solving practical cases you encounter in your daily work. By the end of this dbt book, you’ll be able to build an end-to-end pragmatic data platform by ingesting data exported from your source systems, coding the needed transformations, including master data and the desired business rules, and building well-formed dimensional models or wide tables that’ll enable you to build reports with the BI tool of your choice.
What you will learn: Create a dbt Cloud account and understand the ELT workflow; combine Snowflake and dbt for building modern data engineering pipelines; use SQL to transform raw data into usable data, and test its accuracy; write dbt macros and use Jinja to apply software engineering principles; test data and transformations to ensure reliability and data quality; build a lightweight pragmatic data platform using proven patterns; write easy-to-maintain idempotent code using dbt materialization.
Who this book is for: This book is for data engineers, analytics engineers, BI professionals, and data analysts who want to learn how to build simple, futureproof, and maintainable data platforms in an agile way. Project managers, data team managers, and decision makers looking to understand the importance of building a data platform and foster a culture of high-performing data teams will also find this book useful. Basic knowledge of SQL and data modeling will help you get the most out of the many layers of this book. The book also includes primers on many data-related subjects to help juniors get started.

Data Engineering: Mining, Information and Intelligence (International Series in Operations Research & Management Science #132)

by John Talburt, Terry M. Talley, Yupo Chan

DATA ENGINEERING: Mining, Information, and Intelligence describes applied research aimed at the task of collecting data and distilling useful information from that data. Most of the work presented emanates from research completed through collaborations between Acxiom Corporation and its academic research partners under the aegis of the Acxiom Laboratory for Applied Research (ALAR). Chapters are roughly ordered to follow the logical sequence of the transformation of data from raw input data streams to refined information. Four discrete sections cover Data Integration and Information Quality; Grid Computing; Data Mining; and Visualization. Additionally, there are exercises at the end of each chapter. The primary audience for this book is the broad base of anyone interested in data engineering, whether from academia, market research firms, or business-intelligence companies. The volume is ideally suited for researchers, practitioners, and postgraduate students alike. With its focus on problems arising from industry rather than a basic research perspective, combined with its intelligent organization, extensive references, and subject and author indices, it can serve the academic, research, and industrial audiences.

Data Envelopment Analysis with R (Studies in Fuzziness and Soft Computing #386)

by Ali Ebrahimnejad, Farhad Hosseinzadeh Lotfi, Mohsen Vaez-Ghasemi, Zohreh Moghaddas

This book introduces readers to the use of R code for optimization problems. First, it provides the necessary background to understand data envelopment analysis (DEA), with a special emphasis on fuzzy DEA. It then describes DEA models, including fuzzy DEA models, and shows how to use them to solve optimization problems with R. Further, it discusses the main advantages of R in optimization problems, and provides R code based on real-world data sets throughout. Offering a comprehensive review of DEA and fuzzy DEA models and the corresponding R code, this practice-oriented reference guide is intended for master's and Ph.D. students in various disciplines, as well as practitioners and researchers.

Data Envelopment Analysis: A Handbook of Modeling Internal Structure and Network (International Series in Operations Research & Management Science #208)

by Joe Zhu, Wade D. Cook

This handbook serves as a complement to the Handbook on Data Envelopment Analysis (eds. W. W. Cooper, L. M. Seiford, and J. Zhu, 2011, Springer) in an effort to extend the frontier of DEA research. It provides a comprehensive source for state-of-the-art DEA modeling on internal structures and network DEA. Chapter 1 provides a survey on two-stage network performance decomposition and modeling techniques. Chapter 2 discusses the pitfalls in network DEA modeling. Chapter 3 discusses efficiency decompositions in network DEA under three types of structures, namely series, parallel, and dynamic. Chapter 4 studies the determination of the network DEA frontier. Chapter 5 discusses additive efficiency decomposition in network DEA. Chapter 6 presents an approach to scale efficiency measurement in two-stage networks. Chapter 7 further discusses the scale efficiency decomposition in two-stage networks. Chapter 8 offers a bargaining game approach to modeling two-stage networks. Chapter 9 studies shared resources and efficiency decomposition in two-stage networks. Chapter 10 introduces an approach to computing the technical efficiency scores for a dynamic production network and its sub-processes. Chapter 11 presents a slacks-based network DEA. Chapter 12 discusses a DEA modeling technique for a two-stage network process where the inputs of the second stage include both the outputs from the first stage and additional inputs to the second stage. Chapter 13 presents an efficiency measurement methodology for multi-stage production systems. Chapter 14 discusses network DEA models, both static and dynamic. The discussion also explores various useful objective functions that can be applied to the models to find the optimal allocation of resources for processes within the black box that are normally invisible to DEA. Chapter 15 provides a comprehensive review of various types of network DEA modeling techniques.
Chapter 16 presents shared resources models for deriving aggregate measures of bank-branch performance, with accompanying component measures that make up that aggregate value. Chapter 17 examines a set of manufacturing plants operating under a single umbrella, with the objective being to use the component or function measures to decide what might be considered as each plant's core business. Chapter 18 considers problem settings where there may be clusters or groups of DMUs that form a hierarchy. The specific case of a set of electric power plants is examined in this context. Chapter 19 models bad outputs in two-stage network DEA. Chapter 20 presents an application of network DEA to performance measurement of Major League Baseball (MLB) teams. Chapter 21 presents an application of a two-stage network DEA model for examining the performance of 30 U.S. airline companies. Chapter 22 then presents two distinct network efficiency models that are applied to engineering systems.

Data Ethics and Challenges (SpringerBriefs in Applied Sciences and Technology)

by Samiksha Shukla, Joseph Varghese Kureethara, Jossy P. George, Kapil Tiwari

This book gives a thorough and systematic introduction to Data, Data Sources, Dimensions of Data, Privacy, and Security Challenges associated with Data, Ethics, Laws, IPR Copyright, and Technology Law. This book will help students, scholars, and practitioners to understand the challenges while dealing with data and its ethical and legal aspects. The book focuses on emerging issues while working with the Data.

Data Ethics: Practical Strategies for Implementing Ethical Information Management and Governance

by Katherine O'Keefe, Daragh O Brien

Data-gathering technology is more sophisticated than ever, as are the ethical standards for using this data. This second edition shows how to navigate this complex environment. Data Ethics provides a practical framework for the implementation of ethical principles into information management systems. It shows how to assess the types of ethical dilemmas organizations might face as they become more data-driven. This fully updated edition includes guidance on sustainability and environmental management and on how ethical frameworks can be standardized across cultures that have conflicting values. There is also discussion of data colonialism, the challenge of ethical trade-offs with ad-tech and analytics such as Covid-19 tracking systems, and case studies on Smart Cities and Deming's Principles. As the pace of developments in data-processing technology continues to increase, it is vital to capitalize on the opportunities this affords while ensuring that ethical standards and ideals are not compromised. Written by internationally regarded experts in the field, Data Ethics is the essential guide for students and practitioners to optimizing ethical data standards in organizations.

Data Exfiltration Threats and Prevention Techniques: Machine Learning and Memory-Based Data Security

by Zahir Tari, Nasrin Sohrabi, Yasaman Samadi, Jakapan Suaboot

DATA EXFILTRATION THREATS AND PREVENTION TECHNIQUES
Comprehensive resource covering threat prevention techniques for data exfiltration and applying machine learning applications to aid in identification and prevention.
Data Exfiltration Threats and Prevention Techniques provides readers with the knowledge needed to prevent and protect against malware attacks by introducing existing and recently developed methods in malware protection using AI, memory forensics, and pattern matching, presenting various data exfiltration attack vectors and advanced memory-based data leakage detection, and discussing ways in which machine learning methods have a positive impact on malware detection. Providing detailed descriptions of the recent advances in data exfiltration detection methods and technologies, the authors also discuss details of data breach countermeasures and attack scenarios to show how the reader may identify a potential cyber attack in the real world. Composed of eight chapters, this book presents a better understanding of the core issues related to cyber-attacks as well as the recent methods that have been developed in the field.
In Data Exfiltration Threats and Prevention Techniques, readers can expect to find detailed information on: sensitive data classification, covering text pre-processing, supervised text classification, automated text clustering, and other sensitive text detection approaches; supervised machine learning technologies for intrusion detection systems, covering taxonomy and benchmarking of supervised machine learning techniques; behavior-based malware detection using API-call sequences, covering API-call extraction techniques and detecting data stealing behavior based on API-call sequences; memory-based sensitive data monitoring for real-time data exfiltration detection and advanced time delay data exfiltration attack and detection.
Aimed at professionals and students alike, Data Exfiltration Threats and Prevention Techniques highlights a range of machine learning methods that can be used to detect potential data theft and identifies research gaps and the potential to make change in the future as technology continues to grow.

Data Fabric and Data Mesh Approaches with AI: A Guide to AI-based Data Cataloging, Governance, Integration, Orchestration, and Consumption

by Eberhard Hechler, Maryela Weihrauch, Yan (Catherine) Wu

Understand modern data fabric and data mesh concepts using AI-based self-service data discovery and delivery capabilities, a range of intelligent data integration styles, and automated unified data governance, all designed to deliver "data as a product" within hybrid cloud landscapes. This book teaches you how to successfully deploy state-of-the-art data mesh solutions and gain a comprehensive overview of how a data fabric architecture uses artificial intelligence (AI) and machine learning (ML) for automated metadata management and self-service data discovery and consumption. You will learn how data fabric and data mesh relate to other concepts such as DataOps, MLOps, AIDevOps, and more. Many examples are included to demonstrate how to modernize the consumption of data to enable a shopping-for-data (data as a product) experience. By the end of this book, you will understand the data fabric concept and architecture as it relates to themes such as automated unified data governance and compliance, enterprise information architecture, AI and hybrid cloud landscapes, and intelligent cataloging and metadata management.
What You Will Learn: Discover best practices and methods to successfully implement a data fabric architecture and data mesh solution; understand key data fabric capabilities, e.g., self-service data discovery, intelligent data integration techniques, intelligent cataloging and metadata management, and trustworthy AI; recognize the importance of data fabric to accelerate digital transformation and democratize data access; dive into important data fabric topics, addressing current data fabric challenges; conceive data fabric and data mesh concepts holistically within an enterprise context; become acquainted with the business benefits of data fabric and data mesh.
Who This Book Is For: Anyone who is interested in deploying modern data fabric architectures and data mesh solutions within an enterprise, including IT and business leaders, data governance and data office professionals, data stewards and engineers, data scientists, and information and data architects. Readers should have a basic understanding of enterprise information architecture.

Data Flow Analysis: Theory and Practice

by Uday Khedker, Amitabha Sanyal, Bageshri Sathe

Data flow analysis is used to discover information for a wide variety of useful applications, ranging from compiler optimizations to software engineering and verification. Modern compilers apply it to produce performance-maximizing code, and software engineers use it to re-engineer or reverse engineer programs and verify the integrity of their programs.
Supplementary Online Materials to Strengthen Understanding
Unlike most comparable books, many of which are limited to bit vector frameworks and classical constant propagation, Data Flow Analysis: Theory and Practice offers comprehensive coverage of both classical and contemporary data flow analysis. It prepares foundations useful for both researchers and students in the field by standardizing and unifying various existing research, concepts, and notations. It also presents mathematical foundations of data flow analysis and includes study of data flow analysis implementation through use of the GNU Compiler Collection (GCC). Divided into three parts, this unique text combines discussions of inter- and intraprocedural analysis and then describes implementation of a generic data flow analyzer (gdfa) for bit vector frameworks in GCC. Through the inclusion of case studies and examples to reinforce material, this text equips readers with a combination of mutually supportive theory and practice, and they will be able to access the authors’ accompanying Web page. Here they can experiment with the analyses described in the book, and can make use of updated features, including: slides used in the authors’ courses; the source of the generic data flow analyzer (gdfa); errata listing errors as they are discovered; additional updated relevant material discovered in the course of research.

Data Fluency

by Richard Galentino, Zach Gemignani, Patrick Schuermann, Chris Gemignani

A dream come true for those looking to improve their data fluency.
Analytical data is a powerful tool for growing companies, but what good is it if it hides in the shadows? Bring your data to the forefront with effective visualization and communication approaches, and let Data Fluency: Empowering Your Organization with Effective Communication show you the best tools and strategies for getting the job done right. Learn the best practices of data presentation and the ways that reporting and dashboards can help organizations effectively gauge performance, identify areas for improvement, and communicate results. Topics covered in the book include data reporting and communication, audience and user needs, data presentation tools, layout and styling, and common design failures. Those responsible for analytics, reporting, or BI implementation will find a refreshing take on data and visualization in this resource, as will report, data visualization, and dashboard designers.
Conquer the challenge of making valuable data approachable and easy to understand. Develop unique skills required to shape data to the needs of different audiences. Full color book links to bonus content at juiceanalytics.com. Written by well-known and highly esteemed authors in the data presentation community.
Data Fluency: Empowering Your Organization with Effective Communication focuses on user experience, making reports approachable, and presenting data in a compelling, inspiring way. The book helps to dissolve the disconnect between your data and those who might use it and can help make an impact on the people who are most affected by data. Use Data Fluency today to develop the skills necessary to turn data into effective displays for decision-making.

Data Forecasting and Segmentation Using Microsoft Excel: Perform data grouping, linear predictions, and time series machine learning statistics without using code

by Fernando Roque

Perform time series forecasts, linear prediction, and data segmentation with no-code Excel machine learning. Key Features: segment data and perform regression predictions and time series forecasts without writing any code; group multiple variables with K-means using an Excel plugin without programming; build, validate, and predict with a multiple linear regression model and time series forecasts. Book Description: Data Forecasting and Segmentation Using Microsoft Excel guides you through basic statistics to test whether your data can be used to perform regression predictions and time series forecasts. The exercises covered in this book use real-life data from Kaggle, such as demand for seasonal air tickets and credit card fraud detection. You'll learn how to apply the grouping K-means algorithm, which helps you find segments of your data that are impossible to see with other analyses, such as business intelligence (BI) and pivot analysis. By analyzing groups returned by K-means, you'll be able to detect outliers that could indicate possible fraud or a bad function in network packets. By the end of this Microsoft Excel book, you'll be able to use the classification algorithm to group data with different variables. You'll also be able to train linear and time series models to perform predictions and forecasts based on past data. What you will learn: understand why machine learning is important for classifying data segmentation; focus on basic statistics tests for regression variable dependency; test time series autocorrelation to build a useful forecast; use Excel add-ins to run K-means without programming; analyze segment outliers for possible data anomalies and fraud; build, train, and validate multiple regression models and time series forecasts. Who this book is for: This book is for data and business analysts as well as data science professionals. MIS, finance, and auditing professionals working with MS Excel will also find this book beneficial.

Data Governance For Dummies

by Jonathan Reichental

How to build and maintain strong data organizations, the Dummies way. Data Governance For Dummies offers an accessible first step for decision makers into understanding how data governance works and how to apply it to an organization in a way that improves results and doesn't disrupt. Prep your organization to handle the data explosion (if you know, you know) and learn how to manage this valuable asset. Take full control of your organization's data with all the info and how-tos you need. This book walks you through making accurate data readily available and maintaining it in a secure environment. It serves as your step-by-step guide to extracting every ounce of value from your data. Identify the impact and value of data in your business. Design governance programs that fit your organization. Discover and adopt tools that measure performance and need. Address data needs and build a more data-centric business culture. This is the perfect handbook for professionals in the world of data analysis and business intelligence, plus the people who interact with data on a daily basis. And, as always, Dummies explains things in terms anyone can understand, making it easy to learn everything you need to know.

Data Governance Success: Growing and Sustaining Data Governance

by Rupa Mahanti

While good data is an enterprise asset, bad data is an enterprise liability. Data governance enables you to effectively and proactively manage data assets throughout the enterprise by providing guidance in the form of policies, standards, processes and rules and defining roles and responsibilities outlining who will do what, with respect to data. While implementing data governance is not rocket science, it is not a simple exercise. There is a lot of confusion around what data governance is, and a lot of challenges in the implementation of data governance. Data governance is not a project or a one-off exercise but a journey that involves a significant amount of effort, time and investment and cultural change and a number of factors to take into consideration to achieve and sustain data governance success. Data Governance Success: Growing and Sustaining Data Governance is the third and final book in the Data Governance series and discusses the following: • Data governance perceptions and challenges • Key considerations when implementing data governance to achieve and sustain success • Strategy and data governance • Different data governance maturity frameworks • Data governance – people and process elements • Data governance metrics. This book shares the combined knowledge related to data and data governance that the author has gained over the years of working in different industrial and research programs and projects associated with data, processes, and technologies and unique perspectives of Thought Leaders and Data Experts through Interviews conducted. This book will be highly beneficial for IT students, academicians, information management and business professionals and researchers to enhance their knowledge to support and succeed in data governance implementations. This book is technology agnostic and contains a balance of concepts and examples and illustrations making it easy for the readers to understand and relate to their own specific data projects.

Data Governance and Compliance: Evolving to Our Current High Stakes Environment

by Rupa Mahanti

This book sets the stage of the evolution of corporate governance, laws and regulations, other forms of governance, and the interaction between data governance and other corporate governance sub-disciplines. Given the continuously evolving and complex regulatory landscape and the growing number of laws and regulations, compliance is a widely discussed issue in the field of data. This book considers the cost of non-compliance bringing in examples from different industries of instances in which companies failed to comply with rules, regulations, and other legal obligations, and goes on to explain how data governance helps in avoiding such pitfalls. The first in a three-volume series on data governance, this book does not assume any prior or specialist knowledge in data governance and will be highly beneficial for IT, management and law students, academics, information management and business professionals, and researchers to enhance their knowledge and get guidance in managing their own data governance projects from a governance and compliance perspective.

Data Governance and Data Management: Contextualizing Data Governance Drivers, Technologies, and Tools

by Rupa Mahanti

This book delves into the concept of data as a critical enterprise asset needed for informed decision making, compliance, regulatory reporting and insights into trends, behaviors, performance and patterns. With good data being key to staying ahead in a competitive market, enterprises capture and store exponential volumes of data. Considering the business impact of data, there needs to be adequate management around it to derive the best value. Data governance is one of the core data management related functions. However, it is often overlooked, misunderstood or confused with other terminologies and data management functions. Given the pervasiveness and importance of data, this book provides a comprehensive understanding of the business drivers for data governance and benefits of data governance, the interactions of the data governance function with other data management functions and various components and aspects of data governance that can be facilitated by technology and tools, the distinction between data management tools and data governance tools, the readiness checks to perform before exploring the market to purchase a data governance tool, the different aspects that must be considered when comparing and selecting the appropriate data governance technologies and tools from the large number of options available in the marketplace and the different market players that provide tools for supporting data governance. This book combines the data and data governance knowledge that the author has gained over years of working in different industrial and research programs and projects associated with data, processes and technologies with unique perspectives gained through interviews with thought leaders and data experts. This book is highly beneficial for IT students, academicians, information management and business professionals and researchers to enhance their knowledge and get guidance on implementing data governance in their own data initiatives.

Data Governance and the Digital Economy in Asia: Harmonising Cross-Border Data Flows (Routledge Studies in the Modern World Economy)

by Paul Cheung Liu Jingting Ulrike Sengstschmid

Data governance is the cornerstone of digital economy growth, particularly in Asia, where both the digital economy and the policy space are fast expanding. The chapters collected in this volume delve into how diverse and rapidly evolving data governance models of ASEAN countries and their Asian partners are shaping the regional digital economy integration, particularly through cross-border data flows. The book begins with an examination of the diffusion of data governance rules globally and their economic impacts on a macro level. It then delves into a regional analysis, emphasising the interplay between data governance and economic development. Key discussions include data policies in India, China, South Korea, and ASEAN countries, enriched with insights from industry leaders. The book evaluates the role of regional and international trade agreements in facilitating digital trade and explores the consequences of widely differing data governance models for the ASEAN regional economy, with a special focus on implications for ASEAN’s Digital Economy Framework Agreement. Written for scholars of digital economy, data governance, and digital trade, this book provides a thorough understanding of Asia’s data regulatory environment. Policymakers and industry professionals will also find the book’s insights into the intricacies of digital economy policies and their implications useful in navigating the future of digital economic integration and growth in the ASEAN region.

Data Governance for Managers: The Driver of Value Stream Optimization and a Pacemaker for Digital Transformation (Management for Professionals)

by Lars Michael Bollweg

Professional data management is the foundation for the successful digital transformation of traditional companies. Unfortunately, many companies fail to implement data governance because they do not fully understand the complexity of the challenge (organizational structure, employee empowerment, change management, etc.) and therefore do not include all aspects in the planning and implementation of their data governance. This book explains the driving role that a responsive data organization can play in a company's digital transformation. Using proven process models, the book takes readers from the basics, through planning and implementation, to regular operations and measuring the success of data governance. All the important decision points are highlighted, and the advantages and disadvantages are discussed in order to identify digitization potential, implement it in the company, and develop customized data governance. The book will serve as a useful guide for interested newcomers as well as for experienced managers.

Data Governance: A Guide

by Dimitrios Sargiotis

This book is a comprehensive resource designed to demystify the complex world of data governance for professionals across various sectors. This guide provides in-depth insights, methodologies, and best practices to help organizations manage their data effectively and securely. It covers essential topics such as data quality, privacy, security, and management, ensuring that readers gain a holistic understanding of how to establish and maintain a robust data governance framework. Through a blend of theoretical knowledge and practical applications, this book addresses the challenges and benefits of data governance, equipping readers with the tools needed to navigate the evolving data landscape. In addition to foundational principles, this book explores real-world case studies that illustrate the tangible benefits and common pitfalls of implementing data governance. Emerging trends and technologies, including artificial intelligence, machine learning, and blockchain, are also examined to prepare readers for future developments in the field. Whether you are a seasoned data management professional or new to the discipline, this book serves as an invaluable resource for mastering the intricacies of data governance and leveraging data as a strategic asset for organizational success. This resourceful guide targets data management professionals, IT managers, compliance officers, data stewards, data owners, data governance managers and more. Business leaders, business executives, academic researchers, and students focused on computer science in data-related fields will also find this book a useful resource.

Data Governance: From the Fundamentals to Real Cases

by Mario Piattini Ismael Caballero

This book presents a set of models, methods, and techniques that allow the successful implementation of data governance (DG) in an organization and reports real experiences of data governance in different public and private sectors. To this end, this book is composed of two parts. Part I on “Data Governance Fundamentals” begins with an introduction to the concept of data governance that stresses that DG is not primarily focused on databases, clouds, or other technologies, but that the DG framework must be understood by business users, systems personnel, and the systems themselves alike. Next, chapter 2 addresses crucial topics for DG, such as the evolution of data management in organizations, data strategy and policies, and defensive and offensive approaches to data strategy. Chapter 3 then details the central role that human resources play in DG, analysing the key responsibilities of the different DG-related roles and boards, while chapter 4 discusses the most common barriers to DG in practice. Chapter 5 summarizes the paradigm shifts in DG from control to value creation. Subsequently chapter 6 explores the needs, characteristics and key functionalities of DG tools, before this part ends with a chapter on maturity models for data governance. Part II on “Data Governance Applied” consists of five chapters which review the situation of DG in different sectors and industries. Details about DG in the banking sector, public administration, insurance companies, healthcare and telecommunications each are presented in one chapter. The book is aimed at academics, researchers and practitioners (especially CIOs, Data Governors, or Data Stewards) involved in DG. It can also serve as a reference for courses on data governance in information systems.

Data Governance: How to Design, Deploy and Sustain an Effective Data Governance Program (The Morgan Kaufmann Series on Business Intelligence)

by John Ladley

This book is for any manager or team leader who has the green light to implement a data governance program. The problem of managing data continues to grow with issues surrounding cost of storage, exponential growth, as well as administrative, management and security concerns. The solution to handling all of these issues at scale is data governance, which provides better services to users and saves money. What you will find in this book is an overview of why data governance is needed, how to design, initiate, and execute a program and how to keep the program sustainable. With the provided framework and case studies you will be enabled and educated in launching your very own successful and money saving data governance program. - Provides a complete overview of the data governance lifecycle that can help you discern technology and staff needs - Specifically aimed at managers who need to implement a data governance program at their company - Includes case studies to detail 'do's' and 'don'ts' in real-world situations
