Data Analysis and Classification: Methods and Applications (Studies in Classification, Data Analysis, and Knowledge Organization)
by Krzysztof Jajuga Marek Walesiak Krzysztof Najman
This volume gathers peer-reviewed contributions that address a wide range of recent developments in the methodology and applications of data analysis and classification tools in micro and macroeconomic problems. The papers were originally presented at the 29th Conference of the Section on Classification and Data Analysis of the Polish Statistical Association, SKAD 2020, held in Sopot, Poland, September 7–9, 2020. Providing a balance between methodological contributions and empirical papers, the book is divided into five parts focusing on methodology, finance, economics, social issues and applications dealing with COVID-19 data. It is aimed at a wide audience, including researchers at universities and research institutions, graduate and doctoral students, practitioners, data scientists and employees in public statistical institutions.
Data Analysis and Optimization for Engineering and Computing Problems: Proceedings of the 3rd EAI International Conference on Computer Science and Engineering and Health Services (EAI/Springer Innovations in Communication and Computing)
by Pandian Vasant Igor Litvinchev Jose Antonio Marmolejo-Saucedo Roman Rodriguez-Aguilar Felix Martinez-Rios
This book presents the proceedings of the EAI International Conference on Computer Science: Applications in Engineering and Health Services (COMPSE 2019). The conference highlighted the latest research innovations and applications of algorithms designed for optimization applications within the fields of Science, Computer Science, Engineering, Information Technology, Management, Finance and Economics and Health Systems. Focusing on a variety of methods and systems as well as practical examples, these proceedings are a significant resource for postgraduate-level students, decision makers, and researchers in both public and private sectors who are seeking research-based methods for modelling uncertain and unpredictable real-world problems.
Data Analysis and Optimization: In Honor of Boris Mirkin's 80th Birthday (Springer Optimization and Its Applications #202)
by Boris Goldengorin Sergei Kuznetsov
This book presents the state of the art in the emerging field of data science and includes models for layered security with applications in the protection of sites—such as large gathering places—through high-stake decision-making tasks. Such tasks include cancer diagnostics, self-driving cars, and others where wrong decisions can have catastrophic consequences. Additionally, this book provides readers with automated methods to analyze patterns and models for various types of data, with applications ranging from scientific discovery to business intelligence and analytics. The book primarily includes exploratory data analysis, pattern mining, clustering, and classification supported by real-life case studies. The statistical section of this book explores the impact of data mining and modeling on the predictability assessment of time series. Further, new notions of mean values based on ideas of multi-criteria optimization are compared with their conventional definitions, leading to new algorithmic approaches to the calculation of the suggested new means. The style of the written chapters and the provision of a broad yet in-depth overview of data mining, integrating novel concepts from machine learning and statistics, make the book accessible to upper-level undergraduate and graduate students in data mining courses. Students and professionals specializing in computer and management science, data mining for high-dimensional data, complex graphs and networks will benefit from the cutting-edge ideas and practically motivated case studies in this book.
Data Analysis and Pattern Recognition in Multiple Databases (Intelligent Systems Reference Library #61)
by Witold Pedrycz Animesh Adhikari Jhimli Adhikari
Pattern recognition in data is a well-known classical problem that falls under the ambit of data analysis. As we need to handle different data, the nature of patterns, their recognition and the types of data analyses are bound to change. Since the number of data collection channels has increased in recent times and the channels have become more diversified, many real-world data mining tasks can easily acquire multiple databases from various sources. In these cases, data mining becomes more challenging for several essential reasons. We may encounter sensitive data originating from different sources that cannot be amalgamated. Even if we are allowed to place different data together, we are certainly not able to analyze them when local identities of patterns are required to be retained. Thus, pattern recognition in multiple databases gives rise to a suite of new, challenging problems different from those encountered before. Association rule mining, global pattern discovery and mining patterns of select items provide different pattern discovery techniques in multiple data sources. Some interesting item-based data analyses are also covered in this book. Interesting patterns, such as exceptional patterns, icebergs and periodic patterns, have been recently reported. The book presents a thorough influence analysis between items in time-stamped databases. The recent research on mining multiple related databases is covered, while some previous contributions to the area are highlighted and contrasted with the most recent developments.
Data Analysis and Rationality in a Complex World (Studies in Classification, Data Analysis, and Knowledge Organization)
by Angela Montanari Berthold Lausen Theodore Chadjipadelis Angelos Markos Tae Rim Lee Rebecca Nugent
This volume presents the latest advances in statistics and data science, including theoretical, methodological and computational developments and practical applications related to classification and clustering, data gathering, exploratory and multivariate data analysis, statistical modeling, and knowledge discovery and seeking. It includes contributions on analyzing and interpreting large, complex and aggregated datasets, and highlights numerous applications in economics, finance, computer science, political science and education. It gathers a selection of peer-reviewed contributions presented at the 16th Conference of the International Federation of Classification Societies (IFCS 2019), which was organized by the Greek Society of Data Analysis and held in Thessaloniki, Greece, on August 26-29, 2019.
Data Analysis and Related Applications 3: Theory and Practice, New Approaches
by Christos H. Skiadas Yiannis Dimotikalis
The book is a collective work by a number of leading scientists, analysts, engineers, mathematicians and statisticians who have been working at the forefront of data analysis and related applications, arising from data science, operations research, engineering, machine learning or statistics. The chapters of this collaborative work represent a cross-section of current research interests in the above scientific areas. The collected material has been divided into appropriate sections to provide the reader with both theoretical and applied information on data analysis methods, models and techniques, along with appropriate applications. The data analysis methodology presented covers the rapidly developing state-of-the-art theory and applications of data expansion, both of which are currently undergoing considerable change. New approaches, including those based on Artificial Intelligence, have been developed and are expected to deliver.
Data Analysis and Related Applications, Volume 1: Computational, Algorithmic and Applied Economic Data Analysis
by Alex Karagrigoriou Christos H. Skiadas Yiannis Dimotikalis Konstantinos N. Zafeiris Christiana Karagrigoriou-Vonta
The scientific field of data analysis is constantly expanding due to the rapid growth of the computer industry and the wide applicability of computational and algorithmic techniques, in conjunction with new advances in statistical, stochastic and analytic tools. There is a constant need for new, high-quality publications to cover the recent advances in all fields of science and engineering.
This book is a collective work by a number of leading scientists, computer experts, analysts, engineers, mathematicians, probabilists and statisticians who have been working at the forefront of data analysis and related applications. The chapters of this collaborative work represent a cross-section of current concerns, developments and research interests in the above scientific areas. The collected material has been divided into appropriate sections to provide the reader with both theoretical and applied information on data analysis methods, models and techniques, along with related applications.
Data Analysis and Related Applications, Volume 2: Multivariate, Health and Demographic Data Analysis
by Alex Karagrigoriou Christos H. Skiadas Yiannis Dimotikalis Konstantinos N. Zafeiris Christiana Karagrigoriou-Vonta
The scientific field of data analysis is constantly expanding due to the rapid growth of the computer industry and the wide applicability of computational and algorithmic techniques, in conjunction with new advances in statistical, stochastic and analytic tools. There is a constant need for new, high-quality publications to cover the recent advances in all fields of science and engineering.
This book is a collective work by a number of leading scientists, computer experts, analysts, engineers, mathematicians, probabilists and statisticians who have been working at the forefront of data analysis and related applications. The chapters of this collaborative work represent a cross-section of current concerns, developments and research interests in the above scientific areas. The collected material has been divided into appropriate sections to provide the reader with both theoretical and applied information on data analysis methods, models and techniques, along with related applications.
Data Analysis and Visualization Using Python: Analyze Data to Create Visualizations for BI Systems
by Dr Ossama Embarak
Look at Python from a data science point of view and learn proven techniques for data visualization as used in making critical business decisions. Starting with an introduction to data science with Python, you will take a closer look at the Python environment and get acquainted with editors such as Jupyter Notebook and Spyder. After going through a primer on Python programming, you will grasp fundamental Python programming techniques used in data science. Moving on to data visualization, you will see how it caters to modern business needs and forms a key factor in decision-making. You will also take a look at some popular data visualization libraries in Python. Shifting focus to data structures, you will learn the various aspects of data structures from a data science perspective. You will then work with file I/O and regular expressions in Python, followed by gathering and cleaning data. Moving on to exploring and analyzing data, you will look at advanced data structures in Python. Then, you will take a deep dive into data visualization techniques, going through a number of plotting systems in Python. In conclusion, you will complete a detailed case study, where you’ll get a chance to revisit the concepts you’ve covered so far.
What You Will Learn
- Use Python programming techniques for data science
- Master data collections in Python
- Create engaging visualizations for BI systems
- Deploy effective strategies for gathering and cleaning data
- Integrate the Seaborn and Matplotlib plotting systems
Who This Book Is For
Developers with basic Python programming knowledge looking to adopt key strategies for data analysis and visualizations using Python.
Data Analysis for Direct Numerical Simulations of Turbulent Combustion: From Equation-Based Analysis to Machine Learning
by Heinz Pitsch Antonio Attili
This book presents methodologies for analysing large data sets produced by the direct numerical simulation (DNS) of turbulence and combustion. It describes the development of models that can be used to analyse large eddy simulations, and highlights both the most common techniques and newly emerging ones. The chapters, written by internationally respected experts, invite readers to consider DNS of turbulence and combustion from a formal, data-driven standpoint, rather than one led by experience and intuition. This perspective allows readers to recognise the shortcomings of existing models, with the ultimate goal of quantifying and reducing model-based uncertainty. In addition, recent advances in machine learning and statistical inferences offer new insights on the interpretation of DNS data. The book will especially benefit graduate-level students and researchers in mechanical and aerospace engineering, e.g. those with an interest in general fluid mechanics, applied mathematics, and the environmental and atmospheric sciences.
Data Analysis for Neurodegenerative Disorders (Cognitive Technologies)
by Amira S. Ashour Yanhui Guo Deepika Koundal Deepak Kumar Jain Atef Zaguia
This book explores the challenges involved in handling medical big data in the diagnosis of neurological disorders. It discusses how to optimally reduce the number of neuropsychological tests during the classification of these disorders by using feature selection methods based on the diagnostic information of enrolled subjects. The book includes key definitions/models and covers their applications in different types of signal/image processing for neurological disorder data. An extensive discussion on the possibility of enhancing the abilities of AI systems using the different data analysis is included. The book recollects several applicable basic preliminaries of the different AI networks and models, while also highlighting basic processes in image processing for various neurological disorders. It also reports on several applications to image processing and explores numerous topics concerning the role of big data analysis in addressing signal and image processing in various real-world scenarios involving neurological disorders.
This cutting-edge book highlights the analysis of medical data, together with novel procedures and challenges for handling neurological signals and images. It will help engineers, researchers and software developers to understand the concepts and different models of AI and data analysis. To help readers gain a comprehensive grasp of the subject, it focuses on three key features:
● Presents outstanding concepts and models for using AI in clinical applications involving neurological disorders, with clear descriptions of image representation, feature extraction and selection.
● Highlights a range of techniques for evaluating the performance of proposed CAD systems for the diagnosis of neurological disorders.
● Examines various signal and image processing methods for efficient decision support systems. Soft computing, machine learning and optimization algorithms are also included to improve the CAD systems used.
Data Analysis for Physical Scientists
by Les Kirkup
The ability to summarise data, compare models and apply computer-based analysis tools are vital skills necessary for studying and working in the physical sciences. This textbook supports undergraduate students as they develop and enhance these skills. Introducing data analysis techniques, this textbook pays particular attention to the internationally recognised guidelines for calculating and expressing measurement uncertainty. This new edition has been revised to incorporate Excel® 2010. It also provides a practical approach to fitting models to data using non-linear least squares, a powerful technique which can be applied to many types of model. Worked examples using actual experimental data help students understand how the calculations apply to real situations. Over 200 in-text exercises and end-of-chapter problems give students the opportunity to use the techniques themselves and gain confidence in applying them. Answers to the exercises and problems are given at the end of the book.
Data Analysis for Social Science: A Friendly and Practical Introduction
by Kosuke Imai Elena Llaudet
An ideal textbook for complete beginners—teaches from scratch R, statistics, and the fundamentals of quantitative social science
Data Analysis for Social Science provides a friendly introduction to the statistical concepts and programming skills needed to conduct and evaluate social scientific studies. Assuming no prior knowledge of statistics and coding and only minimal knowledge of math, the book teaches the fundamentals of survey research, predictive models, and causal inference while analyzing data from published studies with the statistical program R. It teaches not only how to perform the data analyses but also how to interpret the results and identify the analyses’ strengths and limitations.
- Progresses by teaching how to solve one kind of problem after another, bringing in methods as needed. It teaches, in this order, how to (1) estimate causal effects with randomized experiments, (2) visualize and summarize data, (3) infer population characteristics, (4) predict outcomes, (5) estimate causal effects with observational data, and (6) generalize from sample to population.
- Flips the script of traditional statistics textbooks. It starts by estimating causal effects with randomized experiments and postpones any discussion of probability and statistical inference until the final chapters. This unconventional order engages students by demonstrating from the very beginning how data analysis can be used to answer interesting questions, while reserving more abstract, complex concepts for later chapters.
- Provides a step-by-step guide to analyzing real-world data using the powerful, open-source statistical program R, which is free for everyone to use. The datasets are provided on the book’s website so that readers can learn how to analyze data by following along with the exercises in the book on their own computer.
- Assumes no prior knowledge of statistics or coding.
- Specifically designed to accommodate students with a variety of math backgrounds. It includes supplemental materials for students with minimal knowledge of math and clearly identifies sections with more advanced material so that readers can skip them if they so choose.
- Provides cheatsheets of statistical concepts and R code.
- Comes with instructor materials (upon request), including sample syllabi, lecture slides, and additional replication-style exercises with solutions and with the real-world datasets analyzed.
Looking for a more advanced introduction? Consider Quantitative Social Science by Kosuke Imai. In addition to covering the material in Data Analysis for Social Science, it teaches diffs-in-diffs models, heterogeneous effects, text analysis, and regression discontinuity designs, among other things.
Data Analysis in Bi-partial Perspective: Clustering and Beyond (Studies in Computational Intelligence #818)
by Jan W. Owsiński
This book presents the bi-partial approach to data analysis, which is both uniquely general and enables the development of techniques for many data analysis problems, including related models and algorithms. It is based on adequate representation of the essential clustering problem: to group together the similar, and to separate the dissimilar. This leads to a general objective function and subsequently to a broad class of concrete implementations. Using this basis, a suboptimising procedure can be developed, together with a variety of implementations.
This procedure has a striking affinity with the classical hierarchical merger algorithms, while also incorporating the stopping rule, based on the objective function. The approach resolves the cluster number issue, as the solutions obtained include both the content and the number of clusters. Further, it is demonstrated how the bi-partial principle can be effectively applied to a wide variety of problems in data analysis.
The book offers a valuable resource for all data scientists who wish to broaden their perspective on basic approaches and essential problems, and to thus find answers to questions that are often overlooked or have yet to be solved convincingly. It is also intended for graduate students in the computer and data sciences, and will complement their knowledge and skills with fresh insights on problems that are otherwise treated in the standard “academic” manner.
Data Analysis in Medicine and Health using R (Analytics and AI for Healthcare)
by Kamarul Imran Musa Wan Nor Mansor Tengku Muhammad Hanis
In medicine and health, data are analyzed to guide treatment plans, patient care and control and prevention policies. However, researchers in medicine and health often lack an understanding of data and statistical concepts as well as programming skills. In addition, there is an increasing demand for data analyses to be reproducible, along with more complex data that require cutting-edge analysis. This book provides readers with the fundamental concepts of data and statistical analysis and modeling. It also teaches the skills needed to perform the analysis using the R programming language, which is the lingua franca for statisticians. The topics in the book are presented in a sequence that minimizes the time readers need to understand the objectives of data and statistical analysis, learn the concepts of statistical modeling and acquire the skills to perform the analysis. The R codes and datasets used in the book will be made available on GitHub for easy access. The book will also be live on the website bookdown.org, a service provided by RStudio, PBC, to host books written using the bookdown package in the R programming language.
Data Analysis with IBM SPSS Statistics
by Kenneth Stehlik-Barry Anthony J. Babinec
Master data management & analysis techniques with IBM SPSS Statistics 24
About This Book
• Leverage the power of IBM SPSS Statistics to perform efficient statistical analysis of your data
• Choose the right statistical technique to analyze different types of data and build efficient models from your data with ease
• Overcome any hurdle that you might come across while learning the different SPSS Statistics concepts with clear instructions, tips and tricks
Who This Book Is For
This book is designed for analysts and researchers who need to work with data to discover meaningful patterns but do not have the time (or inclination) to become programmers. We assume a foundational understanding of statistics such as one would learn in a basic course or two on statistical techniques and methods.
What You Will Learn
• Install and set up SPSS to create a working environment for analytics
• Techniques for exploring data visually and statistically, assessing data quality and addressing issues related to missing data
• How to import different kinds of data and work with it
• Organize data for analytical purposes (create new data elements, sampling, weighting, subsetting, and restructure your data)
• Discover basic relationships among data elements (bivariate data patterns, differences in means, correlations)
• Explore multivariate relationships
• Leverage the offerings to draw accurate insights from your research, and benefit your decision-making
In Detail
SPSS Statistics is a software package used for logical batched and non-batched statistical analysis. Analytical tools such as SPSS can readily provide even a novice user with an overwhelming amount of information and a broad range of options for analyzing patterns in the data. The journey starts with installing and configuring SPSS Statistics for first use and exploring the data to understand its potential (as well as its limitations). Use the right statistical analysis technique such as regression, classification and more, and analyze your data in the best possible manner. Work with graphs and charts to visualize your findings. With this information in hand, the discovery of patterns within the data can be undertaken. Finally, the high-level objective of developing predictive models that can be applied to other situations will be addressed. By the end of this book, you will have a firm understanding of the various statistical analysis techniques offered by SPSS Statistics, and be able to master its use for data analysis with ease.
Style and approach
Provides a practical orientation to understanding a set of data and examining the key relationships among the data elements. Shows useful visualizations to enhance understanding and interpretation. Outlines a roadmap that focuses the process so that decisions regarding how to proceed can be made easily.
Data Analysis with LLMs
by Immanuel Trummer
Speed up common data science tasks with AI assistants like ChatGPT and Large Language Models (LLMs) from Anthropic, Cohere, OpenAI, Google, Hugging Face, and more!
Data Analysis with LLMs teaches you to use the new generation of AI assistants and Large Language Models (LLMs) to aid and accelerate common data science tasks. Learn how to use LLMs to:
• Analyze text, tables, images, and audio files
• Extract information from multi-modal data lakes
• Classify, cluster, transform, and query multimodal data
• Build natural language query interfaces over structured data sources
• Use LangChain to build complex data analysis pipelines
• Prompt engineering and model configuration
All practical, Data Analysis with LLMs takes you from your first prompts through advanced techniques like creating LLM-based agents for data analysis and fine-tuning existing models. You’ll learn how to extract data, build natural language query interfaces, and much more.
About the technology
Large Language Models (LLMs) can streamline and accelerate almost any data science task. Master the techniques in this book, and you’ll be able to analyze large amounts of text, tabular and graph data, images, videos, and more with clear natural language prompts and a few lines of Python code.
About the book
Data Analysis with LLMs shows you exactly how to integrate generative AI into your day-to-day work as a data scientist. In it, Cornell professor Immanuel Trummer guides you through a series of engaging projects that introduce OpenAI’s Python library, tools like LangChain and LlamaIndex, and LLMs from Anthropic, Cohere, and Hugging Face. As you go, you’ll use AI to query structured and unstructured data, analyze sound and images, and optimize the cost and quality of your data analysis process.
What's inside
• Classify, cluster, transform, and query multimodal data
• Build natural language query interfaces over structured data sources
• Create LLM-based agents for autonomous data analysis
• Prompt engineering and model configuration
About the reader
For data scientists and data analysts who know the basics of Python.
About the author
Immanuel Trummer is an associate professor of computer science at Cornell University and a member of the Cornell Database Group.
Table of Contents
Part 1
1 Analyzing data with large language models
2 Chatting with ChatGPT
Part 2
3 The OpenAI Python library
4 Analyzing text data
5 Analyzing structured data
6 Analyzing images and videos
7 Analyzing audio data
Part 3
8 GPT alternatives
9 Optimizing cost and quality
10 Software frameworks
Data Analysis with Machine Learning for Psychologists: Crash Course to Learn Python 3 and Machine Learning in 10 hours
by Chandril Ghosh
The power of data drives the digital economy of the 21st century. It has been argued that data is as vital a resource as oil was during the industrial revolution. An upward trend in the number of research publications using machine learning in some of the top journals, in combination with an increasing number of academic recruiters within psychology asking for Python knowledge from applicants, indicates a growing demand for these skills in the market. While there are plenty of books covering data science, books on the market rarely, if ever, address the needs of social science students with no computer science background. They are typically written by engineers or computer scientists for people of their discipline. As a result, such books are often filled with technical jargon and examples irrelevant to psychological studies or projects. In contrast, this book was written by a psychologist in a simple, easy-to-understand way that is brief and accessible. The aim of this book is to make the learning experience on this topic as smooth as possible for psychology students/researchers with no background in programming or data science. Completing this book will also open up an enormous number of possibilities for quantitative researchers in psychological science, as it will enable them to explore newer types of research questions.
Data Analysis with Open Source Tools: A Hands-On Guide for Programmers and Data Scientists
by Philipp K. Janert
Collecting data is relatively easy, but turning raw information into something useful requires that you know how to extract precisely what you need. With this insightful book, intermediate to experienced programmers interested in data analysis will learn techniques for working with data in a business environment. You'll learn how to look at data to discover what it contains, how to capture those ideas in conceptual models, and then feed your understanding back into the organization through business plans, metrics dashboards, and other applications.
Along the way, you'll experiment with concepts through hands-on workshops at the end of each chapter. Above all, you'll learn how to think about the results you want to achieve -- rather than rely on tools to think for you.
- Use graphics to describe data with one, two, or dozens of variables
- Develop conceptual models using back-of-the-envelope calculations, as well as scaling and probability arguments
- Mine data with computationally intensive methods such as simulation and clustering
- Make your conclusions understandable through reports, dashboards, and other metrics programs
- Understand financial calculations, including the time-value of money
- Use dimensionality reduction techniques or predictive analytics to conquer challenging data analysis situations
- Become familiar with different open source programming environments for data analysis
"Finally, a concise reference for understanding how to conquer piles of data." --Austin King, Senior Web Developer, Mozilla
"An indispensable text for aspiring data scientists." --Michael E. Driscoll, CEO/Founder, Dataspora
Data Analysis with Python and PySpark
by Jonathan RiouxThink big about your data! PySpark brings the powerful Spark big data processing engine to the Python ecosystem, letting you seamlessly scale up your data tasks and create lightning-fast pipelines.In Data Analysis with Python and PySpark you will learn how to: Manage your data as it scales across multiple machines Scale up your data programs with full confidence Read and write data to and from a variety of sources and formats Deal with messy data with PySpark&’s data manipulation functionality Discover new data sets and perform exploratory data analysis Build automated data pipelines that transform, summarize, and get insights from data Troubleshoot common PySpark errors Creating reliable long-running jobs Data Analysis with Python and PySpark is your guide to delivering successful Python-driven data projects. Packed with relevant examples and essential techniques, this practical book teaches you to build pipelines for reporting, machine learning, and other data-centric tasks. Quick exercises in every chapter help you practice what you&’ve learned, and rapidly start implementing PySpark into your data systems. No previous knowledge of Spark is required. About the technology The Spark data processing engine is an amazing analytics factory: raw data comes in, insight comes out. PySpark wraps Spark&’s core engine with a Python-based API. It helps simplify Spark&’s steep learning curve and makes this powerful tool available to anyone working in the Python data ecosystem. About the book Data Analysis with Python and PySpark helps you solve the daily challenges of data science with PySpark. You&’ll learn how to scale your processing capabilities across multiple machines while ingesting data from any source—whether that&’s Hadoop clusters, cloud data storage, or local data files. Once you&’ve covered the fundamentals, you&’ll explore the full versatility of PySpark by building machine learning pipelines, and blending Python, pandas, and PySpark code. 
What's inside
- Organizing your PySpark code
- Managing your data, no matter the size
- Scaling up your data programs with full confidence
- Troubleshooting common data pipeline problems
- Creating reliable long-running jobs
About the reader
Written for data scientists and data engineers comfortable with Python.
About the author
As an ML director for a data-driven software company, Jonathan Rioux uses PySpark daily. He teaches the software to data scientists, engineers, and data-savvy business analysts.
Table of Contents
1 Introduction
PART 1 GET ACQUAINTED: FIRST STEPS IN PYSPARK
2 Your first data program in PySpark
3 Submitting and scaling your first PySpark program
4 Analyzing tabular data with pyspark.sql
5 Data frame gymnastics: Joining and grouping
PART 2 GET PROFICIENT: TRANSLATE YOUR IDEAS INTO CODE
6 Multidimensional data frames: Using PySpark with JSON data
7 Bilingual PySpark: Blending Python and SQL code
8 Extending PySpark with Python: RDD and UDFs
9 Big data is just a lot of small data: Using pandas UDFs
10 Your data under a different lens: Window functions
11 Faster PySpark: Understanding Spark's query planning
PART 3 GET CONFIDENT: USING MACHINE LEARNING WITH PYSPARK
12 Setting the stage: Preparing features for machine learning
13 Robust machine learning with ML Pipelines
14 Building custom ML transformers and estimators
Data Analysis with Python: A Modern Approach
by David Taieb
Learn a modern approach to data analysis using Python to harness the power of programming and AI across your data. Detailed case studies bring this modern approach to life across visual data, social media, graph algorithms, and time series analysis.
Key Features
- Bridge your data analysis with the power of programming, complex algorithms, and AI
- Use Python and its extensive libraries to power your way to new levels of data insight
- Work with AI algorithms, TensorFlow, graph algorithms, NLP, and financial time series
- Explore this modern approach across key industry case studies and hands-on projects
Book Description
Data Analysis with Python offers a modern approach to data analysis so that you can work with the latest and most powerful Python tools, AI techniques, and open source libraries. Industry expert David Taieb shows you how to bridge data science with the power of programming and algorithms in Python. You'll be working with complex algorithms and cutting-edge AI in your data analysis. Learn how to analyze data with hands-on examples using Python-based tools and Jupyter Notebook. You'll find the right balance of theory and practice, with extensive code files that you can integrate right into your own data projects. Then explore the power of this approach by working with it across key industry case studies. Four fascinating and full projects connect you to the most critical data analysis challenges you're likely to meet today. The first of these is an image recognition application with TensorFlow, embracing the importance today of AI in your data analysis. The second industry project analyses social media trends, exploring big data issues and AI approaches to natural language processing. The third case study is a financial portfolio analysis application that engages you with time series analysis, pivotal to many data science applications today.
The fourth industry use case immerses you in graph algorithms and the power of programming in modern data science. You'll wrap up with a thoughtful look at the future of data science and how it will harness the power of algorithms and artificial intelligence.
What you will learn
- A new toolset that has been carefully crafted to meet your data analysis challenges
- Full and detailed case studies of the toolset across several of today's key industry contexts
- Become super productive with a new toolset across Python and Jupyter Notebook
- Look into the future of data science and which directions to develop your skills next
Who this book is for
This book is for developers wanting to bridge the gap between them and data scientists. Introducing PixieDust from its creator, the book is a great desk companion for the accomplished Data Scientist. Some fluency in data interpretation and visualization is assumed. It will be helpful to have some knowledge of Python, using Python libraries, and some proficiency in web development.
Data Analysis with R, Second Edition: A Comprehensive Guide To Manipulating, Analyzing, And Visualizing Data In R, 2nd Edition
by Tony Fischetti
R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples.
Data Analysis with R, Second Edition: A comprehensive guide to manipulating, analyzing, and visualizing data in R, 2nd Edition
by Anthony Fischetti
Learn, by example, the fundamentals of data analysis as well as several intermediate to advanced methods and techniques ranging from classification and regression to Bayesian methods and MCMC, which can be put to immediate use.
Key Features
- Analyze your data using R – the most powerful statistical programming language
- Learn how to implement applied statistics using practical use-cases
- Use popular R packages to work with unstructured and structured data
Book Description
Frequently the tool of choice for academics, R has spread deep into the private sector and can be found in the production pipelines at some of the most advanced and successful enterprises. The power and domain-specificity of R allow the user to express complex analytics easily, quickly, and succinctly. Starting with the basics of R and statistical reasoning, this book dives into advanced predictive analytics, showing how to apply those techniques to real-world data through real-world examples.
Packed with engaging problems and exercises, this book begins with a review of R and its syntax with packages like Rcpp, ggplot2, and dplyr. From there, get to grips with the fundamentals of applied statistics and build on this knowledge to perform sophisticated and powerful analytics.
Solve the difficulties relating to performing data analysis in practice and find solutions to working with messy data, large data, communicating results, and facilitating reproducibility.
This book is engineered to be an invaluable resource through many stages of anyone’s career as a data analyst.
What you will learn
- Gain a thorough understanding of statistical reasoning and sampling theory
- Employ hypothesis testing to draw inferences from your data
- Learn Bayesian methods for estimating parameters
- Train regression, classification, and time series models
- Handle missing data gracefully using multiple imputation
- Identify and manage problematic data points
- Learn how to scale your analyses to larger data with Rcpp, data.table, dplyr, and parallelization
- Put best practices into effect to make your job easier and facilitate reproducibility
Who this book is for
Budding data scientists and data analysts who are new to the concept of data analysis, or who want to build efficient analytical models in R, will find this book to be useful. No prior exposure to data analysis is needed, although a fundamental understanding of the R programming language is required to get the best out of this book.
Data Analysis with RStudio: An Easygoing Introduction
by Franz Kronthaler Silke Zöllner
The objective of this text is to introduce RStudio to practitioners and students and enable them to use R in their everyday work. It is not a statistics textbook; its purpose is to convey the joy of analyzing data with RStudio. Practitioners and students learn how RStudio can be installed and used, and how to import data, write scripts, and save working results. Furthermore, they learn to employ descriptive statistics and create graphics with RStudio. Additionally, it is shown how RStudio can be used to test hypotheses and run analyses of variance and regressions. To reinforce the material, tasks are included, with solutions provided at the end of the textbook. This textbook has been recommended and developed for university courses in Germany, Austria and Switzerland.
Data Analysis with STATA
by Prasad Kothari
This book is for professionals and students who want to learn STATA programming and apply predictive modelling concepts. It is also helpful for experienced STATA programmers, as it covers advanced statistical modelling concepts and their application.