Data Science mit Python für Dummies (Für Dummies)
by John Paul Mueller, Luca Massaron
You want to get serious about scientific data analysis and know that it is hard to get around Python? Then this is the right book for you. John Paul Mueller explains what you need to master in Python in order to do data analysis, including objects, functions, modules, and libraries. He also covers the most important libraries for data analysis, such as NumPy, SciPy, BeautifulSoup, pandas, and Matplotlib. This way you learn to use Python properly for data analysis.
Data Science und Statistik mit R: Anwendungslösungen für die Praxis
by Bernd Heesen
Data science contributes substantially to making market, customer, and user data usable more quickly, including the analysis of data from social networks. Where classical statistics was once used for calculations and predictions, open-source tools such as R now make it possible to read in data in a wide variety of formats and from any number of sources, prepare it, and analyze it using methods from artificial intelligence and machine learning. The results can then be presented in polished visual form so that decision-makers can benefit from them quickly and effectively. From this it is possible to derive which measures are suited, with a predictable probability, to achieving one's own goals, e.g. which price for an offer generates the desired demand or which marketing measure reaches a desired target group. Using R, this book teaches how you can use statistics, data science, artificial intelligence, and machine learning in Industry 4.0. Readers can carry out the application examples themselves, since the book includes the R statements. This makes the book ideal for students and other interested readers who want to acquire knowledge of the statistics tool R.
Data Science with Julia
by Peter Tait, Paul McNicholas
"This book is a great way to both start learning data science through the promising Julia language and to become an efficient data scientist." - Professor Charles Bouveyron, INRIA Chair in Data Science, Université Côte d’Azur, Nice, France
Julia, an open-source programming language, was created to be as easy to use as languages such as R and Python while also being as fast as C and Fortran. An accessible, intuitive, and highly efficient base language, with speed that exceeds R and Python, makes Julia a formidable language for data science. Using well-known data science methods that will motivate the reader, Data Science with Julia will get readers up to speed on key features of the Julia language and illustrate its facilities for data science and machine learning work. Features: Covers the core components of Julia as well as packages relevant to the input, manipulation and representation of data. Discusses several important topics in data science including supervised and unsupervised learning. Reviews data visualization using the Gadfly package, which was designed to emulate the very popular ggplot2 package in R. Readers will learn how to make many common plots and how to visualize model results. Presents how to optimize Julia code for performance. Will be an ideal source for people who already know R and want to learn how to use Julia (though no previous knowledge of R or any other programming language is required). The advantages of Julia for data science cannot be overstated. Besides speed and ease of use, there are already over 1,900 packages available and Julia can interface (either directly or through packages) with libraries written in R, Python, Matlab, C, C++ or Fortran. The book is for senior undergraduates, beginning graduate students, or practicing data scientists who want to learn how to use Julia for data science.
Data Science with R for Psychologists and Healthcare Professionals
by Christian Ryan
This introduction to R for students of psychology and health sciences aims to fast-track the reader through some of the most difficult aspects of learning to do data analysis and statistics. It demonstrates the benefits for reproducibility and reliability of using a programming language over commercial software packages such as SPSS. The early chapters build at a gentle pace, to give the reader confidence in moving from a point-and-click software environment, to the more robust and reliable world of statistical coding. This is a thoroughly modern and up-to-date approach using RStudio and the tidyverse. A range of R packages relevant to psychological research are discussed in detail. A great deal of research in the health sciences concerns questionnaire data, which may require recoding, aggregation and transformation before quantitative techniques and statistical analysis can be applied. R offers many useful and transparent functions to process data and check psychometric properties. These are illustrated in detail, along with a wide range of tools R affords for data visualisation. Many introductory statistics books for the health sciences rely on toy examples - in contrast, this book benefits from utilising open datasets from published psychological studies, to both motivate and demonstrate the transition from data manipulation and analysis to published report. R Markdown is becoming the preferred method for communicating in the open science community. This book also covers the detail of how to integrate the use of R Markdown documents into the research workflow and how to use these in preparing manuscripts for publication, adhering to the latest APA style guidelines.
Data Science – was ist das eigentlich?!: Algorithmen des maschinellen Lernens verständlich erklärt
by Annalyn Ng, Kenneth Soo
Would you finally like to know what is actually behind buzzwords such as "data science" and "machine learning", and what can be done with them? But would you prefer to do without too much mathematics? Then you have come to the right place: this book offers a compact look at the most important key concepts of data science and its algorithms, without burdening you with mathematical formulas and details. After a high-level introduction, the focus is on applications of machine learning for pattern recognition and for predicting outcomes: each chapter explains one algorithm and ties it to an easily understood, real-world example. The combination of intuitive explanations and numerous illustrations provides a basic understanding that does without mathematical notation. Finally, the limits and drawbacks of the algorithms considered are also laid out explicitly.
Data Science, AI, and Machine Learning in Drug Development (Chapman & Hall/CRC Biostatistics Series)
by Harry Yang
The confluence of big data, artificial intelligence (AI), and machine learning (ML) has led to a paradigm shift in how innovative medicines are developed and healthcare delivered. To fully capitalize on these technological advances, it is essential to systematically harness data from diverse sources and leverage digital technologies and advanced analytics to enable data-driven decisions. Data science stands at a unique moment of opportunity to lead such a transformative change. Intended to be a single source of information, Data Science, AI, and Machine Learning in Drug Research and Development covers a wide range of topics on the changing landscape of drug R & D, emerging applications of big data, AI and ML in drug development, and the build of robust data science organizations to drive biopharmaceutical digital transformations. Features: Provides a comprehensive review of challenges and opportunities as related to the applications of big data, AI, and ML in the entire spectrum of drug R & D. Discusses regulatory developments in leveraging big data and advanced analytics in drug review and approval. Offers a balanced approach to data science organization build. Presents real-world examples of AI-powered solutions to a host of issues in the lifecycle of drug development. Affords sufficient context for each problem and provides a detailed description of solutions suitable for practitioners with limited data science expertise.
Data Science, Learning by Latent Structures, and Knowledge Discovery (Studies in Classification, Data Analysis, and Knowledge Organization)
by Berthold Lausen, Sabine Krolak-Schwerdt, Matthias Böhmer
This volume comprises papers dedicated to data science and the extraction of knowledge from many types of data: structural, quantitative, or statistical approaches for the analysis of data; advances in classification, clustering and pattern recognition methods; strategies for modeling complex data and mining large data sets; applications of advanced methods in specific domains of practice. The contributions offer interesting applications to various disciplines such as psychology, biology, medical and health sciences; economics, marketing, banking and finance; engineering; geography and geology; archeology, sociology, educational sciences, linguistics and musicology; library science. The book contains the selected and peer-reviewed papers presented during the European Conference on Data Analysis (ECDA 2013) which was jointly held by the German Classification Society (GfKl) and the French-speaking Classification Society (SFC) in July 2013 at the University of Luxembourg.
Data Science: 10th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2024, Macao, China, September 27–30, 2024, Proceedings, Part I (Communications in Computer and Information Science #2213)
by Qilong Han, Xianhua Song, Zeguang Lu, Chen Yu, Jianping Wang, Chengzhong Xu, Haiwei Pan
This three-volume set CCIS 2213-2215 constitutes the refereed proceedings of the 10th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2024, held in Macau, China, during September 27–30, 2024. The 74 full papers and 3 short papers presented in these three volumes were carefully reviewed and selected from 249 submissions. The papers are organized in the following topical sections: Part I: Novel methods or tools used in big data and its applications; applications of data science. Part II: Education research, methods and materials for data science and engine; data security and privacy; big data mining and knowledge management. Part III: Infrastructure for data science; social media and recommendation system; multimedia data management and analysis.
Data Science: 10th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2024, Macao, China, September 27–30, 2024, Proceedings, Part II (Communications in Computer and Information Science #2214)
by Qilong Han, Xianhua Song, Zeguang Lu, Chen Yu, Jianping Wang, Chengzhong Xu, Haiwei Pan
This three-volume set CCIS 2213-2215 constitutes the refereed proceedings of the 10th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2024, held in Macau, China, during September 27–30, 2024. The 74 full papers and 3 short papers presented in these three volumes were carefully reviewed and selected from 249 submissions. The papers are organized in the following topical sections: Part I: Novel methods or tools used in big data and its applications; applications of data science. Part II: Education research, methods and materials for data science and engine; data security and privacy; big data mining and knowledge management. Part III: Infrastructure for data science; social media and recommendation system; multimedia data management and analysis.
Data Science: 10th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2024, Macao, China, September 27–30, 2024, Proceedings, Part III (Communications in Computer and Information Science #2215)
by Qilong Han, Xianhua Song, Zeguang Lu, Chen Yu, Jianping Wang, Chengzhong Xu, Haiwei Pan
This three-volume set CCIS 2213-2215 constitutes the refereed proceedings of the 10th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2024, held in Macau, China, during September 27–30, 2024. The 74 full papers and 3 short papers presented in these three volumes were carefully reviewed and selected from 249 submissions. The papers are organized in the following topical sections: Part I: Novel methods or tools used in big data and its applications; applications of data science. Part II: Education research, methods and materials for data science and engine; data security and privacy; big data mining and knowledge management. Part III: Infrastructure for data science; social media and recommendation system; multimedia data management and analysis.
Data Science: 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025, Sydney, NSW, Australia, June 10-13, 2025, Proceedings, Part VI (Lecture Notes in Computer Science #15875)
by Longbing Cao, Myra Spiliopoulou, Vipin Kumar, Can Wang, Joao Gama, Xintao Wu, Xiangmin Zhou, Guansong Pang
The two-volume set LNAI 15875 + 15876 constitutes the proceedings of the 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025 Special Session, held in Sydney, NSW, Australia, during June 10–13, 2025. The 68 full papers included in this set were carefully reviewed and selected from 696 submissions. They were organized in topical sections as follows: survey track; machine learning; trustworthiness; learning on complex data; graph mining; machine learning applications; representation learning; scientific/business data analysis; and special track on large language models.
Data Science: 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025, Sydney, NSW, Australia, June 10-13, 2025, Proceedings, Part VII (Lecture Notes in Computer Science #15876)
by Longbing Cao, Myra Spiliopoulou, Vipin Kumar, Can Wang, Joao Gama, Xintao Wu, Xiangmin Zhou, Guansong Pang
The two-volume set LNAI 15875 + 15876 constitutes the proceedings of the 29th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2025 Special Session, held in Sydney, NSW, Australia, during June 10–13, 2025. The 68 full papers included in this set were carefully reviewed and selected from 696 submissions. They were organized in topical sections as follows: survey track; machine learning; trustworthiness; learning on complex data; graph mining; machine learning applications; representation learning; scientific/business data analysis; and special track on large language models.
Data Science: A First Introduction (Chapman & Hall/CRC Data Science Series)
by Tiffany Timbers, Trevor Campbell, Melissa Lee
Data Science: A First Introduction focuses on using the R programming language in Jupyter notebooks to perform data manipulation and cleaning, create effective visualizations, and extract insights from data using classification, regression, clustering, and inference. The text emphasizes workflows that are clear, reproducible, and shareable, and includes coverage of the basics of version control. All source code is available online, demonstrating the use of good reproducible project workflows. Based on educational research and active learning principles, the book uses a modern approach to R and includes accompanying autograded Jupyter worksheets for interactive, self-directed learning. The book will leave readers well-prepared for data science projects. The book is designed for learners from all disciplines with minimal prior knowledge of mathematics and programming. The authors have honed the material through years of experience teaching thousands of undergraduates in the University of British Columbia’s DSCI100: Introduction to Data Science course.
Data Science: A First Introduction with Python (Chapman & Hall/CRC Data Science Series)
by Joel Ostblom, Tiffany Timbers, Trevor Campbell, Melissa Lee, Lindsey Heagy
Data Science: A First Introduction with Python focuses on using the Python programming language in Jupyter notebooks to perform data manipulation and cleaning, create effective visualizations, and extract insights from data using classification, regression, clustering, and inference. It emphasizes workflows that are clear, reproducible, and shareable, and includes coverage of the basics of version control. Based on educational research and active learning principles, the book uses a modern approach to Python and includes accompanying autograded Jupyter worksheets for interactive, self-directed learning. The text will leave readers well-prepared for data science projects. It is designed for learners from all disciplines with minimal prior knowledge of mathematics and programming. The authors have honed the material through years of experience teaching thousands of undergraduates at the University of British Columbia. Key Features: Includes autograded worksheets for interactive, self-directed learning. Introduces readers to modern data analysis and workflow tools such as Jupyter notebooks and GitHub, and covers cutting-edge data analysis and manipulation Python libraries such as pandas, scikit-learn, and altair. Is designed for a broad audience of learners from all backgrounds and disciplines.
Data Science: Best Practices mit Python
by Benjamin M. Abdel-Karim
This book grew out of the motivation to develop one of the first German-language reference works containing relatively simple source code examples, in order to pass on approaches to the (recurring) programming problems in data analysis. This work was not written without self-interest: it contains solutions to recurring problems that I have developed in my daily work. Without doubt, looking up solutions in books or on the Internet is part of a programmer's normal work. However, this search is usually an unstructured and therefore, at least in part, time-consuming process. Regardless of whether you read the book as a student, employee, or founder, I hope that this reference work will be a valuable aid for your first steps. I assume that anyone can learn the fundamentals of data analysis with the help of modern programming languages.
Data Science: Innovative Developments in Data Analysis and Clustering (Studies in Classification, Data Analysis, and Knowledge Organization)
by Francesco Palumbo, Angela Montanari, Maurizio Vichi
The International Federation of Classification Societies (IFCS) is an agency for the dissemination of technical and scientific information concerning classification and multivariate data analysis in the broad sense and in as wide a range of applications as possible. It was founded in 1985 in Cambridge (UK) by the following scientific societies and groups: British Classification Society (BCS); Classification Society of North America (CSNA); Gesellschaft für Klassifikation (GfKl); Japanese Classification Society (JCS); Classification Group of Italian Statistical Society (CGSIS); Société Francophone de Classification (SFC). The IFCS now also includes the following societies: Dutch-Belgian Classification Society (VOC); Polish Classification Section (SKAD); Portuguese Classification Association (CLAD); Group at Large; Korean Classification Society (KCS). IFCS-98, the Sixth Conference of the International Federation of Classification Societies, was held in Rome, from July 21 to 24, 1998. Five preceding conferences were held in Aachen (Germany), Charlottesville (USA), Edinburgh (UK), Paris (France), Kobe (Japan).
Data Science: Konzepte, Erfahrungen, Fallstudien und Praxis
by Andreas Gadatsch, Andreas Schmidt, Christoph Quix, Uwe Schmitz, Detlev Frick, Jens Kaufmann, Birgit Lankes
Data science has arrived in many organizations and is often everyday practice. Nevertheless, many decision-makers face the challenge of dealing with concrete questions for the first time or of developing ongoing projects further. The range of methods, tools, and possible applications is very wide and continues to evolve. Most of the many publications on data science are specialized and focus on individual aspects. This work gives readers a comprehensive orientation on the status quo from a scientific perspective, along with numerous in-depth treatments of practice-relevant aspects. The content builds on the scientific CAS certificate courses on Big Data and Data Science at Hochschule Niederrhein, in cooperation with Hochschule Bonn-Rhein-Sieg and FH Dortmund. It takes into account scientific foundations and advanced topics as well as concrete experience from data science projects. The book addresses practice-relevant questions at a scientific level from the perspective of the roles of "Data Strategist", "Data Architect", and "Data Analyst", and incorporates proven practical experience from, among others, seminar participants. For interested readers, the book provides insight into the currently relevant variety of aspects of data science and big data and offers pointers for practical implementation.
Data Science: Techniques for Excelling at Data Science
by Daniel Vaughan
This practical guide provides a collection of techniques and best practices that are generally overlooked in most data engineering and data science pedagogy. A common misconception is that great data scientists are experts in the "big themes" of the discipline: machine learning and programming. But most of the time, these tools can only take us so far. In practice, the smaller tools and skills really separate a great data scientist from a not-so-great one. Taken as a whole, the lessons in this book make the difference between an average data scientist candidate and a qualified data scientist working in the field. Author Daniel Vaughan has collected, extended, and used these skills to create value and train data scientists from different companies and industries. With this book, you will: understand how data science creates value; deliver compelling narratives to sell your data science project; build a business case using unit economics principles; create new features for a ML model using storytelling; learn how to decompose KPIs; perform growth decompositions to find root causes for changes in a metric. Daniel Vaughan is head of data at Clip, the leading paytech company in Mexico. He's the author of Analytical Skills for AI and Data Science (O'Reilly).
Data Science: The Executive Summary - A Technical Book for Non-Technical Professionals
by Field Cady
Tap into the power of data science with this comprehensive resource for non-technical professionals. Data Science: The Executive Summary - A Technical Book for Non-Technical Professionals is a comprehensive resource for people in non-engineer roles who want to fully understand data science and analytics concepts. Accomplished data scientist and author Field Cady describes both the “business side” of data science, including what problems it solves and how it fits into an organization, and the technical side, including analytical techniques and key technologies. Data Science: The Executive Summary covers topics like: assessing whether your organization needs data scientists, and what to look for when hiring them; when Big Data is the best approach to use for a project, and when it actually ties analysts’ hands; cutting-edge Artificial Intelligence, as well as classical approaches that work better for many problems; how many techniques rely on dubious mathematical idealizations, and when you can work around them. Perfect for executives who make critical decisions based on data science and analytics, as well as managers who hire and assess the work of data scientists, Data Science: The Executive Summary also belongs on the bookshelves of salespeople and marketers who need to explain what a data analytics product does. Finally, data scientists themselves will improve their technical work with insights into the goals and constraints of the business situation.
Data Science: Time Complexity, Inferential Uncertainty, And Spacekime Analytics (De Gruyter Stem Series)
by Ivo D. Dinov, Milen Velchev Velev
The amount of new information is constantly increasing, faster than our ability to fully interpret and utilize it to improve human experiences. Addressing this asymmetry requires novel and revolutionary scientific methods and effective human and artificial intelligence interfaces. By lifting the concept of time from a positive real number to a 2D complex time (kime), this book uncovers a connection between artificial intelligence (AI), data science, and quantum mechanics. It proposes a new mathematical foundation for data science based on raising the 4D spacetime to a higher dimension where longitudinal data (e.g., time-series) are represented as manifolds (e.g., kime-surfaces). This new framework enables the development of innovative data science analytical methods for model-based and model-free scientific inference, derived computed phenotyping, and statistical forecasting. The book provides a transdisciplinary bridge and a pragmatic mechanism to translate quantum mechanical principles, such as particles and wavefunctions, into data science concepts, such as datum and inference-functions. It includes many open mathematical problems that still need to be solved, technological challenges that need to be tackled, and computational statistics algorithms that have to be fully developed and validated. Spacekime analytics provide mechanisms to effectively handle, process, and interpret large, heterogeneous, and continuously-tracked digital information from multiple sources. The authors propose computational methods, probability model-based techniques, and analytical strategies to estimate, approximate, or simulate the complex time phases (kime directions). This allows transforming time-varying data, such as time-series observations, into higher-dimensional manifolds representing complex-valued and kime-indexed surfaces (kime-surfaces).
The book includes many illustrations of model-based and model-free spacekime analytic techniques applied to economic forecasting, identification of functional brain activation, and high-dimensional cohort phenotyping. Specific case-study examples include unsupervised clustering using the Michigan Consumer Sentiment Index (MCSI), model-based inference using functional magnetic resonance imaging (fMRI) data, and model-free inference using the UK Biobank data archive. The material includes mathematical, inferential, computational, and philosophical topics such as the Heisenberg uncertainty principle and alternative approaches to large sample theory, where a few spacetime observations can be amplified by a series of derived, estimated, or simulated kime-phases. The authors extend Newton-Leibniz calculus of integration and differentiation to the spacekime manifold and discuss possible solutions to some of the "problems of time". The coverage also includes 5D spacekime formulations of classical 4D spacetime mathematical equations describing natural laws of physics, as well as a statistical articulation of spacekime analytics in a Bayesian inference framework. The steady increase of the volume and complexity of observed and recorded digital information drives the urgent need to develop novel data analytical strategies. Spacekime analytics represents one new data-analytic approach, which provides a mechanism to understand compound phenomena that are observed as multiplex longitudinal processes and computationally tracked by proxy measures. This book may be of interest to academic scholars, graduate students, postdoctoral fellows, artificial intelligence and machine learning engineers, biostatisticians, econometricians, and data analysts. Some of the material may also resonate with philosophers, futurists, astrophysicists, space industry technicians, biomedical researchers, health practitioners, and the general public.
Data Scientists at Work
by Sebastian Gutierrez
Data Scientists at Work is a collection of interviews with sixteen of the world's most influential and innovative data scientists from across the spectrum of this hot new profession. 'Data scientist is the sexiest job in the 21st century,' according to the Harvard Business Review. By 2018, the United States will experience a shortage of 190,000 skilled data scientists, according to a McKinsey report. Through incisive in-depth interviews, this book mines the what, how, and why of the practice of data science from the stories, ideas, shop talk, and forecasts of its preeminent practitioners across diverse industries: social network (Yann LeCun, Facebook); professional network (Daniel Tunkelang, LinkedIn); venture capital (Roger Ehrenberg, IA Ventures); enterprise cloud computing and neuroscience (Eric Jonas, formerly Salesforce.com); newspaper and media (Chris Wiggins, The New York Times); streaming television (Caitlin Smallwood, Netflix); music forecast (Victor Hu, Next Big Sound); strategic intelligence (Amy Heineike, Quid); environmental big data (André Karpištšenko, Planet OS); geospatial marketing intelligence (Jonathan Lenaghan, PlaceIQ); advertising (Claudia Perlich, Dstillery); fashion e-commerce (Anna Smith, Rent the Runway); specialty retail (Erin Shellman, Nordstrom); email marketing (John Foreman, MailChimp); predictive sales intelligence (Kira Radinsky, SalesPredict); and humanitarian nonprofit (Jake Porway, DataKind). The book features a stimulating foreword by Google's Director of Research, Peter Norvig. Each of these data scientists shares how he or she tailors the torrent-taming techniques of big data, data visualization, search, and statistics to specific jobs by dint of ingenuity, imagination, patience, and passion.
Data Scientists at Work parts the curtain on the interviewees' earliest data projects, how they became data scientists, their discoveries and surprises in working with data, their thoughts on the past, present, and future of the profession, their experiences of team collaboration within their organizations, and the insights they have gained as they get their hands dirty refining mountains of raw data into objects of commercial, scientific, and educational value for their organizations and clients.
Data Spaces: Design, Deployment and Future Directions
by Edward Curry, Simon Scerri, Tuomo Tuikka
This open access book aims to educate data space designers to understand what is required to create a successful data space. It explores cutting-edge theory, technologies, methodologies, and best practices for data spaces for both industrial and personal data and provides the reader with a basis for understanding the design, deployment, and future directions of data spaces. The book captures the early lessons and experience in creating data spaces. It arranges these contributions into three parts covering design, deployment, and future directions respectively. The first part explores the design space of data spaces. The individual chapters detail the organisational design for data spaces, data platforms, data governance, federated learning, personal data sharing, data marketplaces, and hybrid artificial intelligence for data spaces. The second part describes the use of data spaces within real-world deployments. Its chapters are co-authored with industry experts and include case studies of data spaces in sectors including industry 4.0, food safety, FinTech, health care, and energy. The third and final part details future directions for data spaces, including challenges and opportunities for common European data spaces and privacy-preserving techniques for trustworthy data sharing. The book is of interest to two primary audiences: first, researchers interested in data management and data sharing, and second, practitioners and industry experts engaged in data-driven systems where the sharing and exchange of data within an ecosystem are critical.
Data Stewardship for Open Science: Implementing FAIR Principles
by Barend Mons
Data Stewardship for Open Science: Implementing FAIR Principles has been written with the intention of making scientists, funders, and innovators in all disciplines and stages of their professional activities broadly aware of the need, complexity, and challenges associated with open science, modern science communication, and data stewardship. The FAIR principles are used as a guide throughout the text, and this book should leave experimentalists consciously incompetent about data stewardship and motivated to respect data stewards as representatives of a new profession, while possibly motivating others to consider a career in the field. The ebook, available at no additional cost when you buy the paperback, will be updated every 6 months on average (provided that significant updates are needed or available). Readers will have the opportunity to contribute material towards these updates, and to develop their own data management plans, via the free Data Stewardship Wizard.
Data Storage for Social Networks: A Socially Aware Approach (SpringerBriefs in Optimization)
by Duc A. Tran
Evidenced by the success of Facebook, Twitter, and LinkedIn, online social networks (OSNs) have become ubiquitous, offering novel ways for people to access information and communicate with each other. As the increasing popularity of social networking is undeniable, scalability is an important issue for any OSN that wants to serve a large number of users. Storing user data for the entire network on a single server can quickly lead to a bottleneck, and, consequently, more servers are needed to expand storage capacity and lower data request traffic per server. Adding more servers is just one step to address scalability. The next step is to determine how best to store the data across multiple servers. This problem has been widely studied in the literature of distributed and database systems. OSNs, however, represent a different class of data systems. When a user spends time on a social network, the data mostly requested is her own and that of her friends; e.g., in Facebook or Twitter, these data are the status updates posted by herself as well as those posted by her friends. This so-called social locality should be taken into account when determining the server locations to store these data, so that when a user issues a read request, all the relevant data can be returned quickly and efficiently. Social locality is not a design factor in traditional storage systems where data requests are always processed independently. Even for today's OSNs, social locality is not yet considered in their data partition schemes. These schemes rely on distributed hash tables (DHT), using consistent hashing to assign the users' data to the servers. The random nature of DHT leads to weak social locality, which has been shown to result in poor performance under heavy request loads.
Data Storage for Social Networks: A Socially Aware Approach is aimed at reviewing the current literature of data storage for online social networks and discussing new methods that take into account social awareness in designing efficient data storage.
Data Structure Practice: for Collegiate Programming Contests and Education
by Yonghui Wu, Jiande Wang
Combining knowledge with strategies, Data Structure Practice for Collegiate Programming Contests and Education presents the first comprehensive book on data structure in programming contests. This book is designed for training collegiate programming contest teams in the nuances of data structure and for helping college students in computer-related