Data Science and Analytics: 4th International Conference On Recent Developments In Science, Engineering And Technology, Redset 2017, Gurgaon, India, October 13-14, 2017, Revised Selected Papers (Communications In Computer And Information Science #799)
by Brajendra Panda, Sudeep Sharma, Nihar Ranjan Roy
This book constitutes the refereed proceedings of the 4th International Conference on Recent Developments in Science, Engineering and Technology, REDSET 2017, held in Gurgaon, India, in October 2017. The 66 revised full papers presented were carefully reviewed and selected from 329 submissions. The papers are organized in topical sections on big data analysis, data centric programming, next generation computing, social and web analytics, and security in data science analytics.
Data Science and Predictive Analytics: Biomedical And Health Applications Using R
by Ivo D. Dinov
Over the past decade, Big Data have become ubiquitous in all economic sectors, scientific disciplines, and human activities. They have led to striking technological advances, affecting all human experiences. Our ability to manage, understand, interrogate, and interpret such extremely large, multisource, heterogeneous, incomplete, multiscale, and incongruent data has not kept pace with the rapid increase of the volume, complexity and proliferation of the deluge of digital information. There are three reasons for this shortfall. First, the volume of data is increasing much faster than the corresponding rise of our computational processing power (Kryder’s law > Moore’s law). Second, traditional disciplinary boundaries inhibit expeditious progress. Third, our education and training activities have fallen behind the accelerated trend of scientific, information, and communication advances. There are very few rigorous instructional resources, interactive learning materials, and dynamic training environments that support active data science learning. The textbook balances the mathematical foundations with dexterous demonstrations and examples of data, tools, modules and workflows that serve as pillars for the urgently needed bridge to close the supply-and-demand gap in predictive analytic skills. Exposing the enormous opportunities presented by the tsunami of Big Data, this textbook aims to identify specific knowledge gaps, educational barriers, and workforce readiness deficiencies. Specifically, it focuses on the development of a transdisciplinary curriculum integrating modern computational methods, advanced data science techniques, innovative biomedical applications, and impactful health analytics. The content of this graduate-level textbook fills a substantial gap in integrating modern engineering concepts, computational algorithms, mathematical optimization, statistical computing and biomedical inference.
Big data analytic techniques and predictive scientific methods demand broad transdisciplinary knowledge, appeal to an extremely wide spectrum of readers/learners, and provide incredible opportunities for engagement throughout the academy, industry, regulatory and funding agencies. The two examples below demonstrate the powerful need for scientific knowledge, computational abilities, interdisciplinary expertise, and modern technologies necessary to achieve desired outcomes (improving human health and optimizing future return on investment). This can only be achieved by appropriately trained teams of researchers who can develop robust decision support systems using modern techniques and effective end-to-end protocols, like the ones described in this textbook.
• A geriatric neurologist is examining a patient complaining of gait imbalance and posture instability. To determine if the patient may suffer from Parkinson’s disease, the physician acquires clinical, cognitive, phenotypic, imaging, and genetics data (Big Data). Most clinics and healthcare centers are not equipped with skilled data analytic teams that can wrangle, harmonize and interpret such complex datasets. A learner who completes a course of study using this textbook will have the competency and ability to manage the data, generate a protocol for deriving biomarkers, and provide an actionable decision support system. The results of this protocol will help the physician understand the entire patient dataset and assist in making a holistic evidence-based, data-driven, clinical diagnosis.
• To improve the return on investment for their shareholders, a healthcare manufacturer needs to forecast the demand for their product subject to environmental, demographic, economic, and bio-social sentiment data (Big Data). The organization’s data-analytics team is tasked with developing a protocol that identifies, aggregates, harmonizes, models and analyzes these heterogeneous data elements to generate a trend forecast.
This system needs to provide an automated, adaptive, scalable, and reliable prediction of the optimal investment, e.g., R&D allocation, that maximizes the company’s bot
Data Science at Target
by Srikant M. Datar, Caitlin N. Bowler
Paritosh Desai joined Target.com in 2013 as VP of Business Intelligence, Analytics & Testing to explore how the retailer could use its relatively small but thriving e-commerce arm to drive sales and win customers. The case explores the technological and organizational challenges Desai faced and the trade-offs he considered in his four-year journey to develop the larger retail business into a data science organization.
Professor Srikant M. Datar and Research Associate Caitlin N. Bowler prepared this case. It was reviewed and approved before publication by a company designate. Funding for the development of this case was provided by Harvard Business School and not by the company. The citation review for this case has not yet been completed. HBS cases are developed solely as the basis for class discussion. Cases are not intended to serve as endorsements, sources of primary data, or illustrations of effective or ineffective management.
Data Science for Transport: A Self-study Guide With Computer Exercises (Springer Textbooks In Earth Sciences, Geography And Environment Ser.)
by Charles Fox
The quantity, diversity and availability of transport data are increasing rapidly, requiring new skills in the management and interrogation of data and databases. Recent years have seen a new wave of 'big data', 'Data Science', and 'smart cities' changing the world, with the Harvard Business Review describing Data Science as the "sexiest job of the 21st century". Transportation professionals and researchers need to be able to use data and databases in order to establish quantitative, empirical facts, and to validate and challenge their mathematical models, whose axioms have traditionally often been assumed rather than rigorously tested against data. This book takes a highly practical approach to learning about Data Science tools and their application to investigating transport issues. The focus is principally on practical, professional work with real data and tools, including business and ethical issues.
"Transport modeling practice was developed in a data poor world, and many of our current techniques and skills are building on that sparsity. In a new data rich world, the required tools are different and the ethical questions around data and privacy are definitely different. I am not sure whether current professionals have these skills; and I am certainly not convinced that our current transport modeling tools will survive in a data rich environment. This is an exciting time to be a data scientist in the transport field. We are trying to get to grips with the opportunities that big data sources offer; but at the same time such data skills need to be fused with an understanding of transport, and of transport modeling. Those with these combined skills can be instrumental at providing better, faster, cheaper data for transport decision-making; and ultimately contribute to innovative, efficient, data driven modeling techniques of the future. It is not surprising that this course, this book, has been authored by the Institute for Transport Studies.
To do this well, you need a blend of academic rigor and practical pragmatism. There are few educational or research establishments better equipped to do that than ITS Leeds". - Tom van Vuren, Divisional Director, Mott MacDonald
"WSP is proud to be a thought leader in the world of transport modelling, planning and economics, and has a wide range of opportunities for people with skills in these areas. The evidence base and forecasts we deliver to effectively implement strategies and schemes are ever more data and technology focused, a trend we have helped shape since the 1970s, but with particular disruption and opportunity in recent years. As a result of these trends, and to suitably skill the next generation of transport modellers, we asked the world-leading Institute for Transport Studies to boost skills in these areas, and they have responded with a new MSc programme which you too can now study via this book." - Leighton Cardwell, Technical Director, WSP.
"From processing and analysing large datasets, to automation of modelling tasks sometimes requiring different software packages to "talk" to each other, to data visualization, SYSTRA employs a range of techniques and tools to provide our clients with deeper insights and effective solutions. This book does an excellent job in giving you the skills to manage, interrogate and analyse databases, and develop powerful presentations. Another important publication from ITS Leeds." - Fitsum Teklu, Associate Director (Modelling & Appraisal), SYSTRA Ltd
"Urban planning has relied for decades on statistical and computational practices that have little to do with mainstream data science. Information is still often used as evidence on the impact of new infrastructure even when it hardly contains any valid evidence. This book is an extremely welcome effort to provide young professionals with the skills needed to analyse how cities and transport networks actually work.
The book is also highly relevant to anyone who will later want to build digital solutions to optimise urban travel based on emerging data sources". - Yaron Hollander, author of "T
Data Science on the Google Cloud Platform: Implementing End-to-End Real-Time Data Pipelines: From Ingest to Machine Learning
by Valliappa Lakshmanan
Learn how easy it is to apply sophisticated statistical and machine learning methods to real-world problems when you build on top of the Google Cloud Platform (GCP). This hands-on guide shows developers entering the data science field how to implement an end-to-end data pipeline, using statistical and machine learning methods and tools on GCP. Through the course of the book, you’ll work through a sample business decision by employing a variety of data science approaches.
Follow along by implementing these statistical and machine learning solutions in your own project on GCP, and discover how this platform provides a transformative and more collaborative way of doing data science.
You’ll learn how to:
• Automate and schedule data ingest, using an App Engine application
• Create and populate a dashboard in Google Data Studio
• Build a real-time analysis pipeline to carry out streaming analytics
• Conduct interactive data exploration with Google BigQuery
• Create a Bayesian model on a Cloud Dataproc cluster
• Build a logistic regression machine-learning model with Spark
• Compute time-aggregate features with a Cloud Dataflow pipeline
• Create a high-performing prediction model with TensorFlow
• Use your deployed model as a microservice you can access from both batch and real-time pipelines
Data Science with SQL Server Quick Start Guide: Integrate SQL Server with data science
by Dejan Sarka
Get unique insights from your data by combining the power of SQL Server, R and Python
Key Features
• Use the features of SQL Server 2017 to implement the data science project life cycle
• Leverage the power of R and Python to design and develop efficient data models
• Find unique insights from your data with powerful techniques for data preprocessing and analysis
Book Description
SQL Server only started to fully support data science with its two most recent editions. If you are a professional from both worlds, SQL Server and data science, and interested in using SQL Server and Machine Learning (ML) Services for your projects, then this is the ideal book for you.
This book is the ideal introduction to data science with Microsoft SQL Server and In-Database ML Services. It covers all stages of a data science project, from business and data understanding, through data overview, data preparation, modeling and using algorithms, model evaluation, and deployment.
You will learn to use the engines and languages that come with SQL Server, including ML Services with R and Python languages and Transact-SQL. You will also learn how to choose which algorithm to use for which task, and learn the working of each algorithm.
What you will learn
• Use the popular programming languages, T-SQL, R, and Python, for data science
• Understand your data with queries and introductory statistics
• Create and enhance the datasets for ML
• Visualize and analyze data using basic and advanced graphs
• Explore ML using unsupervised and supervised models
• Deploy models in SQL Server and perform predictions
Who this book is for
SQL Server professionals who want to start with data science, and data scientists who would like to start using SQL Server in their projects will find this book to be useful. Prior exposure to SQL Server will be helpful.
Data Science – was ist das eigentlich?!: Algorithmen des maschinellen Lernens verständlich erklärt
by Annalyn Ng, Kenneth Soo
Do you finally want to know what is really behind buzzwords such as "data science" and "machine learning", and what you can do with them? And would you rather do without too much mathematics? Then you have come to the right place: this book offers a compact look at the most important key concepts of data science and its algorithms, without burdening you with mathematical formulas and details. After a general introduction, the focus is on applications of machine learning for pattern recognition and for predicting outcomes: each chapter explains one algorithm and ties it to an easy-to-understand, real-world example. The combination of intuitive explanations and numerous illustrations provides a basic understanding that does not rely on mathematical notation. Finally, the limits and drawbacks of the algorithms covered are explicitly pointed out.
Data Science: 4th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2018, Zhengzhou, China, September 21-23, 2018, Proceedings, Part I (Communications in Computer and Information Science #901)
by Yan Wang, Weipeng Jing, Xianhua Song, Zeguang Lu, Yong Gan, Qinglei Zhou
This two volume set (CCIS 901 and 902) constitutes the refereed proceedings of the 4th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2018 (originally ICYCSEE) held in Zhengzhou, China, in September 2018. The 125 revised full papers presented in these two volumes were carefully reviewed and selected from 1057 submissions. The papers cover a wide range of topics related to basic theory and techniques for data science including mathematical issues in data science, computational theory for data science, big data management and applications, data quality and data preparation, evaluation and measurement in data science, data visualization, big data mining and knowledge management, infrastructure for data science, machine learning for data science, data security and privacy, applications of data science, case study of data science, multimedia data management and analysis, data-driven scientific research, data-driven bioinformatics, data-driven healthcare, data-driven management, data-driven eGovernment, data-driven smart city/planet, data marketing and economics, social media and recommendation systems, data-driven security, data-driven business model innovation, social and/or organizational impacts of data science.
Data Science: 4th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2018, Zhengzhou, China, September 21-23, 2018, Proceedings, Part II (Communications in Computer and Information Science #902)
by Yan Wang, Hongzhi Wang, Zeguang Lu, Wei Xie, Qinglei Zhou, Qiguang Miao
This two volume set (CCIS 901 and 902) constitutes the refereed proceedings of the 4th International Conference of Pioneering Computer Scientists, Engineers and Educators, ICPCSEE 2018 (originally ICYCSEE) held in Zhengzhou, China, in September 2018. The 125 revised full papers presented in these two volumes were carefully reviewed and selected from 1057 submissions. The papers cover a wide range of topics related to basic theory and techniques for data science including mathematical issues in data science, computational theory for data science, big data management and applications, data quality and data preparation, evaluation and measurement in data science, data visualization, big data mining and knowledge management, infrastructure for data science, machine learning for data science, data security and privacy, applications of data science, case study of data science, multimedia data management and analysis, data-driven scientific research, data-driven bioinformatics, data-driven healthcare, data-driven management, data-driven eGovernment, data-driven smart city/planet, data marketing and economics, social media and recommendation systems, data-driven security, data-driven business model innovation, social and/or organizational impacts of data science.
Data Stewardship for Open Science: Implementing FAIR Principles
by Barend Mons
Data Stewardship for Open Science: Implementing FAIR Principles has been written with the intention of making scientists, funders, and innovators in all disciplines and stages of their professional activities broadly aware of the need, complexity, and challenges associated with open science, modern science communication, and data stewardship. The FAIR principles are used as a guide throughout the text, and this book should leave experimentalists consciously incompetent about data stewardship and motivated to respect data stewards as representatives of a new profession, while possibly motivating others to consider a career in the field. The ebook, available for no additional cost when you buy the paperback, will be updated every 6 months on average (provided that significant updates are needed or available). Readers will have the opportunity to contribute material towards these updates, and to develop their own data management plans, via the free Data Stewardship Wizard.
Data Warehouse Requirements Engineering: A Decision Based Approach
by Naveen Prakash, Deepika Prakash
As the first to focus on the issue of Data Warehouse Requirements Engineering, this book introduces a model-driven requirements process used to identify requirements granules and incrementally develop data warehouse fragments. In addition, it presents an approach to the pair-wise integration of requirements granules for consolidating multiple data warehouse fragments. The process is systematic and does away with the fuzziness associated with existing techniques. Thus, consolidation is treated as a requirements engineering issue. The notion of a decision occupies a central position in the decision-based approach. On one hand, information relevant to a decision must be elicited from stakeholders; modeled; and transformed into multi-dimensional form. On the other, decisions themselves are to be obtained from decision applications. For the former, the authors introduce a suite of information elicitation techniques specific to data warehousing. This information is subsequently converted into multi-dimensional form. For the latter, not only are decisions obtained from decision applications for managing operational businesses, but also from applications for formulating business policies and for defining rules for enforcing policies, respectively. In this context, the book presents a broad range of models, tools and techniques. For readers from academia, the book identifies the scientific/technological problems it addresses and provides cogent arguments for the proposed solutions; for readers from industry, it presents an approach for ensuring that the product meets its requirements while ensuring low lead times in delivery.
Data Wrangling with JavaScript
by Ashley Davis
Summary
Data Wrangling with JavaScript is a hands-on guide that will teach you how to create a JavaScript-based data processing pipeline, handle common and exotic data, and master practical troubleshooting strategies.
Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.
About the Technology
Why not handle your data analysis in JavaScript? Modern libraries and data handling techniques mean you can collect, clean, process, store, visualize, and present web application data while enjoying the efficiency of a single-language pipeline and data-centric web applications that stay in JavaScript end to end.
About the Book
Data Wrangling with JavaScript promotes JavaScript to the center of the data analysis stage! With this hands-on guide, you'll create a JavaScript-based data processing pipeline, handle common and exotic data, and master practical troubleshooting strategies. You'll also build interactive visualizations and deploy your apps to production. Each valuable chapter provides a new component for your reusable data wrangling toolkit.
What's inside
• Establishing a data pipeline
• Acquisition, storage, and retrieval
• Handling unusual data sets
• Cleaning and preparing raw data
• Interactive visualizations with D3
About the Reader
Written for intermediate JavaScript developers. No data analysis experience required.
About the Author
Ashley Davis is a software developer, entrepreneur, author, and the creator of Data-Forge and Data-Forge Notebook, software for data transformation, analysis, and visualization in JavaScript.
Table of Contents
• Getting started: establishing your data pipeline
• Getting started with Node.js
• Acquisition, storage, and retrieval
• Working with unusual data
• Exploratory coding
• Clean and prepare
• Dealing with huge data files
• Working with a mountain of data
• Practical data analysis
• Browser-based visualization
• Server-side visualization
• Live data
• Advanced visualization with D3
• Getting to production
Data and Applications Security and Privacy XXXII: 32nd Annual IFIP WG 11.3 Conference, DBSec 2018, Bergamo, Italy, July 16–18, 2018, Proceedings (Lecture Notes in Computer Science #10980)
by Florian Kerschbaum, Stefano Paraboschi
This book constitutes the refereed proceedings of the 32nd Annual IFIP WG 11.3 International Working Conference on Data and Applications Security and Privacy, DBSec 2018, held in Bergamo, Italy, in July 2018. The 16 full papers and 5 short papers presented were carefully reviewed and selected from 50 submissions. The papers present high-quality original research from academia, industry, and government on theoretical and practical aspects of information security. They are organized in topical sections on administration, access control policies, privacy-preserving access and computation, integrity and user interaction, security analysis and private evaluation, fixing vulnerabilities, and networked systems.
Data and Energy Integrated Communication Networks: A Brief Introduction (SpringerBriefs in Computer Science)
by Kun Yang, Jie Hu
The book discusses data and energy integrated communication networking technologies, including the latest research contributions in this promising area. It first provides an overview of data and energy integrated communication networks (DEINs) and introduces the key techniques for enabling integrated wireless energy transfer (WET) and wireless information transfer (WIT) in the radio frequency (RF) band. It then describes the ubiquitous architecture of DEINs, demonstrates the typical DEIN system, and investigates the core issues in both the physical layer and the medium-access-control (MAC) layer in order to coordinate both the WIT and WET in the same RF band. Lastly, the book addresses a number of emerging research topics in the field of DEINs. It promotes joint efforts from both academia and industry to push DEIN a step closer to practical implementation. It is also a valuable resource for students interested in studying cutting-edge techniques in this field.
Data-Centric Applications with Vaadin 8: Develop and maintain high-quality web applications using Vaadin
by Alejandro Duarte
This book teaches you everything you need to know to create stunning Vaadin applications for all your web development needs. Deep dive into advanced Vaadin concepts while creating your very own sample Vaadin application.
Key Features
• A one-stop book to enhance your working knowledge with Vaadin.
• Explore and implement the architecture of Vaadin applications.
• Delve into advanced topics such as data binding, authentication and authorization to improve your application’s performance.
Book Description
Vaadin is an open-source Java framework used to build modern user interfaces. Vaadin 8 simplifies application development and improves user experience. The book begins with an overview of the architecture of Vaadin applications and the way you can organize your code in modules.
Then it moves to more advanced topics such as internationalization, authentication, authorization, and database connectivity. The book also teaches you how to implement CRUD views, how to generate printable reports, and how to manage data with lazy loading.
By the end of this book you will be able to architect, implement, and deploy stunning Vaadin applications, and have the knowledge to master web development with Vaadin.
What you will learn
• Modularize your Vaadin applications with Maven
• Create high quality custom components
• Implement robust and secure authentication and authorization mechanisms
• Connect to SQL databases efficiently
• Design robust CRUD (Create, Read, Update, Delete) views
• Generate stunning reports
• Improve resource consumption by using lazy loading
Who this book is for
If you are a software developer with previous experience with Vaadin and would like to gain more comprehensive and advanced skills in Vaadin web development, then this book is for you.
Data-Driven HR: How to Use Analytics and Metrics to Drive Performance
by Bernard Marr
Traditionally seen as a purely people function unconcerned with numbers, HR is now uniquely placed to use company data to drive performance, both of the people in the organization and the organization as a whole. Data-Driven HR is a practical guide which enables HR professionals to leverage the value of the vast amount of data available at their fingertips. Covering how to identify the most useful sources of data, collect information in a transparent way that is in line with data protection requirements and turn this data into tangible insights, this book marks a turning point for the HR profession. Covering all the key elements of HR including recruitment, employee engagement, performance management, wellbeing and training, Data-Driven HR examines the ways data can contribute to organizational success by, among other things, optimizing processes, driving performance and improving HR decision making. Packed with case studies and real-life examples, this is essential reading for all HR professionals looking to make a measurable difference in their organizations.
Data-Driven Law: Data Analytics and the New Legal Services (Data Analytics Applications)
by Edward J. Walters
For increasingly data-savvy clients, lawyers can no longer give "it depends" answers rooted in anecdata. Clients insist that their lawyers justify their reasoning, and with more than a limited set of war stories. The considered judgment of an experienced lawyer is unquestionably valuable. However, on balance, clients would rather have the considered judgment of an experienced lawyer informed by the most relevant information required to answer their questions. Data-Driven Law: Data Analytics and the New Legal Services helps legal professionals meet the challenges posed by a data-driven approach to delivering legal services. Its chapters are written by leading experts who cover such topics as:
• Mining legal data
• Computational law
• Uncovering bias through the use of Big Data
• Quantifying the quality of legal services
• Data mining and decision-making
• Contract analytics and contract standards
In addition to providing clients with data-based insight, legal firms can track a matter with data from beginning to end, from the marketing spend through to the type of matter, hours spent, billed, and collected, including metrics on profitability and success. Firms can organize and collect documents after a matter and even automate them for reuse. Data on marketing related to a matter can be an amazing source of insight about which practice areas are most profitable. Data-driven decision-making requires firms to think differently about their workflow. Most firms warehouse their files, never to be seen again after the matter closes. Running a data-driven firm requires lawyers and their teams to treat information about the work as part of the service, and to collect, standardize, and analyze matter data from cradle to grave. More than anything, using data in a law practice requires a different mindset about the value of this information. This book helps legal professionals to develop this data-driven mindset.
Data-Driven Prediction for Industrial Processes and Their Applications (Information Fusion and Data Science)
by Wei Wang, Jun Zhao, Chunyang Sheng
This book presents modeling methods and algorithms for data-driven prediction and forecasting of practical industrial processes by employing machine learning and statistics methodologies. Related case studies, especially on energy systems in the steel industry, are also addressed and analyzed. The case studies in this volume are entirely rooted in both classical data-driven prediction problems and industrial practice requirements. Detailed figures and tables demonstrate the effectiveness and generalization of the methods addressed, and the classifications of the addressed prediction problems come from practical industrial demands, rather than from academic categories. As such, readers will learn the corresponding approaches for resolving their industrial technical problems. Although the contents of this book and its case studies come from the steel industry, these techniques can also be used for other process industries. This book appeals to students, researchers, and professionals within the machine learning and data analysis and mining communities.
Data-Driven Storytelling (AK Peters Visualization Series)
by Nathalie Henry Riche, Christophe Hurter, Nicholas Diakopoulos, Sheelagh Carpendale
This book presents an accessible introduction to data-driven storytelling. Resulting from unique discussions between data visualization researchers and data journalists, it offers an integrated definition of the topic, presents vivid examples and patterns for data storytelling, and calls out key challenges and new opportunities for researchers and practitioners.
Data-Warehouse-Systeme für Dummies (Für Dummies)
by Wolfgang Gerken
Every business intelligence application is ultimately based on a data warehouse. Data warehousing is therefore a very important area of applied computer science, especially in the age of Big Data. This book examines the data warehouse from two perspectives: that of the developer and that of the user. The future developer learns to build a data warehouse using suitable methods. For the future user, the author covers reporting, online analytical processing, and data mining. The textbook is also suitable for self-study. Some knowledge of database systems is, however, assumed.
Database Benchmarking and Stress Testing: An Evidence-Based Approach to Decisions on Architecture and Technology
by Bert Scalzo
Provide evidence-based answers that can be measured and relied upon by your business. Database administrators will be able to make sound architectural decisions in a fast-changing landscape of virtualized servers and container-based solutions based on the empirical method presented in this book for answering “what if” questions about database performance. Today’s database administrators face numerous questions such as: What if we consolidate databases using multitenant features? What if we virtualize database servers as Docker containers? What if we deploy the latest in NVMe flash disks to speed up IO access? Do features such as compression, partitioning, and in-memory OLTP earn back their price? What if we move our databases to the cloud? As an administrator, do you know the answers or even how to test the assumptions? Database Benchmarking and Stress Testing introduces you to database benchmarking using industry-standard test suites such as the TPC series of benchmarks, which are the same benchmarks that vendors rely upon. You’ll learn to run these industry-standard benchmarks and collect results to use in answering questions about the performance impact of architectural changes, technology changes, and even the brand of database software. You’ll learn to measure performance and predict the specific impact of changes to your environment. You’ll know the limitations of the benchmarks and the crucial difference between benchmarking and workload capture/replay. This book teaches you how to create empirical evidence in support of business and technology decisions. It’s about not guessing when you should be measuring. Empirical testing is scientific testing that delivers measurable results. Begin with a hypothesis about the impact of a possible architecture or technology change. Then run the appropriate benchmarks to gather data and predict whether the change you’re exploring will be beneficial, and by what order of magnitude. Stop guessing. Start measuring. Let Database Benchmarking and Stress Testing show the way. What You'll Learn: Understand the industry-standard database benchmarks, and when each is best used. Prepare for a database benchmarking effort so reliable results can be achieved. Perform database benchmarking for consolidation, virtualization, and cloud projects. Recognize and avoid common mistakes in benchmarking database performance. Measure and interpret results in a rational, concise manner for reliable comparisons. Choose and provide advice on benchmarking tools based on their pros and cons. Who This Book Is For: Database administrators and professionals responsible for advising on architectural decisions such as whether to use cloud-based services or whether to consolidate and containerize, and who must make recommendations on storage or any other technology that impacts database performance.
Database Internals: A Deep Dive into How Distributed Data Systems Work
by Alex Petrov
When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it’s often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals. Throughout the book, you’ll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You’ll discover that the most significant distinctions among many modern databases reside in subsystems that determine how storage is organized and how data is distributed. This book examines: Storage engines: Explore storage classification and taxonomy, and dive into B-Tree-based and immutable Log Structured storage engines, with differences and use-cases for each. Storage building blocks: Learn how database files are organized to build efficient storage, using auxiliary data structures such as Page Cache, Buffer Pool and Write-Ahead Log. Distributed systems: Learn step-by-step how nodes and processes connect and build complex communication patterns. Database clusters: Which consistency models are commonly used by modern databases and how distributed storage systems achieve consistency.
Database Systems for Advanced Applications: 23rd International Conference, DASFAA 2018, Gold Coast, QLD, Australia, May 21-24, 2018, Proceedings, Part I (Lecture Notes in Computer Science #10827)
by Yannis Manolopoulos Shazia Sadiq Jianxin Li Jian Pei
This two-volume set LNCS 10827 and LNCS 10828 constitutes the refereed proceedings of the 23rd International Conference on Database Systems for Advanced Applications, DASFAA 2018, held in Gold Coast, QLD, Australia, in May 2018. The 83 full papers, 21 short papers, 6 industry papers, and 8 demo papers were carefully selected from a total of 360 submissions. The papers are organized around the following topics: network embedding; recommendation; graph and network processing; social network analytics; sequence and temporal data processing; trajectory and streaming data; RDF and knowledge graphs; text and data mining; medical data mining; security and privacy; search and information retrieval; query processing and optimizations; data quality and crowdsourcing; learning models; multimedia data processing; and distributed computing.
Database Systems for Advanced Applications: 23rd International Conference, DASFAA 2018, Gold Coast, QLD, Australia, May 21-24, 2018, Proceedings, Part II (Lecture Notes in Computer Science #10828)
by Yannis Manolopoulos Shazia Sadiq Jianxin Li Jian Pei
This two-volume set LNCS 10827 and LNCS 10828 constitutes the refereed proceedings of the 23rd International Conference on Database Systems for Advanced Applications, DASFAA 2018, held in Gold Coast, QLD, Australia, in May 2018. The 83 full papers, 21 short papers, 6 industry papers, and 8 demo papers were carefully selected from a total of 360 submissions. The papers are organized around the following topics: network embedding; recommendation; graph and network processing; social network analytics; sequence and temporal data processing; trajectory and streaming data; RDF and knowledge graphs; text and data mining; medical data mining; security and privacy; search and information retrieval; query processing and optimizations; data quality and crowdsourcing; learning models; multimedia data processing; and distributed computing.
Database and Expert Systems Applications: 29th International Conference, DEXA 2018, Regensburg, Germany, September 3–6, 2018, Proceedings, Part I (Lecture Notes in Computer Science #11029)
by Abdelkader Hameurlain Sven Hartmann Hui Ma Günther Pernul Roland R. Wagner
This two-volume set of LNCS 11029 and LNCS 11030 constitutes the refereed proceedings of the 29th International Conference on Database and Expert Systems Applications, DEXA 2018, held in Regensburg, Germany, in September 2018. The 35 revised full papers presented together with 40 short papers were carefully reviewed and selected from 160 submissions. The papers of the first volume discuss a range of topics including: Big data analytics; data integrity and privacy; decision support systems; data semantics; cloud data processing; time series data; social networks; temporal and spatial databases; and graph data and road networks. The papers of the second volume discuss the following topics: Information retrieval; uncertain information; data warehouses and recommender systems; data streams; information networks and algorithms; database system architecture and performance; novel database solutions; graph querying and databases; learning; emerging applications; data mining; privacy; and text processing.