Modern Data Mining with Python

A risk-managed approach to developing and deploying explainable and efficient algorithms using ModelOps (English Edition)

Author(s): Dushyant Singh Sengar

Published by BPB Publications

ISBN: 9789355519146

Pages: 438

https://doi.org/9789355519146

EBOOK (EPUB)

ISBN: 9789355519146
Price: INR 899.00

Description

Subject(s): Data mining, Machine Learning, AI, responsible AI, Explainable AI, ModelOps, Model risk management (MRM)

"Modern Data Mining with Python" is a guidebook for responsibly implementing data mining techniques that involve collecting, storing, and analyzing large amounts of structured and unstructured data to extract useful insights and patterns. Enter the world of data mining and machine learning. Use insights from various data sources, from social media to credit card transactions. Master statistical tools and explore data trends and patterns. Understand decision trees and artificial neural networks (ANNs). Manage high-dimensional data with dimensionality reduction. Explore binary classification with logistic regression. Spot concealed patterns with unsupervised learning. Analyze text with recurrent neural networks (RNNs) and images with convolutional neural networks (CNNs). Ensure model compliance with regulatory standards. After reading this book, readers will have the skills and knowledge needed to use Python for data mining and analysis in an industry setting. They will be able to analyze large structured and unstructured datasets and implement algorithms on them.

Table of contents

Cover
Title Page
Copyright Page
Dedication Page
Foreword
About the Authors
About the Reviewers
Acknowledgement
Preface
Table of Contents
1. Understanding Data Mining in a Nutshell
- Introduction
- Structure
- Objectives
- What defines modern data mining
- The lifecycle: Data to insights consumption
- Understanding pattern recognition
  - Significance of the human learning process
  - The human learning process and mental models
  - Data: The key ingredient for meaningful patterns and relationships
- How machines leverage data to build models
  - Machine learning process
    - Two dominant strategies: Classification and regression
    - Biases and learning shortfalls
    - Measuring learning accuracy and balancing trade-offs
    - Can data size and sample impact learning
  - How do humans benefit from data and learning
  - Modern-day data mining challenges and possible remediation
- Conclusion
- Points to remember
2. Basic Statistics and Exploratory Data Analysis
- Introduction
- Structure
- Objectives
- Setting up Python 3.x
- Data mining and statistics
  - Statistics: Foundation, key terms, needs, and types
- Descriptive statistics
  - Graphical and non-graphical exploratory data analysis
  - Non-graphical and graphical representation of univariate data
  - Non-graphical representation of multivariate data
  - Graphical representation of multivariate data
- Probability theory
  - Probability distribution
- Inferential statistics
  - Hypothesis testing with commonly used statistical tests
- Introduction to Time Series Data
- Exploratory data analysis: HMDA case study
- Conclusion
- Points to remember
3. Digging into Linear Regression
- Introduction
- Structure
- Objectives
- Linear regression
  - Background
  - Under the hood
  - Challenges and assumptions including multi-collinearity
  - Detailed EDA
    - Dataset description
    - Missing value treatment
    - Outlier analysis
    - Correlation
    - Checking on the assumptions of linear regression
  - Feature selection
  - Regression execution and results
  - Regression result interpretation
- Optimization algorithm
  - Gradient descent
- Regularization
  - Lasso regression
  - Ridge regression
  - Elastic-Net regression
- MLflow introduction: Need and implementation
  - MLflow experiment tracking
- Case study
- Conclusion
- Points to remember
4. Exploring Logistic Regression
- Introduction
- Structure
- Objectives
- Logistic regression
- Background
- Under the hood
  - Data
  - Estimating probabilities
  - Loss function
- Challenges and assumptions
- Logistic regression result and interpretation
- Model interpretability and explainability
- Performance metrics
- Model generalization
  - K-fold cross-validation
  - Ensemble learning
- Model lifecycle processes
- Model development process
- Case study: Loan repayment likelihood prediction
- Conclusion
- Points to remember
5. Decision Trees with Bagging and Boosting
- Introduction
- Structure
- Objectives
- Decision trees
  - Background
  - Under the hood
    - Data
    - Model
    - Loss function
  - Challenges and assumptions
  - Decision tree result and interpretation
- Ensembling: Bagging, boosting, and stacking
  - Random forest
  - Gradient boosting
  - Ensembling using the stacking method
- Conclusion
- Points to remember
6. Support Vector Machines and K-Nearest Neighbors
- Introduction
- Structure
- Objectives
- Classification algorithms with a twist
  - Background
  - Under the hood
    - Data
    - Model
    - Loss function: Achieving optimal algorithmic results
- Challenges and assumptions
- Case study: Predicting customer propensity to subscribe to a term deposit
- Conclusion
- Points to remember
7. Putting Dimensionality Reduction into Action
- Introduction
- Structure
- Objectives
- Dimensionality reduction
- Background
- Under the dimensionality reduction hood
  - Data
  - Model: Reducing dimensions and variance
    - Principal component analysis
    - Linear discriminant analysis
    - t-distributed Stochastic Neighbor Embedding
  - Loss: Measuring Variance Reduction
- Challenges and assumptions
- Case study: Predicting loan repayment propensity using logistic regression, PCA, and LDA
- PCA parameters and interpretation
- LDA parameters and interpretation
- Logistic regression
- Conclusion
- Further reading
- Points to remember
8. Beginning with Unsupervised Models
- Introduction
- Structure
- Objectives
- Unsupervised learning
- Background
- Unsupervised learning techniques
  - Data
  - Model: Building meaningful clusters and profiling them
    - K-means clustering
    - Density-based spatial clustering of applications with noise
    - Hierarchical clustering
  - Loss: Efficiently achieving the optimal number of clusters
- Challenges and assumptions
- Case study: Bank customer portfolio segmentation
- Advanced unsupervised learning: A primer
- Conclusion
- Points to remember
9. Structured Data Classification using Artificial Neural Networks
- Introduction
- Structure
- Objectives
- Artificial neural network
- Background
- Under the hood of neural networks
  - Data
  - Model
  - Loss function: Achieving optimal results
    - Back-propagation and regularization
- Challenges and assumptions
- Case study: Explainable and Interpretable ANN Model
  - Interpretable and explainable AI using SHAP and PiML
- Conclusion
- Points to remember
10. Language Modeling with Recurrent Neural Networks
- Introduction
- Structure
- Objectives
- Language modeling
- Background
- Under the hood of language modeling
  - Data: From spoken languages to modeling datasets
  - Model: The language with context
    - Recurrent neural network
    - Long short-term memory
  - Loss: Quest for the best model
- Challenges and assumptions related to text data and model
- Case study: Customer complaint classification explained with LIME
- Rise of transformers: A primer on BERT and GPT
- Conclusion
- Further reading
- Points to remember
11. Image Processing with Convolutional Neural Networks
- Introduction
- Structure
- Objectives
- Deep learning for computer vision tasks
- Background
- Under the hood of CNN models
  - Data
  - Model
  - Loss: How to achieve optimal results
- Challenges and assumptions
- The race for the best model and transfer learning: A primer
- Case study: PDF document parser
- Conclusion
- Further reading
- Points to remember
12. Understanding Model Risk Management for Data Mining Models
- Introduction
- Structure
- Objectives
- Data mining challenges and risks
  - Why do model risks occur
- Introduction to Model Risk Management
  - Key regulatory frameworks
  - Pillars of Model Risk Management
- Introduction to Model Operations
- ModelOps: Product first vs. model first mindset
- How ModelOps facilitates MRM
- Case study: Regulatory requirement fulfillment using MRM and ModelOps
- Conclusion
- Points to remember
13. Adopting ModelOps to Manage Model Risk
- Introduction
- Structure
- Objectives
- Model risk management for fair banking
- Background
- Case study: Fair lending model lifecycle implementation - concept to inference
  - Fair lending model lifecycle
  - Data
  - Model Operations tools primer
  - Architecting the model lifecycle using ModelOps
  - Fair Lending Risk Assessment: The application
- Challenges and assumptions
- Future of AI and its practitioners
- Conclusion
- Further reading
- Points to remember
Index
