Learn Data Science from Scratch  
Mastering ML and NLP with Python in a step-by-step approach (English Edition)
Author(s): Pratheerth Padman
Published by BPB Publications
Publication Date:  Available in all formats
ISBN: 9789355517036
Pages: 416

EBOOK (EPUB)

ISBN: 9789355517036 Price: INR 899.00
Description
Learn Data Science from Scratch equips you with the essential tools and techniques, from Python libraries to machine learning algorithms, to tackle real-world problems and make informed decisions. This book provides a thorough exploration of essential data science concepts, tools, and techniques. Starting with the fundamentals of data science, you will progress through data collection, web scraping, data exploration and visualization, and data cleaning and pre-processing. You will build the required foundation in statistics and probability before diving into machine learning algorithms, deep learning, natural language processing, recommender systems, and data storage systems. With hands-on examples and practical advice, each chapter offers valuable insights and key takeaways, empowering you to master the art of data-driven decision making. By the end of this book, you will be well-equipped with the essential skills and knowledge to navigate the exciting world of data science. You will be able to collect, analyze, and interpret data, build and evaluate machine learning models, and effectively communicate your findings, making you a valuable asset in any data-driven environment.
Table of contents
  • Cover
  • Title Page
  • Copyright Page
  • Dedication Page
  • About the Author
  • About the Reviewer
  • Acknowledgement
  • Preface
  • Table of Contents
  • 1. Unraveling the Data Science Universe: An Introduction
    • Introduction
    • Structure
    • Objectives
    • What is data science
    • Data science: A fusion of fields
    • History and evolution of data science as a field
    • The data science process
    • A day in the life of a data scientist
    • How data science is shaping our world
    • Differences between Artificial Intelligence, big data, and data science
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 2. Essential Python Libraries and Tools for Data Science
    • Introduction
    • Structure
    • Objectives
    • Setting up your developer environment
    • Basics of NumPy
      • Array creation and manipulation
      • Mathematical operations with NumPy
      • Broadcasting
      • Advanced NumPy techniques
        • Array reshaping
        • Stacking
        • Splitting
    • Pandas for data manipulation
      • Introducing series and DataFrame
      • Reading and writing data from various file formats
      • Data cleaning and pre-processing
    • Matplotlib, seaborn, and Plotly for data visualization
      • Basics of Matplotlib
      • Seaborn for advanced visualization
      • Interactive visualizations with Plotly
      • Choosing the right visualization
    • Jupyter Notebook essentials
      • Launching and understanding the interface
      • Code, Markdown, and raw cells
      • Executing code and displaying results
      • Sharing and exporting notebooks
    • Scikit-learn: Key to streamlined Machine Learning
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 3. Statistics and Probability Essentials for Data Science
    • Introduction
    • Structure
    • Objectives
    • Probability theory
    • Basic probability concepts
      • Events
      • Sample space
    • Conditional probability and Bayes’ theorem
      • Conditional probability
      • Bayes’ theorem
    • Discrete and continuous random variables
    • Expectation, variance, and covariance of random variables
      • Expectation
      • Variance
      • Covariance
    • Distributions and sampling
      • Probability distributions
    • Central limit theorem
    • Sampling techniques
    • Hypothesis testing
      • Null and alternative hypotheses
      • Test statistics and p-values
      • Common hypothesis tests: Z-test, t-test, chi-square test, and ANOVA
      • Type I and type II errors
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 4. Data Mining Expedition: Web Scraping and Data Collection Techniques
    • Introduction
    • Structure
    • Objectives
    • Sources of data
      • Publicly available datasets
        • Government portals
        • Research institutions
      • Web scraping
      • APIs
      • Proprietary databases
    • Web scraping with Beautiful Soup and Requests
      • Installing and importing the Beautiful Soup and Requests libraries
      • Fetching web page content using Requests
      • Parsing HTML with Beautiful Soup and extracting data
      • Handling pagination, AJAX, and other web scraping challenges
    • APIs and Python libraries for data collection
      • RESTful APIs and their usage in data collection
      • Authentication methods
      • Popular Python libraries for working with APIs
      • Parsing and handling JSON, XML, and other data formats
    • Ethical considerations during data collection
      • Respecting website terms of service and the robots.txt file
      • Adhering to API rate limits and usage restrictions
      • User privacy and data anonymization
      • Ethics and law in data management
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 5. Painting with Data: Exploration and Visualization
    • Introduction
    • Structure
    • Objectives
    • Exploratory data analysis
      • Why do we need exploratory data analysis
      • Cleaning and preprocessing data for exploratory data analysis
      • Univariate and multivariate analysis techniques
    • Descriptive statistics
      • Measures of central tendency: mean, mode and median
      • Exploring data spread: Range, variance, and standard deviation
      • Skewness and kurtosis
      • Understanding descriptive statistics in data analysis
    • Data visualization with Matplotlib, seaborn, and Plotly
      • Getting acquainted with Matplotlib, seaborn, and Plotly
      • A guide to visualizing data with common chart types
      • Customization techniques for engaging visualizations
      • Creating interactive visualizations with Plotly
    • Discovering trends and relationships
      • Unraveling linear and non-linear relationships
      • Unraveling time series data: Trends and seasonality
      • Outliers: Uncovering their impact on data analysis
      • Revealing hidden patterns through visualization techniques
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 6. Data Alchemy: Cleaning and Preprocessing Raw Data
    • Introduction
    • Structure
    • Objectives
    • Handling missing data
      • Detecting missing data
      • Strategies for tackling missing data
      • Pandas and NumPy for missing data handling
    • Data transformation and normalization
      • Importance of data transformation and normalization
      • Overview of data transformation techniques
      • Scaling techniques in data normalization
      • Mastering data alchemy with Python libraries
    • Addressing duplication and data inconsistencies
      • Spotting and eliminating duplicate entries
      • Handling inconsistent and incorrect data
    • Feature engineering and selection
      • Role of feature engineering and selection
      • The art of crafting features
        • Picking the A-team: Methods for effective feature selection
      • Feature engineering with Pandas and Scikit-Learn
    • Encoding categorical features
      • Rationale for encoding categorical features
      • Diverse pathways of encoding: One-hot and ordinal techniques unveiled
      • Conjuring the magic of encoding: A pythonic approach
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 7. Machine Learning Magic: An Introduction to Predictive Modeling
    • Introduction
    • Structure
    • Objectives
    • Supervised and unsupervised learning
      • Supervised vs. unsupervised learning
      • Impact of supervised and unsupervised learning
    • Essential algorithms and model selection
      • Understanding the role of algorithms in machine learning
        • Finding the right model for your data
      • Balancing bias, variance, and accuracy in model selection
    • Training, testing, and evaluation
      • Learning the ropes of the training process
      • Understanding training, testing, and holdout sets
      • Grading the machine: Understanding model evaluation metrics
        • Evaluating classification models
      • Evaluating regression models
    • Overfitting and underfitting
      • Striking the right balance: Overfitting and underfitting explained
        • Techniques to tackle overfitting and underfitting
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 8. Exploring Regression: Linear, Logistic, and Advanced Methods
    • Introduction
    • Structure
    • Objectives
    • Linear regression
      • What is linear regression
      • Understanding linear regression: Four fundamental assumptions
      • Building a linear regression model: An overview
        • Coefficients, predictions, and model evaluation
      • A step-by-step guide to linear regression with Python’s scikit-learn
    • Logistic regression
      • Logistic regression: Deciphering binary decisions
        • The sigmoid function: An essential cog in logistic regression
      • Building a logistic regression model: An overview
      • Deciphering coefficients and model evaluation in logistic regression
        • Logistic regression analysis: A study of the Titanic dataset
    • Harnessing regularization: Techniques to rein in your model
      • Balancing variance, bias, and overfitting
        • Navigating the complexity maze: Unravelling regularization
        • Regularization rumble: Lasso, Ridge, and Elastic Net
      • Implementing regularization techniques in Python with Scikit-Learn
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 9. Unveiling Patterns with k-Nearest Neighbors and Naïve Bayes
    • Introduction
    • Structure
    • Objectives
    • Understanding the k-Nearest Neighbors algorithm
      • Unraveling the threads of k-Nearest Neighbors
        • Exploring distance metrics: Euclidean to Hamming
        • How do distance metrics affect the performance of KNN
        • Constructing the KNN model: A step-by-step approach with Python
    • Naïve Bayes classifier
      • Unraveling the simplicity and power of Naïve Bayes
      • Crafting a Naïve Bayes classifier from scratch with Python
      • Deciphering Naïve Bayes: Understanding outputs and performance evaluation
    • Hyperparameter tuning
      • What are hyperparameters
      • Why does hyperparameter tuning matter
      • Hyperparameter tuning: Grid and random search methods
      • Fine-tuning the k-Nearest Neighbors model
      • Fine-tuning the Naïve Bayes model
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 10. Exploring Tree-Based Models: Decision Trees to Gradient Boosting
    • Introduction
    • Structure
    • Objectives
    • Decision trees
      • Getting acquainted with decision trees
        • Constructing a decision tree
      • The twin branches: Classification and regression trees
    • Entropy and information gain
      • Diving into entropy: Unraveling chaos in decision trees
      • Demystifying information gain
        • Role of entropy and information gain in constructing a decision tree
    • Tree pruning and optimization
      • Pruning a decision tree
      • Hyperparameters in decision trees
      • Crafting and refining a decision tree
    • The power of ensemble methods in machine learning
      • Embarking on the ensemble journey
      • Understanding the bagging method
      • Unearthing the forest within data
      • Boosting power: The strengths and shortcomings of boosting
      • Boosting with a twist: Introducing gradient boosting
      • Picking the right ensemble method
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
  • 11. Support Vector Machines: Simplifying Complexity
    • Introduction
    • Structure
    • Objectives
    • Introduction to support vector machines
      • Mastering the mechanics of support vector machines
      • Uniqueness of SVM in the machine learning ensemble
      • Numerical craft behind support vector machines
      • The art of drawing lines: Hyperplanes and support vectors
    • Understanding kernel methods
      • The power of kernel functions
      • Data transformation with kernel methods
      • Kernel functions: Linear, polynomial, and radial basis
      • Choosing the right kernel for your SVM
    • SVM for classification and regression roles
      • SVM in binary and multiclass scenarios
      • SVM in the world of regression
    • Real-world SVM: From preprocessing to evaluation
      • Handling imbalanced data in support vector machines
      • Perfecting your support vector machines
      • Impact of the C parameter and kernel coefficients on your SVM model
    • Balancing the bias-variance trade-off in SVM
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 12. Dimensionality Reduction: From PCA to Advanced Methods
    • Introduction
    • Structure
    • Objectives
    • Understanding the problem of high dimensionality
      • The curse of dimensionality
      • High-dimensionality at play: Encounters in the real world
      • Tackling high-dimensional data
    • Principal component analysis
      • Decoding principal component analysis
      • Understanding PCA: The role of eigenvalues and eigenvectors
      • PCA in action: A step-by-step guide
      • Tuning into the right number of dimensions in PCA
    • Visualizing high-dimensional data
      • High dimensional data: Visualization techniques and challenges
      • Real-world high-dimensional data visualization
    • Exploring beyond PCA: t-SNE and UMAP
      • t-SNE unveiled: Functionality and use cases
      • Unfolding the UMAP technique: Operation and best use scenarios
      • PCA, t-SNE, and UMAP: A comparative analysis
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 13. Unlocking Unsupervised Learning
    • Introduction
    • Structure
    • Objectives
    • K-means clustering
      • Exploring K-means: From principles to practice
      • The enigma of optimal K
      • Bringing K-means to life: A real-world clustering journey
    • Hierarchical clustering
      • Intricacies of hierarchical clustering
      • Hierarchical clustering: Exploring linkage criteria
    • Understanding DBSCAN: A comprehensive guide
      • Navigating the dendrogram: Hierarchical clustering in action
    • DBSCAN and other density-based methods
      • DBSCAN clustering: Unveiling its unique approach
      • Tuning DBSCAN
      • Putting DBSCAN into action
    • Cluster evaluation and validation
      • Importance of cluster validation
      • Cluster validation with internal indices
      • Cluster validation with external indices
      • Ensuring robust clusters with stability-based validation
      • Demonstrating cluster evaluation and validation
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 14. The Essence of Neural Networks and Deep Learning
    • Introduction
    • Structure
    • Objectives
    • Deep learning: Beyond conventional machine learning
    • Deep learning as artificial intelligence’s game changer
    • Data and processing power
      • Transformative applications of deep learning in the modern world
    • Introduction to deep learning libraries
      • Navigating TensorFlow, Keras, and PyTorch
      • The seamless integration of Keras and TensorFlow
        • Installing TensorFlow and PyTorch
    • The intricate web of artificial neural networks
      • Mimicking the human brain with artificial neurons
        • Layers of an artificial neural network
      • The art of learning in neural networks: Weights, biases, and beyond
      • Steering ANNs with loss functions, optimizers, and epochs
      • Exploring activation functions and backpropagation in ANNs
        • Activation functions: The spark that ignites neural networks
        • Exploring top activation functions in neural networks
      • Backpropagation and gradient descent in neural networks
    • Importance of data and feature engineering in deep learning
      • Unlocking deep learning’s potential with pristine data
        • Prepping data for the deep learning forge
    • Feature crafting versus self-learning
      • Managing overfitting and complexity in deep learning
        • The role of hyperparameters in deep learning
    • Overfitting: A deep learning perspective
      • Dodging the overfitting bullet in deep learning
    • Convolutional neural networks
      • The art and architecture of convolutional neural networks
      • Image data processing with convolutional neural networks
      • CNNs in action: Revolutionizing industries with visual intelligence
      • Implementing CNNs on MNIST with Keras
    • Recurrent neural networks
      • The power of recurrence: Unfolding the RNN architecture
        • The utility of recurrent neural networks in sequential data
      • RNNs: Tackling the hurdles of vanishing and exploding gradients
      • Putting RNNs to work: Real-world applications
      • Deciphering sentiments: Implementing a basic RNN with Keras
    • Long short-term memory networks
      • Diving deep into LSTM networks
        • Cracking the long-term dependency problem with LSTM
      • LSTM gates: The secret sauce of long memory
      • Where LSTMs shine: A glimpse of practical applications
        • Sentiment analysis on IMDB movie reviews with LSTM
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 15. Word Play: Text Analytics and Natural Language Processing
    • Introduction
    • Structure
    • Objectives
    • Text processing and tokenization
      • The intricacies of textual data in natural language processing
        • Refining the raw: Text preprocessing essentials
      • Chopping blocks of text: The art of tokenization
      • Pruning words to their roots: Unraveling stemming and lemmatization
      • Assigning roles to words: Unveiling parts-of-speech tagging
        • Text cleaning and tokenization using Natural Language Toolkit and spaCy in Python
    • The transformation journey: From text to features
      • Bag-of-words: Turning words into numbers
        • Weighing words with TF-IDF: Balancing frequency and importance
        • Embedding semantics with Word2Vec and GloVe
        • ELMo and BERT: The rise of context in word embeddings
      • Navigating text data: Bag of words, TF-IDF, and Word2Vec
    • Decoding emotions: Sentiment analysis and text classification
      • Navigating the sea of opinions with sentiment analysis
      • Mastering text classification
      • Bringing sentiment analysis and text classification to life with Python
    • Topic modeling and entity recognition
      • Introduction to topic modeling
      • Unearthing context with named entity recognition
      • Cracking topics and entities: A Python code walkthrough
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 16. Crafting Recommender Systems
    • Introduction
    • Structure
    • Objectives
    • Introduction to collaborative filtering
    • User-based collaborative filtering
    • Decoding item-based collaborative filtering
    • Measuring similarities in recommender systems
    • Sparsity and scalability in collaborative filtering
    • Building your first collaborative filtering systems in Python
      • User-based collaborative filtering
        • Item-based collaborative filtering
    • Personalized proposals: Understanding content-based filtering
      • The harmony of user and item profiles
        • Understanding feature extraction and selection
      • The pros and cons of content-based filtering
      • Breaking the filter bubble and enriching content analysis
    • Building content-based recommendations in Python
    • Matrix factorization and SVD in recommender systems
      • Introduction to matrix factorization
      • Singular value decomposition
        • Breaking down the user-item matrix into latent factors
        • Pros and cons of matrix factorization and SVD
        • Tackling sparsity with matrix factorization
      • Cracking latent factors: TruncatedSVD in action with Python
    • Synergy in recommendation: Hybrid systems
      • Understanding hybrid recommender approaches
      • Overcoming limitations for superior recommendations
      • Hybrid recommender systems in action
    • Crafting a hybrid recommender with Python: Step-by-step guide
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 17. Data Storage Mastery: Databases and Efficient Data Management
    • Introduction
    • Structure
    • Objectives
    • Exploring database types: Relational and NoSQL databases
      • Data housekeepers: The role of databases in data science
      • SQL and NoSQL: Two sides of the database coin
      • Breaking down relational databases: Tables, rows, columns and keys
    • Diversifying your data storage: NoSQL databases
      • Choosing between SQL and NoSQL
      • Database showdown: An overview of popular choices
    • Python meets SQL: Mastering database interaction
      • Exploring SQL: Definition, manipulation, and control
      • Unleashing SQL’s potential: Joins, subqueries, indexes, and stored procedures
    • Navigating databases in Python: SQLAlchemy, SQLite3, PyMongo
      • Talking to databases with Python: A hands-on guide
      • The language of data: CSV, JSON, XML, Parquet, and Excel
      • Weighing the options: Advantages and drawbacks of different data formats
    • Python data format handling: CSV, JSON, XML, Parquet, Excel
    • Unpacking serialization: Moving and storing data efficiently
      • Journey through serialization formats: Pickle, JSON, MessagePack
    • Data warehouses and data lakes: A comprehensive guide
      • Exploring Google BigQuery and Amazon Redshift
      • Hadoop: The cornerstone of data lakes and big data management
    • Conclusion
    • Points to remember
    • Multiple choice questions
    • Answers
    • Questions
  • 18. Data Science in Action: A Comprehensive End-to-end Project
    • Introduction
    • Structure
    • Objectives
    • Defining a data science problem
      • Understanding the business context
      • Formulating the problem statement
        • Identifying key stakeholders and understanding their expectations
        • Establishing success metrics
    • Data collection and preparation
      • Dataset attribution
      • From source to solution: The journey of data collection
      • Polishing the mirror: The art of data cleaning
        • Handling missing values
        • Data type mismatch
        • Logical consistency
        • Duplicates
      • Unearthing data treasures: The power of exploration
        • Statistical summaries
        • Data visualizations
      • Sculpting data: The craft of feature engineering
        • Creating new features
        • Encoding categorical variables
        • Partitioning data: Carving out training, validation, and test sets
    • From selection to evaluation: Charting the model’s journey
      • Hotel booking analysis: Choosing the right classifier
      • Assessing predictions: The hotelier’s guide to model metrics
      • Exploring the hotel bookings landscape with four models
        • Hyperparameter tuning
    • Communication of results
      • Crafting understandable narratives for all stakeholders
      • Translating findings into actionable steps
    • Deployment, monitoring and maintenance of a model
      • Exploring model deployment platforms
      • Crafting application programming interfaces for seamless access
      • Embracing model versioning and rollback
      • Detecting drifts and setting retraining rhythms
      • Ensuring the model’s longevity and relevance
    • Conclusion
    • Points to remember
  • Index