Foundation of Data Science

CS3352 3rd Semester CSE Dept | 2021 Regulation

Home | All Courses | CSE Department | Subject: Foundation of Data Science

2021 regulation - 2nd year, 3rd semester paper for CSE Department (Computer Science Engineering Department). Subject Code: CS3352, Subject Name: Foundation of Data Science, Batch: 2021, 2022, 2023, 2024. Institute: Anna University Affiliated Engineering College, TamilNadu. This page has Foundation of Data Science study material, notes, semester question paper pdf download, important questions, lecture notes.

PDF Download Links

Foundation of Data Science

PDF Download Links

Foundation of Data Science

PDF Download Links


CS3352

FOUNDATIONS OF DATA SCIENCE


COURSE OBJECTIVES:

• To understand the data science fundamentals and process.

• To learn to describe the data for the data science process.

• To learn to describe the relationship between data.

• To utilize the Python libraries for Data Wrangling.

• To present and interpret data using visualization libraries in Python

UNIT I INTRODUCTION

Data Science: Benefits and uses – facets of data - Data Science Process: Overview – Defining  research goals – Retrieving data – Data preparation - Exploratory Data analysis – build the model– presenting findings and building applications - Data Mining - Data Warehousing – Basic Statistical  descriptions of Data

UNIT II DESCRIBING DATA

Types of Data - Types of Variables -Describing Data with Tables and Graphs –Describing Data  with Averages - Describing Variability - Normal Distributions and Standard (z) Scores

UNIT III DESCRIBING RELATIONSHIPS

Correlation –Scatter plots –correlation coefficient for quantitative data –computational formula for  correlation coefficient – Regression –regression line –least squares regression line – Standard  error of estimate – interpretation of r2 –multiple regression equations –regression towards the mean

UNIT IV PYTHON LIBRARIES FOR DATA WRANGLING

Basics of Numpy arrays –aggregations –computations on arrays –comparisons, masks, boolean  logic – fancy indexing – structured arrays – Data manipulation with Pandas – data indexing and  selection – operating on data – missing data – Hierarchical indexing – combining datasets – aggregation and grouping – pivot tables

UNIT V DATA VISUALIZATION

Importing Matplotlib – Line plots – Scatter plots – visualizing errors – density and contour plots – Histograms – legends – colors – subplots – text and annotation – customization – three dimensional  plotting - Geographic Data with Basemap - Visualization with Seaborn.

 

COURSE OUTCOMES:

At the end of this course, the students will be able to:

CO1: Define the data science process

CO2: Understand different types of data description for data science process

CO3: Gain knowledge on relationships between data

CO4: Use the Python Libraries for Data Wrangling

CO5: Apply visualization Libraries in Python to interpret and explore data

 

TEXT BOOKS

1. David Cielen, Arno D. B. Meysman, and Mohamed Ali, “Introducing Data Science”, Manning  Publications, 2016. (Unit I)

2. Robert S. Witte and John S. Witte, “Statistics”, Eleventh Edition, Wiley Publications, 2017.  (Units II and III)

3. Jake VanderPlas, “Python Data Science Handbook”, O’Reilly, 2016. (Units IV and V)

 

REFERENCES:

1. Allen B. Downey, “Think Stats: Exploratory Data Analysis in Python”, Green Tea Press,2014.

 

Foundation of Data Science: Unit I: Introduction,, Foundation of Data Science: Unit II: Describing Data,, Foundation of Data Science: Unit III: Describing Relationships,, Foundation of Data Science: Unit IV: Python Libraries for Data Wrangling,, Foundation of Data Science: Unit V: Data Visualization 3rd Semester CSE Dept 2021 Regulation : CS3352 3rd Semester CSE Dept | 2021 Regulation Foundation of Data Science

Home | All Courses | CSE Department | Subject: Foundation of Data Science