JNTUH CSE-AIML INTRODUCTION TO DATA SCIENCE SYLLABUS

Course Objectives:
1. Learn the concepts, techniques, and tools needed to deal with various facets of data science practice, including data collection and integration.
2. Understand the basic types of data and basic statistics.
3. Identify the importance of data reduction and data visualization techniques.
Course Outcomes: After completion of the course, the student should be able to:
1. Understand the basic terms and what statistical inference means.
2. Identify probability distributions commonly used as foundations for statistical modelling, and fit a model to data.
3. Describe the data using various statistical measures.
4. Utilize R elements for data handling.
5. Perform data reduction and apply visualization techniques.
UNIT – I
Introduction: Definition of Data Science – Big Data and Data Science hype – and getting past the hype – Datafication – Current landscape of perspectives – Statistical Inference – Populations and samples – Statistical modeling, probability distributions, fitting a model – Overfitting. Basics of R: Introduction, R Environment Setup, Programming with R, Basic Data Types.
UNIT – II
Data Types & Statistical Description. Types of Data: Attributes and Measurement, What is an Attribute? The Type of an Attribute, The Different Types of Attributes, Describing Attributes by the Number of Values, Asymmetric Attributes, Binary Attribute, Nominal Attributes, Ordinal Attributes, Numeric Attributes, Discrete versus Continuous Attributes. Basic Statistical Descriptions of Data: Measuring the Central Tendency: Mean, Median, and Mode, Measuring the Dispersion of Data: Range, Quartiles, Variance, Standard Deviation, and Interquartile Range, Graphic Displays of Basic Statistical Descriptions of Data.
UNIT – III
Vectors: Creating and Naming Vectors, Vector Arithmetic, Vector Subsetting. Matrices: Creating and Naming Matrices, Matrix Subsetting, Arrays, Class. Factors and Data Frames: Introduction to Factors: Factor Levels, Summarizing a Factor, Ordered Factors, Comparing Ordered Factors, Introduction to Data Frame, Subsetting of Data Frames, Extending Data Frames, Sorting Data Frames. Lists: Introduction, Creating a List: Creating a Named List, Accessing List Elements, Manipulating List Elements, Merging Lists, Converting Lists to Vectors.
UNIT – IV
Conditionals and Control Flow: Relational Operators, Relational Operators and Vectors, Logical Operators, Logical Operators and Vectors, Conditional Statements. Iterative Programming in R: Introduction, While Loop, For Loop, Looping Over List. Functions in R: Introduction, Writing a Function in R, Nested Functions, Function Scoping, Recursion, Loading an R Package, Mathematical Functions in R.
UNIT – V
Data Reduction: Overview of Data Reduction Strategies, Wavelet Transforms, Principal Components Analysis, Attribute Subset Selection, Regression and Log-Linear Models: Parametric Data Reduction, Histograms, Clustering, Sampling, Data Cube Aggregation. Data Visualization: Pixel-Oriented Visualization Techniques, Geometric Projection Visualization Techniques, Icon-Based Visualization Techniques, Hierarchical Visualization Techniques, Visualizing Complex Data and Relations.
TEXT BOOKS:
1. Doing Data Science: Straight Talk from the Frontline. Cathy O’Neil and Rachel Schutt, O’Reilly, 2014.
2. Jiawei Han, Micheline Kamber and Jian Pei. Data Mining: Concepts and Techniques, 3rd ed. The Morgan Kaufmann Series in Data Management Systems.
3. K G Srinivasa, G M Siddesh, “Statistical Programming in R”, Oxford Publications.
REFERENCE BOOKS:
1. Introduction to Data Mining, Pang-Ning Tan, Vipin Kumar, Michael Steinbach, Pearson Education.
2. Brian S. Everitt, “A Handbook of Statistical Analyses Using R”, Second Edition, 4 LLC, 2014.
3. Dalgaard, Peter, “Introductory statistics with R”, Springer Science & Business Media, 2008.
4. Paul Teetor, “R Cookbook”, O’Reilly, 2011
