Python Data Types

Operations

Built-in Data Structures of Python

1. List

The list is the most used data structure in Python. It can store elements of any type, its elements are accessed by index, and it is mutable, so it can be changed in place.
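As a minimal sketch of the three properties above (mixed types, index access, mutability), using made-up values:

```python
# A list can hold elements of different types.
items = [1, "two", 3.0, [4, 5]]

# Elements are accessed by zero-based index; negative indexes count from the end.
first = items[0]
last = items[-1]

# Lists are mutable: elements can be replaced, appended, or removed in place.
items[1] = "TWO"
items.append(6)
items.remove(3.0)

print(first, last)
print(items)  # [1, 'TWO', [4, 5], 6]
```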

2. Tuple

3. Set

Please take a look at: https://github.com/sadat1971/Conducting-Workshop-DATA-SCIENCE/blob/master/lecture_tutorials/03_python_data_science.ipynb
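For a quick feel of tuples and sets before opening the notebook, here is a short sketch. The values are invented for illustration and are not taken from the linked tutorial.

```python
# A tuple is an ordered, immutable sequence.
point = (3, 4)
x, y = point  # tuple unpacking

try:
    point[0] = 10  # tuples do not support item assignment
except TypeError as err:
    tuple_error = str(err)

# A set is an unordered collection of unique, hashable elements.
colors = {"red", "green", "red"}  # the duplicate "red" is dropped
colors.add("blue")
common = colors & {"blue", "black"}  # set intersection

print(x, y)
print(tuple_error)
print(sorted(colors))
print(common)
```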

4. Dictionary:

*Smartest of all the data structures and probably the most important one*
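A dictionary maps unique keys to values. The sketch below shows the most common operations; the `employee` record and its keys are hypothetical.

```python
# Hypothetical record used only for illustration.
employee = {"name": "Asha", "department": "HR", "years": 3}

# Look up a value by key.
department = employee["department"]

# get() returns a default instead of raising KeyError when the key is missing.
salary = employee.get("salary", "unknown")

# Dictionaries are mutable: update an existing key or add a new one.
employee["years"] += 1
employee["remote"] = True

# Iterate over key/value pairs (insertion order is preserved).
for key, value in employee.items():
    print(f"{key}: {value}")
```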

Logics

Loops

Functions

Other important concepts:

Take a look: https://github.com/sadat1971/Conducting-Workshop-DATA-SCIENCE/blob/master/lecture_tutorials/05_python_data_science.ipynb

  1. How exceptions are handled in Python

  2. File handling in Python

  3. Object-oriented programming in Python
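The three concepts above can be sketched together in one small example. The `Note` class, the `safe_load` helper, and the file names are hypothetical and exist only for illustration; the linked notebook may present these topics differently.

```python
import os
import tempfile


class Note:
    """A tiny class: data (text) plus behavior (save and load)."""

    def __init__(self, text):
        self.text = text

    def save(self, path):
        # The with statement closes the file even if an error occurs.
        with open(path, "w", encoding="utf-8") as f:
            f.write(self.text)

    @classmethod
    def load(cls, path):
        with open(path, "r", encoding="utf-8") as f:
            return cls(f.read())


def safe_load(path):
    """Return a Note, or None if the file does not exist."""
    try:
        return Note.load(path)
    except FileNotFoundError:
        # Handle only the error we expect; others still propagate.
        return None


# Work in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "note.txt")
    Note("hello").save(path)
    loaded = safe_load(path)
    missing = safe_load(os.path.join(tmp, "absent.txt"))

print(loaded.text)
print(missing)
```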

NumPy: The heart of scientific computing

  1. NumPy is faster than Python lists

a. It uses less memory. See more: https://stackoverflow.com/questions/51240086/how-does-python-numpy-save-memory-compared-to-a-list

b. It does not need to type-check each element while iterating, because all elements of an array share one data type

c. NumPy arrays occupy contiguous memory blocks; unlike lists, their elements are stored next to each other

  2. NumPy offers many more functionalities, including those available for lists
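The memory and layout points can be checked with a rough sketch like the one below. The list size is only an estimate, since CPython shares some small integer objects, and exact byte counts vary by Python version.

```python
import sys

import numpy as np

n = 1000
py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A list stores pointers to separate int objects; the array stores raw
# 8-byte values in a single buffer.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(v) for v in py_list)
array_bytes = np_array.nbytes  # n elements * 8 bytes each

# The array's elements sit next to each other in memory.
is_contiguous = np_array.flags["C_CONTIGUOUS"]

# One vectorized expression replaces an explicit Python loop.
squares_list = [v * v for v in py_list]
squares_array = np_array * np_array

print(list_bytes, array_bytes)
print(is_contiguous)
```

For timing comparisons, the standard `timeit` module is a reasonable tool; results depend on the machine and array size.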

You can find more important and interesting functionalities of Numpy here

Pandas Dataframe Tutorial

Pandas facts:

  1. A data analysis library for working with different types and forms of data
  2. Performs well because it is built on top of NumPy

How to install

!pip install pandas

For more: https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html

How to generate pandas dataframe

  1. Way 2: Through a dictionary
  2. Way 3: List of dicts
  3. Way 4: Tuples
  4. Way 5: Pandas Series

You can find these ways here
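As a minimal sketch of the four ways listed above, the same small table can be built from each input type. The names and ages are made up for illustration.

```python
import pandas as pd

# From a dictionary: each key becomes a column.
df_dict = pd.DataFrame({"name": ["Ann", "Bob"], "age": [31, 45]})

# From a list of dicts: each dict becomes a row.
df_records = pd.DataFrame([
    {"name": "Ann", "age": 31},
    {"name": "Bob", "age": 45},
])

# From a list of tuples: each tuple is a row, column names are given separately.
df_tuples = pd.DataFrame([("Ann", 31), ("Bob", 45)], columns=["name", "age"])

# From pandas Series: each named Series becomes a column.
names = pd.Series(["Ann", "Bob"], name="name")
ages = pd.Series([31, 45], name="age")
df_series = pd.concat([names, ages], axis=1)

for frame in (df_dict, df_records, df_tuples, df_series):
    print(frame.shape, list(frame.columns))
```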

Most important:

You can import datasets in CSV or TSV format from your local machine and view them in a tabular, frame-like representation

import pandas as pd

df = pd.read_csv("Documents\\Notebooks\\lecture_tutorials\\HR-data.csv")  # Windows path (escaped backslashes)
df = pd.read_csv("Documents/Notebooks/lecture_tutorials/HR-data.csv")  # Linux and macOS path

Download any of the datasets from here: https://perso.telecom-paristech.fr/eagan/class/igr204/datasets

You can filter the table with conditions and iterate over its rows with loops
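A short sketch of both ideas follows. The table is a made-up stand-in; its column names are assumptions and do not necessarily match `HR-data.csv`.

```python
import pandas as pd

# Hypothetical data used only for illustration.
df = pd.DataFrame({
    "name": ["Ann", "Bob", "Cy"],
    "department": ["HR", "IT", "HR"],
    "salary": [52000, 61000, 48000],
})

# Condition: a boolean mask keeps only the rows where it is True.
hr = df[df["department"] == "HR"]

# Combine conditions with & (and) or | (or); each part needs parentheses.
hr_high = df[(df["department"] == "HR") & (df["salary"] > 50000)]

# Loop: itertuples() yields one named tuple per row.
for row in df.itertuples(index=False):
    print(row.name, row.salary)

# Column-wise operations usually avoid the need for an explicit loop.
df["salary_k"] = df["salary"] / 1000
```

Explicit loops are fine for small tables; for large ones, masks and column arithmetic are generally faster.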

For more interesting applications and features of the dataframe, you can take a look at:

  1. pandas1

  2. pandas2

  3. pandas3

Data Visualization

Popular libraries for visualization include:

  1. Seaborn
  2. Matplotlib

More on this:

  1. https://seaborn.pydata.org/tutorial.html
  2. https://matplotlib.org/stable/tutorials/index.html

Machine Learning and Prediction

Popular libraries:

  1. scikit-learn
  2. TensorFlow, Keras, PyTorch