Quick Pandas Functions and Attributes
Pandas is one of the most powerful libraries used in Python for data science and analysis. It offers a large number of functions, methods and attributes that are comparatively simple in syntax and flexible in use. That is why data scientists, and anyone else who wants insights from a large set of data, prefer it and can get their work done in minutes.
Here we will look at some popular and important pandas functions that can make data analysis easier and more interesting.
Let's define a random dataset to work with: –

    import numpy as np
    import pandas as pd

    v_1 = np.random.randint(30, 60, size=10)
    v_2 = np.random.randint(40, 70, size=10)
    years = np.arange(2010, 2020)
    Teams = ['X', 'X', 'Y', 'X', 'Y', 'Y', 'Z', 'X', 'Z', 'Z']
    df = pd.DataFrame({'Teams': Teams, 'year': years, 'V_1': v_1, 'V_2': v_2})
    df
"Without data you are just another person with an opinion." - W. Edwards Deming
1. query function: – It is used to filter a data frame with a condition written as a string expression.
We can filter the data frame by comparing any column with a comparison operator. For example, suppose I need all the rows for team X.
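A minimal sketch of the team X filter. The fixed V_1 and V_2 numbers are an assumption, used instead of the random values so that the result is repeatable.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 and V_2 columns.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
    "V_2": [44, 61, 48, 69, 55, 40, 66, 52, 43, 58],
})

# Keep only the rows where the Teams column equals "X".
team_x = df.query('Teams == "X"')

# Several conditions can be combined in the same expression string.
team_x_high = df.query('Teams == "X" and V_1 > 40')

print(team_x)
print(team_x_high)
```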
2. insert function: – It is used to add a new column to an existing DataFrame at a chosen position. For example, calling .insert() with position 2 adds the new column at index 2.
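A short sketch of adding a column at index 2. The column name V_3 and its values (V_1 minus V_2) are assumptions made for illustration, as are the fixed V_1 and V_2 numbers.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 and V_2 columns.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
    "V_2": [44, 61, 48, 69, 55, 40, 66, 52, 43, 58],
})

# insert(position, column_name, values) modifies df in place.
# V_3 is a hypothetical column, defined here as V_1 minus V_2.
df.insert(2, "V_3", df["V_1"] - df["V_2"])

print(df.columns.tolist())
```

Note that insert() changes the data frame in place and returns None, so its result should not be assigned back to df.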
4. groupby function: – We use the groupby() function to group rows that share a repeated or categorical value and then calculate an aggregate for each group. For example, if we need the total V_1 and V_2 values for categories X, Y and Z, we can use groupby on the column "Teams".
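A minimal sketch of the per-team totals, assuming fixed V_1 and V_2 values in place of the random ones.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 and V_2 columns.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
    "V_2": [44, 61, 48, 69, 55, 40, 66, 52, 43, 58],
})

# Group the rows by team, then sum V_1 and V_2 within each group.
totals = df.groupby("Teams")[["V_1", "V_2"]].sum()

print(totals)
```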
5. cumsum function: – It is a pandas function that generates a running (cumulative) total after each row. For example, if the first value of V_1 for team X is k1 and the second is k2, then cumsum returns k1 after the first row, k1+k2 after the second, and so on.
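One way to get a running total per team is to combine groupby with cumsum. This is a sketch under the assumption that the total should restart for each team. The column name cumsum_V_1 and the fixed data values are illustrative.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 column.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
})

# Running total of V_1 within each team, kept in the original row order.
df["cumsum_V_1"] = df.groupby("Teams")["V_1"].cumsum()

print(df)
```

Calling df["V_1"].cumsum() without the groupby gives a single running total over the whole column instead.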
6. where function: – It keeps a value where a defined condition is true and replaces it with a chosen value where the condition is false. For example, with the condition V_3 > 0 and a replacement of 0, all V_3 values greater than 0 remain as they are and the rest become 0.
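A small sketch of that replacement. The V_3 values below are hypothetical, since the original V_3 column is not shown.

```python
import pandas as pd

# Hypothetical V_3 values containing both positive and negative numbers.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "V_3": [-13, -16, 4, -31, 4, -7, -19, -11, 13, -22],
})

# Keep V_3 where it is greater than 0, otherwise replace it with 0.
df["V_3"] = df["V_3"].where(df["V_3"] > 0, 0)

print(df)
```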
7. sample function: – It is used to pick random rows (samples) from a given set of data. We can ask for a fixed number of rows or define the fraction of the data that needs to be picked.
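A brief sketch of both ways of sampling. The fixed data and the random_state value are assumptions. random_state is only there to make the draw repeatable.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 column.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
})

# Pick exactly three random rows.
three_rows = df.sample(n=3, random_state=0)

# Pick a random half of the rows.
half_rows = df.sample(frac=0.5, random_state=0)

print(three_rows)
print(half_rows)
```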
8. isin function: – It is a pandas function used to filter rows by checking a column against several values at once. In other words, it replaces a chain of "or" comparisons on the same column.
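A minimal sketch that keeps teams X and Z, shown next to the equivalent "or" comparison. The fixed data values are an assumption.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 column.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
})

# Rows whose team is either X or Z.
selected = df[df["Teams"].isin(["X", "Z"])]

# The same filter written as two comparisons joined with "or" (|).
same_rows = df[(df["Teams"] == "X") | (df["Teams"] == "Z")]

print(selected)
```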
9. rank function: – It is used to assign a rank to each row based on the values of a chosen column in the data set.
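A short sketch that ranks the rows by V_1, with the largest value ranked first. The column name rank_V_1 and the fixed data values are assumptions.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 column.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
})

# ascending=False gives rank 1 to the largest V_1 value.
df["rank_V_1"] = df["V_1"].rank(ascending=False)

print(df)
```

By default, tied values share the average of their ranks, which is why rank returns floating point numbers.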
10. pct_change function: – It is known as the percentage change function. We apply it to a column when the change relative to the previous row needs to be monitored. The result is expressed as a fraction, so 0.25 means an increase of 25%.
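A minimal sketch on the V_1 column, assuming fixed values in place of the random ones. The column name V_1_change is illustrative.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 column.
df = pd.DataFrame({
    "year": list(range(2010, 2020)),
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
})

# Change of each V_1 value relative to the row before it.
# The first row has no previous value, so its result is NaN.
df["V_1_change"] = df["V_1"].pct_change()

print(df)
```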
11. loc function: – It is used to select or slice part of an existing data frame by label. Here we define the first and last labels for the rows and columns, and both ends are included in the selection. Strictly speaking, loc is an indexer used with square brackets rather than a function.
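A small sketch of label-based slicing, assuming the default integer row labels and fixed data values.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 and V_2 columns.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
    "V_2": [44, 61, 48, 69, 55, 40, 66, 52, 43, 58],
})

# Rows labelled 2 through 5 (5 is included) and two named columns.
by_name = df.loc[2:5, ["Teams", "V_1"]]

# A column slice by label also includes its end point.
by_slice = df.loc[2:5, "year":"V_2"]

print(by_name)
print(by_slice)
```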
12. iloc function: – It is similar to loc but works with integer positions instead of labels. We define the start position and the start position plus the number of rows or columns to be sliced, so the end position itself is not included.
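A matching sketch for position-based slicing, again with assumed fixed data values.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 and V_2 columns.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
    "V_2": [44, 61, 48, 69, 55, 40, 66, 52, 43, 58],
})

# Three rows starting at position 2 (stop at 2 + 3 = 5, excluded)
# and two columns starting at position 0 (stop at 0 + 2 = 2, excluded).
subset = df.iloc[2:5, 0:2]

print(subset)
```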
13. unique and nunique functions: – The unique function returns all the distinct values in a column (series) of a data frame, and nunique returns how many distinct values there are.
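A minimal sketch on the Teams column of the example data.

```python
import pandas as pd

df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
})

# Distinct team names, in order of first appearance.
team_names = df["Teams"].unique()

# Number of distinct team names.
team_count = df["Teams"].nunique()

print(team_names, team_count)
```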
14. dtypes: – It is an attribute that returns the data type of the values in each column.
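A short sketch, assuming fixed data values. Because dtypes is an attribute, it is read without parentheses.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 and V_2 columns.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
    "V_2": [44, 61, 48, 69, 55, 40, 66, 52, 43, 58],
})

# One entry per column: the column name and its data type.
column_types = df.dtypes

print(column_types)
```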
15. astype function: – It is a function used to change or convert the data type of a particular column in a data frame. For example, it can change the data type of V_2 from integer to float.
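A minimal sketch of the integer to float conversion for V_2, with assumed fixed values.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_2 column.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "V_2": [44, 61, 48, 69, 55, 40, 66, 52, 43, 58],
})

type_before = str(df["V_2"].dtype)

# astype returns a new series, so assign it back to the column.
df["V_2"] = df["V_2"].astype(float)

type_after = str(df["V_2"].dtype)

print(type_before, "->", type_after)
```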
16. memory_usage() function: – It returns the memory used by each column of a data frame, in bytes.
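A brief sketch, assuming fixed data values. The exact byte counts depend on the pandas version and platform, so none are shown here.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 column.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
})

# Bytes used by the index and by each column.
# deep=True also counts the contents of string columns.
usage = df.memory_usage(deep=True)

# Total size of the data frame in bytes.
total_bytes = usage.sum()

print(usage)
print(total_bytes)
```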
17. describe() function: – It returns summary statistics such as count, mean, standard deviation, minimum and maximum for the numeric columns of a data set in one call.
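A minimal sketch on the example columns, with assumed fixed values.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 and V_2 columns.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
    "V_2": [44, 61, 48, 69, 55, 40, 66, 52, 43, 58],
})

# Summary statistics for the numeric columns only (Teams is skipped).
stats = df.describe()

print(stats)
```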
18. select_dtypes: – It is used to select columns from an existing data frame on the basis of their data types. For example, it can keep only the columns of integer type.
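A short sketch that keeps the integer columns. It assumes the columns are stored as int64, which is what pandas infers for the fixed Python integers used below.

```python
import pandas as pd

# Fixed stand-in values (an assumption) for the random V_1 and V_2 columns.
df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
    "V_1": [31, 45, 52, 38, 59, 33, 47, 41, 56, 36],
    "V_2": [44, 61, 48, 69, 55, 40, 66, 52, 43, 58],
})

# Keep only the columns whose data type is int64.
integer_columns = df.select_dtypes(include=["int64"])

# The exclude parameter does the opposite and drops those columns.
other_columns = df.select_dtypes(exclude=["int64"])

print(integer_columns.columns.tolist())
print(other_columns.columns.tolist())
```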
19. replace function: – It is a pandas function used to replace existing values in a series or data frame.
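A minimal sketch on the Teams column. The replacement label "A" is a made-up value for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Teams": ["X", "X", "Y", "X", "Y", "Y", "Z", "X", "Z", "Z"],
    "year": list(range(2010, 2020)),
})

# Replace every "X" in the Teams column with the hypothetical label "A".
# A dictionary can map several old values to new ones in one call.
df["Teams"] = df["Teams"].replace({"X": "A"})

print(df["Teams"].tolist())
```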
20. read_csv() function: – It is a function to read a CSV or other delimited data file into a data frame for analysis. The location of the data file needs to be passed as a parameter to the function.
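A self-contained sketch of read_csv. To avoid depending on a real file, it reads a few made-up rows from an in-memory text buffer. With a real data set, the file path would be passed instead of the buffer. The item names and descriptions are hypothetical.

```python
import io

import pandas as pd

# Made-up CSV content standing in for a real data file.
csv_text = """item_name,choice_description
Veggie Wrap,Mild Salsa
Bottled Water,
Rice Bowl,Hot Salsa
Side Salad,
"""

# read_csv accepts a file path or a file-like object such as StringIO.
orders = pd.read_csv(io.StringIO(csv_text))

print(orders)
```

Empty fields, such as the missing descriptions above, are read in as null (NaN) values.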
21. contains function: – It is a function, available through the .str accessor of a series, for selecting values based on a string match.
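A small sketch of a string match. The orders frame and its item names are hypothetical.

```python
import pandas as pd

# Hypothetical order data used only for illustration.
orders = pd.DataFrame({
    "item_name": ["Veggie Wrap", "Bottled Water", "Rice Bowl", "Veggie Bowl"],
    "quantity": [1, 2, 1, 3],
})

# True for every item name that contains the text "Bowl".
is_bowl = orders["item_name"].str.contains("Bowl")

# Use the boolean series to keep only the matching rows.
bowls = orders[is_bowl]

print(bowls)
```

If the column has null values, passing na=False to contains treats them as non-matches.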
22. isnull() function: – It is used to find the null values in a data frame. In the data set used for this example, it shows that there are 1246 null values in the choice_description column.
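A sketch of counting null values per column. It uses a tiny hypothetical frame, so the counts are much smaller than the 1246 mentioned above.

```python
import numpy as np
import pandas as pd

# Hypothetical order data with two missing descriptions.
orders = pd.DataFrame({
    "item_name": ["Veggie Wrap", "Bottled Water", "Rice Bowl", "Side Salad"],
    "choice_description": ["Mild Salsa", np.nan, "Hot Salsa", np.nan],
})

# isnull() marks each cell as True or False, and sum() counts the True values.
null_counts = orders.isnull().sum()

print(null_counts)
```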
23. fillna() function: – It is a function to fill null values with a string or number. For example, we can fill the null values in the choice_description column with 'abc'.
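A minimal sketch of filling the nulls with 'abc', using the same kind of hypothetical frame.

```python
import numpy as np
import pandas as pd

# Hypothetical order data with two missing descriptions.
orders = pd.DataFrame({
    "item_name": ["Veggie Wrap", "Bottled Water", "Rice Bowl", "Side Salad"],
    "choice_description": ["Mild Salsa", np.nan, "Hot Salsa", np.nan],
})

# fillna returns a new series, so assign it back to the column.
orders["choice_description"] = orders["choice_description"].fillna("abc")

print(orders)
```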
These are some popular and important pandas functions that help a lot in data analysis. The list is not complete, but it is a good place to start. Please do let me know your thoughts. Thanks.