Changing Index in Pandas explained with examples


In a Pandas DataFrame, each row is identified by its Index, which is simply a label for the row. If we don’t specify index values when creating the DataFrame, Pandas uses the default values: the numbers 0 to n-1, where n is the number of rows.
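As a minimal sketch of that default behaviour, the snippet below builds a small DataFrame without specifying an index and inspects the result. The column names and values are made up for illustration.

```python
import pandas as pd

# A small DataFrame built without specifying an index (illustrative values).
df = pd.DataFrame({"name": ["Green", "Bright", "Mike"], "height": [5.9, 6.2, 5.6]})

# pandas assigns a RangeIndex that counts from 0 to n-1.
print(type(df.index).__name__)   # RangeIndex
print(list(df.index))            # [0, 1, 2]
```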

The DataFrame Index can be set from existing columns by using the set_index() function. It accepts one or more existing columns, or arrays of the correct length, and uses them as the DataFrame Index (row labels). The new Index can either replace the current Index or be appended to it. We will use set_index() to change the row labels of a DataFrame, whether they were created by default or set earlier.

This article covers a DataFrame’s indexes and how to add additional indexes to an existing DataFrame. We will also see how the set_index() function lets us replace the integer index that Pandas creates by default for each row. To do that, we first look at the set_index() function’s syntax and then use it to set the row index of a DataFrame from lists, Series, and columns.

How to set a Column as the Index in Pandas

Using the Pandas set_index method, we can convert one of the columns in the DataFrame into the Index. Let’s examine the syntax of the set_index() method to better understand how it works.

The syntax of DataFrame.set_index is as follows:


DataFrame.set_index(keys, drop=True, append=False, inplace=False, verify_integrity=False)

Explanation of the parameters mentioned in the syntax above:

keys

This parameter can be a single column key, a single array of the same length as the calling DataFrame, or a list containing any combination of column keys and arrays. Series, Index, np.ndarray, and Iterator instances all count as an “array” here.

Its type is a label, an array-like, or a list of labels/arrays. It has no default value, so this parameter is required.

drop

It is an optional boolean parameter whose default value is True. When True, the columns used as the new Index are deleted from the DataFrame’s columns.

append

It is an optional boolean parameter whose default value is False. When True, the new columns are appended to the existing Index instead of replacing it.


inplace

It is an optional boolean parameter with a default value of False. When True, the existing DataFrame is updated instead of a new object being created.

verify_integrity

It is an optional boolean parameter whose default value is False. When True, the new Index is checked for duplicates.
When False, the check is postponed until it is required. The method runs faster when the value is False.
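The following sketch illustrates how the drop, append, and verify_integrity parameters affect the result. The DataFrame and its "team" column are invented for this illustration and are not part of the article’s dataset.

```python
import pandas as pd

# Illustrative data; the "team" column deliberately contains a duplicate.
df = pd.DataFrame({
    "reg_id": ["EMP001", "EMP002", "EMP003"],
    "emp_name": ["Green", "Bright", "Mike"],
    "team": ["A", "A", "B"],
})

# drop=False keeps "reg_id" as a regular column as well as using it for the index.
kept = df.set_index("reg_id", drop=False)
print(list(kept.columns))       # ['reg_id', 'emp_name', 'team']

# append=True adds "reg_id" as an extra level next to the existing 0..n-1 index.
appended = df.set_index("reg_id", append=True)
print(appended.index.nlevels)   # 2

# verify_integrity=True raises a ValueError when the new index has duplicates.
duplicate_error = None
try:
    df.set_index("team", verify_integrity=True)
except ValueError as exc:
    duplicate_error = exc
    print("duplicate labels rejected:", exc)
```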

Make a dataframe first

# import necessary packages
import pandas as pd

# let's create a new dataframe
Employee = pd.DataFrame({
    'reg_id': ['EMP001', 'EMP002', 'EMP003', 'EMP004', 'EMP005'],
    'emp_id': ['22EMP1', '22EMP2', '22EMP3', '22EMP4', '22EMP5'],
    'emp_name': ['Green', 'Bright', 'Mike', 'Joy', 'Ann'],
    'height': [5.9, 6.2, 5.6, 5.8, 5.10]
})

# display dataframe
Employee

The set_index() method

The set_index method, which is part of Pandas, lets us define which values serve as the Index, and we use it to change the index values.

The syntax is as follows:


DataFrameName.set_index("column_name_to_set_as_index", inplace=True/False)

where the inplace parameter accepts True or False and determines whether the call modifies the DataFrame itself. With inplace=True, the existing DataFrame is updated and the method returns None. With inplace=False, which is the default, set_index() returns a new DataFrame with the new Index and leaves the original unchanged, so the result is lost unless you assign it to a variable.

# import necessary packages
import pandas as pd

# let's create a new dataframe
Employee = pd.DataFrame({
    'reg_id': ['EMP001', 'EMP002', 'EMP003', 'EMP004', 'EMP005'],
    'emp_id': ['22EMP1', '22EMP2', '22EMP3', '22EMP4', '22EMP5'],
    'emp_name': ['Green', 'Bright', 'Mike', 'Joy', 'Ann'],
    'height': [5.9, 6.2, 5.6, 5.8, 5.10]
})

# temporarily putting the registration id as the Index
Employee.set_index("reg_id")

However, because set_index() returned a new DataFrame and we did not assign it to anything, the original Employee DataFrame was not changed. If you display it by running the following command, it still appears as before.

print(Employee)

Because we didn’t specify the inplace parameter in the set_index method, it defaulted to False and the original DataFrame was left untouched. To change the Index of Employee itself, specify inplace=True in the set_index method.

# import necessary packages
import pandas as pd

# let's create a new dataframe
Employee = pd.DataFrame({
    'reg_id': ['EMP001', 'EMP002', 'EMP003', 'EMP004', 'EMP005'],
    'emp_id': ['22EMP1', '22EMP2', '22EMP3', '22EMP4', '22EMP5'],
    'emp_name': ['Green', 'Bright', 'Mike', 'Joy', 'Ann'],
    'height': [5.9, 6.2, 5.6, 5.8, 5.10]
})

# permanently putting the registration id as the Index
Employee.set_index("reg_id", inplace=True)
print(Employee)
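A common alternative to inplace=True is to assign the returned DataFrame to a variable. The sketch below, using a shortened, illustrative version of the employee data, contrasts the two approaches and shows what each call returns.

```python
import pandas as pd

# Shortened, illustrative version of the employee data.
employee = pd.DataFrame({
    "reg_id": ["EMP001", "EMP002"],
    "emp_name": ["Green", "Bright"],
})

# Without inplace, set_index returns a new DataFrame; keep it by assigning it.
indexed = employee.set_index("reg_id")
print(list(indexed.index))      # ['EMP001', 'EMP002']
print(list(employee.columns))   # ['reg_id', 'emp_name']  (original untouched)

# With inplace=True the call returns None and modifies the DataFrame itself.
result = employee.set_index("reg_id", inplace=True)
print(result)                   # None
print(employee.index.name)      # reg_id
```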

To display only specific columns rather than all of them, select them by name as in the code below.


# import necessary packages
import pandas as pd

# let's create a new dataframe
Employee = pd.DataFrame({
    'reg_id': ['EMP001', 'EMP002', 'EMP003', 'EMP004', 'EMP005'],
    'emp_id': ['22EMP1', '22EMP2', '22EMP3', '22EMP4', '22EMP5'],
    'emp_name': ['Green', 'Bright', 'Mike', 'Joy', 'Ann'],
    'height': [5.9, 6.2, 5.6, 5.8, 5.10]
})

# permanently putting the registration id as the Index
Employee.set_index("reg_id", inplace=True)

# displaying the necessary columns in a dataframe.
Employee[["emp_name", "height"]]

How to reset Index: reset_index()

Let’s say we wish to undo all we’ve done so far and stop using the Index provided by some particular columns in our dataframe. As demonstrated in the following examples, we can use the reset_index() command in this situation:

# if the column used as the index is no longer among the dataframe's columns,
# move the index back into a column
df.reset_index()
# if the dataframe still contains that column (set_index was called with drop=False),
# discard the index instead
df.reset_index(drop=True)

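In conclusion, the reset_index() command replaces the dataframe’s Index with the default integer Index and, by default, moves the old Index back into a column. If that column is already present in the dataframe, pass True for the reset_index() command’s “drop” parameter so that the old Index is discarded instead.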
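Here is a minimal runnable sketch of both cases, using a small illustrative DataFrame rather than the article’s full dataset.

```python
import pandas as pd

# Small illustrative DataFrame.
df = pd.DataFrame({
    "reg_id": ["EMP001", "EMP002"],
    "emp_name": ["Green", "Bright"],
})

# Case 1: set_index removed "reg_id" from the columns, so reset_index() restores it.
moved = df.set_index("reg_id")
restored = moved.reset_index()
print(list(restored.columns))    # ['reg_id', 'emp_name']

# Case 2: drop=False kept "reg_id" as a column, so the index copy can be discarded.
kept = df.set_index("reg_id", drop=False)
discarded = kept.reset_index(drop=True)
print(list(discarded.columns))   # ['reg_id', 'emp_name']
print(list(discarded.index))     # [0, 1]
```

In the second case, calling reset_index() without drop=True would fail because a "reg_id" column already exists.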

Example: How to set the month column as the Index

import numpy as np
import pandas as pd

month_df = pd.DataFrame({
    'month': [4, 7, 10, 12],
    'year': [2019, 2021, 2020, 2021],
    'no_of_sales': [73, 58, 103, 49]
})
print(month_df)

The intention is to set the ‘month’ column of the DataFrame above as the Index.

month_df.set_index('month')

Alternatively, you can create a MultiIndex by combining an Index object with the “year” column, as in the following command:

month_df.set_index([pd.Index([2, 3, 4, 5]), 'year'])

The other option is using two Series to create a MultiIndex:


n_series = pd.Series([2, 3, 4, 5])
month_df.set_index([n_series, n_series**2])
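A MultiIndex can also be built directly from two existing columns by passing their names in a list. The sketch below applies this to the same sales data and looks up one row by its (year, month) pair.

```python
import pandas as pd

month_df = pd.DataFrame({
    "month": [4, 7, 10, 12],
    "year": [2019, 2021, 2020, 2021],
    "no_of_sales": [73, 58, 103, 49],
})

# Both columns become levels of the index and leave the regular columns.
multi = month_df.set_index(["year", "month"])
print(list(multi.index.names))   # ['year', 'month']
print(list(multi.columns))       # ['no_of_sales']

# A row is addressed with one label per level.
print(multi.loc[(2021, 7), "no_of_sales"])   # 58
```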

Example: Setting the DataFrame’s Index using Python’s range()

Let’s imagine that we want the DataFrame’s Index to start at a number other than 0, so we need to define our own set of numbers as the Index. For instance, we want the employee DataFrame’s row numbers to begin at 1. The default Index cannot be used for this. Instead of typing out a list of all the numbers as input to the set_index() function, we can use the Python range() function. From range() we can generate a Pandas Index and pass it to DataFrame.set_index(). Let’s create a DataFrame and then use the range() function to change the row index.

import pandas as pd

emp_df = pd.DataFrame({
    "f_name": ["Tom", "White", "Mike", "Green", "Tyson"],
    "position": [3, 5, 6, 2, 4],
    "commission": [1500, 1300, 1300, 1699, 1400],
    "net_pay": [3200, 3600, 3000, 3200, 3300]
})
print(emp_df)

We used the columns “f_name”, “position”, “commission”, and “net_pay” when we created our DataFrame. Let’s replace the default integer index with one built using the range() function. By default, range() produces a sequence of numbers that begins at 0, grows by 1, and stops just before a given number.

index_val = pd.Index(range(1, 6, 1))
emp_df = emp_df.set_index(index_val)
print(emp_df)

We defined the index range as beginning at 1, increasing by 1, and ending before 6. After determining the index range, we used the set_index() function to set the row index of our DataFrame by passing the index_val variable as input.
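As an alternative sketch, the index can also be assigned directly through the DataFrame’s index attribute, with the stop value derived from the DataFrame’s length so it does not have to be hard-coded. This is a different technique from the set_index() call above, shown here on a shortened, illustrative DataFrame.

```python
import pandas as pd

# Shortened, illustrative version of the employee data.
emp_df = pd.DataFrame({
    "f_name": ["Tom", "White", "Mike"],
    "net_pay": [3200, 3600, 3000],
})

# Assign a 1-based RangeIndex sized to the DataFrame.
emp_df.index = pd.RangeIndex(start=1, stop=len(emp_df) + 1)
print(list(emp_df.index))   # [1, 2, 3]
```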

Example: Using Multiple Columns to Set the DataFrame’s Index

In Python Pandas, a multi-index DataFrame is one that uses more than one level of labels as its index. We can designate several columns as row labels by using the DataFrame.set_index() function. Keep in mind that adding additional index levels makes the DataFrame more complex to work with.
There are various ways to structure the Index. We’ll demonstrate a straightforward way of setting multiple columns as the index. Let’s start by making a DataFrame.

import pandas as pd

emp_df = pd.DataFrame({
    "id": ["emp_1", "emp_2", "emp_3", "emp_4", "emp_5"],
    "name": ["Mikeson", "Jonathan", "White", "Bright", "Nathan"],
    "department": ["operations", "human resource", "sales", "marketing", "information technology"],
    "dep_code": ["OP", "HR", "S", "M", "IT"]
})

Our DataFrame consists of four columns – “id”, “name”, “department”, and “dep_code”.


To view this in an organized manner, run the following command.

print(emp_df)

From these columns, we select the ones that should serve as the DataFrame’s indexes. After selecting the appropriate columns, we pass a list with their two labels to the set_index() function.

emp_df = emp_df.set_index(['id', 'dep_code'])
print(emp_df)

The columns “id” and “dep_code” are now the DataFrame’s row indexes. We assigned them by placing the column names in a list and passing that list, [“id”, “dep_code”], to set_index(). As you will see in the output, “id” and “dep_code” form the two levels of the new Index, while “name” and “department” remain regular columns.
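To check which columns ended up in the Index, you can inspect the index names and the remaining columns. The sketch below does this on a two-row, illustrative version of the DataFrame and also looks up a single row by its pair of labels.

```python
import pandas as pd

# Two-row, illustrative version of the employee DataFrame.
emp_df = pd.DataFrame({
    "id": ["emp_1", "emp_2"],
    "name": ["Mikeson", "Jonathan"],
    "department": ["operations", "human resource"],
    "dep_code": ["OP", "HR"],
}).set_index(["id", "dep_code"])

print(list(emp_df.index.names))   # ['id', 'dep_code']
print(list(emp_df.columns))       # ['name', 'department']

# A row is addressed by a tuple holding one label per index level.
print(emp_df.loc[("emp_2", "HR"), "name"])   # Jonathan
```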

Conclusion

Pandas automatically provides an Index when we construct a dataframe or import a dataset. In this article, we’ve seen how to set the Pandas DataFrame’s Index using either a list of labels or existing columns. We’ve also covered the common scenarios in which new row labels must be assigned or current ones modified.

DataFrame is the name of the tabular structure in the Pandas package. Labels are used to represent each row and column. A column label is a column name or header, whereas an index is a row label. When creating a DataFrame, Pandas by default designates a range of numbers (starting at 0) as the index for rows. The row index is used to identify each row.


Source: https://www.codeunderscored.com/changing-index-in-pandas-explained-with-examples/
