What is a Pandas DataFrame?
A Pandas DataFrame is a two-dimensional, labeled data structure, similar to a table in a spreadsheet or SQL database. It consists of rows and columns, where:
- Rows are observations or data points.
- Columns are variables or attributes associated with those observations.
DataFrames are flexible: each column can hold a different data type (integers, floats, strings, etc.), which makes them well suited to data manipulation and analysis.
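To make the point about mixed types concrete, here is a minimal sketch. The Height column and its values are hypothetical and only serve to show a float column next to string and integer columns.

```python
import pandas as pd

# Hypothetical sample data: three columns with three different value types
df = pd.DataFrame({
    'Name': ['chandra', 'kumar'],   # strings
    'Age': [25, 30],                # integers
    'Height': [1.72, 1.80],         # floats (made-up values for illustration)
})

# Each column carries its own dtype
print(df.dtypes)
```

The exact dtype names printed can vary with the Pandas version, but the integer and float columns are stored as numeric types, so arithmetic on them works column by column.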
Creating a Pandas DataFrame
You can create a Pandas DataFrame in several ways:
- From a dictionary
- From a list of lists or arrays
- From a CSV file
Create Pandas DataFrame From a Python Dictionary:
import pandas as pd
# create Pandas DataFrame From a Python Dictionary:
data = {
    'Name': ['chandra', 'kumar', 'dailyaspirants.com'],
    'Age': [25, 30, 5],
    'City': ['chennai', 'vellore', 'Mumbai']
}
df = pd.DataFrame(data)
print(df)
Output:
                 Name  Age     City
0             chandra   25  chennai
1               kumar   30  vellore
2  dailyaspirants.com    5   Mumbai
This dictionary, data, contains three keys: 'Name', 'Age', and 'City'. Each key has a list of values representing data for each person.
The pd.DataFrame(data) constructor converts the dictionary into a Pandas DataFrame. Each key becomes a column name, and the item at position i in each list becomes the value for row i in that column.
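As a quick check of how keys and list items map onto the result, the sketch below inspects the DataFrame built from the same sample dictionary. It assumes every list has the same length, which the constructor requires when given plain lists.

```python
import pandas as pd

data = {
    'Name': ['chandra', 'kumar', 'dailyaspirants.com'],
    'Age': [25, 30, 5],
    'City': ['chennai', 'vellore', 'Mumbai'],
}
df = pd.DataFrame(data)

print(list(df.columns))  # column names come from the dictionary keys
print(df.shape)          # (number of rows, number of columns)
print(list(df.index))    # default row labels start at 0
```

With three keys and three items per list, this gives three columns and three rows labeled 0, 1, and 2.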
Accessing Data in a DataFrame
Accessing data in a DataFrame is straightforward:
- Use df['column_name'] to access a column.
- Use df.iloc[row_index] or df.loc[row_label] to access rows.
import pandas as pd
# create Pandas DataFrame From a Python Dictionary:
data = {
    'Name': ['chandra', 'kumar', 'dailyaspirants.com'],
    'Age': [25, 30, 5],
    'City': ['chennai', 'vellore', 'Mumbai']
}
df = pd.DataFrame(data)
# Accessing the 'Name' column
names = df['Name']
print(names)
# Accessing the first row
first_row = df.iloc[0]
print(first_row)
Output:
0               chandra
1                 kumar
2    dailyaspirants.com
Name: Name, dtype: object
Name    chandra
Age          25
City    chennai
Name: 0, dtype: object
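The difference between iloc and loc is easier to see when the row labels are not the default integers. The sketch below uses the names as the index, which is an assumption made only for this illustration; the examples in this article keep the default 0, 1, 2 labels.

```python
import pandas as pd

df = pd.DataFrame(
    {'Age': [25, 30, 5], 'City': ['chennai', 'vellore', 'Mumbai']},
    index=['chandra', 'kumar', 'dailyaspirants.com'],  # row labels chosen for this sketch
)

by_position = df.iloc[1]     # second row, selected by integer position
by_label = df.loc['kumar']   # the same row, selected by its label
print(by_position.equals(by_label))

# loc also accepts a column label to pick out a single value
print(df.loc['kumar', 'City'])
```

In short, iloc counts positions from 0 regardless of the labels, while loc looks up whatever labels the index actually holds.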
Modifying a DataFrame
You can easily modify the content of a DataFrame:
- Adding a column:
df['New_Column'] = [values]
- Dropping a column:
df.drop('column_name', axis=1, inplace=True)
- Updating values:
df.at[row_index, 'column_name'] = new_value
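Applied to the sample DataFrame used throughout this article, the three operations might look like the sketch below. The Salary values and the new city are the ones that appear in the article's own later example.

```python
import pandas as pd

data = {
    'Name': ['chandra', 'kumar', 'dailyaspirants.com'],
    'Age': [25, 30, 5],
    'City': ['chennai', 'vellore', 'Mumbai']
}
df = pd.DataFrame(data)

df['Salary'] = [50000, 60000, 70000]   # add a column (one value per row)
df.at[1, 'City'] = 'San Francisco'     # update one cell: row label 1, column 'City'
df.drop('Age', axis=1, inplace=True)   # drop the 'Age' column in place
print(df)
```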
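Output (after adding a Salary column, setting the City in row 1 to 'San Francisco', and dropping the Age column):
                 Name           City  Salary
0             chandra        chennai   50000
1               kumar  San Francisco   60000
2  dailyaspirants.com         Mumbai   70000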
Pandas DataFrame From a Python List
You can also create a DataFrame from a two-dimensional Python list, where each inner list holds the values for one row. For example:
import pandas as pd
# Create a Pandas DataFrame from a two-dimensional Python list:
# each inner list is one row, in the order Name, Age, City
data = [
    ['chandra', 25, 'chennai'],
    ['kumar', 30, 'vellore'],
    ['dailyaspirants.com', 5, 'Mumbai']
]
# create a DataFrame from the list
df = pd.DataFrame(data, columns=['Name', 'Age', 'City'])
print(df)
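The same constructor also accepts a two-dimensional NumPy array, which covers the "arrays" case mentioned earlier. A minimal sketch, reusing the article's age and salary figures as numeric sample data:

```python
import numpy as np
import pandas as pd

# 3 rows, 2 columns of numeric sample data
values = np.array([[25, 50000], [30, 60000], [5, 70000]])

# Column names are supplied separately, just as with a list of lists
df = pd.DataFrame(values, columns=['Age', 'Salary'])
print(df)
```

A NumPy array has a single dtype, so this form suits purely numeric data; for mixed strings and numbers, a dictionary or list of lists is usually the simpler choice.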
Filtering Data in a DataFrame
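Filtering is one of the most useful operations when analyzing data. Put a boolean condition inside the square brackets, and Pandas returns only the rows where that condition is True. The example below first adds a Salary column, updates a city, and drops the Age column, then filters on Salary.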
import pandas as pd
# create Pandas DataFrame From a Python Dictionary:
data = {
    'Name': ['chandra', 'kumar', 'dailyaspirants.com'],
    'Age': [25, 30, 5],
    'City': ['chennai', 'vellore', 'Mumbai']
}
df = pd.DataFrame(data)
# Adding a new column
df['Salary'] = [50000, 60000, 70000]
# Updating a value
df.at[1, 'City'] = 'San Francisco'
# Dropping the 'Age' column
df.drop('Age', axis=1, inplace=True)
# Filtering rows where Salary is greater than 55000
high_salary = df[df['Salary'] > 55000]
print(high_salary)
Output:
                 Name           City  Salary
1               kumar  San Francisco   60000
2  dailyaspirants.com         Mumbai   70000
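Under the hood, the condition produces a boolean Series with one True or False per row, and several conditions can be combined. The sketch below rebuilds the modified DataFrame directly and then filters on two conditions at once.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['chandra', 'kumar', 'dailyaspirants.com'],
    'City': ['chennai', 'San Francisco', 'Mumbai'],
    'Salary': [50000, 60000, 70000],
})

mask = df['Salary'] > 55000       # boolean Series, one value per row
print(mask.tolist())              # [False, True, True]

# Combine conditions with & (and) or | (or); wrap each condition in parentheses
result = df[(df['Salary'] > 55000) & (df['City'] == 'Mumbai')]
print(result['Name'].tolist())    # ['dailyaspirants.com']
```

The parentheses matter because & and | bind more tightly than comparison operators in Python.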
Importing and Exporting Data with DataFrames
Pandas makes it simple to import and export data in various formats, such as CSV, Excel, JSON, and more.
- Reading a CSV file:
pd.read_csv('file.csv')
- Writing to a CSV file:
df.to_csv('file.csv', index=False)
import pandas as pd
# Reading data from a CSV file (assumes data.csv exists in the working directory)
df = pd.read_csv('data.csv')
# Writing the DataFrame to a new CSV file
df.to_csv('new_data.csv', index=False)
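Since the snippet above depends on a data.csv file being present, here is a self-contained round trip that can be run anywhere. It writes to a temporary directory, which is a choice made for this sketch so that no files are left behind.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({
    'Name': ['chandra', 'kumar'],
    'Age': [25, 30],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'new_data.csv')
    df.to_csv(path, index=False)   # index=False leaves the row labels out of the file
    loaded = pd.read_csv(path)     # read the file back into a new DataFrame

print(loaded)
```

Without index=False, the row labels are written as an extra unnamed column, which then shows up as a column when the file is read back.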