Pandas Missing Values Handling – 30 Exercises

 #  Pandas Missing Values Handling – 30 Exercises


---

### Dataset 1: Retail Sales (CSV with Date, Product, Sales, Discount)

1. Load dataset and count total missing values in each column.
 df.isnull().sum()

2. Find rows where Sales is missing.
 df[df['Sales'].isna()]

3. Drop all rows where Product is missing.
 df.dropna(subset=['Product'])

4. Replace missing Discount values with 0.
 df['Discount'].fillna(0)

5. Fill missing Sales values with the mean of Sales column.
 df['Sales'].fillna(df['Sales'].mean())

6. Forward-fill (ffill) missing Sales based on previous day’s value.
 df['Sales'].fillna(method='ffill')

7. Backward-fill (bfill) missing Sales.
 df['Sales'].fillna(method='bfill')

8. Replace missing Product names with "Unknown Product".
 df['Product'].fillna("Unknown Product")

9. Interpolate missing Sales values (linear interpolation).
 df['Sales'].interpolate()

10. Drop rows where both Sales and Discount are missing.
 df.dropna(subset=['Sales','Discount'], how='all')

---

### Dataset 2: Healthcare Patient Records (PatientID, Age, Weight, BloodPressure)

11. Count missing values per patient.
 df.isna().sum(axis=1)

12. Remove patients with more than 2 missing fields.
 df.dropna(thresh=df.shape[1]-2)

13. Fill missing Age values with the median age.
 df['Age'].fillna(df['Age'].median())

14. Fill missing Weight with the most frequent value (mode).
 df['Weight'].fillna(df['Weight'].mode()[0])

15. Interpolate missing BloodPressure using time-based interpolation.
 df['BloodPressure'].interpolate(method='time')

---

### Dataset 3: Finance Transactions (Date, CustomerID, Amount, PaymentMethod)

16. Find all rows where PaymentMethod is missing. Replace with "Cash".

17. Fill missing Amount with rounded mean.
 df['Amount'].fillna(round(df['Amount'].mean(), 2))

18. Drop all customers who have no Amount values at all.
 df.dropna(subset=['Amount'], how='all')

19. Check if any entire column is missing. Drop such columns.
 df.dropna(axis=1, how='all')

20. Replace missing PaymentMethod by using forward fill grouped by CustomerID.
 df.groupby('CustomerID')['PaymentMethod'].ffill()

---

### Dataset 4: IoT Sensor Data (Timestamp, Temperature, Humidity, Pressure)

21. Find percentage of missing values for each sensor column.
 (df.isna().sum()/len(df))*100

22. Replace missing Temperature values with the rolling mean of last 3 readings.
 df['Temperature'].fillna(df['Temperature'].rolling(3).mean())

23. Forward-fill missing values but limit to 2 consecutive fills.
 df['Humidity'].fillna(method='ffill', limit=2)

24. Fill missing Pressure values with interpolation using "spline" method.
 df['Pressure'].interpolate(method='spline', order=2)

25. Detect if there are any missing values at start/end of dataset.

---

### Dataset 5: HR Employee Records (EmpID, Name, Department, Salary, JoinDate)

26. Replace missing Department with "Not Assigned".

27. Fill missing Salary with median grouped by Department.
 df.groupby('Department')['Salary'].transform(lambda x: x.fillna(x.median()))

28. Drop employees who have both Name and Department missing.

29. Convert missing JoinDate to today’s date.
 df['JoinDate'].fillna(pd.to_datetime('today'))

30. Create a report of employees with missing fields (count per employee).
 df[df.isna().any(axis=1)]

---

Data Files
Retail Sales

0 Comments