Theil Index

Overview

The Theil Index is an entropy-based measure of inequality. It summarizes how unevenly a variable is distributed across groups or subpopulations, and it can be decomposed to show how much each group contributes to the overall inequality within a population. It is commonly used in economics and the social sciences.

Calculation

The Theil Index can be calculated using the following formula:

Theil Index = (1/n) * Σ(xi / X) * ln(xi / X)

Where:

  • xi: The value of the variable for a specific group or subpopulation
  • X: The average value of the variable across all groups or the total population
  • n: The number of groups or subpopulations

Each group or subpopulation contributes one term, (xi / X) * ln(xi / X). The terms are summed and divided by n to obtain the overall measure of inequality. A group with xi = 0 contributes 0, following the convention 0 * ln(0) = 0.
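The formula can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, not the library's theil_index implementation: the helper name theil_t_sketch and the sample values are hypothetical, and the sketch assumes a non-empty list of non-negative group values with a positive mean.

```python
import math


def theil_t_sketch(values):
    """Theil index of a list of non-negative group values (illustrative helper)."""
    n = len(values)
    if n == 0:
        raise ValueError("values must not be empty")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    mean = sum(values) / n  # X in the formula
    if mean == 0:
        raise ValueError("mean of values must be positive")

    total = 0.0
    for x in values:
        if x > 0:  # a zero value contributes 0 (convention 0 * ln(0) = 0)
            ratio = x / mean
            total += ratio * math.log(ratio)
    return total / n


# Hypothetical group values, for illustration only
print("Theil Index:", theil_t_sketch([10, 20, 30, 40]))
```

When every group has the same value, each ratio is 1 and each term is ln(1) = 0, so the sketch returns 0.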

Usage

Manually

# Calculate Theil index
result = theil_index(df, protected_attribute, privileged_group, scores)

print("Theil Index:", result)

Using Fairness Object

result = fo.compute(theil_index)

Results

Theil Index: 0.17756158058046131

These results were obtained using the input data from the Create Example Data page under Getting Started.

Interpretation

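The Theil Index can be decomposed into a within-group component and a between-group component, which makes it useful for attributing inequality to specific groups. Under the formula given in the Calculation section, it ranges from 0 to ln(n): a value of 0 indicates perfect equality (all groups or subpopulations have the same value), and ln(n) is reached when a single group holds the entire total. Because ln(n) grows with n, the upper bound is not fixed, and values from populations with different numbers of groups should be compared with care.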
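The split into within-group and between-group inequality can be sketched as follows, assuming individual-level values are available for each group. The names theil_t_sketch and theil_decomposition_sketch, and the sample data, are hypothetical and are not part of the library documented here. Each group is weighted by its share of the total, and all values are assumed to be positive.

```python
import math


def theil_t_sketch(values):
    """Theil index of a non-empty list of positive values (illustrative helper)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x / mean) * math.log(x / mean) for x in values if x > 0) / n


def theil_decomposition_sketch(groups):
    """Return (within, between) for a dict mapping group label -> list of values."""
    all_values = [x for vals in groups.values() for x in vals]
    total_sum = sum(all_values)
    overall_mean = total_sum / len(all_values)

    within = 0.0
    between = 0.0
    for vals in groups.values():
        share = sum(vals) / total_sum        # group's share of the total
        group_mean = sum(vals) / len(vals)
        within += share * theil_t_sketch(vals)                   # inequality inside the group
        between += share * math.log(group_mean / overall_mean)   # gap between group means
    return within, between


# Hypothetical data, for illustration only
groups = {"a": [10, 20, 30], "b": [40, 60]}
within, between = theil_decomposition_sketch(groups)
overall = theil_t_sketch([x for vals in groups.values() for x in vals])
print("within:", within, "between:", between, "overall:", overall)
```

The two components add up to the index computed over all individuals, so a large between-group share points to disparity between groups rather than spread inside them.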

The interpretation of the Theil Index depends on the context and the variable being measured. It can be used to analyze various dimensions of inequality, such as income, wealth, or education, across different groups or subpopulations. It helps identify which groups contribute the most to overall inequality and the extent of segregation between groups.

A higher value of the Theil Index suggests that certain groups or subpopulations disproportionately contribute to the overall inequality, indicating a greater level of segregation or disparity. On the other hand, a lower value indicates a more equal distribution of the variable across groups.