Gini Coefficient

Overview

The Gini Coefficient measures how unequally a variable or attribute is distributed across a population. It summarizes the degree of inequality in the distribution as a single scalar value.

Calculation

The Gini Coefficient can be calculated using the following steps:

  1. Sort the values of the variable in ascending order.
  2. Calculate the cumulative proportion of the variable at each data point.
  3. Calculate the Lorenz curve, which represents the cumulative proportion of the variable against the cumulative proportion of the population.
  4. Calculate the area between the Lorenz curve and the line of perfect equality (the diagonal line from the origin to the top right corner).
  5. Divide that area by the total area under the line of perfect equality, which is 0.5.
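
The steps above can be sketched directly in code. This is a minimal illustration, not the implementation used elsewhere on this page: the function name `gini_from_lorenz` is hypothetical, the Lorenz curve is assumed to be piecewise linear between data points, and the values are assumed to be non-negative with a positive sum. It uses only the standard library so it runs without a DataFrame.

```python
def gini_from_lorenz(values):
    """Gini coefficient from the area under a piecewise-linear Lorenz curve.

    Hypothetical helper for illustration; assumes non-negative values
    whose sum is positive.
    """
    # Step 1: sort the values in ascending order.
    x = sorted(float(v) for v in values)
    n = len(x)
    total = sum(x)

    # Steps 2-3: cumulative share of the variable at each data point,
    # starting from the origin (0, 0). The matching population shares
    # are 0, 1/n, 2/n, ..., 1.
    lorenz = [0.0]
    running = 0.0
    for v in x:
        running += v
        lorenz.append(running / total)

    # Step 4: trapezoidal area under the Lorenz curve. The area between
    # the curve and the diagonal is 0.5 minus this value.
    width = 1.0 / n
    area_under_lorenz = sum(
        width * (a + b) / 2.0 for a, b in zip(lorenz, lorenz[1:])
    )

    # Step 5: divide by 0.5, the area under the line of perfect equality.
    return (0.5 - area_under_lorenz) / 0.5


print(gini_from_lorenz([1, 2, 3, 4]))
```

For the four values 1, 2, 3, 4 the cumulative shares are 0.1, 0.3, 0.6 and 1.0, the area under the Lorenz curve is 0.375, and the result is approximately 0.25.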

Example

Calculation

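import numpy as np


def gini_coefficient(df, scores, **kwargs):
    """
    Calculate the Gini coefficient of the values in a score column.

    Args:
        df (pandas.DataFrame): The DataFrame containing the data.
        scores (str): The name of the column holding the scores.
        **kwargs: Additional keyword arguments; currently unused.

    Returns:
        float: The Gini coefficient of the selected column.

    Raises:
        ValueError: If scores is empty or is not a column of df.
    """
    if len(scores) == 0:
        raise ValueError('scores was not provided')
    if scores not in df.columns:
        raise ValueError(f'column {scores!r} is not present in df')

    def gini(x):
        # Sum |x_i - x_j| over every pair i < j, then normalise by
        # n**2 * mean(x).
        total = 0
        for i, xi in enumerate(x[:-1], 1):
            total += np.sum(np.abs(xi - x[i:]))
        return total / (len(x)**2 * np.mean(x))

    # Convert to a NumPy array so that slicing is purely positional.
    return gini(df[scores].to_numpy())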
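
The nested `gini` helper above appears to follow the mean-absolute-difference form of the coefficient rather than computing the Lorenz-curve area explicitly. Written out, with $n$ the number of values and $\bar{x}$ their mean (notation introduced here, not taken from the code), the sum over all ordered pairs counts each unordered pair twice, which is why the code can loop over $i < j$ only and drop the factor of 2:

```latex
G \;=\; \frac{\sum_{i=1}^{n} \sum_{j=1}^{n} \lvert x_i - x_j \rvert}{2\, n^{2}\, \bar{x}}
  \;=\; \frac{\sum_{i<j} \lvert x_i - x_j \rvert}{n^{2}\, \bar{x}}
```

For discrete data this gives the same value as the area-based steps listed under Calculation when the Lorenz curve is drawn as straight segments between data points.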

Usage

gini_co = gini_coefficient(df, 'scores')
print(gini_co)

Output

0.32323232323232337

Interpretation

The Gini Coefficient measures inequality within a distribution. For non-negative values it ranges from 0 to 1: a lower value indicates a more equitable distribution, while a higher value indicates greater inequality.

In the context of income distribution, for example, a Gini Coefficient close to 0 means income is spread almost evenly among individuals, while a coefficient close to 1 means most of the income is concentrated among a few individuals.
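
To make the two ends of the scale concrete, the sketch below evaluates three small, made-up distributions. The function `gini` and the sample lists are illustrative assumptions, not data from this page; the function uses the same pairwise formula as the example above but takes a plain list.

```python
def gini(values):
    """Pairwise form of the Gini coefficient (hypothetical helper)."""
    x = [float(v) for v in values]
    n = len(x)
    mean = sum(x) / n
    # Sum of |a - b| over every unordered pair of values.
    pair_sum = sum(abs(a - b) for i, a in enumerate(x) for b in x[i + 1:])
    return pair_sum / (n ** 2 * mean)


equal = gini([10, 10, 10, 10])       # everyone holds the same amount
moderate = gini([1, 2, 3, 4])        # evenly spaced amounts
concentrated = gini([0, 0, 0, 40])   # one individual holds everything

print(equal, moderate, concentrated)  # prints: 0.0 0.25 0.75
```

With a finite population of n individuals, the most unequal case yields (n - 1) / n, here 0.75 for n = 4, so the coefficient only approaches 1 as the population grows.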