Gini Coefficient
Overview
The Gini Coefficient is a measure of inequality often used to assess the distribution of a certain variable or attribute across a population. It provides a single scalar value that quantifies the degree of inequality in a distribution.
Calculation
The Gini Coefficient can be calculated using the following steps:
- Sort the values of the variable in ascending order.
- Calculate the cumulative proportion of the variable at each data point.
- Calculate the Lorenz curve, which represents the cumulative proportion of the variable against the cumulative proportion of the population.
- Calculate the area between the Lorenz curve and the line of perfect equality (the diagonal line from the origin to the top right corner).
- Divide the area between the Lorenz curve and the line of perfect equality by the total area under the line of perfect equality.
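The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the library's implementation: the helper names `lorenz_curve` and `gini_from_lorenz` are hypothetical, the input is assumed to be non-negative with a positive total, and the area under the Lorenz curve is approximated with the trapezoidal rule.

```python
import numpy as np


def lorenz_curve(values):
    """Return (population_share, variable_share) points of the Lorenz curve.

    Hypothetical helper; assumes non-negative values with a positive sum.
    """
    x = np.sort(np.asarray(values, dtype=float))  # sort ascending
    n = x.size
    # Cumulative share of the variable, with the origin (0, 0) prepended.
    variable_share = np.concatenate(([0.0], np.cumsum(x) / x.sum()))
    # Cumulative share of the population: 0, 1/n, 2/n, ..., 1.
    population_share = np.arange(n + 1) / n
    return population_share, variable_share


def gini_from_lorenz(values):
    """Gini coefficient as (area between curves) / (area under equality line)."""
    p, l = lorenz_curve(values)
    # Trapezoidal rule for the area under the piecewise-linear Lorenz curve.
    area_under_lorenz = np.sum((l[1:] + l[:-1]) * np.diff(p)) / 2.0
    area_under_equality = 0.5  # triangle under the diagonal of the unit square
    return (area_under_equality - area_under_lorenz) / area_under_equality


print(gini_from_lorenz([1, 2, 3, 4]))
```

For the values 1, 2, 3, 4 the area under the Lorenz curve is 0.375, so the result is (0.5 - 0.375) / 0.5 = 0.25.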
Example
Calculation
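import numpy as np

def gini_coefficient(df, scores, **kwargs):
    """
    Calculate the Gini coefficient for the values in the given scores column.

    Args:
        df (pandas.DataFrame): The DataFrame containing the data.
        scores (str): The name of the selected column for scores.

    Returns:
        float: The Gini coefficient.

    Raises:
        ValueError: If scores is empty or the column is not present in df.
    """
    if len(scores) == 0:
        raise ValueError('scores was not provided')
    if scores not in df.columns:
        raise ValueError(f"column '{scores}' is not present in df")

    def gini(x):
        # Sum |x_i - x_j| once per unordered pair (i < j).
        total = 0
        for i, xi in enumerate(x[:-1], 1):
            total += np.sum(np.abs(xi - x[i:]))
        # Each pair is counted once, so divide by n^2 * mean rather than 2 * n^2 * mean.
        return total / (len(x)**2 * np.mean(x))

    # Use a plain NumPy array so slicing is positional regardless of the index.
    return gini(df[scores].to_numpy())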
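The function above does not trace the Lorenz curve explicitly. It appears to rely on the equivalent mean-absolute-difference form of the Gini Coefficient, written out below. The symbols are notation introduced here for illustration: $n$ is the number of values, $x_i$ is the $i$-th value, and $\bar{x}$ is their mean.

```latex
G \;=\; \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} \lvert x_i - x_j \rvert}{2\,n^{2}\,\bar{x}}
  \;=\; \frac{\sum_{i<j} \lvert x_i - x_j \rvert}{n^{2}\,\bar{x}}
```

The second form counts each pair once, which is why the code divides by $n^{2}\,\bar{x}$ without the factor of 2.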
Usage
gini_co = gini_coefficient(df, 'scores')
print(gini_co)
Output
0.32323232323232337
Interpretation
A Gini Coefficient of 0 corresponds to perfect equality, where every member of the population has the same value. Higher values indicate greater inequality; for non-negative data, the coefficient approaches 1 as a single member comes to hold nearly all of the total.
In the context of income distribution, for example, a Gini Coefficient closer to 0 implies a more equal distribution of income among individuals, while a coefficient closer to 1 suggests a highly unequal distribution.
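The two extremes can be checked numerically. The sketch below uses a hypothetical helper, `gini_pairwise`, based on the mean absolute difference between all pairs of values. It is not the function from the Example section, and the income figures are made up for illustration.

```python
import numpy as np


def gini_pairwise(values):
    """Gini coefficient from the mean absolute difference (hypothetical helper)."""
    x = np.asarray(values, dtype=float)
    # |x_i - x_j| for every ordered pair (i, j), built with broadcasting.
    diffs = np.abs(x[:, None] - x[None, :])
    # Ordered pairs count each difference twice, hence the factor of 2.
    return diffs.sum() / (2 * x.size**2 * x.mean())


equal = gini_pairwise([10, 10, 10, 10])      # everyone holds the same amount
concentrated = gini_pairwise([0, 0, 0, 40])  # one individual holds everything

print(equal)         # 0.0
print(concentrated)  # 0.75
```

With only four individuals the maximum is (n - 1) / n = 0.75, not 1. The upper bound moves toward 1 as the population grows.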