AI Testing & Metrics

Creating a Test Case

Create a test case for a given metric id and parameter ids.

# create test case
payload = {
    "name": "accuracy test case",
    "description": "model must have above minimum accuracy",
    "steps": "compute accuracy and compare",
    "category": test_case_category_id,
    "associated_metric": metric_id,
    "required_parameters": parameter_ids_list,
    "operator": ">",
    "test_value": ".5"
}
url = base_api_url + "/test_cases"
response = requests.request("POST", url, headers=headers, data=json.dumps(payload))
test_case_id = json.loads(response.text)['id']

This example test case requires that the model's accuracy be greater than 0.5, a rather underwhelming benchmark. Note the inclusion of test_case_category_id, metric_id, and parameter_ids_list from above.
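
Since indexing the response for 'id' will raise a KeyError if the request fails, it can help to check the response before extracting the id. This is a minimal sketch, assuming only that the endpoint returns the created object as JSON, as used above:

# check the response before extracting the test case id
if not response.ok:
    raise RuntimeError(f"test case creation failed: {response.status_code} {response.text}")
test_case_id = response.json()['id']  # same id extracted above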

Creating a Test Run

Lastly, we can create a test run, which takes a list of test cases to perform. In this example only one test case is provided: the one created above.

# create test run
payload = {
    "name": "accuracy test run",
    "description": "running accuracy test",
    "required_test_cases": [test_case_id]
}
url = base_api_url + "/test_runs"
response = requests.request("POST", url, headers=headers, data=json.dumps(payload))

Again, note the use of test_case_id.
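
If you want to reference the run later, you can capture its id the same way as for the test case. This is a sketch that assumes the /test_runs endpoint returns the created object's id in the same shape as /test_cases:

# capture the new test run's id (assumes the response mirrors /test_cases)
test_run_id = json.loads(response.text)['id']
print(f"created test run {test_run_id}")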

Calling Many Metrics

We can call many metrics by looping over the metric_names_to_python_functions dictionary items.

url = lambda_url
headers = {
  'Content-Type': 'application/json',
  # replace with your own credentials and session; do not hard-code real secrets
  'Authorization': 'Basic <base64-encoded credentials>',
  'Cookie': 'sessionid=<session id>'
}
func_name = "False Negative Rate"
payload = {
    "new_parameter_values": {
        "df": df.to_json(),
        "privileged_group": privileged_group,
        "labels": labels,
        "positive_label": positive_label,
        "kwargs": {
            "beta": 2
        },
        "y_true": y_true.to_json(),
        "protected_attribute": protected_attribute,
        "scores": "scores"
    },
    "existing_parameter_values": {},
}

for func_name, func_sig in metric_names_to_python_functions.items():
    payload["func_name"] = func_name
    response = requests.request("POST", url, headers=headers, data=json.dumps(payload))
    # compute the same metric locally for comparison against the lambda result
    local_result = func_sig(df=df, labels=labels, positive_label=positive_label,
                            y_true=y_true, protected_attribute=protected_attribute,
                            privileged_group=privileged_group, scores='scores', beta=2)
    print(func_name)
    try:
        print(f"lambda: {json.loads(response.text)['result']}")
    except Exception:
        print(response.text)
    print(f"local:  {local_result}")

A simple modification is to loop over a dictionary containing only the desired metrics, a subset of metric_names_to_python_functions, as sketched below.
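
As a minimal sketch of that filtering (the metric names below are placeholders for whatever keys actually exist in metric_names_to_python_functions):

# build a subset dictionary containing only the metrics we want to run
desired_metric_names = {"False Negative Rate", "False Positive Rate"}
selected_metrics = {
    name: func
    for name, func in metric_names_to_python_functions.items()
    if name in desired_metric_names
}

The loop above can then iterate over selected_metrics.items() instead of metric_names_to_python_functions.items().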