AI Testing & Metrics
Creating a Test Case
Create a test case for a given metric ID and parameter IDs.
# create test case
payload = {
    "name": "accuracy test case",
    "description": "model must have above minimum accuracy",
    "steps": "compute accuracy and compare",
    "category": test_case_category_id,
    "associated_metric": metric_id,
    "required_parameters": parameter_ids_list,
    "operator": ">",
    "test_value": ".5"
}
url = base_api_url + "/test_cases"
response = requests.request("POST", url, headers=headers, data=json.dumps(payload))
test_case_id = json.loads(response.text)['id']
This example test case requires that the accuracy of the model is greater than .5, a rather underwhelming benchmark. Note the inclusion of test_case_category_id, metric_id, and parameter_ids_list from above.
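If the request fails, the id lookup above raises a KeyError rather than a useful error. A minimal guard, assuming the API returns conventional HTTP status codes and a JSON body with an id field on success:

# Fail fast on an error response before reading the body.
# Assumes conventional HTTP status codes and an "id" field in the JSON response.
response.raise_for_status()
test_case_id = response.json()["id"]
print(f"created test case {test_case_id}")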
Creating a Test Run
Lastly, we can create a test run, which executes one or more test cases. In this example only one test case is provided: the one created above.
# create test run
payload = {
    "name": "accuracy test run",
    "description": "running accuracy test",
    "required_test_cases": [test_case_id]
}
url = base_api_url + "/test_runs"
response = requests.request("POST", url, headers=headers, data=json.dumps(payload))
Again, note the use of test_case_id.
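The response body presumably contains the new run's ID as well; a minimal sketch for capturing it, assuming the /test_runs response mirrors the /test_cases response shape:

# Assumes the /test_runs response includes an "id" field, like /test_cases.
test_run_id = response.json().get("id")
print(f"created test run {test_run_id}")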
Calling Many Metrics
We can call many metrics by looping over the metric_names_to_python_functions dictionary items.
url = lambda_url
headers = {
    'Content-Type': 'application/json',
    'Authorization': 'Basic M2Q0NzExNDctNmJmYi00Y2RkLWE5ZWUtYjc1MzBmOWYyNTkyOmZjODg5NzdmYmEyM2QzZDliN2Y5MTFkMzRjNjIyMjZk',
    'Cookie': 'sessionid=gzc64jx2wjghdr99obibgdz4xyclbmen'
}

# Shared parameter values; each loop iteration sets "func_name" before sending.
payload = {
    "new_parameter_values": {
        "df": df.to_json(),
        "privileged_group": privileged_group,
        "labels": labels,
        "positive_label": positive_label,
        "kwargs": {
            "beta": 2
        },
        "y_true": y_true.to_json(),
        "protected_attribute": protected_attribute,
        "scores": "scores"
    },
    "existing_parameter_values": {},
}

for func_name, func_sig in metric_names_to_python_functions.items():
    payload["func_name"] = func_name
    response = requests.request("POST", url, headers=headers, data=json.dumps(payload))
    # Compute the same metric locally for comparison.
    local_result = func_sig(df=df, labels=labels, positive_label=positive_label, y_true=y_true, protected_attribute=protected_attribute, privileged_group=privileged_group, scores='scores', beta=2)
    print(func_name)
    try:
        print(f"lambda: {json.loads(response.text)['result']}")
    except Exception:
        print(response.text)
    print(f"local: {local_result}")
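A natural extension is to compare the two results programmatically inside the loop rather than eyeballing the printout; a minimal sketch, assuming the lambda returns a numeric result field as above:

# Inside the loop body, after computing local_result:
import math

remote_result = json.loads(response.text).get("result")
if remote_result is not None:
    # Allow a small tolerance; JSON round-tripping can lose float precision.
    assert math.isclose(float(remote_result), float(local_result), rel_tol=1e-6), func_name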
A simple modification would be to include only the desired metrics in a dictionary that is a subset of metric_names_to_python_functions.
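For example, a dict comprehension can filter the mapping down to a few metrics of interest (the metric names below are illustrative; use keys that actually exist in metric_names_to_python_functions):

# Hypothetical metric names -- substitute keys from your own mapping.
desired_names = {"Accuracy", "False Negative Rate"}
desired_metrics = {
    name: fn
    for name, fn in metric_names_to_python_functions.items()
    if name in desired_names
}
# Then loop over desired_metrics.items() instead of the full dictionary.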