cTuning & MLCommons Collective Knowledge Challenges


Implement CM automation to benchmark Hugging Face models using MLPerf loadgen

Open date: 2023 Jul 4

Closing date: 2023 Aug 17

Collective Knowledge Contributor award: Yes


Introduction

Open-source MLPerf inference benchmarks were developed by a consortium of 50+ companies and universities (MLCommons) to enable trustworthy and reproducible comparison of AI/ML systems in terms of latency, throughput, power consumption, accuracy, and other metrics across diverse software/hardware stacks from different vendors.

However, it is difficult to customize and run MLPerf benchmarks with non-reference models.

That's why the MLCommons Task Force on automation and reproducibility has developed the Collective Mind (CM) automation language to modularize this benchmark and make it easier to run with different models and data sets.

Challenge

Implement a CM workflow to connect any Hugging Face model to MLPerf loadgen and run it with random inputs to obtain preliminary latency and throughput measurements without accuracy.
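
As a starting point, here is a minimal sketch of the kind of glue code such a workflow could produce: it feeds random inputs into an ONNX export of a Hugging Face model through MLPerf loadgen in performance-only mode. This is not the official CM implementation; the model path, sample count, and sequence length are illustrative placeholders, and the input shapes/dtypes must match the actual model.

```python
# Minimal sketch: benchmark an ONNX export of a Hugging Face model with
# MLPerf loadgen using random inputs (performance only, no accuracy).
# Assumes mlperf_loadgen, onnxruntime and numpy are installed.
import numpy as np
import onnxruntime as ort
import mlperf_loadgen as lg

MODEL_PATH = "model.onnx"   # placeholder: e.g. a BERT model exported to ONNX
SAMPLE_COUNT = 64           # number of random samples kept in memory
SEQ_LEN = 384               # placeholder BERT-style sequence length

session = ort.InferenceSession(MODEL_PATH)
input_names = [inp.name for inp in session.get_inputs()]

# Pre-generate random inputs; a real workflow would load a dataset instead.
samples = {
    i: {name: np.random.randint(0, 100, (1, SEQ_LEN), dtype=np.int64)
        for name in input_names}
    for i in range(SAMPLE_COUNT)
}

def issue_queries(query_samples):
    """Run inference for each query issued by loadgen and report completion."""
    responses = []
    for qs in query_samples:
        session.run(None, samples[qs.index])
        # Accuracy is not checked, so an empty response payload is enough.
        responses.append(lg.QuerySampleResponse(qs.id, 0, 0))
    lg.QuerySamplesComplete(responses)

def flush_queries():
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(SAMPLE_COUNT, SAMPLE_COUNT,
                      lambda indices: None,   # samples are already in memory
                      lambda indices: None)

# Writes mlperf_log_summary.txt with preliminary latency/throughput numbers.
lg.StartTest(sut, qsl, settings)

lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```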

Resources:
* CM script to get an ML model from the Hugging Face zoo
* CM script to convert a Hugging Face model to ONNX
* CM script to build MLPerf loadgen
* CM script to run the Python loadgen with any ONNX model
* MLPerf BERT FP32 model available at Hugging Face
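
For orientation only, the sketch below shows how such CM scripts can be chained from Python through the cmind package's access() API. The script tags used here are assumptions for illustration and should be checked against the documentation of the linked scripts.

```python
# Hedged sketch: calling CM scripts from Python via the cmind API.
# The tags below are illustrative assumptions; consult the linked CM scripts
# for the exact tags and input flags.
import cmind

def run_cm_script(tags, extra=None):
    """Run a CM script by tags and fail loudly if it returns an error."""
    request = {'action': 'run', 'automation': 'script', 'tags': tags}
    if extra:
        request.update(extra)
    result = cmind.access(request)
    if result['return'] > 0:
        raise RuntimeError(result.get('error', 'CM script failed'))
    return result

# 1) Fetch a model from the Hugging Face zoo (tags are assumptions).
run_cm_script('get,ml-model,huggingface,zoo')

# 2) Build MLPerf loadgen (tags are assumptions).
run_cm_script('get,mlperf,inference,loadgen')

# 3) Run the Python loadgen harness with the ONNX model (tags are assumptions).
run_cm_script('app,loadgen,python,_onnxruntime')
```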

Some results showcase the CK workflow to benchmark Hugging Face models with MLPerf inference v3.0 (BERT):
* https://access.cknowledge.org/playground/?action=experiments&name=2f1f70d8b2594149
* https://access.cknowledge.org/playground/?action=experiments&name=mlperf-inference--v3.0--edge--open-power--language-processing--offline&result_uid=9d2594448bbb4b45

Read this documentation to run reference implementations of MLPerf inference benchmarks using the CM automation language and use them as a base for your developments.

Check this ACM REP'23 keynote to learn more about our open-source project and long-term vision.

Prizes

Organizers

