Open date: 2023 Jul 4
Closing date: 2023 Aug 17
Collective Knowledge Contributor award: Yes
Open-source MLPerf inference benchmarks were developed by a consortium of 50+ companies and universities (MLCommons) to enable trustworthy and reproducible comparison of AI/ML systems in terms of latency, throughput, power consumption, accuracy, and other metrics across diverse software/hardware stacks from different vendors.
However, it is difficult to customize and run MLPerf benchmarks with non-reference models.
That is why the MLCommons Task Force on Automation and Reproducibility has developed the Collective Mind (CM) automation language to modularize these benchmarks and make them easier to run with different models and data sets.
Implement a CM workflow to connect any Hugging Face model to MLPerf loadgen and run it with random inputs to obtain preliminary latency and throughput measurements without accuracy.
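For reference, a minimal performance-only harness along these lines could look as follows. It is only a sketch: it assumes the Hugging Face model has already been exported to ONNX (e.g. as `model.onnx`) and that the `onnxruntime` and `mlperf_loadgen` Python packages are installed; the file name, input shapes (batch 1, sequence length 384), sample count and expected QPS are illustrative assumptions, not part of the challenge.

```python
# Sketch: run an ONNX model exported from Hugging Face under MLPerf loadgen
# with random inputs (performance only, no accuracy check).
import array
import numpy as np
import onnxruntime as ort
import mlperf_loadgen as lg

MODEL_PATH = "model.onnx"   # assumed output of the Hugging Face -> ONNX conversion step
NUM_SAMPLES = 128           # size of the query sample library filled with random inputs

session = ort.InferenceSession(MODEL_PATH, providers=["CPUExecutionProvider"])
inputs_meta = session.get_inputs()

def make_random_feed():
    # One random tensor per model input; dynamic dims are set to batch=1, seq=384 (assumption).
    feed = {}
    for meta in inputs_meta:
        shape = [d if isinstance(d, int) else (1 if i == 0 else 384)
                 for i, d in enumerate(meta.shape)]
        if "int" in meta.type:
            feed[meta.name] = np.random.randint(0, 2, size=shape, dtype=np.int64)
        else:
            feed[meta.name] = np.random.rand(*shape).astype(np.float32)
    return feed

samples = {}

def load_query_samples(indices):
    for idx in indices:
        samples[idx] = make_random_feed()

def unload_query_samples(indices):
    for idx in indices:
        samples.pop(idx, None)

def issue_queries(query_samples):
    # Run inference for each query and report the completion back to loadgen.
    for qs in query_samples:
        outputs = session.run(None, samples[qs.index])
        raw = array.array("B", np.asarray(outputs[0], dtype=np.float32).tobytes())
        buf = raw.buffer_info()
        lg.QuerySamplesComplete([lg.QuerySampleResponse(qs.id, buf[0], buf[1])])

def flush_queries():
    pass

settings = lg.TestSettings()
settings.scenario = lg.TestScenario.Offline
settings.mode = lg.TestMode.PerformanceOnly
settings.offline_expected_qps = 10   # illustrative value to bound the run time

sut = lg.ConstructSUT(issue_queries, flush_queries)
qsl = lg.ConstructQSL(NUM_SAMPLES, NUM_SAMPLES, load_query_samples, unload_query_samples)
lg.StartTest(sut, qsl, settings)     # latency/throughput are written to mlperf_log_summary.txt
lg.DestroyQSL(qsl)
lg.DestroySUT(sut)
```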
Resources:
* CM script to get an ML model from the Hugging Face zoo
* CM script to convert a Hugging Face model to ONNX
* CM script to build MLPerf loadgen
* CM script to run Python loadgen with any ONNX model
* MLPerf BERT FP32 model available at Hugging Face
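These building blocks can be chained either from the command line (`cm run script --tags=...`) or from Python via the `cmind` package, for example as in the sketch below. The tag strings, environment keys and model stub are illustrative assumptions and should be checked against each CM script's README.

```python
# Sketch of chaining the CM scripts listed above via the cmind Python API.
# The tags, env keys and model stub below are illustrative assumptions.
import cmind

def run(tags, **extra):
    r = cmind.access({'action': 'run', 'automation': 'script',
                      'tags': tags, 'out': 'con', **extra})
    if r['return'] > 0:
        raise RuntimeError(r.get('error', 'CM script failed'))
    return r

# 1) Download a model from the Hugging Face zoo (model stub is hypothetical).
run('get,ml-model,huggingface,zoo', env={'CM_MODEL_ZOO_STUB': 'bert-base-cased'})

# 2) Convert the downloaded Hugging Face model to ONNX.
run('convert,ml-model,huggingface,to,onnx')

# 3) Build MLPerf loadgen.
run('get,mlperf,inference,loadgen')

# 4) Run the Python loadgen harness on the resulting ONNX model.
run('app,loadgen,python,onnx')
```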
Some results showcasing the CK workflow to benchmark Hugging Face models with MLPerf inference v3.0 (BERT):
* https://access.cknowledge.org/playground/?action=experiments&name=2f1f70d8b2594149
* https://access.cknowledge.org/playground/?action=experiments&name=mlperf-inference--v3.0--edge--open-power--language-processing--offline&result_uid=9d2594448bbb4b45
Read this documentation to learn how to run reference implementations of MLPerf inference benchmarks using the CM automation language, and use them as a basis for your own developments.
Check this ACM REP'23 keynote to learn more about our open-source project and long-term vision.