Understanding the Target Audience
The primary audience for this guide includes data scientists, machine learning engineers, and business analysts who want to improve how they track experiments. These professionals often juggle many experiments at once, lack real-time insight into how their models are behaving, and struggle to visualize results effectively. Their goal is to streamline workflows and make data-driven decisions backed by comprehensive metrics.
Tutorial Overview
This tutorial provides a hands-on approach to using Hugging Face Trackio for tracking experiments efficiently. We will cover installation steps in Google Colab, preparation of a dataset, and setting up various training runs with different hyperparameters. Throughout the process, we will log metrics, visualize results, and demonstrate how to import external data into the system. This will all be conducted in a single notebook, allowing for real-time observation of the results.
Getting Started
To begin, install the necessary libraries. In Colab (or any Jupyter notebook), run:
!pip -q install -U trackio scikit-learn pandas matplotlib
Next, essential Python modules and machine learning utilities need to be imported:
import os, time, math, json, random, pathlib, itertools, tempfile
from dataclasses import dataclass
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import accuracy_score, log_loss, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
import trackio
Dataset Creation
For this tutorial, we’ll create a synthetic dataset using the following function:
def make_dataset(n=12000, n_informative=18, n_classes=3, seed=42):
    X, y = make_classification(
        n_samples=n, n_features=32, n_informative=n_informative, n_redundant=0,
        n_classes=n_classes, random_state=seed, class_sep=2.0
    )
    X_train, X_temp, y_train, y_temp = train_test_split(X, y, test_size=0.3, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(X_temp, y_temp, test_size=0.5, random_state=seed)
    ss = StandardScaler().fit(X_train)
    return ss.transform(X_train), y_train, ss.transform(X_val), y_val, ss.transform(X_test), y_test
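Later snippets refer to the resulting splits as Xtr, ytr, Xva, and yva; one way to produce them, using the variable names assumed throughout the rest of this guide, is:

# Build the train/validation/test splits once and reuse them across all runs.
Xtr, ytr, Xva, yva, Xte, yte = make_dataset()
print(Xtr.shape, Xva.shape, Xte.shape)  # roughly a 70/15/15 split of the 12,000 samples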
Training and Logging
Next, we will define a configuration class to store our training settings and a function that runs an SGD classifier while logging metrics to Trackio:
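A minimal version of that configuration class, inferred from the fields the training function reads (the defaults and the project name below are illustrative assumptions, not values fixed by the tutorial), could look like this:

@dataclass
class RunCfg:
    lr: float
    l2: float
    epochs: int = 12                    # illustrative default
    batch_size: int = 256               # illustrative default
    seed: int = 0
    project: str = "trackio-sgd-demo"   # hypothetical project name

With the configuration in place, the training function itself follows: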
def train_and_log(cfg: RunCfg, Xtr, ytr, Xva, yva):
    run = trackio.init(
        project=cfg.project,
        name=f"sgd_lr{cfg.lr}_l2{cfg.l2}",
        config={"lr": cfg.lr, "l2": cfg.l2, "epochs": cfg.epochs, "batch_size": cfg.batch_size, "seed": cfg.seed}
    )
    clf = SGDClassifier(loss="log_loss", penalty="l2", alpha=cfg.l2, learning_rate="constant",
                        eta0=cfg.lr, random_state=cfg.seed)
    # ... (additional code) ...
    trackio.finish()
    return val_acc
Inside the elided portion of the function, each epoch's loss, accuracy, and confusion matrix are tracked and logged, giving both numerical and visual insight into performance in real time.
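For reference, here is a minimal sketch of that elided per-epoch loop, indented as it would appear inside train_and_log. It assumes Trackio's wandb-style trackio.log call and streams mini-batches through SGDClassifier.partial_fit; only scalar metrics are logged here, since how the confusion matrix is rendered and attached depends on your plotting setup:

    classes = np.unique(ytr)
    for epoch in range(cfg.epochs):
        # Shuffle the training set and stream mini-batches through the linear model.
        idx = np.random.RandomState(cfg.seed + epoch).permutation(len(Xtr))
        for start in range(0, len(idx), cfg.batch_size):
            batch = idx[start:start + cfg.batch_size]
            clf.partial_fit(Xtr[batch], ytr[batch], classes=classes)

        # Evaluate after each epoch and log scalar metrics to Trackio.
        val_proba = clf.predict_proba(Xva)
        val_pred = val_proba.argmax(axis=1)
        val_acc = accuracy_score(yva, val_pred)
        cm = confusion_matrix(yva, val_pred)  # visualized in the full tutorial; image logging omitted here
        trackio.log({
            "epoch": epoch,
            "train_acc": accuracy_score(ytr, clf.predict(Xtr)),
            "val_acc": val_acc,
            "val_log_loss": log_loss(yva, val_proba, labels=classes),
        })

Because val_acc is updated every epoch, the return val_acc at the end of the function reports the final epoch's validation accuracy.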
Hyperparameter Sweep
We will now execute a hyperparameter sweep across learning rates and L2 regularization:
grid = list(itertools.product([0.01, 0.03, 0.1], [1e-5, 1e-4, 1e-3]))
results = []
for lr, l2 in grid:
    acc = train_and_log(RunCfg(lr=lr, l2=l2, seed=123), Xtr, ytr, Xva, yva)
    results.append({"lr": lr, "l2": l2, "val_acc": acc})
After running this sweep, we summarize the results into a table, log the best configuration, and conclude our experiment.
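A simple way to produce that summary from the results list is sketched below; the summary project and run names are arbitrary choices for this example, not anything mandated by Trackio:

# Rank the sweep results and record the best configuration in a dedicated summary run.
df = pd.DataFrame(results).sort_values("val_acc", ascending=False)
print(df.to_string(index=False))

best = df.iloc[0]
trackio.init(project="trackio-sgd-demo", name="sweep_summary")  # hypothetical names
trackio.log({
    "best_lr": float(best["lr"]),
    "best_l2": float(best["l2"]),
    "best_val_acc": float(best["val_acc"]),
})
trackio.finish()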
Importing External Data
To further enhance our experiment tracking, we can simulate a CSV file of metrics and import it into Trackio:
csv_path = "/content/trackio_demo_metrics.csv"
df_csv = pd.DataFrame({
    "step": np.arange(10),
    "metric_x": np.linspace(1.0, 0.2, 10),
    "metric_y": np.linspace(0.1, 0.9, 10),
})
df_csv.to_csv(csv_path, index=False)
trackio.import_csv(csv_path, project="trackio-csv-import")
Once imported, the external metrics appear alongside the runs logged earlier, so both can be compared side by side in Trackio's interactive dashboard.
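If the dashboard is not already open in the notebook, it can be launched explicitly; the snippet below assumes Trackio's trackio.show() helper and the project name used just above:

# Open the local Trackio dashboard for the imported-CSV project.
trackio.show(project="trackio-csv-import")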
Conclusion
This tutorial has demonstrated how Trackio can simplify experiment tracking without the hassle of complex infrastructure. By effectively logging, comparing runs, capturing structured results, and importing external data, users can maintain better organization, monitor their progress, and make informed decisions during their experimentation process.
FAQs
- What is Hugging Face Trackio? Trackio is a lightweight tool for tracking machine learning experiments, helping users to log metrics and visualize results seamlessly.
- How does Trackio improve experiment tracking? It provides real-time logging and visualizations that simplify the management of multiple experiments and metrics.
- Can I use Trackio without a heavy infrastructure setup? Yes, Trackio is designed to be lightweight and easy to integrate into your workflows without needing complex setups.
- What types of visualizations does Trackio provide? Trackio offers visualizations such as confusion matrices and metric curves to monitor model performance during training.
- Is it possible to import external data into Trackio? Absolutely. Trackio allows users to import CSV files, enabling a broader view of experiment metrics alongside logged runs.