Visualizing embeddings in W&B
We will upload the data to Weights & Biases and use its Embedding Projector to visualize the embeddings with common dimensionality reduction algorithms such as PCA, UMAP, and t-SNE. The dataset is created in the Obtain_dataset notebook.
What is Weights & Biases?
Weights & Biases is a machine learning platform used by OpenAI and other ML teams to build better models faster. They use it to quickly track experiments, evaluate model performance, reproduce models, visualize results, and share findings with colleagues.
import ast

import numpy as np
import pandas as pd

# Load the reviews together with their precomputed embeddings
datafile_path = "data/fine_food_reviews_with_embeddings_1k.csv"
df = pd.read_csv(datafile_path)

# Each embedding is stored as a string; parse it safely into a list of floats
# and stack the rows into a 2D array of shape (n_reviews, n_dimensions)
matrix = np.array(df.embedding.apply(ast.literal_eval).to_list())
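import wandb

# Keep the metadata columns (skip the leading index column and the trailing embedding column)
original_cols = df.columns[1:-1].tolist()
# One table column per embedding dimension: emb_0, emb_1, ...
embedding_cols = ['emb_'+str(idx) for idx in range(len(matrix[0]))]
table_cols = original_cols + embedding_cols

with wandb.init(project='openai_embeddings'):
    table = wandb.Table(columns=table_cols)
    for i, row in enumerate(df.to_dict(orient="records")):
        original_data = [row[col_name] for col_name in original_cols]
        embedding_data = matrix[i].tolist()
        # Each table row holds the metadata values followed by the embedding values
        table.add_data(*(original_data + embedding_data))
    # Log the table once, after all rows have been added
    wandb.log({'openai_embedding_table': table})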
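To make the table layout concrete, here is a minimal sketch, without W&B, of how each record is flattened into one wide row: the metadata values come first, followed by one column per embedding dimension. The toy records and the `flatten_rows` helper are hypothetical and exist only for illustration.

```python
import numpy as np

def flatten_rows(records, original_cols, matrix):
    """Return (column_names, rows) in the wide layout used for the table."""
    # One column per embedding dimension: emb_0, emb_1, ...
    embedding_cols = [f"emb_{idx}" for idx in range(matrix.shape[1])]
    columns = list(original_cols) + embedding_cols
    rows = []
    for i, record in enumerate(records):
        original_data = [record[col] for col in original_cols]
        # Metadata values followed by the embedding values for the same record
        rows.append(original_data + matrix[i].tolist())
    return columns, rows

# Hypothetical toy data: two reviews with 3-dimensional embeddings
records = [{"Score": 5, "Summary": "Great"}, {"Score": 1, "Summary": "Stale"}]
matrix = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])

columns, rows = flatten_rows(records, ["Score", "Summary"], matrix)
print(columns)
print(rows[0])
```

Every row has exactly as many values as there are column names, which is what `table.add_data` expects when the arguments are unpacked.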
2. Render as 2D Projection
After navigating to the W&B run link, we click the ⚙️ icon in the top right of the Table and change "Render As:" to "Combined 2D Projection".
Example: http://wandb.me/openai_embeddings
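For a rough local sense of what a 2D projection does, the sketch below reduces a matrix to two dimensions with PCA computed through NumPy's SVD. It is not W&B's implementation, and the random `fake_embeddings` array is a stand-in for the real embedding matrix; `pca_2d` is a hypothetical helper name.

```python
import numpy as np

def pca_2d(matrix):
    """Project the rows of `matrix` onto their first two principal components."""
    # PCA operates on mean-centered data
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

rng = np.random.default_rng(0)
# Stand-in for the real embeddings: 100 points in 16 dimensions
fake_embeddings = rng.normal(size=(100, 16))

coords = pca_2d(fake_embeddings)
print(coords.shape)  # (100, 2)
```

Each row of `coords` is one point that could be drawn on a scatter plot. UMAP and t-SNE produce the same kind of 2D output but use nonlinear methods.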