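Visualizing embeddings in 2D
We will use t-SNE to reduce the dimensionality of the embeddings from 1536 to 2. Once the embeddings are reduced to two dimensions, we can plot them in a 2D scatter plot. The dataset is created in the Obtain_dataset Notebook.
1. Reduce dimensionality
We reduce the embeddings to 2 dimensions using t-SNE.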
import pandas as pd
from sklearn.manifold import TSNE
import numpy as np
from ast import literal_eval

# Load the embeddings
datafile_path = "data/fine_food_reviews_with_embeddings_1k.csv"
df = pd.read_csv(datafile_path)

# Parse the stringified embeddings into a 2D NumPy array
matrix = np.array(df.embedding.apply(literal_eval).to_list())

# Create a t-SNE model and transform the data
tsne = TSNE(n_components=2, perplexity=15, random_state=42, init='random', learning_rate=200)
vis_dims = tsne.fit_transform(matrix)
vis_dims.shape
(1000, 2)
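To try this step without the dataset file, here is a minimal sketch that runs the same t-SNE settings on synthetic data. The two random clusters, the sample count, and the names `rng`, `fake_embeddings` and `fake_2d` are assumptions made for illustration only; they are not part of the original notebook.

```python
import numpy as np
from sklearn.manifold import TSNE

# Hypothetical stand-in for real embeddings: two noisy clusters in 1536 dimensions.
rng = np.random.default_rng(42)
cluster_a = rng.normal(loc=0.0, scale=1.0, size=(30, 1536))
cluster_b = rng.normal(loc=3.0, scale=1.0, size=(30, 1536))
fake_embeddings = np.vstack([cluster_a, cluster_b])

# Same settings as in the step above; perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=15, random_state=42, init="random", learning_rate=200)
fake_2d = tsne.fit_transform(fake_embeddings)

# One 2D point per input row
print(fake_2d.shape)
```

Whatever the input dimensionality, the result has one row per sample and two columns, which is what the scatter plot needs.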
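2. Plotting the embeddings
We color each review by its star rating, ranging from red to green.
Even in the reduced two dimensions, we can observe a decent separation of the data.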
import matplotlib.pyplot as plt
import matplotlib
import numpy as np
colors = ["red", "darkorange", "gold", "turquoise", "darkgreen"]
x = [x for x,y in vis_dims]
y = [y for x,y in vis_dims]
color_indices = df.Score.values - 1
colormap = matplotlib.colors.ListedColormap(colors)
plt.scatter(x, y, c=color_indices, cmap=colormap, alpha=0.3)
for score in [0,1,2,3,4]:
    avg_x = np.array(x)[df.Score-1==score].mean()
    avg_y = np.array(y)[df.Score-1==score].mean()
    color = colors[score]
    plt.scatter(avg_x, avg_y, marker='x', color=color, s=100)
plt.title("Amazon ratings visualized in language using t-SNE")
Text(0.5, 1.0, 'Amazon ratings visualized in language using t-SNE')
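The per-score averages marked with an x can also be computed in a single grouped operation. The following is a sketch on made-up coordinates: the small `points` DataFrame and its column names `x` and `y` are hypothetical, and only the `Score` column mirrors the dataset used here.

```python
import pandas as pd

# Hypothetical 2D coordinates with a 1-5 star score per point (made-up data).
points = pd.DataFrame({
    "x": [0.0, 2.0, 4.0, 6.0, 1.0, 3.0],
    "y": [1.0, 3.0, 5.0, 7.0, 0.0, 2.0],
    "Score": [1, 1, 5, 5, 3, 3],
})

# Mean x and y per score: one centroid row per distinct rating.
centroids = points.groupby("Score")[["x", "y"]].mean()
print(centroids)
```

With the real data, the same pattern would presumably apply after placing the two t-SNE columns next to `df.Score` in one DataFrame.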
