“sam2_hierarch: Unsupervised Human-Friendly Online Object Categorization”:
…We use SAM2 to segment incoming images into objects. Each object is then masked out and fed into CLIP to create embeddings of each object.
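The masking step can be illustrated with a minimal NumPy sketch. It assumes that "masked out" means blanking every pixel outside the object's segmentation mask and cropping to the mask's bounding box; the source does not spell this out. The function name `isolate_object` is hypothetical, and the SAM2 and CLIP calls themselves are deliberately left out.

```python
import numpy as np


def isolate_object(image, mask):
    """Zero out pixels outside the mask, then crop to the mask's bounding box.

    Hypothetical helper: an assumed reading of "masked out", not the authors' code.
    """
    mask = np.asarray(mask, dtype=bool)
    rows = np.flatnonzero(mask.any(axis=1))  # rows containing object pixels
    cols = np.flatnonzero(mask.any(axis=0))  # columns containing object pixels
    if rows.size == 0:
        raise ValueError("mask is empty")
    # Broadcast the 2-D mask over the colour channels.
    masked = np.where(mask[..., None], image, 0)
    return masked[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]


# Toy 4x5 RGB image with a 2x2 object mask.
image = np.full((4, 5, 3), 200, dtype=np.uint8)
mask = np.zeros((4, 5), dtype=bool)
mask[1:3, 2:4] = True

crop = isolate_object(image, mask)
print(crop.shape)  # (2, 2, 3)
```

The cropped array is what would then be handed to an image encoder to obtain the object's embedding.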
Instead of discarding these embeddings, we save them (alongside their associated images) in automatically generated categories by clustering them with a simplified Online Hierarchical Agglomerative Clustering (OHAC) algorithm, using the cosine similarity of the stored image embeddings as the similarity measure.
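To make the idea of online clustering by cosine similarity concrete, here is a small sketch. It is a flat, threshold-based stand-in and not the simplified OHAC algorithm itself, whose details are not given here: each new embedding joins the category whose centroid is most similar, or opens a new category if no similarity reaches a cutoff. The class `OnlineClusters` and the `threshold` value of 0.8 are assumptions made for illustration.

```python
import numpy as np


def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


class OnlineClusters:
    """Flat online clustering by cosine similarity (hypothetical helper)."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cutoff, not taken from the source
        self.members = []           # one list of unit embeddings per category

    def centroid(self, idx):
        # Mean of the stored members, re-normalized to unit length.
        return normalize(np.mean(self.members[idx], axis=0))

    def add(self, embedding):
        """Assign an embedding to the closest category, or open a new one."""
        e = normalize(embedding)
        if self.members:
            sims = [float(self.centroid(i) @ e) for i in range(len(self.members))]
            best = int(np.argmax(sims))
            if sims[best] >= self.threshold:
                self.members[best].append(e)
                return best
        self.members.append([e])
        return len(self.members) - 1


# Toy 2-D "embeddings": the first two are close, the third is orthogonal.
clusters = OnlineClusters(threshold=0.8)
labels = [clusters.add(v) for v in ([1.0, 0.0], [0.9, 0.1], [0.0, 1.0])]
print(labels)  # [0, 0, 1]
```

Because the embeddings are kept rather than discarded, the centroids can be recomputed at any time from the stored members.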
As a result of this approach, the dynamically generated classifications can be displayed as a list of folders containing images of objects.
Moving an image from one folder to another and then automatically updating the categories gives us direct control over the model’s learned behavior without further retraining.
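The effect of moving an image between folders can be sketched as follows, assuming each folder is summarized by the normalized mean of its stored embeddings and that new objects are assigned to the folder with the most similar centroid. The helpers `move` and `classify`, the folder names, and the toy 2-D embeddings are all hypothetical.

```python
import numpy as np


def unit(v):
    """Return v scaled to unit length."""
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)


def centroid(embeddings):
    """Normalized mean of a folder's stored embeddings."""
    return unit(np.mean(embeddings, axis=0))


def move(folders, item, src, dst):
    """Move one stored item between folders (hypothetical helper)."""
    folders[dst][item] = folders[src].pop(item)


def classify(folders, embedding):
    """Pick the non-empty folder whose centroid is most similar to the embedding."""
    e = unit(embedding)
    candidates = [name for name, items in folders.items() if items]
    return max(
        candidates,
        key=lambda name: float(centroid(list(folders[name].values())) @ e),
    )


folders = {
    "mugs": {"img1": unit([1.0, 0.0]), "img2": unit([0.6, 0.8])},
    "bowls": {"img3": unit([0.0, 1.0])},
}
query = [0.6, 0.8]

before = classify(folders, query)       # centroids computed from current contents
move(folders, "img2", "mugs", "bowls")  # the manual correction
after = classify(folders, query)        # centroids reflect the move immediately

print(before, "->", after)  # mugs -> bowls
```

No weights change in this sketch: only the stored embeddings are regrouped, which is why a correction of this kind needs no retraining.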