“Hierarchical Embeddings for Text Search”, 2024-12-10:
List-processing tricks for generating embeddings at different levels of document abstraction to allow efficient semantic searching.
A proposal for better retrieval on large, complex, hierarchically structured document corpora, which can be implemented straightforwardly using heuristics and pairwise embedding distances.
We embed each atomic element, then repeatedly merge the embeddings heuristically into larger ones, building a hierarchy from the bottom up.
To search, we match an arbitrary text input against the hierarchy in the reverse direction, from the largest embedding to the smallest, returning the best matches in size order while excluding spuriously similar hits.
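
The two steps described above (bottom-up merging, then top-down search) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the proposal's actual implementation: `embed` is a toy hashed bag-of-words stand-in for a real embedding model, the merge heuristic (join the most similar adjacent pair, and take the normalized sum as the parent embedding) is only one possible choice, and all names (`Node`, `build`, `search`, `threshold`) are hypothetical.

```python
import math
import zlib
from dataclasses import dataclass, field

DIM = 64  # toy embedding size (assumption; real models use far more dimensions)


def normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def embed(text):
    """Toy stand-in for a real embedding model: hashed bag-of-words."""
    vec = [0.0] * DIM
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    return normalize(vec)


def cosine(a, b):
    """Cosine similarity; inputs are assumed to be unit length already."""
    return sum(x * y for x, y in zip(a, b))


@dataclass
class Node:
    vec: list
    text: str = ""  # only leaves (atomic elements) carry text
    children: list = field(default_factory=list)


def merge(a, b):
    """Assumed heuristic: parent embedding is the normalized sum of its children."""
    vec = normalize([x + y for x, y in zip(a.vec, b.vec)])
    return Node(vec=vec, children=[a, b])


def build(texts):
    """Bottom-up: repeatedly merge the most similar adjacent pair into one node."""
    nodes = [Node(embed(t), text=t) for t in texts]
    while len(nodes) > 1:
        i = max(range(len(nodes) - 1),
                key=lambda k: cosine(nodes[k].vec, nodes[k + 1].vec))
        nodes[i:i + 2] = [merge(nodes[i], nodes[i + 1])]
    return nodes[0]


def search(root, query, threshold=0.1):
    """Top-down: descend only into subtrees that are similar enough to the query."""
    q = embed(query)
    hits, stack = [], [root]
    while stack:
        node = stack.pop()
        score = cosine(node.vec, q)
        if score < threshold:
            continue  # prune: nothing below a dissimilar ancestor is visited
        if node.children:
            stack.extend(node.children)
        else:
            hits.append((score, node.text))
    return sorted(hits, reverse=True)  # best-scoring leaves first


docs = [
    "cats chase mice in the barn",
    "kittens and cats sleep all day",
    "interest rates rose this quarter",
    "banks raised interest on loans",
]
tree = build(docs)
for score, text in search(tree, "cats sleep"):
    print(f"{score:.2f}  {text}")
```

Pruning at each level is what filters spuriously similar hits in this sketch: a leaf that happens to resemble the query is never reached if its enclosing sections do not. For simplicity the sketch returns only leaves ranked by score; returning matches at every level, largest first, would need the intermediate nodes collected as well.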