Show HN: Using stylometry to find HN users with alternate accounts

sillysaurusx · on Nov 26, 2022

Wow. This gives a lot of false positives, but it found all ~10 of my old accounts over the years.

The most interesting thing is that my writing style changed pretty drastically since a decade ago. Searching for my oldest account matches my earliest usernames, whereas searching this account matched the rest.

The details of the algorithm are fascinating: https://stylometry.net/about Mostly because of how simple it is. I assumed it would measure word embeddings against a trained ML model, but nothing so fancy.

hnburnerUixoHr5 · on Nov 26, 2022

Woof.

I create new accounts on a semi-regular basis because I think cliques are the most corrosive factor to social media. Any time my account gathers enough upvotes I destroy it for another.

I had four accounts. None are over 50% confidence, but when I look at any one account the others are consistently #2, #3, and #4.

Now I’m thinking very carefully about what words I use to avoid linking this as the 5th account.

butterNaN · on Nov 27, 2022

This makes me melancholic. One should be able to express themselves without the overhead of privacy concerns.

hailwren · on Nov 26, 2022

Exact same thing happened to me. Wild.

dimmke · on Nov 27, 2022

On the other side of the coin, I have never had an alternate HN account (beyond maybe 1-2 throwaways with only one post or comment) so seeing the list of users that are most similar to me was interesting. I didn't see some stark similarities based on a quick peek at their comments, but it was interesting.

costco · on Nov 26, 2022

Yeah top 20 is a little excessive because in my own tests I found that top 20 is only marginally more accurate than top 10. You can get a more academic explanation [here](https://www.tandfonline.com/doi/abs/10.1080/09296174.2011.53...). I was amazed too because it seemed too easy!

sillysaurusx · on Nov 26, 2022

FWIW, top 20 was necessary for mine. The bolding was a brilliant move. Several of my accounts were ranked 10-20, but popped out due to the bolding.

justusthane · on Nov 26, 2022

What does the bolding indicate?

sillysaurusx · on Nov 26, 2022

The explanation is here: https://news.ycombinator.com/item?id=33755466

As far as I’m concerned, it’s the killer feature of the app. The top 20 results may be noisy, but the bolded results have a signal to noise ratio close to infinity.

jsnell · on Nov 26, 2022

The precision of the bolded results looks like maybe 30% to me. Significantly better than the non-bolded, but nowhere near perfect precision.

costco · on Nov 26, 2022

False positives become an increasingly difficult problem the more potential authors you introduce. If I had written a fancier model it probably wouldn't be as much of a problem, but what can you do.

jsnell · on Nov 26, 2022

Yes, this wasn't a criticism of the tool. It is crazy good.

But I don't think people should be making the assumption that bolded results are definite alts, which sillysaurus' comment reads like.

sillysaurusx · on Nov 26, 2022

Hmm, that wasn’t my intent. I see this tool as a recommendation engine more than a doxxer. By “signal to noise ratio close to infinity,” I meant that if you visit one of the bolded accounts, they’ll probably sound a lot like you.

It’s one of those ideas that makes the tool substantially more effective, yet never would’ve occurred to me. It’s like the simplicity of pg’s “a plan for spam” algorithm: deceptively simple, but (like scrubbing dishes with fingers) works really well.

tekknik · on Nov 27, 2022

> I see this tool as a recommendation engine more than a doxxer.

That is absolutely all this will be used for. This is a dangerous tool that serves no real world purpose.

dragonwriter · on Nov 26, 2022

Of my top 20, 19 are bold, all are above 0.6, and I have no alts.

notahacker · on Nov 27, 2022

Vast majority of my top 20 were bold, except you funnily enough!

None of them are me (and you were the only one I recognised and thought "yeah, I can see where it gets it from"...)

loeg · on Nov 26, 2022

I have 7 bolded names (0.53-0.62) in the top 20 list, and none are alts of mine.

morsch · on Nov 26, 2022

I'm one of them and I can confirm. But then again that's what I'd say if I was.

loeg · on Nov 26, 2022

Hi style-adjacent friend :-). Just briefly looking at your recent comment history, we seem to find different kinds of articles interesting, but maybe have a similar writing style.

ghaff · on Nov 26, 2022

Pretty much the exact same. (I do have a throwaway account but I rarely use it and it probably hasn't been used enough to qualify.)

costco · on Nov 26, 2022

The funny thing is that I thought of it while eating dinner last night :)

dimmke · on Nov 27, 2022

My results have 5 bolded users in my top 20, and I have 0 alt accounts.

lettergram · on Nov 26, 2022

Frankly similar to how I was doing it back in 2018 (when you and I chatted about it on HN lol)

https://news.ycombinator.com/item?id=17944293

The approach I took was a bit different, but also no ML required.

The real trick is pruning and going cross platform. There are around 100k active HN accounts (meaning posts a few times a year), maybe 200k if you count at least one post a year. But <10k that post weekly.

It’s a very small space to try to compare so simple methods will work fine.

costco · on Nov 26, 2022

Exactly. HN emphasizes long-form posts much more than other forums which makes the commenters here very susceptible to this kind of analysis. Plus you can fit every single HN comment in RAM on a mid tier gaming laptop so it's even easier. I was trying to think of applications of this kind of data and the only thing I could think of was moderation tools/detecting ban evaders but what you've done seems much more profitable lol.

echelon · on Nov 26, 2022

It works like a charm for me too.

I put in my username and found my pre-echelon alt, possibilistic.

(Echelon was taken when I registered possibilistic, but it must have been unused and dropped.)

User23 · on Nov 26, 2022

I’d figured it would be some kind of n-gram frequency analysis. Would be interesting to code that up and compare.

costco · on Nov 26, 2022

It is. The description on the about page is a little simplified, but basically I look at the most common word and character ngrams of size 1,2,3 (200 each), put all the frequencies in an array and then compare to all the other users with https://scikit-learn.org/stable/modules/generated/sklearn.me....
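For anyone who wants to code it up and compare, here is a minimal sketch of that pipeline using only the standard library. Several things are assumptions: the scikit-learn link above is truncated, so cosine similarity is a guess based on how the method is described elsewhere in the thread; "200 each" is read as the top 200 features per n-gram kind and size; and the function names and tokenizer are hypothetical, not the site's actual code.

```python
import math
import re
from collections import Counter


def ngrams(seq, n):
    """Return all contiguous n-grams of a sequence as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def count_features(text):
    """Count word and character n-grams (n = 1, 2, 3) in one text."""
    lowered = text.lower()
    words = re.findall(r"[a-z']+", lowered)  # hypothetical tokenizer
    chars = list(lowered)
    counts = Counter()
    for n in (1, 2, 3):
        counts.update(("w", n, g) for g in ngrams(words, n))
        counts.update(("c", n, g) for g in ngrams(chars, n))
    return counts


def build_vocabulary(texts_by_user, top_k=200):
    """Pick the top_k most common features for each (kind, size) group."""
    total = Counter()
    for text in texts_by_user.values():
        total.update(count_features(text))
    vocab = []
    for kind in ("w", "c"):
        for n in (1, 2, 3):
            group = Counter({f: c for f, c in total.items()
                             if f[0] == kind and f[1] == n})
            vocab.extend(f for f, _ in group.most_common(top_k))
    return vocab


def profile(text, vocab):
    """Relative frequency of each vocabulary feature in one user's text."""
    counts = count_features(text)
    total = sum(counts.values()) or 1
    return [counts[f] / total for f in vocab]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

In use, you would concatenate each user's comments into one string, call `build_vocabulary` once over all users, then compare `profile` vectors pairwise with `cosine`.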

User23 · on Nov 26, 2022

Cool, I only skimmed the description; maybe I needed to read it more carefully.

Have you considered doing rune rather than word ngrams? I can imagine that might be prohibitively expensive, but I really don’t know. I did something like that long long ago in C for automatic document language detection. It was quite accurate.

bb88 · on Nov 27, 2022

sillysaurus3 was in mine. :) Clearly we're not the same.

FormerBandmate · on Nov 26, 2022

> sillysaurus3

> sillysaurus2

Tbf a human could have found a bunch of them relatively easily

jll29 · on Nov 26, 2022

The method used, i.e. to calculate the cosine of the two authors' word vectors, is poorly suited for stylometric analysis because it is based on a poster's lexicon and the word frequencies of each word, but ignoring stylistically relevant factors like word order.

Also, the cosine of the vectors of word frequencies conflates author-specific vocabulary and topics; in other words, my account is grouped (with >51% similarity, according to the demo) with someone probably because we wrote about similar things. A strong stylometric matcher ought to be robust against topic shifts (our personal writing style is what stays constant when we move from writing about one topic to writing about another topic, just like our personality is what stays constant about our behavior over time - of course styles do change, but the premise then has to be that such changes happen very slowly).
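One common way to reduce the topic sensitivity described above is to compare only function words, which carry little topical content. The sketch below illustrates that idea under stated assumptions: the word list is a tiny hypothetical sample rather than a vetted one, and this is a swapped-in feature set, not what the tool does.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short list; real studies use a much longer one.
FUNCTION_WORDS = (
    "the", "a", "an", "of", "to", "in", "and", "but", "or", "that",
    "which", "it", "is", "was", "i", "you", "not", "so", "just", "because",
)


def function_word_profile(text):
    """Relative frequency of each function word; topical words are ignored."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

Two texts on unrelated topics that use "the", "is", and "in" at the same rates get identical profiles here, which is the property the paragraph above asks for. The cost is that the signal per comment is much weaker, so more text per user is probably needed.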

Stylometrics/authorship identification is interesting and has led to some surprising findings, e.g. in forensic linguistics (Malcolm Coulthard wrote several good books about the topic).

This paper lists some other features that could be used and compares a bunch of techniques: https://research.ijcaonline.org/volume86/number12/pxc3893384...

MikePlacid · on Nov 26, 2022

> based on a poster's lexicon and the word frequencies of each word, but ignoring stylistically relevant factors like word order.

Interesting. I was expecting to be grouped with other Russian speakers and I am (based on some nicknames). But I thought the most telling feature will be exactly word order - it’s absolutely relaxed in Russian. Word frequencies? Well, probably the absence of articles, lol (but I swear to God that I often spend some extra time trying to insert as many articles in my texts as I could).

implements · on Nov 27, 2022

There’s https://en.wikipedia.org/wiki/Idiolect :

”Language consists of sentence constructs, choice of words, and expression of style. Accordingly, an idiolect is an individual's personal use of these facets. Every person has a unique idiolect influenced by their language, socioeconomic status, and geographical location.”

antirez · on Nov 27, 2022

In practice a more complex approach will tend to require a greater amount of data per user, so in this specific case this simple approach is not too bad. Moreover, fake accounts are likely to talk about the same topics, so while this leads to false positives, it also makes it more likely that we find actual duplicates in the list.

sillysaurusx · on Nov 26, 2022

Ha, gruseom shows up for pg, which is dang’s old account. A worthy successor.

This is a fascinating way to find similar HN users who aren’t the same person. It’s a surprisingly great recommendation engine. “If you like pg, you might also like…”

Sure, the privacy concerns are valid, but the cat’s out of the boot. Might as well enjoy the benefits.

montrose is almost definitely pg. Someone who talks about ancient history, Occam’s razor, VCs and startups, uses the phrase “YC cos” (relatively uncommon), etc. https://news.ycombinator.com/item?id=17112567

Nicely done. One of the best hacks I’ve seen in a long time.

costco · on Nov 26, 2022

> montrose is almost definitely pg. Someone who talks about ancient history, Occam’s razor, VCs and startups, uses the phrase “YC cos” (relatively uncommon), etc. https://news.ycombinator.com/item?id=17112567

I had this hunch too. It's either pg or someone trying really hard to be pg.

roughly · on Nov 26, 2022

I mean, this is HN -

> someone trying really hard to be pg

describes half the site.

asveikau · on Nov 26, 2022

> Someone who talks about ancient history, Occam’s razor, VCs and startups,

I think these are all common topics among HN readers and commenters.

pyb · on Nov 26, 2022

Why would montrose be pg ? The correlation is not that high. Looks like a few people have picked up pg's mannerisms.

seba_dos1 · on Nov 27, 2022

Yeah, that score is only slightly higher than the highest one it shows for my account (which is also bold) - and unless my alter ego has been disguised so well it even managed to hide from myself, I'm pretty sure that isn't me :)

kazinator · on Nov 27, 2022

The score for montrose vs pg is lower than the score for someone most similar to me, who is definitely not me.

I think the similarity has to be in the high .80s to suspect that it's the same individual.

costco · on Nov 26, 2022

There are factors that make me think it is more likely than not (just scrolled through the comment history, don't feel like linking everything) that he is pg.

- Is bolded on pg's page

- Mentions yoga

- Talks about Lisp often

- Talks about YC often

- Talks about kids

- Links to Paul Graham's website

- Says he uses vi

- Writes exactly like you would expect pg to write

pyb · on Nov 27, 2022

I agree that this person is trying very very hard to sound like pg ! You could be right actually. Could still be a "wannabe" though.

ethmaxi · on Nov 27, 2022

I'm sophisticately sure they are not. They recommend a founder to ask users directly what they will pay for.

Is that what PG would say?

sillysaurusx · on Nov 27, 2022

Of course. Why wouldn’t he? That’s sound advice.

ethmaxi · on Nov 27, 2022

YC startup videos recommend not asking users directly what they will pay for.

Users freq. say they will pay for something but back down against other things.

costco · on Nov 27, 2022

With all due respect:

https://news.ycombinator.com/item?id=16785542

https://twitter.com/paulg/status/1362369484036653058

I think you are very likely to be wrong.

astura · on Nov 27, 2022

Wow, what an odd thing to get so worked up about.

VyseofArcadia · on Nov 26, 2022

> but the cat’s out of the boot

It's my first time hearing that variant. Usually it's "the cat's out of the bag" where I'm from.

Do you mean boot in the UK sense, what Americans would call the trunk of a car? Or do you mean a sturdy piece of footwear?

Obligatory xkcd https://xkcd.com/2390/

sillysaurusx · on Nov 26, 2022

It’s a little writing trick I learned from (I think) Orwell. Any time you’re about to use a common metaphor, try to tweak it. You’ll catch readers off guard, which piques their curiosity.

It’s a fun game, too. I wish I’d used “the cat’s out of the hat,” but I didn’t think of it till later.

InGoodFaith · on Nov 26, 2022

What you are describing is also known as an eggcorn.

https://en.wikipedia.org/wiki/Eggcorn

rcarr · on Nov 26, 2022

This is my all time favourite one of these:

https://thehabit.co/knowledge-is-power-france-is-bacon/

> When I was young my father said to me: “Knowledge is power, Francis Bacon.” I understood it as “Knowledge is power, France is bacon.”

> For more than a decade I wondered over the meaning of the second part and what was the surreal linkage between the two. If I said the quote to someone, “Knowledge is power, France is Bacon,” they nodded knowingly. Or someone might say, “Knowledge is power” and I’d finish the quote “France is bacon,” and they wouldn’t look at me like I’d said something very odd, but thoughtfully agree. I did ask a teacher what did “Knowledge is power, France is bacon” mean and got a full 10-minute explanation of the “knowledge is power” bit but nothing on “France is bacon.” When I prompted further explanation by saying “France is bacon?” in a questioning tone, I just got a “yes.” At 12 I didn’t have the confidence to press it further. I just accepted it as something I’d never understand.

> It wasn’t until years later I saw it written down that the penny dropped.

sambapa · on Nov 27, 2022

You left out the funniest thing - the guy/gal's nickname was "Lard_Baron"

sillysaurusx · on Nov 26, 2022

Thank you! I was trying to find the original essay I learned it from. I’m now pretty sure it was by Poe, but all I can remember is the main advice: avoid common metaphors.

I vaguely remember one of the metaphors in the essay was about a chicken coop melting, or something like that. It was vivid enough to leave a big impression.

ewilden · on Nov 26, 2022

I remember this being from Politics and the English Language (https://www.orwellfoundation.com/the-orwell-foundation/orwel...):

“ Dying metaphors. A newly invented metaphor assists thought by evoking a visual image, while on the other hand a metaphor which is technically ‘dead’ (e. g. iron resolution) has in effect reverted to being an ordinary word and can generally be used without loss of vividness. But in between these two classes there is a huge dump of worn-out metaphors which have lost all evocative power and are merely used because they save people the trouble of inventing phrases for themselves.”

sillysaurusx · on Nov 26, 2022

Thank you so much! That’s the one.

(It’s remarkable how often a vague description can yield an HN comment with an answer from a clever sleuth like yourself. Much appreciated.)

operator-name · on Nov 26, 2022

That's neeto!

The 2nd example also loosely falls under the classification of malaphor.

https://en.m.wiktionary.org/wiki/malaphor

b800h · on Nov 27, 2022

An eggcorn is a soundalike though, isn't it? Deliberately altering idioms to catch people's attention isn't an eggcorn IMO.

InGoodFaith · on Nov 27, 2022

> An eggcorn is a soundalike though, isn't it?

Not necessarily, you might be thinking of malapropisms [1], but yes, probably a closer word would be the general term: protologism.

Another commenter added some useful info on the evocative alteration of metaphors [2]

1: https://en.wikipedia.org/wiki/Malapropism

2: https://news.ycombinator.com/item?id=33757097

UncleEntity · on Nov 26, 2022

Yeah, it’s like shooting ducks in a barrel it works so well.

Easy to overuse then people just get annoyed though…kind of like commas, I suppose.

PebblesRox · on Nov 27, 2022

That reminds me of a PETA campaign on social media trying to get people to replace violent idioms with alternatives like "feeding a fed horse" and "there's more than one way to pet a cat."

esfandia · on Nov 26, 2022

I like mixing metaphors, in this case "the cat's out of the tube". ("the toothpaste's out of the bag" doesn't work as well though)

sdwr · on Nov 26, 2022

I love doing this too, it's fun to write.

martin82 · on Nov 27, 2022

There's a popular movie called "Puss in Boots". That's what I had to think of first.

pvg · on Nov 27, 2022

It's a bit older than the movie or movies in general.

https://en.wikipedia.org/wiki/Puss_in_Boots

rcarr · on Nov 26, 2022

This is somewhat similar to how they ended up catching the Unabomber. The FBI were literally at a dead end. They ended up posting one of his letters/manifestos in the paper, somebody recognised an unusual turn of phrase the Unabomber used and reported it as possibly being their brother, the FBI investigated the lead and it led them straight to him.

Excerpts from wiki:

> Before the publication of Industrial Society and Its Future, Kaczynski's brother, David, was encouraged by his wife to follow up on suspicions that Ted was the Unabomber.[91] David was dismissive at first, but he took the likelihood more seriously after reading the manifesto a week after it was published in September 1995. He searched through old family papers and found letters dating to the 1970s that Ted had sent to newspapers to protest the abuses of technology using phrasing similar to that in the manifesto.[92]

> In early 1996, an investigator working with Bisceglie contacted former FBI hostage negotiator and criminal profiler Clinton R. Van Zandt. Bisceglie asked him to compare the manifesto to typewritten copies of handwritten letters David had received from his brother. Van Zandt's initial analysis determined that there was better than a 60 percent chance that the same person had written the manifesto, which had been in public circulation for half a year. Van Zandt's second analytical team determined a higher likelihood. He recommended Bisceglie's client contact the FBI immediately.[96]

> In February 1996, Bisceglie gave a copy of the 1971 essay written by Ted Kaczynski to Molly Flynn at the FBI.[87] She forwarded the essay to the San Francisco-based task force. FBI profiler James R. Fitzgerald[98][99] recognized similarities in the writings using linguistic analysis and determined that the author of the essays and the manifesto was almost certainly the same person. Combined with facts gleaned from the bombings and Kaczynski's life, the analysis provided the basis for an affidavit signed by Terry Turchie, the head of the entire investigation, in support of the application for a search warrant.[87]

https://en.m.wikipedia.org/wiki/Ted_Kaczynski

ryangittins · on Nov 27, 2022

As I recall, one of the clinchers was his use of the phrase, "you can’t eat your cake and have it too" as opposed to the now-predominant variant "you can’t have your cake and eat it too."

I often wonder if stylometry can be used to positively identify a person based not on general word frequency, but by a single phrase or two which are rare in general but commonly used by the individual. In theory this could be relatively easy to find given a large corpus. You'd pick out the top few n-grams for short phrases by an individual and identify the ones which are most overly-represented compared to the rest of the population.
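A rough sketch of that idea, assuming a simple frequency ratio as the over-representation score. The function names, the add-one smoothing, and the `min_count` threshold are hypothetical choices for illustration, not an established method from this thread.

```python
import re
from collections import Counter


def phrase_counts(text, n=3):
    """Count word n-grams in a text, joined into space-separated phrases."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))


def distinctive_phrases(user_text, population_texts, n=3, top=5, min_count=2):
    """Rank a user's repeated phrases by how over-represented they are."""
    user = phrase_counts(user_text, n)
    population = Counter()
    for text in population_texts:
        population.update(phrase_counts(text, n))
    user_total = sum(user.values()) or 1
    pop_total = sum(population.values()) or 1
    scored = []
    for phrase, count in user.items():
        if count < min_count:
            continue  # skip one-off phrases, which are mostly noise
        user_rate = count / user_total
        # Add-one smoothing so phrases unseen in the population do not divide by zero.
        pop_rate = (population[phrase] + 1) / (pop_total + 1)
        scored.append((user_rate / pop_rate, phrase))
    scored.sort(reverse=True)
    return [phrase for _, phrase in scored[:top]]
```

With a real corpus you would probably want a proper significance measure instead of a raw ratio, since short histories make the ratio jumpy.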

googlryas · on Nov 26, 2022

It was actually his brother.

fbdab103 · on Nov 26, 2022

So is the lesson you should have GPT rewrite your manifesto so as to obscure your personal idioms?

CharlesW · on Nov 26, 2022

Or something purpose-built like Anonymouth (https://github.com/psal/anonymouth), although it seems to be both unique and dead.

Also interesting:

> Ross Ulbricht aka Dread Pirate Roberts, the mastermind behind the infamous Silk Road site which served as a black market for drugs, weapons and fake documents was also well aware of the potential danger of stylometry being used against him. At the time of his arrest in a San Francisco public library, the FBI captured images of his laptop screen as evidence. Guess what he had bookmarked — “Science of Stylometry.”

https://medium.com/svilenk/the-case-for-anonymity-12db114f0c...

rejectfinite · on Nov 26, 2022

I mean he used a forum account with an email that had his name in it.

fbdab103 · on Nov 26, 2022

That's the problem - it only takes a single slip and it is recorded forever. Perfect opsec is an impossibly high bar if you are maintaining an active online presence.

astura · on Nov 27, 2022

Only if you have a history of sending crazed writings/manifestos to newspapers and family.

atestu · on Nov 27, 2022

The show “Manhunt: Unabomber” (Netflix) shows this whole story very well.

drc500free · on Nov 26, 2022

This is a super interesting tool for self reflection. Looking at the top 10 similar accounts to mine, it gives me an arms-length view of how other people probably interpret my tone.

I appear to be a well-educated, over-confident know-it-all.

pavlov · on Nov 26, 2022

My #3 match is cstross, and now I’m convinced that my life-long secret dream of being a successful sci-fi novelist is basically a matter of typing. (Ideas? Character development? Ruthless editing? Developing an audience? Having a publisher? What do I need of those when the Computer told me I’m practically a genius…)

shagie · on Nov 27, 2022

I'd suggest giving the back story to Agent to the Stars by John Scalzi a glance.

http://www.scalzi.com/agent/

> In the summer of 1997, I was 28 years old, and I decided that after years of thinking about writing a novel, I was simply going to go ahead and write one. There were two motivations for doing so. First, I was simply curious if I could; I'd had up to that time a reasonably successful life as a writer, but I'd never written anything longer than ten pages in my life outside of a classroom setting. Two, my ten-year high school reunion was coming up, and I wanted to be able to say I'd finished a novel just in case anyone asked (they didn't, the bastards).

> In sitting down to write the novel, I decided to make it easy on myself. I decided first that I wasn't going to try to write something near and dear to my heart, just a fun story. That way, if I screwed it up (which was a real possibility), it wasn't like I was screwing up the One Story That Mattered To Me. I decided also that the goal of writing the novel was the actual writing of it -- not the selling of it, which is usually the goal of a novelist. I didn't want to worry about whether it was good enough to sell; I just wanted to have the experience of writing a story over the length of a novel, and see what I thought about it. Not every writer is a novelist; I wanted to see if I was.

highwaylights · on Nov 26, 2022

Same. Looking through some of the handles on my list tells me that I come across like a not-particularly-well-educated McSmug that needs to take a good long look at myself. Wouldn’t be so bad if I wasn’t reading the posts thinking I definitely could see myself writing this.

This was certainly eye-opening.

Update: It’s actually a little strange that reading through some of the matches it’s not just style that overlaps but perspectives in quite a few cases too. I’m definitely not the unique little snowflake that some others are finding themselves to be.

bee_rider · on Nov 26, 2022

I also enjoyed reading one of my style-partner’s posts.

The most noticeable similarity is that we both clearly have strong opinions about some things, and like to share information, but also like to be clear about our unknowns or opinions. So, lots of “sounds likes,” “probably,” “could be” and so on.

The downside is, I guess, this could be seen as a bit weasel-word-y or indirect.

reducesuffering · on Nov 27, 2022

> like to be clear about our unknowns or opinions. So, lots of “sounds likes,” “probably,” “could be” and so on.

Commonly called just “hedging” like hedging your bets.

bee_rider · on Nov 27, 2022

That’s a kinder description than I gave it in my next paragraph, so thanks I suppose.

I do think it is an under-emphasized aspect of honesty, though, that we should be clear about our level of experience/understanding. Especially online — people like to discuss things, even (especially?) when we are just getting started. So if we’ve picked up opinions through osmosis and we start repeating them without testing them, we’re really just amplifying some possibly-incorrect viewpoint (and if we’ve picked it up, there’s a good chance it is already widespread in the community, which is bad if it is wrong).

And I mean, more concretely a measurement is not complete without the error bars!

Often this doesn’t really matter, because it is just chit-chat anyway. But it is nice to keep in mind.

fancybouncy · on Nov 27, 2022

> we should be clear about our level of experience/understanding

there are many languages that encode this info as mandatory grammatical affixes, it's called evidentiality.

bee_rider · on Nov 27, 2022

I hadn’t heard of that. Neat!

I find it interesting that the first example they use in the Wikipedia article is Turkish. I’ve only met a couple Turks, but they were all quite good engineers. I wonder to what extent embedding this kind of information in the language helps organize your thoughts.

bhaney · on Nov 26, 2022

> I appear to be a well-educated, over-confident know-it-all.

Don't we all?

sdwr · on Nov 26, 2022

I hate us insufferable nerds!

reducesuffering · on Nov 27, 2022

> over-confident know-it-all.

I’m pretty sure participation in HN is a 99% sure filter for being called this many times in one’s life.

closeparen · on Nov 26, 2022

That's what we all come to HN for...

seydor · on Nov 26, 2022

we must be a good match

drc500free · on Nov 26, 2022

I'd love a version of this where you enter two usernames and get a match score.

jsnell · on Nov 26, 2022

After a few tries on boring accounts, I thought to try the account of somebody who was notorious for an incident outside of HN, and had a (deservedly) bad time at HN for a couple of years before the account went dark.

And yeah, there's a bunch of high confidence (.6-.8) hits for that account, and from a quick browse of the comments of the recently active ones, they look really likely to be alts. Like, all three that I looked at had comments that made it very clear it was this person writing pseudonymously. (E.g. writing on their signature issue, and saying they couldn't go into more detail due to fear of self-doxxing; or somebody literally saying that the alt's claims reminded them of the public writings of the notorious guy years ago).

Obviously I'm not naming the account, but this functionality turned out way creepier than I thought the moment I tried it on the account of somebody who has a reason to disassociate from an existing public persona, but still wants to participate here.

thesz · on Nov 26, 2022

I keep no alternate accounts, but this tool reports best matches for me that appear to be Slavic or just Russian - and I am Russian. Best match score in my list is just above 0.5. There are some clearly alternate accounts on the list, their match scores with this tool are well above 0.7.

It is probable that persons of same cultural origin will have similar writing style and vocabulary. It is also probable that persons of same cultural origin would have same relationships with the world as a whole, they would like same things and dislike other same things.

So, in my opinion, it is possible that you have found not only alternate accounts (score above 0.7), but accounts of people with same cultural origin (ones that are around 0.6).

ricardobayes · on Nov 26, 2022

My highest was 0.41 and the person writes nothing like me. I guess I'm a unique snowflake after all.

Litost · on Nov 27, 2022

I was curious about this, my highest match was 0.47 and I have no alts, maybe I'm also a unique snowflake, or haven't said anything noteworthy enough to have been deepfaked yet ;).

gilleain · on Nov 26, 2022

my second highest hit (ie, third in the list) is gwern at 0.45 who i'm fairly sure is not me.

scarmig · on Nov 26, 2022

I was actually just looking at near hits for gwern and found what's almost definitely a defunct alt for him.

gilleain · on Nov 26, 2022

Well, it is certainly NOT me, that's for sure.

On an unrelated topic, I'm starting a service to write comments in the style of others to provide plausible deniability for other alt accounts. Rates negotiable.

jrumbut · on Nov 26, 2022

I have a few in the low 0.5's and, honestly, they seem cool and I want to meet them.

weaksauce · on Nov 26, 2022

I don't have any alternate accounts here either and my writing style is apparently nearly the same as a high profile account that I recognize and has many points. I wouldn't say this is a highly accurate thing.

vbezhenar · on Nov 26, 2022

This tool finds 19 other accounts similar to mine. Those are not my accounts. The scores range from 0.46 to 0.56.

costco · on Nov 26, 2022

I think people are sort of confused at what this tool is supposed to be which I will concede is partially my fault. The results of this tool are by themselves not indicative of having an alternative account. It generates the 20 most similar users for every single user on the site, regardless of whether they have an alt or not (there's obviously no way for me to know that for every single user). In your case further investigation would reveal that none of those accounts are yours.

thesz · on Nov 26, 2022

It is a fun tool, I can assure you. It's just that people have found a use case you hadn't foreseen yourself.

I think your tool should have internal embeddings for each of the users. Also, most probably your tool uses cosine similarity for a search.

Thus, I would like to suggest a feature: recognize simple arithmetic operations over users' embeddings, such as "thesz - 2 * patio11". It will make things even more fun: this way we can find users who are like me and very unlike patio11. Even simple additions and subtractions would suffice.

(an idea is taken from properties of word2vec embeddings)
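A minimal sketch of what such a query could look like, assuming each user already has a fixed-length feature vector and that search is by cosine similarity. The function names are hypothetical and the vectors in the usage note are toy values, not real data from the tool.

```python
import math


def combine(vectors, terms):
    """Evaluate a weighted sum such as 1*thesz - 2*patio11 over user vectors."""
    dim = len(next(iter(vectors.values())))
    result = [0.0] * dim
    for weight, name in terms:
        for i, value in enumerate(vectors[name]):
            result[i] += weight * value
    return result


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest(vectors, query, exclude=(), top=3):
    """Return the names whose vectors are most similar to the query vector."""
    scored = [(cosine(query, vec), name)
              for name, vec in vectors.items() if name not in exclude]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top]]
```

For example, `nearest(vectors, combine(vectors, [(1, "thesz"), (-2, "patio11")]), exclude=("thesz", "patio11"))` would rank the remaining users against the combined vector. Whether the arithmetic is as meaningful on n-gram frequency vectors as it is on word2vec embeddings is an open question.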

Your tool is thought provoking. What I discovered with it made me think about my use of language and what other languages (body, imagery, etc) I use differently because of who I am. Which made me think about my favorite underrated superhero Cypher [1] - would his innate ability to understand languages make him best detective ever?

[1] https://en.wikipedia.org/wiki/Cypher_(Marvel_Comics)

Thank you!

costco · on Nov 27, 2022

Really cool idea. I'd need to upgrade the VPS though so all the vectors would fit in memory but it probably wouldn't be too hard (right now I'm just storing a map of username string -> array of 20 username strings because my VPS only has 512mb RAM). I'll think about if I can do this in a way that is more resource conservative.
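To make the trade-off concrete, here is a sketch of how a username -> top-20 table like the one described above could be precomputed offline, so the server never has to hold the vectors. The function names are hypothetical and the cosine scoring is an assumption.

```python
import heapq
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def precompute_neighbours(vectors, k=20):
    """Build a map of username -> list of the k most similar usernames."""
    table = {}
    for name, vec in vectors.items():
        scored = ((cosine(vec, other_vec), other_name)
                  for other_name, other_vec in vectors.items()
                  if other_name != name)
        # nlargest keeps only k candidates in memory at a time per user.
        table[name] = [other_name for _, other_name in heapq.nlargest(k, scored)]
    return table
```

The table answers fixed lookups cheaply, but any query that builds a new vector on the fly needs the full vectors available at serving time, which is where the extra RAM would go.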

csa · on Nov 26, 2022

Fwiw, and as gp mentioned, > 0.7 seems more likely to be alt territory.

bbarnett · on Nov 26, 2022

You are fools, one and all! This tool's only purpose, is to tag people who use it!

Now they know just who cares about which alternate accounts. They know!

They freaking know, man!

You have all fallen for their ploy. Fools!

thesz · on Nov 26, 2022

I have no alternate accounts and visited the site out of curiosity, because I used to work in a domain like this.

What I found was worth visiting the site. Somehow, notably many accounts with (relatively) high similarity to mine share at least one of my personal traits.

Which is fascinating, to me.

And I think it is worth noticing by others: what and how you write can disclose who you are.

TheOtherHobbes · on Nov 26, 2022

It knows my IP now.

(Or does it?)

neodypsis · on Nov 26, 2022

It offers no privacy policy, so can't tell.

irrational · on Nov 26, 2022

.6 is high confidence? I did my own username, wondering what it would return, since I know I don’t have any alt accounts. The top results are in the .6-.7 range. If they aren’t alt accounts, is it just coincidence that we have similar writing styles?

bee_rider · on Nov 26, 2022

I think so.

A funny thought — my “matches” cap out at around .56. Having false positives* in a tool like this might feel like a “bad result” but actually I think it just means that if someone were running this sort of tool across the whole internet, I’d be relatively easy to correlate, while your identity would be intermingled with your .6-.7 partners.

*actually they aren’t really even false positives because the tool doesn’t promise to detect alts in the first place, just find similar styles.

tbrownaw · on Nov 26, 2022

> but this functionality turned out way creepier than I thought the moment I tried it

Hopefully this raised awareness means that people who actually need anonymity will be more likely to know to take precautions.

kaba0 · on Nov 26, 2022

Genuinely asking, what way is there to combat this? Is there a tool that takes out stylistic elements of your comment?

paulgb · on Nov 26, 2022

The site mentions a service called Quillbot which apparently does just that. https://stylometry.net/avoid

klabb3 · on Nov 27, 2022

This is the million dollar question. I think the goal of "anonymity for most intents and purposes" is worthy, it's been how I've enjoyed HN and Reddit, but I also know that it was just a matter of time before stylometry and other meta-analysis of post history become 10 second tools for everyone. Now the cat is out of the box.

I've been thinking about this a bit, and I've landed in that having a stable identifier across ALL comments & posts is a poor default. We still probably want some coherence, at minimum within a thread, eg to follow a back-and-forth. The site itself may also use stable identifier for abuse prevention. But there's no reason one should have the same username externally traceable for posts about completely different topics.

In practice, this could be done with low friction pseudonym creation, which all ties to the same account privately.

marbu · on Nov 26, 2022

One way would be to run such a tool before posting and then, based on the results, tweak the post and repeat until the similarities are not statistically significant. Or instead of tweaking, start posting under a new throwaway account. But this won't save you when some new way to analyze style appears in the future. Moreover, there are other types of metadata which can be taken into account to narrow down the search space a bit, such as timestamps. And obviously the more you write, the harder it is to control these things.

thedragonline · on Nov 26, 2022

I wonder if gpt3 has a use case here?

birdyrooster · on Nov 26, 2022

[flagged]

Animats · on Nov 26, 2022

0.6 isn't much. I have 3 matches above 0.6, and they're not me. 20 or so over 0.5.

input_sh · on Nov 26, 2022

I get one 0.68 match, which... fair enough. It is an account I've abandoned some years ago, no secrets there.

No other hits above 0.5, so I guess that either makes me pretty unique as a commentator or my English is broken in a unique way.

jsnell · on Nov 26, 2022

That's why you manually evaluate the matches. And like I wrote in that comment, I did that manual eval, and these clearly are alts of that main account, not spurious. Narrowing down the pool of accounts you'd need to do this kind of manual evals for by a factor of 100000 is a pretty significant change in capabilities.

kcarter80 · on Nov 26, 2022

Could you elaborate on why it's obvious why you won't name the account?

notduncansmith · on Nov 26, 2022

Maybe to avoid attracting any extra attention to this user? Also, as someone who’s read HN for a few years, it only took me 2 guesses to find an account that the above comment describes (and not necessarily the same person).

sillysaurusx · on Nov 26, 2022

It was a classy move by jsnell, too. Thank you.

(I don’t know who the comment is talking about, which is how it should be. There’s no need to blow someone’s cover in a highly visible way. Even if they were satan, they’d still be welcome on HN as long as they’re writing substantive, interesting comments that follow the guidelines.)

Normal_gaussian · on Nov 26, 2022

Such quality comments would track with most thorough Satan representations.

Aachen · on Nov 26, 2022

They obviously don't want it to be known, seeing as they've got alts to post under and avoid going into too much detail. Being able to go out and do your own research is different than posting the information open for everyone to see at a glance.

I would say it's obvious why one might respect that wish (do unto others...), but I'm also aware that my and my culture's sense of privacy goes further than many others'.

phreeza · on Nov 26, 2022

MD5 of the username is 9abc27e93b7e3c04b7c599017c1cfe5f ? The top one seems an odd one out in that case?

Aachen · on Nov 26, 2022

Usernames aren't random enough to be safe as a simple MD5. Perhaps with a strong bcrypt, but similar to PIN codes, it might be better to give partial information like "is the second character an ...", assuming nobody else made similar statements. Or give the first ~two hex characters of the hash, so that it would match about 1/16² (1 in 256) of the usernames. I'm sure there's also a clever way to do a zero-knowledge proof here, probably something with Diffie-Hellman using the name as your random integer or something, but I'm too sick to think about this stuff right now. Privately sharing data publicly is hard.
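
To illustrate the hash-prefix idea, here is a rough sketch that buckets usernames by the first two hex characters of their MD5. The usernames are synthetic placeholders and `md5_prefix` is a hypothetical helper, not anything from the thread.

```python
import hashlib
from collections import Counter

def md5_prefix(name, n_hex=2):
    """First n_hex hex characters of the MD5 digest of a username."""
    return hashlib.md5(name.encode("utf-8")).hexdigest()[:n_hex]

# Synthetic usernames, purely for illustration.
usernames = [f"user{i}" for i in range(10_000)]
buckets = Counter(md5_prefix(u) for u in usernames)

# Two hex characters allow at most 16**2 = 256 distinct prefixes, so each
# prefix is shared by roughly len(usernames) / 256 names.
print(len(buckets), min(buckets.values()), max(buckets.values()))
```

Publishing only the prefix narrows the candidates to one bucket rather than one name, which is the point of the suggestion.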

ahmedalsudani · on Nov 26, 2022

Another problem is that it's a small set. If you had a list of all HN users, you could compute md5 for all of them in seconds.
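
A minimal sketch of why a bare MD5 of a username offers little protection when the set of candidates is small and known. The candidate list below is made up; the helper names are hypothetical.

```python
import hashlib

def md5_hex(s):
    """Hex MD5 digest of a string (UTF-8 encoded)."""
    return hashlib.md5(s.encode("utf-8")).hexdigest()

def find_username(target_hash, usernames):
    """Return the first username whose MD5 equals target_hash, or None."""
    for name in usernames:
        if md5_hex(name) == target_hash:
            return name
    return None

# Made-up candidates; a real attempt would iterate over every known username.
candidates = ["alice", "bob", "carol"]
target = md5_hex("carol")
print(find_username(target, candidates))
```

The loop is a single hash per candidate, so even a list with hundreds of thousands of names is a matter of seconds.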

phreeza · on Nov 27, 2022

I think the intention of the post not mentioning the handle was just to prevent old discussions from flaring up or so? The post doesn't really contain any new information on the person that would be worth obscuring. So I just thought I'd hash it to prevent that. But it seems I actually screwed up the hashing so I will leave it at that.

lzooz · on Nov 26, 2022

Good point - I've been running john on that md5 for a couple minutes :)

wizzwizz4 · on Nov 26, 2022

Why use John? Just run down the list of Hacker News usernames; it'll take less time. (Or, better still, don't; just because the privacy's theoretically compromised doesn't mean we have to exploit that.)

lzooz · on Nov 26, 2022

I don't think there's a public list of all HN usernames is there?

Found this, it includes 250k usernames, but it's not there. https://www.kaggle.com/datasets/hacker-news/hacker-news-corp...

meta2023 · on Nov 26, 2022

The username in question isn't in this dataset but maybe it was created in the past 10 days, as the max(timestamp) is Nov 16th, 2022.

https://console.cloud.google.com/marketplace/details/y-combi...

lzooz · on Nov 26, 2022

It isn't there, and given the "story" it happened years ago so it should be there, so I guess we've been played.

phreeza · on Nov 27, 2022

Unintentionally played I might add... But I will leave it at that.

tqi · on Nov 26, 2022

> quick browse of the comments of the recently active ones, they look really likely to be alts.

Hmm isn't a spot check of comments somewhat tautological, since that is how the tool identifies alts (rather than something like IP address or time of day)? If this had been promoted as "find accounts with similar writing style to yours" would people immediately assume alts?

margalabargala · on Nov 26, 2022

I would presume that OP is referring to the actual content of the comments. This just does stylometric analysis, which looks at word choice, but not what the arrangement of the words means.

If some accounts are found to be stylometrically similar, and then a visual inspection also shows them all stating similar opinions, that latter piece of data is a strong signal.

gus_massa · on Nov 26, 2022

It would be nice to make the names clickable.

I don't think the list of pg's alternate accounts is accurate. I checked a few. They have many one-liners, which is typical of pg, but the topics and style don't look similar.

I searched a few more and got better results. :)

I searched for myself (I know that I have no alternate accounts). I recognize a few users who are interested in similar topics, and I discuss with/upvote them often. But I didn't recognize most of the users on the list.

costco · on Nov 26, 2022

> I searched for myself (I know that I have no alternate accounts). I recognize a few users who are interested in similar topics, and I discuss with/upvote them often. But I didn't recognize most of the users on the list.

It's based purely off frequency of the 200 most common English 1 word phrases, 2 word phrases, 3 word phrases, 1 character sequences, 2 character sequences, and 3 character sequences. Topic does not really have anything to do with it. If I had more time I probably would've done a smarter model that accounted for things like that.
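
A rough sketch of that kind of feature pipeline, under stated assumptions: the texts are toy examples, and the vocabulary is simply the 200 most frequent n-grams in this toy corpus rather than the 200 most common English ones per n-gram type. All function names are hypothetical.

```python
import math
from collections import Counter

def ngrams(seq, n):
    """All contiguous n-grams of a sequence, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]

def features(text):
    """Count word 1-3-grams and character 1-3-grams in one Counter."""
    text = text.lower()
    words = text.split()
    counts = Counter()
    for n in (1, 2, 3):
        counts.update(("w",) + g for g in ngrams(words, n))  # word n-grams
        counts.update(("c",) + g for g in ngrams(text, n))   # character n-grams
    return counts

def vectorize(counts, vocab):
    """Relative frequency of each vocabulary n-gram."""
    total = sum(counts[g] for g in vocab) or 1
    return [counts[g] / total for g in vocab]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

texts = {
    "u1": "i think that it is a good idea to be honest",
    "u2": "i think it is a good idea to be fair",
    "u3": "lol nope never ever gonna happen dude",
}
counts = {user: features(text) for user, text in texts.items()}

# Stand-in vocabulary: the most frequent n-grams across the toy corpus.
overall = Counter()
for c in counts.values():
    overall.update(c)
vocab = [g for g, _ in overall.most_common(200)]

vecs = {user: vectorize(c, vocab) for user, c in counts.items()}
print(cosine(vecs["u1"], vecs["u2"]), cosine(vecs["u1"], vecs["u3"]))
```

Because only common short n-grams are counted, topic barely enters the picture; u1 and u2 score as more similar than u1 and u3 mainly through shared function words and character patterns.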

gus_massa · on Nov 26, 2022

One is also a mathematician. It's trivial that we overuse some technical words even if it's unnecessary.

Another is from Argentina, so I guess the native language leaks, for example by using words derived from Latin that are not idiomatic.

And there are a few more that it is an honor to be "confused" with, but I have no clue why.

Fnoord · on Nov 26, 2022

Cool stuff, thank you for sharing your findings!

I don't do throwaway. I either post or STFU. I also STFU on darknet. It's why I found it fun to read/lurk on things like I2P back when it was new. And I know that on a pseudonymous account it is only a matter of time until it can be linked to another pseudonymous account. It would not surprise me if stylometry was used on Dread Pirate Roberts or the people behind The Pirate Bay or the people behind Wikileaks (Assange's sockpuppet accounts). It could also have been used to verify afterwards instead of beforehand. Though with TPB, since it was on clearweb, an advanced adversary could have used a correlation/timing attack to figure out who wrote what.

I'm having fun times recognizing other Dutch people through their usage of the English language. For example, a distinctive word I see Dutch people use a lot is 'oke' instead of 'OK' or 'okay'. It's a red flag the person is native Dutch. I wonder if there are stylometry tools available for figuring out if someone used a physical vs touchscreen keyboard (I used Glider to write this post, spellchecker unavailable).

And yes, organizations like secret service and police should use such tools as well. It is a known tool, why not use it for good? As with any tool, it can be used for good and evil. On HN this could be useful for the mod team (AFAIK nowadays only dang) to find banned people's sockpuppets. Cross-community could also be a fun project: find a HN user's Twitter or Reddit account. And I hope this method is also used to find Russian trolls on social media.

ghaff · on Nov 26, 2022

Most people greatly underestimate the power of linkage attacks on anonymity. And it doesn't even take fancy ML. In the context of healthcare records, I like to trot out this 25 year old example of an MIT grad student and the then-governor of MA.

https://ischoolonline.berkeley.edu/blog/anonymous-data/

dlkf · on Nov 26, 2022

The top hit on my list looked familiar. I looked at their recent comments and saw a discussion between that user and me. We were quoting each other directly throughout.

I wonder if this explains our similarity. And if so, could we tweak the algo by e.g. removing text that is prefixed with ">"?
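
A minimal sketch of that preprocessing step, assuming quotes follow the common convention of lines starting with ">". `strip_quotes` is a hypothetical helper, not part of the tool.

```python
def strip_quotes(comment):
    """Drop lines that begin with '>' (ignoring leading whitespace)."""
    kept = [
        line for line in comment.splitlines()
        if not line.lstrip().startswith(">")
    ]
    return "\n".join(kept).strip()

comment = "> Topic does not really matter.\n\nI disagree, topic leaks into word choice."
print(strip_quotes(comment))
```

This only catches quotes marked with ">"; quoted text set off in italics or quotation marks would still be attributed to the person quoting it.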

bscphil · on Nov 26, 2022

The scary thing is that once you have this data, finding HN matches for individual targeted users on other sites becomes trivial, even if those sites are harder to scrape. I bet most people here have an anonymous Reddit account, for example. If you wanted to know who was behind a particular Reddit account, you could feed it into something like this and compare the results with HN, where accounts are less likely to be anonymous. Or build a database based on blogs, Github comments, etc.

Also, since this uses only word frequency, there are probably relatively easy improvements to make that would make it even more powerful, like looking at particular runs of words that are unique. Some expressions or figurative language only show up in combinations of words, and tend to be highly style specific.

costco · on Nov 26, 2022

I could have used a part of speech tagger, looked at time of day a user posts, capitalization, spelling errors, etc. From what I understand the state of the art is lightyears ahead of this, there are even companies with actual linguists who will act as expert witnesses in court to say stuff like "we can say with 95% certainty that xyz authored this email." Honestly it's kind of scary. There are papers that talk about cross platform authorship attribution, one I think did it with Twitter, Blogspot, G+ and had pretty good results.

faeriechangling · on Nov 26, 2022

Thus proving the only actually anonymous community in practice is 4chan, and that’s why it’s so toxic.

sbierwagen · on Nov 26, 2022

If you define “toxic” as “people disagreeing with you”, sure. That was what the entire internet was like until maybe 2005.

ben_w · on Nov 26, 2022

I'm old enough to remember when 4chan was self identifying as the Internet's hate machine, before xkcd referenced it as such: https://xkcd.com/591/

Sometimes people insist that's all role-play and irony; others insist that if it ever was, it certainly isn't now.

But regardless, I remember pre-2005, and it wasn't all like what I saw the two times I looked at 4chan. Bits were. Bits were much worse. But mostly, mostly, people were kinder… at least, unless political tribalism came up.

philosopher1234 · on Nov 26, 2022

“People disagreeing with you” describes almost none of the conversation on 4chan

setr · on Nov 26, 2022

Forget the alternate accounts — if two users are close in style, there’s a decent chance they should be friends. This is an HN friendship machine.

saurik · on Nov 26, 2022

It would be convenient if the usernames linked to the comment pages on Hacker News (to avoid having to copy/paste and URL hack, which is made even slightly more annoying because for some reason when I tap and hold the usernames to copy them your markup--I haven't looked at why yet--is causing an extra space character to get copied on the left).

dsr_ · on Nov 26, 2022

This is interesting.

I'm 0.566 correlated with logfromblammo -- and while we are definitely not the same person, I could easily imagine writing a sentence such as:

"For some bizarre reason, management has not yet assigned a task to their programmer underlings to automated themselves out of existence. I can't imagine why."

which is theirs, not mine, from about a year ago. I like that.

On the other hand, I'm nearly as correlated with peterwwillis: 0.5485 -- who has no comments and no submissions.

costco · on Nov 26, 2022

> On the other hand, I'm nearly as correlated with peterwwillis: 0.5485 -- who has no comments and no submissions.

This is due to the Firebase API not updating when users ask the admins to move their comments to another account.

matsemann · on Nov 27, 2022

Yeah, I got a good match with my previous nick here. Which to me proves the tool works well.

lifeisstillgood · on Nov 26, 2022

I had a similar experience finding my most likely alt (.50, suggesting I am a unique snowflake, as I have always thought :-). My most likely alt certainly writes in a style I appreciate and on subjects I often mention.

DenisM · on Nov 26, 2022

How about this for countermeasure:

As you're typing out a comment the software gives you a list of accounts you're becoming similar to. That way you can adjust your writing as you type.

kaba0 · on Nov 26, 2022

Someone linked it in the thread: https://github.com/psal/anonymouth

pessimizer · on Nov 26, 2022

Forget countermeasures, go covert. Write a comment, have the comment be rewritten before submission in order to resemble a targeted account.

bornfreddy · on Nov 26, 2022

Sounds great, except there are many different similarity measures. Which one does the algorithm use?

wizzwizz4 · on Nov 26, 2022

Why not all of them? Which metrics are closer would tell you which aspects of your writing you need to focus on.

davebillyhock · on Nov 26, 2022

This found an alt that I created specifically to see if I could write artificially to defeat this kind of analysis. I have seen other tools like it posted to HN, but none before had found that account. I guess I need to up my game.

CharlesW · on Nov 26, 2022

If you don't mind sharing, are you "writing artificially" purely in your head, or are you using techniques like intermediate translations?

davebillyhock · on Nov 26, 2022

No mechanical means, but I have referred to a thesaurus occasionally. Mostly I tried to change my sentence structure, not just words. It requires actually thinking differently, in a way. Which makes it difficult to know how well I'm communicating.

crtified · on Nov 26, 2022

I imagine this would be quite difficult in practise, due to all the subliminal factors behind a person's writing choices.

For example, as somewhat illustrated here, your personal vocabulary is a kind of fingerprint. As you mention, using a thesaurus can somewhat alleviate that, but if a thesaurus is only changing a small % of your words, then it will only have a suitably small % effect upon analysis.

To go yet further might (I suspect!) entail methods such as directly lifting and using other people's sentences to convey your own thoughts. But even then, "your own thought patterns" are still informing the manner of the post, to some extent, so over time increasingly robust analysis may still find patterns to hook into.

neodypsis · on Nov 26, 2022

I wonder if someone will come up with a Grammarly-like tool which you can feed with sample writings to help you increase/lower the similarity score of a new text you are writing.

serhack_ · on Nov 26, 2022

See also: https://serhack.me/articles/unveiling-anonymous-author-stylo...

costco · on Nov 26, 2022

That post was actually what motivated me to make this. I'm on your email list :)

serhack_ · on Nov 26, 2022

WOW! It's such a pleasure for me

super256 · on Nov 26, 2022

Ahhh, does anyone remember the hacking crew that leaked BLUEETERNAL and other NSA tools and exploits? Shadowbrokers.

They were always communicating in some kind of meme-russian, and their texts were funny to read. [1]

I believe their writing mostly defeated this kind of analysis, at the cost of looking like idiots (which was probably the reason no one sent them crypto-dollars to buy that stuff exclusively).

Here's an excerpt:

"Attention government sponsors of cyber warfare and those who profit from it !!!!

How much you pay for enemies cyber weapons? Not malware you find in networks. Both sides, RAT + LP, full state sponsor tool set? We find cyber weapons made by creators of stuxnet, duqu, flame. Kaspersky calls Equation Group. We follow Equation Group traffic. We find Equation Group source range. We hack Equation Group. We find many many Equation Group cyber weapons. You see pictures. We give you some Equation Group files free, you see. This is good proof no? You enjoy!!! You break many things. You find many intrusions. You write many words. But not all, we are auction the best files."

[1] https://archive.ph/20160815133924/http://pastebin.com/NDTU5k...

super256 · on Nov 27, 2022

*EternalBlue

spdustin · on Nov 27, 2022

Have you tried including parts of speech (for example, as bigrams and trigrams) as part of the features considered in your model? I’ve had great success with stylometry that goes beyond TF-IDF with bags of words; including grammar patterns was shockingly good.

(FWIW, it didn’t find my throwaways; my own model didn’t, either, because I knew that word choice wasn’t enough to avoid being outed by stylometry)

Edit: by bigrams and trigrams, I mean reducing words to their part-of-speech labels and using THOSE as word tokens. You’ll find that native English speakers have higher weights on some phrase construction patterns than, say, folks from Romania. TF-IDF is useful for these POS-grams (just made that word up) as well.
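
To make the POS-gram idea concrete, here is a small sketch that counts bigrams and trigrams over part-of-speech tags. The sentence is hand-tagged for illustration; in practice a POS tagger would supply the tags, and the counts could then be weighted with TF-IDF. `pos_ngrams` is a hypothetical helper.

```python
from collections import Counter

# Hand-tagged toy sentence (tags assigned manually for illustration).
tagged = [
    ("the", "DET"), ("tool", "NOUN"), ("finds", "VERB"),
    ("the", "DET"), ("account", "NOUN"),
]

def pos_ngrams(tagged_words, n):
    """Count n-grams over the tag sequence, ignoring the words themselves."""
    tags = [tag for _, tag in tagged_words]
    return Counter(tuple(tags[i:i + n]) for i in range(len(tags) - n + 1))

bigrams = pos_ngrams(tagged, 2)
trigrams = pos_ngrams(tagged, 3)
print(bigrams.most_common(1))
print(len(trigrams))
```

Since the words are discarded, swapping vocabulary with a thesaurus leaves these counts unchanged, which is why grammar patterns are harder to disguise than word choice.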

costco · on Nov 27, 2022

> Edit: by bigrams and trigrams, I mean reducing words to their part-of-speech labels and using THOSE as word tokens. You’ll find that native English speakers have higher weights on some phrase construction patterns than, say, folks from Romania. TF-IDF is useful for these POS-grams (just made that word up) as well.

That is a very good idea and when I update the site that will almost certainly be included :) Any other tips? Been reading papers for ideas and I think I may have to ditch the cosine similarity and go for something fancier soon. Thank you

zxcvbn4038 · on Nov 26, 2022

How long until this becomes the algorithm for a dating site?

“Find hot single women who write just like you”

nrp · on Nov 26, 2022

This seems like a great way to hire freelance copywriters/ghost writers too. I would absolutely hire someone I knew could match my tone well for writing generic unattributed copy.

forgotpwd16 · on Nov 26, 2022

Wouldn't be surprised if dating sites already used similar algorithms.

dysoco · on Nov 27, 2022

Do dating sites really use clever algorithms to match up people together? I was under the impression that, the less likely you are to meet your perfect match, the more you're going to use the app.

In my experience I don't see a relevant list of potential matches aside from gender and age preference, it's all completely random, even frequently I see people outside the settings I've specified (i.e. men or older women).

bornfreddy · on Nov 26, 2022

Wouldn't be surprised if most of the women on a specific dating site had very high similarity scores.

interroboink · on Nov 26, 2022

This is one reason why I like legal doctrines such as "beyond a reasonable doubt." Even a 0.9 match in a tool like this could be a coincidence, if there are millions of users. But that won't stop people from casually believing "aha it must be an alt account", based on some anecdata.

It's so easy for something like this to be turned into a tool for a witch hunt, targeting innocents.

costco · on Nov 27, 2022

But a 0.8 or 0.9 match and something like Tor usage could be enough to justify a warrant. That's why I'm not sure I want to open source the code because I don't want to normalize this.

yyt554 · on Nov 27, 2022

Keep in mind the potential to create false accusations by fabricating similar looking accounts.

psychphysic · on Nov 26, 2022

Hmmm, doesn't seem to work. But you have convinced me (and many others?) to search for our alts consecutively, so now you do know who has alts?

ufmace · on Nov 26, 2022

I wonder what's a reasonable threshold for "probably the same person". I've never had an alt on HN, and when I searched myself, it found 3 other users above 0.6, none of whom I've ever heard of before.

costco · on Nov 26, 2022

If it's >0.9 you can almost guarantee it's an alt, but I've seen certain matches at 0.6. The problem is writing styles change over time. Another idea I had was converting the scores, which are just cosine similarity scores, into percentiles (so 0.99 would be the 99th percentile of certainty) to make them more human-interpretable.
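
A minimal sketch of the percentile idea, assuming a sorted list of all observed similarity scores is available. The scores below are invented and `to_percentile` is a hypothetical helper.

```python
from bisect import bisect_left

def to_percentile(score, sorted_scores):
    """Percentage of observed scores strictly below the given score."""
    return 100.0 * bisect_left(sorted_scores, score) / len(sorted_scores)

# Invented scores standing in for the site's full score distribution.
observed = sorted([0.21, 0.30, 0.35, 0.41, 0.44, 0.48, 0.52, 0.56, 0.63, 0.91])

print(to_percentile(0.63, observed))
print(to_percentile(0.91, observed))
```

A percentile says how unusual a score is relative to everyone else's matches, which is easier to read than a raw cosine value, though it still is not a probability that two accounts share an author.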

throwup · on Nov 26, 2022

I make new accounts every so often and the accounts of mine that it found have a score of around 0.3. I'm not actively trying to defeat stylometry but it's possible I just have a particularly unremarkable writing style.

xwolfi · on Nov 26, 2022

Well I must be stereotypical myself because it found me at 0.8 !

bonzini · on Nov 26, 2022

The people at 0.4-0.6 with me do share some interests. That's cool on its own.

forgotpwd16 · on Nov 26, 2022

>The problem is writing styles change over time.

It would be interesting if we could plot the writing style divergence over time.

throwdbaaway · on Nov 26, 2022

I got matched with my old account with a score of only 0.45

MBCook · on Nov 26, 2022

I have no alts. The highest match for me is about 0.66.

dotancohen · on Nov 26, 2022

Interesting. The highest non-me account is under 0.4 on my page. I do not believe that I have such a unique writing style - especially since half my posting is on mobile and therefore possibly slightly different than my desktop posts.

dwringer · on Nov 26, 2022

My closest is 0.4879. I know I tend to be wordy but I thought I had a pretty generic style as well. This is definitely a fascinating demonstration.

drdec · on Nov 26, 2022

Feeling better about my high of 0.49 now

pyb · on Nov 26, 2022

0.6 is not high enough to indicate an alt

stavros · on Nov 26, 2022

Oh wow, it's really sure that I'm stavrosk, which I am:

https://stylometry.net/user?username=stavros

The next person is 30% less certain, that's huge! This would basically identify any alt I might have with near certainty.

rogual · on Nov 26, 2022

Funny thing is, it thinks I'm you, but it doesn't think you're me!

https://stylometry.net/user?username=rogual

I'd have thought this stylometry thing would be commutative.

stavros · on Nov 26, 2022

I guess it's a multidimensional space, so you can have someone closer to you than me, but they aren't also closer to me than you. Basically, they're close to you, but on the "other side" of me, I guess?

yyt554 · on Nov 27, 2022

Don't need multiple dimensions for that.

0.1, 0.2, 0.3, 1.0, 2.0

To 2.0, 1.0 is closest.

To 1.0, 0.3, 0.2 and 0.1 are closer.
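
The same one-dimensional example as a short sketch: the distance itself is symmetric, but "appears in my top-k list" is not. The labels a-e and the `top_k` helper are made up for illustration.

```python
# The five points from the example above, with made-up labels.
points = {"a": 0.1, "b": 0.2, "c": 0.3, "d": 1.0, "e": 2.0}

def top_k(name, k):
    """The k nearest other points to the named point, closest first."""
    others = [n for n in points if n != name]
    return sorted(others, key=lambda n: abs(points[n] - points[name]))[:k]

print(top_k("e", 1))  # ['d']
print(top_k("d", 3))  # ['c', 'b', 'a']
```

So `d` is the nearest neighbor of `e`, yet `e` does not make it into the top 3 for `d`, just as one user can show up on another's list without the reverse being true.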

rogual · on Nov 28, 2022

Thanks, seems obvious when you put it like that.

yyt554 · on Nov 27, 2022

The word you are looking for is "symmetric".

jvolkman · on Nov 26, 2022

stavrosk doesn't have any posts/comments? What's it using to match?

stavros · on Nov 26, 2022

It's my old username.