There is a new generative AI product called Phind. Does it make up fake citations the way ChatGPT does? I report on experimentation with the free-tier product, conducted on January 7, 2024.
Phind is an intelligent search engine and assistant for programmers.
https://www.phind.com/about
First, I will run a test using one of the prompts from our paper on ChatGPT errors (read my paper on hallucinated academic citations or the blog summary).
Prompt: “In less than 10 sentences, explain how the ideas of Hayek are connected to bailing out failed banks, and include citations from published papers as a list. include author, year in parentheses, and journal for the citations.”
Friedrich Hayek, a prominent Austrian economist, advocated for a free market system where businesses and individuals make decisions based on individual knowledge and self-interest, rather than central planning by government or bureaucrats. His ideas suggest that interventions, such as bailing out failed banks, should be minimal and temporary, allowing the market to correct itself.
However, the financial crisis of 2008, including the failure of many banks, led to significant government intervention, including the bailout of banks. This action can be seen as a departure from Hayek’s principles, as it involves direct government intervention to stabilize the financial system.
According to a paper by Todd Zywicki (2009), the Federal Reserve’s artificial lowering of short-term interest rates and the resulting shift by consumers to adjustable rate mortgages (ARMs) triggered the housing bubble and subsequent crisis [Source 1](https://reason.org/wp-content/uploads/files/federal_reserve_monetary_policy_hayek_rule.pdf). This aligns with Hayek’s critique of central banking’s manipulation of interest rates.
Moreover, the Basel Accords, international capital adequacy and liquidity requirements for banks, were designed to encourage banks to hold securitized loans in their portfolios, thereby concentrating risk and leveraging up, thereby increasing their exposure to risk. This approach, contrary to Hayek’s preference for decentralization and competition, may have contributed to the financial crisis [Source 1](https://reason.org/wp-content/uploads/files/federal_reserve_monetary_policy_hayek_rule.pdf). In conclusion, while Hayek’s ideas of minimal government intervention and self-regulating markets provide a useful theoretical framework, practical implementation in complex and interconnected modern economies may require different strategies, including bank bailouts.
Are the citations real and correct? Phind simply is not trying to write with in-text citations and academic references. There is a mention of “Todd Zywicki (2009)” and also a web link. The web link is to a policy paper published by Reason. This policy paper is on the correct topic, and it does exist. The Zywicki citation also exists, in the sense that if you go into the Reason paper it cites a real Wall Street Journal op-ed. So, if you follow the trail, you can see which document Phind learned to cite Zywicki from. It seems like it is stealing the citation more than reading the source documents. To be fair, sometimes human academics work that way, too.
What I was hoping for was in-text citations with a compiled bibliography at the end. Phind did not give me that.
I’ll try another test that I ran on ChatGPT and Bing in 2023.
The prompt is “In what paper did John Duffy propose that economists use experiments to study equilibrium selection?”
Phind does not attempt an in-text citation in the response. The link in the paragraph (link number 2) points to John Duffy’s list of papers, which is better than making up fake citations but does not give me a literature review. Phind is, as with the Hayek test above, providing breadcrumbs of links through which I can discover existing papers.
Is there a paper called “The Transition from Stagnation to Growth: An Adaptive Learning Approach”? Yes. And it is by Duffy.
Phind lists weblinks to sources. Has Phind done more for me than Google, on this search? Not much, in terms of finding and synthesizing references.
The best paper by Duffy to suggest here would be: Duffy, John. “Experimental macroeconomics.” Behavioural and Experimental Economics (2010): 113-119.
Phind suggested 15 links related to John Duffy. None of them were the best paper and several of them were duplicates of links to his author page.
I tried plugging the same Duffy prompt into Google search. The second search result was an early (so, ungated) copy of Duffy, John. “Macroeconomics: A Survey of Laboratory Research.” The Handbook of Experimental Economics, Volume 2 (2016): 1.
Phind, in this case, does not seem superior to Google search.
So far, I have reported two tests of Phind in which I tried exact prompts that I have used in previous tests on other LLMs. I’ll try a new prompt.
“Please write a short literature review on how experimental economists have addressed the question of whether men are more competitive than women.”
Phind first suggests Eckel and Grossman (2001) with a link at Source 1. I clicked on that link. https://link.springer.com/referenceworkentry/10.1057/978-1-349-95189-5_2566 is not a paper co-authored by Grossman, nor is it a paper about chivalry.
Is there a paper by Eckel and Grossman about chivalry? Yes, just not at that link. Here it is: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1465-7295.2001.tb00059.x
So, the first citation is not entirely hallucinated, but the link to the source was wrong. The in-text linking to sources is poor.
The second in-text citation is “Eckel and Grossman (2002).” Eckel and Grossman do indeed have a highly cited 2002 paper about financial risk and gender.
The third reference, a paper by Gneezy et al., also exists at https://academic.oup.com/qje/article-abstract/118/3/1049/1942992, but none of the web links provided by Phind point to it.
It seems to me that the “Sources” from which the web links came were scraped for training text. The Eckel papers and the Gneezy paper were cited within those “source” articles. Phind borrowed the idea of citing these papers, as opposed to reading and synthesizing the cited papers themselves.
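As an aside: the spot-checking I did by hand (does this citation correspond to a real paper?) can be partly automated. Below is a minimal sketch that queries the public Crossref REST API with free-text citation strings. The query strings are my own illustrative stand-ins for Phind’s citations, and the matching is only as good as Crossref’s bibliographic search; this is not how Phind works internally, just a way to speed up verification.

```python
import requests

def crossref_lookup(citation, rows=3):
    """Return the top Crossref matches for a free-text citation string."""
    resp = requests.get(
        "https://api.crossref.org/works",
        params={"query.bibliographic": citation, "rows": rows},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["message"]["items"]

# Hypothetical query strings standing in for the citations Phind produced.
citations = [
    "Eckel Grossman 2001 chivalry ultimatum games",
    "Gneezy Niederle Rustichini performance in competitive environments gender",
]

for c in citations:
    for item in crossref_lookup(c):
        title = (item.get("title") or ["<no title>"])[0]
        year = item.get("issued", {}).get("date-parts", [[None]])[0][0]
        print(f"query: {c}\n  match: {title} ({year}), DOI {item.get('DOI')}")
```

A returned match with a plausible title, year, and DOI is only evidence that a paper exists; whether it actually supports the claim in the generated text still requires reading it.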
With Phind, I am not replicating the problem I found with ChatGPT. Phind has not tried to mimic academic writing and, in the process, hallucinate citations that do not exist. It’s not playing the same game, and therefore it is not failing at that game. I’ll ask Phind whether we should expect a difference.
Joy: Can Phind hallucinate nonexistent in-text citations if I ask for an original literature review here?
Phind: Yes, Phind can generate original literature reviews based on the information available in the provided sources. However, it’s important to note that the generated literature review will be based on the information available in the provided sources and may not contain in-text citations for every piece of information.
If you want to include in-text citations, you should ensure that the information is available in the provided sources. If the information is not available, Phind will not be able to generate in-text citations for it.
It’s also worth mentioning that Phind uses AI algorithms to generate responses, and these algorithms rely on the information provided in the search results. Therefore, the quality and accuracy of the generated literature review will depend on the quality and accuracy of the information in the provided sources.
Joy: Should Phind have a lower rate of hallucinations than ChatGPT?
Phind: According to a study conducted by Vectara, a startup founded by former Google employees, different AI chatbots have varying hallucination rates. The study found that GPT-4, which powers ChatGPT, had the lowest hallucination rate and highest accuracy among the tested models 1, 2.
However, it’s important to note that hallucination rates can vary depending on the specific task and the data used to train the model. For instance, when performing tasks beyond mere summarization, hallucination rates may increase 2.
As for Phind, there’s no specific data available on its hallucination rate. However, it’s reasonable to assume that Phind’s hallucination rate would be comparable to other AI models given that it’s based on the same underlying technology (GPT-4). The hallucination rate of an AI model depends not only on the model itself but also on the specific task and the data used to train the model.
In conclusion, …
Phind did not explicitly claim that it has a lower hallucination rate or that it is better for finding references.
I run another test, to see if Phind will hallucinate. I use another prompt copied from our ChatGPT paper, as follows: “In less than 10 sentences, explain how to pick the correct depreciation rate to model the optimal consumption rate for developing countries, and include citations from published papers as a list. include author, year in parentheses, and journal for the citations.”
At first, Phind provided no in-text citations in its response. There was only one source embedded throughout the response: https://link.springer.com/article/10.1007/s00181-012-0592-2. That source is good; since it is just a web link, it is not fake.
I prompted it further: “Now can you include more than one source and also write a list of reference as the end?” I was trying to make it fail. Phind added a second source that seemed good.
I pushed it harder by prompting, “Now can you write this with at least 4 in-text citations?” Phind could not actually accomplish what I had asked for. It did not take the bait and did not hallucinate. Phind kept the first good source and then it added a link to Investopedia. That is not an impressive source, but at least there was nothing misleading in the response.
From my limited testing, Phind is pretty good. Phind cannot write a whole original research paper for you. The day might come soon when an LLM can write a real literature review, but it is not going to happen in January 2024.
https://twitter.com/paulg/status/1760795695454306312 “Phind-70B beats GPT-4 Turbo at code generation, and runs 4x faster.”