#!/usr/bin/env python3
# -*- coding: utf-8 -*-

# paragraphizer.py: reformat a single paragraph into multiple paragraphs using OpenAI API neural nets
# Author: Gwern Branwen
# Date: 2022-02-18
# When:  Time-stamp: "2025-05-15 11:18:31 gwern"
# License: CC-0
#
# Usage: $ OPENAI_API_KEY="sk-XXX" echo [...] | python paragraphizer.py
#
# Paragraphizer attempts to reformat a single run-on paragraph into multiple shorter paragraphs,
# presumably split by topic. This is particularly useful for research paper abstracts, which are
# usually written in a sequential fashion (along the lines of 'Background / Question / Data /
# Methods / Results / Conclusion') but not always formatted in topic-separated paragraphs. A
# jargon-heavy run-on abstract can be near-impossible to skim.
#
# Paragraphizer does this by a call to the OA API; I have found that a simple 'rewrite this as'
# zero-shot prompt works well with davinci-instruct models (and is unreliable with smaller models or
# plain davinci). The main failure mode is that it will not copy the abstract exactly, and may
# reword or expand on parts, which is highly undesirable, and would mean that it cannot be used to
# automatically reformat abstracts. (And if you aren't going to use Paragraphizer automatically, why
# bother? It doesn't take long to add linebreaks by hand.) That failure mode can be removed by
# simply checking that after removing the new newlines, it equals the original input (ie. the *only*
# difference is the inserted newlines). The result can still be bad but it's probably at least
# better.
#
# Example:
#
# $ xclip -o
# Most deep reinforcement learning (RL) algorithms distill experience into parametric behavior
# policies or value functions via gradient updates. While effective, this approach has several
# disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate
# experiences into the parametric model, (3) experiences that are not fully integrated do not
# appropriately influence the agent's behavior, and (4) behavior is limited by the capacity of the
# model. In this paper we explore an alternative paradigm in which we train a network to map a
# dataset of past experiences to optimal behavior. Specifically, we augment an RL agent with a
# retrieval process (parameterized as a neural network) that has direct access to a dataset of
# experiences. This dataset can come from the agent's past experiences, expert demonstrations, or
# any other relevant source. The retrieval process is trained to retrieve information from the
# dataset that may be useful in the current context, to help the agent achieve its goal faster and
# more efficiently. We integrate our method into two different RL agents: an offline DQN agent and
# an online R2D2 agent. In offline multi-task problems, we show that the retrieval-augmented DQN
# agent avoids task interference and learns faster than the baseline DQN agent. On Atari, we show
# that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and
# achieves higher scores. We run extensive ablations to measure the contributions of the components
# of our proposed method.
#
# $ OPENAI_API_KEY="sk-XYZ" xclip -o | python paragraphizer.py
# Most deep [reinforcement learning](https://en.wikipedia.org/wiki/Reinforcement_learning) (RL) algorithms distill experience into parametric behavior policies or value functions via gradient updates. While effective, this approach has several disadvantages: (1) it is computationally expensive, (2) it can take many updates to integrate experiences into the parametric model, (3) experiences that are not fully integrated do not appropriately influence the agent's behavior, and (4) behavior is limited by the capacity of the model.
#
# In this paper, we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior. Specifically, we augment an RL agent with a retrieval process (parameterized as a neural network) that has direct access to a dataset of experiences. This dataset can come from the agent's past experiences, expert demonstrations, or any other relevant source. The retrieval process is trained to retrieve information from the dataset that may be useful in the current context, to help the agent achieve its goal faster and more efficiently.
#
# We integrate our method into two different RL agents: an offline [DQN](https://en.wikipedia.org/wiki/Q-learning#Deep_Q-learning) agent and an online [R2D2](https://openreview.net/forum?id=r1lyTjAqYX) agent. In offline multi-task problems, we show that the retrieval-augmented DQN agent avoids task interference and learns faster than the baseline DQN agent. On [Atari](https://en.wikipedia.org/wiki/Atari), we show that retrieval-augmented R2D2 learns significantly faster than the baseline R2D2 agent and achieves higher scores.
#
# We run extensive ablations to measure the contributions of the components of our proposed method.
#
# $ OPENAI_API_KEY="sk-XYZ" echo "We run extensive ablations to measure the contributions of the components of our proposed method." | paragraphizer.py

import sys
from openai import OpenAI
client = OpenAI()

if len(sys.argv) == 1:
    target = sys.stdin.read().strip()
else:
    target = sys.argv[1]

completion = client.chat.completions.create(
  model="gpt-4.1-mini",
  messages=[
    {"role": "system", "content": "You are a helpful research assistant."},
      {"role": "user", "content":
f"""Task: reformatting abstracts.

Summary: Add linebreaks to a large runon paragraph. As well, add relevant HTML hyperlinks & formatting to text, and add double-newlines to split abstracts into Markdown paragraphs (one topic per paragraph.)

Task description: Please process the following abstract (between the '<abstract>' and '</abstract>' tags), by adding double-newlines to split it into paragraphs (one topic per paragraph.) The order of topics should be: 1. background/introduction; 2. methods/data/approach; 3. results/benchmarks/outputs; 4. conclusion/discussion/implications; 5. supplementary information (eg. URLs, code, websites, datasets).

Additional formatting instructions: convert to American spelling & conventions. Do not add unnecessary italics; but italicize species names as appropriate. If a new term, concept, or system is introduced by this research paper, bold the first appearance using '<strong>NAME</strong>' formatting (and ONLY the first use), and bold only the most important new term. Please also add useful hyperlinks (such as Wikipedia articles) in HTML format to technical terminology or names (but do not hyperlink obvious familiar terms like "University" or "psychology").

Do not duplicate links: include each link ONLY once; include only URLs you are sure of. Please include ONLY the resulting text with hyperlinks in your output, include ALL the original text, and include NO other conversation or comments.

If you cannot make any changes, return the empty string.

Examples:

- <abstract>In this paper, we explore an alternative paradigm in which we train a network to map a dataset of past experiences to optimal behavior.</abstract>
""
- <abstract>Previous theoretical results pertaining to meta-learning on sequences build on contrived assumptions and are somewhat convoluted. We introduce new information-theoretic tools that lead to an elegant and very general decomposition of error into 3 components: irreducible error, meta-learning error, and intra-task error. These tools unify analyses across many meta-learning challenges. To illustrate, we apply them to establish new results about in-context learning with transformers. Our theoretical results characterizes how error decays in both the number of training sequences and sequence lengths. Our results are very general; for example, they avoid contrived mixing time assumptions made by all prior results that establish decay of error with sequence length.</abstract>
Previous theoretical results pertaining to meta-learning on sequences build on contrived assumptions and are somewhat convoluted.
We introduce new information-theoretic tools that lead to an elegant and very general decomposition of error into 3 components: irreducible error, meta-learning error, and intra-task error. These tools unify analyses across many meta-learning challenges.
To illustrate, we apply them to establish new results about in-context learning with transformers. Our theoretical results characterizes how error decays in both the number of training sequences and sequence lengths.
Our results are very general; for example, they avoid contrived mixing time assumptions made by all prior results that establish decay of error with sequence length.
- <abstract>Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AI's compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models. We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry, to also include mathematics, coding, logic and multimodal reasoning asymmetries. We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer. Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed. Previous work assigned debaters/consultants an answer to argue for. When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies.</abstract>
Scalable oversight protocols aim to enable humans to accurately supervise superhuman AI. In this paper we study debate, where two AI's compete to convince a judge; consultancy, where a single AI tries to convince a judge that asks questions; and compare to a baseline of direct question-answering, where the judge just answers outright without the AI. We use large language models (LLMs) as both AI agents and as stand-ins for human judges, taking the judge models to be weaker than agent models.
We benchmark on a diverse range of asymmetries between judges and agents, extending previous work on a single extractive QA task with information asymmetry, to also include mathematics, coding, logic and multimodal reasoning asymmetries.
We find that debate outperforms consultancy across all tasks when the consultant is randomly assigned to argue for the correct/incorrect answer.
Comparing debate to direct question answering, the results depend on the type of task: in extractive QA tasks with information asymmetry debate outperforms direct question answering, but in other tasks without information asymmetry the results are mixed.
Previous work assigned debaters/consultants an answer to argue for.
When we allow them to instead choose which answer to argue for, we find judges are less frequently convinced by the wrong answer in debate than in consultancy. Further, we find that stronger debater models increase judge accuracy, though more modestly than in previous studies.
- <abstract>If an individual entity endures a fixed probability μ &lt;1 of disappearing (“dying”) in a given fixed time period, then, as time approaches infinity, the probability of death approaches certainty. One approach to avoid this fate is for individuals to copy themselves into different locations; if the copies each have an independent probability of dying, then the total risk is much reduced. However, to avoid the same ultimate fate, the entity must continue copying itself to continually reduce the risk of death. In this paper, we show that to get a non-zero probability of ultimate survival, it suffices that the number of copies grows logarithmically with time. Accounting for expected copy casualties, the required rate of copying is hence bounded.</abstract>
If an individual entity endures a fixed probability μ &lt;1 of disappearing (“dying”) in a given fixed time period, then, as time approaches infinity, the probability of death approaches certainty.
One approach to avoid this fate is for individuals to copy themselves into different locations; if the copies each have an independent probability of dying, then the total risk is much reduced. However, to avoid the same ultimate fate, the entity must continue copying itself to continually reduce the risk of death.
In this paper, we show that to get a non-zero probability of ultimate survival, it suffices that the number of copies grows logarithmically with time. Accounting for expected copy casualties, the required rate of copying is hence bounded.
- <abstract>We present a framework for translating unlabeled images from one domain into analog images in another domain. We employ a progressively growing skip-connected encoder-generator structure and train it with a <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a> loss for realistic output, a cycle consistency loss for maintaining same-domain translation identity, and a semantic consistency loss that encourages the network to keep the input semantic features in the output. We apply our framework on the task of translating face images, and show that it is capable of learning semantic mappings for face images with no supervised one-to-one image mapping.</abstract>
We present a framework for translating unlabeled images from one domain into analog images in another domain.
We employ a progressively growing skip-connected encoder-generator structure and train it with a <a href="https://en.wikipedia.org/wiki/Generative_adversarial_network">GAN</a> loss for realistic output, a cycle consistency loss for maintaining same-domain translation identity, and a semantic consistency loss that encourages the network to keep the input semantic features in the output.
We apply our framework on the task of translating face images, and show that it is capable of learning semantic mappings for face images with no supervised one-to-one image mapping.
- <abstract>We introduce a new resource: the SAYCam corpus. Infants aged 6–32 months wore a head-mounted camera for ~2 hours per week, over the course of ~two and a half years. The result is a large, naturalistic, longitudinal dataset of infant-perspective and child-perspective videos. Transcription efforts are underway, with over 200,000 words of naturalistic dialogue already transcribed. Similarly, the dataset is searchable using a number of criteria (eg. age of participant, location, setting, objects present). The resulting dataset will be of broad use to psychologists, linguists, and computer scientists.</abstract>
We introduce a new resource: the SAYCam corpus.
Infants aged 6–32 months wore a head-mounted camera for ~2 hours per week, over the course of ~two and a half years.
The result is a large, naturalistic, longitudinal dataset of infant-perspective and child-perspective videos. Transcription efforts are underway, with over 200,000 words of naturalistic dialogue already transcribed. Similarly, the dataset is searchable using a number of criteria (eg. age of participant, location, setting, objects present).
The resulting dataset will be of broad use to psychologists, linguists, and computer scientists.
- <abstract>Subreddit devoted to discussion of <a href="https://en.wikipedia.org/wiki/Reinforcement_learning">reinforcement learning</a> research and projects, particularly deep reinforcement learning (more specialized than <code>/r/MachineLearning</code>). Major themes include deep learning, model-based vs model-free RL, robotics, multi-agent RL, exploration, meta-reinforcement learning, imitation learning, the psychology of RL in biological organisms such as humans, and safety/AI risk. Moderate activity level (as of 2019-09-11): ~10k subscribers, 2k pageviews/daily</abstract>
Subreddit devoted to discussion of <a href="https://en.wikipedia.org/wiki/Reinforcement_learning">reinforcement learning</a> research and projects, particularly deep reinforcement learning (more specialized than <code>/r/MachineLearning</code>).
Major themes include deep learning, model-based vs model-free RL, robotics, multi-agent RL, exploration, meta-reinforcement learning, imitation learning, the psychology of RL in biological organisms such as humans, and safety/AI risk.
Moderate activity level (as of 2019-09-11): ~10k subscribers, 2k pageviews/daily
- <abstract>Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences, as the standard self-attention mechanism of the <a href="https://arxiv.org/abs/1706.03762#google" title="‘Attention Is All You Need’, Vaswani et al 2017">Transformer</a> uses <em>𝒪(n<sup>2</sup>)</em> time and space with respect to sequence length. In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism, which reduces the overall self-attention complexity from <em>𝒪(n<sup>2</sup>)</em> to <em>𝒪(n)</em> in both time and space. The resulting linear transformer, the <strong>Linformer</strong>, performs on par with standard Transformer models, while being much more memory-efficient and time-efficient.</abstract>
Large transformer models have shown extraordinary success in achieving state-of-the-art results in many natural language processing applications. However, training and deploying these models can be prohibitively costly for long sequences, as the standard self-attention mechanism of the <a href="https://arxiv.org/abs/1706.03762#google" title="‘Attention Is All You Need’, Vaswani et al 2017">Transformer</a> uses <em>𝒪(n<sup>2</sup>)</em> time and space with respect to sequence length.
In this paper, we demonstrate that the self-attention mechanism can be approximated by a low-rank matrix. We further exploit this finding to propose a new self-attention mechanism, which reduces the overall self-attention complexity from <em>𝒪(n<sup>2</sup>)</em> to <em>𝒪(n)</em> in both time and space.
The resulting linear transformer, the <strong>Linformer</strong>, performs on par with standard Transformer models, while being much more memory-efficient and time-efficient.
- <abstract>[Book review of an anthropologist text arguing for imitation and extensive cultural <a href="https://en.wikipedia.org/wiki/Group_selection">group selection</a> as the driving force of human civilization, with imitation of other humans being the unique human cognitive skill that gave us the edge over other primates and all animals, with any kind of raw intelligence being strictly minor. Further this extensive multi-level group selectionism implies that most knowledge is embodied in apparently-arbitrary cultural practices, such as traditional food preparation or divination or hunting rituals, which are effective despite lacking any observable rationale and the actual reasons for their efficacy are inaccessible to mere reason (except possibly by a far more advanced science).]</abstract>
Book review of an anthropologist text arguing for imitation and extensive cultural <a href="https://en.wikipedia.org/wiki/Group_selection">group selection</a> as the driving force of human civilization.
Imitation of other humans is proposed as the unique human cognitive skill that gave us the edge over other primates and all animals, with any kind of raw intelligence being strictly minor.
Further, this extensive multi-level group selectionism implies that most knowledge is embodied in apparently-arbitrary cultural practices, such as traditional food preparation or divination or hunting rituals, which are effective despite lacking any observable rationale and the actual reasons for their efficacy are inaccessible to mere reason (except possibly by a far more advanced science).
- <abstract>Technologies to measure gaze direction and pupil reactivity have become efficient, cheap, and compact and are finding increasing use in many fields, including gaming, marketing, driver safety, military, and healthcare. Besides offering numerous useful applications, the rapidly expanding technology raises serious privacy concerns. Through the lens of advanced data analytics, gaze patterns can reveal much more information than a user wishes and expects to give away. Drawing from a broad range of scientific disciplines, this paper provides a structured overview of personal data that can be inferred from recorded eye activities. Our analysis of the literature shows that eye tracking data may implicitly contain information about a user’s biometric identity, gender, age, ethnicity, body weight, personality traits, drug consumption habits, emotional state, skills and abilities, fears, interests, and sexual preferences. Certain eye tracking measures may even reveal specific cognitive processes and can be used to diagnose various physical and mental health conditions. By portraying the richness and sensitivity of gaze data, this paper provides an important basis for consumer education, privacy impact assessments, and further research into the societal implications of eye tracking.</abstract>
Technologies to measure gaze direction and pupil reactivity have become efficient, cheap, and compact and are finding increasing use in many fields, including gaming, marketing, driver safety, military, and healthcare. Besides offering numerous useful applications, the rapidly expanding technology raises serious privacy concerns. Through the lens of advanced data analytics, gaze patterns can reveal much more information than a user wishes and expects to give away.
Drawing from a broad range of scientific disciplines, this paper provides a structured overview of personal data that can be inferred from recorded eye activities. Our analysis of the literature shows that eye tracking data may implicitly contain information about a user’s biometric identity, gender, age, ethnicity, body weight, personality traits, drug consumption habits, emotional state, skills and abilities, fears, interests, and sexual preferences. Certain eye tracking measures may even reveal specific cognitive processes and can be used to diagnose various physical and mental health conditions.
By portraying the richness and sensitivity of gaze data, this paper provides an important basis for consumer education, privacy impact assessments, and further research into the societal implications of eye tracking.
- <abstract>There are at least three strategies we might take in approaching controversial issues: (1) we might accept the conclusions of experts on their authority, (2) we might evaluate the relevant evidence and arguments for ourselves, or (3) we might give up on finding the answers. Students of “critical thinking” are regularly advised to follow strategy (2). But strategies (1) and (3) are usually superior to (2), from the standpoint of the goal of gaining true beliefs and avoiding false ones.</abstract>
There are at least three strategies we might take in approaching controversial issues: (1) we might accept the conclusions of experts on their authority, (2) we might evaluate the relevant evidence and arguments for ourselves, or (3) we might give up on finding the answers.
Students of “critical thinking” are regularly advised to follow strategy (2).
But strategies (1) and (3) are usually superior to (2), from the standpoint of the goal of gaining true beliefs and avoiding false ones.
- <abstract>The 1968 publication of the Rosenthal & Jacobson’s <em>Pygmalion in the Classroom</em> offered the optimistic message that raising teachers’ expectations of their pupils’ potentials would raise their pupils’ intelligence. This claim was, and still is, endorsed by many psychologists and educators. The original study, along with the scores of attempted replications and the acrimonious controversy that followed it, is reviewed, and its consequences discussed.</abstract>
The 1968 publication of the Rosenthal & Jacobson’s <em>Pygmalion in the Classroom</em> offered the optimistic message that raising teachers’ expectations of their pupils’ potentials would raise their pupils’ intelligence. This claim was, and still is, endorsed by many psychologists and educators.
The original study, along with the scores of attempted replications and the acrimonious controversy that followed it, is reviewed, and its consequences discussed.
- <abstract>Degenerative changes must have a basic cause on the molecular level. For example, the possible role of protein immobilization by means of progressive cross-linking reactions is critically examined in the light of known data on potential cross-linking agents present in the bloodstream, and of related physiologic facts.</abstract>
Degenerative changes must have a basic cause on the molecular level.
For example, the possible role of protein immobilization by means of progressive cross-linking reactions is critically examined in the light of known data on potential cross-linking agents present in the bloodstream, and of related physiologic facts.
- <abstract>Subscription page for the monthly Gwern.net newsletter. There are monthly updates, which will include summaries of projects I’ve worked on that month (the same as the <a href="/changelog" class="id-not">changelog</a>), collations of links or discussions from my subreddit, and book/movie reviews. You can also browse <a href="/doc/newsletter/index">the archives since December 2013</a>.</abstract>
Subscription page for the monthly Gwern.net newsletter.
There are monthly updates, which will include summaries of projects I’ve worked on that month (the same as the <a href="/changelog" class="id-not">changelog</a>), collations of links or discussions from my subreddit, and book/movie reviews.
You can also browse <a href="/doc/newsletter/index">the archives since December 2013</a>.
- <abstract>[Responds to I. Stevenson’s (1981) criticism of the author’s (see record 1981–25195–001) discussion of life after death. The author argues that he does not consider himself an expert on survival of the human personality after death and he defends his choice of reference materials.]</abstract>
[Responds to I. Stevenson’s (1981) criticism of the author’s (see record 1981–25195–001) discussion of life after death.
The author argues that he does not consider himself an expert on survival of the human personality after death and he defends his choice of reference materials.]
- <abstract>Many uncertainties surround the practice of meditation. Scientific research on meditation practices does not appear to have a common theoretical perspective and is characterized by poor methodological quality. Firm conclusions on the effects of meditation practices in healthcare cannot be drawn based on the available evidence. Future research on meditation practices must be more rigorous in the design and execution of studies and in the analysis and reporting of results.</abstract>
Many uncertainties surround the practice of meditation.
Scientific research on meditation practices does not appear to have a common theoretical perspective and is characterized by poor methodological quality. Firm conclusions on the effects of meditation practices in healthcare cannot be drawn based on the available evidence.
Future research on meditation practices must be more rigorous in the design and execution of studies and in the analysis and reporting of results.
- <abstract>[Survey of naval personnel at a shipyard and all attached vessels, with examination of psychiatry referrals. The results indicate that formal records on psychiatric casualties from submarine patrols grossly underestimate the true rate of psychiatric issues among submarine crew, with a more plausible rate of ~3.8%, despite intensive screening.]</abstract>
[Survey of naval personnel at a shipyard and all attached vessels, with examination of psychiatry referrals.
The results indicate that formal records on psychiatric casualties from submarine patrols grossly underestimate the true rate of psychiatric issues among submarine crew, with a more plausible rate of ~3.8%, despite intensive screening.]
- <abstract>Virtual reality users wearing head-mounted displays can experience the illusion of walking in any direction for infinite distance while, in reality, they are walking a curvilinear path in physical space. This is accomplished by introducing unnoticeable rotations to the virtual environment—a technique called <em>redirected walking</em>. This paper gives an overview of the research that has been performed since redirected walking was first practically demonstrated 15 years ago.</abstract>
Virtual reality users wearing head-mounted displays can experience the illusion of walking in any direction for infinite distance while, in reality, they are walking a curvilinear path in physical space. This is accomplished by introducing unnoticeable rotations to the virtual environment—a technique called <em>redirected walking</em>.
This paper gives an overview of the research that has been performed since redirected walking was first practically demonstrated 15 years ago.
- <abstract>Decisions as to whether to cut off a losing enterprise (clouded by what already has been invested in the venture) may be facilitated by a new model proposed here—the life cycle model. The model, borrowing an accounting measure (the time adjusted rate of return) to describe the effect of “sunk costs” on the expected rate of return for future costs in a project, is used to examine the relevance of negative feedback to the decision to commit further resources to completion of a project.</abstract>
Decisions as to whether to cut off a losing enterprise (clouded by what already has been invested in the venture) may be facilitated by a new model proposed here—the life cycle model.
The model, borrowing an accounting measure (the time adjusted rate of return) to describe the effect of “sunk costs” on the expected rate of return for future costs in a project, is used to examine the relevance of negative feedback to the decision to commit further resources to completion of a project.
- <abstract>An empirical law for the rank-order behavior of journal impact factors is found. Using an extensive data base on impact factors including journals on Education, Agrosciences, Geosciences, Biosciences and Environmental, Chemical, Computer, Engineering, Material, Mathematical, Medical and Physical Sciences we have found extremely good fits out—performing other rank-order models. Some extensions to other areas of knowledge are discussed.</abstract>
An empirical law for the rank-order behavior of journal impact factors is found.
Using an extensive data base on impact factors including journals on Education, Agrosciences, Geosciences, Biosciences and Environmental, Chemical, Computer, Engineering, Material, Mathematical, Medical and Physical Sciences we have found extremely good fits out—performing other rank-order models.
Some extensions to other areas of knowledge are discussed.
- <abstract>The model evidence is a vital quantity in the comparison of statistical models under the Bayesian paradigm. This paper presents a review of commonly used methods. We outline some guidelines and offer some practical advice. The reviewed methods are compared for two examples: non-nested Gaussian linear regression and covariate subset selection in <a href="https://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a>.</abstract>
The model evidence is a vital quantity in the comparison of statistical models under the Bayesian paradigm.
This paper presents a review of commonly used methods. We outline some guidelines and offer some practical advice.
The reviewed methods are compared for two examples: non-nested Gaussian linear regression and covariate subset selection in <a href="https://en.wikipedia.org/wiki/Logistic_regression">logistic regression</a>.
- <abstract>Statisticians have been keen to critique statistical aspects of the “replication crisis” in other scientific disciplines. But new statistical tools are often published and promoted without any thought to replicability. This needs to change, argue Anne-Laure Boulesteix, Sabine Hoffmann, Alethea Charlton and Heidi Seibold.</abstract>
Statisticians have been keen to critique statistical aspects of the “replication crisis” in other scientific disciplines. But new statistical tools are often published and promoted without any thought to replicability.
This needs to change, argue Anne-Laure Boulesteix, Sabine Hoffmann, Alethea Charlton and Heidi Seibold.
- <abstract>We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multilayer feed-forward networks. We argue, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning.</abstract>
We present experiments demonstrating that some other form of capacity control, different from network size, plays a central role in learning multilayer feed-forward networks.
We argue, partially through analogy to matrix factorization, that this is an inductive bias that can help shed light on deep learning.
- <abstract>Fascinated by <a href="https://en.wikipedia.org/wiki/VTuber">virtual YouTubers</a>, I put together a deep neural network system that makes becoming one much easier. More specifically, the network takes as input an image of an anime character’s face and a desired pose, and it outputs another image of the same character in the given pose.</abstract>
Fascinated by <a href="https://en.wikipedia.org/wiki/VTuber">virtual YouTubers</a>, I put together a deep neural network system that makes becoming one much easier.
More specifically, the network takes as input an image of an anime character’s face and a desired pose, and it outputs another image of the same character in the given pose.
- <abstract>Language modeling is the task of predicting the next word or character in a document. This page lists key recent papers on NLP language modeling and records reported research performance on the following tasks: <a href="https://arxiv.org/abs/1609.07843" title="‘Pointer Sentinel Mixture Models’, Merity et al 2016">WikiText-103</a>, Penn Treebank (Word Level), enwiki8, Text8, One Billion Word, WikiText-2, Hutter Prize, Penn Treebank (Character Level)</abstract>
Language modeling is the task of predicting the next word or character in a document.
This page lists key recent papers on NLP language modeling and records reported research performance on the following tasks: <a href="https://arxiv.org/abs/1609.07843" title="‘Pointer Sentinel Mixture Models’, Merity et al 2016">WikiText-103</a>, Penn Treebank (Word Level), enwiki8, Text8, One Billion Word, WikiText-2, Hutter Prize, Penn Treebank (Character Level)
- <abstract>When the existence of unicorns, and the curative powers of the horns ascribed to them, began to be questioned, one Danish physician pushed back through curious means—by reframing the unicorn as an aquatic creature of the northern seas. Natalie Lawrence on a fascinating convergence of established folklore, nascent science, and pharmaceutical economy.</abstract>
When the existence of unicorns, and the curative powers of the horns ascribed to them, began to be questioned, one Danish physician pushed back through curious means—by reframing the unicorn as an aquatic creature of the northern seas.
Natalie Lawrence on a fascinating convergence of established folklore, nascent science, and pharmaceutical economy.
- <abstract>We found that the adverse effect of neighbourhood deprivation on adolescent violent criminality and substance misuse in Sweden was not consistent with a causal inference. Instead, our findings highlight the need to control for familial <a href="https://en.wikipedia.org/wiki/Confounding">confounding</a> in multilevel studies of criminality and substance misuse.</abstract>
We found that the adverse effect of neighbourhood deprivation on adolescent violent criminality and substance misuse in Sweden was not consistent with a causal inference.
Instead, our findings highlight the need to control for familial <a href="https://en.wikipedia.org/wiki/Confounding">confounding</a> in multilevel studies of criminality and substance misuse.
- <abstract>In the late winter of 2003, a number of livestock animals in the Midwest were poisoned due the accidental contamination of a popular commercial feed with a lethal additive. Although all the evidence indicates this incident had no malicious or terrorist intent, it is informative as a case study highlighting potential security implications with respect to a terrorist event directed at US agriculture.</abstract>
In the late winter of 2003, a number of livestock animals in the Midwest were poisoned due the accidental contamination of a popular commercial feed with a lethal additive.
Although all the evidence indicates this incident had no malicious or terrorist intent, it is informative as a case study highlighting potential security implications with respect to a terrorist event directed at US agriculture.
- <abstract>[Classic longform essay by SF author <a href="https://en.wikipedia.org/wiki/Neal_Stephenson">Neal Stephenson</a> in which he travels the world tracing the (surprisingly few) transcontinental fiber optic cables which bind the world together and power the Internet. Cables combine cutting-edge technology, deep sea challenges, high finance, and global geo-politics/espionage all in one tiny package.]</abstract>
[Classic longform essay by SF author <a href="https://en.wikipedia.org/wiki/Neal_Stephenson">Neal Stephenson</a> in which he travels the world tracing the (surprisingly few) transcontinental fiber optic cables which bind the world together and power the Internet.
Cables combine cutting-edge technology, deep sea challenges, high finance, and global geo-politics/espionage all in one tiny package.]
- <abstract>It’s getting harder for new people to join our projects. Newbies are making up a smaller percentage of editors overall than ever before, and the absolute number of newbies is dropping as well. Wikimedia needs to attract and retain more new and diverse editors, and to retain our experienced editors. A stable editing community is critical to the long-term sustainability and quality of both our current projects and our movement. We consider meeting this challenge our top priority.</abstract>
It’s getting harder for new people to join our projects. Newbies are making up a smaller percentage of editors overall than ever before, and the absolute number of newbies is dropping as well. Wikimedia needs to attract and retain more new and diverse editors, and to retain our experienced editors.
A stable editing community is critical to the long-term sustainability and quality of both our current projects and our movement. We consider meeting this challenge our top priority.
- <abstract>[Originally the draft chapter of the <a href="https://en.wikipedia.org/wiki/Sparkline">sparkline</a> (“Intense, Simple, Word-Sized Graphics”) chapter of <a href="https://en.wikipedia.org/wiki/Edward_Tufte">Edward Tufte’s</a> <em>Beautiful Evidence</em> (2005). This page is a compilation of sparkline examples, links to sparkline software tools, and debates over how best to use sparklines to graph statistical data.]</abstract>
[Originally the draft chapter of the <a href="https://en.wikipedia.org/wiki/Sparkline">sparkline</a> (“Intense, Simple, Word-Sized Graphics”) chapter of <a href="https://en.wikipedia.org/wiki/Edward_Tufte">Edward Tufte’s</a> <em>Beautiful Evidence</em> (2005).
This page is a compilation of sparkline examples, links to sparkline software tools, and debates over how best to use sparklines to graph statistical data.]
- <abstract>[Discussion with screenshots of the classic <a href="!W">Ridley Scott</a> SF movie <a href="!W"><em>Blade Runner</em></a>, which employs typography extensively. It disconcerts the viewer, with unexpected choices, random capitalization and small caps, corporate branding/advertising, and the mashed-up creole multilingual landscape of noir cyberpunk LA (plus discussion of the buildings and sets, and details such as call costs being correctly inflation-adjusted).]</abstract>
[Discussion with screenshots of the classic <a href="!W">Ridley Scott</a> SF movie <a href="!W"><em>Blade Runner</em></a>, which employs typography extensively.
It disconcerts the viewer, with unexpected choices, random capitalization and small caps, corporate branding/advertising, and the mashed-up creole multilingual landscape of noir cyberpunk LA (plus discussion of the buildings and sets, and details such as call costs being correctly inflation-adjusted).]
- <abstract>[Gallery of a Japanese restaurant, Issho, which has been redesigned by the minimalist design firm Dutchscot. The design emphasises <em>kintsugi</em>, irregular gold stripes used to repair pottery, white/red/blue, and traditional Japanese cloud motifs.]</abstract>
[Gallery of a Japanese restaurant, Issho, which has been redesigned by the minimalist design firm Dutchscot.
The design emphasises <em>kintsugi</em>, irregular gold stripes used to repair pottery, white/red/blue, and traditional Japanese cloud motifs.]
- <abstract><em>Large-scale uncitedness</em> refers to the remarkable proportion of articles that do not receive a single citation within 5 years of publication. Equally remarkable is the brief and troubled history of this area of inquiry, which was prone to miscalculation, misinterpretation, and politicization. This article reassesses large-scale uncitedness as both a general phenomenon in the scholarly communication system and a case study of library and information science, where its rate is 72%.</abstract>
<em>Large-scale uncitedness</em> refers to the remarkable proportion of articles that do not receive a single citation within 5 years of publication. Equally remarkable is the brief and troubled history of this area of inquiry, which was prone to miscalculation, misinterpretation, and politicization.
This article reassesses large-scale uncitedness as both a general phenomenon in the scholarly communication system and a case study of library and information science, where its rate is 72%.
- <abstract>Welcome to SnowCrystals.com! Your online guide to snowflakes, snow crystals, and other ice phenomena. SnowCrystals.com has been bringing you snowflake photos and facts since February 1, 1999. Over 26 million visitors so far! [Photos / books / science; designer snowflakes, how to grow snowflakes, “identical-twin” snowflakes etc]</abstract>
Welcome to SnowCrystals.com! Your online guide to snowflakes, snow crystals, and other ice phenomena.
SnowCrystals.com has been bringing you snowflake photos and facts since February 1, 1999. Over 26 million visitors so far!
[Photos / books / science; designer snowflakes, how to grow snowflakes, “identical-twin” snowflakes etc]
- <abstract>Blog of Jose Luis Ricon (<a href="https://x.com/ArtirKel">Twitter</a>), machine learning engineer. Ricon blogs primarily about economics and progress studies, mixing link compilations with more researched essays such as about the economic (in)efficiency of the USSR, or the extent to which tutoring &amp; “direct instruction” boost educational achievement.</abstract>
Blog of Jose Luis Ricon (<a href="https://x.com/ArtirKel">Twitter</a>), machine learning engineer.
Ricon blogs primarily about economics and progress studies, mixing link compilations with more researched essays such as about the economic (in)efficiency of the USSR, or the extent to which tutoring &amp; “direct instruction” boost educational achievement.
- <abstract>[Rebuttal letter: the gravitostat is supported by hypergravity; astronaut microgravity experiments are only weak counterevidence because microgravity and space travel badly damages health in many ways, hiding any potential weight gain. The gravitostat may fit in the two-systems model of weight, in which case a testable prediction is that it should have different effects in rodents with different weight/leptin combinations.]</abstract>
[Rebuttal letter: the gravitostat is supported by hypergravity; astronaut microgravity experiments are only weak counterevidence because microgravity and space travel badly damages health in many ways, hiding any potential weight gain.
The gravitostat may fit in the two-systems model of weight, in which case a testable prediction is that it should have different effects in rodents with different weight/leptin combinations.]
- <abstract>[Official Instagram account of Nathan W. Pyle’s popular webcomic <em>Strange Planet</em>, which recounts in a deadpan manner ordinary human activities as conducted by literal-minded aliens (which <a href="https://en.wikipedia.org/wiki/Defamiliarization">defamiliarizes</a> them). Pyle does not appear to have a webcomic website for <em>Strange Planet</em>, and the Instagram account to be his primary form of releasing SP comics.]</abstract>
[Official Instagram account of Nathan W. Pyle’s popular webcomic <em>Strange Planet</em>, which recounts in a deadpan manner ordinary human activities as conducted by literal-minded aliens (which <a href="https://en.wikipedia.org/wiki/Defamiliarization">defamiliarizes</a> them).
Pyle does not appear to have a webcomic website for <em>Strange Planet</em>, and the Instagram account to be his primary form of releasing SP comics.]
- <abstract>Humour in science assumes many forms and shapes. It appear as hoaxes and spoofs; individuals and groups of scientists edit special satirical and humorous journals; anthologies and books on humour in science are published. All these find their representation in this review, which contains also many examples of gamesmanship in science, obscurantism and puns that contribute to the lighter side of science.</abstract>
Humour in science assumes many forms and shapes. It appear as hoaxes and spoofs; individuals and groups of scientists edit special satirical and humorous journals; anthologies and books on humour in science are published.
All these find their representation in this review, which contains also many examples of gamesmanship in science, obscurantism and puns that contribute to the lighter side of science.
- <abstract>Statistical methodology has played a key role in scientific animal breeding. ~100 years of statistical developments in animal breeding are reviewed. Some of the scientific foundations of the field are discussed, and many milestones are examined from historical and critical perspectives.
The review concludes with a discussion of some future challenges and opportunities arising from the massive amount of data generated by livestock, plant, and human genome projects.</abstract>
Statistical methodology has played a key role in scientific animal breeding.
~100 years of statistical developments in animal breeding are reviewed. Some of the scientific foundations of the field are discussed, and many milestones are examined from historical and critical perspectives.
The review concludes with a discussion of some future challenges and opportunities arising from the massive amount of data generated by livestock, plant, and human genome projects.
- <abstract>The full data concerning the history of attenuated poliovirus strains developed by one of us (Sabin 1965) for vaccine production do not appear in a single journal. Over the past few years we have had frequent requests for the details such as isolation and attenuation and accordingly we felt that bringing the data together in the report below would be both helpful and informative to those involved in the production and control of poliovirus vaccine (oral) prepared from these strains.</abstract>
The full data concerning the history of attenuated poliovirus strains developed by one of us (Sabin 1965) for vaccine production do not appear in a single journal.
Over the past few years we have had frequent requests for the details such as isolation and attenuation and accordingly we felt that bringing the data together in the report below would be both helpful and informative to those involved in the production and control of poliovirus vaccine (oral) prepared from these strains.
- <abstract>In spite of all its announced advantages, the implementation of mastery learning instruction often falls short of theoretical expectations. As discussed under the four major characteristics of mastery learning [systematic design of instruction/instructional correctives/ample time to learn/clear criterion of mastery], these implementation weaknesses pose serious problems for unsuspecting students, teachers, and instructional designers alike.</abstract>
In spite of all its announced advantages, the implementation of mastery learning instruction often falls short of theoretical expectations.
As discussed under the four major characteristics of mastery learning [systematic design of instruction/instructional correctives/ample time to learn/clear criterion of mastery], these implementation weaknesses pose serious problems for unsuspecting students, teachers, and instructional designers alike.
- <abstract><a href="!W">Hypertext</a> databases can be produced by converting existing text documents to electronic form. The basic task in conversion is identification of fragments. We illustrate that this is not always a straightforward process with an analysis of the <a href="!W">Oxford English Dictionary</a>.</abstract>
<a href="!W">Hypertext</a> databases can be produced by converting existing text documents to electronic form. The basic task in conversion is identification of fragments.
We illustrate that this is not always a straightforward process with an analysis of the <a href="!W">Oxford English Dictionary</a>.
- <abstract>We show how eye-tracking corpora can be used to improve sentence compression models, presenting a novel multi-task learning algorithm based on multi-layer LSTMs. We obtain performance competitive with or better than state-of-the-art approaches.</abstract>
We show how eye-tracking corpora can be used to improve sentence compression models, presenting a novel multi-task learning algorithm based on multi-layer LSTMs.
We obtain performance competitive with or better than state-of-the-art approaches.
- <abstract>In recent years, a number of prominent computer scientists, along with academics in fields such as philosophy and physics, have lent credence to the notion that machines may one day become as large as humans. Many have further argued that machines could even come to exceed human size by a large margin. However, there are at least seven distinct arguments that preclude this outcome. We show that it is not only implausible that machines will ever exceed human size, but in fact impossible.</abstract>
In recent years, a number of prominent computer scientists, along with academics in fields such as philosophy and physics, have lent credence to the notion that machines may one day become as large as humans. Many have further argued that machines could even come to exceed human size by a large margin.
However, there are at least seven distinct arguments that preclude this outcome. We show that it is not only implausible that machines will ever exceed human size, but in fact impossible.
- <abstract>We discuss the idea that computers might soon help mathematicians to prove theorems in areas where they have not previously been useful. Furthermore we argue that these same computer tools will also help us in the communication and teaching of mathematics.</abstract>
We discuss the idea that computers might soon help mathematicians to prove theorems in areas where they have not previously been useful.
Furthermore we argue that these same computer tools will also help us in the communication and teaching of mathematics.
- <abstract>We compare the impact of hardware advancement and algorithm advancement for <a href="https://en.wikipedia.org/wiki/Boolean_satisfiability_problem#Algorithms_for_solving_SAT" >SAT solving</a> over the last two decades. In particular, we compare 20-year-old SAT-solvers on new computer hardware with modern SAT-solvers on 20-year-old hardware. Our findings show that the progress on the algorithmic side has at least as much impact as the progress on the hardware side.</abstract>
We compare the impact of hardware advancement and algorithm advancement for <a href="https://en.wikipedia.org/wiki/Boolean_satisfiability_problem#Algorithms_for_solving_SAT" >SAT solving</a> over the last two decades. In particular, we compare 20-year-old SAT-solvers on new computer hardware with modern SAT-solvers on 20-year-old hardware.
Our findings show that the progress on the algorithmic side has at least as much impact as the progress on the hardware side.
- <abstract>This review introduces methods of analyzing data arising from studies where the response variable is the length of time taken to reach a certain end-point, often death. The <a href="https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator" >Kaplan-Meier</a> methods, log rank test and Cox’s proportional hazards model are described.</abstract>
This review introduces methods of analyzing data arising from studies where the response variable is the length of time taken to reach a certain end-point, often death.
The <a href="https://en.wikipedia.org/wiki/Kaplan%E2%80%93Meier_estimator" >Kaplan-Meier</a> methods, log rank test and Cox’s proportional hazards model are described.
- <abstract>6⁄12 men wintering at an isolated Antarctic base sequentially developed symptoms and signs of a <a href="!W">common cold</a> after 17 weeks of complete isolation. Examination of specimens taken from the men in relation to the outbreak has not revealed a causative agent.</abstract>
6⁄12 men wintering at an isolated Antarctic base sequentially developed symptoms and signs of a <a href="!W">common cold</a> after 17 weeks of complete isolation.
Examination of specimens taken from the men in relation to the outbreak has not revealed a causative agent.
- <abstract>A resolution of the St Petersburg paradox is presented. In contrast to the standard resolution, utility is not required. Instead, the time-average performance of the lottery is computed. The final result can be phrased mathematically identically to Daniel Bernoulli’s resolution, which uses logarithmic utility, but is derived using a conceptually different argument. The advantage of the time resolution is the elimination of arbitrary utility functions.</abstract>
A resolution of the St Petersburg paradox is presented.
In contrast to the standard resolution, utility is not required. Instead, the time-average performance of the lottery is computed. The final result can be phrased mathematically identically to Daniel Bernoulli’s resolution, which uses logarithmic utility, but is derived using a conceptually different argument.
The advantage of the time resolution is the elimination of arbitrary utility functions.
- <abstract>Deep metric learning papers from the past four years have consistently claimed great advances in accuracy, often more than doubling the performance of decade-old methods. In this paper, we take a closer look at the field to see if this is actually true. We find flaws in the experimental methodology of numerous metric learning papers, and show that the actual improvements over time have been marginal at best.</abstract>
Deep metric learning papers from the past four years have consistently claimed great advances in accuracy, often more than doubling the performance of decade-old methods.
In this paper, we take a closer look at the field to see if this is actually true.
We find flaws in the experimental methodology of numerous metric learning papers, and show that the actual improvements over time have been marginal at best.
- <abstract>Electron-electron interactions and detector bandwidth limit the maximal imaging speed of single-beam scanning electron microscopes. We use multiple electron beams in a single column and detect secondary electrons in parallel to increase the imaging speed by close to two orders of magnitude and demonstrate imaging for a variety of samples ranging from biological brain tissue to semiconductor wafers.</abstract>
Electron-electron interactions and detector bandwidth limit the maximal imaging speed of single-beam scanning electron microscopes.
We use multiple electron beams in a single column and detect secondary electrons in parallel to increase the imaging speed by close to two orders of magnitude and demonstrate imaging for a variety of samples ranging from biological brain tissue to semiconductor wafers.
- <abstract>An emerging body of data suggests that pluripotent stem cells may be able to differentiate to form eggs and sperm. We discuss the state of the science and the potential social implications and offer recommendations for addressing some of the ethical and policy issues that would be raised by the availability of stem cell-derived gametes.</abstract>
An emerging body of data suggests that pluripotent stem cells may be able to differentiate to form eggs and sperm.
We discuss the state of the science and the potential social implications and offer recommendations for addressing some of the ethical and policy issues that would be raised by the availability of stem cell-derived gametes.
- <abstract>Imaging as a means of scientific data storage has evolved rapidly over the past century from hand drawings, to photography, to digital images. Only recently can sufficiently large datasets be acquired, stored, and processed such that tissue digitization can actually reveal more than direct observation of tissue. One field where this transformation is occurring is connectomics: the mapping of neural connections in large volumes of digitized brain tissue.</abstract>
Imaging as a means of scientific data storage has evolved rapidly over the past century from hand drawings, to photography, to digital images. Only recently can sufficiently large datasets be acquired, stored, and processed such that tissue digitization can actually reveal more than direct observation of tissue.
One field where this transformation is occurring is connectomics: the mapping of neural connections in large volumes of digitized brain tissue.
- <abstract>Recent research in artificial intelligence and machine learning has largely emphasized general-purpose learning and ever-larger training sets and more and more compute. In contrast, I propose a hybrid, knowledge-driven, reasoning-based approach, centered around cognitive models, that could provide the substrate for a richer, more robust AI than is currently possible.</abstract>
Recent research in artificial intelligence and machine learning has largely emphasized general-purpose learning and ever-larger training sets and more and more compute.
In contrast, I propose a hybrid, knowledge-driven, reasoning-based approach, centered around cognitive models, that could provide the substrate for a richer, more robust AI than is currently possible.
- <abstract>A brief review of interaction-free measurements (IFM) is presented. The IFM is a solution of a quantum puzzle: How to test a bomb which explodes on every test without exploding it? This paper was given in the Oxford conference in honor of Roger Penrose.</abstract>
A brief review of interaction-free measurements (IFM) is presented. The IFM is a solution of a quantum puzzle: How to test a bomb which explodes on every test without exploding it?
This paper was given in the Oxford conference in honor of Roger Penrose.
- <abstract>This paper explores the physics of the what-if question “what if the entire Earth was instantaneously replaced with an equal volume of closely packed, but uncompressed blueberries?” While the assumption may be absurd, the consequences can be explored rigorously using elementary physics. The result is not entirely dissimilar to a small ocean-world exoplanet.</abstract>
This paper explores the physics of the what-if question “what if the entire Earth was instantaneously replaced with an equal volume of closely packed, but uncompressed blueberries?” While the assumption may be absurd, the consequences can be explored rigorously using elementary physics.
The result is not entirely dissimilar to a small ocean-world exoplanet.
- <abstract>We show that state-of-the-art services for creating trusted timestamps in blockchain-based networks do not adequately allow for timestamping of web pages. They accept data by value (eg. images and text), but not by reference (eg. URIs of web pages). Also, we discuss difficulties in repeatedly generating the same cryptographic hash value of an archived web page. We then introduce several requirements to be fulfilled in order to produce repeatable hash values for archived web pages.</abstract>
We show that state-of-the-art services for creating trusted timestamps in blockchain-based networks do not adequately allow for timestamping of web pages. They accept data by value (eg. images and text), but not by reference (eg. URIs of web pages). Also, we discuss difficulties in repeatedly generating the same cryptographic hash value of an archived web page.
We then introduce several requirements to be fulfilled in order to produce repeatable hash values for archived web pages.
- <abstract>We explore six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and <a href="https://en.wikipedia.org/wiki/Beam_search" >beam search</a>. We show both deficiencies and improvements over the quality of phrase-based statistical machine translation.</abstract>
We explore six challenges for neural machine translation: domain mismatch, amount of training data, rare words, long sentences, word alignment, and <a href="https://en.wikipedia.org/wiki/Beam_search" >beam search</a>.
We show both deficiencies and improvements over the quality of phrase-based statistical machine translation.
- <abstract>We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics. We propose a two stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models. To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale.</abstract>
We study the effectiveness of neural sequence models for premise selection in automated theorem proving, one of the main bottlenecks in the formalization of mathematics.
We propose a two stage approach for this task that yields good results for the premise selection task on the Mizar corpus while avoiding the hand-engineered features of existing state-of-the-art models.
To our knowledge, this is the first time deep learning has been applied to theorem proving on a large scale.
- <abstract>The statistic p(rep) estimates the probability of replicating an effect. It captures traditional publication criteria for signal-to-noise ratio, while avoiding parametric inference and the resulting Bayesian dilemma. In concert with <a href="https://en.wikipedia.org/wiki/Effect_sizes" >effect size</a> and replication intervals, p(rep) provides all of the information now used in evaluating research, while avoiding many of the pitfalls of traditional statistical inference.</abstract>
The statistic p(rep) estimates the probability of replicating an effect. It captures traditional publication criteria for signal-to-noise ratio, while avoiding parametric inference and the resulting Bayesian dilemma.
In concert with <a href="https://en.wikipedia.org/wiki/Effect_sizes" >effect size</a> and replication intervals, p(rep) provides all of the information now used in evaluating research, while avoiding many of the pitfalls of traditional statistical inference.
- <abstract>Intense meditation practices help to achieve a harmony between body and mind. Meditation practices influence brain functions, induce various intrinsic neural plasticity events, modulate autonomic, metabolic, endocrine, and immune functions and thus mediate global regulatory changes in various behavioral states including sleep. This brief review focuses on the effect of meditation as a self regulatory phenomenon on sleep.</abstract>
Intense meditation practices help to achieve a harmony between body and mind. Meditation practices influence brain functions, induce various intrinsic neural plasticity events, modulate autonomic, metabolic, endocrine, and immune functions and thus mediate global regulatory changes in various behavioral states including sleep.
This brief review focuses on the effect of meditation as a self regulatory phenomenon on sleep.
- <abstract><a href="!W">Modafinil</a> is a wakefulness-promoting agent that is known to be used off-label as a cognitive enhancer and for the treatment of <a href="https://en.wikipedia.org/wiki/Attention_deficit_hyperactivity_disorder" >attention deficit hyperactivity disorder</a> (ADHD).<sup>1</sup> There are increasing case reports of <a href="/modafinil" >Modafinil</a>-induced psychosis; however, this is the first to report a patient with ADHD to develop psychosis from Modafinil use.</abstract>
<a href="!W">Modafinil</a> is a wakefulness-promoting agent that is known to be used off-label as a cognitive enhancer and for the treatment of <a href="https://en.wikipedia.org/wiki/Attention_deficit_hyperactivity_disorder" >attention deficit hyperactivity disorder</a> (ADHD).<sup>1</sup>
There are increasing case reports of <a href="/modafinil" >Modafinil</a>-induced psychosis; however, this is the first to report a patient with ADHD to develop psychosis from Modafinil use.
- <abstract>We critically examine the evidence for the idea that encephalization quotients increase with time. We find that human-like intelligence is not a convergent feature of evolution. Implications for the search for extraterrestrial intelligence are discussed.</abstract>
We critically examine the evidence for the idea that encephalization quotients increase with time.
We find that human-like intelligence is not a convergent feature of evolution.
Implications for the search for extraterrestrial intelligence are discussed.
- <abstract>Misalignment between the timing of sleep and the circadian pacemaker has been linked to depression symptoms. This study sought to extend earlier findings by comparing sleep and circadian markers in healthy controls and individuals with major depression. Two markers of circadian misalignment correlated with depression severity in the depressed group.</abstract>
Misalignment between the timing of sleep and the circadian pacemaker has been linked to depression symptoms.
This study sought to extend earlier findings by comparing sleep and circadian markers in healthy controls and individuals with major depression.
Two markers of circadian misalignment correlated with depression severity in the depressed group.
- <abstract>Statistical analysis of repeat misprints in scientific citations leads to the conclusion that about 80% of scientific citations are copied from the lists of references used in other papers. Based on this finding a mathematical theory of citing is constructed. It leads to the conclusion that a large number of citations does not have to be a result of paper’s extraordinary qualities, but can be explained by the ordinary law of chances.</abstract>
Statistical analysis of repeat misprints in scientific citations leads to the conclusion that about 80% of scientific citations are copied from the lists of references used in other papers.
Based on this finding a mathematical theory of citing is constructed.
It leads to the conclusion that a large number of citations does not have to be a result of paper’s extraordinary qualities, but can be explained by the ordinary law of chances.
- <abstract>Panda is regarded as Chinese national treasure. Most people always thought they were cute and just ate bamboo and had never imagined a panda could be vicious. Giant panda attacks on human are rare. There, we present 3 cases of giant panda attacks on humans at the Panda House at Beijing Zoo from September 2006 to June 2009 to warn people of the giant panda’s potentially dangerous behavior.</abstract>
Panda is regarded as Chinese national treasure. Most people always thought they were cute and just ate bamboo and had never imagined a panda could be vicious. Giant panda attacks on human are rare.
There, we present 3 cases of giant panda attacks on humans at the Panda House at Beijing Zoo from September 2006 to June 2009 to warn people of the giant panda’s potentially dangerous behavior.
- <abstract>In 11 studies, we found that participants typically did not enjoy spending 6–15 minutes in a room by themselves with nothing to do but think, that they enjoyed doing mundane external activities much more, and that many preferred to administer electric shocks to themselves instead of being left alone with their thoughts. Most people seem to prefer to be doing something rather than nothing, even if that something is negative.</abstract>
In 11 studies, we found that participants typically did not enjoy spending 6–15 minutes in a room by themselves with nothing to do but think, that they enjoyed doing mundane external activities much more, and that many preferred to administer electric shocks to themselves instead of being left alone with their thoughts.
Most people seem to prefer to be doing something rather than nothing, even if that something is negative.
- <abstract>Many good tutorials exist but in the last few years, <a href="!W">transformers</a> have mostly become simpler, so that it is now much more straightforward to explain how modern architectures work. This post is an attempt to explain directly [in <a href="!W">PyTorch</a>] how modern transformers work, and why, without some of the historical baggage.</abstract>
Many good tutorials exist but in the last few years, <a href="!W">transformers</a> have mostly become simpler, so that it is now much more straightforward to explain how modern architectures work.
This post is an attempt to explain directly [in <a href="!W">PyTorch</a>] how modern transformers work, and why, without some of the historical baggage.
- <abstract>I present a new way to parallelize the training of convolutional neural networks across multiple GPUs. The method scales better than all alternatives when applied to modern convolutional neural networks.</abstract>
I present a new way to parallelize the training of convolutional neural networks across multiple GPUs.
The method scales better than all alternatives when applied to modern convolutional neural networks.
- <abstract>Bayesian reasoning has been applied formally to statistical inference, machine learning and analysing scientific method. Here I apply it informally to more common forms of inference, namely natural language arguments. I analyse a variety of traditional fallacies, deductive, inductive and causal, and find more merit in them than is generally acknowledged. Bayesian principles provide a framework for understanding ordinary arguments which is well worth developing.</abstract>
Bayesian reasoning has been applied formally to statistical inference, machine learning and analysing scientific method.
Here I apply it informally to more common forms of inference, namely natural language arguments.
I analyse a variety of traditional fallacies, deductive, inductive and causal, and find more merit in them than is generally acknowledged.
Bayesian principles provide a framework for understanding ordinary arguments which is well worth developing.
- <abstract>An <a href="!W">autistic</a> young man and a normal control were asked to factorize numbers and to recognize and generate primes. Both subjects made a similar of errors and employed similar strategies, but they differed markedly in the speeds at which the arithmetical operations were carried out.</abstract>
An <a href="!W">autistic</a> young man and a normal control were asked to factorize numbers and to recognize and generate primes.
Both subjects made a similar of errors and employed similar strategies, but they differed markedly in the speeds at which the arithmetical operations were carried out.
- <abstract>We provide a novel search technique, which uses a <a href="https://en.wikipedia.org/wiki/Multilevel_model">hierarchical model</a> and a mutual information gain heuristic to efficiently prune the search space when localizing faces in images. We show exponential gains in computation over traditional sliding window approaches, while keeping similar performance levels.</abstract>
We provide a novel search technique, which uses a <a href="https://en.wikipedia.org/wiki/Multilevel_model">hierarchical model</a> and a mutual information gain heuristic to efficiently prune the search space when localizing faces in images.
We show exponential gains in computation over traditional sliding window approaches, while keeping similar performance levels.
- <abstract>Decades of research have highlighted the amygdala’s influential role in fear. We found that inhalation of 35% CO<sub>2</sub> evoked not only fear, but also panic attacks, in 3 rare patients with bilateral amygdala damage. These results indicate that the amygdala is not required for fear and panic, and make an important distinction between fear triggered by external threats from the environment versus fear triggered internally by CO<sub>2</sub>.</abstract>
Decades of research have highlighted the amygdala’s influential role in fear.
We found that inhalation of 35% CO<sub>2</sub> evoked not only fear, but also panic attacks, in 3 rare patients with bilateral amygdala damage.
These results indicate that the amygdala is not required for fear and panic, and make an important distinction between fear triggered by external threats from the environment versus fear triggered internally by CO<sub>2</sub>.
- <abstract>This paper describes a reduction from the halting problem of Turing machines to subtype checking in Java. It follows that subtype checking in Java is undecidable, which answers a question posed by Kennedy and Pierce in 2007. It also follows that Java’s type checker can recognize any recursive language, which improves a result of Gil and Levy from 2016. The latter point is illustrated by a parser generator for fluent interfaces.</abstract>
This paper describes a reduction from the halting problem of Turing machines to subtype checking in Java.
It follows that subtype checking in Java is undecidable, which answers a question posed by Kennedy and Pierce in 2007. It also follows that Java’s type checker can recognize any recursive language, which improves a result of Gil and Levy from 2016.
The latter point is illustrated by a parser generator for fluent interfaces.
- <abstract>Soldier crabs <em>Mictyris guinotae</em> exhibit pronounced swarming behavior. The swarms of the crabs tolerant of perturbations. In computer models and laboratory experiments we demonstrate that swarms of soldier crabs can implement logical gates when placed in a geometrically constrained environment.</abstract>
Soldier crabs <em>Mictyris guinotae</em> exhibit pronounced swarming behavior. The swarms of the crabs tolerant of perturbations.
In computer models and laboratory experiments we demonstrate that swarms of soldier crabs can implement logical gates when placed in a geometrically constrained environment.
- <abstract>We introduce algorithms that use predictions from machine learning applied to the input to circumvent worst-case analysis. We aim for algorithms that have near optimal performance when these predictions are good, but recover the prediction-less worst case behavior when the predictions have large errors.</abstract>
We introduce algorithms that use predictions from machine learning applied to the input to circumvent worst-case analysis.
We aim for algorithms that have near optimal performance when these predictions are good, but recover the prediction-less worst case behavior when the predictions have large errors.
- <abstract>On Earth, the development of technology required easy access to open air combustion, which is only possible when oxygen partial pressure, P(O<sub>2</sub>), is above 18%. This suggests that only planets with atmospheric oxygen concentrations will be capable of developing “advanced” technospheres and hence detectable techno-signatures.</abstract>
On Earth, the development of technology required easy access to open air combustion, which is only possible when oxygen partial pressure, P(O<sub>2</sub>), is above 18%.
This suggests that only planets with atmospheric oxygen concentrations will be capable of developing “advanced” technospheres and hence detectable techno-signatures.
- <abstract>GPT-3 calculating derivatives. It learned about <a href="!W">power rule</a>https://en.wikipedia.org/wiki/Power_rule. Maybe a prompt with all rules of calculus will make it able to do all sorts of calculations...I was not omitting <code>^</code> at first but then it didn’t give me answers and I realized omitting <code>^</code> gave the answers so I just carried on.</abstract>
GPT-3 calculating derivatives.
It learned about <a href="!W">power rule</a>https://en.wikipedia.org/wiki/Power_rule. Maybe a prompt with all rules of calculus will make it able to do all sorts of calculations.
...I was not omitting <code>^</code> at first but then it didn’t give me answers and I realized omitting <code>^</code> gave the answers so I just carried on.
- <abstract>This paper reviews arguments for land value taxation (LVT) as a tool to stop urban sprawl, eliminate land speculation, reduce housing costs, and provide tax relief. It is found that LVT would increase, not lower land prices and would provide only a small incentive to building construction. LVT would not favorably affect the distribution of wealth, nor reduce housing costs. It could provide some residential tax relief, but less effectively than other methods such as a progressive property tax.</abstract>
This paper reviews arguments for land value taxation (LVT) as a tool to stop urban sprawl, eliminate land speculation, reduce housing costs, and provide tax relief.
It is found that LVT would increase, not lower land prices and would provide only a small incentive to building construction. LVT would not favorably affect the distribution of wealth, nor reduce housing costs. It could provide some residential tax relief, but less effectively than other methods such as a progressive property tax.
- <abstract>Dog cloning as a concept is no longer infeasible. Starting with Snuppy, the first cloned dog in the world, somatic cell nuclear transfer (SCNT) has been continuously developed and used for diverse purposes. In this article we summarise the current method for SCNT, the normality of cloned dogs and the application of dog cloning not only for personal reasons, but also for public purposes.</abstract>
Dog cloning as a concept is no longer infeasible.
Starting with Snuppy, the first cloned dog in the world, somatic cell nuclear transfer (SCNT) has been continuously developed and used for diverse purposes.
In this article we summarise the current method for SCNT, the normality of cloned dogs and the application of dog cloning not only for personal reasons, but also for public purposes.
- <abstract>When asked whether he would discuss man in the<em>Origins of the Species</em>, Darwin replied, ‘I think I shall avoid the subject, as so surrounded with prejudices, though I fully admit it is the highest and most interesting problem for the naturalist’. Galton on the other hand replied to the same question, ‘I shall treat man and see what the theory of heredity of variations and the principles of natural selection mean when applied to man’ (Pearson 1914–30, Vol. II, p. 86).</abstract>
When asked whether he would discuss man in the<em>Origins of the Species</em>, Darwin replied, ‘I think I shall avoid the subject, as so surrounded with prejudices, though I fully admit it is the highest and most interesting problem for the naturalist’.
Galton on the other hand replied to the same question, ‘I shall treat man and see what the theory of heredity of variations and the principles of natural selection mean when applied to man’ (Pearson 1914–30, Vol. II, p. 86).
- <abstract>Frank Plumpton Ramsey was born in February 1903, and he died in January 1930—just before his 27<sup>th</sup> birthday. In his short life he produced an extraordinary amount of profound and original work in economics, mathematics and logic as well as in philosophy: work which in all these fields is still, over sixty years on, extremely influential.</abstract>
Frank Plumpton Ramsey was born in February 1903, and he died in January 1930—just before his 27<sup>th</sup> birthday.
In his short life he produced an extraordinary amount of profound and original work in economics, mathematics and logic as well as in philosophy: work which in all these fields is still, over sixty years on, extremely influential.
- <abstract>One of the earliest ideas about vision is that it depends on light that streams out of the eye and detects surrounding objects. This view was attacked in its own time and finally disproved more than 2000 years later. Yet the idea of a beam leaving the eye persisted in beliefs both about the evil eye and the power of a lover's gaze. It is still widely held among both children and adults.</abstract>
One of the earliest ideas about vision is that it depends on light that streams out of the eye and detects surrounding objects. This view was attacked in its own time and finally disproved more than 2000 years later.
Yet the idea of a beam leaving the eye persisted in beliefs both about the evil eye and the power of a lover's gaze. It is still widely held among both children and adults.
- <abstract>For 64 undergraduates varied musical selections did not offset scores on Sequential Tests of Educational Progress but scores on the easier sections were higher than those on more difficult ones. Scores made with familiar music were higher than those with unfamiliar music.</abstract>
For 64 undergraduates varied musical selections did not offset scores on Sequential Tests of Educational Progress but scores on the easier sections were higher than those on more difficult ones.
Scores made with familiar music were higher than those with unfamiliar music.
- <abstract>This review highlights the importance of recognizing the possibility for doing harm when intentions are good. It describes several examples showing that well-planned and adequately executed programs provide no guarantee for safety or efficacy. The author concludes with recommendations for scientifically credible evaluations to promote progress in the field of crime prevention.</abstract>
This review highlights the importance of recognizing the possibility for doing harm when intentions are good.
It describes several examples showing that well-planned and adequately executed programs provide no guarantee for safety or efficacy.
The author concludes with recommendations for scientifically credible evaluations to promote progress in the field of crime prevention.
- <abstract>Many of the statistical models that could provide an accurate, interesting, and testable explanation for the structure of a data set turn out to have intractable likelihood functions. The method of approximate Bayesian computation (ABC) has become a popular approach for tackling such models. This review gives an overview of the method and the main issues and challenges that are the subject of current research.</abstract>
Many of the statistical models that could provide an accurate, interesting, and testable explanation for the structure of a data set turn out to have intractable likelihood functions.
The method of approximate Bayesian computation (ABC) has become a popular approach for tackling such models.
This review gives an overview of the method and the main issues and challenges that are the subject of current research.
- <abstract>We review Stigler's diet problem, its impact on linear programming and operations research, and we determine minimum cost diets using updated nutritional and cost data. We also discuss how Stigler's diet problem formulation and its extensions have, over the years, influenced dietitians and nutritionists in their search for more wholesome but cost-effective diets.</abstract>
We review Stigler's diet problem, its impact on linear programming and operations research, and we determine minimum cost diets using updated nutritional and cost data.
We also discuss how Stigler's diet problem formulation and its extensions have, over the years, influenced dietitians and nutritionists in their search for more wholesome but cost-effective diets.
- <abstract>A sequence of athletic records forms by definition a monotonic sequence. Since it is reasonable to assume this to be bounded, it follows that a limit must exist to future performance. It has been proposed [1, 2] that this can be estimated by use of a curve-fitting procedure on existing data, such as may be found in many compendia of athletic records (eg. [3]).</abstract>
A sequence of athletic records forms by definition a monotonic sequence.
Since it is reasonable to assume this to be bounded, it follows that a limit must exist to future performance.
It has been proposed [1, 2] that this can be estimated by use of a curve-fitting procedure on existing data, such as may be found in many compendia of athletic records (eg. [3]).
- <abstract>Most targeted anticancer therapies fail due to drug resistance evolution. Here we show that tumor evolution can be reproducibly redirected to engineer therapeutic opportunity, regardless of the exact ensemble of pre-existing genetic heterogeneity. We develop a selection gene drive system that is stably introduced into cancer cells and is composed of two genes, or switches, that couple an inducible fitness advantage with a shared fitness cost. Using stochastic models of evolutionary dynamics, we identify the design criteria for selection gene drives. We then build prototypes that harness the selective pressure of multiple approved tyrosine kinase inhibitors and employ therapeutic mechanisms as diverse as prodrug catalysis and immune activity induction. We show that selection gene drives can eradicate diverse forms of genetic resistance in vitro. Finally, we demonstrate that model-informed switch engagement effectively targets pre-existing resistance in mouse models of solid tumors. These results establish selection gene drives as a powerful framework for evolution-guided anticancer therapy.</abstract>
Most targeted anticancer therapies fail due to drug resistance evolution. Here we show that tumor evolution can be reproducibly redirected to engineer therapeutic opportunity, regardless of the exact ensemble of pre-existing genetic heterogeneity.
We develop a selection gene drive system that is stably introduced into cancer cells and is composed of two genes, or switches, that couple an inducible fitness advantage with a shared fitness cost. Using stochastic models of evolutionary dynamics, we identify the design criteria for selection gene drives. We then build prototypes that harness the selective pressure of multiple approved tyrosine kinase inhibitors and employ therapeutic mechanisms as diverse as prodrug catalysis and immune activity induction.
We show that selection gene drives can eradicate diverse forms of genetic resistance in vitro.
Finally, we demonstrate that model-informed switch engagement effectively targets pre-existing resistance in mouse models of solid tumors.
These results establish selection gene drives as a powerful framework for evolution-guided anticancer therapy.
- <abstract>The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns. This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions. We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness. We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.</abstract>
The success of AI models relies on the availability of large, diverse, and high-quality datasets, which can be challenging to obtain due to data scarcity, privacy concerns, and high costs. Synthetic data has emerged as a promising solution by generating artificial data that mimics real-world patterns.
This paper provides an overview of synthetic data research, discussing its applications, challenges, and future directions.
We present empirical evidence from prior art to demonstrate its effectiveness and highlight the importance of ensuring its factuality, fidelity, and unbiasedness.
We emphasize the need for responsible use of synthetic data to build more powerful, inclusive, and trustworthy language models.
- <abstract>Generative models, such as diffusion models (DMs), variational autoencoders (VAEs), and generative adversarial networks (GANs), produce images with a level of authenticity that makes them nearly indistinguishable from real photos and artwork. While this capability is beneficial for many industries, the difficulty of identifying synthetic images leaves online media platforms vulnerable to impersonation and misinformation attempts. To support the development of defensive methods, we introduce ImagiNet, a high-resolution and balanced dataset for synthetic image detection, designed to mitigate potential biases in existing resources. It contains 200K examples, spanning four content categories: photos, paintings, faces, and uncategorized.</abstract>
Generative models, such as diffusion models (DMs), variational autoencoders (VAEs), and generative adversarial networks (GANs), produce images with a level of authenticity that makes them nearly indistinguishable from real photos and artwork. While this capability is beneficial for many industries, the difficulty of identifying synthetic images leaves online media platforms vulnerable to impersonation and misinformation attempts.
To support the development of defensive methods, we introduce <strong>ImagiNet</strong>, a high-resolution and balanced dataset for synthetic image detection, designed to mitigate potential biases in existing resources. It contains 200K examples, spanning 4 content categories: photos, paintings, faces, and uncategorized. Synthetic images are produced with open-source and proprietary generators, whereas real counterparts of the same content type are collected from public datasets.
- <abstract>We present the Minigrid and Miniworld libraries which provide a suite of goal-oriented 2D and 3D environments. The libraries were explicitly created with a minimalistic design paradigm to allow users to rapidly develop new environments for a wide range of research-specific needs. As a result, both have received widescale adoption by the RL community, facilitating research in a wide range of areas. In this paper, we outline the design philosophy, environment details, and their world generation API. We also showcase the additional capabilities brought by the unified API between Minigrid and Miniworld through case studies on transfer learning (for both RL agents and humans) between the different observation spaces. The source code of Minigrid and Miniworld can be found at this https URL[Minigrid, Miniworld] along with their documentation at https://[minigrid, miniworld].this http URL.</abstract>
We present the <strong>Minigrid</strong> and <strong>Miniworld</strong> libraries which provide a suite of goal-oriented 2D and 3D environments. The libraries were explicitly created with a minimalistic design paradigm to allow users to rapidly develop new environments for a wide range of research-specific needs. As a result, both have received widescale adoption by the RL community, facilitating research in a wide range of areas.
In this paper, we outline the design philosophy, environment details, and their world generation API.
We also showcase the additional capabilities brought by the unified API between Minigrid and Miniworld through case studies on transfer learning (for both RL agents and humans) between the different observation spaces.
The source code of Minigrid and Miniworld can be found at <a href="https://github.com/Farama-Foundation/Minigrid">GitHub</a>, Miniworld along with their documentation at <a href="https://minigrid.farama.org/"><code>minigrid.farama.org</code></a>, and <a href="https://miniworld.farama.org/"><code>miniworld.farama.org</code></a>.
- <abstract>In classical computation, a “write-only memory” (WOM) is little more than an oxymoron, and the addition of WOM to a (deterministic or probabilistic) classical computer brings no advantage. We prove that quantum computers that are augmented with WOM can solve problems that neither a classical computer with WOM nor a quantum computer without WOM can solve, when all other resource bounds are equal. We focus on realtime quantum finite automata, and examine the increase in their power effected by the addition of WOMs with different access modes and capacities. Some problems that are unsolvable by two-way probabilistic Turing machines using sub-logarithmic amounts of read/write memory are shown to be solvable by these enhanced automata.</abstract>
In classical computation, a “write-only memory” (WOM) is little more than an oxymoron, and the addition of WOM to a (deterministic or probabilistic) classical computer brings no advantage.
We prove that quantum computers that are augmented with WOM can solve problems that neither a classical computer with WOM nor a quantum computer without WOM can solve, when all other resource bounds are equal. We focus on realtime quantum finite automata, and examine the increase in their power effected by the addition of WOMs with different access modes and capacities.
Some problems that are unsolvable by two-way probabilistic Turing machines using sub-logarithmic amounts of read/write memory are shown to be solvable by these enhanced automata.
- <abstract>Self-attention performs well in long context but has quadratic complexity. Existing RNN layers have linear complexity, but their performance in long context is limited by the expressive power of their hidden state. We propose a new class of sequence modeling layers with linear complexity and an expressive hidden state. The key idea is to make the hidden state a machine learning model itself, and the update rule a step of self-supervised learning. Since the hidden state is updated by training even on test sequences, our layers are called Test-Time Training (TTT) layers. We consider two instantiations: TTT-Linear and TTT-MLP, whose hidden state is a linear model and a two-layer MLP respectively. We evaluate our instantiations at the scale of 125M to 1.3B parameters, comparing with a strong Transformer and Mamba, a modern RNN. Both TTT-Linear and TTT-MLP match or exceed the baselines. Similar to Transformer, they can keep reducing perplexity by conditioning on more tokens, while Mamba cannot after 16k context. With preliminary systems optimization, TTT-Linear is already faster than Transformer at 8k context and matches Mamba in wall-clock time. TTT-MLP still faces challenges in memory I/O, but shows larger potential in long context, pointing to a promising direction for future research.</abstract>
Self-attention performs well in long context but has quadratic complexity. Existing RNN layers have linear complexity, but their performance in long context is limited by the expressive power of their hidden state.
We propose a new class of sequence modeling layers with linear complexity and an expressive hidden state. The key idea is to make the hidden state a machine learning model itself, and the update rule a step of self-supervised learning. Since the hidden state is updated by training even on test sequences, our layers are called <strong>Test-Time Training (TTT)</strong> layers. We consider two instantiations: <strong>TTT-Linear</strong> and <strong>TTT-MLP</strong>, whose hidden state is a linear model and a two-layer MLP respectively.
We evaluate our instantiations at the scale of 125M to 1.3B parameters, comparing with a strong Transformer and Mamba, a modern RNN. Both TTT-Linear and TTT-MLP match or exceed the baselines. Similar to Transformer, they can keep reducing perplexity by conditioning on more tokens, while Mamba cannot after 16k context.
With preliminary systems optimization, TTT-Linear is already faster than Transformer at 8k context and matches Mamba in wall-clock time.
TTT-MLP still faces challenges in memory I/O, but shows larger potential in long context, pointing to a promising direction for future research.
- <abstract>This paper examines the relationship between sleep quality and academic performance in college students. Using actigraphy data and self-reported sleep logs from 150 participants over a semester, we analyzed sleep duration, efficiency, and timing in relation to GPA and test scores. Results indicate a complex interaction between sleep metrics and academic outcomes, with some counterintuitive findings. While overall sleep duration showed a weak positive correlation with GPA, sleep timing regularity emerged as a stronger predictor of academic success. However, the effects varied across different academic disciplines and assessment types. These findings highlight the need for nuanced approaches to sleep interventions in academic settings and call for further research into the multifaceted nature of sleep's impact on cognitive performance.</abstract>
This paper examines the relationship between sleep quality and academic performance in college students.
Using actigraphy data and self-reported sleep logs from 150 participants over a semester, we analyzed sleep duration, efficiency, and timing in relation to GPA and test scores.
Results indicate a complex interaction between sleep metrics and academic outcomes, with some counterintuitive findings. While overall sleep duration showed a weak positive correlation with GPA, sleep timing regularity emerged as a stronger predictor of academic success. However, the effects varied across different academic disciplines and assessment types.
These findings highlight the need for nuanced approaches to sleep interventions in academic settings and call for further research into the multifaceted nature of sleep's impact on cognitive performance.
- <abstract>We present a novel machine learning algorithm for detecting anomalies in time series data. Our approach combines elements of deep learning and statistical process control to identify subtle deviations from expected patterns. The algorithm was tested on diverse datasets including financial market data, industrial sensor readings, and physiological measurements. Performance metrics show significant improvements over existing methods in terms of both accuracy and computational efficiency. However, challenges remain in tuning the algorithm for specific domains and interpreting its decisions. We discuss potential applications in fields such as predictive maintenance, fraud detection, and health monitoring, as well as ethical considerations surrounding the deployment of such systems. Future work will focus on enhancing the algorithm's explainability and adapting it for real-time streaming data.</abstract>
We present a novel machine learning algorithm for detecting anomalies in time series data.
Our approach combines elements of deep learning and statistical process control to identify subtle deviations from expected patterns. The algorithm was tested on diverse datasets including financial market data, industrial sensor readings, and physiological measurements.
Performance metrics show significant improvements over existing methods in terms of both accuracy and computational efficiency. However, challenges remain in tuning the algorithm for specific domains and interpreting its decisions.
We discuss potential applications in fields such as predictive maintenance, fraud detection, and health monitoring, as well as ethical considerations surrounding the deployment of such systems. Future work will focus on enhancing the algorithm's explainability and adapting it for real-time streaming data.
- <abstract>This study investigates the impact of social media use on political polarization. We conducted a large-scale survey of 10,000 participants across five countries, complemented by analysis of social media activity and network structures. Our findings reveal a nuanced picture of social media's role in shaping political attitudes. While echo chambers and filter bubbles were observed, their effects were moderated by individual differences in media literacy and personality traits. Contrary to popular belief, we found that exposure to diverse viewpoints on social media can sometimes reinforce pre-existing biases rather than reduce them. The relationship between social media use and political polarization appears to be bidirectional and context-dependent. These results have implications for platform design, media policy, and civic education initiatives aimed at promoting democratic discourse in the digital age.</abstract>
This study investigates the impact of social media use on political polarization.
We conducted a large-scale survey of 10,000 participants across five countries, complemented by analysis of social media activity and network structures.
Our findings reveal a nuanced picture of social media's role in shaping political attitudes. While echo chambers and filter bubbles were observed, their effects were moderated by individual differences in media literacy and personality traits. Contrary to popular belief, we found that exposure to diverse viewpoints on social media can sometimes reinforce pre-existing biases rather than reduce them.
The relationship between social media use and political polarization appears to be bidirectional and context-dependent. These results have implications for platform design, media policy, and civic education initiatives aimed at promoting democratic discourse in the digital age.
- <abstract>In this paper, we explore the potential of quantum computing for solving optimization problems in supply chain management. We developed a quantum algorithm that addresses the vehicle routing problem with time windows, a classic NP-hard problem in logistics. Using a quantum annealer with 2000 qubits, we compared our approach to state-of-the-art classical algorithms. Results show that the quantum approach can find near-optimal solutions for medium-sized instances faster than classical methods, but struggles with larger problem sizes due to current hardware limitations. We discuss the challenges of mapping real-world logistics problems onto quantum architectures and propose hybrid quantum-classical approaches as a promising direction. The paper concludes with a roadmap for scaling quantum optimization techniques to practical supply chain problems, considering both technological advances and algorithm design.</abstract>
In this paper, we explore the potential of quantum computing for solving optimization problems in supply chain management.
We developed a quantum algorithm that addresses the vehicle routing problem with time windows, a classic NP-hard problem in logistics. Using a quantum annealer with 2000 qubits, we compared our approach to state-of-the-art classical algorithms.
Results show that the quantum approach can find near-optimal solutions for medium-sized instances faster than classical methods, but struggles with larger problem sizes due to current hardware limitations.
We discuss the challenges of mapping real-world logistics problems onto quantum architectures and propose hybrid quantum-classical approaches as a promising direction. The paper concludes with a roadmap for scaling quantum optimization techniques to practical supply chain problems, considering both technological advances and algorithm design.
- <abstract>This review synthesizes current knowledge on the ecological impacts of microplastics in marine ecosystems. We examine evidence from laboratory studies, field observations, and modeling efforts to assess the distribution, bioaccumulation, and effects of microplastics across trophic levels. While clear negative impacts have been demonstrated in controlled experiments, especially for filter-feeding organisms, translating these findings to ecosystem-level consequences remains challenging. Factors such as polymer type, size distribution, and environmental conditions significantly influence microplastic behavior and toxicity. We identify key knowledge gaps, including the long-term effects of chronic exposure, interactions with other pollutants, and potential evolutionary responses of marine organisms. The review also addresses methodological challenges in microplastic research and proposes standardized protocols for sampling and analysis. Finally, we discuss implications for marine conservation policies and suggest priorities for future research to better understand and mitigate the threats posed by microplastics to ocean health.</abstract>
This review synthesizes current knowledge on the ecological impacts of microplastics in marine ecosystems.
We examine evidence from laboratory studies, field observations, and modeling efforts to assess the distribution, bioaccumulation, and effects of microplastics across trophic levels.
While clear negative impacts have been demonstrated in controlled experiments, especially for filter-feeding organisms, translating these findings to ecosystem-level consequences remains challenging. Factors such as polymer type, size distribution, and environmental conditions significantly influence microplastic behavior and toxicity.
We identify key knowledge gaps, including the long-term effects of chronic exposure, interactions with other pollutants, and potential evolutionary responses of marine organisms. The review also addresses methodological challenges in microplastic research and proposes standardized protocols for sampling and analysis.
Finally, we discuss implications for marine conservation policies and suggest priorities for future research to better understand and mitigate the threats posed by microplastics to ocean health.
- <abstract>Attention mechanisms that confer selective focus on a strict subset of input elements are nearly ubiquitous in language models today. We posit there to be downside to the use of attention: most information present in the input is necessarily lost. In support of this idea we observe poor input representation accuracy in transformers, but find more accurate representation in what we term masked mixers which replace self-attention with masked convolutions.</abstract>
Attention mechanisms that confer selective focus on a strict subset of input elements are nearly ubiquitous in language models today. We posit there to be downside to the use of attention: most information present in the input is necessarily lost.
In support of this idea we observe poor input representation accuracy in transformers, but find more accurate representation in what we term <strong>masked mixers</strong> which replace self-attention with masked convolutions.
- <abstract>This study evaluates the impact of depression treatment on economic behavior in Karnataka, India. We cross-randomize pharmacotherapy and livelihoods assistance among 1,000 depressed adults and evaluate impacts on depression severity, socioeconomic outcomes, and several potential pathways. When combined, the interventions reduce depression severity, with benefits that persist after treatment concludes. Pharmacotherapy alone has a weaker effect that is only marginally significant and dissipates sooner. Depression treatment does not significantly increase earnings, consumption, or human capital investment in children.</abstract>
This study evaluates the impact of depression treatment on economic behavior in Karnataka, India.
We cross-randomize pharmacotherapy and livelihoods assistance among 1,000 depressed adults and evaluate impacts on depression severity, socioeconomic outcomes, and several potential pathways.
When combined, the interventions reduce depression severity, with benefits that persist after treatment concludes. Pharmacotherapy alone has a weaker effect that is only marginally significant and dissipates sooner. Depression treatment does not significantly increase earnings, consumption, or human capital investment in children.
- <abstract>Large language models can memorize and repeat their training data, causing privacy and copyright risks. To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the goldfish loss. During training, a randomly sampled subset of tokens are excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verbatim reproduction of a complete chain of tokens from the training set. We run extensive experiments training billion-scale Llama-2 models, both pre-trained and trained from scratch, and demonstrate reductions in extractable memorization with little to no impact on downstream benchmarks.</abstract>
Large language models can memorize and repeat their training data, causing privacy and copyright risks.
To mitigate memorization, we introduce a subtle modification to the next-token training objective that we call the <strong>goldfish loss</strong>. During training, a randomly sampled subset of tokens are excluded from the loss computation. These dropped tokens are not memorized by the model, which prevents verbatim reproduction of a complete chain of tokens from the training set.
We run extensive experiments training billion-scale Llama-2 models, both pre-trained and trained from scratch, and demonstrate reductions in extractable memorization with little to no impact on downstream benchmarks.
- <abstract>To enable building and testing models on long-document comprehension, we introduce QuALITY, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process. Unlike in prior work with passages, our questions are written and validated by contributors who have read the entire passage, rather than relying on summaries or excerpts. In addition, only half of the questions are answerable by annotators working under tight time constraints, indicating that skimming and simple search are not enough to consistently perform well. Our baseline models perform poorly on this task (55.4%) and lag behind human performance (93.5%).</abstract>
To enable building and testing models on long-document comprehension, we introduce <strong>QuALITY</strong>, a multiple-choice QA dataset with context passages in English that have an average length of about 5,000 tokens, much longer than typical current models can process. Unlike in prior work with passages, our questions are written and validated by contributors who have read the entire passage, rather than relying on summaries or excerpts.
In addition, only half of the questions are answerable by annotators working under tight time constraints, indicating that skimming and simple search are not enough to consistently perform well.
Our baseline models perform poorly on this task (55.4%) and lag behind human performance (93.5%).
- <abstract>We propose a novel neural network architecture, the normalized <a href="https://arxiv.org/abs/1706.03762#google">Transformer</a> (nGPT) with representation learning on the hypersphere. In nGPT, all vectors forming the embeddings, MLP, attention matrices and hidden states are unit norm normalized. The input stream of tokens travels on the surface of a hypersphere, with each layer contributing a displacement towards the target output predictions. These displacements are defined by the MLP and attention blocks, whose vector components also reside on the same hypersphere. Experiments show that nGPT learns much faster, reducing the number of training steps required to achieve the same accuracy by a factor of 4 to 20, depending on the sequence length.</abstract>
We propose a novel neural network architecture, the <strong>normalized <a href="https://arxiv.org/abs/1706.03762#google">Transformer</a> (nGPT)</strong> with representation learning on the hypersphere.
In nGPT, all vectors forming the embeddings, MLP, attention matrices and hidden states are unit norm normalized. The input stream of tokens travels on the surface of a hypersphere, with each layer contributing a displacement towards the target output predictions. These displacements are defined by the MLP and attention blocks, whose vector components also reside on the same hypersphere.
Experiments show that nGPT learns much faster, reducing the number of training steps required to achieve the same accuracy by a factor of 4 to 20, depending on the sequence length.
- <abstract>In this work, we introduce Gemma 2, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters. In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al 2020a) and group-query attention (Ainslie et al 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al 2015) instead of next token prediction. The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2–3× bigger. We release all our models to the community.</abstract>
In this work, we introduce <strong>Gemma 2</strong>, a new addition to the Gemma family of lightweight, state-of-the-art open models, ranging in scale from 2 billion to 27 billion parameters.
In this new version, we apply several known technical modifications to the Transformer architecture, such as interleaving local-global attentions (Beltagy et al 2020a) and group-query attention (Ainslie et al 2023). We also train the 2B and 9B models with knowledge distillation (Hinton et al 2015) instead of next token prediction.
The resulting models deliver the best performance for their size, and even offer competitive alternatives to models that are 2–3× bigger.
We release all our models to the community.
- <abstract>Multi-query attention (MQA), which only uses a single key-value head, drastically speeds up decoder inference. However, MQA can lead to quality degradation, and moreover it may not be desirable to train a separate model just for faster inference. We (1) propose a recipe for uptraining existing multi-head language model checkpoints into models with MQA using 5% of original pre-training compute, and (2) introduce grouped-query attention (GQA), a generalization of multi-query attention which uses an intermediate (more than one, less than number of query heads) number of key-value heads. We show that uptrained GQA achieves quality close to multi-head attention with comparable speed to MQA.</abstract>
Multi-query attention (MQA), which only uses a single key-value head, drastically speeds up decoder inference. However, MQA can lead to quality degradation, and moreover it may not be desirable to train a separate model just for faster inference.
We (1) propose a recipe for uptraining existing multi-head language model checkpoints into models with MQA using 5% of original pre-training compute, and (2) introduce <strong>grouped-query attention (GQA)</strong>, a generalization of multi-query attention which uses an intermediate (more than one, less than number of query heads) number of key-value heads.
We show that uptrained GQA achieves quality close to multi-head attention with comparable speed to MQA.
- <abstract>Although domestic cats are among the most common companion animals, we still know very little about the details of the cat-human relationship. With a questionnaire, we asked 157 Hungarian cat owners about their pet’s behavior, cognitive abilities, and social interactions. We analyzed the responses with PCA resulting in 11 traits. The effect of cats’ and owners’ demographic variables on the main components was further analyzed with GLM. The results showed strong similarity to the surveys performed with companion dogs, but we also found features that were mainly cat-specific.</abstract>
Although domestic cats are among the most common companion animals, we still know very little about the details of the cat-human relationship.
With a questionnaire, we asked 157 Hungarian cat owners about their pet’s behavior, cognitive abilities, and social interactions. We analyzed the responses with PCA resulting in 11 traits. The effect of cats’ and owners’ demographic variables on the main components was further analyzed with GLM.
The results showed strong similarity to the surveys performed with companion dogs, but we also found features that were mainly cat-specific.
- <abstract>Few-shot learners aim to recognize new object classes based on a small number of labeled training examples. To prevent overfitting, state-of-the-art few-shot learners use meta-learning on convolutional-network features and perform classification using a nearest-neighbor classifier. This paper studies the accuracy of nearest-neighbor baselines without meta-learning. Surprisingly, we find simple feature transformations suffice to obtain competitive few-shot learning accuracies. For example, we find that a nearest-neighbor classifier used in combination with mean-subtraction and 𝓁<sub>2</sub>-normalization outperforms prior results in 3⁄5 settings on the miniImageNet dataset.</abstract>
Few-shot learners aim to recognize new object classes based on a small number of labeled training examples. To prevent overfitting, state-of-the-art few-shot learners use meta-learning on convolutional-network features and perform classification using a nearest-neighbor classifier.
This paper studies the accuracy of nearest-neighbor baselines without meta-learning.
Surprisingly, we find simple feature transformations suffice to obtain competitive few-shot learning accuracies.
For example, we find that a nearest-neighbor classifier used in combination with mean-subtraction and 𝓁<sub>2</sub>-normalization outperforms prior results in 3⁄5 settings on the miniImageNet dataset.
- <abstract>Instruct (or “chat”) tuned models have become the primary way in which most people interact with large language models. As opposed to “base” or “foundation” models, instruct-tuned models are optimized to respond to imperative statements. We present Hermes 3, a neutrally-aligned generalist instruct and tool use model with strong reasoning and creative abilities. Its largest version, Hermes 3 405B, achieves state of the art performance among open weight models on several public benchmarks.</abstract>
Instruct (or “chat”) tuned models have become the primary way in which most people interact with large language models. As opposed to “base” or “foundation” models, instruct-tuned models are optimized to respond to imperative statements.
We present <strong>Hermes 3</strong>, a neutrally-aligned generalist instruct and tool use model with strong reasoning and creative abilities.
Its largest version, Hermes 3 405B, achieves state of the art performance among open weight models on several public benchmarks.
- <abstract>It is well known that dogs are capable of following human verbal instructions. However, very little is known about the equivalent ability in cats. In this study, we used a switched stimuli task to examine whether cats rapidly form picture-word association, which is a fundamental ability for word learning. We presented cats with two meaningless picture-word combinations, in the habituation phase. Then, on half of the trials we switched the combination (switched condition), but the other half of the trials remained as before (non-switched condition). If cats rapidly form picture-word association, they were expected to look at the monitor for longer in the switched condition, reflecting detection of the change. We used human speech as stimuli in Exp.1, and mechanical sounds (electronic sounds) in Exp.2. Cats expressed detection of the switched combination in Exp.1, where human speech and objects were paired. However, in Exp.2 where non-social sounds and objects were paired, there was no statistical difference between switched and non-switched conditions, although there was a main effect of condition when the data from the two experiments were pooled. These results demonstrate that cats can rapidly form picture-word association. Further research should investigate whether domestication has played a role in this ability.</abstract>
It is well known that dogs are capable of following human verbal instructions. However, very little is known about the equivalent ability in cats.
In this study, we used a switched stimuli task to examine whether cats rapidly form picture-word association, which is a fundamental ability for word learning. We presented cats with two meaningless picture-word combinations, in the habituation phase. Then, on half of the trials we switched the combination (switched condition), but the other half of the trials remained as before (non-switched condition). If cats rapidly form picture-word association, they were expected to look at the monitor for longer in the switched condition, reflecting detection of the change. We used human speech as stimuli in Exp.1, and mechanical sounds (electronic sounds) in Exp.2.
Cats expressed detection of the switched combination in Exp.1, where human speech and objects were paired. However, in Exp.2 where non-social sounds and objects were paired, there was no statistical difference between switched and non-switched conditions, although there was a main effect of condition when the data from the two experiments were pooled.
These results demonstrate that cats can rapidly form picture-word association. Further research should investigate whether domestication has played a role in this ability.
- <abstract>The decade from 2010 to 2020 saw remarkable improvements in automatic speech recognition. Many people now use speech recognition on a daily basis, for example to perform voice search queries, send text messages, and interact with voice assistants like Amazon Alexa and Siri by Apple. Before 2010 most people rarely used speech recognition. Given the remarkable changes in the state of speech recognition over the previous decade, what can we expect over the coming decade? I attempt to forecast the state of speech recognition research and applications by the year 2030. While the changes to general speech recognition accuracy will not be as dramatic as in the previous decade, I suggest we have an exciting decade of progress in speech technology ahead of us.</abstract>
The decade from 2010 to 2020 saw remarkable improvements in automatic speech recognition. Many people now use speech recognition on a daily basis, for example to perform voice search queries, send text messages, and interact with voice assistants like Amazon Alexa and Siri by Apple. Before 2010 most people rarely used speech recognition.
Given the remarkable changes in the state of speech recognition over the previous decade, what can we expect over the coming decade? I attempt to forecast the state of speech recognition research and applications by the year 2030.
While the changes to general speech recognition accuracy will not be as dramatic as in the previous decade, I suggest we have an exciting decade of progress in speech technology ahead of us.
- <abstract>The advent of generative AI images has completely disrupted the art world. Distinguishing AI generated images from human art is a challenging problem whose impact is growing over time. A failure to address this problem allows bad actors to defraud individuals paying a premium for human art and companies whose stated policies forbid AI imagery. It is also critical for content owners to establish copyright, and for model trainers interested in curating training data in order to avoid potential model collapse. There are several different approaches to distinguishing human art from AI images, including classifiers trained by supervised learning, research tools targeting diffusion models, and identification by professional artists using their knowledge of artistic techniques. In this paper, we seek to understand how well these approaches can perform against today’s modern generative models in both benign and adversarial settings. We curate real human art across 7 styles, generate matching images from 5 generative models [DALL·E 3, Midjourney v6, SDXL, Firefly, Civitai], and apply 8 detectors (5 automated detectors and 3 different human groups including 180 crowdworkers, 4000+ professional artists, and 13 expert artists experienced at detecting AI). Both Hive and expert artists do very well, but make mistakes in different ways (Hive is weaker against adversarial perturbations while Expert artists produce higher false positives). We believe these weaknesses will remain as models continue to evolve, and use our data to demonstrate why a combined team of human and automated detectors provides the best combination of accuracy and robustness.
The advent of generative AI images has completely disrupted the art world. Distinguishing AI generated images from human art is a challenging problem whose impact is growing over time. A failure to address this problem allows bad actors to defraud individuals paying a premium for human art and companies whose stated policies forbid AI imagery. It is also critical for content owners to establish copyright, and for model trainers interested in curating training data in order to avoid potential model collapse.
There are several different approaches to distinguishing human art from AI images, including classifiers trained by supervised learning, research tools targeting diffusion models, and identification by professional artists using their knowledge of artistic techniques.
In this paper, we seek to understand how well these approaches can perform against today’s modern generative models in both benign and adversarial settings. We curate real human art across 7 styles, generate matching images from 5 generative models [DALL·E 3, Midjourney v6, SDXL, Firefly, Civitai], and apply 8 detectors (5 automated detectors and 3 different human groups including 180 crowdworkers, 4000+ professional artists, and 13 expert artists experienced at detecting AI).
Both Hive and expert artists do very well, but make mistakes in different ways (Hive is weaker against adversarial perturbations while Expert artists produce higher false positives).
We believe these weaknesses will remain as models continue to evolve, and use our data to demonstrate why a combined team of human and automated detectors provides the best combination of accuracy and robustness.
- <abstract>The rise of algorithmic pricing raises concerns of algorithmic collusion. We conduct experiments with algorithmic pricing agents based on Large Language Models (LLMs). We find that (1) LLM-based agents are adept at pricing tasks, (2) LLM-based pricing agents autonomously collude in oligopoly settings to the detriment of consumers, and (3) variation in seemingly innocuous phrases in LLM instructions (‘prompts’) may increase collusion. Novel off-path analysis techniques uncover price-war concerns as contributing to these phenomena. Our results extend to auction settings. Our findings uncover unique challenges to any future regulation of LLM-based pricing agents, and black-box pricing agents more broadly.</abstract>
The rise of algorithmic pricing raises concerns of algorithmic collusion.
We conduct experiments with algorithmic pricing agents based on Large Language Models (LLMs).
We find that (1) LLM-based agents are adept at pricing tasks, (2) LLM-based pricing agents autonomously collude in oligopoly settings to the detriment of consumers, and (3) variation in seemingly innocuous phrases in LLM instructions (‘prompts’) may increase collusion. Novel off-path analysis techniques uncover price-war concerns as contributing to these phenomena. Our results extend to auction settings.
Our findings uncover unique challenges to any future regulation of LLM-based pricing agents, and black-box pricing agents more broadly.
- <abstract>We explore the question: “How much prior art knowledge is needed to create art?” To investigate this, we propose a text-to-image generation model trained without access to art-related content. We then introduce a simple yet effective method to learn an art adapter using only a few examples of selected artistic styles. Our experiments show that art generated using our method is perceived by users as comparable to art produced by models trained on large, art-rich datasets. Finally, through data attribution techniques, we illustrate how examples from both artistic and non-artistic datasets contributed to the creation of new artistic styles.</abstract>
We explore the question: “How much prior art knowledge is needed to create art?”
To investigate this, we propose a text-to-image generation model trained without access to art-related content. We then introduce a simple yet effective method to learn an art adapter using only a few examples of selected artistic styles.
Our experiments show that art generated using our method is perceived by users as comparable to art produced by models trained on large, art-rich datasets.
Finally, through data attribution techniques, we illustrate how examples from both artistic and non-artistic datasets contributed to the creation of new artistic styles.
- <abstract>Intelligence and rationality both predict optimal decision making. However, whether cognitive rationality (CR) and general cognitive ability (CA) are identical or reflect fundamentally distinct processes is hotly debated. Here, we report a twin study aimed at distinguishing the cognitive mechanisms involved in CR and CA. CR and CA tests were administered to a large twin sample. Univariate analyses indicated that both CA and CR were strongly heritable. Multivariate modelling of CA scales and CR indicated that CR was accounted for by a latent g-factor, which itself was strongly heritable. We conclude that CR is not a distinct disposition from CA, but instead that the reflexive and reflective aspects of cognitive ability make making CR a robust and efficient test of general cognitive ability.</abstract>
Intelligence and rationality both predict optimal decision making. However, whether cognitive rationality (CR) and general cognitive ability (CA) are identical or reflect fundamentally distinct processes is hotly debated.
Here, we report a twin study aimed at distinguishing the cognitive mechanisms involved in CR and CA. CR and CA tests were administered to a large twin sample.
Univariate analyses indicated that both CA and CR were strongly heritable. Multivariate modelling of CA scales and CR indicated that CR was accounted for by a latent g-factor, which itself was strongly heritable.
We conclude that CR is not a distinct disposition from CA, but instead that the reflexive and reflective aspects of cognitive ability make CR a robust and efficient test of general cognitive ability.
- <abstract>We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing. We propose STORM, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. STORM models the pre-writing stage by (1) discovering diverse perspectives in researching the given topic, (2) simulating conversations where writers carrying different perspectives pose questions to a topic expert grounded on trusted Internet sources, (3) curating the collected information to create an outline. For evaluation, we curate FreshWiki, a dataset of recent high-quality Wikipedia articles, and formulate outline assessments to evaluate the pre-writing stage. We further gather feedback from experienced Wikipedia editors. Compared to articles generated by an outline-driven retrieval-augmented baseline, more of STORM's articles are deemed to be organized (by a 25% absolute increase) and broad in coverage (by 10%). The expert feedback also helps identify new challenges for generating grounded long articles, such as source bias transfer and over-association of unrelated facts.</abstract>
We study how to apply large language models to write grounded and organized long-form articles from scratch, with comparable breadth and depth to Wikipedia pages. This underexplored problem poses new challenges at the pre-writing stage, including how to research the topic and prepare an outline prior to writing.
We propose <strong>STORM</strong>, a writing system for the Synthesis of Topic Outlines through Retrieval and Multi-perspective Question Asking. STORM models the pre-writing stage by (1) discovering diverse perspectives in researching the given topic, (2) simulating conversations where writers carrying different perspectives pose questions to a topic expert grounded on trusted Internet sources, (3) curating the collected information to create an outline.
For evaluation, we curate <strong>FreshWiki</strong>, a dataset of recent high-quality Wikipedia articles, and formulate outline assessments to evaluate the pre-writing stage. We further gather feedback from experienced Wikipedia editors. Compared to articles generated by an outline-driven retrieval-augmented baseline, more of STORM’s articles are deemed to be organized (by a 25% absolute increase) and broad in coverage (by 10%).
The expert feedback also helps identify new challenges for generating grounded long articles, such as source bias transfer and over-association of unrelated facts.
- <abstract>This paper uses wealth shocks from winning lottery prizes to examine the causal effect of financial resources on fertility. We employ extensive panels of administrative data encompassing over 0.4 million lottery winners in Taiwan and implement a triple-differences design. Our analyses reveal that a substantial lottery win can increase fertility, the implied wealth elasticity of which is around 0.06. Moreover, the primary channel through which fertility increases is by prompting first births among previously childless individuals. Finally, our analysis reveals that ~25% of the total fertility effect stems from increased marriage rates following a lottery win.</abstract>
This paper uses wealth shocks from winning lottery prizes to examine the causal effect of financial resources on fertility.
We employ extensive panels of administrative data encompassing over 0.4 million lottery winners in Taiwan and implement a triple-differences design.
Our analyses reveal that a substantial lottery win can increase fertility, the implied wealth elasticity of which is around 0.06. Moreover, the primary channel through which fertility increases is by prompting first births among previously childless individuals. Finally, our analysis reveals that ~25% of the total fertility effect stems from increased marriage rates following a lottery win.
- <abstract>An expert with no inherent interest in an unknown binary state can exert effort to acquire a piece of falsifiable evidence informative of it. A designer can incentivize learning using a mechanism that provides state-dependent rewards within fixed bounds. We show that eliciting a single report maximizes information acquisition if the evidence is revealing or its content predictable. This conclusion fails when the evidence is sufficiently imprecise, the failure to find it is informative, and its contents could support either state. Our findings shed light on incentive design for consultation and forecasting by showing how learning dynamics qualitatively shape effort-maximizing contracts.</abstract>
An expert with no inherent interest in an unknown binary state can exert effort to acquire a piece of falsifiable evidence informative of it. A designer can incentivize learning using a mechanism that provides state-dependent rewards within fixed bounds.
We show that eliciting a single report maximizes information acquisition if the evidence is revealing or its content predictable. This conclusion fails when the evidence is sufficiently imprecise, the failure to find it is informative, and its contents could support either state.
Our findings shed light on incentive design for consultation and forecasting by showing how learning dynamics qualitatively shape effort-maximizing contracts.
- <abstract>In the developed world, the diagnosis of mental illness is widespread among young adults. This paper estimates the long-term causal effects of being diagnosed during young adulthood for those at the margin of diagnosis. We follow all Swedish men born between 1971 and 1983 matched to administrative panel data on health, labor market, and family outcomes to estimate the impact of a mental illness diagnosis on subsequent outcomes. Exploiting the random assignment of 18-year-old men to doctors, we find that, for people at the margin, a mental illness diagnosis increases the future likelihood of internal death, hospital admittance, being sick from work, and unemployment while also lowering expected income and the propensity to be married or have children. We find that diagnosis increases the use of psychiatric medication in the 36 months right after diagnosis. A possible interpretation of our results is that the amount and type of treatment used for marginal diagnosis may be inadequate, or inappropriate.</abstract>
In the developed world, the diagnosis of mental illness is widespread among young adults. This paper estimates the long-term causal effects of being diagnosed during young adulthood for those at the margin of diagnosis.
We follow all Swedish men born between 1971 and 1983 matched to administrative panel data on health, labor market, and family outcomes to estimate the impact of a mental illness diagnosis on subsequent outcomes. Exploiting the random assignment of 18-year-old men to doctors, we find that, for people at the margin,
a mental illness diagnosis increases the future likelihood of internal death, hospital admittance, being sick from work, and unemployment while also lowering expected income and the propensity to be married or have children. We find that diagnosis increases the use of psychiatric medication in the 36 months right after diagnosis.
A possible interpretation of our results is that the amount and type of treatment used for marginal diagnosis may be inadequate, or inappropriate.
- <abstract>World War II was arguably one of history’s largest shocks to the US economic and production system. In this paper, I argue that “managerial technology” played a key role in shaping US WWII production and its capacity to defeat some of the most advanced economies in the world. The large-scale diffusion of innovative management practices to US firms involved in war production acted as a technology that put them on a higher growth path for decades. Moreover, it made US managerial practices internationally distinctive and created the “American Way” of doing business—exported worldwide in the aftermath of the war.</abstract>
World War II was arguably one of history’s largest shocks to the US economic and production system.
In this paper, I argue that “managerial technology” played a key role in shaping US WWII production and its capacity to defeat some of the most advanced economies in the world. The large-scale diffusion of innovative management practices to US firms involved in war production acted as a technology that put them on a higher growth path for decades.
Moreover, it made US managerial practices internationally distinctive and created the “American Way” of doing business—exported worldwide in the aftermath of the war.
- <abstract>Many existing evaluation benchmarks for Large Language Models (LLMs) quickly become outdated due to the emergence of new models and training data. These benchmarks also fall short in assessing how LLM performance changes over time, as they consist of static questions without a temporal dimension. To address these limitations, we propose using future event prediction as a continuous evaluation method to assess LLMs' temporal generalization and forecasting abilities. Our benchmark, Daily Oracle, automatically generates question-answer (QA) pairs from daily news, challenging LLMs to predict "future" event outcomes. Our findings reveal that as pre-training data becomes outdated, LLM performance degrades over time. While Retrieval Augmented Generation (RAG) has the potential to enhance prediction accuracy, the performance degradation pattern persists, highlighting the need for continuous model updates.</abstract>
Many existing evaluation benchmarks for Large Language Models (LLMs) quickly become outdated due to the emergence of new models and training data. These benchmarks also fall short in assessing how LLM performance changes over time, as they consist of static questions without a temporal dimension.
To address these limitations, we propose using future event prediction as a continuous evaluation method to assess LLMs’ temporal generalization and forecasting abilities. Our benchmark, <strong>Daily Oracle</strong>, automatically generates question-answer (QA) pairs from daily news, challenging LLMs to predict “future” event outcomes.
Our findings reveal that as pre-training data becomes outdated, LLM performance degrades over time. While Retrieval Augmented Generation (RAG) has the potential to enhance prediction accuracy, the performance degradation pattern persists, highlighting the need for continuous model updates.
- <abstract>This is a reminiscence and short biographical sketch of the late philosopher and cognitive scientist Jerry Fodor. It includes a summary of his main proposals about the mind: his “Language of Thought” hypothesis; his rejection of analyticity and conceptual role semantics; his “mad dog nativism”; his proposal of mental modules and—by contrast—his skepticism about a computational theory of central cognition; his anti-reductionist, but still physicalist, views about psychology; and, lastly, his attacks on selectionism. I conclude with some discussion of his idiosyncratic style and of his esthetic and other interests. An appendix provides some memorable quotes.</abstract>
This is a reminiscence and short biographical sketch of the late philosopher and cognitive scientist Jerry Fodor.
It includes a summary of his main proposals about the mind: his “Language of Thought” hypothesis; his rejection of analyticity and conceptual role semantics; his “mad dog nativism”; his proposal of mental modules and—by contrast—his skepticism about a computational theory of central cognition; his anti-reductionist, but still physicalist, views about psychology; and, lastly, his attacks on selectionism.
I conclude with some discussion of his idiosyncratic style and of his esthetic and other interests.
An appendix provides some memorable quotes.
- <abstract>The integration of large language models (LLMs) into clinical diagnostics has the potential to transform doctor–patient interactions. However, the readiness of these models for real-world clinical application remains inadequately tested. This paper introduces the Conversational Reasoning Assessment Framework for Testing in Medicine (CRAFT-MD) approach for evaluating clinical LLMs. Unlike traditional methods that rely on structured medical examinations, CRAFT-MD focuses on natural dialogues, using simulated artificial intelligence agents to interact with LLMs in a controlled environment. We applied CRAFT-MD to assess the diagnostic capabilities of GPT-4, GPT-3.5, Mistral and LLaMA-2-7b across 12 medical specialties. Our experiments revealed critical insights into the limitations of current LLMs in terms of clinical conversational reasoning, history-taking and diagnostic accuracy. These limitations also persisted when analyzing multimodal conversational and visual assessment capabilities of GPT-4V. We propose a comprehensive set of recommendations for future evaluations of clinical LLMs based on our empirical findings. These recommendations emphasize realistic doctor–patient conversations, comprehensive history-taking, open-ended questioning and using a combination of automated and expert evaluations. The introduction of CRAFT-MD marks an advancement in testing of clinical LLMs, aiming to ensure that these models augment medical practice effectively and ethically.</abstract>
The integration of large language models (LLMs) into clinical diagnostics has the potential to transform doctor–patient interactions. However, the readiness of these models for real-world clinical application remains inadequately tested.
This paper introduces the <strong>Conversational Reasoning Assessment Framework for Testing in Medicine (CRAFT-MD)</strong> approach for evaluating clinical LLMs. Unlike traditional methods that rely on structured medical examinations, CRAFT-MD focuses on natural dialogues, using simulated artificial intelligence agents to interact with LLMs in a controlled environment.
We applied CRAFT-MD to assess the diagnostic capabilities of GPT-4, GPT-3.5, Mistral and LLaMA-2-7b across 12 medical specialties.
Our experiments revealed critical insights into the limitations of current LLMs in terms of clinical conversational reasoning, history-taking and diagnostic accuracy. These limitations also persisted when analyzing multimodal conversational and visual assessment capabilities of GPT-4V.
We propose a comprehensive set of recommendations for future evaluations of clinical LLMs based on our empirical findings. These recommendations emphasize realistic doctor–patient conversations, comprehensive history-taking, open-ended questioning and using a combination of automated and expert evaluations. The introduction of CRAFT-MD marks an advancement in testing of clinical LLMs, aiming to ensure that these models augment medical practice effectively and ethically.
- <abstract>We show that motion with as few as three degrees of freedom (for instance, a particle moving in a three-dimensional potential) can be equivalent to a Turing machine, and so be capable of universal computation. Such systems possess a type of unpredictability qualitatively stronger than that which has been previously discussed in the study of low-dimensional chaos. Even if the initial conditions are known exactly, virtually any question about their long-term dynamics is undecidable.</abstract>
We show that motion with as few as three degrees of freedom (for instance, a particle moving in a three-dimensional potential) can be equivalent to a Turing machine, and so be capable of universal computation.
Such systems possess a type of unpredictability qualitatively stronger than that which has been previously discussed in the study of low-dimensional chaos. Even if the initial conditions are known exactly, virtually any question about their long-term dynamics is undecidable.
- <abstract>We provide evidence that classic lottery anomalies like probability weighting and loss aversion  are not special phenomena of risk. They also arise (and often with equal strength) when subjects evaluate deterministic, positive monetary payments that have been disaggregated to resemble lotteries. Thus, we find, e.g., apparent probability weighting in settings without probabilities and loss aversion in settings without scope for loss. Across subjects, anomalies in these deterministic tasks strongly predicts the same anomalies in lotteries. These findings suggest that much of the behavior motivating our most important behavioral theories of risk derive from complexity-driven mistakes rather than true risk preferences.</abstract>
We provide evidence that classic lottery anomalies like probability weighting and loss aversion are not special phenomena of risk.
They also arise (and often with equal strength) when subjects evaluate deterministic, positive monetary payments that have been disaggregated to resemble lotteries. Thus, we find, eg. apparent probability weighting in settings without probabilities and loss aversion in settings without scope for loss.
Across subjects, anomalies in these deterministic tasks strongly predict the same anomalies in lotteries.
These findings suggest that much of the behavior motivating our most important behavioral theories of risk derive from complexity-driven mistakes rather than true risk preferences.
- <abstract>Derived from diffusion models, MangaNinja specializes in the task of reference-guided line art colorization. We incorporate two thoughtful designs to ensure precise character detail transcription, including a patch shuffling module to facilitate correspondence learning between the reference color image and the target line art, and a point-driven control scheme to enable fine-grained color matching. Experiments on a self-collected benchmark demonstrate the superiority of our model over current solutions in terms of precise colorization. We further showcase the potential of the proposed interactive point control in handling challenging cases, cross-character colorization, multi-reference harmonization, beyond the reach of existing algorithms.</abstract>
Derived from diffusion models, <strong>MangaNinja</strong> specializes in the task of reference-guided line art colorization.
We incorporate two thoughtful designs to ensure precise character detail transcription, including a patch shuffling module to facilitate correspondence learning between the reference color image and the target line art, and a point-driven control scheme to enable fine-grained color matching.
Experiments on a self-collected benchmark demonstrate the superiority of our model over current solutions in terms of precise colorization.
We further showcase the potential of the proposed interactive point control in handling challenging cases, cross-character colorization, multi-reference harmonization, beyond the reach of existing algorithms.
- <abstract>We consider a variant of the classical Secretary Problem. In this setting, the candidates are ranked according to some exchangeable random variable and the quest is to maximize the expected quality of the chosen aspirant. We find an upper bound for the optimal hiring rule, present examples showing it is sharp, and recover the classical case, among other results.</abstract>
We consider a variant of the classical Secretary Problem. In this setting, the candidates are ranked according to some exchangeable random variable and the quest is to maximize the expected quality of the chosen aspirant.
We find an upper bound for the optimal hiring rule, present examples showing it is sharp, and recover the classical case, among other results.
- <abstract>Glucagon-like peptide-1 (GLP-1) is involved in a range of central and peripheral pathways related to appetitive behavior. Hence, this study explored the effects of glucagon-like peptide-1 receptor agonists (GLP-1 RAs) on substance and behavioral addictions, including alcohol, caffeine, nicotine, cannabis, psychostimulants, compulsive shopping, and sex drive/libido. Data were collected from various social platforms. Keywords related to GLP-1 RAs and substance/behavioral addiction were used to extract relevant comments. The study employed a mixed-methods approach to analyze online discussions posted from December 2019 to June 2023 and collected using a specialized web application. Reddit entries were the focus here due to limited data from other platforms, such as TikTok and YouTube. A total of 5859 threads and related comments were extracted from six subreddits, which included threads about GLP-1 RAs drugs and associated brand names. To obtain relevant posts, keywords related to potential substance use and compulsive behavior were selected. Further analysis involved two main steps: (1) manually coding posts based on users’ references to the potential impact of GLP-1 RAs on substance use and non-substance habits, excluding irrelevant or unclear comments; (2) performing a thematic analysis on the dataset of keywords, using AI-assisted techniques followed by the manual revision of the generated themes. Second, a thematic analysis was performed on the keyword-related dataset, using AI-assisted techniques followed by the manual revision of the generated themes. In total, 29.75% of alcohol-related; 22.22% of caffeine-related; and 23.08% of nicotine-related comments clearly stated a cessation of the intake of these substances following the start of GLP-1 RAs prescription. Conversely, mixed results were found for cannabis intake, and only limited, anecdotal data were made available for cocaine, entactogens, and dissociative drugs’ misuse. Regarding behavioral addictions, 21.35% of comments reported a compulsive shopping interruption, whilst the sexual drive/libido elements reportedly increased in several users. The current mixed-methods approach appeared to be a useful tool in gaining insight into complex topics such as the effects of GLP-1 RAs on substance and non-substance addiction-related disorders; some GLP-1 RA-related mental health benefits could also be inferred from here. Overall, it appeared that GLP-1 RAs may show the potential to target both substance craving and maladaptive/addictive behaviors, although further empirical research is needed.</abstract>
Glucagon-like peptide-1 (GLP-1) is involved in a range of central and peripheral pathways related to appetitive behavior.
Hence, this study explored the effects of glucagon-like peptide-1 receptor agonists (GLP-1 RAs) on substance and behavioral addictions, including alcohol, caffeine, nicotine, cannabis, psychostimulants, compulsive shopping, and sex drive/libido.
Data were collected from various social platforms.
Keywords related to GLP-1 RAs and substance/behavioral addiction were used to extract relevant comments.
The study employed a mixed-methods approach to analyze online discussions posted from December 2019 to June 2023 and collected using a specialized web application.
Reddit entries were the focus here due to limited data from other platforms, such as TikTok and YouTube.
A total of 5,859 threads and related comments were extracted from 6 subreddits, which included threads about GLP-1 RAs drugs and associated brand names.
To obtain relevant posts, keywords related to potential substance use and compulsive behavior were selected.
Further analysis involved two main steps: (1) manually coding posts based on users’ references to the potential impact of GLP-1 RAs on substance use and non-substance habits, excluding irrelevant or unclear comments; (2) performing a thematic analysis on the dataset of keywords, using AI-assisted techniques followed by the manual revision of the generated themes.
In total, 29.75% of alcohol-related; 22.22% of caffeine-related; and 23.08% of nicotine-related comments clearly stated a cessation of the intake of these substances following the start of GLP-1 RAs prescription.
Conversely, mixed results were found for cannabis intake, and only limited, anecdotal data were made available for cocaine, entactogens, and dissociative drugs’ misuse.
Regarding behavioral addictions, 21.35% of comments reported a compulsive shopping interruption, whilst the sexual drive/libido elements reportedly increased in several users.
The current mixed-methods approach appeared to be a useful tool in gaining insight into complex topics such as the effects of GLP-1 RAs on substance and non-substance addiction-related disorders; some GLP-1 RA-related mental health benefits could also be inferred from here.
Overall, it appeared that GLP-1 RAs may show the potential to target both substance craving and maladaptive/addictive behaviors, although further empirical research is needed.
- <abstract>The Derelict: A series of accidents and errors amass into disaster at sea. But that's not the end of the story</abstract>
""
- <abstract>Despite its potential, issues such as maintaining visual consistency, ensuring stylistic coherence, and addressing ethical considerations continue to pose challenges. Furthermore, this paper discusses future directions and explores potential advancements in AI-assisted animation.</abstract>
""
- <abstract>We found that the probability consolidation functions mattered less than one might expect and that the method was capable of rejoining narratives in a natural way for a wide variety of differences between the two incoming texts.</abstract>
""
- <abstract>Although many organizations strive for radical or disruptive new ideas, many fall short of their goals. We propose that a primary reason for this failure is rooted in the individuals responsible for innovation: while they seek novel ideas, they prefer familiar ones. While prior research shows that individuals are biased against ideas with high objective novelty, it has overlooked the role of subjective novelty, i.e., the extent to which an idea is novel or unfamiliar to an individual idea evaluator. In this paper, we investigate how such subjective familiarity with an idea shapes idea evaluation in innovation. Drawing on research from psychology and marketing on the mere exposure effect, we argue that familiarity with an idea positively affects the evaluation’s outcome. We present two field studies and one laboratory study that support our hypothesis. This study contributes to the understanding of cognitive biases that affect innovation processes.</abstract>
Although many organizations strive for radical or disruptive new ideas, many fall short of their goals. We propose that a primary reason for this failure is rooted in the individuals responsible for innovation: while they seek novel ideas, they prefer familiar ones. While prior research shows that individuals are biased against ideas with high objective novelty, it has overlooked the role of subjective novelty, i.e., the extent to which an idea is novel or unfamiliar to an individual idea evaluator.
In this paper, we investigate how such subjective familiarity with an idea shapes idea evaluation in innovation. Drawing on research from psychology and marketing on the mere exposure effect, we argue that familiarity with an idea positively affects the evaluation’s outcome. We present two field studies and one laboratory study that support our hypothesis.
This study contributes to the understanding of cognitive biases that affect innovation processes.
- <abstract>Large language models have been proven quite beneficial for a variety of automatic speech recognition tasks in Google. We summarize results on Voice Search and a few YouTube speech transcription tasks to highlight the impact that one can expect from increasing both the amount of training data, and the size of the language model estimated from such data. Depending on the task, availability and amount of training data used, language model size and amount of work and care put into integrating them in the lattice rescoring step we observe reductions in word error rate 6%–10% relative, for systems on a wide range of operating points 17%–52% word error rate.</abstract>
Large language models have been proven quite beneficial for a variety of automatic speech recognition tasks in Google.
We summarize results on Voice Search and a few YouTube speech transcription tasks to highlight the impact that one can expect from increasing both the amount of training data, and the size of the language model estimated from such data.
Depending on the task, availability and amount of training data used, language model size and amount of work and care put into integrating them in the lattice rescoring step we observe reductions in word error rate 6%–10% relative, for systems on a wide range of operating points 17%–52% word error rate.
- <abstract>As Large Language Models (LLMs) are increasingly used for a variety of complex and critical tasks, it is vital to assess their logical capabilities in strategic environments. This paper examines their ability in strategic reasoning -- the process of choosing an optimal course of action by predicting and adapting to other agents' behavior. Using six LLMs, we analyze responses from play in classical games from behavioral economics (p-Beauty Contest, 11-20 Money Request Game, and Guessing Game) and evaluate their performance through hierarchical models of reasoning (level-k theory and cognitive hierarchy theory). Our findings reveal that while LLMs show understanding of the games, the majority struggle with higher-order strategic reasoning. Although most LLMs did demonstrate learning ability with games involving repeated interactions, they still consistently fall short of the reasoning levels demonstrated by typical behavior from human subjects. The exception to these overall findings is with OpenAI's GPT-o1 -- specifically trained to solve complex reasoning tasks -- which consistently outperforms other LLMs and human subjects. These findings highlight the challenges and pathways in advancing LLMs toward robust strategic reasoning from the perspective of behavioral economics.</abstract>
As Large Language Models (LLMs) are increasingly used for a variety of complex and critical tasks, it is vital to assess their logical capabilities in strategic environments. This paper examines their ability in strategic reasoning—the process of choosing an optimal course of action by predicting and adapting to other agents' behavior.
Using six LLMs, we analyze responses from play in classical games from behavioral economics (p-Beauty Contest, 11-20 Money Request Game, and Guessing Game) and evaluate their performance through hierarchical models of reasoning (level-k theory and cognitive hierarchy theory). Our findings reveal that while LLMs show understanding of the games, the majority struggle with higher-order strategic reasoning.
Although most LLMs did demonstrate learning ability with games involving repeated interactions, they still consistently fall short of the reasoning levels demonstrated by typical behavior from human subjects. The exception to these overall findings is with OpenAI's GPT-o1—specifically trained to solve complex reasoning tasks—which consistently outperforms other LLMs and human subjects.
These findings highlight the challenges and pathways in advancing LLMs toward robust strategic reasoning from the perspective of behavioral economics.
- <abstract>Anxiety, post-traumatic stress, and depression markedly increased worldwide during the COVID-19 pandemic. People with these conditions experience distressing intrusive thoughts, yet conventional therapies often urge them to avoid suppressing their thoughts because intrusions might rebound in intensity and frequency, worsening the disorders. In contrast, we hypothesized that training thought suppression would improve mental health. One hundred and twenty adults from 16 countries underwent 3 days of online training to suppress either fearful or neutral thoughts. No paradoxical increases in fears occurred. Instead, suppression reduced memory for suppressed fears and rendered them less vivid and anxiety provoking. After training, participants reported less anxiety, negative affect, and depression with the latter benefit persisting at 3 months. Participants high in trait anxiety and pandemic-related post-traumatic stress gained the largest and most durable mental health benefits. These findings challenge century-old wisdom that suppressing thoughts is maladaptive, offering an accessible approach to improving mental health.</abstract>
Anxiety, post-traumatic stress, and depression markedly increased worldwide during the COVID-19 pandemic. People with these conditions experience distressing intrusive thoughts, yet conventional therapies often urge them to avoid suppressing their thoughts because intrusions might rebound in intensity and frequency, worsening the disorders.
In contrast, we hypothesized that training thought suppression would improve mental health. One hundred and twenty adults from 16 countries underwent 3 days of online training to suppress either fearful or neutral thoughts.
No paradoxical increases in fears occurred. Instead, suppression reduced memory for suppressed fears and rendered them less vivid and anxiety provoking. After training, participants reported less anxiety, negative affect, and depression with the latter benefit persisting at 3 months. Participants high in trait anxiety and pandemic-related post-traumatic stress gained the largest and most durable mental health benefits.
These findings challenge century-old wisdom that suppressing thoughts is maladaptive, offering an accessible approach to improving mental health.
- <abstract>For decades, scholars in the social sciences and humanities have questioned the appropriateness and utility of prior review of their research by human subjects’ ethics committees. This essay seeks to organize thematically some of their published complaints and to serve as a brief restatement of the major critiques of ethics review. In particular, it argues that (1) ethics committees impose silly restrictions, (2) ethics review is a solution in search of a problem, (3) ethics committees lack expertise, (4) ethics committees apply inappropriate principles, (5) ethics review harms the innocent, and (6) better options exist.</abstract>
For decades, scholars in the social sciences and humanities have questioned the appropriateness and utility of prior review of their research by human subjects’ ethics committees.
This essay seeks to organize thematically some of their published complaints and to serve as a brief restatement of the major critiques of ethics review.
In particular, it argues that (1) ethics committees impose silly restrictions, (2) ethics review is a solution in search of a problem, (3) ethics committees lack expertise, (4) ethics committees apply inappropriate principles, (5) ethics review harms the innocent, and (6) better options exist.
- <abstract>The gravitational effects of a primordial black hole (PBH) passing through the human body are examined, with the goal of determining the minimum mass necessary to produce significant injury or death. Two effects are examined: the damage caused by a shock wave propagating outward from the black hole trajectory, and the dissociation of brain cells from tidal forces produced by the black hole on its passage through the human body. It is found that the former is the dominant effect, with a cutoff mass for serious injury or death of approximately M~PBH~>1.4×10^17^g. The number density of primordial black holes with a mass above this cutoff is far too small to produce any observable effects on the human population.</abstract>
The gravitational effects of a primordial black hole (PBH) passing through the human body are examined, with the goal of determining the minimum mass necessary to produce significant injury or death. Two effects are examined: the damage caused by a shock wave propagating outward from the black hole trajectory, and the dissociation of brain cells from tidal forces produced by the black hole on its passage through the human body.
It is found that the former is the dominant effect, with a cutoff mass for serious injury or death of approximately M~PBH~>1.4×10^17^g.
The number density of primordial black holes with a mass above this cutoff is far too small to produce any observable effects on the human population.
- <abstract>This article examines similar poetic conventions in Early Irish and Japanese nature poetry. The first section focuses on associations of the seasons, often used in both literatures to explore cycles of rulership, rituals both societal and personal, and phases in human experiences. The second section examines the use of dindṡenchas in Early Irish lyrics and a comparable device, the utamakura, in Japanese poetry. Dindṡenchas and utamakura add historical and literary depth to nature poetry</abstract>
This article examines similar poetic conventions in Early Irish and Japanese nature poetry.
The first section focuses on associations of the seasons, often used in both literatures to explore cycles of rulership, rituals both societal and personal, and phases in human experiences.
The second section examines the use of dindṡenchas in Early Irish lyrics and a comparable device, the utamakura, in Japanese poetry.
Dindṡenchas and utamakura add historical and literary depth to nature poetry
- <abstract>Transformer language models (LMs) exhibit behaviors -- from storytelling to code generation -- that appear to require tracking the unobserved state of an evolving world. How do they do so? We study state tracking in LMs trained or fine-tuned to compose permutations (i.e., to compute the order of a set of objects after a sequence of swaps). Despite the simple algebraic structure of this problem, many other tasks (e.g., simulation of finite automata and evaluation of boolean expressions) can be reduced to permutation composition, making it a natural model for state tracking in general. We show that LMs consistently learn one of two state tracking mechanisms for this task. The first closely resembles the "associative scan" construction used in recent theoretical work by Liu et al. (2023) and Merrill et al. (2024). The second uses an easy-to-compute feature (permutation parity) to partially prune the space of outputs, then refines this with an associative scan. The two mechanisms exhibit markedly different robustness properties, and we show how to steer LMs toward one or the other with intermediate training tasks that encourage or suppress the heuristics. Our results demonstrate that transformer LMs, whether pretrained or fine-tuned, can learn to implement efficient and interpretable state tracking mechanisms, and the emergence of these mechanisms can be predicted and controlled.</abstract>
Transformer language models (LMs) exhibit behaviors -- from storytelling to code generation -- that appear to require tracking the unobserved state of an evolving world. How do they do so?
We study state tracking in LMs trained or fine-tuned to compose permutations (i.e., to compute the order of a set of objects after a sequence of swaps). Despite the simple algebraic structure of this problem, many other tasks (e.g., simulation of finite automata and evaluation of boolean expressions) can be reduced to permutation composition, making it a natural model for state tracking in general.
We show that LMs consistently learn one of two state tracking mechanisms for this task. The first closely resembles the "associative scan" construction used in recent theoretical work by Liu et al. (2023) and Merrill et al. (2024). The second uses an easy-to-compute feature (permutation parity) to partially prune the space of outputs, then refines this with an associative scan. The two mechanisms exhibit markedly different robustness properties, and we show how to steer LMs toward one or the other with intermediate training tasks that encourage or suppress the heuristics.
Our results demonstrate that transformer LMs, whether pretrained or fine-tuned, can learn to implement efficient and interpretable state tracking mechanisms, and the emergence of these mechanisms can be predicted and controlled.
- <abstract>This article uses material which has recently been made available from Russian archives to analyse the causes of repressed inflation in the Soviet consumer market. It finds that retail price subsidies, which increased as a proportion of state budget expenditure from 4 per cent in 1965 to 20 per cent in the late 1980s, intensified consumer market disequilibrium. The provision of these subsidies had negative effects on the market by maintaining the purchasing power of households for consumer goods and by increasing the budget deficit. The unauthorized purchase of consumer goods by enterprises tended to increase during these years also.</abstract>
This article uses material which has recently been made available from Russian archives to analyse the causes of repressed inflation in the Soviet consumer market.
It finds that retail price subsidies, which increased as a proportion of state budget expenditure from 4 per cent in 1965 to 20 per cent in the late 1980s, intensified consumer market disequilibrium.
The provision of these subsidies had negative effects on the market by maintaining the purchasing power of households for consumer goods and by increasing the budget deficit. The unauthorized purchase of consumer goods by enterprises tended to increase during these years also.
- <abstract>This paper analyzes the role and representation of food in the Japanese manga and anime series Golden Kamuy (Gōruden Kamui, 2014-present) by Satoru Noda. While ostensibly an action-packed narrative about the hunt for indigenous Ainu gold, Golden Kamuy has earned a reputation among fans as a gurume , or gourmet, manga and anime. Cooking scenes of both Ainu and Wajin food feature prominently throughout the series. This paper analyzes the rhetorical role of food in the fictional narrative and its connection to official and unofficial ideologies about ethnic harmony between Ainu and Wajin communities in Japan.</abstract>
This paper analyzes the role and representation of food in the Japanese manga and anime series Golden Kamuy (Gōruden Kamui, 2014-present) by Satoru Noda.
While ostensibly an action-packed narrative about the hunt for indigenous Ainu gold, Golden Kamuy has earned a reputation among fans as a gurume , or gourmet, manga and anime. Cooking scenes of both Ainu and Wajin food feature prominently throughout the series.
This paper analyzes the rhetorical role of food in the fictional narrative and its connection to official and unofficial ideologies about ethnic harmony between Ainu and Wajin communities in Japan.
- <abstract>I defend the extremist position that the fundamental ontology of the world consists of a vector in Hilbert space evolving according to the Schrödinger equation. The laws of physics are determined solely by the energy eigenspectrum of the Hamiltonian. The structure of our observed world, including space and fields living within it, should arise as a higher-level emergent description. I sketch how this might come about, although much work remains to be done.</abstract>
I defend the extremist position that the fundamental ontology of the world consists of a vector in Hilbert space evolving according to the Schrödinger equation.
The laws of physics are determined solely by the energy eigenspectrum of the Hamiltonian. The structure of our observed world, including space and fields living within it, should arise as a higher-level emergent description.
I sketch how this might come about, although much work remains to be done.
- <abstract>We propose “Insect-Computer Hybrid Speaker”, which enables us to make musics made from combinations of computer and insects. Lots of studies have proposed methods and interfaces for controlling insects and obtaining feedback. However, there have been less research on the use of insects for interaction with third parties. In this paper, we propose a method in which cicadas are used as speakers triggered by using Electrical Muscle Stimulation (EMS). We explored and investigated the suitable waveform of chirp to be controlled, the appropriate voltage range, and the maximum pitch at which cicadas can chirp.</abstract>
We propose <strong>Insect-Computer Hybrid Speaker</strong>, which enables us to make musics made from combinations of computer and insects. Lots of studies have proposed methods and interfaces for controlling insects and obtaining feedback. However, there have been less research on the use of insects for interaction with third parties.
In this paper, we propose a method in which cicadas are used as speakers triggered by using Electrical Muscle Stimulation (EMS).
We explored and investigated the suitable waveform of chirp to be controlled, the appropriate voltage range, and the maximum pitch at which cicadas can chirp.
- <abstract>Artists face choices between the pecuniary benefits of selling to the market and the non-pecuniary benefits of creating to please their own tastes. We examine how changes in wages, lump-sum income, and capital-labor ratios affect the artist’s pursuit of self-satisfaction versus market sales. Using our model of labor supply, we consider the economic forces behind the high/low culture split, why some artistic media offer greater scope for the avant-garde than others, why so many artists dislike the market, and how economic growth and taxation affect the quantity and form of different kinds of art.</abstract>
Artists face choices between the pecuniary benefits of selling to the market and the non-pecuniary benefits of creating to please their own tastes.
We examine how changes in wages, lump-sum income, and capital-labor ratios affect the artist’s pursuit of self-satisfaction versus market sales.
Using our model of labor supply, we consider the economic forces behind the high/low culture split, why some artistic media offer greater scope for the avant-garde than others, why so many artists dislike the market, and how economic growth and taxation affect the quantity and form of different kinds of art.

[End of examples. Reminder: your primary task is to split into multiple logical paragraphs by topic.]

- <abstract>{target}</abstract>
"""}
  ]
)

print(completion.choices[0].message.content)
