Introduction
Behavioural science struggles to be cumulative in part because scientists in many fields fail to agree on core constructs (Bainbridge et al., 2022; Sharp et al., 2023). The literature silos that consequently develop can appear unconnected but pursue the same phenomena under different labels (see, e.g., grit and conscientiousness; Credé et al., 2017).
One reason why connections are lacking is the asymmetry inherent in measure and construct validation: adding novel constructs to the pile is easier than sorting through it. Investigators can easily invent a new ad-hoc measure and benefit reputationally if a new construct becomes associated with their name (Elson et al., 2023; Flake & Fried, 2020). By contrast, finding out whether a purported new construct or measure is redundant with the thousands of existing ones is cumbersome and can cause conflict with other researchers (Bainbridge et al., 2022; Elson et al., 2023). The same holds for replicating construct validation studies and reporting evidence of overfitting or other problems (Hussey et al., 2024; Kopalle & Lehmann, 1997).
Untangling the "nomological net", a term coined by Cronbach and Meehl (1955) to describe the relationships between measures and constructs, has become increasingly difficult given the growing number of published measures (Anvari et al., 2024; Elson et al., 2023). Conventional construct validation methods, though effective in mapping these relationships, do not scale to, for instance, the thousands of measures that might be related to neuroticism. To tackle this problem, Condon and Revelle (2015; see also Condon, 2017; Condon et al., 2017) have championed the Synthetic Aperture Personality Assessment (SAPA), in which survey participants respond to a small random selection of items from a large pool drawn from the personality literature. Over time, as the sample size grows, this procedure allows estimating pairwise correlations between all items. Although the approach is efficient, each new item requires thousands of participants to answer the survey before it can be correlated with all existing items. Hence, the approach cannot be used to quickly evaluate newly proposed scales.
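The planned-missingness logic behind this design can be illustrated with a short simulation: each participant answers only a small random subset of items, and the correlation for each item pair is then estimated from whichever participants happened to answer both items. This is a simplified sketch, not the actual SAPA procedure; all sample sizes and item counts below are invented for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical setup: 12 items, 2000 participants, each answering
# only 4 randomly chosen items (planned missingness).
n_items, n_participants, items_per_person = 12, 2000, 4

# Simulate a shared latent trait plus item-specific noise so that
# all items correlate with one another.
trait = rng.normal(size=n_participants)
full = trait[:, None] + rng.normal(size=(n_participants, n_items))

# Mask all but a random subset of items per participant.
data = np.full((n_participants, n_items), np.nan)
for p in range(n_participants):
    chosen = rng.choice(n_items, size=items_per_person, replace=False)
    data[p, chosen] = full[p, chosen]

df = pd.DataFrame(data, columns=[f"item_{i}" for i in range(n_items)])

# Pairwise-complete correlations: each cell of the matrix uses only
# the participants who answered both items of that pair.
corr = df.corr(method="pearson")
print(corr.round(2))
```

With 2000 simulated participants, each pair of items is answered jointly by only a few hundred people, yet the full 12 x 12 correlation matrix can be estimated; this also makes concrete why a newly added item needs many more respondents before its correlations with all existing items stabilise.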
What is missing is an efficient way to prioritise, to prune the growth in constructs and measures, and to sort through the disorganised pile of existing measures.
Natural language processing could provide this efficiency. In the social and behavioural sciences, subjective self-reports are one of the predominant forms of measurement. The textual nature of survey items lends itself to natural language processing.
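To make the idea concrete, the following toy sketch maps item texts to vectors and compares them by cosine similarity, so that items with overlapping wording score as more similar. The bag-of-words "encoder" and all item texts here are invented stand-ins for illustration; a transformer encoder would instead produce dense, context-sensitive vectors.

```python
import numpy as np

def embed(text: str, vocab: list[str]) -> np.ndarray:
    """Toy stand-in for a text encoder: a bag-of-words count vector."""
    tokens = text.lower().split()
    return np.array([tokens.count(w) for w in vocab], dtype=float)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical survey items: two persistence-themed, one unrelated.
items = [
    "I keep working even when tasks are hard",
    "I persist at difficult tasks",
    "I enjoy going to parties",
]
vocab = sorted({w for t in items for w in t.lower().split()})
vecs = [embed(t, vocab) for t in items]

print(cosine(vecs[0], vecs[1]))  # higher: overlapping wording
print(cosine(vecs[0], vecs[2]))  # lower: unrelated item
```

Even this crude representation ranks the two persistence items as more similar to each other than to the unrelated one, hinting at how vector comparisons could flag potentially redundant measures at scale.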
Recently, transformer models have become the state of the art in language modelling (Vaswani et al., 2017), displaying proficiency in understanding and generating text. They have dramatically reduced the costs of many tasks and chores, notably programming and generating images from verbal prompts. Although their capabilities for natural language generation are currently more visible to the public through chat-like interfaces, these are backed by capabilities in natural language understanding (e.g., classifying text or extracting features from it).
On a technical level, this understanding is implemented by the so-called encoder block, which processes input text and outputs a vector representation. The representation of a word like “party” in a semantic vector space is context-dependent. The same word will yield a different vector representation if it occurs in the statement “I am the life of the party”