Error report for Joel et al. (2017)

Abstract

Joel, Eastwick, & Finkel (2017), "Is Romantic Desire Predictable? Machine Learning Applied to Initial Romantic Attraction", was determined to contain Minor Errors that do not affect the core conclusions of the manuscript: errors that were detectable only thanks to the presence and sharing of research materials, but whose scope and implications are minor. The detected errors do not rise to the level where I would recommend that a correction be issued. The article used machine learning (random forests) to predict romantic desire from participant traits and preferences, finding that actor and partner desire could be modestly predicted before people met, but relationship-specific desire could not. The study has been influential, with over 200 citations on Google Scholar, and addresses questions of broad public and commercial interest about the predictability of romantic attraction.

The reviewer, Dr Florian Pargent, conducted a thorough re-execution reproduction and identified several minor issues: small transcription errors (e.g., reporting 38 instead of 40 predictors, and 1.30% instead of 1.34% variance explained), undocumented handling of missing values, and methodological choices (e.g., using out-of-bag estimates without nested resampling, and not accounting for dependency among observations) that would be handled differently under current best practices. Critically, none of these issues affects the core conclusions of the article.

A key lesson from this review is that the authors' decision to validate their models on an independent sample acted as a methodological safety net. The reviewer raised concerns about optimistic performance estimates from the within-sample analyses (Table 2): had these been the only numbers reported, the criticism that out-of-bag estimates without nested resampling overestimate performance might have changed readers' conclusions. However, the more conservative training-testing analyses on independent samples (Table 3) effectively addressed this concern. Including a straightforward replication in the study reined in some of the shortcomings of the random forest analyses: potential biases from variable selection on the full data, and from ignoring dependencies among observations, did not ultimately compromise the paper's conclusions about real-world predictive performance.
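To make the methodological point concrete, the following is a minimal sketch in Python with scikit-learn, on synthetic data; the setup and variable names are illustrative assumptions, not the authors' actual pipeline. It shows why an out-of-bag estimate computed after selecting predictors on the full data tends to be optimistic, and why evaluation on an independent sample guards against this.

```python
# Minimal sketch (assumptions: synthetic noise data, scikit-learn; this
# illustrates the general pitfall, not Joel et al.'s actual pipeline).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p = 100, 1000
X = rng.normal(size=(n, p))  # many candidate predictors, all pure noise
y = rng.normal(size=n)       # outcome unrelated to every predictor

# Pitfall: select the "best" predictors on the FULL data, then trust the
# out-of-bag (OOB) estimate of a forest fit to those predictors. The
# selection step leaks information about every observation, so the OOB
# R^2 is typically well above zero even though there is no signal at all.
sel_full = SelectKBest(f_regression, k=10).fit(X, y)
rf_full = RandomForestRegressor(oob_score=True, random_state=0)
rf_full.fit(sel_full.transform(X), y)
print(f"OOB R^2 after full-data selection: {rf_full.oob_score_:.3f}")

# Safety net: repeat selection and fitting on a training half only, then
# evaluate on an untouched independent sample, analogous in spirit to the
# paper's Table 3 training-testing analyses. The estimate drops to roughly
# zero (or below), as it should for pure noise.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)
sel_tr = SelectKBest(f_regression, k=10).fit(X_tr, y_tr)
rf_tr = RandomForestRegressor(random_state=0).fit(sel_tr.transform(X_tr), y_tr)
pred = rf_tr.predict(sel_tr.transform(X_te))
print(f"Independent-sample R^2: {r2_score(y_te, pred):.3f}")
```

On the dependency concern: in a speed-dating design, ratings from the same participant are not independent, so a grouped resampling scheme (e.g., scikit-learn's GroupKFold, keeping all of a participant's ratings in the same fold) would be one way to account for this under current best practices.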
