Inference of gene regulatory networks for overcoming low performance in real-world data

Abstract

The identification of gene regulatory networks is important for understanding the mechanisms of various biological phenomena. Many methods have been proposed to infer networks from time-series gene expression data obtained by high-throughput next-generation sequencing. Such methods can effectively infer gene regulatory networks from in silico data, but inferring the networks accurately from in vivo data remains a challenge because of high noise and low time-sampling rates. Here, we propose a novel unsupervised learning method, Multi-view attention Long-short term memory for Network inference (MaLoN). It can infer gene regulatory networks with temporal changes in gene regulation using a multi-view attention long short-term memory (LSTM) model. Using in vivo benchmark datasets from Saccharomyces cerevisiae and Escherichia coli, we showed that MaLoN infers gene regulatory networks more accurately than existing methods. Ablated models indicated that the multi-view attention mechanism suppresses false positives. The order of activation of gene regulations inferred by MaLoN was consistent with existing knowledge.

Article activity feed

  1. I'm Yusuke Hiki, first author of the manuscript. Thank you for reading it so carefully and for your review comments.

    Yes, you are right. As illustrated by the input to the Encoder in Fig. 1, the matrix that masks all but the expression of gene j is denoted x_{-i,j}(t), which is a different notation from x_{j}(t).
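
    A minimal sketch of this masking, assuming an expression matrix of shape (time steps, N genes); the variable names are illustrative. Only gene j's column is retained, so the result is matrix-shaped and differs from the scalar series x_{j}(t):

    ```python
    # Sketch only: keep gene j's expression over time, zero out all others.
    import numpy as np

    def mask_all_but(X: np.ndarray, j: int) -> np.ndarray:
        """Zero out every gene's expression except gene j's."""
        X_masked = np.zeros_like(X)
        X_masked[:, j] = X[:, j]
        return X_masked

    X = np.array([[0.8, 1.2, 0.1],   # expression of 3 genes at t = 0
                  [0.9, 1.0, 0.3],   # t = 1
                  [1.1, 0.7, 0.6]])  # t = 2
    print(mask_all_but(X, j=1))      # only the middle column survives
    ```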

  2. Thank you for your interesting comments. As you say, we used LSTM because we needed to preserve temporal order to represent the effects of real gene regulation. We agree that models such as the Transformer, which can predict t0 from t10 or t20, may be able to infer temporal causality. On the other hand, a major advantage of our method MaLoN is that the inferred attention maps can be interpreted to show at which time (time-view) and between which genes (gene-view) regulation is active. Inference with models such as the Transformer makes it possible to interpret the gene-view but not the time-view. Therefore, we believe that while the Transformer can be applied to network inference from time-series data, the best solution is to use LSTM.
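
    A minimal sketch of how such view-specific attention maps can be exposed. This is a hypothetical illustration, not the published MaLoN implementation; all layer names and shapes are assumptions:

    ```python
    # Hypothetical sketch: an LSTM encoder with two attention heads whose
    # softmax weights can be read as a "time-view" map (when regulation is
    # active) and a "gene-view" map (which regulators matter).
    import torch
    import torch.nn as nn

    class MultiViewAttention(nn.Module):
        def __init__(self, n_genes: int, hidden: int = 64):
            super().__init__()
            self.lstm = nn.LSTM(n_genes, hidden, batch_first=True)
            self.time_score = nn.Linear(hidden, 1)        # one score per time step
            self.gene_score = nn.Linear(hidden, n_genes)  # one score per regulator

        def forward(self, x):                  # x: (batch, time, n_genes)
            h, _ = self.lstm(x)                # h: (batch, time, hidden)
            time_view = torch.softmax(self.time_score(h), dim=1)  # over time
            gene_view = torch.softmax(self.gene_score(h), dim=2)  # over genes
            context = (time_view * h).sum(dim=1)   # attention-weighted summary
            return context, time_view.squeeze(-1), gene_view

    x = torch.randn(2, 10, 5)                 # 2 series, 10 time steps, 5 genes
    _, time_view, gene_view = MultiViewAttention(n_genes=5)(x)
    print(time_view.shape, gene_view.shape)   # (2, 10) and (2, 10, 5)
    ```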

  3. I'm Yusuke Hiki, first author of the manuscript. Thank you for your review comments.

    Your interpretation is correct. Changes in gene expression are generally caused by gene regulation with a nonlinear response, such as that described by the Hill equation. Therefore, it is necessary to perform an accurate regression of such nonlinear responses, and methods based on linear regression are not sufficient to correctly detect the causality.
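
    For reference, a standard form of the Hill equation for an activating regulator with expression x (the symbols β, K, and n are conventional, not taken from the manuscript):

    ```latex
    % Hill response: \beta is the maximal production rate, K the
    % half-activation threshold, and n the Hill coefficient that sets how
    % switch-like (nonlinear) the response is. A repressor analogously uses
    % K^n / (K^n + x^n).
    f(x) = \frac{\beta\, x^{n}}{K^{n} + x^{n}}
    ```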

  4. This cannot be achieved with architectures such as Transformer [25],

    First let me say that I really like your preprint. It takes a good understanding of the basic biology and leverages the best thing about neural network based models: the ability to customize your modeling architecture to take advantage of expert knowledge about a particular inferential problem. In this case you have chosen LSTM because you know that, typically, the influence of one gene on another occurs in temporal order. As you suggest in this sentence, that ordering isn't something that a Transformer architecture takes advantage of, and explicitly so. However, I would suggest that a Transformer architecture would be extremely useful in the analysis of time-varying expression data, for the following reason. While you are looking for 1:1 connections (networks) here, and analyzing your data in temporal order makes sense for that task, the genes, their expression, and their protein expression are all parts of a dynamical system. And, following a discrete experimental change, you are likely to trigger a long-running cascade that could cause long-range downstream effects on expression. As a result, if you are going to predict gene expression at time t0, it seems entirely reasonable that information at time t10 or t20 may be relevant. Do you think this is so? Could Transformers be usefully applied to expression data of this variety?

  5. and then the regression of target gene expression cannot be sufficiently learned

    I find this clause a little hard to parse. I presume it is suggesting that the regression is a linear regression, and that when you try to fit a linear regression to data that are determined by a nonlinear system, you can't accurately fit the data. Is that right?
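
    A toy sketch of that point, with illustrative parameters (none taken from the paper): the best straight-line fit to a switch-like Hill response still leaves large systematic residuals, so a linear model cannot capture the regulatory nonlinearity.

    ```python
    # Toy illustration: fit a line to a Hill-type (sigmoidal) response.
    import numpy as np

    x = np.linspace(0.0, 4.0, 50)          # regulator expression levels
    y = x**4 / (1.0 + x**4)                # Hill response (K = 1, n = 4)

    slope, intercept = np.polyfit(x, y, 1) # best possible straight-line fit
    residuals = y - (slope * x + intercept)
    print(f"max |residual| = {np.abs(residuals).max():.3f}")  # systematic misfit
    ```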

  6. Precisions

    In this case, does the s denote tuning of how results are classified when calling true and false positives and negatives? Or weight terms in the F-measure score itself?
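
    For reference, the standard unweighted definitions behind this question (whether the manuscript applies extra weights is exactly what is being asked):

    ```latex
    % TP, FP, FN: counts of true positives, false positives, false negatives.
    \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad
    \mathrm{Recall}    = \frac{TP}{TP + FN}, \qquad
    F_{1} = 2 \cdot \frac{\mathrm{Precision} \cdot \mathrm{Recall}}
                         {\mathrm{Precision} + \mathrm{Recall}}
    ```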

  7. x−i,j (t), which is the expression of gene j (the expression of all other genes is masked)

    Is this a vector with only a value at position j? So a vector of size N where only position j has a value set, hence being different from x_{j}(t)?