Real-time, low-latency closed-loop feedback using markerless posture tracking
This article has been reviewed by the following groups:
- Evaluated articles (eLife)
Abstract
The ability to control a behavioral task or stimulate neural activity based on animal behavior in real time is an important tool for experimental neuroscientists. Ideally, such tools are noninvasive, low-latency, and provide interfaces to trigger external hardware based on posture. Recent advances in pose estimation with deep learning allow researchers to train deep neural networks to accurately quantify a wide variety of animal behaviors. Here we provide a new DeepLabCut-Live! package that achieves low-latency real-time pose estimation (within 15 ms, >100 FPS), with an additional forward-prediction module that achieves zero-latency feedback, and a dynamic-cropping mode that allows for higher inference speeds. We also provide three options for using this tool with ease: (1) a stand-alone GUI (called DLC-Live! GUI), and integration into (2) Bonsai and (3) AutoPilot. Lastly, we benchmarked performance on a wide range of systems so that experimentalists can easily decide what hardware is required for their needs.
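For orientation, the package's inference loop can be driven from a few lines of Python. Below is a minimal sketch; the model path and camera index are placeholders, and the `DLCLive`/`Processor` calls follow the package's documented interface:

```python
import cv2
from dlclive import DLCLive, Processor

# Path to an exported DLC model directory (placeholder).
MODEL_PATH = "/path/to/exported/dlc_model"

proc = Processor()  # base Processor; subclass it to trigger external hardware
dlc = DLCLive(MODEL_PATH, processor=proc)

cap = cv2.VideoCapture(0)      # any OpenCV-compatible camera
ret, frame = cap.read()
dlc.init_inference(frame)      # first call loads and initializes the network

for _ in range(1000):          # fixed number of frames for this sketch
    ret, frame = cap.read()
    if not ret:
        break
    pose = dlc.get_pose(frame)  # array of shape (n_keypoints, 3): x, y, likelihood
    # ... act on `pose` here (e.g., pass to closed-loop logic) ...

cap.release()
```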
Article activity feed
### Reviewer #3
The authors provide a clear and effective response to the demand for robust real-time pose estimation software with closed-loop feedback capabilities. In addition, we appreciate the effort that the authors have put into making the software user-friendly and extensible. The paper is very well written and provides many tools that those in the field can use effectively.
A small weakness is that the authors demonstrate the LED flash latency but do not show an application such as optogenetic stimulation or behavioural manipulation using the system. Also, most of their benchmark numbers are based on videos rather than camera streams, which does not fully address potential hardware issues. I believe this heavy dependence on video data, rather than a genuine live video feed, should be checked so that accurate numbers are presented.
Their Kalman filter approach seems useful, but the deviations of the forward-predicted poses from the standard pose estimates are sometimes 30 px or more, so users may face a trade-off between latency and accuracy when using this software. Another important factor for real-time tracking is the accuracy of the pose estimation itself, which determines whether the system is truly useful in practical applications.
It would be nice to see a bit more validation of the software in a realistic live-stream context. The quality of their code is quite high.
The authors emphasize that their software enables "low-latency real-time pose estimation (within 15 ms, at >100 FPS)". Upon inspection of Table 2, it appears that this range of latency and speed combinations is primarily achieved using 176×137 px images on GPU-based Windows/Linux hardware, with the corresponding FPS dropping to well below 100 for larger images in the DLCLive benchmarking tool on all platforms except Windows. As the range of frame-rate/latency combinations appears to vary quite a bit between setups and frame sizes, we would suggest including a more realistic range for the latency and frame rate in the abstract, or at least mentioning the heavily down-sampled video used.
In Table 2, the mean and SD of the latency appear to be stable across modes, frame sizes, and GPU setups. However, there is a notable spike in the latency spread (14 ± 73) for the image-acquisition-to-LED time on Windows computers that stands out from the other latency figures. This variability is concerning for the consistency of real-time feedback applications on a platform and at a frame size that are likely to be commonly used. Would the authors be able to explain a possible reason for this large SD?
The DLG values appear to have been benchmarked using an existing video as opposed to a live camera feed. It is conceivable that a live camera feed would experience different kinds of hardware-based bottlenecks that are not present when streaming in a video (e.g., USB2 vs. USB3 vs. ethernet vs. wireless). Although this point is partially addressed with the demonstration of real-time feedback based on posture later in the manuscript, a replication of the DLG benchmark with a live stream from a camera at 100 FPS would be helpful to demonstrate frame rates and latency given the hardware bottlenecks introduced by cameras.
In Figure 3, the measurement of the latency from frame to LED is not very clear. DLC will always return a pose estimate even when the tongue does not appear in the image, so the LED will always turn on very quickly after the pose is obtained from the image.
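One way to make the intended trigger condition explicit (a hypothetical sketch, not the authors' implementation; the threshold, keypoint index, and ROI are all assumed values) is to gate the LED on the keypoint's likelihood score and position, so it fires only when the tongue is confidently detected:

```python
LIK_THRESHOLD = 0.9   # assumed confidence cutoff; tune per network
TONGUE_IDX = 0        # index of the tongue keypoint (placeholder)

def should_trigger(pose, roi=(100, 200, 100, 200)):
    """Fire only if the tongue keypoint is confidently detected inside an ROI.

    pose: array of shape (n_keypoints, 3) with columns (x, y, likelihood).
    roi:  (x_min, x_max, y_min, y_max) bounds in pixels (placeholder values).
    """
    x, y, lik = pose[TONGUE_IDX]
    x_min, x_max, y_min, y_max = roi
    return lik >= LIK_THRESHOLD and x_min <= x <= x_max and y_min <= y <= y_max
```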
In "Real-time feedback based on posture", the Kalman filter approach to reduce latency through forward prediction is innovative and likely of use for rapid characterization of general behaviours. In Figure 8C, the deviation of pose predictions from non-forward predicted poses appears to follow the general trend of the trajectory but appears to deviate by as many as 50 pixels from the non-forward predicted poses. While this tolerance may be acceptable for general pose estimation, many closed-loop pose estimation implementations may focus on rapid and accurate feedback based on very small movements (e.g. small muscular movements). For example, movements differing in magnitude by a few pixels may distinguish spontaneous twitches from conditioned behaviours. Considering that the demonstrated setup achieves a mean image to LED latency of 82 ms without the Kalman filter, it appears that many users would have to make a large trade-off between accuracy and latency in order to use the system with a conventional webcam and reasonably priced setup. Although the methods discussed are state-of-the-art and impressive considering the hardware used, it may be helpful to include a discussion of how the Kalman filter approach may be improved in the future to improve pose estimation accuracy while maintaining low latency.
The software compares favourably to existing real-time tracking software in terms of latency (refs 12-14). The efficacy of the existing real-time pose estimation software has been validated on animal movements using closed-loop conditioning paradigms. If feasible, a demonstration of the software reinforcing an animal based on real-time pose estimation (e.g., a paradigm similar to that used in the DLG benchmark video) would provide useful context as to whether the pose estimation strategies discussed are effective in closed-loop experiments. In particular, this would be important to evaluate given the novel Kalman filter approach, which influences the accuracy of pose estimation. We list this closed-loop experiment as optional given the pandemic conditions we face. In contrast to the live-animal reinforcement experiment, we do feel that real-world streaming-video-to-output-trigger latencies are required (point 3).
### Reviewer #2
Kane et al. introduce a new set of software tools for implementing real-time, marker-less pose tracking. The manuscript describes these tools, presents a series of benchmarks and demonstrates their use in several experimental settings, which include deploying very low-latency closed-loop events triggered on pose detection. The software core is based on DeepLabCut (DLC), previously developed by the senior authors. The first key development presented is a new python package – DeepLabCut-Live! – which optimises pose inference to increase its speed, a key step for real-time application of DLC. The authors then present a new method for exporting trained DLC networks in a language-independent format and demonstrate how these can be used in three different environments to deploy experiments. Importantly, in addition to developing their own GUI, the authors have developed plugins for Bonsai and AutoPilot, two software packages already widely used by the systems neuroscience community to run experiments.
The tools presented here are truly excellent and very exciting. In my view DLC has already started a revolution in the quantification of animal behaviour experiments, and DeepLabCut-Live! is exactly what the community has been hoping for – a way to deploy the power of DLC in real time to perform closed-loop experiments. I have very little doubt that the tools described in this manuscript and their future versions will become a mainstay of systems neuroscience very quickly and for years to come. Key to this is that the software is entirely Open Access and easy to deploy with inexpensive hardware. I commend, and as a DLC user I certainly thank, the authors for their efforts. I have a couple of comments below on the manuscript itself, which the authors might want to consider. As for the software itself, all of the benchmarks look good and the case studies make a compelling case for its applicability in real life – and the beauty of it is that because it is Open Access, any issues and needed improvements will be quickly spotted by the community and, judging from the authors' track record on DLC, duly addressed.
Main comments:
One important parameter that is not really discussed throughout the manuscript is the accuracy of pose estimation. I realize that this might be more of a discussion on DLC itself, but still, when relying on DLC to run closed-loop experiments this becomes a critical parameter. While offline one can simply go back, re-train a new network, and try again, in a real-time experiment classification errors might be very costly. The manuscript would benefit from discussing these errors and how they can best be minimised. It would also be helpful to show rates of false positive and false negative classification errors for the networks and use-cases presented here, to highlight the main parameters that determine them, and perhaps to show how classification errors vary as a function of these parameters (e.g., do any of the procedures to decrease inference latency, such as decreasing image resolution or changing the type of network, affect classification accuracy?). Along the same lines, while the use of Kalman filters to achieve sub-zero latencies is very exciting, it is unclear how robust this approach is. This applies not only to the parameters of the filter itself, but also to the types of behaviour that this approach can work with successfully. Presumably, this requires a high degree of stereotypy and reproducibility of the actions being tracked, and I feel that some discussion on this would be valuable.
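As one possible way to report such errors (a hypothetical sketch; the likelihood threshold and distance tolerance are assumptions, not values from the paper), per-frame detections could be scored against labeled data:

```python
import numpy as np

def detection_error_rates(pred, truth, lik_thresh=0.9, tol_px=5.0):
    """Frame-wise false positive/negative rates for one keypoint.

    pred:  (n_frames, 3) predicted (x, y, likelihood).
    truth: (n_frames, 3) ground truth (x, y, visible_flag in {0, 1}).
    A detection counts as correct if likelihood >= lik_thresh and the
    prediction lies within tol_px of the labeled position.
    """
    detected = pred[:, 2] >= lik_thresh
    visible = truth[:, 2] > 0
    dist = np.linalg.norm(pred[:, :2] - truth[:, :2], axis=1)
    false_pos = np.mean(detected & (~visible | (dist > tol_px)))
    false_neg = np.mean(visible & ~detected)
    return false_pos, false_neg
```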
A related point is that some applications are likely to depend on the detection of many key-points, and it is unclear how the number of key-points affects inference speed. For example, the 'light detection task' using AutoPilot uses a single key-point; how would the addition of more key-points affect performance in this particular configuration?
### Reviewer #1
The authors present a new software suite enabling real-time markerless posture tracking - with the aim of making low-latency feedback in behavioral experiments possible. They demonstrate the software's capability on a variety of hardware and software platforms – including GPUs, CPUs, different operating systems, and the Bonsai data acquisition platform. Moreover, they demonstrate the real-time feedback capabilities of DeepLabCut-Live!.
While other methods incorporating real-time feedback on top of DeepLabCut have been introduced recently, this software shows improved latency, has cross-platform capabilities, and is relatively easy to use. The software was thoroughly benchmarked (with one small exception that I outline below), and although I wasn't able to test it directly myself, I was easily able to download the code, and the documentation was sufficient for me to understand how it works. I have every confidence that this is a piece of software that will be extensively used by the field.
My one comment is that it would have been good to have some analysis of how the network accuracy (i.e., real-space – not pixel-space – error in tracking) scales with resolution, as the fundamental tracking trade-off isn't image size vs. speed, it's accuracy vs. speed. I wouldn't call this an essential revision, but I think that including these curves would greatly help potential users make important hardware and software decisions. Granted, this relationship will vary depending on the network, but even getting a sense from the Dog and Mouse networks here would likely be sufficient to provide a general picture.
## Preprint Review
This preprint was reviewed using eLife’s Preprint Review service, which provides public peer reviews of manuscripts posted on bioRxiv for the benefit of the authors, readers, potential readers, and others interested in our assessment of the work. This review applies only to version 1 of the manuscript.
### Summary
This submission introduces a new set of software tools for implementing real-time, marker-less pose tracking. The manuscript describes these tools, presents a series of benchmarks and demonstrates their use in several experimental settings, which include deploying very low-latency closed-loop events triggered on pose detection. The software core is based on DeepLabCut (DLC), previously developed by the senior authors. The first key development presented is a new python package – DeepLabCut-Live! – which optimizes pose inference to increase its speed, a key step for real-time application of DLC. The authors then present a new method for exporting trained DLC networks in a language-independent format and demonstrate how these can be used in three different environments to deploy experiments. Importantly, in addition to developing their own GUI, the authors have developed plugins for Bonsai and AutoPilot, two software packages already widely used by the systems neuroscience community to run experiments.
All three reviewers agreed that this work is exciting, carefully done, and would be of interest to a wide community of researchers. There were, however, four points that the reviewers felt could be addressed to increase the scope and the influence of the work (enumerated below).
The fundamental trade-off in tracking isn't image size vs. speed, but rather accuracy vs. speed. Thus, the reviewers felt that providing a measure of how the real-space (i.e., not pixel-space) accuracy of the tracking is affected by changing the image resolution would be very helpful to researchers wishing to design experiments that use this software.
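For reference, converting pixel error to real-space error requires only the camera's spatial scale, and downsampling by a factor s inflates the real-space size of each pixel accordingly; a back-of-the-envelope sketch with placeholder numbers:

```python
# Back-of-the-envelope conversion (all numbers are placeholders).
field_of_view_mm = 200.0    # width of the imaged arena
full_width_px = 704.0       # native image width
downsample = 0.25           # e.g., 704x548 -> 176x137

mm_per_px_native = field_of_view_mm / full_width_px   # ~0.28 mm/px
mm_per_px_small = mm_per_px_native / downsample       # ~1.14 mm/px

error_px_small = 2.0        # tracking error measured in downsampled pixels
error_mm = error_px_small * mm_per_px_small           # ~2.3 mm
print(f"{error_mm:.2f} mm real-space error")
```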
The manuscript would also benefit from including additional details about the Kalman filtering approach used here (as well as, potentially, further discussion about how it might be improved in future work). For instance, while the use of Kalman filters to achieve sub-zero latencies is very exciting, it is unclear how robust this approach is. This applies not only to the parameters of the filter itself, but also to the types of behavior that this approach can work with successfully. Presumably, this requires a high degree of stereotypy and reproducibility of the actions being tracked, and the reviewers felt that some discussion on this point would be valuable.
A general question that the reviewers had was how the number of key (tracked) points affects the latency. For example, the 'light detection task' using AutoPilot uses a single key-point; how would the addition of more key-points affect performance in this particular configuration? More fully understanding this relationship would be very helpful in guiding future experimental design using the system.
The DLG values appear to have been benchmarked using an existing video as opposed to a live camera feed. It is conceivable that a live camera feed would experience different kinds of hardware-based bottlenecks that are not present when streaming in a video (e.g., USB3 vs. ethernet vs. wireless). Although this point is partially addressed with the demonstration of real-time feedback based on posture later in the manuscript, a replication of the DLG benchmark with a live stream from a camera at 100 FPS would be helpful to demonstrate frame rates and latency given the hardware bottlenecks introduced by cameras. If this is impossible to do at the moment, however, at minimum, adding a discussion stating that this type of demonstration is currently missing and outlining these potential challenges would be important.
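Such a benchmark could be approximated with a short loop around an OpenCV capture; a minimal sketch (the camera index, model path, and 1000-frame sample size are assumptions, and note that this measures grab-to-pose time on the host rather than true sensor-to-output latency, which requires an external signal such as the LED test in the paper):

```python
import statistics
import time

import cv2
from dlclive import DLCLive

MODEL_PATH = "/path/to/exported/dlc_model"   # placeholder
dlc = DLCLive(MODEL_PATH)

cap = cv2.VideoCapture(0)                    # live camera, not a video file
cap.set(cv2.CAP_PROP_FPS, 100)               # request 100 FPS (hardware permitting)

ret, frame = cap.read()
dlc.init_inference(frame)

latencies = []
for _ in range(1000):
    t0 = time.perf_counter()
    ret, frame = cap.read()                  # includes camera/transport delay on the host
    if not ret:
        break
    dlc.get_pose(frame)
    latencies.append(time.perf_counter() - t0)
cap.release()

print(f"mean {1e3 * statistics.mean(latencies):.1f} ms, "
      f"sd {1e3 * statistics.stdev(latencies):.1f} ms")
```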