Ultra-Low-Power Holistic Design for
Smart Bio-signals Computing Platforms




  • Tobias Gemmeke et al., “Memory system design for NTC: Memories for NTC,” in M. Huebner, C. Silvano (Eds.), “Near Threshold Computing”, Springer, Chapter 5, in press.

  • Daniele Bortolotti, Andrea Bartolini, Mauro Mangia, Riccardo Rovatti, Gianluca Setti and Luca Benini, “Energy-Aware Bio-signal Compressed Sensing Reconstruction: FOCUSS on the WBSN-gateway”, accepted to International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-15), September 23-25, 2015, Turin, Italy.

  • Hossein Mamaghanian, Pierre Vandergheynst, "Ultra-low-power ECG front-end design based on compressed sensing", Proc. of Design, Automation and Test in Europe Conference (DATE '15), Grenoble, France, ISBN: 978-3-9815370-4-8, pp. 671-676, March 9th-13th, 2015.

  • Hossein Mamaghanian, D. Atienza, P. Vandergheynst, “Ultra-Low Power Design of Multimodal Bio-Signal Wearable Systems”, ENTRA Workshop, Malaga 6-7 May

  • H. Mamaghanian, D. Atienza Alonso and P. Vandergheynst (Dirs.). Compressed sensing: a universal energy-efficient compression scheme for biosignals on wireless body sensor nodes. EPFL, Lausanne, 2014.

  • Rubén Braojos, Hossein Mamaghanian, Alair Dias Junior, Giovanni Ansaloni, David Atienza, Francisco J. Rincón, and Srinivasan Murali. “Ultra-Low Power Design of Wearable Cardiac Monitoring Systems”. In Proceedings of the 51st Annual Design Automation Conference (DAC), 2014. New York, NY, USA.

    Abstract: This paper presents the system-level architecture of novel ultra-low power wireless body sensor nodes (WBSNs) for real-time cardiac monitoring and analysis, and discusses the main design challenges of this new generation of medical devices. In particular, it highlights first the unsustainable energy cost incurred by the straightforward wireless streaming of raw data to external analysis servers. Then, it introduces the need for new cross-layered design methods (beyond hardware and software boundaries) to enhance the autonomy of WBSNs for ambulatory monitoring. In fact, by embedding more onboard intelligence and exploiting electrocardiogram (ECG) specific knowledge, it is possible to perform real-time compressive sensing, filtering, delineation and classification of heartbeats, while dramatically extending the battery lifetime of cardiac monitoring systems. The paper concludes by showing the results of this new approach to design ultra-low power wearable WBSNs in a real-life platform commercialized by SmartCardia. This wearable system allows a wide range of applications, including multi-lead ECG arrhythmia detection and autonomous sleep monitoring for critical scenarios, such as monitoring of the sleep state of airline pilots.

  • Rubén Braojos, Ahmed Dogan, Ivan Beretta, Giovanni Ansaloni and David Atienza. “Hardware/Software Approach for Code Synchronization in Low-Power Multi-Core Sensor Nodes”. In Proceedings of the Design, Automation and Test in Europe (DATE) Conference, 2014. Dresden, Germany.

    Abstract: Latest embedded bio-signal analysis applications, targeting low-power Wireless Body Sensor Nodes (WBSNs), present conflicting requirements. On one hand, bio-signal analysis applications are continuously increasing their demand for high computing capabilities. On the other hand, long-term signal processing in WBSNs must be provided within their highly constrained energy budget. In this context, parallel processing effectively increases the power efficiency of WBSNs, but only if the execution can be properly synchronized among computing elements. To address this challenge, in this work we propose a hardware/software approach to synchronize the execution of bio-signal processing applications in multi-core WBSNs. This new approach requires few hardware resources and very few adaptations in the source code. Moreover, it provides the necessary flexibility to execute applications with an arbitrarily large degree of complexity and parallelism, enabling considerable reductions in power consumption for all multi-core WBSN execution conditions. Experimental results show that a multi-core WBSN architecture using the illustrated approach can obtain energy savings of up to 40%, with respect to an equivalent single-core architecture, when performing advanced bio-signal analysis.

  • Hossein Mamaghanian, Giovanni Ansaloni, David Atienza and Pierre Vandergheynst. “Power-Efficient Joint Compressed Sensing of Multi-Lead ECG Signals”. In Proceedings of the 39th International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2014. Florence, Italy.

    Abstract: Compressed Sensing (CS) is a new acquisition-compression paradigm for low-complexity energy-aware sensing and compression. By merging sampling and compression, CS is very promising for developing practical ultra-low power read-out systems for wireless bio-signal monitoring devices, where large amounts of sensor data need to be transferred through power-hungry wireless links. Lately CS has been successfully applied for real-time energy-aware single-lead ECG compression on resource-constrained Wireless Body Sensor Network (WBSN) motes. Building on our previous work, in this paper we propose a new and promising approach for joint compression of multi-lead ECG signals, where strong correlations exist between them. These strong correlations can be exploited to further reduce the amount of data to be transmitted wirelessly, thus addressing the important challenge of ultra-low-power embedded monitoring of multi-lead ECG signals.
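
    As a rough illustration of the sensor-side step of compressed sensing described above, the sketch below reduces a window of n samples to m < n linear measurements, y = Φx. The random ±1 sensing matrix, the window sizes and the toy sparse signal are assumptions chosen for illustration only; they are not taken from the paper, and the receiver-side sparse reconstruction is not shown.

```python
import random


def make_sensing_matrix(m, n, seed=0):
    """Build an m x n matrix of random +1/-1 entries (an assumed, common CS choice)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]


def compress(phi, x):
    """Compute the measurement vector y = phi * x with plain Python arithmetic."""
    return [sum(p * v for p, v in zip(row, x)) for row in phi]


# Hypothetical window: n = 256 input samples compressed to m = 64 measurements.
n, m = 256, 64

# Toy signal that is sparse in the sample domain (only three non-zero entries).
x = [0.0] * n
x[10], x[100], x[200] = 1.5, -0.7, 2.0

phi = make_sensing_matrix(m, n)
y = compress(phi, x)

# Ratio of input samples to transmitted measurements.
compression_ratio = n / m
```

    Only the m values in `y` would need to be sent over the wireless link; recovering `x` from `y` requires a sparse-recovery solver on the receiving side.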

  • Jelena Milosevic, Andreas Dittrich, Alberto Ferrante, Miroslav Malek, Daniel Camilo Rojas Quirós, Rubén Braojos, Giovanni Ansaloni, and David Atienza. "Risk Assessment of Atrial Fibrillation: a Failure Prediction Approach". In Proceedings of Computing in Cardiology, 2014. Cambridge, MA, USA.

    Abstract: We present a methodology for identifying patients who have experienced Paroxysmal Atrial Fibrillation (PAF) among a given subject population. Our work is intended as an initial step towards the design of an unobtrusive portable system for concurrent detection and monitoring of chronic cardiac conditions. The methodology comprises two stages: off-line training and on-line analysis. During training the most significant features are selected using machine learning methods, without relying on a manual selection based on previous knowledge. Analysis is done in two phases: feature extraction and detection of PAF patients. Light-weight algorithms are employed in the feature extraction phase, allowing the on-line implementation of this step on wearable sensor nodes. The detection phase employs techniques borrowed from the field of failure prediction. While these algorithms have found extensive application in diverse scenarios, their application to automated cardiac analysis has not been sufficiently investigated to date. The proposed methodology is able to correctly classify 68% of the test records in the PAF Prediction Challenge database [1], performing comparably to state of the art off-line algorithms. Nonetheless, the proposed method employs embedded signal processing for the critical feature extraction step, which is executed on resource-constrained body sensor nodes. This allows for a real-time and energy-efficient implementation.

  • Daniele Bortolotti, Hossein Mamaghanian, Andrea Bartolini, Maryam Ashouei, Jan Stuijt, David Atienza, Pierre Vandergheynst, and Luca Benini. "Approximate Compressed Sensing: Ultra-low Power Biosignal Processing Via Aggressive Voltage Scaling on a Hybrid Memory Multi-core Processor". In Proceedings of the IEEE International Symposium on Low Power Electronics and Design (ISLPED), 2014. La Jolla, CA, USA.

    Abstract: Technology scaling enables the design of low cost biosignal processing chips suited for emerging wireless body-area sensing applications. Energy consumption severely limits such applications and memories are becoming the energy bottleneck to achieve ultra-low-power operation. When aggressive voltage scaling is used, memory operation becomes unreliable due to the lack of sufficient Static Noise Margin. This paper introduces an approximate biosignal Compressed Sensing approach. We propose a digital architecture featuring a hybrid memory (6T-SRAM/SCMEM cells) designed to control perturbations on specific data structures. Combined with a statistically robust reconstruction algorithm, the system tolerates memory errors and achieves significant energy savings with low area overhead.

  • V. Rajesh Pamula, Marian Verhelst, Chris Van Hoof and Refet Firat Yazicioglu, “Computationally-Efficient Compressive Sampling for Low-Power Pulseoximeter System”, accepted for presentation at the IEEE Biomedical Circuits and Systems Conference (BioCAS 2014), 22-24 October, 2014, Lausanne, Switzerland.

    Abstract: This paper presents a computationally-efficient compressive sampling system for photoplethysmogram (PPG) signals. The approach relies on the exploration of the Discrete Cosine Transform (DCT) as sparsifying basis for reconstruction of randomly sampled signals, along with an overlapped window reconstruction algorithm which improves reconstruction accuracy of shorter windows, without sacrificing reconstruction accuracy. Simulation results demonstrate a reduction in CPU execution time by a factor of 2.4 without degradation of reconstruction accuracy compared to a traditional longer window reconstruction approach. This facilitates computationally-efficient, low-latency signal reconstruction.

  • Rachit Mohan, Yan Long, Georges Gielen, Chris Van Hoof and Refet Firat Yazicioglu, “0.35V time-domain based Instrumentation Amplifier”, accepted for publication in the IET Electronics Letters.

    Abstract: A time-domain based amplifier concept is proposed to obtain high voltage gains with low power consumption and at an ultra-low supply voltage of 0.35 V. A prototype instrumentation amplifier designed with the proposed technique in 180 nm technology consumes 210 nW of power and occupies 0.1 mm² of active area.

  • S. Benatti, E. Farella, L. Benini: “Towards EMG Control Interface for Smart Garments”, in adjunct proceedings of the ACM International Conference on Ubiquitous Computing (UBICOMP’14), Seattle, Sept. 2014.

    Abstract: Wearable computing devices can greatly enhance the quality of life, supporting interaction with smart environments, activity recognition and healthcare applications. Smart garments offer the opportunity to integrate sensors and electronics in unobtrusive wearable systems. The paper presents a case study of an embedded hand gesture recognition system, which uses EMG electrodes embeddable in smart clothes. We analyze the main challenges of a real-time system for pattern recognition, and the results of the proposed experiment demonstrate the feasibility of such a system, which can be integrated in smart clothes.

  • S. Benatti, B. Milosevic, F. Casamassima, P. Schonle, P. Bunjaku, S. Fateh, Q. Huang, L. Benini: “EMG-based Hand Gesture Recognition With Flexible Analog Front End”, in proceedings of the IEEE Biomedical Circuits and Systems Conference (BioCAS 2014), Lausanne, Oct. 2014.

    Abstract: Conditioning and processing of biological signals represent interesting challenges for wearable electronics in health applications. Information gathering from these signals requires complex hardware circuitry and dedicated computation resources. The design of innovative analog front-end integrated circuits, combined with efficient signal processing algorithms, allows the development of platforms for monitoring, activity and gesture recognition based on embedded real-time systems. This paper describes an Electromyography pattern recognition system based on the combination of low cost passive sensors, an innovative analog front-end and a low power microcontroller. The performance of the proposed system matches state-of-the-art high-end active sensors, opening the way to the development of affordable and accurate wearable devices.

  • Danilo Porcarelli, Irene Donati, Jetmir Nehani, Davide Brunelli, Michele Magno, Luca Benini, “Design and Implementation of a Multi Sensors Self Sustainable Wearable Device”, in proceedings of the European Embedded Design in Education and Research Conference (EDERC 2014).

    Abstract: Wearable electronics is increasingly attracting researchers and manufacturers for the application opportunities it opens. Chip makers are responding to the hot wearable-computer trend with new components, microcontrollers and sensors. The most critical challenge is the autonomy of the systems. Even if battery management can help in extending the lifetime, the trade-off between the features offered by a multi-sensory platform and its autonomy can determine whether a platform will win or not in the marketplace. In this paper we present a bracelet device, which attempts to maximize the capability of sensors on board, while still keeping the energy consumption low. Aggressive power management and an accurate selection of the sensors are addressed in this paper to demonstrate the effectiveness of our design.

  • Michele Magno, Danilo Porcarelli, Davide Brunelli, Luca Benini, “InfiniTime: A Multi-sensor Energy Neutral Wearable Bracelet” to appear at the 5th International Green Computing Conference (IGCC 2014)

    Abstract: Wearable technology is gaining popularity, with people wearing everything “smart” from clothing to glasses and watches. Nowadays wearables are battery-powered and a critical issue is the limited lifetime. Most devices have to be recharged every few days or even hours, and thus they fall short of the expectations for a truly unobtrusive user experience. This paper presents InfiniTIME, a novel sensor-rich smart bracelet powered by small photovoltaic cells, designed to achieve energy neutrality even with modest indoor light levels. Experimental characterization of the fully operational prototype demonstrates a wide range of energy optimization techniques used to achieve the neutrality target. Simulations using energy intake measurements from various deployment scenarios confirm that InfiniTIME achieves energy neutrality with indoor lighting levels in an office for several realistic application scenarios featuring data acquisition from the on-board camera and multiple sensors, visualization and radio connectivity.

  • Francesco Paci, Davide Brunelli, Luca Benini, “0, 1, 2, Many - A Classroom Occupancy Monitoring System For Smart Public Buildings”, in proceedings of the Conference on Design & Architectures for Signal & Image Processing (DASIP) 2014.

    Abstract: In recent years, research in IoT-enabled smart building solutions has accelerated significantly, thanks to the coming of age of wireless distributed sensing hardware, protocols and software. Moreover the computational capability of mobile and wireless platforms is now significant and complex tasks can be distributed and executed over remote low-power nodes. Data can be processed locally on-board to reduce network overhead and to increase architectural scalability. We present a heterogeneous sensor network that makes use of low power sensors and cameras to monitor a building's environmental conditions and to correlate them with the number of people inside the monitored environment. Occupancy monitoring is a critical missing link in smart buildings, especially for rooms which may have a large number of occupants (e.g. classrooms), because it heavily affects control strategies and impacts environmental conditions. Our system estimates the number of people in classrooms by using a novel people counting algorithm which runs in real-time, even on limited-resource platforms. Finally, data is collected in an IoT-cloud-based infrastructure, and used to trigger remote actions.

  • Francesco Conti, Chuck Pilkington, Andrea Marongiu, Luca Benini: "He-P2012: Architectural heterogeneity exploration on a scalable many-core platform," 2014 IEEE 25th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 114-120, 18-20 June 2014; doi: 10.1109/ASAP.2014.6868645

    Abstract: Architectural heterogeneity is a promising solution to overcome the utilization wall and provide Moore's Law-like performance scaling in future SoCs. However, heterogeneous architectures increase the size and complexity of the design space along several axes: granularity of the heterogeneous processors, coupling with the software cores, communication interfaces, etc. As a consequence, significant enhancements are required to tools and methodologies to explore the huge design space effectively. In this work, we provide three main contributions: first, we describe an extension to the STMicroelectronics P2012 platform to support tightly-coupled shared memory HW processing elements (HWPE), along with our changes to the P2012 simulation flow to integrate this extension. Second, we propose a novel methodology for the semi-automatic definition and instantiation of HWPEs from a C program based on an interface description language. Third, we explore several architectural variants on a set of benchmarks originally developed for the homogeneous version of P2012, achieving up to 123x speedup for the accelerated code region (~98% of the Amdahl limit for the whole application), thereby demonstrating the efficiency of tightly memory-coupled hardware acceleration.
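
    To make the relation between a regional speedup and the Amdahl limit concrete, the sketch below evaluates Amdahl's law, S = 1 / ((1 - p) + p / s), and compares it with the limit 1 / (1 - p) reached as s grows without bound. The 123x regional speedup is the figure quoted in the abstract; the accelerated fraction p = 0.70 is a hypothetical value picked for illustration, since the abstract does not state it.

```python
def amdahl_speedup(p, s):
    """Overall speedup when a fraction p of the run time is accelerated by a factor s."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if s <= 0.0:
        raise ValueError("s must be positive")
    return 1.0 / ((1.0 - p) + p / s)


def amdahl_limit(p):
    """Upper bound on the overall speedup as the regional speedup s tends to infinity."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if p == 1.0:
        return float("inf")
    return 1.0 / (1.0 - p)


p = 0.70    # hypothetical accelerated fraction, not a figure from the paper
s = 123.0   # regional speedup quoted in the abstract

overall = amdahl_speedup(p, s)
limit = amdahl_limit(p)
fraction_of_limit = overall / limit  # share of the Amdahl limit actually reached
```

    With these assumed inputs the overall speedup lands within a few percent of its limit, which shows why a very large regional speedup yields diminishing returns once the non-accelerated part dominates.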

  • Francesco Conti, Davide Rossi, Antonio Pullini, Igor Loi, Luca Benini: "Energy-Efficient Vision on the PULP Platform for Ultra-Low Power Parallel Computing", Signal Processing Systems (SiPS), 2014 IEEE International Workshop on, 20-22 October 2014 (accepted)

    Abstract: Many-core architectures structured as fabrics of tightly-coupled clusters have shown promising results on embedded computer vision benchmarks, providing state-of-the-art performance with a reduced power budget. We propose PULP (Parallel processing Ultra-Low Power platform), an architecture built on clusters of tightly-coupled OpenRISC ISA cores, with advanced techniques for fast performance and energy scalability that exploit the capabilities of the STMicroelectronics UTBB FD-SOI 28nm technology. As a use case for PULP, we show that a computationally demanding vision kernel based on Convolutional Neural Networks can be quickly and efficiently switched from a low power, low frame-rate operating point to a high frame-rate one when a detection is performed. Our results show that PULP performance can be scaled over a 1x-354x range, with a peak performance/power efficiency of 211 GOPS/W.

  • Jelicic, V.; Magno, M.; Brunelli, D.; Bilas, V.; Benini, L., "Benefits of Wake-Up Radio in Energy-Efficient Multimodal Surveillance Wireless Sensor Network," Sensors Journal, IEEE, vol. 14, no. 9, pp. 3210-3220, Sept. 2014, doi: 10.1109/JSEN.2014.2326799

    Abstract: The scarce energy budget of battery-powered wireless sensor nodes calls for cautious power management so as not to compromise the performance of the system. To reduce both energy consumption and delay in energy-hungry wireless sensor networks for latency-restricted surveillance scenarios, this paper proposes a multimodal two-tier architecture with wake-up radio receivers. In video surveillance applications, using information from distributed low-power pyroelectric infrared (PIR) sensors, which detect human presence, limits the activity of cameras and reduces their energy consumption. PIR sensors transmit the information about the event to camera nodes using wake-up radio receivers. We show the benefits of wake-up receivers over duty cycling in terms of overcoming the energy consumption versus latency tradeoff (demonstrated by a latency two orders of magnitude lower, only 9 ms). At the same time, the power consumption of the camera node including a wake-up receiver is comparable with that of a node having only a duty-cycled main transceiver with 1% duty cycle (about 32 mW for 25 activations per hour).
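
    The energy versus latency trade-off of duty cycling mentioned above can be sketched with a simple first-order model: average power is the duty-cycle-weighted mix of active and sleep power, while the worst-case wake-up latency is the time the radio spends asleep in each period. All numeric values below (60 mW active, 0.01 mW sleep, 1 s period) are hypothetical and are not taken from the paper; only the 1% duty cycle echoes the abstract.

```python
def duty_cycled_average_power(p_active_mw, p_sleep_mw, duty_cycle):
    """Average power (mW) of a radio that is active for a fraction duty_cycle of the time."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must lie in [0, 1]")
    return duty_cycle * p_active_mw + (1.0 - duty_cycle) * p_sleep_mw


def worst_case_wakeup_latency(period_s, duty_cycle):
    """Longest wait (s) for an event that arrives just as the listening window closes."""
    if not 0.0 <= duty_cycle <= 1.0:
        raise ValueError("duty_cycle must lie in [0, 1]")
    return period_s * (1.0 - duty_cycle)


# Hypothetical transceiver figures, chosen only to illustrate the model.
avg_power_mw = duty_cycled_average_power(p_active_mw=60.0, p_sleep_mw=0.01, duty_cycle=0.01)
latency_s = worst_case_wakeup_latency(period_s=1.0, duty_cycle=0.01)
```

    In this model, shortening the period lowers latency but, for a fixed listening window, raises the duty cycle and therefore the average power; an always-on wake-up receiver sidesteps that coupling.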

  • Giuseppe Tagliavini, Germain Haugou, Luca Benini, "Optimizing Memory Bandwidth in OpenVX Graph Execution on Embedded Many-Core Accelerators", Proceedings of the 2014 Conference on Design and Architectures for Signal and Image Processing (DASIP), 2014, Madrid, Spain.

    Abstract: Computer vision and computational photography are hot application areas for mobile and embedded computing platforms. As a consequence, many-core accelerators are being developed to efficiently execute highly-parallel image processing kernels. However, power and cost constraints impose hard limits on the main memory bandwidth available, and push for software optimizations which minimize the usage of large frame buffers to store the intermediate results of multi-kernel applications. In this work we propose a set of techniques, mainly based on graph analysis and image tiling, targeted to accelerate the execution on cluster-based many-core accelerators of image processing applications expressed as standard OpenVX graphs. We have developed a run-time framework which implements these techniques using a front-end compliant to the OpenVX standard, and based on an OpenCL extension that enables more explicit control and efficient reuse of on-chip memory and greatly reduces the recourse to off-chip memory for storing intermediate results. Experiments performed on the STHORM many-core accelerator prototype demonstrate that our approach leads to massive reductions of main memory related stall time even when the main memory bandwidth available to the accelerator is severely constrained.

  • Daniele Bortolotti, Andrea Bartolini, and Luca Benini. "An ultra-low power resilient multi-core architecture with static and dynamic tolerance to ambient temperature-induced variability." Microprocessors and Microsystems (2014).

    Abstract: Near-threshold operation is today a key research area in Ultra-Low Power (ULP) computing, as it promises a major boost in energy efficiency compared to super-threshold computing and it mitigates thermal bottlenecks. Unfortunately near-threshold operation is plagued by greatly increased sensitivity to threshold voltage variations, such as those caused by ambient temperature fluctuation. In this paper we focus on a tightly-coupled ULP processor cluster architecture where a low latency, high-bandwidth processor-to-L1-memory interconnection network plays a key role. We propose an architectural scheme to tolerate ambient temperature-induced variations capable of statically (off-line) and dynamically (on-line) adapting the processor-to-L1-memory latency without compromising execution correctness. We extensively tested our solution in different scenarios and we evaluated the different design trade-offs, showing the cost, performance and reliability gain compared to state-of-the-art static solutions. The dynamic solution, thanks to its lightweight runtime overhead, outperforms the static solution and is able to reach a performance gain up to 25% in a typical use case scenario with a very low (<4%) area overhead.

  • Daniele Bortolotti, Mauro Mangia, Andrea Bartolini, Riccardo Rovatti, Gianluca Setti and Luca Benini. "Rakeness-based Compressed Sensing on Ultra-Low Power Multi-Core Biomedical Processors". To appear in Conference on Design and Architectures for Signal and Image Processing (DASIP 2014)

    Abstract: Technology scaling today enables the design of ultra-low cost wireless body sensor networks for wearable biomedical monitors. The typical behaviour of such systems consists of multi-channel input biosignals acquisition, data compression and final output transmission or storage. To achieve minimal energy operation and extend battery life, several aspects must be considered, ranging from signal processing to architectural optimizations. The recently proposed Rakeness-based Compressed Sensing (CS) paradigm deploys the localization of input signal energy to further increase compression without sensible RSNR degradation. Such output size reduction allows for trading off energy from the compression stage to the transmission or storage stage. In this paper we analyze such tradeoffs considering a multi-core DSP for input biosignal computation and different technologies for either transmission or local storage. The experimental results show the effectiveness of the Rakeness approach (on average ≈ 44% more efficient than the baseline) and assess the energy gains in a technological perspective.

  • Roberto Diversi, Andrea Bartolini, Andrea Tilli, Francesco Beneventi, Luca Benini, “SCC thermal model identification via advanced bias-compensated least-squares”, in proceedings of the Design, Automation and Test in Europe (DATE) Conference, March 2013, Grenoble, France (winner of the DATE 2013 Best Paper Award).

    Abstract: Compact thermal models and modeling strategies are today a cornerstone for advanced power management to counteract the emerging thermal crisis for many-core systems on-chip. System identification techniques allow to extract models directly from the target device thermal response. Unfortunately, standard Least Squares techniques cannot effectively cope with both model approximation and measurement noise typical of real systems. In this work, we present a novel distributed identification strategy capable of coping with real-life temperature sensor noise and effectively extracting a set of low-order predictive thermal models for the tiles of Intel’s Single-chip-Cloud-Computer (SCC) many-core prototype.

  • Mohammad Reza Kakoee, Igor Loi, Luca Benini, “A Shared-FPU Architecture for Ultra-low Power MPSoCs”, in proceedings of the ACM International Conference on Computing Frontiers (CF), May 2013, Ischia, Italy.

    Abstract: In this work we propose a shared floating point unit (FPU) architecture for ultra-low power (ULP) system on chips operating at near threshold voltage (NTV). Since high-performance FP units (FPUs) are large and complex, but their utilization is relatively low, adding one FPU per each core in a ULP multicore is costly and power hungry. In our approach, we share a few FPUs among all the cores in the system. This increases the utilization of FPUs leading to an energy-efficient design. As a part of our approach, we propose two different FPU allocation techniques: optimal and random. Experimental results demonstrate that compared to a traditional private-FPU approach, our technique in a multicore system with 8 processors and 2 shared FPUs can increase the performance/(area*power) by 5× for applications with 10% FP operations and by 2.5× for applications with 25% FP operations.

  • Daniele Bortolotti, Andrea Bartolini, Luca Benini, “An Ambient Temperature Variation Tolerance Scheme for an Ultra-Low Power Shared-L1 Processor Cluster”, in proceedings of the Euromicro Conference on Digital System Design (DSD), Sept 2013, Santander, Spain.

    Abstract: Near Threshold Operation is today a key research area in ultra-low power (ULP) computing, as it promises 10x improvement in energy efficiency compared to super-threshold operation, and it mitigates thermal bottlenecks. Unfortunately near-threshold operation is plagued by greatly increased sensitivity to threshold voltage variations, such as those caused by ambient temperature fluctuation. In this paper we focus on a tightly-coupled ULP processor cluster architecture where a low latency, high-bandwidth processor-to-L1-memory interconnection network plays a key role. We propose a lightweight runtime solution to tolerate ambient temperature-induced variations by dynamically adapting the processor-to-L1-memory latency without compromising execution correctness. We extensively tested our solution in different scenarios and evaluated the different design trade-offs, showing the cost, performance and reliability gain compared to state-of-the-art static solutions. Our solution is able to reach a performance gain up to 25% in a typical use case scenario with a very low (~4%) area overhead.

  • Daniele Bortolotti, Davide Rossi, Andrea Bartolini, Luca Benini, “A Variation Tolerant Architecture for Ultra Low Power Multi-processor Cluster”, in proceedings of the 23rd International Workshop on Power and Timing Modeling, Optimization and Simulation (PATMOS), Sept 2013, Karlsruhe, Germany.

    Abstract: Process and environmental temperature variations have a detrimental effect on performance and reliability of modern embedded systems. This sensitivity to operating conditions significantly increases in ultra-low-power (ULP) devices and in all those applications that rely on reduced supply voltage to achieve energy efficiency. We propose a lightweight runtime solution to tolerate process and environmental temperature variations. The novelty of our solution is the ability to tackle both hold time and setup time sensitivity to variations by dynamically adapting latencies of the datapaths without compromising execution correctness. We extensively tested our solution, evaluating the trade-offs and demonstrating the cost, performance and reliability gain compared to state-of-the-art static solutions. The proposed solution is able to reach a performance gain up to 30% with a very low (~4%) area overhead.

  • Francesco Conti, Andrea Marongiu, Luca Benini, “Synthesis-friendly techniques for tightly-coupled integration of hardware accelerators into shared-memory multi-core clusters”, in proceedings of the International Conference on Hardware/Software Codesign and System Synthesis (CODES), October 2013, Montreal, Canada.

    Abstract: Several many-core designs tackle scalability issues by leveraging tightly-coupled clusters as building blocks, where low-latency, high-bandwidth interconnection between a small/medium number of cores and L1 memory achieves high performance/watt. Tight coupling of hardware accelerators into these multi-core clusters constitutes a promising approach to further improve performance/area/watt. However, accelerators are often clocked at a lower frequency than processor clusters for energy efficiency reasons. In this paper, we propose a technique to integrate shared-memory accelerators within the tightly-coupled clusters of the STMicroelectronics STHORM architecture. Our methodology significantly relaxes timing constraints for tightly-coupled accelerators, while optimizing data bandwidth. In addition, our technique allows the accelerator to be operated at an integer submultiple of the cluster frequency. Experimental results show that the proposed approach recovers up to 84% of the slow-down implied by reduced accelerator speed.

  • Francesco Conti, Andrea Marongiu, Chuck Pilkington, Luca Benini. He-P2012: Performance and Energy Exploration of Architecturally Heterogeneous Many-Cores, in Journal of Signal Processing Systems (under review)
    Publication (PDF, 4.3 MB)

    Abstract: The end of Dennardian scaling in advanced technologies brought about new architectural templates to overcome the so-called utilization wall and provide Moore’s Law-like performance and energy scaling in embedded SoCs. One of the most promising templates, architectural heterogeneity, is hindered by high cost due to the design space explosion and the lack of effective exploration tools. Our work provides three contributions towards a scalable and effective methodology for design space exploration in embedded MC-SoCs. First, we present the He-P2012 architecture, augmenting the state-of-the-art STMicroelectronics P2012 platform with heterogeneous shared-L1 coprocessors called HW processing elements (HWPE). Second, we propose a novel methodology for the semi-automatic definition and instantiation of shared-memory HWPEs from a C source, supporting both simple and structured data types. Third, we demonstrate that the integration of HWPEs can provide significant performance and energy efficiency benefits on a set of benchmarks originally developed for the homogeneous P2012, achieving up to 123x speedup on the accelerated code region (∼98% of Amdahl’s law limit) while saving 2/3 of the energy.

  • Francesco Conti, Davide Rossi, Antonio Pullini, Igor Loi, Luca Benini. PULP: A Ultra-Low Power Parallel Accelerator for Energy-Efficient and Flexible Embedded Vision, in Journal of Signal Processing Systems (under review)

    Abstract: Novel pervasive devices such as smart surveillance cameras and autonomous micro-UAVs could greatly benefit from the availability of a computing device supporting embedded computer vision at a very low power budget. To this end, we propose PULP (Parallel processing Ultra-Low Power platform), an architecture built on clusters of tightly-coupled OpenRISC ISA cores, with advanced techniques for fast performance and energy scalability that exploit the capabilities of the STMicroelectronics UTBB FD-SOI 28 nm technology. We show that PULP performance can be scaled over a 1x-354x range, with a peak theoretical energy efficiency of 211 GOPS/W. We present performance results for several demanding kernels from the image processing and vision domain, with post-layout power modeling: a motion detection application that can run at an efficiency up to 192 GOPS/W (90% of the theoretical peak); a ConvNet-based detector for smart surveillance that can be switched between 0.7 and 27 fps operating modes, scaling energy consumption per frame between 1.2 and 12 mJ on a 320x240 image; and FAST+Lucas-Kanade optical flow on a 128x128 image at the ultra-low energy budget of 14 μJ per frame at 60 fps.

  • Baraldi, L.; Paci, F.; Serra, G.; Benini, L.; Cucchiara, R., "Gesture Recognition Using Wearable Vision Sensors to Enhance Visitors’ Museum Experiences," IEEE Sensors Journal, vol. 15, no. 5, pp. 2705-2714, May 2015. doi: 10.1109/JSEN.2015.2411994
    Publication (PDF, 2.8 MB)

    Abstract: We introduce a novel approach to cultural heritage experience: by means of ego-vision embedded devices we develop a system that offers a more natural and entertaining way of accessing museum knowledge. Our method is based on distributed self-gesture and artwork recognition, and needs neither fixed cameras nor radio-frequency identification sensors. We propose the use of dense trajectories sampled around the hand region to perform self-gesture recognition, understanding the way a user naturally interacts with an artwork, and demonstrate that our approach can benefit from distributed training. We test our algorithms on publicly available data sets and we extend our experiments to both virtual and real museum scenarios, where our method shows robustness when challenged with real-world data. Furthermore, we run an extensive performance analysis on our ARM-based wearable device.

  • Benatti S., Milosevic B., Tomasini M., Farella E., Schonle P., Bunjaku P., Rovere G., Fateh S., Huang Q., Benini L., Multiple Biopotentials Acquisition System for Wearable Applications, Proceedings of the International Conference on Biomedical Electronics and Devices, Scitepress, 2015, pp. 260-268.
    Publication (PDF, 1.1 MB)

    Abstract: Wearable devices for monitoring vital signs such as heart rate, respiratory rate and blood pressure play an increasing role in improving quality of life and in preventing chronic cardiac diseases. However, the design of a wearable system without reference to ground potential requires multi-level strategies to remove noise caused by power lines. This paper describes a bio-potential acquisition embedded system designed with an innovative analog front-end, showing its performance in EEG and ECG applications and comparing different noise reduction algorithms. We demonstrate that the proposed system is able to acquire bio-potentials with a signal quality equivalent to state-of-the-art bench-top biomedical devices and can therefore be used for monitoring purposes, with the advantages of a low-cost, low-power wearable device.

  • M. Tomasini, S. Benatti, F. Casamassima, B. Milosevic, S. Fateh, E. Farella and L. Benini, Digitally Controlled Feedback for DC Offset Cancellation in a Wearable Multichannel EMG Platform, to appear in Proc. of EMBC, 2015
    Publication (PDF, 0.7 MB)

    Abstract: Wearable systems capable of capturing vital signs allow the development of advanced medical applications. One notable example is the use of surface electromyography (EMG) to gather muscle activation potentials, in principle an easy input for prosthesis control. However, the acquisition of such signals is affected by high variability and ground loop problems. Moreover, the input impedance, which changes over time with motion and perspiration, determines an offset that can be orders of magnitude higher than the signal of interest. We propose a wearable device equipped with a digitally controlled Analog Front End (AFE) for biopotential acquisition with zero offset. The proposed AFE solution has an internal Digital to Analog Converter (DAC) used to independently adjust the reference of each channel, removing any DC offset. The analog integrated circuit is coupled with a microcontroller, which periodically estimates the offset and implements a closed-loop feedback on the analog part. The proposed approach was tested on EMG signals acquired from 4 subjects while performing different activities and shows that the system correctly acquires signals with no DC offset.
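The closed-loop offset cancellation described in this abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' firmware: the DAC step size, the code range, the use of a window mean as the offset estimate, and all function names are hypothetical.

```python
# Hypothetical parameters: none of these values come from the paper.
DAC_LSB_MV = 0.5                 # assumed DAC step size in millivolts
DAC_MIN, DAC_MAX = -2048, 2047   # assumed signed 12-bit DAC code range


def estimate_offset(samples):
    """Estimate the DC offset of a window as its mean value (mV)."""
    return sum(samples) / len(samples)


def update_dac_code(dac_code, samples):
    """One feedback step: move the channel reference toward the measured offset."""
    correction = round(estimate_offset(samples) / DAC_LSB_MV)
    # Clamp to the assumed DAC range so the code never overflows.
    return max(DAC_MIN, min(DAC_MAX, dac_code + correction))


def acquire(raw_window, dac_code):
    """Model the front-end output: raw input minus the DAC-set reference."""
    reference = dac_code * DAC_LSB_MV
    return [x - reference for x in raw_window]


def run_loop(raw_windows):
    """Run the loop on one channel; return the last output window and DAC code."""
    dac_code = 0
    output = []
    for raw in raw_windows:
        output = acquire(raw, dac_code)
        dac_code = update_dac_code(dac_code, output)
    return output, dac_code
```

In this model the residual offset after the loop settles is bounded by half a DAC step, as long as the true offset stays inside the DAC range.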

  • A framework for optimizing OpenVX applications performance on embedded manycore accelerators
    Proceedings of the 18th International Workshop on Software and Compilers for Embedded Systems (SCOPES'15), ACM, Sankt Goar, Germany
    DOI: 10.1145/2764967.2776858
    Publication (PDF, 0.2 MB)

    Abstract: Computer Vision applications are now ubiquitous, and their presence on embedded devices is increasingly widespread. Heterogeneous embedded systems featuring a clustered manycore accelerator are a very promising target for executing embedded vision algorithms, but code optimization for these platforms is a challenging task. Moreover, designers need support tools that are both fast and accurate. In this work we introduce ADRENALINE, an environment for development and optimization of OpenVX applications targeting manycore accelerators. ADRENALINE consists of a custom OpenVX run-time and a virtual platform, and overall it is intended to provide support to enhance performance of embedded vision applications.

  • Synergistic Architecture and Programming Model Support for Approximate Micropower Computing
    IEEE Computer Society Annual Symposium on VLSI (ISVLSI'15), to be published
    Publication (PDF, 0.4 MB)

    Abstract: Energy consumption is a major constraining factor for embedded multi-core systems. Using aggressive voltage scaling can reduce power consumption, but memory operations become unreliable. Several embedded applications exhibit inherent tolerance to computation approximation, for which indicating parts that can tolerate errors has proven a viable way to reduce energy consumption. In this work we propose an extension to OpenMP to specify what regions of code and data are tolerant to approximation. A compiler pass places data into memory regions with different reliability guarantees according to their tolerance to errors. The voltage supply level is dynamically adjusted according to tolerance policies, with the overall goal of minimizing energy in full compliance with precision constraints.

  • ADRENALINE: an OpenVX environment to optimize embedded vision applications on many-core accelerators
    IEEE 9th International Symposium on Embedded Multicore/Many-core Systems-on-Chip (MCSoC-15), accepted, to be published in IEEE proceedings
    Publication (PDF, 0.6 MB)

    Abstract: The acceleration of Computer Vision algorithms is an important enabler to support the increasingly pervasive applications of the embedded vision domain. Heterogeneous systems featuring a clustered many-core accelerator are a very promising target for embedded vision workloads, but code optimization for these platforms is a challenging task. In this work we introduce ADRENALINE, a novel framework for fast prototyping and optimization of OpenVX applications for heterogeneous SoCs with many-core accelerators. ADRENALINE consists of an optimized OpenVX run-time system and a virtual platform, and it is intended to provide support to a wide range of end users. We highlight the benefits of this approach in different optimization contexts.

  • Jetmir Nehani, Davide Brunelli, Michele Magno, Lukas Sigrist, Luca Benini: "An Energy Neutral Wearable Camera with EPD Display", Proceedings of the 2015 Workshop on Wearable Systems and Applications, Florence, 2015
    Publication (PDF, 1.4 MB)

    Abstract: Wearable technologies are flooding the consumer electronics market, and people are now surrounded by an increasing number of "smart" objects to wear. The main issue that limits the success of these devices is limited battery lifetime. Energy-neutral operation, which does not require battery recharging or replacement (similar to automatic quartz watches), is highly desirable in this context. In this paper, we present the first energy-neutral wearable device equipped with an ultra-low-power camera and an electrophoretic display (EPD), supplied by a solar energy harvester. The novel design includes several hardware and software optimizations to achieve energy neutrality. In particular, we implemented innovative methods for displaying gray-scale images to obtain up to 9 gray-scale levels using a black-and-white display. This reduces energy consumption by 43.7% compared to the state of the art. Moreover, we implemented aggressive power management for the camera acquisition, which saves up to 91.4% of the energy needed to acquire an image. Experimental results in different scenarios demonstrate the advanced functionality and energy neutrality of the system, which can acquire and display up to 54 images per hour in an indoor scenario.

  • Mamaghanian, Hossein: "Compressed Sensing: A universal Energy-efficient Compression Scheme for Biosignals on Wireless Body Sensor Nodes". 2014, Thesis: ÉCOLE POLYTECHNIQUE FÉDÉRALE DE LAUSANNE.
    Publication (PDF, 3.1 MB)

    Within this thesis, we quantify the potential of the emerging compressed sensing (CS) paradigm for low-complexity and energy-efficient Electrocardiogram (ECG) sensing and data compression for storage or transmission, considering both software and hardware aspects. This thesis is the first work to present and fully investigate the potential of CS as an ultra-low power sensing/compression technique for ECG signals.

    Our results prove the suitability of CS as an ultra-low power compression technique for resource-limited WBSNs. They show that CS, when implemented as a digital compression technique, could outperform state-of-the-art ECG compression in terms of overall energy consumption. The need for fast and robust reconstruction algorithms inspired us to develop a new model-based reconstruction technique that fully leverages the prior information (beyond simple sparsity) from the underlying signal, improving the compression results for both single-lead and joint multi-lead ECG compression. Inspired by the promise of CS to merge sampling and compression and to remove a large part of the digital architecture, we have designed one of the first realizations of a CS-based A2I readout system, the spread spectrum random modulation pre-integrator (SRMPI). The design uses spread spectrum techniques prior to random modulation in order to produce the low-rate set of digital samples. The results show that the newly proposed architecture offers a compelling alternative, in particular for low-power and computationally constrained embedded systems. Building on our design, we have proposed a novel and promising design, a hybrid CS-based front-end tailored for signals with medium bandwidth, such as those targeted by our WBSN nodes.

    Finally, we review the effects of technology scaling on the design of low-cost processing integrated circuits for CS compression in WBSNs, and advocate the use of a novel robust CS technique to successfully recover the compressed data in the presence of unbounded error levels in ultra-low power memories due to aggressive voltage scaling. Moreover, this proposed technique achieves significant energy savings on WBSNs with respect to state-of-the-art designs in nano-scale technologies.
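As a rough illustration of the digital CS compression discussed in this thesis summary, the sensor-side step reduces to a matrix-vector product y = Φx that maps N samples to M < N measurements. The sketch below assumes a pseudo-random ±1 sensing matrix, a common textbook choice and not necessarily the matrix used in the thesis. All function names are hypothetical, and reconstruction on the receiver side is not shown.

```python
import random


def make_sensing_matrix(m, n, seed=0):
    """Build an m x n pseudo-random +/-1 matrix (assumed, illustrative choice)."""
    rng = random.Random(seed)  # fixed seed so a receiver could regenerate the matrix
    return [[rng.choice((-1, 1)) for _ in range(n)] for _ in range(m)]


def compress(window, phi):
    """Compute the measurements y = phi * window, one dot product per row."""
    return [sum(s * x for s, x in zip(row, window)) for row in phi]


def compression_ratio(m, n):
    """Percentage reduction in the number of values to store or transmit."""
    return 100.0 * (n - m) / n
```

For example, compressing a 16-sample window with a 4 x 16 matrix yields 4 measurements, a 75% reduction in the number of transmitted values.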

  • Venkata Rajesh Pamula, Marian Verhelst, Chris Van Hoof and Refet Firat Yazicioglu, “A novel feature extraction algorithm for on the sensor node processing of compressive sampled photoplethysmography signals”, in Proc. of IEEE Sensors Conference, Nov. 2015.

  • Venkata Rajesh Pamula, Jose Manuel Valero Sarmiento, Long Yan, Alper Bozkurt, Chris Van Hoof, Nick Van Helleputte, Refet Firat Yazicioglu and Marian Verhelst, “A 172uW Compressive Sampling Photoplethysmographic readout with embedded direct heart rate and variability extraction from compressively sampled data”, accepted for presentation at the International Solid State Circuits Conference (ISSCC), 2016.

  • Venkata Rajesh Pamula, Chris Van Hoof, Marian Verhelst and Refet Firat Yazicioglu, “A 17nA adaptive sampling controller for online data rate reduction in low power ECG systems”, submitted to IEEE Transactions on Biomedical Engineering (TBME), under review.