Publications
“Reducing Cache Coherence Traffic with Hierarchical Directory Cache and NUMA-Aware Runtime Scheduling”, Proceedings of the 2016 International Conference on Parallel Architectures and Compilation - PACT '16. Haifa, Israel, pp. 275-286, 2016.
“Runtime-Guided Mitigation of Manufacturing Variability in Power-Constrained Multi-Socket NUMA Nodes”, ICS '16 Proceedings of the 2016 International Conference on Supercomputing. Istanbul, Turkey, pp. 1-12, 2016.
“TaskPoint: Sampled simulation of task-based programs”, Performance Analysis of Systems and Software (ISPASS), 2016 IEEE International Symposium on. Uppsala, Sweden, pp. 296–306, 2016.
“DReAM: An Approach to Estimate per-Task DRAM Energy in Multicore Systems”, ACM Transactions on Design Automation of Electronic Systems, vol. 22. pp. 1-26, 2016.
“Evaluation of HPC Applications’ Memory Resource Consumption via Active Measurement”, IEEE Transactions on Parallel & Distributed Systems, vol. 27. IEEE Computer Society, Los Alamitos, CA, USA, pp. 2560-2573, 2016.
“Sensible Energy Accounting with Abstract Metering for Multicore Systems”, ACM Transactions on Architecture and Code Optimization (TACO), vol. 12. 11th International Conference on High-Performance Embedded Architectures and Compilers (HiPEAC), 2016.
“Task Scheduling Techniques for Asymmetric Multi-core Systems”, IEEE Transactions on Parallel and Distributed Systems. pp. 1-1, 2016.
“Thread Assignment in Multicore/Multithreaded Processors: A Statistical Approach”, IEEE Transactions on Computers, vol. 65, no. 1. pp. 256-269, 2016.
“Coherence Protocol for Transparent Management of Scratchpad Memories in Shared Memory Manycore Architectures”, Proceedings of the 42nd International Symposium on Computer Architecture (ISCA). ACM, Portland, Oregon, pp. 720-732, 2015.
“Contention-based Nonminimal Adaptive Routing in High-radix Networks”, 29th IEEE International Parallel & Distributed Processing Symposium. pp. 103-112, 2015.
“Exploiting Asynchrony from Exact Forward Recovery for DUE in Iterative Solvers”, SC'15 - Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis. ACM, Austin, Texas, pp. 53:1–53:12, 2015.
“Increasing Multicore System Efficiency through Intelligent Bandwidth Shifting”, International Symposium on High-Performance Computer Architecture (HPCA). pp. 39-50, 2015.
“Runtime-Aware Architectures”, Euro-Par 2015: Parallel Processing, Lecture Notes in Computer Science, vol. 9233. pp. 16-27, 2015.
“Runtime-Guided Management of Scratchpad Memories in Multicore Architectures”, 2015 International Conference on Parallel Architecture and Compilation (PACT). pp. 379-391, 2015.
“Throughput Unfairness in Dragonfly Networks under Realistic Traffic Patterns”, 2015 IEEE International Conference on Cluster Computing. pp. 801-808, 2015.
“VSR sort: A novel vectorised sorting algorithm & architecture extensions for future microprocessors”, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA). IEEE, Burlingame, CA, USA, pp. 26-38, 2015.
“On-the-fly adaptive routing for dragonfly interconnection networks”, The Journal of Supercomputing, vol. 71. Springer US, pp. 1116-1142, 2015.
“Picos: A hardware runtime architecture support for OmpSs”, Future Generation Computer Systems, vol. 53. pp. 130-139, 2015.
“Adaptive and application dependent runtime guided hardware prefetcher reconfiguration on the IBM POWER7”, 6th International Workshop on Adaptive Self-tuning Computing Systems. arXiv.org, Amsterdam, Netherlands, pp. 1–6, 2015.
“Evaluating the Impact of OpenMP 4.0 Extensions on Relevant Parallel Workloads”, International Workshop on OpenMP (IWOMP), Lecture Notes in Computer Science (LNCS). pp. 60-72, 2015.