Antonio Pena

- ANTONIO PENA
- Position: GROUP MANAGER
- Computer Sciences - Accelerators and Communications for High Performance Computing
- email: antonio [dot] pena [at] bsc [dot] es
- Tel: +34 934137734

Biography

Please see related websites and full CV in separate sections at the end of this webpage.

I am currently a Leading Researcher and Group Manager at the Barcelona Supercomputing Center (BSC), Computer Sciences Department, where I lead the "Accelerators and Communications for HPC" Group. I also hold a secondary appointment as Teaching and Research Staff at Universitat Politècnica de Catalunya. I am a Ramón y Cajal Fellow and former Marie Sklodowska-Curie Individual Fellow. I currently hold an ERC Consolidator Grant. Among other distinctions, I am a recipient of the 2023 Agustín de Betancourt y Molina Award from the Spanish Royal Academy of Engineering and a 2017 IEEE TCHPC Award for Excellence for Early Career Researchers in High Performance Computing, and I am an ACM/IEEE Senior Member. I am involved in the organization and steering committees of several conferences and workshops, such as SC, IEEE Cluster, and AsHES. My research interests lie in runtime systems and programming models for high performance computing, including resource heterogeneity and communications.

I was previously at Argonne National Laboratory, Mathematics and Computer Science Division, as a Postdoctoral Appointee (2012-2015). I drove the heterogeneous memory and accelerator computing areas of research within the Programming Models and Runtime Systems group led by Dr. Pavan Balaji, where I was the technical lead of the DMEM and VOCL projects. I was also part of the core MPICH R&D team.

I hold a BS + MS degree in Computer Engineering (2006), and MS and PhD degrees in Advanced Computer Systems (2010, 2013), from Universitat Jaume I of Castellón, Spain. I pursued my doctorate in a joint collaboration between the Universitat Jaume I of Castellón (Spain) and the Universitat Politècnica de València (Spain). My PhD dissertation, titled "GPU Virtualization for High Performance Clusters", was awarded the Cum Laude distinction and, more recently (Sep. 2015), the Extraordinary Doctoral Award from the Jaume I University. This work started the rCUDA project, of which I am the original developer and architect. Later, I acted as the Development Supervisor of the project.

Research

International Conferences

D. Huber, S. Iserte, M. Schreiber, A. J. Peña, and M. Schulz, “Bridging the Gap Between Genericity and Programmability of Dynamic Resources in HPC”, in ISC High Performance, Hamburg, Germany, June 2025.
S. Iserte, I. Martín-Álvarez, K. Rojek, J. I. Aliaga, M. Castillo, and A. J. Peña, “Towards the democratization and standardization of dynamic resources with MPI spawning,” in 15th International Conference on Parallel Processing & Applied Mathematics (PPAM), Ostrava, Czech Republic, Sep. 2024. Best Paper Award.
K. Matsumura, S. García de Gonzalo, and A. J. Peña, "A symbolic emulator for shuffle synthesis on the NVIDIA PTX code", in The 32nd ACM SIGPLAN International Conference on Compiler Construction (CC), Montréal, Canada, Feb. 2023.
M. Jordà, S. Rai, E. Ayguadé, J. Labarta, and A. J. Peña, "ecoHMEM: Improving object placement methodology for hybrid memory systems in HPC", in IEEE Cluster, Germany, Sep. 2022. Best Paper Finalist (Ranked 2nd).
K. Matsumura, S. Garcia de Gonzalo, and A. J. Peña, "JACC: An OpenACC runtime framework with kernel-level and multi-GPU parallelization", in 28th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), Bangalore, India, Dec. 2021.
L. Toledo, P. Valero-Lara, J. S. Vetter, and A. J. Peña, "Towards Enhancing Coding Productivity for GPU Programming Using Static Graphs", in 28th IEEE International Conference on High Performance Computing, Data, and Analytics (HiPC), Bangalore, India, Dec. 2021.
N. Guidotti, P. Ceyrat, J. Barreto, J. Monteiro, R. Rodrigues, R. Fonseca, X. Martorell, and A. J. Peña, “Particle-in-cell simulation using asynchronous tasking”, in 27th International European Conference on Parallel and Distributed Computing (Euro-Par), Lisbon, Portugal, Aug. 2021.
L. Toledo, A. J. Peña, S. Catalan, and P. Valero-Lara, “Tasking in accelerators: Performance evaluation”, in The 20th International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT), Gold Coast, Australia, Dec. 2019.
A. Farres, C. Rosas, M. Hanzich, M. Jordà, and A. J. Peña, “Performance evaluation of fully anisotropic elastic wave propagation on NVIDIA Volta GPUs”, in 81st EAGE Conference and Exhibition, June 2019.
P. Valero-Lara, R. Sirvent, A. J. Peña, X. Martorell, and J. Labarta, “MPI+OpenMP tasking scalability for the simulation of the human brain”, in 25th European MPI Users’ Group Meeting (EuroMPI). Barcelona, Spain, Sep. 2018.
K. Sala, J. Bellon, P. Farre, X. Teruel, J. M. Perez, A. J. Peña, D. Holmes, V. Beltran, and J. Labarta, “Improving the interoperability between MPI and task-based programming models”, in 25th European MPI Users’ Group Meeting (EuroMPI). Barcelona, Spain, Sep. 2018.
P. Valero-Lara, I. Martinez-Perez, R. Sirvent, X. Martorell, and A. J. Peña, "NVIDIA GPUs scalability to solve multiple (batch) tridiagonal systems. Implementation of cuThomasBatch", in 12th International Conference on Parallel Processing and Applied Mathematics (PPAM), Lublin, Poland, Sep. 2017.
H. Servat, A. J. Peña, G. Llort, E. Mercadal, H. C. Hoppe, and J. Labarta, "Automating the application data placement in hybrid memory systems", in IEEE Cluster, Hawaii, USA, Sep. 2017.
A. Castelló, S. Seo, R. Mayo, P. Balaji, E. S. Quintana-Ortí, and A. J. Peña, "GLT: A unified API for lightweight thread libraries", in 23rd International European Conference on Parallel and Distributed Computing (Euro-Par), Santiago de Compostela, Spain, Aug. 2017.
V. Garcia-Flores, E. Ayguade, and A. J. Peña, "Efficient data sharing on heterogeneous systems", in The 46th International Conference on Parallel Processing (ICPP), Bristol, UK, Aug. 2017.
A. Castelló, S. Seo, R. Mayo, P. Balaji, E. S. Quintana-Orti, and A. J. Peña, "GLTO: On the adequacy of lightweight thread approaches for OpenMP implementations", in The 46th International Conference on Parallel Processing (ICPP), Bristol, UK, Aug. 2017.
A. J. Peña, V. Beltran, C. Clauss, and T. Moschny, "Supporting automatic recovery in offloaded distributed programming models through MPI-3 techniques", in International Conference on Supercomputing (ICS), Chicago, USA, June 2017.
P. Valero-Lara, I. Martínez-Pérez, A. J. Peña, X. Martorell, R. Sirvent, and J. Labarta, "cuHinesBatch: Solving multiple Hines systems on GPUs. Human Brain Project", in International Conference on Computational Science (ICCS), Zurich, Switzerland, June 2017.
J. Gómez-Luna, I. El Hajj, L. Chang, V. Garcia-Flores, S. Garcia de Gonzalo, T. B. Jablin, A. J. Peña, and W. Hwu, "Chai: Collaborative heterogeneous applications for integrated-architectures", in IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), San Francisco, USA, Apr. 2017.
V. Garcia, J. Gomez-Luna, T. Grass, A. Rico, E. Ayguade, and A. J. Peña, "Evaluating the effect of last-level cache sharing on integrated GPU-CPU systems with heterogeneous applications", in IEEE International Symposium on Workload Characterization (IISWC), Rhode Island, USA, Sep. 2016.
A. Castelló, A. J. Peña, S. Seo, R. Mayo, P. Balaji, and E. S. Quintana-Orti, "A review of lightweight thread approaches for high performance computing", in IEEE Cluster, Taipei, Taiwan, Sep. 2016.
S. Ghosh, J. Hammond, A. J. Peña, P. Balaji, A. Gebremedhin, and B. Chapman, "One-sided interface for matrix operations using MPI-3 RMA: A case study with Elemental", in International Conference on Parallel Processing (ICPP), Philadelphia, PA, USA, Aug. 2016.
A. J. Peña, W. Bland, and P. Balaji, "VOCL-FT: Introducing techniques for efficient soft error coprocessor recovery", in The International Conference for High Performance Computing, Networking, Storage and Analysis (SC15), Austin, TX, USA, Nov. 2015.
A. Aji, A. J. Peña, P. Balaji, and W. Feng, "Automatic command queue scheduling for task-parallel workloads in OpenCL", in IEEE Cluster, Chicago, IL, USA, Sep. 2015.
A. Castelló, A. J. Peña, R. Mayo, P. Balaji, and E. S. Quintana-Ortí, "Exploring the suitability of remote GPGPU virtualization for the OpenACC programming model using rCUDA", in IEEE Cluster, Chicago, IL, USA, Sep. 2015.
M. Si, A. J. Peña, J. Hammond, P. Balaji, M. Takagi, and Y. Ishikawa, "Casper: An asynchronous progress model for MPI RMA on many-core architectures", in 29th IEEE International Parallel & Distributed Processing Symposium (IPDPS), Hyderabad, India, May 2015.
M. Si, A. J. Peña, J. Hammond, P. Balaji, and Y. Ishikawa, "Scaling NWChem with efficient and portable asynchronous communication in MPI RMA", in The 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Shenzhen, Guangdong, China, May 2015. Scale Challenge Finalist.
A. J. Peña and P. Balaji, "Toward the efficient use of multiple explicitly managed memory subsystems", in IEEE Cluster, Madrid, Spain, Sep. 2014.
M. Si, A. J. Peña, P. Balaji, M. Takagi, and Y. Ishikawa, "MT-MPI: Multithreaded MPI for many-core environments", in ACM International Conference on Supercomputing (ICS), Munich, Germany, June 2014.
A. Castelló, J. Duato, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, V. Roca, and F. Silla, "On the use of remote GPUs and low-power processors for the acceleration of scientific applications", in The Fourth International Conference on Smart Grids, Green Communications and IT Energy-aware Technologies (ENERGY), Chamonix, France, Apr. 2014. Best Paper.
A. J. Peña, R. G. Correa Carvalho, J. S. Dinan, P. Balaji, R. Thakur, and W. D. Gropp, “Analysis of topology-dependent MPI performance on Gemini networks”, in The Euro MPI Users’ Group Conference (EuroMPI), Madrid, Spain, Sep. 2013.
C. Reaño, F. Silla, R. Mayo, E. S. Quintana-Ortí, J. Duato, and A. J. Peña, "Influence of InfiniBand FDR on the performance of remote GPU virtualization", in IEEE Cluster, Indianapolis, IN, USA, Sep. 2013. Best Technical Paper.
A. J. Peña and S. Alam, “Evaluation of inter- and intra-node data transfer efficiencies between GPU devices and their impact on scalable applications”, in The 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pp. 144-151, Delft, The Netherlands, May 2013.
C. Reaño, A. J. Peña, F. Silla, J. Duato, R. Mayo, and E. S. Quintana-Ortí, “CU2rCU: towards the complete rCUDA remote GPU virtualization and sharing solution”, in Proceedings of the International Conference on High Performance Computing (HiPC), Pune, India, Dec. 2012.
S. Alam, J. Poznanovic, U. Varetto, N. Bianchi, A. J. Peña, N. Suvanphim, “Early experiences with the Cray XK6 hybrid CPU and GPU MPP platform”, in Cray User Group Conference (CUG), Stuttgart, Germany, Apr. 2012.
J. Duato, A. J. Peña, F. Silla, J. C. Fernández, R. Mayo, and E. S. Quintana, “Enabling CUDA acceleration within virtual machines using rCUDA”, in High Performance Computing Conference (HiPC), Bangalore, India, Dec. 2011.
J. Duato, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, and F. Silla, “Performance of CUDA virtualized remote GPUs in high performance clusters”, in International Conference on Parallel Processing (ICPP), pp. 365-374, Taipei, Taiwan, Sep. 2011.

International Journals

A. Tarraf, M. Schreiber, A. Cascajo, J. B. Besnard, M. A. Vef, D. Huber, S. Happ, A. Brinkmann, D. E. Singh, H. C. Hoppe, A. Miranda, A. J. Peña, R. Machado, M. Garcia-Gasulla, M. Schulz, P. Carpenter, S. Pickartz, T. Rotaru, S. Iserte, V. Lopez, J. Ejarque, H. Sirwani, J. Carretero, F. Wolf, "Malleability in modern HPC systems: Current experiences, challenges, and future opportunities", Transactions on Parallel and Distributed Systems (TPDS), IEEE, vol. 35, no. 6, May 2024.
G. Lloret-Talavera, M. Jordà, H. Servat, F. Boemer, C. Chauhan, S. Tomishima, N. N. Shah, and A. J. Peña, “Enabling homomorphically encrypted inference for large DNN models”, Transactions on Computers (TC), IEEE, vol. 71, no. 5, May 2022.
L. Toledo, P. Valero-Lara, J. S. Vetter, and A. J. Peña, "Towards Enhancing Coding Productivity for GPU Programming Using Static Graphs", Electronics, MDPI, vol. 11, no. 9, Apr. 2022.
M. Jordà, P. Valero-Lara, and A. J. Peña, “cuConv: A CUDA implementation of convolution for CNN inference”, Cluster Computing, Springer, Jan. 2022.
S. Iserte, R. Mayo, E. S. Quintana-Orti, and A. J. Peña, "DMRlib: Easy-coding and efficient resource management for job malleability", Transactions on Computers, IEEE, vol. 70, no. 9, Sep. 2021.
A. Castelló, R. Mayo, S. Seo, P. Balaji, E. S. Quintana-Ortí, and A. J. Peña, "Analysis of threading libraries for high performance computing", Transactions on Computers, IEEE, vol. 69, no. 9, Sep. 2020.
A. J. Peña and M. Si, "Guest editorial: Special issue on applications and system software for hybrid exascale systems", Parallel Computing, Elsevier, vol. 91, Mar. 2020.
S. Iserte, H. Martínez, S. Barrachina, M. Castillo, R. Mayo, and A. J. Peña, "Dynamic reconfiguration of noniterative scientific applications: A case study with HPG Aligner", The International Journal of High Performance Computing Applications (IJHPCA), SAGE, vol. 33, no. 5, pp. 804-816, Sep. 2019.
K. Sala, X. Teruel, J. M. Perez, A. J. Peña, V. Beltran, and J. Labarta, “Integrating blocking and non-blocking MPI primitives with task-based programming models”, Parallel Computing, Elsevier, vol. 85, pp. 153-166, Jul. 2019.
M. Jorda, P. Valero-Lara, and A. J. Peña, "Performance evaluation of cuDNN convolution algorithms on NVIDIA Volta GPUs", Access, IEEE, vol. 7, pp. 70461-70473, May 2019.
P. Valero-Lara, R. Sirvent, A. J. Peña, and J. Labarta, “MPI+OpenMP tasking scalability for multi-morphology simulations of the human brain”, Parallel Computing, Elsevier, vol. 84, pp. 50-61, May 2019.
P. Valero-Lara, I. Martínez-Pérez, R. Sirvent, X. Martorell, and A. J. Peña, “cuThomasBatch and cuThomasVBatch CUDA routines to compute batch of tridiagonal systems on NVIDIA GPUs”, Concurrency and Computation: Practice and Experience, Wiley, vol. 30, no. 24, Dec. 2018.
P. Valero-Lara, I. Martínez-Pérez, R. Sirvent, A. J. Peña, X. Martorell, and J. Labarta, "Simulating the behavior of the human brain on GPUs", Oil & Gas Science and Technology - Revue d'IFP Energies Nouvelles, vol. 73, no. 63, Nov. 2018.
A. Castelló, A. J. Peña, R. Mayo, J. Planas, E. S. Quintana-Ortí, and P. Balaji, "Exploring the interoperability of remote GPGPU virtualization using rCUDA and directive-based programming models", Journal of Supercomputing, Springer, Nov. 2018, vol. 74, no. 11, pp. 5628–5642.
S. Iserte, R. Mayo, E. S. Quintana-Orti, V. Beltran, and A. J. Peña, "DMR API: Improving cluster productivity by turning applications into malleable", Parallel Computing, Elsevier, vol. 78, pp. 54-66, Oct. 2018.
H. Servat, J. Labarta, H. C. Hoppe, J. Giménez, and A. J. Peña, "Understanding memory access patterns using the BSC performance tools", Parallel Computing, Elsevier, vol. 78, pp. 1-14, Oct. 2018.
M. Si, A. J. Peña, J. Hammond, P. Balaji, M. Takagi, and Y. Ishikawa, “Dynamic adaptable asynchronous progress model for MPI RMA multiphase applications”, Transactions on Parallel and Distributed Systems (TPDS), IEEE Computer Society, vol. 29, no. 9, pp. 1975-1989, Sep. 2018.
S. Chandrasekaran and A. J. Peña, "Special issue on applications for the heterogeneous computing era 2017", Parallel Computing, Elsevier, vol. 77, pp. 125-127, Sep. 2018. Editorial.
A. Castelló, R. Mayo, K. Sala, V. Beltran, P. Balaji, and A. J. Peña, "On the adequacy of lightweight thread approaches for high-level parallel programming models", Future Generation Computer Systems, Elsevier, vol. 84, July 2018.
S. Chandrasekaran and A. J. Peña, "Special issue on topics on heterogeneous computing", Parallel Computing, Elsevier, vol. 68, pp. 1-2, Oct. 2017. Editorial.
A. M. Aji, A. J. Peña, P. Balaji, and W. Feng, "MultiCL: Enabling automatic scheduling for task-parallel workloads in OpenCL", Parallel Computing, Elsevier, vol. 58, pp. 37-55, Oct. 2016.
A. J. Peña and P. Balaji, "A data-oriented profiler to assist in data partitioning and distribution for heterogeneous memory in HPC", Parallel Computing, Elsevier, vol. 51, pp. 46-55, Jan. 2016.
C. Reaño, F. Silla, A. Castelló, A. J. Peña, R. Mayo, E. S. Quintana-Ortí, and J. Duato, "Improving the user experience of the rCUDA remote GPU virtualization framework", Concurrency and Computation: Practice and Experience, Wiley, vol. 27, no. 14, pp. 3749-3770, Sep. 2015.
A. J. Peña, C. Reaño, F. Silla, R. Mayo, E. S. Quintana-Ortí, and J. Duato, "A complete and efficient CUDA-sharing solution for HPC clusters", Parallel Computing, Elsevier, vol. 40, no. 10, pp. 574-588, Dec. 2014.

International Workshops

K. Matsumura, S. Garcia de Gonzalo, and A. J. Peña, "ACC Saturator: Automatic kernel optimization for directive-based GPU code", in Eleventh Workshop on Accelerator Programming and Directives (WACCPD), Atlanta, GA, USA, Nov. 2024.
S. Iserte, V. Lopez, M. Garcia-Gasulla, and A. J. Peña, “Parallel efficiency-aware standard MPI-based malleability,” in Euro-par Workshops Proceedings, Madrid, Spain, Aug. 2024.
M. Usman, S. Iserte, R. Ferrer, and A. J. Peña, “DPU offloading programming with the OpenMP API”, in The Ninth Workshop on the LLVM Compiler Infrastructure in HPC (LLVM-HPC), Denver, CO, USA, Nov. 2023.
O. Korakitis, S. García de Gonzalo, N. Guidotti, J. Barreto, J. Monteiro, and A. J. Peña, "OmpSs-2 and OpenACC interoperation", in Ninth Workshop on Accelerator Programming Using Directives (WACCPD), Dallas, TX, USA, Nov. 2022.
S. Rivas-Gomez, A. J. Peña, D. Moloney, E. Laure, and S. Markidis, “Exploring the Vision Processing Unit as co-processor for inference”, in The Eighth International Workshop on Accelerators and Hybrid Exascale Systems (AsHES), Vancouver, Canada, May 2018.
M. Jordà, P. Valero-Lara, and A. J. Peña, "Convolutional Deep Learning (cuDNN) on NVIDIA GPUs", in Workshop on Optimization and Learning: Challenges and Applications (OLA), Alicante, Spain, Feb. 2018.
S. Iserte, R. Mayo, E. S. Quintana-Orti, V. Beltran, and A. J. Peña, "Efficient scalable computing through flexible applications and adaptive workloads", in Tenth International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), Bristol, UK, Aug. 2017.
H. Servat, J. Labarta, H. Hoppe, J. Gimenez, and A. J. Peña, "Integrating memory perspective into the BSC performance tools", in Tenth International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), Bristol, UK, Aug. 2017.
S. Iserte, A. J. Peña, R. Mayo, E. S. Quintana-Ortí, and V. Beltrán, "Dynamic management of resource allocation for OmpSs jobs", in PhD Symposium on Sustainable Ultrascale Computing Systems (NESUS PhD), Timisoara, Romania, Feb. 2016.
A. J. Peña and P. Balaji, "A framework for tracking memory accesses in scientific applications", in 43rd International Conference on Parallel Processing Workshops (ICPP-W), Minneapolis, MN, USA, Sep. 2014.
J. Duato, A. J. Peña, F. Silla, R. Mayo, and E. S. Quintana-Ortí, “rCUDA: reducing the number of GPU-based accelerators in high performance clusters”, in Proceedings of the International Conference on High Performance Computing and Simulation (HPCS), Caen, France, June 2010.
J. Duato, F. D. Igual, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, and F. Silla, “An efficient implementation of GPU virtualization in high performance clusters”, in Euro-Par 2009, Parallel Processing – Workshops, 6043, pp. 385-394, Lecture Notes in Computer Science, Springer-Verlag, 2010.
J. Duato, A. J. Peña, F. Silla, R. Mayo, and E. S. Quintana-Ortí, “Modeling the CUDA remoting virtualization behaviour in high performance networks”, in Workshop on Language, Compiler, and Architecture Support for GPGPU (LCA-GPGPU-I), Bangalore, India, Jan. 2010.
M. F. Dolz, J. C. Fernández, E. S. Quintana-Ortí, R. Mayo, and A. J. Peña, “Research line on power-aware computing by the High Performance Computing and Architectures Group”, in COST Action IC0804 on Energy Efficiency in Large Scale Distributed Systems, pp. 32-36, Toulouse, France, Nov. 2009.

International Posters

P. F. Dutot, J. Fecht, K. Gaddameedi, D. Huber, S. Iserte, M. Minion, M. Schulz, M. Schreiber, V. Schüller, A. J. Peña, and O. Richard, “A Layered Approach for Dynamic Resource Management in HPC”, in 30th International European Conference on Parallel and Distributed Computing (Euro-Par), Madrid, Spain, Aug. 2024.
P. F. Dutot, J. Fecht, K. Gaddameedi, D. Huber, S. Iserte, M. Minion, M. Schulz, M. Schreiber, V. Schüller, A. J. Peña, and O. Richard, “Leveraging dynamic resource management in HPC”, in ISC High Performance, Hamburg, Germany, May 2024.
M. Usman, S. Iserte, R. Ferrer, and A. J. Peña, “OpenMP offloading to DPU”, in IEEE Cluster, Santa Fe, NM, Oct. 2023.
O. Korakitis, S. Garcia de Gonzalo, N. Guidotti, J. Barreto, J. Monteiro, and A. J. Peña, "Towards OmpSs-2 and OpenACC interoperation", in Principles and Practice of Parallel Programming (PPoPP), Seoul, Korea, Apr. 2022.
A. J. Peña, "EPEEC: Productivity at exascale", in ISC High Performance, Frankfurt, Germany, June 2019.
S. Iserte, A. J. Peña, R. Mayo, “Productivity-enhancing malleability for HPC applications”, in The 27th Int. Conf. on Parallel Architectures and Compilation Techniques (PACT), Cyprus, Nov. 2018.
K. Kim, A. J. Peña, P. Carpenter, P. Petrakis, M. Ploumidis, M. Marazakis, Y. Guo, K. Raffenetti, and P. Balaji, “Toward developing a Unimem OFI provider for MPI support”, in 25th European MPI Users' Group Meeting (EuroMPI), Barcelona, Spain, Sep. 2018.
S. Iserte, H. Martinez, S. Barrachina, M. Castillo, R. Mayo, E. S. Quintana-Orti, and A. J. Peña, “MPI malleability integration into a bioinformatics tool”, in 25th European MPI Users' Group Meeting (EuroMPI), Barcelona, Spain, Sep. 2018.
P. Valero-Lara, I. Martinez-Perez, A. J. Peña, X. Martorell, R. Sirvent, and J. Labarta, "Simulating the behavior of the human brain on NVIDIA GPUs (Human Brain Project)", in GPU Technology Conference (GTC), Silicon Valley, USA, May 2017.
V. García, J. Gómez-Luna, T. Grass, A. Rico, A. J. Peña, and E. Ayguadé, "Analyzing the effect of last level cache sharing on integrated platforms with fine-grain CPU-GPU collaboration", in GPU Technology Conference Europe (GTC Europe), Amsterdam, Netherlands, Sep. 2016.
A. Castelló, A. J. Peña, S. Seo, R. Mayo, P. Balaji, and E. S. Quintana-Ortí, “On the use of lightweight threads”, in Advanced Computer Architectures and Compilation for Embedded Systems (ACACES), pp. 83-86, HiPEAC Network of Excellence, Fiuggi, Italy, July 2016.
A. J. Peña and P. Balaji, "Understanding data access patterns using object-differentiated memory profiling", in The 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Shenzhen, Guangdong, China, May 2015.
K. Raffenetti, A. J. Peña, and P. Balaji, "Toward implementing robust support for Portals 4 networks in MPICH", in The 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), Shenzhen, Guangdong, China, May 2015.
C. Reaño, F. Silla, A. J. Peña, G. Shainer, S. Schultz, A. Castelló, E. S. Quintana-Ortí, and J. Duato, "Boosting the performance of remote GPU virtualization using InfiniBand Connect-IB and PCIe 3.0", in IEEE Cluster, Madrid, Spain, Sep. 2014.
J. Duato, A. J. Peña, F. Silla, R. Mayo, and E. S. Quintana-Ortí, “rCUDA InfiniBand performance”, in International Supercomputing Conference (ISC), Hamburg, Germany, June 2011.
J. Duato, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, and F. Silla, “Network influence on rCUDA”, in Advanced Computer Architectures and Compilation for Embedded Systems (ACACES), pp. 9-12, HiPEAC Network of Excellence, Terrassa (Barcelona), Spain, July 2010.
J. Duato, F. D. Igual, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, and F. Silla, “Virtualized remote GPUs”, in Advanced Computer Architectures and Compilation for Embedded Systems (ACACES), pp. 221-224, HiPEAC Network of Excellence, Terrassa (Barcelona), Spain, July 2009.
A. J. Peña, and J. Fabregat, “A robust bolid and fireball detection algorithm for all-sky sequential images”, in Meteoroids, Barcelona, Spain, June 2007.

International Oral Communications

A. J. Peña, L. Martens, P. Mehta, Z. Pindado, and T. Spendlhofer, “Private deep neural network inference engines with homomorphic encryption”, in 5th Workshop on Artificial Intelligence and Cryptography (AICRYPT), Madrid, Spain, May 2025.
A. J. Peña, “ecoHMEM: Automatic data-placement in heterogeneous memory systems”, in 16th Joint Laboratory for Exascale Computing (JLESC) Workshop, Kobe, Japan, Apr. 2024.
A. J. Peña, “HomE: Enabling homomorphic encryption of DL, a (recently started) ERC Consolidator Grant”, in 15th Joint Laboratory for Exascale Computing (JLESC) Workshop, Bordeaux, France, Mar. 2023.
H. Elshazly and A. J. Peña, “Seamless heterogeneous memory management via the ecoHMEM methodology”, in 15th Joint Laboratory for Exascale Computing (JLESC) Workshop, Bordeaux, France, Mar. 2023.
A. J. Peña and S. Iserte, “Dynamic resources in MPI”, in 15th Joint Laboratory for Exascale Computing (JLESC) Workshop, Bordeaux, France, Mar. 2023.
M. Usman, A. J. Peña and S. Iserte, “DPU offloading with OpenMP programming model”, in 15th Joint Laboratory for Exascale Computing (JLESC) Workshop, Bordeaux, France, Mar. 2023.
A. J. Peña, "EPEEC: Productivity at exascale", in "Communications of the ACM Europe Region Special Section Virtual Workshop", Virtual, Aug. 2021.
K. Matsumura, S. Garcia de Gonzalo, and A. J. Peña, “Wrapping up existing OpenACC compilers for runtime extension”, in PhD Forum, ISC High Performance, Virtual, June 2021.
G. Lloret-Talavera, M. Jordà, H. Servat, F. Boemer, C. Chauhan, S. Tomishima, N. N. Shah, and A. J. Peña (BSC), "Optane PMem as an enabler for large DNN models with homomorphic encryption", in 2nd Workshop on Heterogeneous Memory Systems (HMEM), Virtual, June 2021.
L. Toledo, P. Valero-Lara, and A. J. Peña, “Accelerating machine learning applications using CUDA Graph and OpenACC”, in GPU Technology Conference, Virtual. Apr. 2021.
A. J. Peña, “Results and lessons learned by EPEEC”, in PROHEXA: Programming environments and models for improved productivity for heterogeneous Exascale Computing Systems Workshop, Virtual. Jan. 2021.
A. J. Peña, “Overview of EPEEC: European joint Effort toward a Highly Productive Programming Environment for Heterogeneous Exascale Computing”, in PROHEXA: Programming environments and models for improved productivity for heterogeneous Exascale Computing Systems Workshop, Virtual. Jan. 2021.
G. Lloret-Talavera, M. Jorda, H. Servat, F. Boemer, C. Chauhan, S. Tomishima, N. N. Shah, and A. J. Peña, “Optane PMem as an enabler for large DNN models with homomorphic encryption”, in Intel Extreme Performance Users Group (IXPUG) Annual Conference, Virtual. Oct. 2020.
A. J. Peña, "A software ecosystem to save money in DRAM and increase performance with Optane DIMMs", in Intel HPC + AI Pavilion, online, June 2019.
M. Jordà, H. Servat, J. Labarta, and A. J. Peña, "Easing the use of Optane DIMMs as part of heterogeneous memory systems", in Intel HPC Developer Conference, Denver, CO, USA, Nov. 2019.
M. Jordà, H. Servat, and A. J. Peña, “Toward easing the use of Optane DIMMs as part of heterogeneous memory systems”, in Intel Extreme Performance Users Group (IXPUG) Annual Conference, Geneva, Switzerland. Sep. 2019.
M. Jorda, P. Valero-Lara, and A. J. Peña, “Improved Convolution Implementations on NVIDIA GPUs”, in 9th Joint Laboratory for Extreme-Scale Computing Workshop (JLESC), Knoxville, TN, USA, Apr. 2019.
P. Valero-Lara and A. J. Peña, “Filling the performance gap in convolution implementations for NVIDIA GPUs”, in GPU Technology Conference (GTC), Silicon Valley, USA. Mar. 2019.
P. Valero-Lara, I. Martínez-Pérez, A. J. Peña, X. Martorell, R. Sirvent, and J. Labarta. "cuHinesBatch: Solving multiple Hines systems on GPUs", in 2nd HBP Student Conference (HBPSC), Ljubljana, Slovenia, Feb. 2018.
A. J. Peña, “Collaboration opportunities with the Accelerators and Communications for HPC Team at BSC”, in 8th Joint Laboratory for Extreme-Scale Computing Workshop (JLESC), Barcelona, Spain, Apr. 2018.
A. J. Peña et al., “Use of the Folding profiler to assist on data distribution for heterogeneous memory systems”, in 8th Joint Laboratory for Extreme-Scale Computing Workshop (JLESC), Barcelona, Spain, Apr. 2018.
A. J. Peña, “MultiGPU made easy by OmpSs + CUDA/OpenACC”, in GPU Technology Conference (GTC), Silicon Valley, USA. Mar. 2018.
A. J. Peña and F. Mantovani, “Automatic frequency scaling for embedded coprocessor acceleration”, in Workshop on Heterogeneous and Low-Power Data Center technologies (HeLP-DC), Manchester, UK, Jan. 2018.
A. J. Peña, H. Servat, et al., "Use of the Folding profiler to assist on data distribution for heterogeneous memory systems", in 7th Joint Laboratory for Extreme-Scale Computing Workshop (JLESC), Urbana, USA, June 2017.
A. J. Peña, H. Servat, G. Llort, J. Giménez, J. Labarta, “Use of the Folding profiler to assist on data distribution for heterogeneous memory systems”, in 6th Joint Laboratory for Extreme-Scale Computing Workshop (JLESC), Kobe, Japan, Dec. 2016.
A. J. Peña and L. Oden, "Data distribution approaches for heterogeneous memory systems", in 5th Joint Laboratory for Extreme-Scale Computing Workshop (JLESC), Lyon, France, June 2016.
A. J. Peña and H. Servat, "Data placement on heterogeneous memory systems in HPC", in 4th Joint Laboratory for Extreme-Scale Computing Workshop (JLESC), Bonn, Germany, Nov. 2015.
H. Servat and A. J. Peña, "Study the use of the Folding hardware-based profiler to assist on data distribution for heterogeneous memory systems in HPC", in 3rd Joint Laboratory for Extreme-Scale Computing Workshop (JLESC), Barcelona, Spain, June 2015.
A. J. Peña and P. Balaji, "The upcoming era of memory heterogeneity in compute nodes", in 2nd Joint Laboratory for Extreme-Scale Computing Workshop (JLESC), Chicago, IL, Nov. 2014.
A. J. Peña, “Virtualization of accelerators in high performance clusters”, in The International Conference for High Performance Computing, Networking, Storage and Analysis (SC), Dissertation Research Showcase, Salt Lake City, UT, Nov. 2012.

Keynotes

“Toward Extreme Heterogeneity… but how?”. The 1st International Workshop on Extreme Heterogeneity Solutions (ExHET), Seoul, South Korea, Apr. 2022.
"Using heterogeneous memory systems", in Córdoba HiPerNav Week, Córdoba, Spain, Sep. 2018.
“The heterogeneity challenge”, in Heterogeneity Alliance: Better Together, HiPEAC 2018 Conference, Manchester, UK, Jan. 2018.
"The nightmare and power of heterogeneity in HPC", in Tenth International Workshop on Parallel Programming Models and Systems Software for High-End Computing (P2S2), Bristol, UK, Aug. 2017.

Panels

"Runtimes and Workflow Systems for Extreme Heterogeneity: Challenges and Opportunities”, in The International Conference for High Performance Computing, Networking, Storage and Analysis (SC23). Denver, CO, USA, Nov. 2023.
"Memory Heterogeneity in High Performance Computing", in The International Conference for High Performance Computing, Networking, Storage, and Analysis (SC22), Dallas, TX, USA, Nov. 2022.
“Personal Experience, Tips & Tricks”, in Consolidator Grants 2022 – coaching session, Agency for Management of University and Research Grants (AGAUR), Generalitat de Catalunya, Sep. 2022.
"PROCESS Panel", in Workshop on Platform-driven e-infrastructure innovations, Amsterdam, The Netherlands, Oct. 2018.

Birds of a Feather

"Latest MPICH works at BSC (To be BSC-MPI)", in MPICH: A high-performance open-source MPI implementation. The International Conference for High Performance Computing, Networking, Storage and Analysis (SC19), Denver, CO, USA, Nov. 2019.
“A novel tool for analysing benefits of data placement in multi-level memory hierarchies”, in Multi-Level Memory and Storage for HPC, Data Analytics & AI. ISC High Performance, Frankfurt, Germany, June 2019.

Invited Talks in Institutions

“Highly Productive Programming Environment for Heterogeneous Exascale Computing (EPEEC)”. Science and Technology Facilities Council (STFC), UK Research and Innovation (UKRI). Feb. 2022.
"Supercomputación: qué es, para qué sirve y cómo la avanzamos desde la investigación en el BSC-CNS", Universidade da Coruña, Spain, Apr. 2019.
"Programming models and heterogeneity in HPC", Foundation for Research and Technology - Hellas (FORTH), Heraklion, Greece, Nov. 2017.
"Virtualization of accelerators in high performance clusters", Argonne National Laboratory, Argonne, IL, USA, Oct. 2012.

Invited Talks in Conferences, Workshops, and Symposia

A. J. Peña. “Two use cases for large CXL memories at BSC: ecoHMEM and HomE”, in CXL Forum at ISC 2023, Hamburg, Germany, May 2023.
A. J. Peña. “Update on BSC's MareNostrum5 and OpenMP efforts for DPUs”, in 13th Annual Swiss Conference & HPCXXL User Group, Lugano, Switzerland, Apr. 2023.
A. J. Peña. “ecoHMEM: Improving Object Placement Methodology for Hybrid Memory Systems in HPC”, in Huawei Annual Compute Architecture Innovation Summit, Israel, Nov. 2022.
A. J. Peña. "EPEEC: The European Joint Effort Towards Highly Productive Programming for Heterogeneous HPC", in Targeting Future Exascale and Extreme Heterogeneity Era Minisymposium, SIAM Conference on Parallel Processing for Scientific Computing (PP22), Seattle, WA, US, Feb. 2022.
A. J. Peña. "EPEEC: Europe toward high coding productivity for Exascale", in 27th International European Conference on Parallel and Distributed Computing (Euro-Par), Online, Sep. 2021.
A. J. Peña. "HPC & resource heterogeneity", in Convergence of Big-Data Analytics, Cloud, and High-performance Computing with EVOLVE, HiPEAC Computer Systems Week Autumn 2019, Bilbao, Spain, Oct. 2019.
A. J. Peña. “The H2020 EPEEC Vision to Programming Productivity at Exascale”, in A Structured Approach for Programming Extreme Scale and Heterogeneous Systems Workshop, Zurich, Switzerland, Sep. 2019.
A. J. Peña. “EPEEC: Productivity at Exascale”, in Workshop on Platform-driven e-infrastructure innovations, Amsterdam, The Netherlands, Oct. 2018.
A. J. Peña, "Best GPU code practices combining OpenACC, CUDA, and OmpSs", in Workshop on Open Source Supercomputing (OpenSuCo), Frankfurt, Germany, June 2017.
A. J. Peña, “Toward heterogeneous memory systems for HPC”, in Enhancing Software Development for Emerging Platforms Using Algorithms and Performance Tools Minisymposium, SIAM CSE, Salt Lake City, UT, USA, Mar. 2015.
A. J. Peña and R. Mayo, “rCUDA 4: GPGPU as a service in HPC clusters”, in HPC Advisory Council Spain Conference, Málaga, Spain, Sep. 2012.
F. Silla and A. J. Peña, “rCUDA, an approach to provide remote access to GPU computational power”, in HPC Advisory Council Switzerland Conference, Lugano, Switzerland, Mar. 2012.

Invited Talks in Exhibition Booths

"EPEEC: European joint Effort toward a Highly Productive Programming Environment for Heterogeneous Exascale Computing", in PRACE Booth at ISC 2021 Digital, June 2021.
“Breaking the DRAM size wall for DNN inference and homomorphic encryption”, in Intel Vendor Use-Case, ISC High Performance, Virtual, June 2021.
"A tool for best use of 3D XPoint technology for HPC and AI compelling workloads", in Intel Booth, ISC High Performance, Frankfurt, Germany, June 2019.

Spanish Conferences

S. Iserte, V. Lopez, M. Garcia-Gasulla, and A. J. Peña, "Maleabilidad MPI basada en la eficiencia paralela", in XXXIII Jornadas SARTECO, Ciudad Real, Spain, Sep. 2023.
A. Castelló, R. Mayo, S. Seo, P. Balaji, E. S. Quintana-Ortí, and A. J. Peña, “GLTO: Una implementación de OpenMP sobre hilos ligeros”, in XXIX Jornadas de Paralelismo, Teruel, Spain, Sep. 2018.
S. Iserte, R. Mayo, E. S. Quintana-Ortí, V. Beltran, and A. J. Peña, "El camino desde la maleabilidad MPI hasta las cargas de trabajo adaptativas", in XXVIII Jornadas de Paralelismo. Málaga, Spain, Sep. 2017.
A. Castelló, J. Duato, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, V. Roca, and F. Silla, "Acelerando aplicaciones científicas con GPUs remotas y procesadores de bajo consumo", in XXV Jornadas de Paralelismo. Valladolid, Spain, Sep. 2014.
S. Iserte, A. Castelló, A. J. Peña, C. Reaño, J. Prades, F. Silla, R. Mayo, E. S. Quintana-Ortí, and J. Duato, "Extendiendo SLURM con soporte para el uso de GPUs remotas", in XXV Jornadas de Paralelismo. Valladolid, Spain, Sep. 2014.
C. Reaño, A. Castelló, S. Iserte, A. J. Peña, F. Silla, R. Mayo, E. S. Quintana-Ortí, and J. Duato, “Virtualización remota de GPUs: evaluación de soluciones disponibles para CUDA”, in XXIV Jornadas de Paralelismo. Madrid, Spain, Sep. 2013.
S. Iserte, A. Castelló, C. Reaño, A. J. Peña, F. Silla, R. Mayo, E. S. Quintana-Ortí, and J. Duato, “Un planificador de GPUs remotas en clusters HPC”, in XXIV Jornadas de Paralelismo. Madrid, Spain, Sep. 2013.
C. Reaño, A. J. Peña, F. Silla, J. Duato, R. Mayo, and E. S. Quintana-Ortí, “CU2rCU: a CUDA-to-rCUDA converter”, in XXIII Jornadas de Paralelismo, pp. 44-49. Elche, Spain, Sep. 2012.
J. Duato, A. J. Peña, F. Silla, J. C. Fernández, R. Mayo, and E. S. Quintana-Ortí, “A new approach to rCUDA”, XXII Jornadas de Paralelismo, pp. 305-310, La Laguna, Spain, Sep. 2011.
C. Reaño, A. J. Peña, F. Silla, R. Mayo, E. S. Quintana-Ortí, and J. Duato, “rCUDA: Uso concurrente de dispositivos compatibles con CUDA de forma remota. Adaptación a CUDA 4”, in XXII Jornadas de Paralelismo, pp. 311-316, La Laguna, Spain, Sep. 2011.
J. Duato, A. J. Peña, F. Silla, R. Mayo, and E. S. Quintana-Ortí, “rCUDA: a framework to perform remote CUDA calls”, in XXI Jornadas de Paralelismo, pp. 519-526, Valencia, Spain, Sep. 2010.
J. Duato, F. D. Igual, R. Mayo, A. J. Peña, E. S. Quintana-Ortí, and F. Silla, “CUDA remoto para clusters de altas prestaciones”, in II Workshop en Aplicaciones de Nuevas Arquitecturas de Consumo y Altas Prestaciones (ANACAP), Móstoles (Madrid), Spain, Nov. 2009.
J. Duato, A. J. Peña, F. Silla, F. D. Igual, R. Mayo, and E. S. Quintana-Ortí, “Accelerating computing through virtualized remote GPUs”, in XX Jornadas de Paralelismo, pp. 635-639, A Coruña, Spain, Sep. 2009.
A. J. Peña, J. M. Claver, A. Sanjuan, and V. Arnau, “Análisis paralelo de secuencias de ADN mediante el uso de GPU y CUDA”, in Workshop de Aplicaciones de Nuevas Arquitecturas de Consumo y Altas Prestaciones (ANACAP), Móstoles (Madrid), Spain, Nov. 2008.
R. Rodríguez, J. M. Claver, G. Fernández, A. J. Peña, and J. L. Sánchez, “Aceleración de la estimación de movimiento en la codificación H.264/AVC mediante GPUs”, in Workshop de Aplicaciones de Nuevas Arquitecturas de Consumo y Altas Prestaciones (ANACAP), Móstoles (Madrid), Spain, Nov. 2008.

Main Research Lines

Memberships

Sr. Member of IEEE since July 2021. Member since 2013. IEEE TCSC member since 2014. IEEE Comp. Society member since 2016.
Sr. Member of ACM since Apr. 2021. Member since 2014.
Member of Marie Curie Alumni Association since Mar. 2021.
Full member of HiPEAC since Mar. 2017.

Teaching

Computer Structure (Labs). B.S. Computer Science. Technical University of Catalonia, Spain. Since 2016.
