PhD, Postdoctoral and Internship opportunities
I am looking for excellent researchers and students at various levels (postdocs, graduates, and undergraduates) to work with me on the projects listed below. In general, a strong background in computer science, good C/C++/Python knowledge, and experience in software development are required. Most of our research is done on bleeding-edge experimental testbeds and the largest supercomputers in the world, such as Aurora. Please contact me for more information.
Current openings: Postdoctoral Appointee - DataStates (Requisition Number: 408686)
DataStates is a data model in which users do not interact with a data service directly to read/write datasets but rather tag datasets with properties expressing hints, constraints, and persistency semantics. Tagging automatically adds snapshots (called data states) to the lineage, a history recording the evolution of all snapshots, using an optimal I/O plan. Such an approach has several advantages: (1) it eliminates the need to deal with complex heterogeneous storage stacks at large scale, shifting the focus to the meaning and properties of the data instead; (2) it brings an incentive to collaborate more and to verify and understand results more thoroughly by sharing and analyzing intermediate results; (3) it encourages the development of new algorithms and ideas that reuse and revisit intermediate and historical data frequently. Such capabilities are particularly important to facilitate quicker advances at the intersection of HPC, artificial intelligence, and big data analytics.
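To make the tagging idea concrete, here is a minimal in-memory sketch in Python. It is not the DataStates API: the names `DataState`, `Lineage`, `tag`, `find`, and `history`, as well as the example properties `epoch` and `persist`, are hypothetical and chosen only for illustration. The sketch also ignores I/O planning entirely and simply keeps every snapshot in memory.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class DataState:
    """An immutable snapshot of a dataset plus the properties it was tagged with."""
    state_id: int
    parent_id: Optional[int]
    payload: bytes
    properties: Dict[str, Any]


@dataclass
class Lineage:
    """Hypothetical lineage: an append-only history of data states."""
    states: List[DataState] = field(default_factory=list)

    def tag(self, payload: bytes, **properties: Any) -> DataState:
        # Tagging captures a snapshot and links it to the previous one.
        parent = self.states[-1].state_id if self.states else None
        state = DataState(len(self.states), parent, bytes(payload), dict(properties))
        self.states.append(state)
        return state

    def find(self, **properties: Any) -> List[DataState]:
        # Select snapshots by the properties they were tagged with.
        return [s for s in self.states
                if all(s.properties.get(k) == v for k, v in properties.items())]

    def history(self, state: DataState) -> List[int]:
        # Walk parent links from a snapshot back to the first one.
        chain = [state.state_id]
        while state.parent_id is not None:
            state = self.states[state.parent_id]
            chain.append(state.state_id)
        return chain
```

In this sketch the caller never opens a file or names a storage tier: it tags the dataset (for example `lineage.tag(model, epoch=1, persist=True)`) and later navigates the history by property or by parent link.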
Status: To be started soon. Positions available!
VeloC (Very Low Overhead Checkpointing System) is a multi-level checkpoint-restart runtime for HPC supercomputing infrastructures and large-scale data centers, sponsored by ECP (Exascale Computing Project). It aims to deliver high performance and scalability for complex heterogeneous storage hierarchies without sacrificing ease of use and flexibility. Checkpoint-restart is primarily used as a fault-tolerance mechanism for tightly coupled HPC applications but is also essential in many administrative use cases: suspend-resume, migration, debugging. Furthermore, many applications naturally return to previous states as part of the computational model (e.g., adjoint computations, neural networks), which can be performed efficiently using checkpoint-restart.
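The multi-level idea can be illustrated with a small, self-contained Python sketch. This is not the VeloC API or implementation: the class `TwoLevelCheckpointer` and its methods are hypothetical, the two "levels" are just two directories standing in for node-local scratch and a parallel file system, and the copy to the second level is done synchronously, whereas a production runtime would presumably overlap it with computation.

```python
import pickle
import shutil
from pathlib import Path
from typing import Any, Optional


class TwoLevelCheckpointer:
    """Hypothetical two-level checkpointer: fast scratch tier, slower persistent tier."""

    def __init__(self, scratch: Path, persistent: Path) -> None:
        self.tiers = (Path(scratch), Path(persistent))  # fastest tier first
        for tier in self.tiers:
            tier.mkdir(parents=True, exist_ok=True)

    @staticmethod
    def _filename(name: str, version: int) -> str:
        return f"{name}-{version:06d}.ckpt"

    def checkpoint(self, name: str, version: int, state: Any) -> None:
        scratch, persistent = self.tiers
        # Level 1: write to the fast tier first.
        local = scratch / self._filename(name, version)
        with open(local, "wb") as f:
            pickle.dump(state, f)
        # Level 2: copy to the persistent tier (synchronous in this sketch).
        shutil.copy2(local, persistent / local.name)

    def latest_version(self, name: str) -> Optional[int]:
        # Scan every tier, since the fast tier may have been lost.
        versions = [int(p.stem.rsplit("-", 1)[1])
                    for tier in self.tiers
                    for p in tier.glob(f"{name}-*.ckpt")]
        return max(versions, default=None)

    def restart(self, name: str, version: int) -> Any:
        # Prefer the fastest tier that still holds the requested version.
        for tier in self.tiers:
            path = tier / self._filename(name, version)
            if path.exists():
                with open(path, "rb") as f:
                    return pickle.load(f)
        raise FileNotFoundError(f"no checkpoint {name!r} version {version}")
```

The same two calls cover both use cases mentioned above: after a failure, `latest_version` followed by `restart` resumes from the most recent state, while an application that revisits earlier states can call `restart` with any version it checkpointed.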
Collaborators: Franck Cappello, Sheng Di (Argonne National Laboratory, USA); Kathryn Mohror, Adam Moody, Gregory Kosinovski (Lawrence Livermore National Laboratory, USA).
Status: VeloC is openly available under the MIT license here. Positions available!
HP-CDS (High Performance Collaborative Distributed Storage) is an experimental storage prototype specifically designed to deliver high throughput with low resource utilization at scale for data-intensive distributed applications that exhibit non-trivial I/O patterns or irregularity due to multi-tenancy. It is centered around the idea of organizing the storage elements in a decentralized peer-to-peer network that constantly exchanges information about locally observed content and I/O access patterns in order to discover global trends that can be exploited through collaboration, such as dynamic prefetching of data blocks from peers with similar access patterns, on-the-fly de-duplication and dissemination of hot data, and automated system-level storage elasticity.
Collaborators: Andrzej Kochut, Alexei Karve (IBM Research USA); Kate Keahey (Argonne National Laboratory, USA); Pierre Riteau (University of Chicago, USA)
Status: HP-CDS was integrated into the OpenStack ecosystem and used within IBM.
BlobSeer is a large-scale distributed data storage service that is centered around the idea of using versioning at both the data and the metadata level to deliver high throughput under concurrency. BlobSeer has been used, and has demonstrated its effectiveness, in several contexts, including big data analytics based on Hadoop MapReduce, scalable checkpoint-restart for HPC applications, and virtual disk dissemination, snapshotting, and live block migration in large-scale IaaS clouds. BlobSeer became a main research direction of the KerData team, with numerous projects and PhD theses centered around it.
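As a rough illustration of versioning at both the data and the metadata level, the following single-process Python sketch never overwrites a chunk: each write stores a new immutable chunk and publishes a new version whose metadata points to it, while all other chunks are shared with the previous version. This is not BlobSeer's implementation. The class `VersionedBlob`, the tiny `CHUNK_SIZE`, and the flat per-version list of chunk ids are simplifying assumptions made for the example, and distribution and concurrency are left out.

```python
from typing import Dict, List

CHUNK_SIZE = 4  # bytes per chunk; deliberately tiny for illustration


class VersionedBlob:
    """Hypothetical versioned blob: immutable chunks plus per-version metadata."""

    def __init__(self) -> None:
        self._chunks: Dict[int, bytes] = {}      # chunk id -> immutable data
        self._versions: List[List[int]] = [[]]   # version -> ordered chunk ids

    def write(self, index: int, data: bytes) -> int:
        """Write one chunk at position `index`; return the new version number."""
        if len(data) != CHUNK_SIZE:
            raise ValueError("data must be exactly one chunk")
        layout = list(self._versions[-1])  # copy metadata only, never data
        if index > len(layout):
            raise IndexError("writes must not leave holes")
        chunk_id = len(self._chunks)
        self._chunks[chunk_id] = bytes(data)  # new chunk; old ones stay untouched
        if index == len(layout):
            layout.append(chunk_id)
        else:
            layout[index] = chunk_id
        self._versions.append(layout)
        return len(self._versions) - 1

    def layout(self, version: int) -> List[int]:
        """Chunk ids that make up a given version."""
        return list(self._versions[version])

    def read(self, version: int) -> bytes:
        """Read the full content of any published version."""
        return b"".join(self._chunks[c] for c in self._versions[version])
```

Because published versions are never modified, a reader of an older version is unaffected by later writes, which is the property that makes versioning attractive under concurrency.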
Collaborators: Gabriel Antoniu, Luc Bouge (INRIA, France) and many other current and former members of the KerData team.
Status: BlobSeer is openly available under LGPL here.