Positioning of the ExaViz project
The goal of the ExaViz project is to design advanced interactive visualization and analysis tools for massive data sets and computation results at the advent of exascale computing, by deploying a modular and parallel software infrastructure. We specifically target scientific applications for manipulating and exploring simulations of complex biological objects or materials architectures at the nanoscopic scale.
The ExaViz project is very timely, as France is currently preparing for exascale computing. Several agencies such as the ANR funding agency and the GENCI HPC society are partners in the International Exascale Software Project [EXA11a]. The roadmap of this international consortium focuses on data analysis and visualization in chapter 4.3.2 [EXA11b]. With the inauguration of the Tier-0 PRACE supercomputer Curie, an Exascale Computing Research centre recently opened its doors. ExaViz is in a strong position to benefit from current developments in this field. The collaborative CollaViz platform currently under development [COLLAVIZ] will serve as a reference. Although CollaViz is more focused on collaborative and remote applications, some of its open source framework and web-based technologies may be used within ExaViz. In contrast to such general frameworks, ExaViz specifically focuses on the needs of molecular applications and will be driven by state-of-the-art research needs.
Every scientific field has its own tools and specificities. At the nanoscopic scale, modeling complex biological systems or materials made of atoms and molecules classically relies on the following sequential workflow:
· carry out (exascale) simulations / obtain experimental datasets
· post-process and analyze output from step #1
· visualize data from step #2, return to step #2 for new post-processing & analysis
· depending on results, return to step #1 and re-run simulations with new parameters/setup or carry out new experiments
In visual analysis, the difficulties lie in coupling modules and pieces of software from steps #2 and #3, and seamlessly integrating them on hardware such as PCs or PC clusters to create operational ‘problem solving environments’. For atomic-level nanoscopic simulations (e.g. molecular dynamics), the interactive exploration and post-processing of the models increasingly require distributed and/or parallel environments. Yet most current visualization and analysis software components lack scalability and are not optimized for such environments. To make them exa-scalable, current tools have to be rewritten with intelligent parallelism and memory-management strategies in mind. This is necessary in order to complete various tasks within a bounded latency: graphics display and refresh, interactions such as picking and probing, and more advanced features such as haptic rendering (force and/or tactile feedback) to guide the exploration.
Efforts will focus on developing a modular (component-oriented) framework to build custom dataflows for interactive data analysis and visualization. The framework will be designed to ease the integration of specialized codes and make them cooperate even if they were not initially designed to do so. The framework will take care of networking and deployment issues on the target (heterogeneous) platform, so that the scientist can stay focused as much as possible on the analysis task.
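As an illustration only, the component-oriented dataflow idea can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the ExaViz framework: the `Component` class, its `connect` and `push` methods, and the `center_of_mass` function are hypothetical names introduced for this example, and networking and deployment are left out entirely.

```python
from typing import Any, Callable, Dict, List


class Component:
    """A named processing step with one input and one output (hypothetical API)."""

    def __init__(self, name: str, func: Callable[[Any], Any]) -> None:
        self.name = name
        self.func = func
        self.downstream: List["Component"] = []

    def connect(self, other: "Component") -> "Component":
        """Link this component's output to another component's input."""
        self.downstream.append(other)
        return other  # returning the target allows chained connections

    def push(self, data: Any, results: Dict[str, Any]) -> None:
        """Process the data, record the output, and forward it downstream."""
        out = self.func(data)
        results[self.name] = out
        for comp in self.downstream:
            comp.push(out, results)


def center_of_mass(coords):
    """Stand-in for a specialized code that knows nothing about the framework."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def run_example() -> Dict[str, Any]:
    # Wrap plain functions as components and wire them into a dataflow.
    source = Component("frame", lambda frame: frame["coords"])
    analysis = Component("center_of_mass", center_of_mass)
    view = Component("label", lambda com: "COM = (%.1f, %.1f, %.1f)" % com)
    source.connect(analysis).connect(view)

    results: Dict[str, Any] = {}
    source.push({"coords": [(0.0, 0.0, 0.0), (2.0, 4.0, 6.0)]}, results)
    return results


if __name__ == "__main__":
    print(run_example()["label"])
```

The point of the sketch is that `center_of_mass` is an ordinary function wrapped as a component without modification, which mirrors the stated aim of making existing codes cooperate even if they were not designed for it.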
The ExaViz project builds on a previously funded ANR project, FVNano, that involved three of the five partners united in the present project [FVNANO]. FVNano successfully dealt with high performance computing, focusing on the interactive coupling and steering of simulations. Visualization was not a main task in FVNano, yet we rapidly identified the need for better simulation post-processing tools and their integration in a visual analysis workflow, which finally led to the ExaViz proposal.
The ExaViz project addresses the objectives of themes 1, 3 and 4 of the Digital Models (MN) call for proposals. Theme 4, visualization and interactive simulation, is at the heart of the ExaViz project, which aims to integrate and visualize simulation data and experimental information. This includes simple visualization of results and interaction with them, as well as the creation of immersive virtual worlds for the post-processing of numerical simulations within a Virtual Reality context. Theme 1, the design and analysis of complex systems, explicitly mentions the objective of getting ready for the exascale using parallel approaches and new architectures such as GPUs. This concerns parallel visualization and post-processing of simulation data. Our applications are aimed at deepening scientific knowledge and address current major challenges and critical societal issues in biology, health and materials science. For example, in health, we propose a grand challenge application simulating an entire influenza virion. These applications are intimately linked to tackling the data deluge (theme 3) by designing scalable algorithms to explore and mine complex data in real time.
The project takes full advantage of existing research infrastructures, notably the “EVE” (Evolutive Virtual Environment) virtual reality platform at LIMSI, MIReV (Mur d'Images pour la Réalité Virtuelle) at LIFO and Digitalis (a 700-core machine with integrated visualization capabilities) at MOAIS, which together represent a heritage of technological platforms with several million euros already invested. The present project will highlight and make the best use of these platforms in the fields of life and materials science, and will move towards the exa-scalable technologies of the future, which are as yet little used in these disciplines. We will also request access to larger computing facilities, such as machines from the Grid’5000 network and from meso-centers, up to tests on the TGCC machines if the scalability of simulations meets the expected threshold.
Materials Science applications will largely benefit from a top-level experimental platform supported by the High Field NMR TGIR network (CNRS FR3050), with 17.6 and 20.0 Tesla magnets at CEMHTI. The bridges between experiments and simulations that we aim to develop in the context of ExaViz will hence considerably reinforce the position and attractiveness of solid-state NMR and the TGIR Network for solving major challenges in catalysis (ANR blanc international 2009 ALUBOROSIL), glasses [FLO09] and molten solids science (ANR Blanc 2011 DYSTRAS), and biomaterials (ANR BiotechS 2008 GaBiPhoCe and ANR blanc 2009 NANOSHAP).
ExaViz is fully coherent with the strategy of the National Alliance for Life Sciences and Health [AVIESAN], and in particular with the topic of “Molecular and structural bases of living organisms”, which includes molecular modelling and bioinformatics. This is clearly stated in the strategic orientations of the Alliance: "Scientific and medical implications: The structural description [..] of molecules (individual or organised into complexes), [..] require multidisciplinary research to be conducted combining biology, physics, chemistry, bioinformatics and mathematics. The implications cover science, medicine and technology [..]. Explore, understand, cure: Identifying the structure of a protein or an assembly of proteins enables us to understand its role in a pathology. It also provides a glimpse of the eventual development of drugs or the prevention of resistance mechanisms to some of these". This also applies to the Alliance of Digital Sciences and Technologies [ALLISTENE], and ExaViz addresses six major problems identified by the Alliance: modelling, simulation and control of complex systems, as well as human-system interaction, content and usage.
On the European scale, visual analytics has been supported since 2008 by the European Commission through the VisMaster coordination action, and exascale simulations are at the heart of various activities such as the European Exascale Software Initiative [EESI]. This is closely modeled on international strategies such as the International Exascale Software Project [EXA11a]. All this reflects an international race to build an exascale computer, in which key players include the United States, Europe, China and Japan [THI10]. According to the 2010 SciDAC report, biologists may be among the first users of such a machine: "In biology, the challenges of modeling at multiple scales are driving the need for exascale computing and a new set of algorithms and approaches" [EXA10]. Materials science, and in particular catalysis, is another serious contender [DOE07].