Overview
Data-intensive science is the fourth paradigm of scientific discovery. Data-intensive distributed computing (DIDC) is a marriage of data-intensive scientific methods with distributed computing in an attempt to define and build computational infrastructure that can support the increasing volumes of digital data inundating today's researchers.
The DISPEL Language
The DISPEL language provides the core power of the ADMIRE approach, allowing data analysis experts to describe complex data-intensive workflows in a stable, canonical way which allows for change both in the tools used to create them and in the platforms that enact them.
Architecture and Tools
The ADMIRE architecture envisages a wide range of tools coupled to a powerful, extensible enactment framework through a controlled, canonical interaction point called a Gateway.
Domain experts and data analysis experts work together at the tools level to devise ways to extract business information from data. Data analysis experts use the DISPEL language to develop canonical workflows and the necessary algorithmic enactors or processing elements that can be chained together to implement the required solution. DIDC experts implement and support the machinery of the Gateway and its execution environment.
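The idea of processing elements chained together into a workflow can be illustrated with a minimal Python sketch. This is not DISPEL syntax and not part of ADMIRE: the names `chain`, `drop_missing`, `to_celsius` and `running_mean` are hypothetical, and each processing element is assumed to be a simple function that consumes one stream and produces another.

```python
from typing import Callable, Iterable, Iterator

# Assumption: a processing element is modelled as a function
# that turns one stream of values into another stream.
ProcessingElement = Callable[[Iterable], Iterator]


def chain(*elements: ProcessingElement) -> Callable[[Iterable], Iterator]:
    """Connect processing elements so each one feeds the next."""
    def workflow(source: Iterable) -> Iterator:
        stream = iter(source)
        for element in elements:
            stream = element(stream)
        return stream
    return workflow


def drop_missing(rows: Iterable) -> Iterator:
    """Discard readings that are missing (None)."""
    for row in rows:
        if row is not None:
            yield row


def to_celsius(rows: Iterable) -> Iterator:
    """Convert kelvin readings to degrees Celsius."""
    for kelvin in rows:
        yield round(kelvin - 273.15, 2)


def running_mean(rows: Iterable) -> Iterator:
    """Emit the mean of all values seen so far."""
    total = 0.0
    for count, value in enumerate(rows, start=1):
        total += value
        yield total / count


# Hypothetical workflow: clean, convert, then summarise.
workflow = chain(drop_missing, to_celsius, running_mean)
result = list(workflow([293.15, None, 295.15]))
```

Because every element has the same stream-in, stream-out shape, elements can be reordered or replaced without touching the others, which is the property the chaining description relies on.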
ADMIRE's architecture implements the separation of concerns using a conceptual hourglass.
The top bulb is a creative domain where concepts are dynamic and ever-changing, so the people working in this space need support from tools.
In the neck of the hourglass, creativity should not be present. We need a stable means of discourse between the analysts, developers and the enactment platform. For data-intensive computing, the DISPEL language acts as the stable means of discourse.
In the bottom bulb we again need creativity. The complexities of mapping a DISPEL sentence onto distributed, physical computational resources and data sources require many types of expertise and significant levels of automation, such as optimization.
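The hourglass shape can be sketched in a few lines of Python. Everything here is an assumption made for illustration: `CanonicalWorkflow` stands in for a DISPEL sentence, `Gateway.submit` is a hypothetical entry point, and the round-robin placement is a deliberately naive substitute for the optimization that a real enactment platform would perform.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class CanonicalWorkflow:
    """Neck of the hourglass: a stable, tool-independent description."""
    steps: Tuple[str, ...]


# Top bulb: different (hypothetical) tools produce the same canonical form.
def graphical_tool() -> CanonicalWorkflow:
    return CanonicalWorkflow(("read", "clean", "mine"))


def scripting_tool() -> CanonicalWorkflow:
    return CanonicalWorkflow(("read", "clean", "mine"))


class Gateway:
    """Single interaction point between tools and the enactment platform."""

    def __init__(self, resources: Iterable[str]) -> None:
        self._resources = list(resources)
        if not self._resources:
            raise ValueError("at least one resource is required")

    def submit(self, workflow: CanonicalWorkflow) -> Dict[str, str]:
        # Bottom bulb: map each step onto a resource.
        # Round-robin is a placeholder for real optimization.
        return dict(zip(workflow.steps, cycle(self._resources)))


gateway = Gateway(["node-a", "node-b"])
plan = gateway.submit(graphical_tool())
same_plan = gateway.submit(scripting_tool())
```

The tools and the placement strategy can both change freely; only the canonical description passed through the gateway has to stay stable.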