Data Science is a rapidly growing, highly multidisciplinary field of study. It can be defined as the convergence of Computer Science, programming, mathematical modeling, data analytics, academic expertise, and traditional AI research. It applies statistical techniques through scientific programming tools, streaming computing platforms, and linked data to discover new knowledge from data patterns and to derive new insights on distributed computing platforms.
Data science often involves processing huge amounts of data. The once-exponential growth in the speed of individual CPUs has slowed, while the amount of data continues to increase, so using computers effectively must entail parallel computation.
It is therefore critical to achieve scalable performance through parallel computing techniques, and to combine traditional research with machine learning and deep learning algorithms to design novel patterns and architectures.
It is also important to consider that current technologies give researchers the ability to collect huge amounts of data, making it possible to tackle problems that, only a few years ago, were out of reach. Such a wealth of data, also called Big Data, requires tools and methodologies that scale well and can process virtually unbounded amounts of data within the Data Science scenario. The confluence of big data, massively powerful cloud computing platforms, and the need of businesses in all sectors to leverage their data repositories has created a high-growth environment and a strong demand for parallel data science methodologies.
From this perspective, Data Science needs to embrace data-parallel computing techniques for efficient data analysis and analytics. Parallel algorithms for numerical processing, parallel data search, and other parallel computing algorithms can also facilitate advanced data analytics and insights.
The goal is to combine data and processes into configurable, structured sets of steps that implement automated computational solutions for an application. Such workflows offer capabilities including provenance management, execution management and reporting tools, integration of distributed computation and data management technologies, the ability to ingest local and remote scripts, and sensor management and data streaming interfaces.
Furthermore, designing such Data Science workflows calls for an in-depth treatment of the evolution of high-performance, parallel computing architectures and of how these architectures and computational ecosystems support Data Science.
This special issue seeks high-quality contributions in the field of Parallel and Distributed Computing for Data Science. Submissions will be judged on their originality, significance, clarity, relevance, and technical correctness.
Topics of interest include:
Data Analytics with High Performance Computing;
Parallel Computing techniques for Machine Learning;
Parallel Data Analysis;
High performance data networking, management, and analytics;
Parallel Algorithms for Data Science;
Parallel programming methodologies for Data Science;
Parallel Data Structures;
Multicore Systems, Clusters, and GPUs for Data Science;