
SME HPC Adoption Programme in Europe

Context

The predictive analytics market was growing rapidly, with organizations increasingly adopting these techniques. Artelnics developed Neural Designer, a professional predictive analytics solution that uses data to discover relationships, recognize patterns, and predict trends. A key challenge for Neural Designer was handling complex interactions in large datasets, which demands high performance through parallel processing. The underlying OpenNN library was already parallelized with OpenMP, which accelerated it on desktop computers, but the emergence of Big Data required handling datasets that exceed the memory of a single machine.
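To illustrate the kind of shared-memory parallelization referred to here, the sketch below shows an OpenMP-parallelized sum-of-squared-errors loop in C++. It is a minimal illustration only, not the actual OpenNN code; the predict and sum_squared_error names and the toy model are assumptions made for the example.

#include <omp.h>
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a neural network forward pass on one sample.
double predict(const std::vector<double>& inputs)
{
    double sum = 0.0;
    for (double x : inputs) sum += 0.5 * x;   // toy "model"
    return sum;
}

// Sum of squared errors over all samples, parallelized with OpenMP.
// Each thread accumulates a partial sum; the reduction clause combines them.
double sum_squared_error(const std::vector<std::vector<double>>& inputs,
                         const std::vector<double>& targets)
{
    double error = 0.0;

    #pragma omp parallel for reduction(+:error)
    for (long i = 0; i < static_cast<long>(inputs.size()); ++i)
    {
        const double difference = predict(inputs[i]) - targets[i];
        error += difference * difference;
    }

    return error;
}

int main()
{
    // Toy dataset: 10,000 samples with two inputs each.
    std::vector<std::vector<double>> inputs(10000, std::vector<double>{1.0, 2.0});
    std::vector<double> targets(10000, 1.0);

    std::printf("error: %f (threads: %d)\n",
                sum_squared_error(inputs, targets), omp_get_max_threads());
    return 0;
}

Parallelizing loops like this speeds up training on a single multi-core machine, but the whole dataset must still fit in that machine's memory, which is the limitation addressed below.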


Procedure

To overcome these Big Data challenges and improve performance, Artelnics collaborated with the Barcelona Supercomputing Center (BSC) through the POP and SHAPE projects, implementing shared- and distributed-memory parallelization with OpenMP and MPI. The primary goal was to let the OpenNN library load and analyze larger datasets by distributing them across nodes. Development was iterative: Artelnics and BSC exchanged improved code versions and performance analysis results. Initial prototypes were analyzed on MareNostrum III to identify bottlenecks such as replicated code and load imbalance. The team then evaluated the MPI+OpenMP version on the Marconi Broadwell partition, running jobs in exclusive mode to ensure accurate performance measurements, and assessed both the combined MPI+OpenMP performance and the scaling of the MPI implementation on its own.
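The hybrid pattern described above can be illustrated with a simplified C++ sketch, not Artelnics' actual implementation: each MPI rank owns a block of samples (so no single node needs the full dataset in memory), computes a local partial error with OpenMP threads, and an MPI_Allreduce combines the partial sums. The sample counts and placeholder data are invented for illustration.

#include <mpi.h>
#include <omp.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);

    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    // Each rank owns only a block of the dataset, so the full dataset
    // never has to fit into a single node's memory.
    const long total_samples = 1000000;
    const long local_samples = total_samples / size
                             + (rank < total_samples % size ? 1 : 0);

    // Placeholder local data; a real run would read this rank's slice from disk.
    std::vector<double> outputs(local_samples, 0.5);
    std::vector<double> targets(local_samples, 1.0);

    // Shared-memory parallelism inside the node (OpenMP)...
    double local_error = 0.0;
    #pragma omp parallel for reduction(+:local_error)
    for (long i = 0; i < local_samples; ++i)
    {
        const double difference = outputs[i] - targets[i];
        local_error += difference * difference;
    }

    // ...combined with distributed-memory parallelism across nodes (MPI).
    double global_error = 0.0;
    MPI_Allreduce(&local_error, &global_error, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        std::printf("global sum of squared errors: %f\n", global_error);

    MPI_Finalize();
    return 0;
}

An all-reduce (rather than a reduce to one rank) is the natural choice here because every rank needs the global error to continue the next training step; such collective synchronization points are also the kind of barrier that the later optimization work targets.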


Results

The OpenMP and MPI parallelization significantly enhanced Neural Designer’s capabilities. The new code version allowed Artelnics to build predictive models on multi-core computer instances and on supercomputing clusters, with efficiencies close to 90% for both the MPI and the OpenMP parallelizations. For combined MPI+OpenMP configurations on a single 32-core node, parallel efficiency ranged from 87% to 98%. Minor performance degradation was observed as the thread count increased, caused by a slight reduction in IPC and by thread imbalance, but its overall impact was small. For MPI scaling with OMP_NUM_THREADS set to 1, the application achieved a speed-up of 3.49 with 128 MPI ranks compared to 32 ranks, or 87% of the ideal speed-up. As a result, Neural Designer could analyze larger datasets in less time, delivering results to Artelnics’ customers that were previously unachievable. Future work aimed to further optimize the code by reducing synchronizations and barriers for even larger-scale executions. Artelnics also planned to make Neural Designer available on cloud platforms such as AWS Marketplace and Microsoft Azure, allowing users to leverage more efficient machines.
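The quoted 87% figure follows directly from the reported numbers, assuming parallel efficiency is defined as the measured speed-up divided by the ideal speed-up (128 / 32 = 4x). The small calculation below reproduces it.

#include <cstdio>

int main()
{
    // MPI scaling figures reported in the case study.
    const double ranks_base = 32.0;
    const double ranks_scaled = 128.0;
    const double measured_speedup = 3.49;   // 128 ranks vs. 32 ranks

    // Ideal speed-up is proportional to the increase in ranks: 128 / 32 = 4.
    const double ideal_speedup = ranks_scaled / ranks_base;

    // Parallel efficiency: fraction of the ideal speed-up actually achieved.
    const double efficiency = measured_speedup / ideal_speedup;

    std::printf("ideal speed-up: %.2f, efficiency: %.0f%%\n",
                ideal_speedup, 100.0 * efficiency);   // prints ~87%
    return 0;
}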


Ready to take your data analysis to the next level?

Our team of experts can help you apply artificial intelligence to gain powerful insights and make better decisions about the future.
Contact us today to schedule a consultation and learn more about our services.