Sca-ML: A Novel Approach for Scalable Manifold Learning
Information
Contributors:
- Murat Manguoğlu (Middle East Technical University)
- Gülşen Taşkın Kaya (İstanbul Technical University)
- Emrullah Fatih Yetkin (Kadir Has University)
Abstract:
Unsupervised learning, and in particular graph-based Manifold Learning (ML) methods for dimensionality reduction, plays an important role in modern data analysis. These methods project high-dimensional data spectrally onto a lower-dimensional space while preserving its geometric properties. However, their high computational complexity can cause performance problems in big data analysis. The main objective of the project is to develop new methodologies that increase the parallel scalability of manifold learning on high-performance computing platforms. In addition, the intrinsic dimensionality will be determined automatically by exploiting the algorithmic properties of the techniques used to develop the new algorithms, such as spectrum slicing and spectral projectors based on contour integrals. As reported in the recent literature, large-scale eigensolvers (Krylov subspace or Newton-based) have high data dependencies. These methods therefore suffer from inter-process communication on large-scale computing platforms, and their scalability is limited. Communication bottlenecks are the main limitation of modern large-scale computing systems, so related studies pay considerable attention to reducing the communication needs of state-of-the-art algorithms. In this project, we will develop a novel method to reduce the communication overhead of the eigenvalue problem that arises in ML. Moreover, determining the intrinsic dimension at no extra computational cost will increase the impact of the project’s outcomes.
The project is funded by TÜBİTAK (The Scientific and Technological Research Council of Turkey) under project number 120E281.
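The abstract names spectrum slicing as one of the techniques the new algorithms build on. As a rough illustration of the counting primitive behind that idea, and not of the project's own method, the sketch below counts how many eigenvalues of a small graph Laplacian lie below a shift without computing any eigenvalue. It relies on Sylvester's law of inertia: the number of negative pivots in the LDL^T factorization of T - sigma*I equals the number of eigenvalues of T below sigma. The path graph, the function names, and the `tiny` safeguard are assumptions chosen to keep the example self-contained; how the project connects such counts to the intrinsic dimension is not specified in the abstract.

```python
import math


def path_laplacian(n):
    """Return (diag, off) of the Laplacian of a path graph with n >= 2 nodes.

    The path graph is a toy stand-in (an assumption of this sketch) for the
    neighborhood graph a manifold learning method would build from data.
    """
    diag = [1.0] + [2.0] * (n - 2) + [1.0]
    off = [-1.0] * (n - 1)
    return diag, off


def count_eigs_below(diag, off, sigma, tiny=1e-300):
    """Count eigenvalues of a symmetric tridiagonal matrix that are < sigma.

    Runs the LDL^T pivot recurrence on (T - sigma*I) and counts negative
    pivots. A pivot that is numerically zero is nudged to -tiny so the
    recurrence can continue.
    """
    count = 0
    prev = 1.0  # not used for the first row
    for i, d in enumerate(diag):
        pivot = d - sigma
        if i > 0:
            pivot -= off[i - 1] ** 2 / prev
        if abs(pivot) < tiny:
            pivot = -tiny
        if pivot < 0.0:
            count += 1
        prev = pivot
    return count


def count_eigs_in(diag, off, a, b):
    """Count eigenvalues in the slice [a, b) as a difference of two counts."""
    return count_eigs_below(diag, off, b) - count_eigs_below(diag, off, a)


def path_laplacian_eigs(n):
    """Closed-form eigenvalues of the path graph Laplacian, for checking."""
    return [2.0 - 2.0 * math.cos(math.pi * k / n) for k in range(n)]


if __name__ == "__main__":
    diag, off = path_laplacian(8)
    print("below 0.5:", count_eigs_below(diag, off, 0.5))
    print("in [0.5, 3.0):", count_eigs_in(diag, off, 0.5, 3.0))
    print("reference:", [round(v, 4) for v in path_laplacian_eigs(8)])
```

Because each slice needs only its own factorization, different slices can in principle be processed independently, which is one reason this family of techniques is attractive when communication between processes is the bottleneck. A dense or general sparse Laplacian would need a pivoted symmetric factorization instead of the tridiagonal recurrence used here.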