
Showing 1–4 of 4 results for author: Akbarian, P

  1. arXiv:2405.14131  [pdf, other]

    stat.ML cs.LG

    Statistical Advantages of Perturbing Cosine Router in Sparse Mixture of Experts

    Authors: Huy Nguyen, Pedram Akbarian, Trang Pham, Trang Nguyen, Shujian Zhang, Nhat Ho

    Abstract: The cosine router in sparse Mixture of Experts (MoE) has recently emerged as an attractive alternative to the conventional linear router. Indeed, the cosine router demonstrates favorable performance in image and language tasks and exhibits better ability to mitigate the representation collapse issue, which often leads to parameter redundancy and limited representation potentials. Despite its empir…

    Submitted 22 May, 2024; originally announced May 2024.

    Comments: 44 pages, 2 figures

  2. arXiv:2401.13875  [pdf, other]

    stat.ML cs.LG

    Is Temperature Sample Efficient for Softmax Gaussian Mixture of Experts?

    Authors: Huy Nguyen, Pedram Akbarian, Nhat Ho

    Abstract: Dense-to-sparse gating mixture of experts (MoE) has recently become an effective alternative to a well-known sparse MoE. Rather than fixing the number of activated experts as in the latter model, which could limit the investigation of potential experts, the former model utilizes the temperature to control the softmax weight distribution and the sparsity of the MoE during training in order to stabi…

    Submitted 24 June, 2024; v1 submitted 24 January, 2024; originally announced January 2024.

    Comments: Accepted to ICML 2024, 47 pages, 2 figures, 2 tables

  3. arXiv:2310.14188  [pdf, other]

    stat.ML cs.LG

    A General Theory for Softmax Gating Multinomial Logistic Mixture of Experts

    Authors: Huy Nguyen, Pedram Akbarian, TrungTin Nguyen, Nhat Ho

    Abstract: Mixture-of-experts (MoE) model incorporates the power of multiple submodels via gating functions to achieve greater performance in numerous regression and classification applications. From a theoretical perspective, while there have been previous attempts to comprehend the behavior of that model under the regression settings through the convergence analysis of maximum likelihood estimation in the…

    Submitted 24 June, 2024; v1 submitted 22 October, 2023; originally announced October 2023.

    Comments: Accepted to ICML 2024, 32 pages, 3 figures, 3 tables

  4. arXiv:2309.13850  [pdf, other]

    stat.ML cs.LG

    Statistical Perspective of Top-K Sparse Softmax Gating Mixture of Experts

    Authors: Huy Nguyen, Pedram Akbarian, Fanqi Yan, Nhat Ho

    Abstract: Top-K sparse softmax gating mixture of experts has been widely used for scaling up massive deep-learning architectures without increasing the computational cost. Despite its popularity in real-world applications, the theoretical understanding of that gating function has remained an open problem. The main challenge comes from the structure of the top-K sparse softmax gating function, which partitio…

    Submitted 23 February, 2024; v1 submitted 24 September, 2023; originally announced September 2023.

    Comments: Accepted to ICLR 2024, 38 pages, 3 figures, 1 table