Large Scale Machine Learning

Machine learning models are becoming progressively more complex, and training datasets are getting larger by the day. Embedding models are trained over Web-scale collections of text and graphs, and language models are learnt over millions or billions of sentences. It is fair to say that training machine learning models over huge datasets has reaped massive benefits in downstream tasks over the last few years, but it has also become computationally expensive. One of our goals is to build large models (embedding models and language models over massive collections of text and graphs) in reasonable time and with modest academic hardware resources. In this stream of research we investigate data management, extraction and learning regimes that use distributed and approximate algorithms to build large machine learning models.


This thread of my research is supported by the EU project SoBigData, a pan-European research infrastructure for Big Data and Social Mining. SoBigData aims to go from data to knowledge and to investigate stories ethically, paying attention to citizens' privacy.


[1] Asynchronous Training of Word Embeddings for Large Text Corpora. A. Anand, M. Khosla, J. Singh, J. Zab, Z. Zhang. In International Conference on Web Search and Data Mining, WSDM 2019.

[2] Node Representation Learning for Directed Graphs. M. Khosla, J. Leonhardt, W. Nejdl, A. Anand. Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019.

[3] ArchiveSpark: Efficient Web Archive Access, Extraction and Derivation. H. Holzmann, V. Goel, A. Anand. Joint Conference on Digital Libraries, JCDL 2016, Newark, NJ, USA, June 19 - 23, 2016. (Best Paper Nominee)

[4] A Comparative Study for Unsupervised Network Representation Learning. M. Khosla, V. Setty, A. Anand. In Transactions on Knowledge and Data Engineering (TKDE), 2020.

[5] Delusive PageRank in Incomplete Graphs. H. Holzmann, A. Anand, M. Khosla. In Complex Networks 2018.

[6] Estimating PageRank Deviations in Crawled Graphs. H. Holzmann, A. Anand, M. Khosla. In Journal of Applied Network Science, 2019.

Avishek Anand
Assistant Professor of Information Retrieval

My research interests lie in the application of machine learning to problems relating to information retrieval and the Web.