!!Heng Tao Shen - Selected Publications
\\
(1) Fumin Shen (Postdoctoral fellow), Chunhua Shen, Wei Liu, Heng Tao Shen. “Supervised discrete hashing”. In Proceedings of the 28th IEEE CVPR, pages 37-45, 2015. (Top conference in computer vision. According to Google Scholar Metrics 2021, CVPR is ranked 1st in the Engineering and Computer Science category, and 4th among all categories, with an H5-index of 356)\\
\\
The major difficulty of learning to hash lies in handling the discrete constraints imposed on the generated hash codes, which generally make the optimization NP-hard. This work introduced the first discrete hashing method, reformulating the objective so that it can be solved efficiently by a regularization algorithm with cyclic coordinate descent. It opened the door for hashing methods to generate codes without relaxing the discrete constraint, and laid a theoretical foundation for this line of research. It has received 1020 Google Scholar Citations.\\
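The bit-by-bit update strategy can be sketched as follows. This is a minimal toy illustration, not the paper's actual SDH formulation: it minimizes a made-up least-squares objective over a binary code, flipping one bit at a time while the others stay fixed; `W`, `y`, and the code length are illustrative.\\

```python
# Toy sketch of discrete cyclic coordinate descent: optimize a binary code
# b in {-1, +1}^k to minimize ||y - W b||^2 one bit at a time, never
# relaxing the discrete constraint. Illustrative objective, not the
# paper's exact SDH formulation.

def residual_sq(W, y, b):
    """Squared residual ||y - W b||^2 for the current binary code b."""
    total = 0.0
    for row, target in zip(W, y):
        pred = sum(w * bit for w, bit in zip(row, b))
        total += (target - pred) ** 2
    return total

def discrete_ccd(W, y, k, sweeps=5):
    """Cyclic coordinate descent over the k bits of the code."""
    b = [1] * k
    for _ in range(sweeps):
        for j in range(k):
            # Try both signs for bit j, keep whichever lowers the residual.
            errs = {}
            for cand in (-1, 1):
                b[j] = cand
                errs[cand] = residual_sq(W, y, b)
            b[j] = min(errs, key=errs.get)
    return b

# With W = I, the optimal discrete code is simply the sign of y.
code = discrete_ccd([[1.0, 0.0], [0.0, 1.0]], [0.9, -0.7], k=2)
print(code)  # [1, -1]
```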
\\
(2) Bokun Wang (Undergraduate student), Yang Yang, Xing Xu, Alan Hanjalic, Heng Tao Shen. “Adversarial cross-modal retrieval”. In Proceedings of the 25th ACM Multimedia, pages 154-162, 2017. (Top conference in multimedia)\\
\\
This paper addresses the challenging problem of cross-modal retrieval, one of the most intensively studied research directions in computer science. It took an original perspective on the problem by introducing adversarial learning. The underlying idea is that a modality classifier (the adversary) steers representation learning by constantly checking whether items in the learned representation space can be distinguished by their modality. The goal is for the representation learner to win against the adversary and generate modality-invariant representations. It received the sole Best Paper Award among 1000+ submissions to the conference, and has 485 Google Scholar Citations now.\\
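The goal of modality invariance can be illustrated with a toy example. This is not the paper's gradient-based min-max training: here a trivial threshold classifier plays the modality adversary, and subtracting each modality's mean feature stands in for what the representation learner achieves when it wins; all features and offsets are made up.\\

```python
# Toy illustration of modality-invariant representations. Each item is a
# 1-D "feature" = shared content + a modality-specific offset. A trivial
# modality classifier thresholds at the global mean; once the per-modality
# mean is removed (a crude stand-in for adversarial training), the
# classifier drops to chance.

def modality_classifier_accuracy(items):
    """Predict 'image' when the feature exceeds the global mean."""
    mean = sum(x for x, _ in items) / len(items)
    correct = sum(1 for x, modality in items
                  if (modality == "image") == (x > mean))
    return correct / len(items)

content = [0.1, -0.2, 0.3, -0.4]                 # shared semantics
items = ([(c + 5.0, "image") for c in content]   # image features, offset +5
         + [(c - 5.0, "text") for c in content]) # text features, offset -5

def remove_modality_mean(items):
    """Subtract each modality's mean feature: a crude invariance step."""
    out = []
    for modality in ("image", "text"):
        xs = [x for x, m in items if m == modality]
        mu = sum(xs) / len(xs)
        out.extend((x - mu, m) for x, m in items if m == modality)
    return out

invariant = remove_modality_mean(items)
print(modality_classifier_accuracy(items))      # 1.0: modality is obvious
print(modality_classifier_accuracy(invariant))  # 0.5: chance level
```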
\\
(3) Lianli Gao, Zhao Guo, Hanwang Zhang, Xing Xu, Heng Tao Shen (Corresponding author). "Video Captioning With Attention-Based LSTM and Semantic Consistency". IEEE Transactions on Multimedia, 19(9):2045-2055, 2017. (2021 Journal Impact Factor 6.513. Top journal in Multimedia)\\
\\
It integrates an attention mechanism with LSTM to capture the salient structures of a video, and explores the correlation between multi-modal representations for generating sentences. Inspired by this work, it has become common practice to exploit visual concepts and improve captioning quality by keeping sentence semantics consistent with visual content. It received the sole 2020 Prize Paper Award among the 600+ papers published in IEEE Transactions on Multimedia in 2017, 2018 and 2019. It is an ESI Highly Cited Paper, with 438 Google Scholar Citations now.\\
\\
(4) Jingdong Wang, Ting Zhang, Nicu Sebe, Heng Tao Shen (Corresponding author). “A survey on learning to hash”. IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(4):769-790, 2018. (2021 Journal Impact Factor 16.389. Top journal in AI)\\
\\
This article summarizes Prof. Shen’s work on hashing and surveys the hashing literature in the context of indexing and retrieving big multimedia data. It experimentally compares state-of-the-art hashing methods and outlines trends for future research on hashing. It is an ESI Highly Cited Paper, with 1280 Google Scholar Citations now (together with its preliminary arXiv version).\\
\\
(5) Fumin Shen, Yan Xu, Li Liu, Yang Yang, Zi Huang, Heng Tao Shen (Corresponding author). "Unsupervised Deep Hashing with Similarity-Adaptive and Discrete Optimization". IEEE Transactions on Pattern Analysis and Machine Intelligence, 40(12):3034-3044, 2018.  (2021 Journal Impact Factor 16.389. Top journal in AI)\\
\\
It presents a simple yet effective unsupervised hashing framework that alternates over three training modules: deep hash model training, similarity graph updating, and binary code optimization. The key difference from the widely used two-step hashing methods is that the output representations of the learned deep model are used to update the similarity graph matrix, which in turn improves the subsequent code optimization. It is an ESI Hot and Highly Cited Paper, with 263 Google Scholar Citations now.\\
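The alternation pattern, and the data flow it creates, can be sketched with toy stand-ins for each module. The "deep model" below is a scaled identity fitted by least squares, the similarity graph is a threshold on output proximity, and code optimization is a sign of similarity-weighted votes; all data, thresholds, and initial codes are illustrative, and only the alternation structure reflects the method.\\

```python
# Toy sketch of the three-module alternation: model outputs refresh the
# similarity graph, which drives the next round of binary code
# optimization. The real method trains a deep network and solves a
# discrete program; these stand-ins only illustrate the loop.

def fit_model(data, codes):
    """'Deep model' stand-in: scale w minimizing sum (w*x - b)^2."""
    num = sum(x * b for x, b in zip(data, codes))
    den = sum(x * x for x in data) or 1.0
    return num / den

def build_similarity(outputs, tau=0.5):
    """s_ij = +1 for nearby model outputs, -1 otherwise."""
    n = len(outputs)
    return [[1 if abs(outputs[i] - outputs[j]) < tau else -1
             for j in range(n)] for i in range(n)]

def optimize_codes(sim, codes):
    """b_i = sign of the similarity-weighted vote of the other codes."""
    new = []
    for i, row in enumerate(sim):
        vote = sum(s * b for j, (s, b) in enumerate(zip(row, codes)) if j != i)
        new.append(1 if vote >= 0 else -1)
    return new

data = [-2.0, -1.5, 1.0, 2.5]
codes = [1, -1, 1, 1]                     # initial codes, item 0 deliberately wrong
for _ in range(3):                        # alternate the three modules
    w = fit_model(data, codes)            # (1) train the hash model
    outputs = [w * x for x in data]
    sim = build_similarity(outputs)       # (2) update the similarity graph
    codes = optimize_codes(sim, codes)    # (3) re-optimize the binary codes
print(codes)  # [-1, -1, 1, 1]: negatives and positives get opposite bits
```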
\\
(6) Zheng Zhang, Li Liu, Fumin Shen, Heng Tao Shen, Ling Shao. "Binary Multi-View Clustering". IEEE Transactions on Pattern Analysis and Machine Intelligence, 41(7):1774-1782, 2019.  (2021 Journal Impact Factor 16.389. Top journal in AI) \\
\\
This paper presents a novel binary multi-view clustering framework that handles multi-view image data effectively and scales easily to large datasets, with two key components in a joint learning framework: compact collaborative discrete representation learning and binary clustering structure learning. It achieves significant reductions in both computation and memory footprint while delivering superior or highly competitive performance compared with state-of-the-art clustering methods.\\
\\
It is an ESI Highly Cited Paper, with 241 Google Scholar Citations now.\\
\\
(7) Jingjing Li, Ke Lu, Zi Huang, Lei Zhu, Heng Tao Shen. "Transfer Independently Together: A Generalized Framework for Domain Adaptation".  IEEE Transactions on Cybernetics, 49(6):2144-2155, 2019. (2021 Journal Impact Factor 11.448. Top journal in Cybernetics)\\
\\
This paper presents a generalized framework, named Transfer Independently Together (TIT), which learns multiple transformations, one for each domain (independently), to map data onto a shared latent space where the domains are well aligned. The multiple transformations are jointly optimized (together) in a unified formulation. TIT is applicable to arbitrary sample dimensionality and needs no labeled target samples for training. It is an ESI Hot and Highly Cited Paper, with 200 Google Scholar Citations now.\\
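The "one transformation per domain, alignment in a shared space" idea can be sketched with a deliberately simple stand-in: each 1-D domain gets its own affine map (standardization, fitted on that domain alone). The actual TIT formulation optimizes the transformations jointly with graph-based terms; the data below are made up.\\

```python
# Toy sketch of per-domain transformations into a shared latent space:
# each domain is mapped by its own affine transform (here standardization,
# fitted independently per domain), and alignment is then measured in the
# shared space. Illustration only, not the TIT objective.

def fit_affine(xs):
    """Per-domain map x -> (x - mean) / std, fitted on that domain only."""
    mu = sum(xs) / len(xs)
    var = sum((x - mu) ** 2 for x in xs) / len(xs)
    std = var ** 0.5 or 1.0
    return lambda x: (x - mu) / std

source = [10.0, 12.0, 14.0, 16.0]   # domain A: same content, large scale
target = [0.1, 0.3, 0.5, 0.7]       # domain B: same content, small scale

f_s, f_t = fit_affine(source), fit_affine(target)
shared_s = [f_s(x) for x in source]
shared_t = [f_t(x) for x in target]

# After the per-domain maps, corresponding samples coincide in the shared
# space even though the raw domains were far apart.
gap = max(abs(a - b) for a, b in zip(shared_s, shared_t))
print(gap < 1e-9)  # True
```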
\\
(8) Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong Li, Alan Hanjalic, Heng Tao Shen (Corresponding author). "From Deterministic to Generative: Multimodal Stochastic RNNs for Video Captioning". IEEE Transactions on Neural Networks and Learning Systems, 30(10):3047-3058, 2019. (2021 Journal Impact Factor 10.451. Top journal in Neural Networks)\\
\\
It introduces a generative approach that models the uncertainty observed in the data using latent stochastic variables, improving video captioning performance and enabling the generation of multiple sentences that describe a video under different random factors. Specifically, a multimodal long short-term memory (LSTM) is first proposed to interact with both visual and textual features to capture a high-level representation. Then, a backward stochastic LSTM is proposed to support uncertainty propagation by introducing latent variables. It is an ESI Hot and Highly Cited Paper, with 188 Google Scholar Citations now.\\
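Why latent stochastic variables yield multiple captions can be shown with a toy decoder: a deterministic decoder always emits the argmax word, while different draws of a latent variable perturb the word scores and change the choice. The vocabulary, scores, and the three pretend draws of z are all made up; the real model propagates z through a backward stochastic LSTM.\\

```python
# Toy sketch: deterministic decoding gives one caption; sampling a latent
# z perturbs the word scores, so repeated decoding yields different
# (still plausible) word choices. All numbers are illustrative.

VOCAB = ["running", "jogging", "sprinting"]
BASE_SCORES = [1.00, 0.95, 0.90]          # nearly equally plausible words

def decode(z=None):
    """Pick the highest-scoring word, optionally perturbed by latent z."""
    scores = list(BASE_SCORES)
    if z is not None:
        scores = [s + dz for s, dz in zip(scores, z)]
    return VOCAB[max(range(len(scores)), key=scores.__getitem__)]

print(decode())  # 'running': deterministic decoding, same word every time

# Three pretend draws of the latent variable z:
draws = [[0.0, 0.0, 0.0], [-0.2, 0.1, 0.0], [-0.3, 0.0, 0.2]]
print([decode(z) for z in draws])  # ['running', 'jogging', 'sprinting']
```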
\\
(9) Lianli Gao, Xiangpeng Li, Jingkuan Song, Heng Tao Shen  (Corresponding author). "Hierarchical LSTMs with Adaptive Attention for Visual Captioning". IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(5):1112-1131, 2020.  (2021 Journal Impact Factor 16.389. Top journal in AI) \\
\\
It introduces hierarchical LSTMs with adaptive attention for visual captioning. The idea is to use spatial or temporal attention to select specific regions or frames for predicting the related words, while adaptive attention decides whether to rely on the visual information or on the language context. It is an ESI Hot and Highly Cited Paper, with 170 Google Scholar Citations now.\\
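The two mechanisms can be sketched together: softmax attention pools frame features into a visual context, and a sigmoid gate decides how much to rely on that visual context versus the language context. All feature values, scores, and gate inputs below are illustrative, not taken from the paper's model.\\

```python
# Toy sketch of adaptive attention. Temporal attention (softmax over
# frame scores) builds a visual context; a sigmoid gate beta blends it
# with the language context. beta -> 1 leans on language (e.g. for a
# function word like "the"); beta -> 0 leans on vision (e.g. for "dog").
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def adaptive_context(frame_feats, frame_scores, language_ctx, gate_score):
    attn = softmax(frame_scores)                   # attention over frames
    visual_ctx = sum(a * f for a, f in zip(attn, frame_feats))
    beta = 1 / (1 + math.exp(-gate_score))         # sigmoid gate
    return beta * language_ctx + (1 - beta) * visual_ctx

frames = [0.2, 0.9, 0.4]      # 1-D frame features for simplicity
scores = [0.1, 2.0, 0.3]      # frame 1 is most relevant to the next word

visual_word = adaptive_context(frames, scores, language_ctx=0.0,
                               gate_score=-4.0)    # gate open to vision
function_word = adaptive_context(frames, scores, language_ctx=0.0,
                                 gate_score=4.0)   # gate open to language
print(visual_word > function_word)  # True: the gate shifts the reliance
```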
\\
(10) Heng Tao Shen, Luchen Liu, Yang Yang, Xing Xu, Zi Huang, Fumin Shen, Richang Hong. “Exploiting subspace relation in semantic labels for cross-modal hashing”. IEEE Transactions on Knowledge and Data Engineering, 33(10):3351-3365, 2021. (2021 Journal Impact Factor 6.977. Top journal in Data Engineering) \\
\\
It presents a novel supervised cross-modal hashing method dubbed Subspace Relation Learning for Cross-modal Hashing (SRLCH), which exploits the relation information of labels in the semantic space to bring similar data from different modalities closer in the low-dimensional Hamming subspace. SRLCH preserves the inter-modality relationships, the discrete constraints, and the nonlinear structures. It is an ESI Highly Cited Paper, with 73 Google Scholar Citations now.