Shaojun Wei
2020 – today
- 2024
- [j176]Weiwei Wu, Fengbin Tu, Xiangyu Li, Shaojun Wei, Shouyi Yin:
SWG: an architecture for sparse weight gradient computation. Sci. China Inf. Sci. 67(2) (2024) - [j175]Chenchen Deng, Tianzhu Xiong, Zhaoshi Li, Zhiwei Liu, Yao Wang, Jianfeng Zhu, Jun Yang, Shaojun Wei, Leibo Liu:
CATCAM: a 28 nm constant-time alteration TCAM enabling less than 50 ns update latency. Sci. China Inf. Sci. 67(4) (2024) - [j174]Fengbin Tu, Zihan Wu, Yiqi Wang, Weiwei Wu, Leibo Liu, Yang Hu, Shaojun Wei, Shouyi Yin:
MulTCIM: Digital Computing-in-Memory-Based Multimodal Transformer Accelerator With Attention-Token-Bit Hybrid Sparsity. IEEE J. Solid State Circuits 59(1): 90-101 (2024) - [j173]Ruiqi Guo, Xiaofeng Chen, Lei Wang, Yang Wang, Hao Sun, Jingchuan Wei, Huiming Han, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
CIMFormer: A Systolic CIM-Array-Based Transformer Accelerator With Token-Pruning-Aware Attention Reformulating and Principal Possibility Gathering. IEEE J. Solid State Circuits 59(10): 3317-3329 (2024) - [j172]Yubin Qin, Yang Wang, Dazheng Deng, Xiaolong Yang, Zhiren Zhao, Yang Zhou, Yuanqi Fan, Jingchuan Wei, Tianbao Chen, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
Ayaka: A Versatile Transformer Accelerator With Low-Rank Estimation and Heterogeneous Dataflow. IEEE J. Solid State Circuits 59(10): 3342-3356 (2024) - [j171]Jiangxue Liu, Cankun Zhao, Shuohang Peng, Bohan Yang, Hang Zhao, Xiangdong Han, Min Zhu, Shaojun Wei, Leibo Liu:
A Low-Latency High-Order Arithmetic to Boolean Masking Conversion. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2024(2): 630-653 (2024) - [j170]Xiangren Chen, Bohan Yang, Jianfeng Zhu, Jun Liu, Shuying Yin, Guang Yang, Min Zhu, Shaojun Wei, Leibo Liu:
UpWB: An Uncoupled Architecture Design for White-box Cryptography Using Vectorized Montgomery Multiplication. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2024(2): 677-713 (2024) - [j169]Cankun Zhao, Hang Zhao, Jiangxue Liu, Bohan Yang, Wenping Zhu, Shuying Yin, Min Zhu, Shaojun Wei, Leibo Liu:
Breaking Ground: A New Area Record for Low-Latency First-Order Masked SHA-3 Advancing from the 4x Area Era to the 3x Area Era. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2024(4): 231-257 (2024) - [j168]Gang Zeng, Jianfeng Zhu, Yichi Zhang, Ganhui Chen, Zhenhai Yuan, Shaojun Wei, Leibo Liu:
A High-Performance Genomic Accelerator for Accurate Sequence-to-Graph Alignment Using Dynamic Programming Algorithm. IEEE Trans. Parallel Distributed Syst. 35(2): 237-249 (2024) - [c178]Zhou Wang, Haochen Du, Baoyi Han, Yanqing Xu, Xiaonan Tang, Yang Zhou, Zhe Zheng, Wenpeng Cui, Yanwei Xiong, Shaojun Wei, Shushan Qiao, Shouyi Yin:
RTPE: A High Energy Efficiency Inference Processor with RISC-V based Transformation Mechanism. AICAS 2024: 297-301 - [c177]Zhou Wang, Haochen Du, Baoyi Han, Yanqing Xu, Xiaonan Tang, Yang Zhou, Zhe Zheng, Wenpeng Cui, Yanwei Xiong, Shaojun Wei, Shushan Qiao, Shouyi Yin:
RCPE: An Excellent Performance Training Processor with RISC-V based Compression Mechanism. AICAS 2024: 302-306 - [c176]Yichi Zhang, Dibei Chen, Gang Zeng, Jianfeng Zhu, Zhaoshi Li, Longlong Chen, Shaojun Wei, Leibo Liu:
Harp: Leveraging Quasi-Sequential Characteristics to Accelerate Sequence-to-Graph Mapping of Long Reads. ASPLOS (3) 2024: 512-527 - [c175]Ting Li, Jinjiang Yang, Yin Zhou, Shaojun Wei:
Research on Performance Optimization of Encryption Algorithms for Network Security Framework. CSAIDE 2024: 650-653 - [c174]Zhiheng Yue, Shaojun Wei, Yang Hu, Shouyi Yin:
CAP: A General Purpose Computation-in-memory with Content Addressable Processing Paradigm. DAC 2024: 22:1-22:6 - [c173]Xujiang Xiang, Zhiheng Yue, Yuxuan Li, Liuxin Lv, Shaojun Wei, Yang Hu, Shouyi Yin:
Dyn-Bitpool: A Two-sided Sparse CIM Accelerator Featuring a Balanced Workload Scheme and High CIM Macro Utilization. DAC 2024: 35:1-35:6 - [c172]Zheng Xu, Xu Dai, Shaojun Wei, Shouyi Yin, Yang Hu:
GSPO: A Graph Substitution and Parallelization Joint Optimization Framework for DNN Inference. DAC 2024: 214:1-214:6 - [c171]Xiaolong Yang, Yang Wang, Yubin Qin, Jiachen Wang, Shaojun Wei, Yang Hu, Shouyi Yin:
FQP: A Fibonacci Quantization Processor with Multiplication-Free Computing and Topological-Order Routing. DAC 2024: 230:1-230:6 - [c170]Hang Zhao, Cankun Zhao, Wenping Zhu, Bohan Yang, Shaojun Wei, Leibo Liu:
Sparse Polynomial Multiplication-Based High-Performance Hardware Implementation for CRYSTALS-Dilithium. HOST 2024: 150-159 - [c169]Zhiheng Yue, Huizheng Wang, Jiahao Fang, Jinyi Deng, Guangyang Lu, Fengbin Tu, Ruiqi Guo, Yuxuan Li, Yubin Qin, Yang Wang, Chao Li, Huiming Han, Shaojun Wei, Yang Hu, Shouyi Yin:
Exploiting Similarity Opportunities of Emerging Vision AI Models on Hybrid Bonding Architecture. ISCA 2024: 396-409 - [c168]Yubin Qin, Yang Wang, Zhiren Zhao, Xiaolong Yang, Yang Zhou, Shaojun Wei, Yang Hu, Shouyi Yin:
MECLA: Memory-Compute-Efficient LLM Accelerator with Scaling Sub-matrix Partition. ISCA 2024: 1032-1047 - [c167]Zhiheng Yue, Xujiang Xiang, Fengbin Tu, Yang Wang, Yiming Wang, Shaojun Wei, Yang Hu, Shouyi Yin:
15.1 A 0.795fJ/bit Physically-Unclonable Function-Protected TCAM for a Software-Defined Networking Switch. ISSCC 2024: 276-278 - [c166]Yihong Zhu, Wenping Zhu, Yi Ouyang, Junwen Sun, Min Zhu, Qi Zhao, Jinjiang Yang, Chen Chen, Qichao Tao, Guang Yang, Aoyang Zhang, Shaojun Wei, Leibo Liu:
16.2 A 28nm 69.4kOPS 4.4μJ/Op Versatile Post-Quantum Crypto-Processor Across Multiple Mathematical Problems. ISSCC 2024: 298-300 - [c165]Ruiqi Guo, Lei Wang, Xiaofeng Chen, Hao Sun, Zhiheng Yue, Yubin Qin, Huiming Han, Yang Wang, Fengbin Tu, Shaojun Wei, Yang Hu, Shouyi Yin:
20.2 A 28nm 74.34TFLOPS/W BF16 Heterogenous CIM-Based Accelerator Exploiting Denoising-Similarity for Diffusion Models. ISSCC 2024: 362-364 - [c164]Yang Wang, Xiaolong Yang, Yubin Qin, Zhiren Zhao, Ruiqi Guo, Zhiheng Yue, Huiming Han, Shaojun Wei, Yang Hu, Shouyi Yin:
34.1 A 28nm 83.23TFLOPS/W POSIT-Based Compute-in-Memory Macro for High-Accuracy AI Applications. ISSCC 2024: 566-568 - [c163]Yiqi Wang, Zhen He, Chenggang Zhao, Zihan Wu, Mingyu Gao, Huiming Han, Shaojun Wei, Yang Hu, Fengbin Tu, Shouyi Yin:
ETCIM: An Error-Tolerant Digital-CIM Processor with Redundancy-Free Repair and Run-Time MAC and Cell Error Correction. VLSI Technology and Circuits 2024: 1-2 - [c162]Yang Wang, Xiaolong Yang, Yubin Qin, Zhiren Zhao, Ruiqi Guo, Zhiheng Yue, Huiming Han, Shaojun Wei, Yang Hu, Shouyi Yin:
A 22nm 54.94TFLOPS/W Transformer Fine-Tuning Processor with Exponent-Stationary Re-Computing, Aggressive Linear Fitting, and Logarithmic Domain Multiplicating. VLSI Technology and Circuits 2024: 1-2 - [c161]Ruiqi Guo, Xiaofeng Chen, Lei Wang, Fengbin Tu, Shaojun Wei, Yang Hu, Shouyi Yin:
A 28nm 4170-TFLOPS/W/b and 195-TFLOPS/mm2/b Multiply-Free Fully-Digital Floating-Point Compute-In-Memory Macro with Mitchell's Approximation. VLSI Technology and Circuits 2024: 1-2 - [c160]Yubin Qin, Yang Wang, Xiaolong Yang, Zhiren Zhao, Shaojun Wei, Yang Hu, Shouyi Yin:
A 52.01 TFLOPS/W Diffusion Model Processor with Inter-Time-Step Convolution-Attention-Redundancy Elimination and Bipolar Floating-Point Multiplication. VLSI Technology and Circuits 2024: 1-2 - [i11]Jinyi Deng, Xinru Tang, Zhiheng Yue, Guangyang Lu, Qize Yang, Jiahao Zhang, Jinxi Li, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin:
Efficient Orchestrated AI Workflows Execution on Scale-out Spatial Architecture. CoRR abs/2405.17221 (2024) - [i10]Jiangxue Liu, Cankun Zhao, Shuohang Peng, Bohan Yang, Hang Zhao, Xiangdong Han, Min Zhu, Shaojun Wei, Leibo Liu:
A Low-Latency High-Order Arithmetic to Boolean Masking Conversion. IACR Cryptol. ePrint Arch. 2024: 45 (2024) - 2023
- [j167]Yihong Zhu, Wenping Zhu, Chongyang Li, Min Zhu, Chenchen Deng, Chen Chen, Shuying Yin, Shouyi Yin, Shaojun Wei, Leibo Liu:
RePQC: A 3.4-uJ/Op 48-kOPS Post-Quantum Crypto-Processor for Multiple-Mathematical Problems. IEEE J. Solid State Circuits 58(1): 124-140 (2023) - [j166]Yang Wang, Yubin Qin, Dazheng Deng, Jingchuan Wei, Yang Zhou, Yuanqi Fan, Tianbao Chen, Hao Sun, Leibo Liu, Shaojun Wei, Shouyi Yin:
An Energy-Efficient Transformer Processor Exploiting Dynamic Weak Relevances in Global Attention. IEEE J. Solid State Circuits 58(1): 227-242 (2023) - [j165]Fengbin Tu, Yiqi Wang, Zihan Wu, Ling Liang, Yufei Ding, Bongjin Kim, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
ReDCIM: Reconfigurable Digital Computing-In-Memory Processor With Unified FP/INT Pipeline for Cloud AI Acceleration. IEEE J. Solid State Circuits 58(1): 243-255 (2023) - [j164]Ruiqi Guo, Zhiheng Yue, Xin Si, Hao Li, Te Hu, Limei Tang, Yabing Wang, Hao Sun, Leibo Liu, Meng-Fan Chang, Qiang Li, Shaojun Wei, Shouyi Yin:
TT@CIM: A Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity Optimization and Variable Precision Quantization. IEEE J. Solid State Circuits 58(3): 852-866 (2023) - [j163]Fengbin Tu, Zihan Wu, Yiqi Wang, Ling Liang, Liu Liu, Yufei Ding, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
TranCIM: Full-Digital Bitline-Transpose CIM-based Sparse Transformer Accelerator With Pipeline/Parallel Reconfigurable Modes. IEEE J. Solid State Circuits 58(6): 1798-1809 (2023) - [j162]Fengbin Tu, Yiqi Wang, Ling Liang, Yufei Ding, Leibo Liu, Shaojun Wei, Shouyi Yin, Yuan Xie:
SDP: Co-Designing Algorithm, Dataflow, and Architecture for In-SRAM Sparse NN Acceleration. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42(1): 109-121 (2023) - [j161]Mingyang Kou, Jiangyuan Gu, Hailong Yao, Shaojun Wei, Shouyi Yin:
TAEM 2.0: A Faster Transfer-Aware Effective Loop Mapping for Heterogeneous Resources on CGRA. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42(8): 2552-2565 (2023) - [j160]Xiangyu Kong, Jianfeng Zhu, Xingchen Man, Guihuan Song, Yi Huang, Chenchen Deng, Pengfei Gou, Shouyi Yin, Shaojun Wei, Leibo Liu:
M2STaR: A Multimode Spatio-Temporal Redundancy Design for Fault-Tolerant Coarse-Grained Reconfigurable Architectures. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 42(9): 2938-2951 (2023) - [j159]Yiqi Wang, Fengbin Tu, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
SPCIM: Sparsity-Balanced Practical CIM Accelerator With Optimized Spatial-Temporal Multi-Macro Utilization. IEEE Trans. Circuits Syst. I Regul. Pap. 70(1): 214-227 (2023) - [j158]Shaojun Wei, Xinhan Lin, Fengbin Tu, Yang Wang, Leibo Liu, Shouyi Yin:
Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips. IEEE Trans. Circuits Syst. I Regul. Pap. 70(3): 1228-1241 (2023) - [j157]Weiwei Wu, Fengbin Tu, Mengqi Niu, Zhiheng Yue, Leibo Liu, Shaojun Wei, Xiangyu Li, Yang Hu, Shouyi Yin:
STAR: An STGCN ARchitecture for Skeleton-Based Human Action Recognition. IEEE Trans. Circuits Syst. I Regul. Pap. 70(6): 2370-2383 (2023) - [j156]Shuqin Su, Bohan Yang, Vladimir Rozic, Mingyuan Yang, Min Zhu, Shaojun Wei, Leibo Liu:
A Closer Look at the Chaotic Ring Oscillators based TRNG Design. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2023(2): 381-417 (2023) - [j155]Longlong Chen, Jianfeng Zhu, Guiqiang Peng, Mingxu Liu, Shaojun Wei, Leibo Liu:
GEM: Ultra-Efficient Near-Memory Reconfigurable Acceleration for Read Mapping by Dividing and Predictive Scattering. IEEE Trans. Parallel Distributed Syst. 34(12): 3059-3072 (2023) - [c159]Xiaofeng Chen, Ruiqi Guo, Zhiheng Yue, Yang Hu, Leibo Liu, Shaojun Wei, Shouyi Yin:
A Systolic Computing-in-Memory Array based Accelerator with Predictive Early Activation for Spatiotemporal Convolutions. AICAS 2023: 1-5 - [c158]Zhou Wang, Jingchuan Wei, Xiaonan Tang, Boxiao Han, Hongjun He, Leibo Liu, Shaojun Wei, Shouyi Yin:
TPE: A High-Performance Edge-Device Inference with Multi-level Transformational Mechanism. AICAS 2023: 1-5 - [c157]Ruiqi Guo, Yang Wang, Xiaofeng Chen, Lei Wang, Hao Sun, Jingchuan Wei, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
CIMFormer: A 38.9TOPS/W-8b Systolic CIM-Array Based Transformer Processor with Token-Slimmed Attention Reformulating and Principal Possibility Gathering. A-SSCC 2023: 1-3 - [c156]Yubin Qin, Yang Wang, Dazheng Deng, Xiaolong Yang, Zhiren Zhao, Yang Zhou, Yuanqi Fan, Jingchuan Wei, Tianbao Chen, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
A 28nm 49.7TOPS/W Sparse Transformer Processor with Random-Projection-Based Speculation, Multi-Stationary Dataflow, and Redundant Partial Product Elimination. A-SSCC 2023: 1-3 - [c155]Zhou Wang, Jingchuan Wei, Boxiao Han, Hongjun He, Leibo Liu, Shaojun Wei, Shouyi Yin:
CPE: An Energy-Efficient Edge-Device Training with Multi-dimensional Compression Mechanism. DAC 2023: 1-6 - [c154]Qidie Wu, Jiangyuan Gu, Youxu Lin, Boxiao Han, Hongjun He, Yang Hu, Leibo Liu, Shaojun Wei, Shouyi Yin:
RMP-MEM: A HW/SW Reconfigurable Multi-Port Memory Architecture for Multi-PEA Oriented CGRA. DAC 2023: 1-6 - [c153]Yihong Zhu, Wenping Zhu, Chen Chen, Min Zhu, Zhengdong Li, Shaojun Wei, Leibo Liu:
Mckeycutter: A High-throughput Key Generator of Classic McEliece on Hardware. DAC 2023: 1-6 - [c152]Shuohang Peng, Bohan Yang, Shuying Yin, Hang Zhao, Cankun Zhao, Shaojun Wei, Leibo Liu:
A Low-Randomness First-Order Masked Xoodyak. HOST 2023: 48-56 - [c151]Dibei Chen, Tairan Zhang, Yi Huang, Jianfeng Zhu, Yang Liu, Pengfei Gou, Chunyang Feng, Binghua Li, Shaojun Wei, Leibo Liu:
Orinoco: Ordered Issue and Unordered Commit with Non-Collapsible Queues. ISCA 2023: 11:1-11:14 - [c150]Yubin Qin, Yang Wang, Dazheng Deng, Zhiren Zhao, Xiaolong Yang, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction. ISCA 2023: 22:1-22:14 - [c149]Xiangyu Kong, Yi Huang, Jianfeng Zhu, Xingchen Man, Yang Liu, Chunyang Feng, Pengfei Gou, Minggui Tang, Shaojun Wei, Leibo Liu:
MapZero: Mapping for Coarse-grained Reconfigurable Architectures with Reinforcement Learning and Monte-Carlo Tree Search. ISCA 2023: 46:1-46:14 - [c148]Yibo Wu, Jianfeng Zhu, Wenrui Wei, Longlong Chen, Liang Wang, Shaojun Wei, Leibo Liu:
Shogun: A Task Scheduling Framework for Graph Mining Accelerators. ISCA 2023: 51:1-51:15 - [c147]Zhiheng Yue, Yang Wang, Huizheng Wang, Yabing Wang, Ruiqi Guo, Limei Tang, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
CV-CIM: A 28nm XOR-Derived Similarity-Aware Computation-in-Memory for Cost-Volume Construction. ISSCC 2023: 138-139 - [c146]Fengbin Tu, Zihan Wu, Yiqi Wang, Weiwei Wu, Leibo Liu, Yang Hu, Shaojun Wei, Shouyi Yin:
MulTCIM: A 28nm 2.24µJ/Token Attention-Token-Bit Hybrid Sparse Digital CIM-Based Accelerator for Multimodal Transformers. ISSCC 2023: 248-249 - [c145]Fengbin Tu, Yiqi Wang, Zihan Wu, Weiwei Wu, Leibo Liu, Yang Hu, Shaojun Wei, Shouyi Yin:
TensorCIM: A 28nm 3.7nJ/Gather and 8.3TFLOPS/W FP32 Digital-CIM Tensor Processor for MCM-CIM-Based Beyond-NN Acceleration. ISSCC 2023: 254-255 - [c144]Jinyi Deng, Xinru Tang, Jiahao Zhang, Yuxuan Li, Linyun Zhang, Boxiao Han, Hongjun He, Fengbin Tu, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane. MICRO 2023: 1395-1408 - [c143]Yi Huang, Lingkun Kong, Dibei Chen, Zhiyu Chen, Xiangyu Kong, Jianfeng Zhu, Konstantinos Mamouras, Shaojun Wei, Kaiyuan Yang, Leibo Liu:
CASA: An Energy-Efficient and High-Speed CAM-based SMEM Seeding Accelerator for Genome Alignment. MICRO 2023: 1423-1436 - [c142]Yang Wang, Yubin Qin, Dazheng Deng, Xiaolong Yang, Zhiren Zhao, Ruiqi Guo, Zhiheng Yue, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
A 28nm 77.35TOPS/W Similar Vectors Traceable Transformer Processor with Principal-Component-Prior Speculating and Dynamic Bit-wise Stationary Computing. VLSI Technology and Circuits 2023: 1-2 - [i9]Jinyi Deng, Xinru Tang, Jiahao Zhang, Yuxuan Li, Linyun Zhang, Fengbin Tu, Leibo Liu, Shaojun Wei, Yang Hu, Shouyi Yin:
Towards Efficient Control Flow Handling in Spatial Architecture via Architecting the Control Flow Plane. CoRR abs/2307.02847 (2023) - [i8]Haojia Hui, Jiangyuan Gu, Xunbo Hu, Yang Hu, Leibo Liu, Shaojun Wei, Shouyi Yin:
WindMill: A Parameterized and Pluggable CGRA Implemented by DIAG Design Flow. CoRR abs/2309.01273 (2023) - [i7]Yang Hu, Xinhan Lin, Huizheng Wang, Zhen He, Xingmao Yu, Jiahao Zhang, Qize Yang, Zheng Xu, Sihan Guan, Jiahao Fang, Haoran Shang, Xinru Tang, Xu Dai, Shaojun Wei, Shouyi Yin:
Wafer-scale Computing: Advancements, Challenges, and Future Perspectives. CoRR abs/2310.09568 (2023) - [i6]Shuqin Su, Bohan Yang, Vladimir Rozic, Mingyuan Yang, Min Zhu, Shaojun Wei, Leibo Liu:
A Closer Look at the Chaotic Ring Oscillators based TRNG Design. IACR Cryptol. ePrint Arch. 2023: 40 (2023) - 2022
- [b2]Shaojun Wei, Leibo Liu, Jianfeng Zhu, Chenchen Deng:
Software Defined Chips - Volume I, 2. Springer 2022, ISBN 978-981-19-6993-5, pp. 1-311 - [j154]Chenchen Deng, Min Zhu, Jinjiang Yang, Youyu Wu, Jiaji He, Bohan Yang, Jianfeng Zhu, Shouyi Yin, Shaojun Wei, Leibo Liu:
An energy-efficient dynamically reconfigurable cryptographic engine with improved power/EM-side-channel-attack resistance. Sci. China Inf. Sci. 65(4) (2022) - [j153]Huiyu Mo, Wenping Zhu, Wenjing Hu, Qiang Li, Ang Li, Shouyi Yin, Shaojun Wei, Leibo Liu:
A 12.1 TOPS/W Quantized Network Acceleration Processor With Effective-Weight-Based Convolution and Error-Compensation-Based Prediction. IEEE J. Solid State Circuits 57(5): 1542-1557 (2022) - [j152]Yang Wang, Yubin Qin, Dazheng Deng, Jingchuan Wei, Tianbao Chen, Xinhan Lin, Leibo Liu, Shaojun Wei, Shouyi Yin:
Trainer: An Energy-Efficient Edge-Device Training Processor Supporting Dynamic Weight Pruning. IEEE J. Solid State Circuits 57(10): 3164-3178 (2022) - [j151]Zongsheng Hou, Neng Zhang, Bohan Yang, Hanning Wang, Min Zhu, Shouyi Yin, Shaojun Wei, Leibo Liu:
Efficient FHE Radix-2 Arithmetic Operations Based on Redundant Encoding. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(7): 2024-2037 (2022) - [j150]Baofen Yuan, Jianfeng Zhu, Xingchen Man, Zijiao Ma, Shouyi Yin, Shaojun Wei, Leibo Liu:
Dynamic-II Pipeline: Compiling Loops With Irregular Branches on Static-Scheduling CGRA. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(9): 2929-2942 (2022) - [j149]Ang Li, Huiyu Mo, Wenping Zhu, Qiang Li, Shouyi Yin, Shaojun Wei, Leibo Liu:
BitCluster: Fine-Grained Weight Quantization for Load-Balanced Bit-Serial Neural Network Accelerators. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 41(11): 4747-4757 (2022) - [j148]Yong Wu, Honglan Jiang, Zining Ma, Pengfei Gou, Yong Lu, Jie Han, Shouyi Yin, Shaojun Wei, Leibo Liu:
An Energy-Efficient Approximate Divider Based on Logarithmic Conversion and Piecewise Constant Approximation. IEEE Trans. Circuits Syst. I Regul. Pap. 69(7): 2655-2668 (2022) - [j147]Zhiheng Yue, Yabing Wang, Yubin Qin, Leibo Liu, Shaojun Wei, Shouyi Yin:
BR-CIM: An Efficient Binary Representation Computation-In-Memory Design. IEEE Trans. Circuits Syst. I Regul. Pap. 69(10): 3940-3953 (2022) - [j146]Yang Wang, Yubin Qin, Leibo Liu, Shaojun Wei, Shouyi Yin:
SWPU: A 126.04 TFLOPS/W Edge-Device Sparse DNN Training Processor With Dynamic Sub-Structured Weight Pruning. IEEE Trans. Circuits Syst. I Regul. Pap. 69(10): 4014-4027 (2022) - [j145]Yang Wang, Dazheng Deng, Leibo Liu, Shaojun Wei, Shouyi Yin:
PL-NPU: An Energy-Efficient Edge-Device DNN Training Processor With Posit-Based Logarithm-Domain Computing. IEEE Trans. Circuits Syst. I Regul. Pap. 69(10): 4042-4055 (2022) - [j144]Jianxun Yang, Fengbin Tu, Yixuan Li, Yiqi Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
GQNA: Generic Quantized DNN Accelerator With Weight-Repetition-Aware Activation Aggregating. IEEE Trans. Circuits Syst. I Regul. Pap. 69(10): 4069-4082 (2022) - [j143]Xiangren Chen, Bohan Yang, Shouyi Yin, Shaojun Wei, Leibo Liu:
CFNTT: Scalable Radix-2/4 NTT Multiplication Architecture with an Efficient Conflict-free Memory Mapping Scheme. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022(1): 94-126 (2022) - [j142]Cankun Zhao, Neng Zhang, Hanning Wang, Bohan Yang, Wenping Zhu, Zhengdong Li, Min Zhu, Shouyi Yin, Shaojun Wei, Leibo Liu:
A Compact and High-Performance Hardware Architecture for CRYSTALS-Dilithium. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2022(1): 270-295 (2022) - [c141]Xiangren Chen, Bohan Yang, Yong Lu, Shouyi Yin, Shaojun Wei, Leibo Liu:
Efficient access scheme for multi-bank based NTT architecture through conflict graph. DAC 2022: 91-96 - [c140]Jinyi Deng, Linyun Zhang, Lei Wang, Jiawei Liu, Kexiang Deng, Shibin Tang, Jiangyuan Gu, Boxiao Han, Fei Xu, Leibo Liu, Shaojun Wei, Shouyi Yin:
Mixed-granularity parallel coarse-grained reconfigurable architecture. DAC 2022: 343-348 - [c139]Zhiheng Yue, Yabing Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
MC-CIM: a reconfigurable computation-in-memory for efficient stereo matching cost computation. DAC 2022: 457-462 - [c138]Shixuan Zheng, Xianjue Zhang, Leibo Liu, Shaojun Wei, Shouyi Yin:
Atomic Dataflow based Graph-Level Workload Orchestration for Scalable DNN Accelerators. HPCA 2022: 475-489 - [c137]Yibo Wu, Liang Wang, Xiaohang Wang, Jie Han, Jianfeng Zhu, Honglan Jiang, Shouyi Yin, Shaojun Wei, Leibo Liu:
Upward Packet Popup for Deadlock Freedom in Modular Chiplet-Based Systems. HPCA 2022: 986-1000 - [c136]Weiliang Chen, Zhaoshi Li, Leibo Liu, Shaojun Wei:
Dynamically Reconfigurable Memory Address Mapping for General-Purpose Graphics Processing Unit. ICTA 2022: 1-2 - [c135]Mingyuan Yang, Yemeng Zhang, Bohan Yang, Hanning Wang, Shouyi Yin, Shaojun Wei, Leibo Liu:
A SHA-512 Hardware Implementation Based on Block RAM Storage Structure. IPDPS Workshops 2022: 132-135 - [c134]Xingchen Man, Jianfeng Zhu, Guihuan Song, Shouyi Yin, Shaojun Wei, Leibo Liu:
CaSMap: agile mapper for reconfigurable spatial architectures by automatically clustering intermediate representations and scattering mapping process. ISCA 2022: 259-273 - [c133]Fengbin Tu, Yiqi Wang, Zihan Wu, Ling Liang, Yufei Ding, Bongjin Kim, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
A 28nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration. ISSCC 2022: 1-3 - [c132]Yang Wang, Yubin Qin, Dazheng Deng, Jingchuan Wei, Yang Zhou, Yuanqi Fan, Tianbao Chen, Hao Sun, Leibo Liu, Shaojun Wei, Shouyi Yin:
A 28nm 27.5TOPS/W Approximate-Computing-Based Transformer Processor with Asymptotic Sparsity Speculating and Out-of-Order Computing. ISSCC 2022: 1-3 - [c131]Fengbin Tu, Zihan Wu, Yiqi Wang, Ling Liang, Liu Liu, Yufei Ding, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
A 28nm 15.59µJ/Token Full-Digital Bitline-Transpose CIM-Based Sparse Transformer Accelerator with Pipeline/Parallel Reconfigurable Modes. ISSCC 2022: 466-468 - [c130]Yihong Zhu, Wenping Zhu, Min Zhu, Chongyang Li, Chenchen Deng, Chen Chen, Shuying Yin, Shouyi Yin, Shaojun Wei, Leibo Liu:
A 28nm 48KOPS 3.4µJ/Op Agile Crypto-Processor for Post-Quantum Cryptography on Multi-Mathematical Problems. ISSCC 2022: 514-516 - [i5]Hongjiang Chen, Yang Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
FAQS: Communication-efficient Federate DNN Architecture and Quantization Co-Search for personalized Hardware-aware Preferences. CoRR abs/2210.08450 (2022) - [i4]Hongjiang Chen, Yang Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
HQNAS: Auto CNN deployment framework for joint quantization and architecture search. CoRR abs/2210.08485 (2022) - [i3]Yihong Zhu, Wenping Zhu, Chen Chen, Min Zhu, Zhengdong Li, Shaojun Wei, Leibo Liu:
Compact GF(2) systemizer and optimized constant-time hardware sorters for Key Generation in Classic McEliece. IACR Cryptol. ePrint Arch. 2022: 1277 (2022) - 2021
- [j141]Hai Huang, Leibo Liu, Min Zhu, Shouyi Yin, Shaojun Wei:
Fast substitution-box evaluation algorithm and its efficient masking scheme for block ciphers. Sci. China Inf. Sci. 64(8) (2021) - [j140]Fengbin Tu, Weiwei Wu, Yang Wang, Hongjiang Chen, Feng Xiong, Man Shi, Ning Li, Jinyi Deng, Tianbao Chen, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
Evolver: A Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning. IEEE J. Solid State Circuits 56(2): 658-673 (2021) - [j139]Jianfeng Zhu, Ao Luo, Guanhua Li, Bowei Zhang, Yong Wang, Gang Shan, Yi Li, Jianfeng Pan, Chenchen Deng, Shouyi Yin, Shaojun Wei, Leibo Liu:
Jintide: Utilizing Low-Cost Reconfigurable External Monitors to Substantially Enhance Hardware Security of Large-Scale CPU Clusters. IEEE J. Solid State Circuits 56(8): 2585-2601 (2021) - [j138]Fengbin Tu, Weiwei Wu, Yang Wang, Hongjiang Chen, Feng Xiong, Man Shi, Ning Li, Jinyi Deng, Tianbao Chen, Leibo Liu, Shaojun Wei, Yuan Xie, Shouyi Yin:
Erratum to "Evolver: a Deep Learning Processor With On-Device Quantization-Voltage-Frequency Tuning". IEEE J. Solid State Circuits 56(9): 2895 (2021) - [j137]Jianxun Yang, Yuyao Kong, Zhao Zhang, Zhuangzhi Liu, Jing Zhou, Yiqi Wang, Yonggang Liu, Chenfu Guo, Te Hu, Congcong Li, Leibo Liu, Jin Zhang, Shaojun Wei, Jun Yang, Shouyi Yin:
TIMAQ: A Time-Domain Computing-in-Memory-Based Processor Using Predictable Decomposed Convolution for Arbitrary Quantized DNNs. IEEE J. Solid State Circuits 56(10): 3021-3038 (2021) - [j136]Neng Zhang, Qiao Qin, Zongsheng Hou, Bohan Yang, Shouyi Yin, Shaojun Wei, Leibo Liu:
Efficient Comparison and Addition for FHE With Weighted Computational Complexity Model. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 40(9): 1896-1908 (2021) - [j135]Yibo Wu, Liang Wang, Xiaohang Wang, Jie Han, Shouyi Yin, Shaojun Wei, Leibo Liu:
A Deflection-Based Deadlock Recovery Framework to Achieve High Throughput for Faulty NoCs. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 40(10): 2170-2183 (2021) - [j134]Hui Wu, Zhe Su, Jilin Zhang, Shaojun Wei, Zhihua Wang, Hong Chen:
A Design Flow for Click-Based Asynchronous Circuits Design With Conventional EDA Tools. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 40(11): 2421-2425 (2021) - [j133]Yihong Zhu, Min Zhu, Bohan Yang, Wenping Zhu, Chenchen Deng, Chen Chen, Shaojun Wei, Leibo Liu:
LWRpro: An Energy-Efficient Configurable Crypto-Processor for Module-LWR. IEEE Trans. Circuits Syst. I Regul. Pap. 68(3): 1146-1159 (2021) - [j132]Huiyu Mo, Leibo Liu, Wenping Zhu, Qiang Li, Shouyi Yin, Shaojun Wei:
A 460 GOPS/W Improved Mnemonic Descent Method-Based Hardwired Accelerator for Face Alignment. IEEE Trans. Multim. 23: 1122-1135 (2021) - [j131]Longlong Chen, Jianfeng Zhu, Yangdong Deng, Zhaoshi Li, Jian Chen, Xiaowei Jiang, Shouyi Yin, Shaojun Wei, Leibo Liu:
An Elastic Task Scheduling Scheme on Coarse-Grained Reconfigurable Architectures. IEEE Trans. Parallel Distributed Syst. 32(12): 3066-3080 (2021) - [c129]Yang Wang, Dazheng Deng, Leibo Liu, Shaojun Wei, Shouyi Yin:
LPE: Logarithm Posit Processing Element for Energy-Efficient Edge-Device Training. AICAS 2021: 1-4 - [c128]Yang Wang, Yubin Qin, Leibo Liu, Shaojun Wei, Shouyi Yin:
HPPU: An Energy-Efficient Sparse DNN Training Processor with Hybrid Weight Pruning. AICAS 2021: 1-4 - [c127]Cheng Li, Jiangyuan Gu, Shouyi Yin, Leibo Liu, Shaojun Wei:
Combining Memory Partitioning and Subtask Generation for Parallel Data Access on CGRAs. ASP-DAC 2021: 204-209 - [c126]Song Zhang, Jiangyuan Gu, Shouyi Yin, Leibo Liu, Shaojun Wei:
A Multiple-Precision Multiply and Accumulation Design with Multiply-Add Merged Strategy for AI Accelerating. ASP-DAC 2021: 229-234 - [c125]Jilin Zhang, Mingxuan Liang, Jinsong Wei, Shaojun Wei, Hong Chen:
A 28nm Configurable Asynchronous SNN Accelerator with Energy-Efficient Learning. ASYNC 2021: 34-39 - [c124]Xinhan Lin, Liang Sun, Fengbin Tu, Leibo Liu, Xiangyu Li, Shaojun Wei, Shouyi Yin:
ADROIT: An Adaptive Dynamic Refresh Optimization Framework for DRAM Energy Saving In DNN Training. DAC 2021: 751-756 - [c123]Haichang Yang, Zhaoshi Li, Jiawei Wang, Shouyi Yin, Shaojun Wei, Leibo Liu:
HeteroKV: A Scalable Line-rate Key-Value Store on Heterogeneous CPU-FPGA Platforms. DATE 2021: 834-837 - [c122]Jianxun Yang, Zhao Zhang, Zhuangzhi Liu, Jing Zhou, Leibo Liu, Shaojun Wei, Shouyi Yin:
FuseKNA: Fused Kernel Convolution based Accelerator for Deep Neural Networks. HPCA 2021: 894-907 - [c121]Weiyi Sun, Zhaoshi Li, Shouyi Yin, Shaojun Wei, Leibo Liu:
ABC-DIMM: Alleviating the Bottleneck of Communication in DIMM-based Near-Memory Processing with Inter-DIMM Broadcast. ISCA 2021: 237-250 - [c120]Huiyu Mo, Wenping Zhu, Wenjing Hu, Guangbin Wang, Qiang Li, Ang Li, Shouyi Yin, Shaojun Wei, Leibo Liu:
9.2 A 28nm 12.1TOPS/W Dual-Mode CNN Processor Using Effective-Weight-Based Convolution and Error-Compensation-Based Prediction. ISSCC 2021: 146-148 - [c119]Ruiqi Guo, Zhiheng Yue, Xin Si, Te Hu, Hao Li, Limei Tang, Yabing Wang, Leibo Liu, Meng-Fan Chang, Qiang Li, Shaojun Wei, Shouyi Yin:
15.4 A 5.99-to-691.1TOPS/W Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity-Based Optimization and Variable-Precision Quantization. ISSCC 2021: 242-244 - [c118]Ruiqi Guo, Hao Li, Ruhui Liu, Zhixiao Zhang, Limei Tang, Hao Sun, Leibo Liu, Meng-Fan Chang, Shaojun Wei, Shouyi Yin:
A 6.54-to-26.03 TOPS/W Computing-In-Memory RNN Processor using Input Similarity Optimization and Attention-based Context-breaking with Output Speculation. VLSI Circuits 2021: 1-2 - [c117]Yang Wang, Yubin Qin, Dazheng Deng, Jingchuan Wei, Tianbao Chen, Xinhan Lin, Leibo Liu, Shaojun Wei, Shouyi Yin:
A 28nm 276.55TFLOPS/W Sparse Deep-Neural-Network Training Processor with Implicit Redundancy Speculation and Batch Normalization Reformulation. VLSI Circuits 2021: 1-2 - 2020
- [j130]Leibo Liu, Jianfeng Zhu, Zhaoshi Li, Yanan Lu, Yangdong Deng, Jie Han, Shouyi Yin, Shaojun Wei:
A Survey of Coarse-Grained Reconfigurable Architecture and Design: Taxonomy, Challenges, and Applications. ACM Comput. Surv. 52(6): 118:1-118:39 (2020) - [j129]Guiqiang Peng, Leibo Liu, Sheng Zhou, Shouyi Yin, Shaojun Wei:
A 2.92-Gb/s/W and 0.43-Gb/s/MG Flexible and Scalable CGRA-Based Baseband Processor for Massive MIMO Detection. IEEE J. Solid State Circuits 55(2): 505-519 (2020) - [j128]Neng Zhang, Qiao Qin, Hang Yuan, Chenggao Zhou, Shouyi Yin, Shaojun Wei, Leibo Liu:
NTTU: An Area-Efficient Low-Power NTT-Uncoupled Architecture for NTT-Based Multiplication. IEEE Trans. Computers 69(4): 520-533 (2020) - [j127]Yibo Wu, Leibo Liu, Liang Wang, Xiaohang Wang, Jie Han, Chenchen Deng, Shaojun Wei:
Aggressive Fine-Grained Power Gating of NoC Buffers. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 39(11): 3177-3189 (2020) - [j126]Shixuan Zheng, Xianjue Zhang, Daoli Ou, Shibin Tang, Leibo Liu, Shaojun Wei, Shouyi Yin:
Efficient Scheduling of Irregular Network Structures on CNN Accelerators. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 39(11): 3408-3419 (2020) - [j125]Chenchen Deng, Bo Wang, Leibo Liu, Min Zhu, Youyu Wu, Hui Li, Shouyi Yin, Shaojun Wei:
A 60 Gb/s-Level Coarse-Grained Reconfigurable Cryptographic Processor With Less Than 1-W Power. IEEE Trans. Circuits Syst. II Express Briefs 67-II(2): 375-379 (2020) - [j124]Hang Wang, Xiang Li, Daqiang Han, Shiquan Yu, Shouyi Yin, Shaojun Wei, Nanning Zheng, Xuchong Zhang, Tiancheng Wang, Wenchang Li, Qiubo Chen, Pengju Ren, Xiaogang Wu, Hongbin Sun, Zhiqiang Jiang:
A 4K × 2K@60fps Multifunctional Video Display Processor for High Perceptual Image Quality. IEEE Trans. Circuits Syst. I Regul. Pap. 67-I(2): 451-463 (2020) - [j123]Neng Zhang, Bohan Yang, Chen Chen, Shouyi Yin, Shaojun Wei, Leibo Liu:
Highly Efficient Architecture of NewHope-NIST on FPGA using Low-Complexity NTT/INTT. IACR Trans. Cryptogr. Hardw. Embed. Syst. 2020(2): 49-72 (2020) - [j122]Huiyu Mo, Leibo Liu, Wenping Zhu, Qiang Li, Hong Liu, Shouyi Yin, Shaojun Wei:
A Multi-Task Hardwired Accelerator for Face Detection and Alignment. IEEE Trans. Circuits Syst. Video Technol. 30(11): 4284-4298 (2020) - [j121]Liang Wang, Leibo Liu, Jie Han, Xiaohang Wang, Shouyi Yin, Shaojun Wei:
Achieving Flexible Global Reconfiguration in NoCs Using Reconfigurable Rings. IEEE Trans. Parallel Distributed Syst. 31(3): 611-622 (2020) - [j120]Leibo Liu, Xingchen Man, Jianfeng Zhu, Shouyi Yin, Shaojun Wei:
Pattern-Based Dynamic Compilation System for CGRAs With Online Configuration Transformation. IEEE Trans. Parallel Distributed Syst. 31(12): 2981-2994 (2020) - [j119]Leibo Liu, Guiqiang Peng, Pan Wang, Sheng Zhou, Qiushi Wei, Shouyi Yin, Shaojun Wei:
Energy- and Area-Efficient Recursive-Conjugate-Gradient-Based MMSE Detector for Massive MIMO Systems. IEEE Trans. Signal Process. 68: 573-588 (2020) - [j118]Pan Wang, Leibo Liu, Sheng Zhou, Guiqiang Peng, Shouyi Yin, Shaojun Wei:
Near-Optimal MIMO-SCMA Uplink Detection With Low-Complexity Expectation Propagation. IEEE Trans. Wirel. Commun. 19(2): 1025-1037 (2020) - [c116]Jianxun Yang, Yuyao Kong, Zhao Zhang, Zhuangzhi Liu, Jing Zhou, Yiqi Wang, Yonggang Liu, Chenfu Guo, Te Hu, Congcong Li, Leibo Liu, Jin Zhang, Shaojun Wei, Jun Yang, Shouyi Yin:
A Time-Domain Computing-in-Memory based Processor using Predictable Decomposed Convolution for Arbitrary Quantized DNNs. A-SSCC 2020: 1-4 - [c115]Mingyang Kou, Jiangyuan Gu, Shaojun Wei, Hailong Yao, Shouyi Yin:
TAEM: Fast Transfer-Aware Effective Loop Mapping for Heterogeneous Resources on CGRA. DAC 2020: 1-6 - [c114]Liang Wang, Leibo Liu, Xiaohang Wang, Jie Han, Chenchen Deng, Shaojun Wei:
CDRing: Reconfigurable Ring Architecture by Exploiting Cycle Decomposition of Torus Topology. DAC 2020: 1-6 - [c113]Feng Xiong, Fengbin Tu, Man Shi, Yang Wang, Leibo Liu, Shaojun Wei, Shouyi Yin:
STC: Significance-aware Transform-based Codec Framework for External Memory Access Reduction. DAC 2020: 1-6 - [c112]Ning Li, Leibo Liu, Shaojun Wei, Shouyi Yin:
A High-performance Inference Accelerator Exploiting Patterned Sparsity in CNNs. FCCM 2020: 243 - [c111]Peishuo Li, Zihang Jiang, Shouyi Yin, Dandan Song, Peng Ouyang, Leibo Liu, Shaojun Wei:
PAGAN: A Phase-Adapted Generative Adversarial Networks for Speech Enhancement. ICASSP 2020: 6234-6238 - [c110]Yanan Lu, Leibo Liu, Jian Liu, Shouyi Yin, Shaojun Wei:
A Reconfigurable Branch Predictor for Spatial Computing Architectures. ICDSP 2020: 295-299 - [c109]Yifan Yang, Zhaoshi Li, Yangdong Deng, Zhiwei Liu, Shouyi Yin, Shaojun Wei, Leibo Liu:
GraphABCD: Scaling Out Graph Analytics with Asynchronous Block Coordinate Descent. ISCA 2020: 419-432 - [c108]Dibei Chen, Zhaoshi Li, Tianzhu Xiong, Zhiwei Liu, Jun Yang, Shouyi Yin, Shaojun Wei, Leibo Liu:
CATCAM: Constant-time Alteration Ternary CAM with Scalable In-Memory Architecture. MICRO 2020: 342-355 - [c107]Huiyu Mo, Leibo Liu, Wenjing Hu, Wenping Zhu, Qiang Li, Ang Li, Shouyi Yin, Jian Chen, Xiaowei Jiang, Shaojun Wei:
TFE: Energy-efficient Transferred Filter-based Engine to Compress and Accelerate Convolutional Neural Networks. MICRO 2020: 751-765 - [i2]Yihong Zhu, Min Zhu, Bohan Yang, Wenping Zhu, Chenchen Deng, Chen Chen, Shaojun Wei, Leibo Liu:
A High-performance Hardware Implementation of Saber Based on Karatsuba Algorithm. IACR Cryptol. ePrint Arch. 2020: 1037 (2020)
2010 – 2019
- 2019
- [b1]Leibo Liu, Guiqiang Peng, Shaojun Wei:
Massive MIMO Detection Algorithm and VLSI Architecture. Springer 2019, ISBN 978-981-13-6361-0, pp. 1-336 - [j117]Shouyi Yin, Peng Ouyang, Jianxun Yang, Tianyi Lu, Xiudong Li, Leibo Liu, Shaojun Wei:
An Energy-Efficient Reconfigurable Processor for Binary- and Ternary-Weight Neural Networks With Flexible Data Bit Width. IEEE J. Solid State Circuits 54(4): 1120-1136 (2019) - [j116]Yinglin Zhao, Peng Ouyang, Wang Kang, Shouyi Yin, Youguang Zhang, Shaojun Wei, Weisheng Zhao:
An STT-MRAM Based in Memory Architecture for Low Power Integral Computing. IEEE Trans. Computers 68(4): 617-623 (2019) - [j115]Hai Huang, Leibo Liu, Qihuan Huang, Yingjie Chen, Shouyi Yin, Shaojun Wei:
Low Area-Overhead Low-Entropy Masking Scheme (LEMS) Against Correlation Power Analysis Attack. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 38(2): 208-219 (2019) - [j114]Shouyi Yin, Shibin Tang, Xinhan Lin, Peng Ouyang, Fengbin Tu, Leibo Liu, Shaojun Wei:
A High Throughput Acceleration for Hybrid Neural Networks With Efficient Resource Management on FPGA. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 38(4): 678-691 (2019) - [j113]Leibo Liu, Wenping Zhu, Shouyi Yin, Shaojun Wei:
A Binary-Feature-Based Object Recognition Accelerator With 22 M-Vector/s Throughput and 0.68 G-Vector/J Energy-Efficiency for Full-HD Resolution. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 38(7): 1265-1277 (2019) - [j112]Liang Wang, Ping Lv, Leibo Liu, Jie Han, Ho-fung Leung, Xiaohang Wang, Shouyi Yin, Shaojun Wei, Terrence S. T. Mak:
A Lifetime Reliability-Constrained Runtime Mapping for Throughput Optimization in Many-Core Systems. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 38(9): 1771-1784 (2019) - [j111]Dajiang Liu, Shouyi Yin, Guojie Luo, Jiaxing Shang, Leibo Liu, Shaojun Wei, Yong Feng, Shangbo Zhou:
Data-Flow Graph Mapping Optimization for CGRA With Deep Reinforcement Learning. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 38(12): 2271-2283 (2019) - [j110]Man Shi, Peng Ouyang, Shouyi Yin, Leibo Liu, Shaojun Wei:
A Fast and Power-Efficient Hardware Architecture for Non-Maximum Suppression. IEEE Trans. Circuits Syst. II Express Briefs 66-II(11): 1870-1874 (2019) - [j109]Shixuan Zheng, Peng Ouyang, Dandan Song, Xiudong Li, Leibo Liu, Shaojun Wei, Shouyi Yin:
An Ultra-Low Power Binarized Convolutional Neural Network-Based Speech Recognition Processor With On-Chip Self-Learning. IEEE Trans. Circuits Syst. I Regul. Pap. 66-I(12): 4648-4661 (2019) - [j108]Fengbin Tu, Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei:
Reconfigurable Architecture for Neural Approximation in Multimedia Computing. IEEE Trans. Circuits Syst. Video Technol. 29(3): 892-906 (2019) - [j107]Leibo Liu, Qiang Wang, Wenping Zhu, Huiyu Mo, Tianchen Wang, Shouyi Yin, Yiyu Shi, Shaojun Wei:
A Face Alignment Accelerator Based on Optimized Coarse-to-Fine Shape Searching. IEEE Trans. Circuits Syst. Video Technol. 29(8): 2467-2481 (2019) - [j106]Huiyu Mo, Leibo Liu, Wenping Zhu, Shouyi Yin, Shaojun Wei:
Face Alignment With Expression- and Pose-Based Adaptive Initialization. IEEE Trans. Multim. 21(4): 943-956 (2019) - [j105]Shouyi Yin, Shibin Tang, Xinhan Lin, Peng Ouyang, Fengbin Tu, Leibo Liu, Jishen Zhao, Cong Xu, Shuangchen Li, Yuan Xie, Shaojun Wei:
Parana: A Parallel Neural Architecture Considering Thermal Problem of 3D Stacked Memory. IEEE Trans. Parallel Distributed Syst. 30(1): 146-160 (2019) - [c106]Xi Chen, Shouyi Yin, Dandan Song, Peng Ouyang, Leibo Liu, Shaojun Wei:
Small-Footprint Keyword Spotting with Graph Convolutional Network. ASRU 2019: 539-546 - [c105]Jilin Zhang, Hui Wu, Jinsong Wei, Shaojun Wei, Hong Chen:
An Asynchronous Reconfigurable SNN Accelerator With Event-Driven Time Step Update. A-SSCC 2019: 213-216 - [c104]Huiyu Mo, Leibo Liu, Wenping Zhu, Qiang Li, Hong Liu, Wenjing Hu, Yao Wang, Shaojun Wei:
A 1.17 TOPS/W, 150fps Accelerator for Multi-Face Detection and Alignment. DAC 2019: 80 - [c103]Hong Liu, Leibo Liu, Wenping Zhu, Qiang Li, Huiyu Mo, Shaojun Wei:
L-MPC: A LUT based Multi-Level Prediction-Correction Architecture for Accelerating Binary-Weight Hourglass Network. DAC 2019: 192 - [c102]Xingchen Man, Leibo Liu, Jianfeng Zhu, Shaojun Wei:
A General Pattern-Based Dynamic Compilation Framework for Coarse-Grained Reconfigurable Architectures. DAC 2019: 195 - [c101]Hui Yan, Zhaoshi Li, Leibo Liu, Shouyi Yin, Shaojun Wei:
Constructing Concurrent Data Structures on FPGA with Channels. FPGA 2019: 172-177 - [c100]Yu Pan, Peng Ouyang, Yinglin Zhao, Shouyi Yin, Youguang Zhang, Shaojun Wei, Weisheng Zhao:
A Skyrmion Racetrack Memory based Computing In-memory Architecture for Binary Neural Convolutional Network. ACM Great Lakes Symposium on VLSI 2019: 271-274 - [c99]Leibo Liu, Ao Luo, Guanhua Li, Jianfeng Zhu, Yong Wang, Gang Shan, Jianfeng Pan, Shouyi Yin, Shaojun Wei:
Jintide®: A Hardware Security Enhanced Server CPU with Xeon® Cores under Runtime Surveillance by an In-Package Dynamically Reconfigurable Processor. Hot Chips Symposium 2019: 1-25 - [c98]Kai Lu, Zhaoshi Li, Leibo Liu, Jiawei Wang, Shouyi Yin, Shaojun Wei:
ReDESK: A Reconfigurable Dataflow Engine for Sparse Kernels on Heterogeneous Platforms. ICCAD 2019: 1-8 - [c97]Hang Yuan, Wei Guo, Chip-Hong Chang, Yuan Cao, Shaojun Wei, Shouyi Yin, Chenchen Deng, Leibo Liu, Wei Ge, Fan Zhang:
A Reliable Physical Unclonable Function Based on Differential Charging Capacitors. ISCAS 2019: 1-5 - [c96]Feng Xiong, Fengbin Tu, Shouyi Yin, Shaojun Wei:
Towards Efficient Compact Network Training on Edge-Devices. ISVLSI 2019: 61-67 - [c95]Zhaoshi Li, Leibo Liu, Yangdong Deng, Jiawei Wang, Zhiwei Liu, Shouyi Yin, Shaojun Wei:
FPGA-Accelerated Optimistic Concurrency Control for Transactional Memory. MICRO 2019: 911-923 - [c94]Jianxun Yang, Leibo Liu, Jin Zhang, Shaojun Wei, Shouyi Yin:
An Energy-Efficient Architecture for Accelerating Inference of Memory-Augmented Neural Networks. NANOARCH 2019: 1-6 - [c93]Weiwei Wu, Shouyi Yin, Fengbin Tu, Leibo Liu, Shaojun Wei:
MoNA: Mobile Neural Architecture with Reconfigurable Parallel Dimensions. NEWCAS 2019: 1-4 - [c92]Ruiqi Guo, Yonggang Liu, Shixuan Zheng, Ssu-Yen Wu, Peng Ouyang, Win-San Khwa, Xi Chen, Jia-Jing Chen, Xiudong Li, Leibo Liu, Meng-Fan Chang, Shaojun Wei, Shouyi Yin:
A 5.1pJ/Neuron 127.3us/Inference RNN-based Speech Recognition Processor using 16 Computing-in-Memory SRAM Macros in 65nm CMOS. VLSI Circuits 2019: 120- - [i1]Xi Chen, Shouyi Yin, Dandan Song, Peng Ouyang, Leibo Liu, Shaojun Wei:
Small-footprint Keyword Spotting with Graph Convolutional Network. CoRR abs/1912.05124 (2019) - 2018
- [j104]Shouyi Yin, Tianyi Lu, Xianqing Yao, Zhicong Xie, Leibo Liu, Shaojun Wei:
Multi-Bank Memory Aware Force Directed Scheduling for High-Level Synthesis. IEEE Access 6: 7526-7540 (2018) - [j103]Zhaoshi Li, Leibo Liu, Yangdong Deng, Shouyi Yin, Shaojun Wei:
Breaking the Synchronization Bottleneck with Reconfigurable Transactional Execution. IEEE Comput. Archit. Lett. 17(2): 147-150 (2018) - [j102]Shuang Liang, Shouyi Yin, Leibo Liu, Wayne Luk, Shaojun Wei:
FP-BNN: Binarized neural network on FPGA. Neurocomputing 275: 1072-1086 (2018) - [j101]Ruofei Hu, Binren Tian, Shouyi Yin, Shaojun Wei:
Optimization of Softmax Layer in Deep Neural Network Using Integral Stochastic Computation. J. Low Power Electron. 14(4): 475-480 (2018) - [j100]Shouyi Yin, Peng Ouyang, Shibin Tang, Fengbin Tu, Xiudong Li, Shixuan Zheng, Tianyi Lu, Jiangyuan Gu, Leibo Liu, Shaojun Wei:
A High Energy Efficient Reconfigurable Hybrid Neural Network Processor for Deep Learning Applications. IEEE J. Solid State Circuits 53(4): 968-982 (2018) - [j99]Shouyi Yin, Zhicong Xie, Chenyue Meng, Peng Ouyang, Leibo Liu, Shaojun Wei:
Memory Partitioning for Parallel Multipattern Data Access in Multiple Data Arrays. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 37(2): 431-444 (2018) - [j98]Leibo Liu, Zhuoquan Zhou, Shaojun Wei, Min Zhu, Shouyi Yin, Shengyang Mao:
DRMaSV: Enhanced Capability Against Hardware Trojans in Coarse Grained Reconfigurable Architectures. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 37(4): 782-795 (2018) - [j97]Leibo Liu, Chen Yang, Shouyi Yin, Shaojun Wei:
CDPM: Context-Directed Pattern Matching Prefetching to Improve Coarse-Grained Reconfigurable Array Performance. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 37(6): 1171-1184 (2018) - [j96]Jiale Yan, Shouyi Yin, Fengbin Tu, Leibo Liu, Shaojun Wei:
GNA: Reconfigurable and Efficient Architecture for Generative Network Acceleration. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 37(11): 2519-2529 (2018) - [j95]Leibo Liu, Bo Wang, Chenchen Deng, Min Zhu, Shouyi Yin, Shaojun Wei:
Anole: A Highly Efficient Dynamically Reconfigurable Crypto-Processor for Symmetric-Key Algorithms. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 37(12): 3081-3094 (2018) - [j94]Leibo Liu, Zhaoshi Li, Chen Yang, Chenchen Deng, Shouyi Yin, Shaojun Wei:
HReA: An Energy-Efficient Embedded Dynamically Reconfigurable Fabric for 13-Dwarfs Processing. IEEE Trans. Circuits Syst. II Express Briefs 65-II(3): 381-385 (2018) - [j93]Guiqiang Peng, Leibo Liu, Sheng Zhou, Shouyi Yin, Shaojun Wei:
A 1.58 Gbps/W 0.40 Gbps/mm² ASIC Implementation of MMSE Detection for 128×8 64-QAM Massive MIMO in 65 nm CMOS. IEEE Trans. Circuits Syst. I Regul. Pap. 65-I(5): 1717-1730 (2018) - [j92]Peng Ouyang, Shouyi Yin, Leibo Liu, Youguang Zhang, Weisheng Zhao, Shaojun Wei:
A Fast and Power-Efficient Hardware Architecture for Visual Feature Detection in Affine-SIFT. IEEE Trans. Circuits Syst. I Regul. Pap. 65-I(10): 3362-3375 (2018) - [j91]Jiangyuan Gu, Shouyi Yin, Leibo Liu, Shaojun Wei:
Stress-Aware Loops Mapping on CGRAs with Dynamic Multi-Map Reconfiguration. IEEE Trans. Parallel Distributed Syst. 29(9): 2105-2120 (2018) - [j90]Yanan Lu, Leibo Liu, Yangdong Deng, Jian Weng, Shouyi Yin, Yiyu Shi, Shaojun Wei:
Triggered-Issuance and Triggered-Execution: A Control Paradigm to Minimize Pipeline Stalls in Distributed Controlled Coarse-Grained Reconfigurable Arrays. IEEE Trans. Parallel Distributed Syst. 29(10): 2360-2372 (2018) - [j89]Guiqiang Peng, Leibo Liu, Sheng Zhou, Yang Xue, Shouyi Yin, Shaojun Wei:
Algorithm and Architecture of a Low-Complexity and High-Parallelism Preprocessing-Based K-Best Detector for Large-Scale MIMO Systems. IEEE Trans. Signal Process. 66(7): 1860-1875 (2018) - [j88]Shouyi Yin, Tianyi Lu, Zhicong Xie, Leibo Liu, Shaojun Wei:
Bit-Level Disturbance-Aware Memory Partitioning for Parallel Data Access for MLC STT-RAM. IEEE Trans. Very Large Scale Integr. Syst. 26(11): 2345-2357 (2018) - [c91]Hang Wang, Hongbin Sun, Xuchong Zhang, Qiubo Chen, Pengju Ren, Xiaogang Wu, Shouyi Yin, Zhiqiang Jiang, Xiang Li, Daqiang Han, Shiquan Yu, Shaojun Wei, Nanning Zheng:
A 4K×2K@60fps Multi-format Multi-function Display Processor for High Perceptual Quality. APCCAS 2018: 427-430 - [c90]Weijia Chen, Hui Wu, Shaojun Wei, Anping He, Hong Chen:
An Asynchronous Energy-Efficient CNN Accelerator with Reconfigurable Architecture. A-SSCC 2018: 51-54 - [c89]Guiqiang Peng, Leibo Liu, Qiushi Wei, Yao Wang, Shouyi Yin, Shaojun Wei:
A 2.69 Mbps/mW 1.09 Mbps/kGE Conjugate Gradient-based MMSE Detector for 64-QAM 128×8 Massive MIMO Systems. A-SSCC 2018: 191-194 - [c88]Hang Yuan, Leibo Liu, Hui Li, Shouyi Yin, Shaojun Wei:
A Full Multicast Reconfigurable Non-blocking Permutation Network. CyberC 2018 - [c87]Xinhan Lin, Shouyi Yin, Fengbin Tu, Leibo Liu, Xiangyu Li, Shaojun Wei:
LCP: a layer clusters paralleling mapping method for accelerating inception and residual networks on FPGA. DAC 2018: 16:1-16:6 - [c86]Shixuan Zheng, Yonggang Liu, Shouyi Yin, Leibo Liu, Shaojun Wei:
An efficient kernel transformation architecture for binary- and ternary-weight neural network inference. DAC 2018: 137:1-137:6 - [c85]Ruofei Hu, Binren Tian, Shouyi Yin, Shaojun Wei:
Efficient Hardware Architecture of Softmax Layer in Deep Neural Network. DSP 2018: 1-5 - [c84]Fengbin Tu, Weiwei Wu, Shouyi Yin, Leibo Liu, Shaojun Wei:
RANA: Towards Efficient Neural Acceleration with Refresh-Optimized Embedded DRAM. ISCA 2018: 340-352 - [c83]Jianxin Guo, Shouyi Yin, Peng Ouyang, Fengbin Tu, Shibin Tang, Leibo Liu, Shaojun Wei:
Bit-width Adaptive Accelerator Design for Convolution Neural Network. ISCAS 2018: 1-5 - [c82]Zhihui Wang, Shouyi Yin, Fengbin Tu, Leibo Liu, Shaojun Wei:
An Energy Efficient JPEG Encoder with Neural Network Based Approximation and Near-Threshold Computing. ISCAS 2018: 1-5 - [c81]Shouyi Yin, Peng Ouyang, Jianxun Yang, Tianyi Lu, Xiudong Li, Leibo Liu, Shaojun Wei:
An Ultra-High Energy-Efficient Reconfigurable Processor for Deep Neural Networks with Binary/Ternary Weights in 28nm CMOS. VLSI Circuits 2018: 37-38 - [c80]Shouyi Yin, Peng Ouyang, Shixuan Zheng, Dandan Song, Xiudong Li, Leibo Liu, Shaojun Wei:
A 141 µW, 2.46 pJ/Neuron Binarized Convolutional Neural Network Based Self-Learning Speech Recognition Processor in 28nm CMOS. VLSI Circuits 2018: 139-140 - 2017
- [j87]Weizhi Xu, Shouyi Yin, Zhen Zhang, Hao Dong, Rui Shi, Leibo Liu, Shaojun Wei:
Reconfigurable VLSI Architecture for Real-Time 2D-to-3D Conversion. IEEE Access 5: 26604-26613 (2017) - [j86]Leibo Liu, Yingjie Chen, Chenchen Deng, Shouyi Yin, Shaojun Wei:
Implementation of in-loop filter for HEVC decoder on reconfigurable processor. IET Image Process. 11(9): 685-692 (2017) - [j85]Shouyi Yin, Peng Ouyang, Xu Dai, Leibo Liu, Shaojun Wei:
An AdaBoost-Based Face Detection System Using Parallel Configurable Architecture With Optimized Computation. IEEE Syst. J. 11(1): 260-271 (2017) - [j84]Chenchen Deng, Leibo Liu, Yang Liu, Shouyi Yin, Shaojun Wei:
PMCC: Fast and Accurate System-Level Power Modeling for Processors on Heterogeneous SoC. IEEE Trans. Circuits Syst. II Express Briefs 64-II(5): 540-544 (2017) - [j83]Bo Wang, Leibo Liu, Chenchen Deng, Min Zhu, Shouyi Yin, Zhuoquan Zhou, Shaojun Wei:
Exploration of Benes Network in Cryptographic Processors: A Random Infection Countermeasure for Block Ciphers Against Fault Attacks. IEEE Trans. Inf. Forensics Secur. 12(2): 309-322 (2017) - [j82]Chen Yang, Leibo Liu, Kai Luo, Shouyi Yin, Shaojun Wei:
CIACP: A Correlation- and Iteration-Aware Cache Partitioning Mechanism to Improve Performance of Multiple Coarse-Grained Reconfigurable Arrays. IEEE Trans. Parallel Distributed Syst. 28(1): 29-43 (2017) - [j81]Chen Wu, Chenchen Deng, Leibo Liu, Jie Han, Jiqiang Chen, Shouyi Yin, Shaojun Wei:
A Multi-Objective Model Oriented Mapping Approach for NoC-based Computing Systems. IEEE Trans. Parallel Distributed Syst. 28(3): 662-676 (2017) - [j80]Shouyi Yin, Xianqing Yao, Tianyi Lu, Dajiang Liu, Jiangyuan Gu, Leibo Liu, Shaojun Wei:
Conflict-Free Loop Mapping for Coarse-Grained Reconfigurable Architecture with Multi-Bank Memory. IEEE Trans. Parallel Distributed Syst. 28(9): 2471-2485 (2017) - [j79]Guiqiang Peng, Leibo Liu, Peng Zhang, Shouyi Yin, Shaojun Wei:
Low-Computing-Load, High-Parallelism Detection Method Based on Chebyshev Iteration for Massive MIMO Systems With VLSI Architecture. IEEE Trans. Signal Process. 65(14): 3775-3788 (2017) - [j78]Fengbin Tu, Shouyi Yin, Peng Ouyang, Shibin Tang, Leibo Liu, Shaojun Wei:
Deep Convolutional Neural Network Architecture With Reconfigurable Computation Patterns. IEEE Trans. Very Large Scale Integr. Syst. 25(8): 2220-2233 (2017) - [c79]Jiangyuan Gu, Shouyi Yin, Leibo Liu, Shaojun Wei:
Energy-aware loops mapping on multi-vdd CGRAs without performance degradation. ASP-DAC 2017: 312-317 - [c78]Jiangyuan Gu, Shouyi Yin, Shaojun Wei:
Stress-Aware Loops Mapping on CGRAs with Considering NBTI Aging Effect. DAC 2017: 40:1-40:6 - [c77]Qiang Wang, Leibo Liu, Wenping Zhu, Huiyu Mo, Chenchen Deng, Shaojun Wei:
A 700fps Optimized Coarse-to-Fine Shape Searching Based Hardware Accelerator for Face Alignment. DAC 2017: 57:1-57:6 - [c76]Peng Ouyang, Shouyi Yin, Shaojun Wei:
A Fast and Power Efficient Architecture to Parallelize LSTM based RNN for Cognitive Intelligence Applications. DAC 2017: 63:1-63:6 - [c75]Yanan Lu, Leibo Liu, Yangdong Deng, Jian Weng, Zhaoshi Li, Chenchen Deng, Shaojun Wei:
Minimizing Pipeline Stalls in Distributed-Controlled Coarse-Grained Reconfigurable Arrays with Triggered Instruction Issue and Execution. DAC 2017: 71:1-71:6 - [c74]Shouyi Yin, Zhicong Xie, Shaojun Wei:
Disturbance Aware Memory Partitioning for Parallel Data Access in STT-RAM. DAC 2017: 84:1-84:6 - [c73]Jianxin Guo, Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei:
Bit-Width Based Resource Partitioning for CNN Acceleration on FPGA. FCCM 2017: 31 - [c72]Tianyi Lu, Shouyi Yin, Xianqing Yao, Zhicong Xie, Leibo Liu, Shaojun Wei:
Joint Modulo Scheduling and Memory Partitioning with Multi-Bank Memory for High-Level Synthesis (Abstract Only). FPGA 2017: 290 - [c71]Shouyi Yin, Dajiang Liu, Lifeng Sun, Xinhan Lin, Leibo Liu, Shaojun Wei:
Learning Convolutional Neural Networks for Data-Flow Graph Mapping on Spatial Programmable Architectures (Abstract Only). FPGA 2017: 295 - [c70]Peng Ouyang, Shouyi Yin, Chunxiao Xing, Leibo Liu, Shaojun Wei:
A Power Efficient Architecture with Optimized Parallel Memory Accessing for Feature Generation. ACM Great Lakes Symposium on VLSI 2017: 287-292 - [c69]Zhaoshi Li, Leibo Liu, Yangdong Deng, Shouyi Yin, Yao Wang, Shaojun Wei:
Aggressive Pipelining of Irregular Applications on Reconfigurable Hardware. ISCA 2017: 575-586 - [c68]Tianyi Lu, Shouyi Yin, Xianqing Yao, Zhicong Xie, Leibo Liu, Shaojun Wei:
Memory partitioning-based modulo scheduling for high-level synthesis. ISCAS 2017: 1-4 - [c67]Shouyi Yin, Dajiang Liu, Lifeng Sun, Leibo Liu, Shaojun Wei:
DFGNet: Mapping dataflow graph onto CGRA by a deep learning approach. ISCAS 2017: 1-4 - [c66]Shibin Tang, Shouyi Yin, Shixuan Zheng, Peng Ouyang, Fengbin Tu, Leiyue Yao, JinZhou Wu, Wenming Cheng, Leibo Liu, Shaojun Wei:
AEPE: An area and power efficient RRAM crossbar-based accelerator for deep CNNs. NVMSA 2017: 1-6 - [c65]Shouyi Yin, Jinjin Duan, Peng Ouyang, Leibo Liu, Shaojun Wei:
Multi-CNN and decision tree based driving behavior evaluation. SAC 2017: 1424-1429 - 2016
- [j77]Shuang Liang, Shouyi Yin, Leibo Liu, Yike Guo, Shaojun Wei:
A Coarse-Grained Reconfigurable Architecture for Compute-Intensive MapReduce Acceleration. IEEE Comput. Archit. Lett. 15(2): 69-72 (2016) - [j76]Peng Ouyang, Shouyi Yin, Chenchen Deng, Leibo Liu, Shaojun Wei:
A fast face detection architecture for auto-focus in smart-phones and digital cameras. Sci. China Inf. Sci. 59(12): 122402:1-122402:13 (2016) - [j75]Leibo Liu, Dong Wang, Yingjie Chen, Min Zhu, Shouyi Yin, Shaojun Wei:
An Implementation of Multiple-Standard Video Decoder on a Mixed-Grained Reconfigurable Computing Platform. IEICE Trans. Inf. Syst. 99-D(5): 1285-1295 (2016) - [j74]Shan Cao, Zoran Salcic, Zhaolin Li, Shaojun Wei, Yingtao Ding:
Temperature-aware multi-application mapping on network-on-chip based many-core systems. Microprocess. Microsystems 46: 149-160 (2016) - [j73]Mingyu Wang, Fang Wang, Shaojun Wei, Zhaolin Li:
A pipelined area-efficient and high-speed reconfigurable processor for floating-point FFT/IFFT and DCT/IDCT computations. Microelectron. J. 47: 19-30 (2016) - [j72]Shouyi Yin, Jiangyuan Gu, Dajiang Liu, Leibo Liu, Shaojun Wei:
Joint Modulo Scheduling and Vdd Assignment for Loop Mapping on Dual-Vdd CGRAs. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 35(9): 1475-1488 (2016) - [j71]Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei:
A Fast and Power-Efficient Memory-Centric Architecture for Affine Computation. IEEE Trans. Circuits Syst. II Express Briefs 63-II(7): 668-672 (2016) - [j70]Wenping Zhu, Leibo Liu, Guangli Jiang, Shouyi Yin, Shaojun Wei:
A 135-frames/s 1080p 87.5-mW Binary-Descriptor-Based Image Feature Extraction Accelerator. IEEE Trans. Circuits Syst. Video Technol. 26(8): 1532-1543 (2016) - [j69]Bo Wang, Leibo Liu, Chenchen Deng, Min Zhu, Shouyi Yin, Shaojun Wei:
Against Double Fault Attacks: Injection Effort Model, Space and Time Randomization Based Countermeasures for Reconfigurable Array Architecture. IEEE Trans. Inf. Forensics Secur. 11(6): 1151-1164 (2016) - [j68]Leibo Liu, Junbin Wang, Jianfeng Zhu, Chenchen Deng, Shouyi Yin, Shaojun Wei:
TLIA: Efficient Reconfigurable Architecture for Control-Intensive Kernels with Triggered-Long-Instructions. IEEE Trans. Parallel Distributed Syst. 27(7): 2143-2154 (2016) - [j67]Shouyi Yin, Xinhan Lin, Leibo Liu, Shaojun Wei:
Exploiting Parallelism of Imperfect Nested Loops on Coarse-Grained Reconfigurable Architectures. IEEE Trans. Parallel Distributed Syst. 27(11): 3199-3213 (2016) - [j66]Shouyi Yin, Dajiang Liu, Yu Peng, Leibo Liu, Shaojun Wei:
Improving Nested Loop Pipelining on Coarse-Grained Reconfigurable Architectures. IEEE Trans. Very Large Scale Integr. Syst. 24(2): 507-520 (2016) - [j65]Shouyi Yin, Peng Ouyang, Tianbao Chen, Leibo Liu, Shaojun Wei:
A Configurable Parallel Hardware Architecture for Efficient Integral Histogram Image Computing. IEEE Trans. Very Large Scale Integr. Syst. 24(4): 1305-1318 (2016) - [j64]Shouyi Yin, Xianqing Yao, Dajiang Liu, Leibo Liu, Shaojun Wei:
Memory-Aware Loop Mapping on Coarse-Grained Reconfigurable Architectures. IEEE Trans. Very Large Scale Integr. Syst. 24(5): 1895-1908 (2016) - [j63]Shouyi Yin, Pengcheng Zhou, Leibo Liu, Shaojun Wei:
Trigger-Centric Loop Mapping on CGRAs. IEEE Trans. Very Large Scale Integr. Syst. 24(5): 1998-2002 (2016) - [j62]Shouyi Yin, Weizhi Xu, Jiakun Li, Leibo Liu, Shaojun Wei:
CWFP: Novel Collective Writeback and Fill Policy for Last-Level DRAM Cache. IEEE Trans. Very Large Scale Integr. Syst. 24(7): 2548-2561 (2016) - [c64]Xinhan Lin, Shouyi Yin, Leibo Liu, Shaojun Wei:
Exploiting parallelism of imperfect nested loops with sibling inner loops on coarse-grained reconfigurable architectures. ASP-DAC 2016: 456-461 - [c63]Chen Yang, Leibo Liu, Shouyi Yin, Shaojun Wei:
Data cache prefetching via context directed pattern matching for coarse-grained reconfigurable arrays. DAC 2016: 64:1-64:6 - [c62]Shouyi Yin, Zhicong Xie, Chenyue Meng, Leibo Liu, Shaojun Wei:
Multibank memory optimization for parallel data access in multiple data arrays. ICCAD 2016: 32 - [c61]Shouyi Yin, Xianqing Yao, Tianyi Lu, Leibo Liu, Shaojun Wei:
Joint loop mapping and data placement for coarse-grained reconfigurable architecture with multi-bank memory. ICCAD 2016: 127 - [c60]Shan Cao, Zoran Salcic, Yingtao Ding, Zhaolin Li, Shaojun Wei, Xianli Zhao:
Temperature-aware task scheduling heuristics on Network-on-Chips. ISCAS 2016: 2603-2606 - [c59]Peng Ouyang, Shouyi Yin, Chunxiao Xing, Leibo Liu, Shaojun Wei:
Energy management on DVS based coarse-grained reconfigurable platform. NANOARCH 2016: 49-54 - 2015
- [j61]Chen Wu, Chenchen Deng, Leibo Liu, Shouyi Yin, Jie Han, Shaojun Wei:
Reliability-aware mapping for various NoC topologies and routing algorithms under performance constraints. Sci. China Inf. Sci. 58(8): 1-14 (2015) - [j60]Guoyue Jiang, Zhaolin Li, Fang Wang, Shaojun Wei:
Mapping of Embedded Applications on Hybrid Networks-on-Chip with Multiple Switching Mechanisms. IEEE Embed. Syst. Lett. 7(2): 59-62 (2015) - [j59]Chaoyun Yao, Chaochao Feng, Minxuan Zhang, Wei Guo, Shouzhong Zhu, Shaojun Wei:
Exploring partitioning methods for multicast in 3D bufferless Network on Chip. IEICE Electron. Express 12(22): 20150802 (2015) - [j58]Yu Peng, Shouyi Yin, Leibo Liu, Shaojun Wei:
Battery-Aware Loop Nests Mapping for CGRAs. IEICE Trans. Inf. Syst. 98-D(2): 230-242 (2015) - [j57]Bing Xu, Shouyi Yin, Leibo Liu, Shaojun Wei:
Low-Power Loop Parallelization onto CGRA Utilizing Variable Dual VDD. IEICE Trans. Inf. Syst. 98-D(2): 243-251 (2015) - [j56]Rui Shi, Shouyi Yin, Leibo Liu, Qiongbing Liu, Shuang Liang, Shaojun Wei:
The Implementation of Texture-Based Video Up-Scaling on Coarse-Grained Reconfigurable Architecture. IEICE Trans. Inf. Syst. 98-D(2): 276-287 (2015) - [j55]Dajiang Liu, Shouyi Yin, Leibo Liu, Shaojun Wei:
Mapping Multi-Level Loop Nests onto CGRAs Using Polyhedral Optimizations. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 98-A(7): 1419-1430 (2015) - [j54]Chen Yang, Leibo Liu, Yansheng Wang, Shouyi Yin, Peng Cao, Shaojun Wei:
Configuration Approaches to Enhance Computing Efficiency of Coarse-Grained Reconfigurable Array. J. Circuits Syst. Comput. 24(3): 1550043:1-1550043:21 (2015) - [j53]Shouyi Yin, Peng Ouyang, Leibo Liu, Yike Guo, Shaojun Wei:
Fast Traffic Sign Recognition with a Rotation Invariant Binary Pattern Based Feature. Sensors 15(1): 2161-2180 (2015) - [j52]Shouyi Yin, Hao Dong, Guangli Jiang, Leibo Liu, Shaojun Wei:
A Novel 2D-to-3D Video Conversion Method Using Time-Coherent Depth Maps. Sensors 15(7): 15246-15264 (2015) - [j51]Weizhi Xu, Shouyi Yin, Leibo Liu, Zhiyong Liu, Shaojun Wei:
High-Performance Motion Estimation for Image Sensors with Video Compression. Sensors 15(8): 20752-20778 (2015) - [j50]Guangli Jiang, Leibo Liu, Wenping Zhu, Shouyi Yin, Shaojun Wei:
A 181 GOPS AKAZE Accelerator Employing Discrete-Time Cellular Neural Networks for Real-Time Feature Extraction. Sensors 15(9): 22509-22529 (2015) - [j49]Chen Wu, Chenchen Deng, Leibo Liu, Jie Han, Jiqiang Chen, Shouyi Yin, Shaojun Wei:
An Efficient Application Mapping Approach for the Co-Optimization of Reliability, Energy, and Performance in Reconfigurable NoC Architectures. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 34(8): 1264-1277 (2015) - [j48]Peng Ouyang, Shouyi Yin, Yuchi Zhang, Leibo Liu, Shaojun Wei:
A Fast Integral Image Computing Hardware Architecture With High Power and Area Efficiency. IEEE Trans. Circuits Syst. II Express Briefs 62-II(1): 75-79 (2015) - [j47]Zhen Zhang, Shouyi Yin, Leibo Liu, Shaojun Wei:
A real-time time-consistent 2D-to-3D video conversion system using color histogram. IEEE Trans. Consumer Electron. 61(4): 524-530 (2015) - [j46]Leibo Liu, Dong Wang, Min Zhu, Yansheng Wang, Shouyi Yin, Peng Cao, Jun Yang, Shaojun Wei:
An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding. IEEE Trans. Multim. 17(10): 1706-1720 (2015) - [j45]Leibo Liu, Dong Wang, Min Zhu, Yansheng Wang, Shouyi Yin, Peng Cao, Jun Yang, Shaojun Wei:
Correction to "An Energy-Efficient Coarse-Grained Reconfigurable Processing Unit for Multiple-Standard Video Decoding". IEEE Trans. Multim. 17(12): 2354-2355 (2015) - [j44]Yu Ren, Leibo Liu, Shouyi Yin, Jie Han, Shaojun Wei:
Efficient Fault-Tolerant Topology Reconfiguration Using a Maximum Flow Algorithm. ACM Trans. Reconfigurable Technol. Syst. 8(3): 19:1-19:24 (2015) - [j43]Guoyue Jiang, Zhaolin Li, Fang Wang, Shaojun Wei:
A Low-Latency and Low-Power Hybrid Scheme for On-Chip Networks. IEEE Trans. Very Large Scale Integr. Syst. 23(4): 664-677 (2015) - [j42]Jianfeng Zhu, Leibo Liu, Shouyi Yin, Xiao Yang, Shaojun Wei:
A Hybrid Reconfigurable Architecture and Design Methods Aiming at Control-Intensive Kernels. IEEE Trans. Very Large Scale Integr. Syst. 23(9): 1700-1709 (2015) - [j41]Leibo Liu, Chen Wu, Chenchen Deng, Shouyi Yin, Qinghua Wu, Jie Han, Shaojun Wei:
A Flexible Energy- and Reliability-Aware Application Mapping for NoC-Based Reconfigurable Architectures. IEEE Trans. Very Large Scale Integr. Syst. 23(11): 2566-2580 (2015) - [j40]Dajiang Liu, Shouyi Yin, Yu Peng, Leibo Liu, Shaojun Wei:
Optimizing Spatial Mapping of Nested Loop for Coarse-Grained Reconfigurable Architectures. IEEE Trans. Very Large Scale Integr. Syst. 23(11): 2581-2594 (2015) - [j39]Peng Ouyang, Shouyi Yin, Leibo Liu, Shaojun Wei:
Energy Management on Battery-Powered Coarse-Grained Reconfigurable Platforms. IEEE Trans. Very Large Scale Integr. Syst. 23(12): 3085-3098 (2015) - [c58]Leibo Liu, Yu Ren, Chenchen Deng, Shouyi Yin, Shaojun Wei, Jie Han:
A novel approach using a minimum cost maximum flow algorithm for fault-tolerant topology reconfiguration in NoC architectures. ASP-DAC 2015: 48-53 - [c57]Yu Peng, Shouyi Yin, Leibo Liu, Shaojun Wei:
Battery-aware mapping optimization of loop nests for CGRAs. ASP-DAC 2015: 767-772 - [c56]Guoyue Jiang, Zhaolin Li, Fang Wang, Shaojun Wei:
Scheduling stream programs with improving arithmetic unit usage on NoC-based VLIW multi-core architectures. Conf. Computing Frontiers 2015: 18:1-18:8 - [c55]Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei:
A 83fps 1080P resolution 354 mW silicon implementation for computing the improved robust feature in affine space. CICC 2015: 1-4 - [c54]Junbin Wang, Leibo Liu, Jianfeng Zhu, Shouyi Yin, Shaojun Wei:
Acceleration of control flows on reconfigurable architecture with a composite method. DAC 2015: 45:1-45:6 - [c53]Guangli Jiang, Leibo Liu, Wenping Zhu, Shouyi Yin, Shaojun Wei:
A 127 fps in full hd accelerator based on optimized AKAZE with efficiency and effectiveness for image feature extraction. DAC 2015: 87:1-87:6 - [c52]Chenyue Meng, Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei:
Efficient memory partitioning for parallel data access in multidimensional arrays. DAC 2015: 160:1-160:6 - [c51]Shouyi Yin, Dajiang Liu, Leibo Liu, Shaojun Wei, Yike Guo:
Joint affine transformation and loop pipelining for mapping nested loop on CGRAs. DATE 2015: 115-120 - [c50]Shouyi Yin, Jiakun Li, Leibo Liu, Shaojun Wei, Yike Guo:
Cooperatively managing dynamic writeback and insertion policies in a last-level DRAM cache. DATE 2015: 187-192 - [c49]Fengbin Tu, Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei:
RNA: a reconfigurable architecture for hardware neural acceleration. DATE 2015: 695-700 - [c48]Chen Yang, Leibo Liu, Shouyi Yin, Shaojun Wei:
Cost-Effective Memory Architecture to Achieve Flexible Configuration and Efficient Data Transmission for Coarse-Grained Reconfigurable Array (Abstract Only). FPGA 2015: 263 - [c47]Leibo Liu, Yingjie Victor Chen, Dong Wang, Min Zhu, Shouyi Yin, Shaojun Wei:
A Mixed-Grained Reconfigurable Computing Platform for Multiple-Standard Video Decoding (Abstract Only). FPGA 2015: 267 - [c46]Junbin Wang, Leibo Liu, Jianfeng Zhu, Shouyi Yin, Shaojun Wei:
A Novel Composite Method to Accelerate Control Flow on Reconfigurable Architecture (Abstract Only). FPGA 2015: 270 - [c45]Shouyi Yin, Pengcheng Zhou, Leibo Liu, Shaojun Wei:
Acceleration of Nested Conditionals on CGRAs via Trigger Scheme. ICCAD 2015: 597-604 - [c44]Hao Dong, Shouyi Yin, Guangli Jiang, Leibo Liu, Shaojun Wei:
An automatic depth map generation method by image classification. ICCE 2015: 168-169 - [c43]Zhen Zhang, Shouyi Yin, Leibo Liu, Shaojun Wei:
Real-time time-consistent 2D-to-3D video conversion based on color histogram. ICCE 2015: 188-189 - [c42]Tao Tan, Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei:
Efficient lane detection system based on monocular camera. ICCE 2015: 202-203 - [c41]Fengbin Tu, Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei:
Neural approximating architecture targeting multiple application domains. ISCAS 2015: 2509-2512 - [c40]Chaoyun Yao, Chaochao Feng, Minxuan Zhang, Wei Guo, Shouzhong Zhu, Shaojun Wei:
Partitioning Methods for Multicast in Bufferless 3D Network on Chip. NCCET 2015: 13-22 - [c39]Xu Dai, Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei:
A Multi-modal 2D + 3D Face Recognition Method with a Novel Local Feature Descriptor. WACV 2015: 657-662
- 2014
- [j38]Shanshan Cai, Leibo Liu, Shouyi Yin, Renyan Zhou, Weilong Zhang, Shaojun Wei:
Optimization of speeded-up robust feature algorithm for hardware implementation. Sci. China Inf. Sci. 57(4): 1-15 (2014) - [j37]Leibo Liu, Yingjie Victor Chen, Dong Wang, Shouyi Yin, Xing Wang, Long Wang, Hao Lei, Peng Cao, Shaojun Wei:
Implementation of multi-standard video decoder on a heterogeneous coarse-grained reconfigurable processor. Sci. China Inf. Sci. 57(8): 1-14 (2014) - [j36]Leibo Liu, Yingjie Victor Chen, Shouyi Yin, Li Zhou, Hang Yuan, Shaojun Wei:
Implementation of AVS Jizhun decoder with HW/SW partitioning on a coarse-grained reconfigurable multimedia system. Sci. China Inf. Sci. 57(8): 1-14 (2014) - [j35]Leibo Liu, Yansheng Wang, Shouyi Yin, Min Zhu, Xing Wang, Shaojun Wei:
Row-based configuration mechanism for a 2-D processing element array in coarse-grained reconfigurable architecture. Sci. China Inf. Sci. 57(10): 1-18 (2014) - [j34]Shouyi Yin, Shengjia Shao, Leibo Liu, Shaojun Wei:
MapReduce inspired loop mapping for coarse-grained reconfigurable architecture. Sci. China Inf. Sci. 57(12): 1-14 (2014) - [j33]Hongyin Luo, Shaojun Wei, Deming Chen, Donghui Guo:
Hybrid circuit-switched network for on-chip communication in large-scale chip-multiprocessors. J. Parallel Distributed Comput. 74(9): 2818-2830 (2014) - [j32]Ruoyu Xu, Wai Chiu Ng, George Jie Yuan, Shouyi Yin, Shaojun Wei:
A 1/2.5 inch VGA 400 fps CMOS Image Sensor With High Sensitivity for Machine Vision. IEEE J. Solid State Circuits 49(10): 2342-2351 (2014) - [j31]Shouyi Yin, Xu Dai, Peng Ouyang, Leibo Liu, Shaojun Wei:
A Multi-Modal Face Recognition Method Using Complete Local Derivative Patterns and Depth Maps. Sensors 14(10): 19561-19581 (2014) - [j30]Guoyue Jiang, Zhaolin Li, Fang Wang, Shaojun Wei:
A High-Utilization Scheduling Scheme of Stream Programs on Clustered VLIW Stream Architectures. IEEE Trans. Parallel Distributed Syst. 25(4): 840-850 (2014) - [j29]Yansheng Wang, Leibo Liu, Shouyi Yin, Min Zhu, Peng Cao, Jun Yang, Shaojun Wei:
On-Chip Memory Hierarchy in One Coarse-Grained Reconfigurable Architecture to Compress Memory Space and to Reduce Reconfiguration Time and Data-Reference Time. IEEE Trans. Very Large Scale Integr. Syst. 22(5): 983-994 (2014) - [j28]Yuan Li, Paul Chow, Jiang Jiang, Minxuan Zhang, Shaojun Wei:
Software/Hardware Parallel Long-Period Random Number Generation Framework Based on the WELL Method. IEEE Trans. Very Large Scale Integr. Syst. 22(5): 1054-1059 (2014) - [j27]Shan Cao, Zhaolin Li, Fang Wang, Shaojun Wei:
Compiler-Assisted Leakage- and Temperature- Aware Instruction-Level VLIW Scheduling. IEEE Trans. Very Large Scale Integr. Syst. 22(6): 1416-1428 (2014) - [j26]Leibo Liu, Dong Wang, Shouyi Yin, Yingjie Victor Chen, Min Zhu, Shaojun Wei:
SimRPU: A Simulation Environment for Reconfigurable Architecture Exploration. IEEE Trans. Very Large Scale Integr. Syst. 22(12): 2635-2648 (2014) - [c38]Shouyi Yin, Peng Ouyang, Leibo Liu, Shaojun Wei:
Extending lifetime of battery-powered coarse-grained reconfigurable computing platforms. DATE 2014: 1-6 - [c37]Dajiang Liu, Shouyi Yin, Leibo Liu, Shaojun Wei:
Exploiting Outer Loop Parallelism of Nested Loop on Coarse-Grained Reconfigurable Architectures. FCCM 2014: 32 - [c36]Chenchen Deng, Leibo Liu, Zhaoshi Li, Shouyi Yin, Shaojun Wei:
Teach Reconfigurable Computing using mixed-grained fabrics based hardware infrastructure. FIE 2014: 1-9 - [c35]Chen Yang, Leibo Liu, Yansheng Wang, Shouyi Yin, Peng Cao, Shaojun Wei:
Configuration approaches to improve computing efficiency of coarse-grained reconfigurable multimedia processor. FPL 2014: 1-4 - [c34]