A High Efficient On-Chip Interconnection Network in SIMD CMPs

Wu, Dan; Dai, Kui; Zou, Xuecheng; Rao, Jinli; Chen, Pan

doi:10.1007/978-3-642-13119-6_13

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 6081)

Included in the following conference series:

International Conference on Algorithms and Architectures for Parallel Processing

Abstract

To improve the performance of on-chip data communication in SIMD (Single Instruction Multiple Data) architectures, we propose an efficient and modular interconnection architecture called the Broadcast and Permutation Mesh network (BP-Mesh). The BP-Mesh architecture offers low complexity and high bandwidth as well as good flexibility and scalability. The paper discusses the hardware implementation in detail and evaluates the proposed architecture in terms of area cost and performance.

Author information

Authors and Affiliations

Department of Electronic Science and Technology, Huazhong University of Science and Technology, 1037 Luoyu Road, Wuhan, China
Dan Wu, Kui Dai, Xuecheng Zou, Jinli Rao & Pan Chen

Editor information

Editors and Affiliations

Department of Computer Science and Information Engineering, Chung Hua University, 300, Hsinchu, Taiwan, China
Ching-Hsien Hsu
Department of Computer Science, St. Francis Xavier University, B2G 2W5, Antigonish, NS, Canada
Laurence T. Yang
Department of Computer Science and Engineering, Seoul National University of Technology, 172 Gongreund 2-dong, Nowon-gou, 139-742, Seoul, Korea
Jong Hyuk Park
Division of Computer Engineering, Mokwon University, 302-729, Daejeon, Korea
Sang-Soo Yeo

About this paper

Cite this paper

Wu, D., Dai, K., Zou, X., Rao, J., Chen, P. (2010). A High Efficient On-Chip Interconnection Network in SIMD CMPs. In: Hsu, CH., Yang, L.T., Park, J.H., Yeo, SS. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2010. Lecture Notes in Computer Science, vol 6081. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13119-6_13

DOI: https://doi.org/10.1007/978-3-642-13119-6_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-13118-9
Online ISBN: 978-3-642-13119-6
eBook Packages: Computer Science, Computer Science (R0)
