Abstract
Mainstream processors implement the instruction scheduler as a monolithic CAM-based issue queue (IQ), which consumes increasingly more energy as its size scales. Its instruction wakeup logic in particular accounts for a major portion of that energy. Our study shows that instructions with two non-ready operands (called 2OP instructions) make up only a small percentage of instructions but tend to spend a long time in the IQ. They can be shelved effectively in a small RAM-based waiting instruction buffer (WIB) and steered into the IQ at the appropriate time. With this two-level shelving, half of the CAM tag comparators in the IQ are eliminated, which significantly reduces the energy of the wakeup operation. In addition, we propose an adaptive banking scheme to downsize the IQ and reduce the bit-width of the tag comparators. Experiments indicate that for an 8-wide issue superscalar or SMT processor, the energy consumption of the instruction scheduler can be reduced by 67%. Furthermore, the new design potentially allows a faster scheduler clock while maintaining IPC close to that of the monolithic scheduler design. Compared with previous work on eliminating tags through prediction, our design is superior in both energy reduction and SMT support.
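The two-level shelving idea in the abstract can be illustrated with a small behavioral sketch. This is not the authors' design: the class `TwoLevelScheduler`, its method names, and the rule used to move instructions out of the WIB (steer as soon as at most one operand is still pending) are assumptions made for illustration only. The abstract says only that 2OP instructions are steered "at the appropriate time". The linear scan over the WIB is a modelling shortcut and says nothing about how a RAM-based buffer would be implemented in hardware.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    srcs: tuple  # source operand tags, e.g. ("r1", "r2")


class TwoLevelScheduler:
    """Toy model (hypothetical): 2OP instructions wait in a WIB, others in the IQ."""

    def __init__(self, ready_tags):
        self.ready = set(ready_tags)  # operand tags whose values are available
        self.wib = []                 # instructions with two non-ready operands
        self.iq = []                  # (instr, pending_tag or None): one tag per entry

    def _pending(self, instr):
        # Distinct source tags that are not yet ready.
        return sorted({t for t in instr.srcs if t not in self.ready})

    def _place(self, instr):
        pending = self._pending(instr)
        if len(pending) >= 2:
            self.wib.append(instr)    # 2OP: shelve outside the IQ
        else:
            # At most one pending operand, so one tag comparator suffices.
            self.iq.append((instr, pending[0] if pending else None))

    def dispatch(self, instr):
        self._place(instr)

    def broadcast(self, tag):
        """A result tag becomes ready: wake IQ entries, then re-steer the WIB."""
        self.ready.add(tag)
        self.iq = [(i, None if p == tag else p) for i, p in self.iq]
        waiting, self.wib = self.wib, []
        for instr in waiting:         # assumed steering rule, see lead-in
            self._place(instr)

    def issue(self):
        """Remove and return all IQ entries with no pending operand."""
        ready_now = [i for i, p in self.iq if p is None]
        self.iq = [(i, p) for i, p in self.iq if p is not None]
        return ready_now


sched = TwoLevelScheduler(ready_tags={"r1"})
sched.dispatch(Instr("add", ("r1", "r2")))  # one pending operand: goes to the IQ
sched.dispatch(Instr("mul", ("r2", "r3")))  # two pending operands: goes to the WIB
sched.broadcast("r2")                       # wakes add, moves mul into the IQ
first = [i.name for i in sched.issue()]
sched.broadcast("r3")
second = [i.name for i in sched.issue()]
```

Each IQ entry in this sketch tracks a single pending tag, which mirrors the abstract's claim that half of the tag comparators can be removed once 2OP instructions are held elsewhere.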
Additional information
Supported by the National High Technology Development 863 Program of China under Grant No. 2004AA1Z1010.
About this article
Cite this article
Zhao, YL., Li, XF., Tong, D. et al. An Energy-Efficient Instruction Scheduler Design with Two-Level Shelving and Adaptive Banking. J Comput Sci Technol 22, 15–24 (2007). https://doi.org/10.1007/s11390-007-9001-2
DOI: https://doi.org/10.1007/s11390-007-9001-2