# Performance Analysis of Efficient Virtual Channel Router for NoC

<sup>1</sup>Omprakash Ghorse, <sup>2</sup>Uma Shankar Kurmi, <sup>3</sup>Dharmendra Dongardiye

<sup>1</sup>M.Tech Scholar, <sup>2</sup>Assistant Professor, <sup>3</sup>Assistant Professor

<sup>1</sup>Electronics & Communication Department, <sup>1</sup>IES College of Technology, Bhopal, India

Abstract - The on-chip communication requirements of many systems are best served by a new generation of chip-wide networks. Physical on-chip interconnect is becoming a limiting factor for both performance and energy consumption, and the communication latency of the network-on-chip directly affects the performance of the system-on-chip. In this paper we introduce a router architecture that performs virtual channel allocation and switch allocation in parallel to shorten the critical path and reduce latency. Because these two stages operate in parallel, a packet can be transferred through a shorter pipeline. The experimental results show that the proposed router achieves a higher maximum operating frequency and a smaller chip area than the routers it is compared against.

Index Terms - virtual channel, speculation

### I. INTRODUCTION

With advances in high-density VLSI technology, system-on-chip (SoC) design provides a powerful and flexible way to integrate a complex system on a single chip. As integration capacity grows, SoCs become more complex, and communication among their intellectual property (IP) blocks, such as multiprocessors, I/O devices, memory blocks, and dedicated hardware, becomes more difficult [1]. A network-on-chip (NoC) is a scalable, multi-purpose communication infrastructure that supports communication between IP blocks better than a traditional bus structure.
An on-chip network does not use dedicated wires to communicate among IP blocks; instead, it transports packets over the network [2]. An NoC has three components: the network interface (NI), which packetizes and de-packetizes messages; the links, which connect routers to one another; and the router, the most important component, which forwards packets through the network according to the routing algorithm in use. The router is the backbone of the NoC, so its structure and design characteristics directly affect NoC performance. To meet the demand for high-performance interconnect in large-scale SoCs and in the chip multiprocessors expected to dominate computing, minimizing NoC communication delay has become one of the most critical design challenges for on-chip routers [2].

In general, when a packet arrives at an input port, several complex operations (routing, channel allocation, switch allocation, and switch traversal) are executed to forward it to the next router. These operations increase the communication delay of the interconnection network and may become the critical path of the system. To minimize this communication delay (latency), various router structures have been proposed [3] [4]. In this paper we propose a router architecture that performs virtual channel allocation in parallel with switch allocation, which removes the dependency between the two. As a result, the router can operate at a higher maximum frequency.

The rest of the paper is organized as follows. Section 2 examines other router architectures, namely the conventional router architecture and the virtual output queuing router architecture. The proposed efficient virtual channel router design is described in Section 3.
Experimental results are presented in Section 4 and simulation results in Section 5; Section 6 concludes the paper.

### II. RELATED WORK

In this section we examine the architecture of the conventional router and of the virtual output queuing router.

#### Conventional VC Router Architecture

The baseline is a wormhole router for a 2-D mesh topology; the router architecture can also be extended to other network topologies [5]. The router has five input ports, named East (E), West (W), North (N), South (S), and Local (L), for communicating with neighboring routers and with its dedicated IP core. Fig. 1 shows the architecture of a conventional VC router [6], which serves as the baseline in our work. The router has input buffers that store incoming packets while the route is busy, routing computation logic that selects the route along which a packet is forwarded to its destination, and a VC allocator, a switch allocator, and a crossbar switch. A packet is generally divided into a number of flits (flow control digits); the head flit contains all the necessary routing information, and the following flits carry only payload data.

Figure 1 Architecture of Conventional Virtual Channel Router

A head flit advances to the output channel through four pipeline stages: Routing Computation (RC); VC Allocation (VA), which obtains an output virtual channel; Switch Allocation (SA), which allocates a time slot on the crossbar switch and the output channel; and Switch Traversal (ST), which transfers flits through the crossbar. Once the head flit has completed route computation and VC allocation, the remaining flits have nothing to do in the RC and VA stages. However, they cannot skip these stages and advance directly to the SA stage, because they must remain in order behind the head flit.
If each pipeline stage takes one clock cycle, at least four clock cycles are required to transfer a head flit through a router. This per-router delay accumulates at every hop, progressively increasing latency across the interconnection network.

#### Virtual Output Queuing Architecture

Figure 2 Virtual output queuing scheme

Virtual output queuing is a technique for reducing the number of pipeline stages. Another low-latency router architecture uses a virtual output queuing (VOQ) scheme [9] to reduce the processing time of a packet transfer. In this design, each input port maintains a dedicated virtual channel (VC) for each output channel (single VOQ). Since each input VC is reserved for one output channel, the pipeline of a packet transfer can be shortened to two stages, switch allocation and switch traversal, which are performed speculatively in parallel.

### III. PROPOSED EFFICIENT VIRTUAL CHANNEL ROUTER DESIGN

#### Speculation

Speculation here means performing virtual channel allocation and switch allocation in parallel [1]. The router assumes that a flit will succeed in its virtual-channel allocation and lets it request crossbar switch passage at the same time, so both processes run in parallel. If the flit is granted an output virtual channel and also wins the switch arbitration, it can immediately traverse the crossbar switch and leave for the next hop.

Speculation can be understood through the example in **Fig. 3**. The router has five buffered input ports, and two of them request to transfer data to the same output port. Call the two data packets "A" and "B". First, the routing algorithm checks the destination of both packets and determines that both must be forwarded through output port "E" of the crossbar switch.

Figure 3 Example for Speculation

In the first attempt, the higher-priority packet of the two is allocated a virtual channel and, in parallel, is also granted the switch. Because of this parallel operation, that packet is routed to output port "E" of the crossbar in a single clock cycle. Suppose this packet is "B". Since packets "A" and "B" both want to be forwarded through the same output port "E", packet "A" also succeeds in virtual channel allocation but fails in switch allocation. Packet "A" therefore retries switch allocation in the following clock cycle, after packet "B" has been transferred. Speculation minimizes delay because it performs channel allocation and switch allocation in a single clock cycle. A packet that fails switch allocation is transferred in the next clock cycle; until then it is held in the buffers at the input port, so it is not lost.

#### Efficient Virtual Channel Router

In a virtual-channel router, a head flit must first reserve an output virtual channel for the packet before it can request passage through the crossbar switch and leave for the next hop. There is therefore a dependency between virtual-channel allocation and switch allocation. This serializes virtual-channel arbitration and switch allocation, significantly increasing the latency of a virtual-channel router.

Figure 4 Flow of Flit through Efficient Virtual Channel Router

In the efficient virtual channel router, however, the virtual channel allocation and switch allocation stages proceed in parallel and are followed by switch traversal, as shown in the flow diagram of **Fig. 4**. This parallel operation shortens the pipeline compared with the conventional virtual channel router. A shorter router pipeline results in lower network latency and greater throughput.
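
The "A"/"B" example above can be mimicked with a small behavioral model. The following is only a sketch under stated assumptions, not the authors' Verilog design: the names `HeadFlit`, `allocate_vcs`, `allocate_switch`, and `run` are hypothetical, the input-port indices assigned to "A" and "B" are chosen for illustration, priority is assumed to be fixed by port index, output port "E" is assumed to have two free VCs, and VCs are never released.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class HeadFlit:
    name: str                 # packet label, e.g. "A" or "B"
    in_port: int              # input port index; lower index wins (assumed fixed priority)
    out_port: str             # requested output port, e.g. "E"
    vc: Optional[int] = None  # output VC id once allocated


def allocate_vcs(flits: List[HeadFlit], free_vcs: Dict[str, List[int]]) -> None:
    """VA: hand a free output VC to every head flit that still lacks one."""
    for flit in sorted(flits, key=lambda f: f.in_port):
        if flit.vc is None and free_vcs.get(flit.out_port):
            flit.vc = free_vcs[flit.out_port].pop(0)


def allocate_switch(flits: List[HeadFlit]) -> Dict[str, HeadFlit]:
    """SA: one grant per output port, given to the highest-priority requester."""
    grants: Dict[str, HeadFlit] = {}
    for flit in sorted(flits, key=lambda f: f.in_port):
        grants.setdefault(flit.out_port, flit)
    return grants


def run(flits: List[HeadFlit], free_vcs: Dict[str, List[int]],
        max_cycles: int = 8) -> List[List[str]]:
    """Return, for each clock cycle, the names of the flits that cross the switch."""
    waiting = list(flits)
    schedule: List[List[str]] = []
    for _ in range(max_cycles):
        if not waiting:
            break
        # Speculation: the switch request is issued without waiting for the VA result.
        grants = allocate_switch(waiting)
        allocate_vcs(waiting, free_vcs)
        # A flit leaves only if it won the switch and also holds an output VC.
        sent = [f for f in grants.values() if f.vc is not None]
        sent_ids = {id(f) for f in sent}
        schedule.append([f.name for f in sent])
        waiting = [f for f in waiting if id(f) not in sent_ids]
    return schedule


packets = [HeadFlit("A", in_port=2, out_port="E"),
           HeadFlit("B", in_port=1, out_port="E")]
print(run(packets, free_vcs={"E": [0, 1]}))  # [['B'], ['A']]
```

In this model both packets obtain a VC in the first cycle, but only "B" wins the switch; "A" stays buffered and leaves in the second cycle, which matches the behavior described for Fig. 3. If no VC is free, the speculative switch grant is simply wasted for that cycle.
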

The proposed architecture has two major components: the input section and the crossbar switch. Virtual channel allocation is performed in the input section, and switch allocation is performed in the crossbar section. These two operations are performed in parallel.

#### Arbiter Design

Figure 5 Block Diagram of Arbiter

Arbiters are an important class of combinational components in on-chip networks, where arbitration occurs frequently. The arbiter controls arbitration among all the input ports and resolves contention between them. It maintains the current status of all the ports and knows which ports are free and which are busy communicating with each other. An arbiter is required to determine how resources are shared among the many requesters. Different types of arbiter, such as round-robin and matrix arbiters, are used depending on the requirements.

#### Fixed Priority Arbiter

Our arbiter scheme uses a fixed priority arbiter. Every input port has its own fixed priority level, and the arbiter grants the active request signal with the highest priority. For instance, if request 1 has the highest priority among N requests and is active, it is granted regardless of any other request signals. If request 1 is not active, the active request with the next-highest priority is granted. In other words, a lower-priority request is served only if no higher-priority request is present or the higher-priority requests have already been served. We designed the fixed priority arbiter as a finite state machine. Input port 1 is assigned the highest priority, followed by port 2, port 3, and port 4, with input port 5 having the lowest priority. If the request of port 1 is high, no other request is granted.

### IV. EXPERIMENTAL RESULT

The router core is compared with two previously implemented architectures: the first is the modular router architecture [8], and the second is the multiple-VOQ architecture [9].
All routers have the same parameters: five bidirectional ports, 16-bit data width, and a 4-flit buffer size. For the comparison with VOQ, our target FPGA device is the Xilinx Virtex-5 XC5VFX70T, with 11,200 slices and 148 blocks of Block RAM; for the comparison with the modular router architecture, the targets are the Xilinx Virtex-2 Pro 40 and Virtex-2 XC2V6000. We used Verilog HDL for the circuit design and for functional and structural simulation. The Xilinx ISE integrated tool environment, versions 14.1i and 9.1i, was used for automated logic synthesis, mapping, placement, and routing of the circuits. Tools included in this environment generate reports describing the area and speed of the implementation. Our optimization goal is higher design speed. The tables below show the number of slices, the number of LUTs, and the operating frequency of the proposed router compared with previous work.

Table 1 Implementation Result on Virtex2 Pro 40

| FPGA: Virtex2 Pro 40 | [8] | Proposed work |
|----------------------|---------|---------------|
| Slices | 838 | 815 |
| LUTs | 20 BRAM | 759 |
| Frequency | 67 MHz | 219.2 MHz |

Table 2 Implementation Result on Virtex2 XC2V6000

| FPGA: Virtex2 XC2V6000 | [8] | Proposed work |
|------------------------|---------|---------------|
| Slices | 838 | 810 |
| LUTs | 20 BRAM | 749 |
| Frequency | 59 MHz | 175.4 MHz |

Table 3 Implementation Result on Virtex5 XC5VFX70T

| FPGA: Virtex5 XC5VFX70T | [9] | Proposed work |
|-------------------------|---------|---------------|
| Slices | 711 | 608 |
| LUTs | 2,459 | 1,338 |
| Frequency | 102 MHz | 299 MHz |

The implementation results of our design are evaluated on the most essential characteristics of an NoC router, as shown in Table 1.
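
As a cross-check on how relative changes follow from the entries of Tables 1 to 3, the sketch below recomputes the frequency increase and the slice-count reduction for each comparison. The helper names (`increase_pct`, `reduction_pct`, `summarize`) are hypothetical. The base used for the area percentage is not stated in the text, so the reduction is shown both relative to the baseline slice count and relative to the proposed design's slice count; that choice is an assumption, and small differences from quoted figures can also come from rounding versus truncation.

```python
def increase_pct(old: float, new: float) -> float:
    """Increase from old to new, as a percentage of old."""
    return (new - old) / old * 100.0


def reduction_pct(old: float, new: float, base: float) -> float:
    """Reduction from old to new, as a percentage of the chosen base."""
    return (old - new) / base * 100.0


# (comparison, baseline slices, proposed slices, baseline MHz, proposed MHz),
# copied from Tables 1 to 3.
RESULTS = [
    ("Virtex2 Pro 40 vs [8]", 838, 815, 67.0, 219.2),
    ("Virtex2 XC2V6000 vs [8]", 838, 810, 59.0, 175.4),
    ("Virtex5 XC5VFX70T vs [9]", 711, 608, 102.0, 299.0),
]


def summarize(row):
    name, slices_old, slices_new, mhz_old, mhz_new = row
    return {
        "name": name,
        "freq_gain": increase_pct(mhz_old, mhz_new),
        # Assumption: two possible bases for the area figure.
        "area_cut_vs_baseline": reduction_pct(slices_old, slices_new, slices_old),
        "area_cut_vs_proposed": reduction_pct(slices_old, slices_new, slices_new),
    }


for row in RESULTS:
    r = summarize(row)
    print(f"{r['name']}: frequency +{r['freq_gain']:.1f}%, "
          f"slices -{r['area_cut_vs_baseline']:.1f}% of baseline "
          f"(-{r['area_cut_vs_proposed']:.1f}% of proposed)")
```

The frequency gain is unambiguous because it is always taken relative to the baseline frequency; the area figure depends on which slice count is used as the denominator, which is why both are printed.
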

As a result, the proposed router, which uses the speculation scheme, can operate at a maximum frequency of 219.2 MHz, an increase of 227.1% over the modular router, while area decreases by 2.8%. In Table 2, the frequency of the proposed router is 197.2% higher than that of the modular router, and area decreases by 3.4%. In Table 3, the proposed router operates at 299 MHz, which is 193.1% higher than the VOQ router, and area decreases by 16.9%. In terms of operating frequency, the proposed router is faster than all the other router designs because it has fewer pipeline stages, with allocation performed in parallel, and a simpler hardware architecture.

### V. SIMULATION

#### Simulation Result for Fixed Priority Arbiter

We implemented the fixed priority arbiter design using the Xilinx ISE 9.1 design suite for device xc3s200-5ft256; the simulation result is shown below. The highest priority is given to req1 and the lowest to req5. If the request of input port 1 is high, no other input port request is serviced. If the request of input port 1 is not high, service is given to input port 2, and so on down to input port 5, which has the lowest priority.

Figure 6 Simulation Result for Fixed Priority Arbiter

#### Simulation Result for Proposed Router

We designed all the components of the proposed router and examined their individual simulation results. To evaluate the router as a whole, we combined all the components and observed the simulation result. The proposed router was implemented using the Xilinx ISE 9.1 design suite for device xc3s200-5ft256, and the simulation result is shown in Fig. 7.

Figure 7 Simulation Result for Proposed Router

### VI. CONCLUSION

The performance of a conventional VC router can be improved by employing a speculation scheme, which reduces the number of pipeline stages of a packet transfer. The results show that the design not only significantly increases overall system performance but also simplifies the hardware design. The proposed speculative design reduces area by 2.8% and 3.4% compared with the modular router architecture and by 16.9% compared with the VOQ router architecture. The operating frequency of the proposed design is also higher than that of the modular router and VOQ router designs.

### REFERENCES

- [1] W. Song, "Spatial Parallelism in The Routers of Asynchronous On-Chip Networks", *School of Computer Science*, 2011.
- [2] Li-S. Peh, "Flow Control and Micro-Architectural Mechanisms for Extending the Performance of Interconnection Networks", Aug. 2001.
- [3] Mostafa S. Sayed, A. Shalaby, M. El-Sayed Ragab, Victor Goulart, "Congestion Mitigation Using Flexible Router Architecture for Network-on-Chip", *Japan-Egypt Conference, IEEE*, pp. 182-187, Mar. 2012.
- [4] L. Rooban, S. Dhananjeyan, "Design of Router Architecture Based on Wormhole Switching Mode for NoC", *International Journal of Scientific & Engineering Research*, vol. 3, issue 3, Mar. 2012.
- [5] R. Mullins, A. West, and S. Moore, "The Design and Implementation of a Low-Latency On-Chip Network", in *Proc. Asia & South Pacific Design Automation Conf.*, pp. 164-169, 2006.
- [6] U. Saravanakumar, R. Rangarajan and K. Rajasekar, "Hardware Implementation of Pipeline Based Router Design for On-Chip Network", *ICTACT Journal on Communication Technology*, vol. 3, issue 4, pp. 646-650, Dec. 2012.
- [7] Ye Lu, John McCanny, Sakir Sezer, "Generic Low Latency NoC Router Architecture for FPGA Computing Systems", 21st International Conference on Field Programmable Logic and Application, IEEE, pp. 82-89, Sept. 2011.
- [8] B. Attia, W. Chouchene, A. Zitouni, N. Abid and R. Tourki, "A Modular Router Architecture Design For Network on Chip", *IEEE 8<sup>th</sup> International Multi-Conference on SSD*, pp. 1-6, Mar. 2011.
- [9] Son Truong Nguyen and Shigeru Oyanagi, "A Low Cost Single-Cycle Router Based on Virtual Output Queuing for On-Chip Networks", 13th Euromicro Conference on Digital System Design: Architectures, Methods and Tools, IEEE, pp. 60-67, Sept. 2010.
- [10] N. Kavaldjiev, G.J.M. Smit, P.G. Jansen, "A Virtual Channel Router for On-chip Networks", *IEEE Proceedings International SOC Conference*, pp. 289-293, Sept. 2004.
- [11] Son Truong Nguyen, Shigeru Oyanagi, "The Design of On-the-fly Virtual Channel Allocation for Low Cost High Performance On-Chip Routers", *IEEE International Conference on Networking and Computing*, pp. 88-94, Nov. 2010.
- [12] Everton A. Carara, Fernando G. Moraes, "Flow Oriented Routing for NoCs", *IEEE SoC Conference*, pp. 367-370, Sept. 2010.
- [13] Daniel U. Becker, William J. Dally, "Allocator Implementations for Network-on-Chip Routers", *ACM Conference Networking*, <http://doi.acm.org/10.1145/1654059.1654112>, ACM/IEEE, 2009.
- [14] Ankur Agarwal, Florida Atlantic University, Boca Raton, "Survey of Network on Chip (NoC) Architectures & Contributions", *Journal of Engineering and Computer Architecture*, vol. 3, issue 1, 2009.
- [15] Ebrahim Behrouzian-Nezhad and Ahmad Khademzadeh, "BIOS: A New Efficient Routing Algorithm for Network on Chip", *Contemporary Engineering Sciences*, vol. 2, no. 1, pp. 37-46, 2009.
- [16] Arnab Banerjee, Robert Mullins and Simon Moore, "A Power and Energy Exploration of Network-on-Chip Architectures", *IEEE International Symposium*, pp. 163-172, May 2007.
- [17] C.A. Nicopoulos, Dongkook Park, Jongman Kim, N. Vijaykrishnan, "ViChaR: A Dynamic Virtual Channel Regulator for Network-on-Chip Routers", *IEEE Microarchitecture, 39th Annual Symposium*, pp. 333-346, Dec. 2006.
- [18] Ville Rantala, Teijo Lehtonen, Juha Plosila, "Network on Chip Routing Algorithms", TUCS Technical Report No. 779, Aug. 2006.