Transcription

ECE 8823 A / CS 8803 - ICN
Interconnection Networks
Spring 2017
Lecture 13: System Interface
Tushar Krishna
Assistant Professor
School of Electrical and Computer Engineering
Georgia Institute of Technology

Slide 2: Network Architecture
• Topology
  • How to connect the nodes
  • Road network
• Routing
  • Which path should a message take
  • Series of road segments from source to destination
• Flow Control
  • When does the message have to stop/proceed
  • Traffic signals at the end of each road segment
• Router Microarchitecture
  • How to build the routers
  • Design of a traffic intersection (number of lanes, algorithm for turning red/green)
How does the NoC interface with the rest of the system?
ICN Spring 2017, L13: System Interface. Tushar Krishna, School of ECE, Georgia Tech. February 27, 2017

Slide 3: Network Interface
So far we have focused inside the network: routers, their connections, and the routing and flow-control protocols for communication between them.
Let's go up the stack.
[Figure: tile stack with the labels Core, L1D, L1I, L2, L3/Directory, Network Interface, Router]

Slide 4: NIC Microarchitecture
[Figure: Network Interface block diagram. Recoverable labels: Interface to Cache Controller, Backpressure, Routing Unit*, VC Select, Egress, Router (N, S, E, W ports)]
* Source Routing or Lookahead Routing

Slide 5: Interface to Core/Cache Controllers
• Industry Standard Interfaces (in MPSoCs)
  • AMBA AXI (ARM)
  • AMBA 4 ACE (AXI Coherence Extensions)
  • AMBA 5 CHI (Coherent Hub Interface)
  • OCP (Sonics)
  • STBus (STMicroelectronics)
  • Wishbone (OpenCores)
• Custom Interfaces (in CMPs)
  • Intel
  • AMD
  • IBM

Slide 6: Communication Protocols
• Message Passing
  • Explicit movement of data between nodes and address spaces
  • Programmers manage communication
• Shared Memory
  • Communication occurs implicitly through loads, stores, and instruction accesses
  • Cache misses are serviced by the NoC
  • We will focus on NoCs for shared memory systems

Slide 7: Cache Controller → NIC Interface: Miss Status Handling Register (MSHR)
• On a cache miss, allocate an entry in the MSHR and send a request into the NoC.
• The response is drained by the MSHR.
[Figure: Core and Cache exchange Request (Type, Addr, Data) and Reply (Type, Addr, Data) with the Protocol Finite State Machine and the MSHRs (Status, Addr, Data). Inside the Network Interface, "Message Format and Send" emits RdReq (Dest, Addr) and Writeback (Dest, Addr, Data) messages to the network, and "Message Receive" accepts RdReply (Addr, Data) messages from the network.]
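
The MSHR flow on this slide (allocate on a miss, send a request, drain on the reply) can be sketched roughly as follows. This is a minimal illustration, not the design from the lecture: the names `Message`, `MSHRFile`, `on_miss`, and `on_reply` are hypothetical, and merging of secondary misses to the same line is an assumption that the slide does not state.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    msg_type: str      # e.g. "RdReq" or "RdReply" (labels taken from the slide)
    dest: int          # destination node id
    addr: int          # cache-line address
    data: bytes = b""  # payload; empty for requests


@dataclass
class MSHRFile:
    capacity: int
    # addr -> list of requesters waiting on that line (hypothetical layout)
    entries: dict = field(default_factory=dict)

    def on_miss(self, addr, home_node, requester="core"):
        """Allocate an entry and return the request to inject into the NoC.

        Returns None when the miss is merged into an existing entry
        (assumption: secondary misses do not generate new network traffic).
        """
        if addr in self.entries:
            self.entries[addr].append(requester)
            return None
        if len(self.entries) >= self.capacity:
            # No free MSHR: the cache has to stall until an entry drains.
            raise RuntimeError("MSHRs full")
        self.entries[addr] = [requester]
        return Message("RdReq", home_node, addr)

    def on_reply(self, msg):
        """Drain the entry matching the reply; return the requesters to wake."""
        return self.entries.pop(msg.addr)
```

The number of MSHR entries bounds how many outstanding requests one cache can have in the network at a time, which is one way the cache controller shapes injection into the NoC.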

Slide 8: Shared Memory Systems
[Figure: a tile containing a Core, L1 I/D Cache, L2 Cache (Tags, Data, Controller Logic), and a Router]
Slide courtesy: N. Jerger, Univ. of Toronto

Slide 9: Shared Memory Network for CMPs
• Logically
  • all processors access the same shared memory
• Practically
  • cache hierarchies reduce access latency to improve performance
• Requires a cache coherence protocol
  • to maintain a coherent view in the presence of multiple shared copies

Slide 10: Hardware Cache Coherence
• Snoopy Protocol
  • Broadcast Rd/Wr request over a shared bus
  • Every cache snoops the request
  • If some other cache is writing, invalidate own copy
[Figure: processors P1 to P4, Memory Controller, and Mem on a shared Bus. Steps: 1. Read cache miss; 2. Request broadcast; 3. Send data]

Slide 11: Hardware Cache Coherence
• Directory Protocol
  • Send a Rd/Wr request to a directory
  • Directory tracks the dirty copy and sharers, and manages data responses and invalidates
[Figure: processors P1 to P4, Directory, and Mem connected by an Interconnection Network. Steps: 1. Read cache miss; 2. Directory receives request; 3. Send data]
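
The directory's bookkeeping (tracking the dirty copy and the sharers, then deciding who supplies data or gets invalidated) can be sketched as below. This is a simplified illustration under stated assumptions, not a full protocol: the class `Directory` and its methods are hypothetical names, there are no transient states, and on a read the owner is assumed to keep ownership after supplying the data.

```python
class Directory:
    """Per-line bookkeeping: an optional owner (dirty copy) and a sharer set."""

    def __init__(self):
        self.owner = {}    # addr -> node id holding the dirty copy
        self.sharers = {}  # addr -> set of node ids holding clean copies

    def read(self, addr, requester):
        """Return (action, target): who should supply the data."""
        if addr in self.owner:
            # A dirty copy exists on chip: forward the request to its owner.
            source = ("forward_to_owner", self.owner[addr])
        else:
            # No dirty copy is tracked: data comes from memory.
            source = ("read_memory", None)
        self.sharers.setdefault(addr, set()).add(requester)
        return source

    def write(self, addr, requester):
        """Return the set of nodes that must be invalidated for this write."""
        victims = set(self.sharers.pop(addr, set()))
        if addr in self.owner:
            victims.add(self.owner[addr])
        victims.discard(requester)  # the writer keeps its own copy
        self.owner[addr] = requester
        return victims
```

Each node in the returned `victims` set corresponds to one invalidate message on the network, so traffic scales with the number of actual sharers rather than with the number of caches.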

Slide 12: Cache Organization: Private L2
[Figure: tiled CMP. Each tile has a processor (P), Private L1, Private L2 slice, and Router; a Directory and Memory Controller are also shown]

Slide 13: Cache Organization: Shared Distributed L2
Non-Uniform Cache Access (NUCA)
[Figure: tiled CMP. Each tile has a processor (P), Private L1, Shared L2 slice, Directory slice, and Router; a Memory Controller is also shown]

Slide 14: Cache Organization and Coherence Protocol Impact Network Performance
• Cache organization shapes injection into the network
  • Private L2 caches
    + L1 miss → L2 miss → traffic in the NoC [low miss penalty]
    - Data replication between L2s ⇒ lower overall cache capacity
  • Shared L2 caches
    + Data can only exist in one L2 bank ⇒ higher cache capacity
    - L1 miss → traffic in the NoC to go to the L2 bank [increased miss penalty]
• Coherence protocol shapes the NoC bandwidth requirement
  • Snoopy protocol → more messages
  • Directory protocol → fewer messages
• Message types
  • Data requests
  • Data responses
  • Coherence permissions
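
A back-of-the-envelope count shows why snoopy protocols tend to generate more messages than directory protocols. The sketch below is an assumption-laden simplification: it counts only request and forward messages for a single miss, ignores responses, acknowledgments, and hop counts, and both function names are hypothetical.

```python
def snoopy_request_messages(num_caches):
    """Snoopy: the request is broadcast to every other cache."""
    return num_caches - 1


def directory_request_messages(num_tracked_copies):
    """Directory: one unicast to the directory, then one forward or
    invalidate per copy that the directory is tracking."""
    return 1 + num_tracked_copies


# Example: 64 caches, and the line is held by 2 other caches.
snoopy = snoopy_request_messages(64)        # 63 messages
directory = directory_request_messages(2)   # 3 messages
```

The gap grows with the number of caches, which is why the choice of coherence protocol feeds directly into the bandwidth the NoC has to provide.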

Slide 15: Cache Coherence (Private L1 + Private/Shared L2)
[Figure: message flow between a Private Cache (L1 or L2), the Directory/Ordering Point (Home Node), the Owner, and the Memory Controller, shown for an on-chip hit and an on-chip miss. Recoverable labels:]
• 2. Fwd: broadcast (if snoopy protocol) or unicast/multicast (if directory)
• 3. Resp: Invalidate ACK if this was a Write Request; not required if Read Req
• 4. Unblock
• Directory/Ordering Point: could be the Memory Controller itself
• Req/Ctrl messages: 1 flit
• Resp/Data messages: 1 or 5 flits

Slide 16: Implications of Shared Memory Traffic on NoC Design
• Virtual Networks
  • 3-4 protocol message classes
    • request, forward, response, unblock
  • 3-4 virtual networks in the NoC
    • response and unblock are guaranteed to drain (can share a vnet)
  • There might be additional message classes (and virtual networks) for "non-cacheable requests"
    • DMA, synchronization, setup
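
The mapping from message classes to virtual networks can be sketched as separate bounded queues, so that a backed-up request class cannot block responses. This is a minimal illustration: the vnet numbers in `VNET_OF_CLASS`, the class `VirtualNetworks`, and the `depth` parameter are all hypothetical, and the only detail taken from the slide is that response and unblock may share a vnet.

```python
from collections import deque

# Hypothetical assignment; response and unblock share a vnet because
# both are guaranteed to drain at their destination.
VNET_OF_CLASS = {"request": 0, "forward": 1, "response": 2, "unblock": 2}


class VirtualNetworks:
    """One bounded FIFO per virtual network."""

    def __init__(self, vnet_of_class, depth):
        self.vnet_of_class = vnet_of_class
        self.depth = depth
        self.queues = {v: deque() for v in set(vnet_of_class.values())}

    def try_enqueue(self, msg_class, payload):
        """Return True if the message was buffered, False if its vnet is full."""
        queue = self.queues[self.vnet_of_class[msg_class]]
        if len(queue) >= self.depth:
            return False
        queue.append((msg_class, payload))
        return True
```

With this separation, a full request queue returns `False` only for further requests; a response still finds buffer space in its own vnet, which is the property that lets responses make forward progress independently of requests.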

Slide 17: Implications of Shared Memory Traffic on NoC Design
• Flit Size and VC Depth
  • Control packets: request, forward, response (ACK), and unblock
    • size links such that control packets fit in 1 flit
  • Data packets: response (DATA)
    • Suppose a 64B cache line and 16B flits: data packets are 5 flits
      • 1 flit for control information (header etc.)
      • 4 flits for the cache line (64B / 16B)
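
The packet-size arithmetic on this slide can be written out as a small helper. The function name `flits_per_packet` and its parameters are hypothetical; the sketch assumes one header flit per packet, as in the slide's example.

```python
import math


def flits_per_packet(payload_bytes, flit_bytes, header_flits=1):
    """Header flit(s) plus enough body flits to carry the payload."""
    return header_flits + math.ceil(payload_bytes / flit_bytes)


control_flits = flits_per_packet(0, 16)   # control packet: header only
data_flits = flits_per_packet(64, 16)     # 64B cache line over 16B flits
```

With these inputs `control_flits` is 1 and `data_flits` is 5, matching the 1-flit control and 5-flit data packets above. Halving the flit width to 8B would make the same data packet 9 flits.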

Slide 18: Example: MOESI_hammer (AMD Opteron) protocol in gem5
• Message Classes
  • src/mem/protocol/MOESI_hammer-cache.sm
  • src/mem/protocol/MOESI_hammer-dir.sm

    // Cache Controller
    MessageBuffer * requestFromCache, network="To", virtual_network="2", vnet_type="request";
    MessageBuffer * responseFromCache, network="To", virtual_network="4", vnet_type="response";
    MessageBuffer * unblockFromCache, network="To", virtual_network="5", vnet_type="unblock";
    MessageBuffer * forwardToCache, network="From", virtual_network="3", vnet_type="forward";
    MessageBuffer * responseToCache, network="From", virtual_network="4", vnet_type="response";

    // Directory Controller
    MessageBuffer * forwardFromDir, network="To", virtual_network="3", vnet_type="forward";
    MessageBuffer * responseFromDir, network="To", virtual_network="4", vnet_type="response";
    MessageBuffer * dmaResponseFromDir, network="To", virtual_network="1", vnet_type="response";
    MessageBuffer * unblockToDir, network="From", virtual_network="5", vnet_type="unblock";
    MessageBuffer * responseToDir, network="From", virtual_network="4", vnet_type="response";
    MessageBuffer * requestToDir, network="From", virtual_network="2", vnet_type="request";
    MessageBuffer * dmaRequestToDir, network="From", virtual_network="0", vnet_type="request";

Slide 19: Example: MOESI_hammer (AMD Opteron) protocol in gem5
• Message Types
  • src/mem/protocol/MOESI_hammer-msg.sm

    // CoherenceRequestType
    enumeration(CoherenceRequestType, desc="...") {
      GETX,        desc="Get eXclusive";
      GETS,        desc="Get Shared";
      MERGED_GETS, desc="Get Shared";
      PUT,         desc="Put Ownership";
      WB_ACK,      desc="Writeback ack";
      WB_NACK,     desc="Writeback neg. ack";
      PUTF,        desc="PUT on a Flush";
      GETF,        desc="Issue exclusive for Flushing";
      BLOCK_ACK,   desc="Dir Block ack";
      INV,         desc="Invalidate";
    }

    // CoherenceResponseType
    enumeration(CoherenceResponseType, desc="...") {
      ACK,                desc="ACKnowledgment, responder does not have a copy";
      ACK_SHARED,         desc="ACKnowledgment, responder has a shared copy";
      DATA,               desc="Data, responder does not have a copy";
      DATA_SHARED,        desc="Data, responder has a shared copy";
      DATA_EXCLUSIVE,     desc="Data, responder was exclusive, gave us a copy, and they went to invalid";
      WB_CLEAN,           desc="Clean writeback";
      WB_DIRTY,           desc="Dirty writeback";
      WB_EXCLUSIVE_CLEAN, desc="Clean writeback of exclusive data";
      WB_EXCLUSIVE_DIRTY, desc="Dirty writeback of exclusive data";
      UNBLOCK,            desc="Unblock for writeback";
      UNBLOCKS,           desc="Unblock now in S";
      UNBLOCKM,           desc="Unblock now in M/O/E";
      NULL,               desc="Null value";
    }

Slide 20: Implications of Traffic on NoC Design
• Design-time
  • Placement of cores/caches/memory controllers
  • Homogeneous / "General Purpose CMP"
    • Typical assumptions: each tile has 1 (or 2) cores, private L1 data + instruction caches, a private/shared L2 slice, and a directory
    • What about memory controllers?
      • If there are one or two memory controllers, they are usually on one end of the chip
      • What if there are more? (next)
  • Heterogeneous / "Application Specific SoC"
    • later in the course
• Runtime (usually done by the OS)
  • Mapping of threads/tasks to cores
  • Mapping of data across caches

Slide 21: Logistics
• Lab 4 [5 points]: Due coming Sunday (March 4)
  • Full-system simulations for PARSEC benchmarks
    • 2 coherence protocols
    • 2 NoC configurations: 1-cycle and 5-cycle routers
  • Study the impact of network delay on full-system runtime
• Proposal Presentation [7 points] (March 15)
  • Milestone I: Motivation Graphs [5 points]
  • Proposed Plan and Timeline [2 points]

Slide 22: Paper Discussion
• "Achieving Predictable Performance through Better Memory Controller Placement in Many-Core CMPs"
  • Dennis Abts, Natalie Enright Jerger, John Kim, Dan Gibson, Mikko Lipasti, ISCA 2009

Slide 23: Discussion Points
• Summary of paper
• Why do you think row0_7 and col0_7 are popular?
  • What's the problem with this placement?
• Simulation methodology?
  • Channel load for random traffic
  • Why a genetic algorithm?
• Routing algorithms
  • XY, YX, XY-YX, CDR
  • Deadlock avoidance?
• Challenges with other placements?
• 2 strengths
• 2 weaknesses

Slide 24: Results
• Traffic types
  • Req-only, Rep-only, Req+Rep
