
Solution Architecture - IRDA Business Analytics Project, Nov 2010

Solution Architecture Document - IRDA Business Analytics Project

Table of Contents
List of Abbreviations Used with Their Definition
List of Terms Used with Their Definition
1. Executive Summary
1.1 Introduction
1.2 Solution Architecture
2. Objectives of the Business Analytics Solution
3. Key Business Drivers
4. Solution Themes
5. Present IT Infrastructure at IRDA
5.1 Existing applications used in IRDA
5.2 Existing applications and their status
6. Data Management Challenges
7. Solution Architecture Components
7.1 Reference Architecture
7.2 Functional Architecture
7.3 List of interfaces for the Business Analytics Solution
7.4 Delivery Channel Architecture (Information View)
7.5 Application Architecture
8. Architecture Considerations and Constraints
9. Interoperability Aspects of Business Analytics Solution
9.1 Challenges of Interoperability
9.2 Technology Considerations for Interoperability
10. Conceptual Data Model Design
10.1 Overview

10.2 Form De-duplication Matrix
11. Infrastructure Specifications
11.1 Data Centre and Disaster Recovery Site
11.2 Strategy for Disaster Recovery (DR)
11.3 Connectivity between CDC and DR Site in normal operation
11.4 Business Continuity Plan
11.5 Recovery Point and Time Objectives for the Business Analytics Solution
11.6 Strategy for Business Continuity Planning (BCP)
12. Service Level Agreement for IRDA Business Analytics Solution
12.1 Description and Scope of Services Covered
13. Sizing and Performance Considerations for IRDA Business Analytics Program
14. Scalability and Obsolescence Plan
15. Security Framework for IRDA Business Analytics Solution
15.1 Application Security Strategy
15.2 Security Considerations
15.3 Approach for Security Types
15.4 Other Security Considerations
15.5 Role Based Security Strategy
15.6 Technical Framework for Security
16. Data and Document Migration Strategy
16.1 Data Migration Objectives
16.2 Data Migration Scope
16.3 Data Migration - Business Considerations
16.4 Data Migration - Technical Considerations
16.5 Data Migration Approach

16.6 Data and Document Migration Methodology
16.7 Data Archiving Strategy
16.8 Physical and Analog Data Conversion tools and techniques
16.9 Risks and challenges in Data Migration
Appendix
A. Department wise data model
B. Indicative List of Dimensions with their values and attributes
C. Data Sizing Estimate for the IRDA BAP Solution
   a) Life Department
   b) Non Life General Department
   c) Non Life Reinsurance Department
   d) Health Department
   e) Actuarial Department
   f) Intermediaries - Brokers Department
   g) F&A Department
   h) Total physical space Estimation
   i) Server Load Estimation
D. CDC and DR Specification
E. Technical details of Security for IRDA Business Analytics Solution
F. Security Settings for IRDA Business Analytics Project
G. Data Archiving Procedures and Guidelines for IRDA Business Analytics Solution
H. Existing applications at IRDA with their details

List of Abbreviations Used with Their Definition

ACL: An access control list (ACL) is a list of permissions attached to an object. An ACL specifies which users or system processes are granted access to objects, as well as what operations are allowed to be performed on given objects.
ADS: Active Directory Server (ADS) is a technology created by Microsoft that provides a variety of network services.
ANSI: The American National Standards Institute (ANSI) is a private non-profit organization that oversees the development of voluntary consensus standards for products, services, processes, systems, and personnel in the United States.
API: An application programming interface (API) is an interface implemented by a software program to enable interaction with other software.
ATI: Agent Training Institutes
B2B: Business-to-Business
BCP: Business continuity planning (BCP) is the creation and validation of a practiced logistical plan for how an organization will recover and restore partially or completely interrupted critical (urgent) functions within a predetermined time after a disaster or extended disruption. The logistical plan is called a business continuity plan.
CBAC: Context-based access control (CBAC) intelligently filters TCP and UDP packets based on application layer protocol session information and can be used for intranets, extranets and internets.
CDC/DC: A Centralized Data Center (CDC), or Data Center (DC), is a facility used to house computer systems and associated components, such as telecommunications and storage systems.
COM: COM (hardware interface) is a serial port interface on IBM PC-compatible computers running Microsoft Windows or MS-DOS.
DD: Deputy Director
DMZ: The Demilitarized Zone (DMZ) is a critical part of a firewall: it is a network that is neither part of the untrusted network nor part of the trusted network.
DNS: The Domain Name System (DNS) is a hierarchical naming system for computers, services, or any resource connected to the Internet or a private network.
DRC: A Disaster Recovery Center (DRC) is a backup site: a location to which an organization can easily relocate following a disaster, such as a fire, flood, terrorist threat or other disruptive event.
DRM: Disaster Recovery Management (DRM) is the process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster.
DTLS: The Datagram Transport Layer Security (DTLS) protocol provides communications privacy for datagram protocols.

DW: A data warehouse (DW) is a repository of an organization's electronically stored data. Data warehouses are designed to facilitate reporting and analysis.
EAI: Enterprise Application Integration (EAI) is defined as the use of software and computer systems architectural principles to integrate a set of enterprise computer applications.
ED: Executive Director
ESB: An enterprise service bus (ESB) is a software architecture construct that provides fundamental services for complex architectures via an event-driven and standards-based messaging engine (the bus).
ETL: Extract, transform, and load (ETL) is a process in database usage, and especially in data warehousing, that extracts data from source systems, transforms it to fit operational needs, and loads it into the target database or data warehouse.
F&A: Finance and Accounts
GUI: A graphical user interface (GUI) is a type of user interface that allows people to interact with programs in more ways than typing, on devices such as computers and hand-held devices.
HIDS: A host-based intrusion detection system (HIDS) is an intrusion detection system that monitors and analyses the internals of a computing system rather than the network packets on its external interfaces.
HRMS: Human Resource Management System
HSRP: Hot Standby Router Protocol (HSRP) is a Cisco proprietary redundancy protocol for establishing a fault-tolerant default gateway.
IDM: Identity management (IDM) is a broad administrative area that deals with identifying individuals in a system (such as a country, a network or an organization) and controlling access to the resources in that system by placing restrictions on the established identities.
JDBC: Java Database Connectivity (JDBC) is an API for the Java programming language that defines how a client may access a database. It provides methods for querying and updating data in a database.
LAN: A local area network (LAN) is a computer network covering a small physical area, such as a home, an office, or a small group of buildings.
LDAP: The Lightweight Directory Access Protocol (LDAP) is an application protocol for querying and modifying directory services running over TCP/IP.
MAC: Mandatory access control (MAC) refers to a type of access control by which the operating system constrains the ability of a subject or initiator to access, or generally perform some sort of operation on, an object or target.
MDM: Master Data Management (MDM) comprises a set of processes and tools that consistently defines and manages the non-transactional data entities of an organization.
NIC: Network Information Center (NIC) is the part of the Domain Name System (DNS) of the Internet that keeps the database of domain names and generates the zone files that convert domain names to IP addresses.
ODBC: Open Database Connectivity (ODBC) provides a standard software API method for using database management systems (DBMS).

ODS: An operational data store (ODS) is a database designed to integrate data from multiple sources to make analysis and reporting easier.
OLAP: Online analytical processing (OLAP) is an approach to quickly answering multidimensional analytical queries.
RBAC: Role-based access control (RBAC) is an approach to restricting system access to authorized users.
RPC: Remote procedure call (RPC) is an inter-process communication technology that allows a computer program to cause a subroutine or procedure to execute in another address space (commonly on another computer on a shared network) without the programmer explicitly coding the details for this remote interaction.
RPO: Recovery Point Objective (RPO) is the point in time to which data must be recovered, as defined by the organization. It represents what the organization determines is an "acceptable loss" in a disaster situation.
RTO: Recovery Time Objective (RTO) is the duration of time, and a service level, within which a business process must be restored after a disaster (or disruption) in order to avoid unacceptable consequences associated with a break in business continuity.
SAML: Security Assertion Markup Language (SAML) is an XML-based standard for exchanging authentication and authorization data between security domains, that is, between an identity provider (a producer of assertions) and a service provider (a consumer of assertions).
SAN: A storage area network (SAN) is an architecture to attach remote computer storage devices to servers in such a way that the devices appear as locally attached to the operating system.
SIP: The Session Initiation Protocol (SIP) is a signaling protocol widely used for controlling multimedia communication sessions, such as voice and video calls over Internet Protocol (IP).
SLA: A Service Level Agreement (SLA) is the part of a service contract in which the level of service is formally defined. SLA is sometimes used to refer to the contracted delivery time (of the service) or performance.
SOAP: SOAP, originally defined as Simple Object Access Protocol, is a protocol specification for exchanging structured information in the implementation of Web Services in computer networks.
SQL: SQL (Structured Query Language) is a database computer language designed for managing data in relational database management systems (RDBMS).
SSL: Secure Sockets Layer (SSL) is a cryptographic protocol that provides security for communications over networks such as the Internet.
SSO: Single sign-on (SSO) is a property of access control of multiple, related, but independent software systems.
TAT: Turn Around Time
TCP: The Transmission Control Protocol (TCP) is one of the core protocols of the Internet Protocol Suite.

TLS: Transport Layer Security (TLS) is a cryptographic protocol that provides security for communications over networks such as the Internet.
TPA: Third Party Agents
UDP: The User Datagram Protocol (UDP) is one of the core members of the Internet Protocol Suite, the set of network protocols used for the Internet.
UML: Unified Modeling Language (UML) is a standardized general-purpose modeling language in the field of software engineering.
VoIP: Voice over Internet Protocol (VoIP) is a general term for a family of transmission technologies for delivery of voice communications over IP networks such as the Internet or other packet-switched networks.
VPN: A virtual private network (VPN) encapsulates data transfers between two or more networked devices not on the same private network, so as to keep the transferred data private from other devices on one or more intervening local or wide area networks.
WAN: A wide area network (WAN) is a computer network that covers a broad area (i.e., any network whose communications links cross metropolitan, regional, or national boundaries).
XML: XML (Extensible Markup Language) is a set of rules for encoding documents electronically.
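The RPO and RTO entries in the list above can be made concrete with a small check. The following minimal Python sketch assumes that a recovery event is described by three timestamps:

- the disruption;
- the last successful replication to the DR site;
- the moment the business process is restored.

The names `RecoveryObjectives` and `meets_objectives` are hypothetical, and so are the 15-minute and 4-hour targets. They are placeholders, not the objectives defined for this solution.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Tuple


@dataclass(frozen=True)
class RecoveryObjectives:
    """Hypothetical recovery targets for one business process."""
    rpo: timedelta  # maximum acceptable data loss, measured back from the disruption
    rto: timedelta  # maximum acceptable time to restore the business process


def meets_objectives(
    objectives: RecoveryObjectives,
    disruption_at: datetime,
    last_replicated_at: datetime,
    restored_at: datetime,
) -> Tuple[bool, bool]:
    """Return (rpo_met, rto_met) for a single recovery event."""
    # Data written after the last replication to the DR site is lost.
    data_loss_window = disruption_at - last_replicated_at
    # Downtime runs from the disruption until the process is restored.
    downtime = restored_at - disruption_at
    return data_loss_window <= objectives.rpo, downtime <= objectives.rto


if __name__ == "__main__":
    # Placeholder targets, not the project's actual RPO/RTO.
    targets = RecoveryObjectives(rpo=timedelta(minutes=15), rto=timedelta(hours=4))
    disruption = datetime(2010, 11, 1, 10, 0)
    print(meets_objectives(
        targets,
        disruption_at=disruption,
        last_replicated_at=disruption - timedelta(minutes=10),
        restored_at=disruption + timedelta(hours=5),
    ))  # prints (True, False): 10 min of data lost, but 5 h of downtime
```

The sketch measures the data-loss window from the last replication, which ties the RPO to the replication frequency between the data centre and the DR site. In the worst case, a replication interval longer than the RPO will breach it.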

List of Terms Used with Their Definition

Application Server: An application server is a software framework dedicated to the efficient execution of procedures (programs, routines, scripts) for supporting the construction of applications.
Audit logging: An audit log is a chronological sequence of audit records, each of which contains evidence directly pertaining to and resulting from the execution of a business process or system function.
Biometrics: Biometrics comprises methods for uniquely recognizing humans based upon one or more intrinsic physical or behavioral traits.
Business Intelligence: Business Intelligence (BI) refers to computer-based techniques used in spotting, digging out, and analyzing business data, such as sales revenue by products and/or departments or associated costs and incomes.
Business process management: Business process management (BPM) is a management approach focused on aligning all aspects of an organization with the wants and needs of clients.
Caching/cache: A cache is a component that improves performance by transparently storing data such that future requests for that data can be served faster. The data stored within a cache might be values that have been computed earlier or duplicates of original values that are stored elsewhere.
Conceptual data model: A conceptual data model is a map of concepts and their relationships. It describes the semantics of an organization and represents a series of assertions about its nature.
Content Management: Content management (CM) is the set of processes and technologies that support the collection, management, and publishing of information in any form or medium. In recent times this information is typically referred to as content or, to be precise, digital content.
Context based access: Context-based access control intelligently filters TCP and UDP packets based on application layer protocol session information and can be used for intranets, extranets and internets.

Data Encryption: Data encryption is the process of transforming information (referred to as plaintext) using an algorithm (called a cipher) to make it unreadable to anyone except those possessing special knowledge, usually referred to as a key.
Data Integrity: Data integrity means that data has a complete or whole structure. All characteristics of the data, including business rules, rules for how pieces of data relate, dates, definitions and lineage, must be correct for data to be complete.
Data Mapping: Data mapping is the process of creating data element mappings between two distinct data models.
Data Profiling: Data profiling is the process of examining the data available in an existing data source and collecting statistics and information about that data.
Database Index: A database index is a data structure that improves the speed of data retrieval operations on a database table at the cost of slower writes and increased storage space.
Database Server: A database server is a computer program that provides database services to other computer programs or computers, as defined by the client-server model.
Digital Signature: A digital signature is a mathematical scheme for demonstrating the authenticity of a digital message or document. A valid digital signature gives a recipient reason to believe that the message was created by a known sender, and that it was not altered in transit.
Digitization: Digitization is the representation of an object, image, sound, document or a signal (usually an analog signal) by a discrete set of its points or samples.
Firewall: A firewall is a part of a computer system or network that is designed to block unauthorized access while permitting authorized communications.
FTP: File Transfer Protocol (FTP) is a standard network protocol used to copy a file from one host to another over a TCP/IP-based network, such as the Internet.

Knowledge Management: Knowledge management (KM) comprises a range of strategies and practices used in an organization to identify, create, represent, distribute, and enable adoption of insights and experiences. Such insights and experiences comprise knowledge, either embodied in individuals or embedded in organizational processes or practice.
Load Balancing: Load balancing is a technique to distribute workload evenly across two or more computers, network links, CPUs, hard drives, or other resources, in order to get optimal resource utilization, maximize throughput, minimize response time, and avoid overload.
Metadata management: Metadata management involves storing information about other information. With different types of media in use, references to the location of the data can allow management of diverse repositories.
MIS: A management information system (MIS) is a system or process that provides the information needed to manage organizations effectively.
Multifactor authentication: Multi-factor authentication requires two or more authentication factors for a user to be authenticated.
Operational Data Store: An operational data store (ODS) is a database designed to integrate data from multiple sources to make analysis and reporting easier.
Payment Gateway: A payment gateway is an e-commerce application service provider service that authorizes payments for e-businesses, online retailers, bricks and clicks, or traditional brick and mortar businesses. It is the equivalent of a physical point of sale terminal located in most retail outlets.
Physical Data Model: A physical data model (database design) is a representation of a data design which takes into account the facilities and constraints of a given database management system.
Portlets: Portlets are pluggable user interface software components that are managed and displayed in a web portal. Portlets produce fragments of markup code that are aggregated into a portal page.
Proxy Server: A proxy server is a server (a computer system or an application program) that acts as an intermediary for requests from clients seeking resources from other servers.

Role based access: Role-based access control is an approach to restricting system access to authorized users.
Router: A router is a device that interconnects two or more computer networks and selectively interchanges packets of data between them.
SOA: A Service-Oriented Architecture (SOA) is a flexible set of design principles used during the phases of development and integration. A deployed SOA-based architecture provides a loosely integrated suite of services that can be used within multiple business domains.
SSL encryption: SSL encryption establishes a private communication channel, enabling encryption of the data during transmission. Encryption scrambles the data, essentially creating an envelope for message privacy.
Storage Area Network: A storage area network (SAN) is an architecture to attach remote computer storage devices (such as disk arrays, tape libraries, and optical jukeboxes) to servers in such a way that the devices appear as locally attached to the operating system.
System Integrity: The state that exists when there is complete assurance that, under all conditions, an IT system is based on the logical correctness and reliability of the operating system and the logical completeness of the hardware and software that implement the protection mechanisms.
Token: A security token is a physical device given to an authorized user of computer services to ease authentication.
Universal Serial Bus: Universal Serial Bus (USB) is a specification to establish communication between devices and a host controller (usually personal computers).
Virtual Token: Virtual tokens are a new concept in multi-factor authentication. They reduce the costs normally associated with implementing and maintaining multi-factor solutions by using the user's existing internet device as the "something the user has" factor.
Web Gardening: Scalability on multiprocessor machines can be enhanced by load balancing requests across multiple worker processes, each with processor affinity set to its own CPU. This technique is called Web gardening and can dramatically improve the performance of the application.
Web Server: A web server is a computer program that delivers (serves) content, such as web pages, using the Hypertext Transfer Protocol (HTTP) over the World Wide Web.
Web Services: Web services are typically application programming interfaces (APIs) or web APIs that are accessed via Hypertext Transfer Protocol and executed on a remote system hosting the requested services.
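The "Role based access" entry above can be illustrated by contrasting it with granting permissions to individual users. In role-based access, permissions are attached to roles and roles are assigned to users. A request is authorized if any of the user's roles carries the required permission. The following minimal Python sketch shows that idea. The role names, permission names and user IDs are hypothetical examples, not the roles defined for the IRDA solution.

```python
from typing import Dict, Set

# Hypothetical role-to-permission mapping (illustrative only).
ROLE_PERMISSIONS: Dict[str, Set[str]] = {
    "life_analyst": {"view_life_reports"},
    "actuarial_analyst": {"view_life_reports", "view_actuarial_reports"},
    "administrator": {"manage_users"},
}

# Hypothetical user-to-role assignments; a user may hold several roles.
USER_ROLES: Dict[str, Set[str]] = {
    "user_a": {"life_analyst"},
    "user_b": {"actuarial_analyst", "administrator"},
}


def is_authorized(user: str, permission: str) -> bool:
    """Grant access if any role assigned to the user carries the permission."""
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)


if __name__ == "__main__":
    print(is_authorized("user_a", "view_life_reports"))       # True
    print(is_authorized("user_a", "view_actuarial_reports"))  # False
    print(is_authorized("unknown", "view_life_reports"))      # False: no roles, no access
```

Permissions are attached to roles rather than to people. So when a user moves to another department, only that user's role assignment changes; the permission sets themselves stay the same.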

1. Executive Summary

1.1 Introduction

After the AS-IS study, requirements gathering was carried out across all eight departments in the scope of this project. The next stage built on the requirements study and on the functionality expected from the solution. It was to propose and design a technical platform that would support all of this functionality. The purpose of this document is to describe the architecture of the envisaged business analytics solution, based on the functional requirements.

1.2 Solution Architecture

At present, various external entities, including insurers, submit data to IRDA in two ways: as hardcopy documents, or as softcopies sent over email or on memory disks. There is no central data store or reporting system in place to turn the raw data into relevant information in the form of reports or analysis. Generating information from the raw data therefore involves cumbersome and complex manual processes. To eliminate them, the envisaged solution needs to be designed so that these processes are automated. Correct information should then be available to business users at the right point and in the appropriate context.

The overarching objective of the Business Analytics solution is to provide the data and information needed for analyzing insurance companies and for regulatory decision making.

The system requirements were one of the key inputs in designing the solution architecture. Based on both the functional and the system requirements, the solution is represented through different views that together describe it in detail.

The architecture section includes the following components:
• Business Analytics Solution Architecture
  o Reference Architecture
