Concepts & Definitions
Data
  Facts devoid of meaning or intent
  e.g. structured data in a DB
Information
  Data that has meaning (data in context)
  e.g. course selection info in a student management system, documents, voice, video...
Knowledge
  Information with direction or intent
Content
  Term for the Web age
  e.g. text, graphics, animation, maps, photos, film clips, etc.
Information Resource Management Responsibilities
Corporate databases
  Distributed
  Various data models
  Data warehouse
Information
  Documents
  Web contents
Knowledge management
  Explicit knowledge (know-what)
  Tacit knowledge (know-how)
IS has been continually managing new forms of information resources
Managing Data
DBMS
The three-level database model
Level 1: the conceptual level
  Containing the various "user views" of the corporate data that each application program uses
Level 2: the logical level
  Logical views of an organization's data, under the control of the DBAs
Level 3: the physical level
  Specifying the way the data is physically stored
Level 2 absorbs changes made at level 3
Level 1 (user view)
Student ID  Student name  Course                Score
10021       Jack          Software Engineering  79
10021       Jack          Data structure        76
10022       James         Software Engineering  85
10022       James         Data structure        88

Level 2
Table Student
StuID  StuName  Age
10021  Jack     21
10022  James    20

Table Course
CourseID  CourseName            Capacity  Room
373       Software engineering  30        AQ5018
275       Data structure        40        AQ3023

Table CourseSelect
StuID  CourseID  Score
10021  373       79
10022  373       85
10021  275       76
10022  275       88

Level 3 (physical storage)
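The separation between the levels can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the three tables stand in for Level 2, a view (given the hypothetical name ScoreReport) stands in for one Level 1 user view, and SQLite's own file format plays the role of Level 3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # Level 3: storage details are left to SQLite
conn.executescript("""
-- Level 2: the logical tables from the example
CREATE TABLE Student (StuID INTEGER PRIMARY KEY, StuName TEXT, Age INTEGER);
CREATE TABLE Course (CourseID INTEGER PRIMARY KEY, CourseName TEXT,
                     Capacity INTEGER, Room TEXT);
CREATE TABLE CourseSelect (StuID INTEGER, CourseID INTEGER, Score INTEGER);

INSERT INTO Student VALUES (10021, 'Jack', 21), (10022, 'James', 20);
INSERT INTO Course VALUES (373, 'Software engineering', 30, 'AQ5018'),
                          (275, 'Data structure', 40, 'AQ3023');
INSERT INTO CourseSelect VALUES (10021, 373, 79), (10022, 373, 85),
                                (10021, 275, 76), (10022, 275, 88);

-- Level 1: a user view (ScoreReport is a hypothetical name)
CREATE VIEW ScoreReport AS
SELECT s.StuID AS StudentID, s.StuName AS StudentName,
       c.CourseName AS Course, cs.Score AS Score
FROM CourseSelect cs
JOIN Student s ON s.StuID = cs.StuID
JOIN Course c ON c.CourseID = cs.CourseID;
""")

# The application reads only the view and never names the base tables.
rows = conn.execute(
    "SELECT * FROM ScoreReport ORDER BY StudentID, Score DESC"
).fetchall()
for row in rows:
    print(row)
```

Because the application queries only the view, the base tables could be reorganized, or the storage engine changed, without touching application code, as long as the view definition is kept up to date.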
Four Data Models
Hierarchical model
  Structures data so that each element is subordinate to another in a strict hierarchical manner (parent and child)
Network model
  Allows each data item to have more than one parent
  Relationships stated by pointers stored with the data
Relational model
Object model
  Storing and managing data as objects
  A competitive candidate for storing XML data
XML
XML (eXtensible Markup Language) is a self-describing markup language for applying structure to data
Not limited to predefined tags
Human readable
Machine readable
Portability
  Java: portable programs
  XML: portable data
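A minimal sketch of "self-describing" and "portable" data, using Python's standard xml.etree.ElementTree module. The tag names (courseSelection, student, name, course, score) are made up for this illustration and are not part of any standard vocabulary.

```python
import xml.etree.ElementTree as ET

# Build a document with application-defined tags (hypothetical names).
root = ET.Element("courseSelection")
entry = ET.SubElement(root, "student", {"id": "10021"})
ET.SubElement(entry, "name").text = "Jack"
ET.SubElement(entry, "course").text = "Data structure"
ET.SubElement(entry, "score").text = "76"

# Serialize to text: this string is what travels between systems.
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)

# A receiver parses the text back; the tags themselves name each value.
parsed = ET.fromstring(xml_text)
student = parsed.find("student")
record = {child.tag: child.text for child in student}
record["id"] = student.get("id")
print(record)
```

The receiver needs no prior table definition to find the score, since the tag carries the meaning. It still has to agree with the sender on what the tags are called.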
XML---Semi-Structured Data
[Figure: a spectrum running from less structure to more structure, with text (unstructured data) at one end, relational data (structured data) at the other, and XML in between]
XML Data Model & Native Storage
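[Figure: tree representation of an XML document. The root imdb has a child show with attribute @year "1993". show has a title ("Fugitive, The") and two review children. One review contains suntimes, with a reviewer ("Roger Ebert"), the text "gives", and a rating ("two..."); the other contains nyt.]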
Native XML Database
  Defines a (logical) model for an XML document and stores and retrieves documents according to that model
  Has an XML document as its fundamental unit of (logical) storage
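The tree model can be sketched by parsing a small document shaped like the imdb example and walking its nodes. This uses Python's xml.etree.ElementTree as an in-memory stand-in, not a native XML database; the document text is reconstructed from the figure labels, and the elided values ("two...", the nyt content) are left elided.

```python
import xml.etree.ElementTree as ET

doc = """
<imdb>
  <show year="1993">
    <title>Fugitive, The</title>
    <review><suntimes><reviewer>Roger Ebert</reviewer> gives <rating>two...</rating></suntimes></review>
    <review><nyt>...</nyt></review>
  </show>
</imdb>
"""

root = ET.fromstring(doc)
show = root.find("show")

year = show.get("year")            # attribute node (@year)
title = show.findtext("title")     # text of a child element
reviewers = [r.text for r in root.iter("reviewer")]
review_sources = [review[0].tag for review in show.findall("review")]

def paths(node, prefix=""):
    """Yield the path from the root to every element in the tree."""
    here = f"{prefix}/{node.tag}"
    yield here
    for child in node:
        yield from paths(child, here)

for path in paths(root):
    print(path)
```

Retrieval follows the document's own structure (paths such as /imdb/show/review/suntimes/rating) rather than rows and columns, which is the sense in which the XML document is the unit of logical storage.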
Getting Corporate Data into Shape (1)
The Problem: Management cannot get a consistent view across the enterprise
1960s-1970: applications developed in isolation ("information islands")
  Different units in an organization developed and used their own databases and their own applications
  Inconsistent data definitions
  Duplicate data
Getting Corporate Data into Shape (2)
The Cause: an application-driven approach
  Getting applications running as quickly as possible
The Solution: a data-driven approach
  Data of interest -> data source -> applications
  Usually evolves from the application-driven chaos
Getting Corporate Data into Shape (3)
Managing data as a corporate resource is more than installing a DBMS
DBA: administering databases and the software that manages them
Data administrator: managing enterprise-wide data resources
  Clean up the data definitions
  Control shared data
  Manage data distribution
  Maintain data quality
Getting Corporate Data into Shape (4)
ERPs aim to integrate all data and processes of an organization into a unified system
  Automate and integrate the majority of business processes
  Share common data and practices across the entire enterprise
  Produce, access, and manage information in a real-time environment
  Configure applications to meet business needs
Key: a unified database
  Provide management a corporate-wide view of operations
Four Types of Information (1)
Two structures of information
Record-based: facts about entities
Document-based: dealing with concepts
  Housed in documents, messages, video, audio clips...
Four Types of Information (2)
Two sources of information: internal and external
Internal record-based information: traditional focus of IS
External record-based information: public DB
Internal and external document-based information have received little attention from IS until recently
However, it is estimated that 90% of an organization's information is in documents rather than structured databases (Sprague, 1995).
Technologies for Managing Information
The two different structures of information are managed in different ways
Record-based
  Data warehouse
Document-based
  Document management systems
  Web content management
What is Data Warehouse?
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.” (W. H. Inmon)
Data Warehouse—Subject-Oriented
Organized around major subjects, such as customer, product, sales
Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing
Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process
Data Warehouse—Integrated
Constructed by integrating multiple, heterogeneous data sources
Relational databases, flat files, on-line transaction records
Data cleaning and data integration techniques are applied.
Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
When data is moved to the warehouse, it is converted.
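The conversion step can be sketched as one small translator per source. Everything here is hypothetical: two made-up source systems that disagree on field names (naming conventions), on how gender is coded (encoding structures), and on the unit of the balance (attribute measures), plus a made-up warehouse schema.

```python
# Source A: short names, letter codes, balance in dollars.
SOURCE_A = [{"cust_id": 1, "gender": "F", "balance_usd": 12.5}]
# Source B: different names, numeric codes, balance in cents.
SOURCE_B = [{"customerNo": 2, "sex": 1, "balance_cents": 1250}]

def from_source_a(rec):
    """Convert a source A record to the warehouse schema."""
    return {
        "customer_id": rec["cust_id"],
        "gender": {"M": "male", "F": "female"}[rec["gender"]],
        "balance_usd": rec["balance_usd"],
    }

def from_source_b(rec):
    """Convert a source B record to the warehouse schema."""
    return {
        "customer_id": rec["customerNo"],
        "gender": {1: "male", 0: "female"}[rec["sex"]],
        "balance_usd": rec["balance_cents"] / 100,
    }

# Every record is converted as it is moved into the warehouse.
warehouse = [from_source_a(r) for r in SOURCE_A] + \
            [from_source_b(r) for r in SOURCE_B]
for row in warehouse:
    print(row)
```

After loading, every row uses the same names, codes, and units, so an analyst can compare balances without knowing which system a row came from.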
Data Warehouse—Time Variant
The time horizon for the data warehouse is significantly longer than that of operational systems
  Operational database: current value data
  Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years)
Every key structure in the data warehouse contains an element of time, explicitly or implicitly
Data Warehouse—Nonvolatile
A physically separate store of data transformed from the operational environment
Operational update of data does not occur in the data warehouse environment
  Does not require transaction processing, recovery, and concurrency control mechanisms
  Requires only two operations in data accessing: initial loading of data and access of data
Data Warehouse vs. Heterogeneous DBMS
Traditional heterogeneous DB integration: A query driven approach
  Build wrappers/mediators on top of heterogeneous databases
  When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for the individual heterogeneous sites involved, and the results are integrated into a global answer set
  Complex information filtering, compete for resources
Data warehouse: update-driven, high performance
  Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis
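The contrast between the two approaches can be sketched with two tiny in-memory sources. The source layouts and function names are hypothetical; a real mediator would translate queries through a meta-dictionary rather than hard-coded field names.

```python
# Two hypothetical heterogeneous sources with different field names.
SOURCE_A = [{"item": "pen", "qty": 3}, {"item": "ink", "qty": 1}]
SOURCE_B = [{"product": "pen", "units": 4}]

def mediator_total(item):
    """Query-driven: translate and run the query against each source
    every time it is asked, then merge the partial answers."""
    total = sum(r["qty"] for r in SOURCE_A if r["item"] == item)
    total += sum(r["units"] for r in SOURCE_B if r["product"] == item)
    return total

def build_warehouse():
    """Update-driven: integrate all sources once, in advance,
    into a single schema that queries can read directly."""
    totals = {}
    for r in SOURCE_A:
        totals[r["item"]] = totals.get(r["item"], 0) + r["qty"]
    for r in SOURCE_B:
        totals[r["product"]] = totals.get(r["product"], 0) + r["units"]
    return totals

warehouse = build_warehouse()
print(mediator_total("pen"), warehouse["pen"])

# A later change at a source is seen by the mediator immediately,
# but by the warehouse only after the next refresh.
SOURCE_A.append({"item": "pen", "qty": 5})
print(mediator_total("pen"), warehouse["pen"])
```

The warehouse answers with a single lookup and places no load on the sources at query time. The price is that its answer reflects the last load, not the current state of the sources.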
Data Warehouse vs. Operational DBMS
OLTP (on-line transaction processing)
  Major task of traditional relational DBMS
  Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc.
OLAP (on-line analytical processing)
  Major task of data warehouse system
  Data analysis and decision making
OLTP vs. OLAP
                    OLTP                                   OLAP
users               clerk, IT professional                 knowledge worker
function            day-to-day operations                  decision support
DB design           application-oriented                   subject-oriented
data                current, up-to-date; detailed,         historical, summarized, multidimensional;
                    flat relational; isolated              integrated, consolidated
usage               repetitive                             ad hoc
access              read/write; index/hash on prim. key    lots of scans
unit of work        short, simple transaction              complex query
# records accessed  tens                                   millions
# users             thousands                              hundreds
DB size             100MB-GB                               100GB-TB
metric              transaction throughput                 query throughput, response
Typical OLAP Operations
Roll up (drill-up): summarize data
  By climbing up a hierarchy or by dimension reduction
Drill down (roll down): reverse of roll-up
  From higher-level summary to lower-level summary or detailed data, or introducing new dimensions
Slice and dice: project and select
Pivot (rotate):
  Reorient the cube, visualization, 3D to series of 2D planes
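These operations can be sketched on a tiny, made-up sales cube held as a list of facts. The data, the city-to-country hierarchy, and the helper names are all assumptions for illustration; an OLAP server would work on precomputed aggregates rather than rescanning facts.

```python
from collections import defaultdict

# Hypothetical fact rows: (quarter, city, product, sales)
FACTS = [
    ("Q1", "Toronto", "phone", 10),
    ("Q1", "Vancouver", "phone", 20),
    ("Q1", "Chicago", "laptop", 30),
    ("Q2", "Toronto", "laptop", 40),
    ("Q2", "Chicago", "phone", 50),
]
COUNTRY = {"Toronto": "Canada", "Vancouver": "Canada", "Chicago": "USA"}

def aggregate(facts, key):
    """Sum sales over all facts that share the same key."""
    totals = defaultdict(int)
    for fact in facts:
        totals[key(fact)] += fact[3]
    return dict(totals)

# Detailed level: sales per quarter and city.
by_city = aggregate(FACTS, lambda f: (f[0], f[1]))
# Roll up by climbing the location hierarchy: city -> country.
by_country = aggregate(FACTS, lambda f: (f[0], COUNTRY[f[1]]))
# Roll up by dimension reduction: drop location and product.
by_quarter = aggregate(FACTS, lambda f: f[0])
# Drill down is the reverse step: from by_country back to by_city.

# Slice: select on one dimension.
q1_slice = [f for f in FACTS if f[0] == "Q1"]
# Dice: select on two or more dimensions.
canada_phones = [f for f in FACTS
                 if COUNTRY[f[1]] == "Canada" and f[2] == "phone"]

def crosstab(facts, row, col):
    """Lay out totals as a 2D table keyed by two dimension positions."""
    table = defaultdict(dict)
    for (r, c), total in aggregate(facts, lambda f: (f[row], f[col])).items():
        table[r][c] = total
    return dict(table)

# Pivot: the same totals with the axes swapped.
quarter_by_product = crosstab(FACTS, 0, 2)
product_by_quarter = crosstab(FACTS, 2, 0)

print(by_country)
print(quarter_by_product)
print(product_by_quarter)
```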
Data Warehouse: A Multi-Tiered Architecture
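[Figure: multi-tiered data warehouse architecture. Data sources (operational DBs, other sources) are extracted, transformed, loaded, and refreshed, with a monitor and integrator and metadata, into the data storage tier (data warehouse, data marts). The data storage tier serves the OLAP server tier (OLAP engine), which in turn serves the front-end tools (analysis, query, reports, data mining).]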
Document Management
Estimated that 90% of an organization’s information is in documents rather than structured databases
Types of Documents
  Contracts and Agreements
  Reports
  Manuals and Handbooks
  Correspondence
  Memos
  Drawings and Blueprints
  …
Fundamental Roles of Documents
4 fundamental roles of documents
  As a product, or support for a product
  As a fundamental mechanism for communication among people and groups within an organization and between organizations
  As the primary vehicle for business processes
  As an important part of organizational memory
Electronic Document
An electronic document has the following characteristics
  Holds information of multiple media: text, graphics, audio, video
  Contains multiple structures: headers, footers, TOC, sections, paragraphs, tables
  Dynamic: can be updated on the fly
  May depend on other documents
Limitations of RDBMS
Limitations of RDBMS for document management
  Based on E-R data models
  Suitable for structured data
  Traditional business applications, decision support systems, reporting tools
  No inherent support to manage electronic documents
Electronic Document Management System (1)
An EDMS is a computer system used to track and store electronic documents and/or images of paper documents.
Allows users to create a document or capture a hard copy in electronic form
Commonly provided capabilities
  Storage
  Versioning
  Metadata
  Security
  Indexing
  Retrieval
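A few of these capabilities (storage, versioning, metadata, indexing, retrieval) can be sketched in a small in-memory class. DocumentStore and its methods are hypothetical names, not the API of any EDMS product, and security is left out entirely.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class DocumentStore:
    # doc_id -> list of versions, oldest first (storage + versioning)
    versions: dict = field(default_factory=lambda: defaultdict(list))
    # word -> set of doc_ids in which the word has appeared (indexing)
    index: dict = field(default_factory=lambda: defaultdict(set))

    def save(self, doc_id, text, **metadata):
        """Store a new version with its metadata; earlier versions are kept.
        Returns the new version number (1 for the first save)."""
        self.versions[doc_id].append({"text": text, "metadata": metadata})
        for word in text.lower().split():
            self.index[word].add(doc_id)
        return len(self.versions[doc_id])

    def latest(self, doc_id):
        """Return the most recent version of a document."""
        return self.versions[doc_id][-1]

    def search(self, word):
        """Retrieve ids of documents where the word occurs in any version."""
        return sorted(self.index.get(word.lower(), set()))

store = DocumentStore()
store.save("contract-1", "Draft supply agreement", author="legal")
version = store.save("contract-1", "Signed supply agreement", author="legal")
store.save("memo-7", "Agreement review memo", author="ops")

print(version)
print(store.latest("contract-1")["text"])
print(store.search("agreement"))
```

The index here covers every saved version, so a search for "draft" still finds contract-1 after it has been signed. Whether old versions should stay searchable is a policy choice a real system would make explicit.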
Electronic Document Management System (2)
Records created & received electronically
Records created & received in hard copy
Records are filed & managed for access & maintenance electronically
Electronic Document Management System (3)
An EDMS usually provides a single view of multiple databases
An EDMS may include:
  Scanners and Optical Character Recognition (OCR) for document capture
  Printers for creating hard copies
  Storage devices such as redundant array of independent disks (RAID) systems and computer servers
  Server programs for managing the databases that contain the documents
Content Management (1)
Content management is a core management discipline underlying online business
  Without production-level Web content management processes and technologies, large-scale e-business is not possible
The adoption of XML
  The language for manipulating content so that it works with transaction applications
Content Management (2)
Traditional “home-grown” content management
  The Webmaster was the publishing bottleneck
3 phases of content management life cycle
Input-process-output
Content Management (3)
Content creation and acquisition
  Focus on creating content quality
  Distribute content creation and maintenance to business departments with centralized coordination and control
Content administration and safeguarding
  Emphasis on efficiency
  Use tools for content administration and work flow control
Content Management (4)
Content deployment and presentation
  Emphasis on effectiveness, i.e. presenting the content so that it attracts visitors, allows them to navigate the site easily, and leads them to the desired actions
Features to attract and keep visitors
  Personalization: allowing visitors to customize how they view the page
  Localization: tailoring a site to a culture, market, or locale
  Multichannel distribution: appropriate display for various devices
Content Management Systems (1)
A Content Management System (CMS) is software that makes it easier to create, edit and publish content on a web site.
  Back-end to help create, edit, and manage content
  Front-end to deliver content dynamically to various endpoints
  Work flow control in moving and adding contents
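The three parts can be sketched together: a back end that creates content, a workflow that controls how it moves toward publication, and a front end that delivers only published content in a form suited to the endpoint. The state names, the ALLOWED table, and the two channels are assumptions for illustration, not features of any particular CMS.

```python
# Workflow control: which state changes are permitted (hypothetical rules).
ALLOWED = {
    "draft": {"review"},
    "review": {"draft", "published"},
    "published": set(),
}

class ContentItem:
    """Back end: a piece of content created by a contributor."""

    def __init__(self, slug, body):
        self.slug = slug
        self.body = body
        self.state = "draft"

    def move_to(self, new_state):
        """Advance the item through the workflow, rejecting shortcuts."""
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        self.state = new_state

def deliver(items, channel):
    """Front end: render only published items for the given endpoint."""
    published = [i for i in items if i.state == "published"]
    if channel == "web":
        return [f"<article><h1>{i.slug}</h1><p>{i.body}</p></article>"
                for i in published]
    # Any other channel gets a plain-text rendering.
    return [f"{i.slug}: {i.body}" for i in published]

news = ContentItem("opening-hours", "Open 9 to 5.")
promo = ContentItem("spring-sale", "Everything 10% off.")
news.move_to("review")
news.move_to("published")   # promo stays in draft and is not delivered

print(deliver([news, promo], "web"))
print(deliver([news, promo], "sms"))
```

Keeping the workflow rules in one table means a contributor cannot publish directly from draft, which is one way to remove the Webmaster as a bottleneck while keeping central control.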
Content Management Systems (2)
[Figure: content management system architecture]
Individual contributors
  Add content to the content repositories through workflow
Content repositories
  Structured content: databases, DB schemas, XML, HTML, Web services
  Unstructured content: docs, ppts, brochures, photos, logos, contracts, syllabus, schedule, C:\
Content Management Application
  “Back-end” functions for creating, editing, producing, and administering a site and its content
Content Delivery Application
  “Front-end” functions for delivering and displaying content
  Content delivery: assembled, tagged & formatted assets
  Endpoints: portals, Web apps, PeopleSoft, MBM