shonda-burns
Data for Business

Conventional Business Tools
Paper-based
Letters
Telephone
Fax
Teleconferencing
Etc.

Evolution of Data for Business
Paper-based (basic infrastructure)
E-Docs (standalone)
Network
  • LAN
  • WAN

Stand-alone Computer
Using computers in business without connectivity
  • Cashier
  • Inventory
  • Customer Profile
  • Employee Profile

Intranet Computer
Local Area Network (LAN)

Internet Computer
[Figure: several LANs connected through a Wide Area Network (WAN); labels: Unsecured Network, Bad Guy]

Wireless Network
WLAN (Wireless LAN)
Wi-Fi
  • Wireless network in computer systems which enables connection to the internet or other machines
More convenient but more exposed to the public
Needs better protection
  • Use data encryption

Levels of Data Access
Executive
Manager
Employees
Within Organization
Outside Organization

Data Sharing
We need to:
  • Guarantee each worker access to the right information, at the right time, from whatever source
  • Provide each worker with the appropriate interfaces to work with this information and make decisions

Scope of Data Sharing
Private (internal use)
  • LAN (Intranet)
Public
  • WAN (Internet)

Why Go Public?
Increase productivity
  • Online transactions
Open business opportunities
  • Create partnerships

Data Management
Centralized System
  • Easy to manage
  • Can lead to bottleneck problems at peak times
Distributed System
  • Hard to manage
  • Provides better performance and scalability

Centralized System
[Figure: a server with its database (dB) serving Clients 1 to 6]

Distributed DBMS
Data Partitioning

Questions of Concern
What can be shared and what cannot be?
Is data privacy guaranteed by using IT systems?
Is our current system sufficiently useful?
What do we really need?

Symmetric Cryptography
http://msdn.microsoft.com/en-us/library/aa480570.aspx

Asymmetric Cryptography
http://msdn.microsoft.com/en-us/library/aa480570.aspx

Data Restriction
Public
  • Information which may or must be open to the general public. It is defined as information with no existing local, national or international legal restrictions on access.
  • Example: Course Catalog
Sensitive
  • Information whose access must be guarded due to proprietary, ethical, or privacy considerations.
  • Example: Date of Birth, Ethnicity
Restricted
  • Information protected because of protective statutes, policies or regulations. This level also represents information that isn't protected by legal statute by default, but for which the Information Owner has exercised their right to restrict access.
  • Example: Student Academic Record (FERPA)
Purdue University
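
The three restriction levels above can be sketched as a small lookup. This is only an illustration: the names `Restriction`, `FIELD_LEVELS` and `can_read` are hypothetical, and treating the levels as a strict ordering (a higher clearance can read everything below it) is an assumption, not something the slide states.

```python
from enum import IntEnum


class Restriction(IntEnum):
    """Restriction levels from the slide; the numeric ordering is an assumption."""
    PUBLIC = 1
    SENSITIVE = 2
    RESTRICTED = 3


# Hypothetical field-to-level mapping built from the slide's examples.
FIELD_LEVELS = {
    "course_catalog": Restriction.PUBLIC,
    "date_of_birth": Restriction.SENSITIVE,
    "ethnicity": Restriction.SENSITIVE,
    "academic_record": Restriction.RESTRICTED,
}


def can_read(clearance: Restriction, field: str) -> bool:
    """Return True if the clearance is at least the field's level.

    Fields missing from the mapping are treated as RESTRICTED (deny by default).
    """
    level = FIELD_LEVELS.get(field, Restriction.RESTRICTED)
    return clearance >= level
```

Denying access to unclassified fields by default is a design choice made here for safety; a real policy would come from the Information Owner.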

Data Validation
Data validation is the process of ensuring that a program operates on clean, correct and useful data.
It uses routines, often called "validation rules" or "check routines", that check for correctness, meaningfulness, and security of data that are input to the system.
Data validation checks that data are valid, sensible, reasonable, and secure before they are processed.

Data Validation Methods
Format check
  • Checks that the data is in a specified format (template), e.g., dates have to be in the format DD/MM/YYYY.
Data type check
  • Checks that the input data matches the chosen data type, e.g., in an input box accepting numeric data, if the letter 'O' was typed instead of the number zero, an error message would appear.
Range check
  • Checks that data lie within a specified range of values, e.g., the month of a person's date of birth should lie between 1 and 12.
Limit check
  • Unlike range checks, data is checked against one limit only, upper OR lower, e.g., data should not be greater than 2 (i.e., must be <= 2).
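
The four checks above can be sketched in Python as follows. The function names are hypothetical and the rules simply mirror the slide's examples (DD/MM/YYYY dates, numeric input, months 1 to 12, an upper limit of 2); this is a minimal sketch, not a complete validation layer.

```python
import re
from datetime import datetime


def format_check(text: str) -> bool:
    """Format check: the text must be a real date written as DD/MM/YYYY."""
    if re.fullmatch(r"\d{2}/\d{2}/\d{4}", text) is None:
        return False
    try:
        datetime.strptime(text, "%d/%m/%Y")  # rejects dates such as 31/02/2020
    except ValueError:
        return False
    return True


def type_check(text: str) -> bool:
    """Data type check: the input must consist of ASCII digits only."""
    return text.isascii() and text.isdigit()


def range_check(month: int) -> bool:
    """Range check: a month must lie between 1 and 12 inclusive."""
    return 1 <= month <= 12


def limit_check(value: float, upper: float = 2) -> bool:
    """Limit check: only one bound is tested (here an upper limit)."""
    return value <= upper
```

For instance, `type_check("1O")` is false because the second character is the letter 'O', which is the mistake the slide describes.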

Data Validation Methods (cont.)
Presence check
  • Checks that important data are actually present and have not been missed out, e.g., customers may be required to have their telephone numbers listed.
Spelling and grammar check
  • Looks for spelling and grammatical errors.
Consistency check
  • Checks fields to ensure data in these fields correspond, e.g., if Title = "Mr.", then Gender = "M".
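
A presence check and a consistency check could look like the sketch below. The record layout (a dict with `telephone`, `title` and `gender` keys) and the function names are assumptions made for illustration; only the rules themselves come from the slide.

```python
def presence_check(record: dict, required: tuple = ("telephone",)) -> list:
    """Return the names of required fields that are missing or blank."""
    return [
        field
        for field in required
        if not str(record.get(field) or "").strip()
    ]


def consistency_check(record: dict) -> bool:
    """Title "Mr." must go with Gender "M"; other titles are not constrained here."""
    if record.get("title") == "Mr.":
        return record.get("gender") == "M"
    return True
```

Returning the list of missing fields, rather than a plain boolean, lets a data entry form tell the user exactly what to fill in.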

Dirty Data
Dirty data refers to inaccurate information/data, primarily collected by means of data capture forms.
Dirty data is data that is:
  • Misleading
  • Incorrect or without generalized formatting
  • Containing spelling or punctuation errors, entered in the wrong field, or duplicated

Causes of Dirty Data
Deliberate distortion of information
  • A person could deliberately insert misleading or fictional data, such as personal information or biographical data, that appears real. Because of its appearance, it may not be picked up by an administrator and/or a validation routine.
Typographical errors
Formatting issues
  • Personal preferences for formatting of the data (such as phone numbers) could lead to the introduction of dirty data.
Duplication errors
  • Duplicate data may be caused by accidental double submission of forms, incorrect data joining, or user error(s).

Dirty Data Prevention
Dirty data is commonly prevented using input masks or validation rules.
Completely removing dirty data from a data source is impossible or impractical in some cases.

Data Cleansing
Data cleansing or data scrubbing is the act of detecting and correcting (or removing) corrupted or inaccurate records from a record set, table, or database.
It refers to identifying incomplete, incorrect, inaccurate, or irrelevant parts of the data and then replacing, modifying or deleting this dirty data.
Data cleansing differs from data validation in that:
  • Validation means data is rejected from the system at entry and is performed at entry time, whereas cleansing is performed on batches of data.
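
To illustrate cleansing as a batch operation, the sketch below normalizes a list of records and drops duplicates. The record layout (`name` and `phone` keys), the function name `cleanse`, and the choice to keep only digits in phone numbers are all assumptions for illustration.

```python
import re


def cleanse(records: list) -> list:
    """Batch-clean records: normalize formatting, then drop duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        name = " ".join(rec["name"].split())     # collapse stray whitespace
        phone = re.sub(r"\D", "", rec["phone"])  # keep digits only
        key = (name.lower(), phone)              # case-insensitive duplicate key
        if key in seen:
            continue                             # skip repeated submission
        seen.add(key)
        cleaned.append({"name": name, "phone": phone})
    return cleaned
```

Normalizing before comparing matters: two submissions of the same phone number written in different personal styles only show up as duplicates once both are reduced to the same form.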

Steps in the Evolution of Data Mining

Evolutionary Step | Business Question | Enabling Technologies | Characteristics
Data Collection (1960s) | "What was my total revenue in the last five years?" | Computers, tapes, disks | Retrospective, static data delivery
Data Access (1980s) | "What were unit sales in New England last March?" | RDBMS, SQL, ODBC | Retrospective, dynamic data delivery at record level
Data Warehousing & Decision Support (1990s) | "What were unit sales in New England last March? Drill down to Boston." | On-line analytic processing (OLAP), multidimensional databases, data warehouses | Retrospective, dynamic data delivery at multiple levels
Data Mining (Emerging Today) | "What’s likely to happen to Boston unit sales next month? Why?" | Advanced algorithms, multiprocessor computers, massive databases | Prospective, proactive information delivery

http://www.thearling.com/text/dmwhite/dmwhite.htm

Data Storage Performance
Life cycle of data and access speed of the database (dB) storage:
  • Active: Fast
  • Less Active: Medium
  • Historical: Slow
  • Archive: Per Request

Data for Business
RFID Technology

Radio Frequency Identification (RFID)
An automatic method, relying on storing and remotely retrieving data using devices called “RFID tags”.

Types of RFID
Passive
  • Does not have an internal power supply
  • Range: 4 cm up to a few meters
Active
  • Has its own power supply to broadcast a signal to the reader
  • Range of hundreds of meters, with a 10-year battery lifetime
Semi-passive
  • Has its own power supply for the chip but not for broadcasting a signal
  • Greater sensitivity than passive tags, typically 100 times more
RFID backscatter

Example of RFID Tags
RFID in the form of sticker
An RFID tag used for electronic toll collection

Implantable RFID Chip

Logo of the Anti-RFID Campaign