CIS 310 Management Information Systems Database Refresher


Page 1: CIS 310 Management Information Systems Database Refresher

CIS 310 Management Information Systems

Database Refresher

Page 2: CIS 310 Management Information Systems Database Refresher

Database Refresher – Mental Model

Database = File Cabinet
Records = File Folder
Data = Contents of the folder

Database = Student Records
Table = A-G in the top Drawer
Record = BID 1234567890
Attribute (field or data item) = Student Name, Address, GPA, email…etc.

Page 3: CIS 310 Management Information Systems Database Refresher

Relating Tables

Student Data
• BID (unique)
• First Name
• Last Name
• Address
• City
• State
• Zip
• Phone
• eMail

Enrolled In, Winter 2013
• BID (unique)
• Class Number 1
• Class Number 2
• Class Number 3
• Class Number 4…etc.

Classes in Winter 2013
• Class No. (unique each term)
• Dept. No.
• Class Name
• Units
• Time
• Professor
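The three lists above can be sketched as linked tables. This is a minimal sketch using Python's built-in sqlite3 module, not the course's own schema: the table names, column names and sample student are assumptions, and the enrollment table stores one row per student and class (a many-to-many link) rather than the Class Number 1…4 columns on the slide.

```python
import sqlite3

# In-memory database; table and column names are illustrative, not from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE students (
        bid        TEXT PRIMARY KEY,    -- unique student ID
        first_name TEXT NOT NULL,
        last_name  TEXT NOT NULL
    );
    CREATE TABLE classes (
        class_no   INTEGER PRIMARY KEY, -- unique within the term
        class_name TEXT NOT NULL,
        units      INTEGER NOT NULL
    );
    -- One row per (student, class) pair instead of repeated class columns.
    CREATE TABLE enrollments (
        bid      TEXT    NOT NULL REFERENCES students(bid),
        class_no INTEGER NOT NULL REFERENCES classes(class_no),
        PRIMARY KEY (bid, class_no)
    );
""")

# Sample rows (made up for illustration).
conn.execute("INSERT INTO students VALUES ('1234567890', 'Ada', 'Lopez')")
conn.executemany("INSERT INTO classes VALUES (?, ?, ?)",
                 [(101, "Intro to MIS", 4), (202, "Databases", 4)])
conn.executemany("INSERT INTO enrollments VALUES (?, ?)",
                 [("1234567890", 101), ("1234567890", 202)])

# Follow the shared keys (BID and class number) to list a student's classes.
schedule = conn.execute("""
    SELECT s.last_name, c.class_name, c.units
    FROM enrollments AS e
    JOIN students AS s ON s.bid = e.bid
    JOIN classes  AS c ON c.class_no = e.class_no
    ORDER BY c.class_no
""").fetchall()
print(schedule)
```

The join works only because BID and the class number appear in more than one table; those shared keys are what "relating tables" means in practice.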

Page 4: CIS 310 Management Information Systems Database Refresher

Relational Database Management System (RDBMS)

• Software that helps you to link the tables and perform reporting and queries on the data in the database.
  – Access (CIS 101)
  – Sybase
  – Oracle
  – SQL Server (Microsoft)
  – DB2 (IBM)

Page 5: CIS 310 Management Information Systems Database Refresher

Entity Relationship Diagram (ERD)

• Design tool used to design and plan databases.

• Primary key

• Table

• Attributes

Page 6: CIS 310 Management Information Systems Database Refresher

ERDs Continued

Relationships
• One-to-one
• One-to-many
• Many-to-many

Page 7: CIS 310 Management Information Systems Database Refresher

ERD Example (engotzz.blogspot.com)

Page 8: CIS 310 Management Information Systems Database Refresher

Bookstore ERD from jdonohue.com

Page 9: CIS 310 Management Information Systems Database Refresher

End

• Is LastName a good primary key if you’re just using a small database for a class?

• Name three tables that would be in a database for pet adoption.

• What data attributes would a table called Animals have in a pet adoption database?

No. Different people can share a last name, so it is not unique, and it does not scale as the database grows.

Animals, NewOwners, ShotHistory

petID, Breed, Age, Name, KidFriendly, OtherPetFriendly, Color…etc.
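The answer to the first question can be checked directly. In this minimal sketch (the `roster` table and the names are hypothetical), the database itself rejects a second student with the same last name once LastName is declared the primary key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table that (unwisely) uses the last name as its primary key.
conn.execute("CREATE TABLE roster (last_name TEXT PRIMARY KEY, first_name TEXT)")
conn.execute("INSERT INTO roster VALUES ('Nguyen', 'Kim')")

try:
    # A second student with the same last name violates the key's uniqueness.
    conn.execute("INSERT INTO roster VALUES ('Nguyen', 'Alex')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print("second Nguyen rejected:", rejected)
```

A generated ID such as petID or BID avoids the problem because it carries no real-world meaning that two records could share.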

Page 10: CIS 310 Management Information Systems Database Refresher

CIS 310 Management Information Systems

Data Warehousing, Data Marts, Data Integrity

Page 11: CIS 310 Management Information Systems Database Refresher

Example: Rensselaer Polytechnic Institute Admissions Data Warehouse

• Goal: attract the best and brightest students, retain diversity, balance and geographic spread, and manage financial aid.

• Results
  – Invested $1.2 million. Costs $537,000 annually to operate.
  – Savings in improved data analysis: $820,000 annually
  – Savings in financial aid: $500,000 annually
  – Savings in labor for reporting: $320,000 annually

Source: Information Week, 2007
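A rough payback calculation from the figures above. One assumption is made here and is not stated on the slide: because the financial aid and reporting labor savings add up to exactly $820,000, the sketch treats $820,000 as the total annual savings rather than a third, separate item.

```python
investment = 1_200_000            # one-time cost from the slide
operating_cost = 537_000          # annual cost to operate
financial_aid_savings = 500_000   # annual
reporting_labor_savings = 320_000 # annual

# Assumption: the $820,000 "improved data analysis" figure is the sum of the
# two itemized savings, not an additional saving on top of them.
annual_savings = financial_aid_savings + reporting_labor_savings

net_annual_benefit = annual_savings - operating_cost
payback_years = investment / net_annual_benefit

print(f"Net annual benefit: ${net_annual_benefit:,}")
print(f"Payback period: {payback_years:.1f} years")
```

Under that reading the warehouse nets about $283,000 a year and pays for itself in a little over four years. If the three savings lines were instead independent, the payback would be much shorter.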

Page 12: CIS 310 Management Information Systems Database Refresher

Example 2 – Cal Poly Data Warehouse

Page 13: CIS 310 Management Information Systems Database Refresher

Data Warehouse

• Collection of data from several databases to support business analysis.
  – Aggregate lots of data
  – Internal and external sources
  – Drill down capability

• Benefit
  – Focus on managerial decision making instead of operational decision making.
  – Provide insight not available before because data was never connected before.

Page 14: CIS 310 Management Information Systems Database Refresher

[Figure from datawarehouse4u.info; recoverable label: External Data Sources]

Page 15: CIS 310 Management Information Systems Database Refresher

Data Mart: Subset of DW (Gdwsolutions.com)

Page 16: CIS 310 Management Information Systems Database Refresher

Data Cube (Multidimensional Analysis)

• Allows you to look at the data from different dimensions to perform analysis.

[Figure: data cube with three dimensions: Store (A to E), Campaign (A to C), Product (A to D)]

Page 17: CIS 310 Management Information Systems Database Refresher

Slicing and Dicing

[Figure: two copies of the data cube (Store A to E, Campaign A to C, Product A to D) illustrating slicing and dicing]
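Slicing and dicing can be sketched with a plain dictionary keyed by the cube's three dimensions. The sales numbers and function names below are hypothetical. A slice fixes one dimension to a single value, and a dice restricts several dimensions at once to carve out a smaller cube.

```python
from collections import defaultdict

# Hypothetical sales facts: (store, campaign, product) -> units sold.
cube = {
    ("Store A", "Campaign A", "Product A"): 10,
    ("Store A", "Campaign B", "Product A"): 4,
    ("Store A", "Campaign A", "Product B"): 7,
    ("Store B", "Campaign A", "Product A"): 3,
    ("Store B", "Campaign C", "Product D"): 8,
}

def slice_cube(cube, store):
    """Slice: fix one dimension (store) to a single value."""
    return {key: units for key, units in cube.items() if key[0] == store}

def dice_cube(cube, stores, campaigns, products):
    """Dice: keep a sub-cube by restricting several dimensions at once."""
    return {
        key: units
        for key, units in cube.items()
        if key[0] in stores and key[1] in campaigns and key[2] in products
    }

def roll_up(cube, axis):
    """Total units along one dimension (0=store, 1=campaign, 2=product)."""
    totals = defaultdict(int)
    for key, units in cube.items():
        totals[key[axis]] += units
    return dict(totals)

store_a = slice_cube(cube, "Store A")
sub_cube = dice_cube(cube, {"Store A", "Store B"}, {"Campaign A"}, {"Product A"})
by_campaign = roll_up(cube, axis=1)
print(store_a, sub_cube, by_campaign, sep="\n")
```

OLAP tools do the same thing over far larger, pre-aggregated data; the dictionary only illustrates the idea.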

Page 18: CIS 310 Management Information Systems Database Refresher

Questions

1. Which store is the most productive, and which products are the best sellers at that store?

2. Which clinics need the most blood, and at what time of year?

3. Which advertising campaign was most productive in which areas?

4. What elements are decisively different between my worst-performing and best-performing stores?

Page 19: CIS 310 Management Information Systems Database Refresher

Information Granularity

• Yard → foot → inch
• All sales → sales per region → sales per store
• Kids with the flu → kids with the flu by region → kids with the flu by region and age

• Granularity is how far you can dig into the detail of the data.
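The sales example above, sketched at three levels of granularity. The rows and the helper name `total_by` are made up for illustration; the point is that the same detailed data can be summarized coarsely or finely, but a coarse summary cannot be drilled back down.

```python
from collections import defaultdict

# Hypothetical sales rows at the finest grain: (region, store, amount).
sales = [
    ("West", "Store A", 120.0),
    ("West", "Store B", 80.0),
    ("East", "Store C", 200.0),
    ("East", "Store C", 50.0),
]

def total_by(rows, key_func):
    """Sum the amount column, grouped by whatever key_func returns."""
    totals = defaultdict(float)
    for row in rows:
        totals[key_func(row)] += row[2]
    return dict(totals)

all_sales = sum(amount for _, _, amount in sales)      # coarsest: one number
per_region = total_by(sales, lambda r: r[0])           # finer: one per region
per_store = total_by(sales, lambda r: (r[0], r[1]))    # finest: one per store
print(all_sales, per_region, per_store, sep="\n")
```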

Page 20: CIS 310 Management Information Systems Database Refresher

Data Integrity

• Not all data is ‘clean’. Sometimes data is erroneous or incomplete.

• You do not want to make a decision based upon bad data.

• High-quality data can lead to better, or at least more informed, decisions.

Page 21: CIS 310 Management Information Systems Database Refresher

5 Characteristics of High Quality Information

• Accuracy – the data is correct.
• Completeness – all the data needed is there.
• Consistency – data is uniform. A phone number has 10 characters, never any other length.
• Uniqueness – To have value, the data must uniquely inform the company.
• Timeliness – New and current data is better for current decision making.

Page 22: CIS 310 Management Information Systems Database Refresher

Data Scrubbing

• ‘Cleaning’ the data to get rid of incomplete, inconsistent or erroneous data.
  – Missing data or attributes
  – Redundant records
  – Missing keys
  – Erroneous records
  – Incorrect data

• Ex. How many fake accounts have you set up to try software for free?

• Ex. How many times did you sign up for something and then never return?
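A minimal scrubbing pass over the kinds of problems listed above. The record layout, field names and rules are assumptions chosen for illustration. Real scrubbing rules depend on the data set, and fake but well-formed accounts like those in the examples would pass checks this simple.

```python
# Hypothetical customer records; field names are illustrative.
records = [
    {"id": 1, "email": "ana@example.com", "zip": "98225"},
    {"id": 1, "email": "ana@example.com", "zip": "98225"},    # redundant record
    {"id": None, "email": "bo@example.com", "zip": "98226"},  # missing key
    {"id": 3, "email": "", "zip": "98229"},                   # missing attribute
    {"id": 4, "email": "cy@example.com", "zip": "9822"},      # incorrect data
    {"id": 5, "email": "di@example.com", "zip": "98248"},
]

def scrub(rows):
    """Return (clean_rows, rejected), where rejected pairs each row with a reason."""
    clean, rejected, seen_ids = [], [], set()
    for row in rows:
        if row["id"] is None:
            rejected.append((row, "missing key"))
        elif row["id"] in seen_ids:
            rejected.append((row, "redundant record"))
        elif not row["email"]:
            rejected.append((row, "missing attribute"))
        elif not (row["zip"].isdigit() and len(row["zip"]) == 5):
            rejected.append((row, "incorrect data"))
        else:
            seen_ids.add(row["id"])
            clean.append(row)
    return clean, rejected

clean, rejected = scrub(records)
print([row["id"] for row in clean])
print([reason for _, reason in rejected])
```

Keeping the rejected rows with a reason, rather than silently dropping them, lets someone review and repair them later.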

Page 23: CIS 310 Management Information Systems Database Refresher

Information Accuracy Costs $$

• The more complete and accurate the data is, the more it will cost.

• It takes resources to collect, verify and fix data.

• Another question: what is the cost of not having high-quality data?
  – Having to redo things.
  – Making bad decisions.
  – Process failure.

Page 24: CIS 310 Management Information Systems Database Refresher

Getting the Right Data In

• Online forms designed to prevent errors.
• Check for @ to confirm that an entry is an email address, or email the account before activation.
• Form fields required (*). Don’t let the user continue until they fill out everything.
• Form fields of a specific format or length. Zip code has to be 5 digits or it is rejected.
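The form rules above can be sketched as a single validation function. The field names and error messages are hypothetical. The @ check is deliberately as loose as the slide describes, which is why emailing the account before activation is the stronger test.

```python
import re

ZIP_PATTERN = re.compile(r"\d{5}")

def validate_signup(form):
    """Return a list of error messages; an empty list means the form is accepted."""
    errors = []
    for field in ("name", "email", "zip"):   # hypothetical required fields
        if not form.get(field, "").strip():
            errors.append(f"{field} is required")
    email = form.get("email", "")
    if email and "@" not in email:
        errors.append("email must contain @")
    zip_code = form.get("zip", "")
    if zip_code and not ZIP_PATTERN.fullmatch(zip_code):
        errors.append("zip must be exactly 5 digits")
    return errors

good = validate_signup({"name": "Ana", "email": "ana@example.com", "zip": "98225"})
bad = validate_signup({"name": "", "email": "ana.example.com", "zip": "9822"})
print(good)
print(bad)
```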

Page 25: CIS 310 Management Information Systems Database Refresher

End

• Is a monthly sales report an example of highly granular information?

• What is that OLAP thingy?

• Why would anyone use a data mart instead of the data warehouse?

No. It is more granular than a yearly sales report and less granular than a daily sales report.

Data cube…with the ability to slice and dice.

A data mart is a subset of a data warehouse, used for a more focused purpose. You would use the mart to make your analysis faster and maybe easier.

Page 26: CIS 310 Management Information Systems Database Refresher

Data Mining & Data Analysis

Winter, 2013

Page 27: CIS 310 Management Information Systems Database Refresher

Data Mining

Use a variety of techniques to uncover interesting things about the data:
• Cluster analysis
• Association detection
• Statistical analysis

Page 28: CIS 310 Management Information Systems Database Refresher

Structured vs. Unstructured Data

• Structured data is already in a database or spreadsheet format.
  – .accdb (or older .mdb) or .xlsx

• Unstructured data doesn’t have an organized format to it.
  – Photos
  – Music
  – PDF memos
  – Emails

Page 29: CIS 310 Management Information Systems Database Refresher

Cluster Analysis

• Grouping a set of objects in a way that clusters form around certain attributes.

• Ex. Zip code clustering can show where most sales, customers, etc. are from.

• Ex. Social media cluster analysis may predict which words are more likely to appear next to each other.
  – Music sales are directly linked to buzz on social media.
  – Mapped into a chart where clusters form and grow on different words, predicting success.
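The zip code example, in its simplest form: group sales by one attribute and see where they pile up. This is only a grouping sketch with made-up data. Full cluster analysis typically uses an algorithm such as k-means over several attributes at once, which is not shown here.

```python
from collections import Counter

# Hypothetical order log: one ZIP code per sale.
order_zips = ["98225", "98225", "98226", "98225", "98004", "98226", "98225"]

sales_by_zip = Counter(order_zips)
top_zip, top_count = sales_by_zip.most_common(1)[0]
share = top_count / len(order_zips)

print(sales_by_zip)
print(f"{top_zip} accounts for {share:.0%} of sales")
```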

Page 30: CIS 310 Management Information Systems Database Refresher

Association Detection

• Market Basket Analysis (also known as commodity bundle)
  – What is in your basket and how is it related/predictive?

• A student purchasing engineering books might also need a calculator.

• Amazon suggesting ‘customers who bought this also bought that’ may entice you to purchase more books.
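Market basket analysis usually comes down to counting how often items appear together. The sketch below uses made-up baskets and computes two common measures: support (how often a pair occurs across all baskets) and confidence (how often the second item appears given the first).

```python
from collections import Counter
from itertools import combinations

# Hypothetical baskets; item names are illustrative.
baskets = [
    {"engineering book", "calculator"},
    {"engineering book", "calculator", "notebook"},
    {"engineering book", "notebook"},
    {"novel", "notebook"},
]

item_counts = Counter()
pair_counts = Counter()
for basket in baskets:
    item_counts.update(basket)
    # Sort so each pair is always counted under the same (a, b) ordering.
    pair_counts.update(combinations(sorted(basket), 2))

def support(a, b):
    """Share of all baskets that contain both items."""
    return pair_counts[tuple(sorted((a, b)))] / len(baskets)

def confidence(a, b):
    """Of the baskets containing a, the share that also contain b."""
    return pair_counts[tuple(sorted((a, b)))] / item_counts[a]

print(support("engineering book", "calculator"))
print(confidence("engineering book", "calculator"))
print(confidence("calculator", "engineering book"))
```

Confidence is directional: in this sample every calculator buyer also bought an engineering book, but only two of the three engineering book buyers bought a calculator.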

Page 31: CIS 310 Management Information Systems Database Refresher

Statistical Analysis

• Forecasting & Time Series
  – Data can be collected at specific intervals to gain predictive insight from it.
  – Ex. stock prices, power consumption, sales over time in response to a marketing campaign.
  – Data could indicate seasonal or cyclic trends.
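One of the simplest forecasting techniques for data collected at fixed intervals is a moving average: predict the next value as the mean of the last few observations. The sales figures and function name below are hypothetical, and the method ignores the seasonal and cyclic trends mentioned above, so it is a starting point rather than a full time-series model.

```python
def moving_average_forecast(series, window):
    """Forecast the next value as the mean of the last `window` observations."""
    if window <= 0 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    recent = series[-window:]
    return sum(recent) / window

# Hypothetical monthly sales collected at a fixed interval.
monthly_sales = [100, 110, 105, 120, 130, 125]
next_month = moving_average_forecast(monthly_sales, window=3)
print(next_month)
```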

Page 32: CIS 310 Management Information Systems Database Refresher

Other Mining Opportunities

• Text Mining
  – Searching through a massive number of emails for a company.
  – Searching Twitter data.

• Web Mining
  – Look at people’s browsing and buying or navigation habits.
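A very small text-mining sketch over a handful of made-up emails: count how often each word appears and find which messages mention a term. The messages and function names are hypothetical. Real text mining adds steps such as removing common words and handling word variants, which are left out here.

```python
import re
from collections import Counter

# Hypothetical email bodies.
emails = [
    "The shipment is late again. Customer is upset about the late delivery.",
    "Thanks for the quick refund!",
    "Late delivery on order 42, please advise.",
]

def term_frequencies(documents):
    """Count lowercase words across all documents."""
    counts = Counter()
    for text in documents:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def search(documents, term):
    """Return the indexes of documents that mention the term (case-insensitive)."""
    return [i for i, text in enumerate(documents) if term.lower() in text.lower()]

frequencies = term_frequencies(emails)
late_emails = search(emails, "late")
print(frequencies["late"], late_emails)
```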

Page 33: CIS 310 Management Information Systems Database Refresher

So what is BI again?

• It is a set of processes and analysis tools used to examine data and get something great from it.

• BI and Big Data are booming. There are lots of massive data sets, and we are at the beginning of understanding how to gain insight from all that data.

Page 34: CIS 310 Management Information Systems Database Refresher

End