Upload
elijah-carr
View
221
Download
1
Tags:
Embed Size (px)
Citation preview
1
Peter Fox
Xinformatics
ITWS, ERTH, CSCI 4400/6400
Week 1, January 22, 2013
Introduction to XinformaticsCourse Scope, Assessments
Contents• Introductions
• Course Outline
• Application areas
• Logistics and resources
• Assessment and assignments
• Learning objectives, outcomes
• Introduction to Xinformatics
• Discussion, etc.
• Next class(es)2
Introductions• Name, major, year
• Interests, goals, outcomes
• Have you completed any *suggested* prerequisites:– Knowledge such as that gained in a Data Base
class (e.g., CSCI-4380)– Knowledge such as that gained in a Data
Structures class (e.g., CSCI-1200)– Knowledge such as that gained in a Data
Science class (e.g. ITEC/CSCI/ERTH 6961)
• Questions 3
Course Outline (tentative)• Introduction to Informatics• Information theory, models, tools• Foundations; semiotics, library, cognitive and social science• Capturing the problem: Use case development and
requirement analysis• State-of-the-Art, informatics applications• Information life-cycle• Information architectures (Internet, Web, Grid, Cloud) • Information Visualization, Information and Workflow
Management• Information Discovery, Information Integration• Information Audit• Information quality, and bias• Class exercises, presentations** along the way
4
Application Areas• Geoinformatics
• Astroinformatics
• Cheminformatics
• Bioinformatics
• Helioinformatics
• Healthinformatics
• Ecoinformatics
• Nursing informatics
• and the list goes on, and on5
Logistics• Class: ITWS, ERTH, CSCI 4400/6400, Hours: 9am-
11:50am Tuesdays• Location: Lally 104• Instructor: Peter Fox - [email protected], TA: Karthik
Abimanyu – [email protected] • Contact hours: Mondays 3pm-4pm (or by appt)• Contact location: Winslow 2120 or Lally 207A• *******Web:
http://tw.rpi.edu/web/Courses/Xinformatics/2013 Schedule, syllabus, reading, assignments, etc.
6
Assessment and Assignments• Via written assignments with specific percentage of
grade allocation provided with each assignment• Via individual oral presentations with specific
percentage of grade allocation provided• Via group presentations – depending on class size• Via participation in class (not to exceed 10% of
total) – this works by ‘losing’ points by not participating
• Late submission policy: first time with valid reason – no penalty, otherwise 20% of score deducted each late day
7
Assessment and Assignments• Reading assignments
– Are given almost every week– Most are background and informational– Some are key to completing assignments– Some are relevant to the current week’s class (i.e. follow
up reading)– Others are relevant to following week’s class (i.e. pre-
reading)– Undergraduates - will not be tested on but we will often
discuss these in class and participation in these is taken into account
– Graduates – are likely to be tested as part of assignments, i.e. an extra question
• You will progress from individual work to group work
8
Details on Assessment• Assignment 1: General assignment: Use case and user requirements
development: develop a use case, perform the analysis and list the users (actors) and their requirements. Due ~ week 4. 20% (written; individual);
• Assignment 2: Based on the use-case that was developed in Assignment 1, address the question of assessing information uncertainty in different aspects of the use case and determine possible ways to condition the system to reduce uncertainty in achieving the goals of the use case. Due ~ week 7. 20% (written; individual);
• Assignment 3: Analysis of cognitive, collection and social/cultural aspects of information in signs discussed and decomposed. Due ~ week 10. 20% (10% written and 10% oral; individual);
• Assignment 4: Xinformatics group project - problem description or formulation, analysis, critique of existing implementations, use case modeling, prototype implementation. Due ~ week 13. 40% (30% written, 10% oral; individual or group).
9
Objectives• To instruct future information architects how to
sustainably generate information models, designs and architectures
• To instruct future technologists how to understand and support essential data and information needs of a wide variety of producers and consumers
• For both to know tools, and requirements to properly handle data and information
• Will learn and be evaluated on the underpinnings of informatics, including theoretical methods, technologies and best practices.
10
Learning Objectives• Through class lectures, practical sessions,
written and oral presentation assignments and projects, students should:– Develop and demonstrate skill in Participation and Organization of multi-
skilled teams in the application of Informatics– Demonstrate the development of Conceptual and Information Models and
explain them to non-experts (6400 level only)– Demonstrate the application of information theory and design principles to
information systems– Evaluate, discuss and be able to apply (6400 level only) Informatics Best
Practices and Standards– Demonstrate knowledge (advanced for 6400 level) and application of
Informatics Standards– Develop and demonstrate (advanced for 6400 level) skill in Informatics Tool
Use and Evaluation 11
Academic Integrity• Student-teacher relationships are built on trust. For example, students
must trust that teachers have made appropriate decisions about the structure and content of the courses they teach, and teachers must trust that the assignments that students turn in are their own. Acts, which violate this trust, undermine the educational process. The Rensselaer Handbook of Student Rights and Responsibilities defines various forms of Academic Dishonesty and you should make yourself familiar with these. In this class, all assignments that are turned in for a grade must represent the student’s own work. In cases where help was received, or teamwork was allowed, a notation on the assignment should indicate your collaboration. Submission of any assignment that is in violation of this policy will result in a penalty. If found in violation of the academic dishonesty policy, students may be subject to two types of penalties. The instructor administers an academic (grade) penalty, and the student may also enter the Institute judicial process and be subject to such additional sanctions as: warning, probation, suspension, expulsion, and alternative actions as defined in the current Handbook of Student Rights and Responsibilities. If you have any question concerning this policy before submitting an assignment, please ask for clarification.
12
Questions so far?
13
Introduction to Informatics• E.g. Bioinformatics
– Over the past few decades, major advances in the field of molecular biology, coupled with advances in genomic technologies, have led to an explosive growth in the biological information generated by the scientific community. This deluge of genomic information has, in turn, led to an absolute requirement for computerized databases to store, organize, and index the data and for specialized tools to view and analyze the data.
– http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html
14
Tell us more…• Bioinformatics is the field of science in which biology,
computer science, and information technology merge to form a single discipline.
• The ultimate goal of the field is to enable the discovery of new biological insights as well as to create a global perspective from which unifying principles in biology can be discerned.
• At the beginning of the "genomic revolution", a bioinformatics concern was the creation and maintenance of a database to store biological information, such as nucleotide and amino acid sequences.
• Development of this type of database involved not only design issues but the development of complex interfaces whereby researchers could both access existing data as well as submit new or revised data.
15
And…• Ultimately, however, all of this information
must be combined to form a comprehensive picture of normal cellular activities so that researchers may study how these activities are altered in different disease states.
• Therefore, the field of bioinformatics has evolved such that the most pressing task now involves the analysis and interpretation of various types of data, including nucleotide and amino acid sequences, protein domains, and protein structures. 16
And…• The actual process of analyzing and interpreting
data is referred to as computational biology. Important sub-disciplines within bioinformatics and computational biology include:– the development and implementation of tools that enable
efficient access to, and use and management of, various types of information
– the development of new algorithms (mathematical formulas) and statistics with which to assess relationships among members of large data sets, such as methods to locate a gene within a sequence, predict protein structure and/or function, and cluster protein sequences into families of related sequences
17
One result – myexperiment.org
18
Definitions
• Data - are pieces of information that represent the qualitative or quantitative attributes of a variable or set of variables.
• Data (plural of "datum", which is seldom used) - are typically the results of measurements and can be the basis of graphs, images, or observations of a set of variables.
• Data - are often viewed as the lowest level of abstraction from which information and knowledge are derived 19
Definitions ctd.• Information
– Representations (of facts? data?) in a form that lends itself to human use
– The word information derives from the Latin informare (in+formare) meaning to give form, shape, or character to. It is therefore to be the formative principle of, or to imbue with some specific character or quality.
• Knowledge– Check out Wikipedia…. meaning
20
Definitions ctd.
• Metadata – data about data
• Metainformation – information about information
• Documentation – integrated collection of information and metadata intended to support all aspects of data (find, access, use…)
21
Could it really be this linear?
Data-Information-Knowledge Ecosystem
23
Data Information Knowledge
Producers Consumers
Context
PresentationOrganization
IntegrationConversation
CreationGathering
Experience
TWC Curriculum tw.rpi.edu/web/Courses
24
Data Information Knowledge
Producers Consumers
Context
PresentationOrganization
IntegrationConversation
CreationGathering
Experience
Data Science Xinformatics
Web Science
Semantic eScience
25
The Information Era: Interoperability
• managing and accessing large data sets• higher space/time resolution capabilities • rapid response requirements• data assimilation into models• crossing disciplinary boundaries.
Modern information and communications technologies are creating an “interoperable” information era in which ready access to data and information can be truly universal.
Open access to data and services enables us to meet the new challenges of understand complex systems:
Fox CI and X-informatics - CSIG 2008, Aug 11
26
Shifting the Burden from the Userto the Provider
Other forms of information
27
Information explosion• Devices are everywhere, but … by 2020
28
And, gulp, unstructured
29
The key is:• As volume, complexity and heterogeneity
increase…– Suddenly information may look more like a
continuum– All known methods, algorithms will not scale
(except for very simple operations)– And because it is information, humans are part of
the loop
• Thus – we need to understand and apply the theoretical foundations
• Problem: all to date are developed in an analog world, not a digital one!!
30
31
Mind the gap• As capabilities and needs grow on both
sides: science/ medicine/ engineering – and
technology:
• There is/ was still a gap between science
and the underlying infrastructure and
technology that is available
• Cyberinfrastructure is the new research environment(s) that support advanced data acquisition, data storage, data management, data integration, data mining, data visualization and other computing and information processing services over the Internet.
Informatics - information science includes the
science of (data and) information, the practice
of information processing, and the engineering
of information systems. Informatics studies the
structure, behavior, and interactions of natural
and artificial systems that store, process and
communicate (data and) information. It also
develops its own conceptual and theoretical
foundations. Since computers, individuals and
organizations all process information,
informatics has computational, cognitive and
social aspects, including study of the social
impact of information technologies. Wikipedia.
32
But really it’s not just one field
IT Cyber
Infrastructure (CI)
Cyber Informatics
Core Informatics
Science Informatics
Science, Benefit to others
Informatics
•CI = Discipline neutral, e.g. web server, database, wiki
•Cyberinformatics = mapping to discipline neutral aspects
•Core informatics = Reasoning engine, semantics, computer science
•Science (X) informatics = Use cases, science domain terms, concepts in an ontology or controlled vocabulary
A moment of history
• In the late 1950’s (actually around 1957-1958 or 1962 depending on what you read) the modern informatics term was coined
• Existed for a while but then split into library science and computer science and developed their own fields, became disconnected
• Now coming back to be relevant to science+…• Informatics IS NOT just having a someone
work with an “IT/ICT” person (NOT, NOT, NOT) 33
Cyberinformatics
• The first match between the domain and the underlying domain-neutral e-infrastructure/ cyberinfrastructure
• When the underlying infrastructure (when it becomes real infrastructure and not just software) changes this is one part that needs to change
• Less brittle since upper layers remain intact
34
Core informatics
• The realm of computer science (for the most part, also librarians)
• Strongly influenced by science (and engineering and medical applications) above and below this layer
• If we can leverage this, we do not need to do the specialist work, however …
• We must work with these scientists, sustainably 35
Science Informatics
• Where science meets the underlying technical capabilities and methods
• Must be expressible in science terms; increasingly use cases
• The people in this area are multi-lingual and both interdisciplinary and multi-disciplinary, few are trained or literate here ******
• Team, or really a community of practice (CoP) 36
BORROMEAN RINGSBORROMEAN RINGSThree interlinked circles that Three interlinked circles that represent inseparable parts of represent inseparable parts of the whole. Remove any one the whole. Remove any one ring and the other two fall ring and the other two fall apart. Because of this apart. Because of this property, Borromean Rings property, Borromean Rings have been used as a symbol have been used as a symbol of unity in many fields.of unity in many fields.
THE PHYSICS OF INFORMATIONTHE PHYSICS OF INFORMATION
•Information has three indivisible ingredients – Information has three indivisible ingredients – content,content, context context and and structure. structure. •The ability to automatically utilize the inherent The ability to automatically utilize the inherent structure of information is the threshold in information structure of information is the threshold in information management from hardcopy to digital media. management from hardcopy to digital media.
© 2005 EvREsearch LTD© 2005 EvREsearch LTD
EvREsearch©EvREsearch©
Not a perfect story• Many authors criticize the use of the term entropy,
and physics of information
• Information conservation, diffusion, viscosity, advection, dissipation… sort of all make some sense
• Units are a big part of it (question: what are the possible units?) and what are the non-dimensional numbers?
• However the idea is very relevant to modeling, design and architecture
• We’ll revisit the components of the physics of information 38
Information theory
• Semiotics, also called semiotic studies or semiology, is the study of sign processes (semiosis), or signification and communication, signs and symbols, into three branches:– Syntactics: Relation of signs to each other in
formal structures– Semantics: Relation between signs and the
things to which they refer; their denotata– Pragmatics: Relation of signs to their impacts on
those who use them
39
Library science• Curates the artifacts of knowledge but
increasingly: (yes) information
• Organizes and manages them for consumers– Cataloging and classification
• Preservation– ‘maintaining or restoring access to artifacts,
documents and records through the study, diagnosis, treatment and prevention of decay and damage’ (wikipedia)
• In our digital age– Curation and preservation
40
CLAY
PAPYRUS
PAPER
DIGITAL
TIME (years before present)
INF
OR
MA
TIO
N T
RA
NS
PO
RT
INF
OR
MA
TIO
N I
NT
EG
RA
TIO
N
INFORMATION VOLUME
STONE
HISTORY OF INFORMATION THRESHOLDSHISTORY OF INFORMATION THRESHOLDS
INFORMATION ERASINFORMATION ERAS
© 2005 EvREsearch LTD© 2005 EvREsearch LTD
FUTURE0100020003000400050006000
Social Science• Branch of humanities
• Especially as it relates to networks of scientists
• Exploits sociology of groups, teams
• Cultural norms as well as discipline norms– Modes of what and how rewards are given– Between those who produce and those who
consume data and information– How you collect, understand, model and design
models and architectures is as much social as technical skill
42
Cognitive Science• Cognitive science is an interdisciplinary study of
the mind and intelligence• It operates at the intersection of psychology,
philosophy, computer science, linguistics, anthropology, and neuroscience.
• Of relevance for data and information science are three significant theoretical underpinnings– mental representation,– the nature of expertise, – and intuition
• Very relevant to models, modeling, metamodel choice 43
Computer Science• Data Structures
• Knowledge Representation
• Algorithms!!
• More…
44
Use Case• … is a collection of possible sequences of
interactions between the system under discussion and its actors, relating to a particular goal. – The collection should define all system behavior relevant to
the actors to assure them goals will be carried out properly. – Any system behavior that is irrelevant to the actors should
not be included in the use cases.• is a prose description of a system's behavior when interacting with
the outside world.• is a technique for capturing functional requirements of business
systems and, potentially, of an IT system to support the business system.
• can also capture non-functional requirements
Preview of Information Models• Conceptual models, sometimes called domain
models, are typically used to explore domain concepts
• High-level conceptual models are often created as part of initial requirements envisioning efforts as they are used to explore the high-level static business or science or medicine structures and concepts.
• Conceptual models are often created as the precursor to logical models or as alternatives to them
• Followed by logical and physical models 46
Object models• An object model is a logic organization of the
real world objects (entities), constraints on them, and the relationships among objects. E.g.– A database (DB) language is a concrete syntax
for an object (data) model. – A DB system implements that model.
47
Architectures• Building on content, context,
and users, some illustrate information architecture as an iceberg.
• Just like an iceberg, the majority of information architecture work is out of sight, "below the water."
• The work includes the creation of plans, controlled-vocabularies, and blueprints all before any user interfaces are created.
48
Above the water and below• Design, design, design
• Of the interfaces, architecture, of the social, cognitive, etc. elements of information ‘systems’
• Almost all are design to support two basic modes of investigation: – induction and deduction… but enough of that for
now
49
50
Information life-cycle (trivial)
51
Visualization
52
Workflow Management
53
Discovery, Integration• Discovery (mostly about libraries!)
– Digital Fluencies– Federated Search– Folksonomies– Information Literacy– Intelligent Agents– Search Engines– Taxonomies
• Integration (mostly about application tools)
54
Information Audit• Analysis and evaluation of a
organizations's information system (whether manual or computerized) to detect and rectify blockages, duplication, and leakage of information.
• The objectives of this audit are to improve accuracy, relevance, security, and timeliness of the recorded information.
55
Information Quality, and Bias• Quality is perceived differently by information
providers and information recipients
• There are many different qualitative and quantitative aspects of quality
• Methodologies for dealing with qualities are just emerging
• Little resources are allocated to quality
• Each organization handles quality differently
• Bias detection is often an expert skill, and bias correction even more-so 56
X=discipline (not eXtreme, sorry)
• But really– Core elements that apply across many
disciplines
57
Discussion• About informatics?
• Definitions?
• Applications?
• Components?
• Theory (we’ll start on this soon)
58
Skills needed• Modeling, theory, architecture experience?
– Nah, we’ll cover that
• Literacy with computers and applications that can handle information– Yep
• Ability to access internet and retrieve/ acquire data– Oh yea
• Presentation of assignments– Ditto 59
What is expected• Attend class (ON TIME), complete
assignments (esp. reading, be prepared to give comments when asked in subsequent classes)
• Participate (e.g. reading)
• Ask questions
• Work both individually and in a group
• Work constructively in group and class sessions
• Next classes Jan 29 and Feb 5 … 60
Also on the web• Reading assignments – are intended to
prepare you for following lectures and may be considered materials for written assignments or project
• Assignments will be posted on the Web– Individual– Group
• Karthik is your first contact for assignment questions
61
What is next• Next week – Information Systems Theory
• Reading for this week - Xinformatics Applications - State of the Art– Clinical Informatics: http://www.himss.org/ASP/topics_clinicalInformatics.asp– Clinical Informatics, Medical Device Connectivity:
http://www.himss.org/CI_Insights/HIMSSClinicalInformaticsInsights.asp?date=20091012]
– Urban Informatics: emerging-urban-informatics/– Geo Informatics: http://www.geoinformatics.com/blog/online-articles/spatial-
developments-in-3d-4d-and-5d– Astro Informatics: http://arxiv.org/abs/0909.3892,
http://www.practicalastroinformatics.org/conferences/aas10– Bio Informatics: http://www.wired.com/wiredscience/2008/07/researchers-tra/– Biomedical informatics for proteomics
http://www.nature.com/nature/journal/v422/n6928/full/nature01515.html?free=2
62