List of Suggested Reviewers or Reviewers Not To Include (optional)
SUGGESTED REVIEWERS: Not Listed
REVIEWERS NOT TO INCLUDE: Not Listed
Pursuant to PAPPG Chapter II.C.1.e., each PI, co-PI, and other senior project personnel identified on a proposal must provide collaborator and other affiliations information to help NSF identify appropriate reviewers. (v. 4/21/2017)

There are five tables: A: Your Name & Affiliation(s); B: PhD Advisors/Advisees (all); C: Collaborators; D: Co-Editors; E: Relationships.

Please complete this template (e.g., in Excel, Google Sheets, or LibreOffice), save it as .xlsx or .xls, and upload it directly as a FastLane Collaborators and Other Affiliations single-copy document. Do not upload a .pdf. Fixed column widths keep this sheet one page wide; if you cut and paste text, set the font size to 10 pt or smaller and abbreviate where necessary to make the data fit. To insert n blank rows, select n row numbers to move down, right-click, and choose Insert from the menu. You may fill down (Ctrl-D) to mark a sequence of collaborators or to copy affiliations. Excel has arrows that enable sorting. "Last active" dates are optional, but they help NSF staff determine which information remains relevant for reviewer selection.

Table A: List your Last Name, First Name, Middle Initial, and organizational affiliation(s), including considered affiliations, in the last 12 months.
A: Fragkiadaki, Katerina (Carnegie Mellon University)

Table B: List names as Last Name, First Name, Middle Initial, and provide organizational affiliations, if known, for the following: G: your PhD advisor(s); T: all your PhD thesis advisees; P: your graduate advisors. Email and department may be provided optionally to disambiguate common names.
G: Shi, Jianbo (University of Pennsylvania)
G: Malik, Jitendra (University of California, Berkeley)
T: Harley, Adam (Carnegie Mellon University)
T: Tung, Hsiao-Yu Fish (Carnegie Mellon University)

Table C: List names as Last Name, First Name, Middle Initial, and provide organizational affiliations, if known, for the following: A: co-authors on any book, article, report, abstract, or paper (collaboration in the last 48 months; the publication date may be later); C: collaborators on projects, such as funded grants, graduate research, or others (in the last 48 months). Email, organization, department, and "last active" dates may be provided optionally to disambiguate common names.
C: Agarwal, Arpit (Carnegie Mellon University)
A: Agrawal, Pulkit (University of California, Berkeley)
C: Alemi, Alex (Google Research)
A: Arbelaez, Pablo (Universidad de los Andes, Colombia)
C: Atkeson, Chris (Carnegie Mellon University)
A: Carreira, Joao (DeepMind)
A: Efros, Alexei (University of California, Berkeley)
A: Felsen, Panna (University of California, Berkeley)
A: Girshick, Ross (Facebook)
A: Gkioxari, Georgia (Facebook)
A: Gupta, Saurabh (Facebook)
A: Hariharan, Bharath (Facebook)
C: Huang, Henry (Carnegie Mellon University)
C: Huang, Jonathan (Google Research)
A: Kar, Abhishek (University of California, Berkeley)
A: Levine, Sergey (University of California, Berkeley)
C: Ricco, Susanna (Google Research)
C: Salakhutdinov, Ruslan (Carnegie Mellon University)
C: Schmid, Cordelia (INRIA Grenoble Rhône-Alpes)
C: Sukthankar, Rahul (Google Research)
A: Tulsiani, Shubham (University of California, Berkeley)
C: Vijayanarasimhan, Sudheen (Google Research)

Table D: List editorial board, editor-in-chief, and co-editors with whom you interact; an editor-in-chief should list the entire editorial board. B: editorial board: name(s) of the editor-in-chief and journal (in the past 24 months); E: other co-editors of journals or collections with whom you directly interacted (in the past 24 months). Organizational affiliation, journal/collection, and "last active" dates may be provided optionally to disambiguate common names. No entries were listed.

Table E: List persons (R) for whom a personal, family, or business relationship would otherwise preclude their service as a reviewer, and any additional names for whom some relationship would otherwise preclude their service as a reviewer. No entries were listed.
COVER SHEET FOR PROPOSAL TO THE NATIONAL SCIENCE FOUNDATION
FOR NSF USE ONLY
NSF PROPOSAL NUMBER
DATE RECEIVED NUMBER OF COPIES DIVISION ASSIGNED FUND CODE DUNS# (Data Universal Numbering System) FILE LOCATION
FOR CONSIDERATION BY NSF ORGANIZATION UNIT(S) (Indicate the most specific unit known, i.e. program, division, etc.)
PROGRAM ANNOUNCEMENT/SOLICITATION NO./DUE DATE Special Exception to Deadline Date Policy
EMPLOYER IDENTIFICATION NUMBER (EIN) OR TAXPAYER IDENTIFICATION NUMBER (TIN)
SHOW PREVIOUS AWARD NO. IF THIS IS: A RENEWAL / AN ACCOMPLISHMENT-BASED RENEWAL
IS THIS PROPOSAL BEING SUBMITTED TO ANOTHER FEDERAL AGENCY? YES / NO. IF YES, LIST ACRONYM(S)
NAME OF ORGANIZATION TO WHICH AWARD SHOULD BE MADE ADDRESS OF AWARDEE ORGANIZATION, INCLUDING 9 DIGIT ZIP CODE
AWARDEE ORGANIZATION CODE (IF KNOWN)
IS AWARDEE ORGANIZATION (Check All That Apply): SMALL BUSINESS / MINORITY BUSINESS / FOR-PROFIT ORGANIZATION / WOMAN-OWNED BUSINESS
IF THIS IS A PRELIMINARY PROPOSAL THEN CHECK HERE
NAME OF PRIMARY PLACE OF PERFORMANCE / ADDRESS OF PRIMARY PLACE OF PERFORMANCE, INCLUDING 9 DIGIT ZIP CODE
TITLE OF PROPOSED PROJECT
REQUESTED AMOUNT
$
PROPOSED DURATION (1-60 MONTHS)
months
REQUESTED STARTING DATE / SHOW RELATED PRELIMINARY PROPOSAL NO. IF APPLICABLE
THIS PROPOSAL INCLUDES ANY OF THE ITEMS LISTED BELOW:
BEGINNING INVESTIGATOR
DISCLOSURE OF LOBBYING ACTIVITIES
PROPRIETARY & PRIVILEGED INFORMATION
HISTORIC PLACES
COLLABORATIVE STATUS
VERTEBRATE ANIMALS / IACUC App. Date / PHS Animal Welfare Assurance Number
HUMAN SUBJECTS Human Subjects Assurance Number
Exemption Subsection or IRB App. Date
INTERNATIONAL ACTIVITIES: COUNTRY/COUNTRIES INVOLVED
TYPE OF PROPOSAL
PI/PD DEPARTMENT PI/PD POSTAL ADDRESS
PI/PD FAX NUMBER
NAMES (TYPED) High Degree Yr of Degree Telephone Number Email Address
PI/PD NAME
CO-PI/PD
CO-PI/PD
CO-PI/PD
CO-PI/PD
Page 1 of 3
CNS - MAJOR RESEARCH INSTRUMENTATION (continued)
NSF 18-513
250969449
Carnegie-Mellon University
0001057000
5000 Forbes Avenue, WQED Building, Pittsburgh, PA 15213-3815
Carnegie-Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213-3890, US
MRI: Development of a Mobile Human Behavior Capture System
1,866,666 (requested amount, $); 48 (proposed duration, months); 09/01/18 (requested starting date)
Robotics Institute & HCI Institute
412-268-6436
5000 Forbes Avenue
Pittsburgh, PA 15213United States
Christopher Atkeson PhD 1986 412-681-8354 [email protected]
Katerina Fragkiadaki DPhil 2013 412-268-9527 [email protected]
Jessica Hodgins PhD 1989 412-268-6795 [email protected]
Yaser Sheikh PhD 2006 412-268-1138 [email protected]
052184116
Not a collaborative proposal; Type of Proposal: Equipment
CERTIFICATION PAGE
Certification for Authorized Organizational Representative (or Equivalent) or Individual Applicant
By electronically signing and submitting this proposal, the Authorized Organizational Representative (AOR) or Individual Applicant is: (1) certifying that statements made herein are true and complete to the best of his/her knowledge; and (2) agreeing to accept the obligation to comply with NSF award terms and conditions if an award is made as a result of this application. Further, the applicant is hereby providing certifications regarding conflict of interest (when applicable), drug-free workplace, debarment and suspension, lobbying activities (see below), nondiscrimination, flood hazard insurance (when applicable), responsible conduct of research, organizational support, Federal tax obligations, unpaid Federal tax liability, and criminal convictions as set forth in the NSF Proposal & Award Policies & Procedures Guide (PAPPG). Willful provision of false information in this application and its supporting documents or in reports required under an ensuing award is a criminal offense (U.S. Code, Title 18, Section 1001).
Certification Regarding Conflict of Interest
The AOR is required to complete certifications stating that the organization has implemented and is enforcing a written policy on conflicts of interest (COI), consistent with the provisions of PAPPG Chapter IX.A.; that, to the best of his/her knowledge, all financial disclosures required by the conflict of interest policy were made; and that conflicts of interest, if any, were, or prior to the organization's expenditure of any funds under the award, will be, satisfactorily managed, reduced or eliminated in accordance with the organization's conflict of interest policy. Conflicts that cannot be satisfactorily managed, reduced or eliminated and research that proceeds without the imposition of conditions or restrictions when a conflict of interest exists, must be disclosed to NSF via use of the Notifications and Requests Module in FastLane.
Drug Free Work Place Certification
By electronically signing the Certification Pages, the Authorized Organizational Representative (or equivalent), is providing the Drug Free Work Place Certification contained in Exhibit II-3 of the Proposal & Award Policies & Procedures Guide.
Debarment and Suspension Certification (If answer "yes", please provide explanation.)
Is the organization or its principals presently debarred, suspended, proposed for debarment, declared ineligible, or voluntarily excluded from covered transactions by any Federal department or agency? Yes No
By electronically signing the Certification Pages, the Authorized Organizational Representative (or equivalent) or Individual Applicant is providing the Debarment and Suspension Certification contained in Exhibit II-4 of the Proposal & Award Policies & Procedures Guide.
Certification Regarding Lobbying
This certification is required for an award of a Federal contract, grant, or cooperative agreement exceeding $100,000 and for an award of a Federal loan or a commitment providing for the United States to insure or guarantee a loan exceeding $150,000.
Certification for Contracts, Grants, Loans and Cooperative Agreements
The undersigned certifies, to the best of his or her knowledge and belief, that:
(1) No Federal appropriated funds have been paid or will be paid, by or on behalf of the undersigned, to any person for influencing or attempting to influence an officer or employee of any agency, a Member of Congress, an officer or employee of Congress, or an employee of a Member of Congress in connection with the awarding of any Federal contract, the making of any Federal grant, the making of any Federal loan, the entering into of any cooperative agreement, and the extension, continuation, renewal, amendment, or modification of any Federal contract, grant, loan, or cooperative agreement.
(2) If any funds other than Federal appropriated funds have been paid or will be paid to any person for influencing or attempting to influence an officer or employee of any agency, a Member of Congress, an officer or employee of Congress, or an employee of a Member of Congress in connection with this Federal contract, grant, loan, or cooperative agreement, the undersigned shall complete and submit Standard Form-LLL, "Disclosure of Lobbying Activities," in accordance with its instructions.
(3) The undersigned shall require that the language of this certification be included in the award documents for all subawards at all tiers including subcontracts, subgrants, and contracts under grants, loans, and cooperative agreements and that all subrecipients shall certify and disclose accordingly.
This certification is a material representation of fact upon which reliance was placed when this transaction was made or entered into. Submission of this certification is a prerequisite for making or entering into this transaction imposed by section 1352, Title 31, U.S. Code. Any person who fails to file the required certification shall be subject to a civil penalty of not less than $10,000 and not more than $100,000 for each such failure.
Certification Regarding Nondiscrimination
By electronically signing the Certification Pages, the Authorized Organizational Representative (or equivalent) is providing the Certification Regarding Nondiscrimination contained in Exhibit II-6 of the Proposal & Award Policies & Procedures Guide.
Certification Regarding Flood Hazard Insurance
Two sections of the National Flood Insurance Act of 1968 (42 USC §4012a and §4106) bar Federal agencies from giving financial assistance for acquisition or construction purposes in any area identified by the Federal Emergency Management Agency (FEMA) as having special flood hazards unless the: (1) community in which that area is located participates in the national flood insurance program; and (2) building (and any related equipment) is covered by adequate flood insurance.
By electronically signing the Certification Pages, the Authorized Organizational Representative (or equivalent) or Individual Applicant located in FEMA-designated special flood hazard areas is certifying that adequate flood insurance has been or will be obtained in the following situations: (1) for NSF grants for the construction of a building or facility, regardless of the dollar amount of the grant; and (2) for other NSF grants when more than $25,000 has been budgeted in the proposal for repair, alteration or improvement (construction) of a building or facility.
Certification Regarding Responsible Conduct of Research (RCR) (This certification is not applicable to proposals for conferences, symposia, and workshops.)
By electronically signing the Certification Pages, the Authorized Organizational Representative is certifying that, in accordance with the NSF Proposal & Award Policies & Procedures Guide, Chapter IX.B., the institution has a plan in place to provide appropriate training and oversight in the responsible and ethical conduct of research to undergraduates, graduate students and postdoctoral researchers who will be supported by NSF to conduct research. The AOR shall require that the language of this certification be included in any award documents for all subawards at all tiers.
Page 2 of 3
CERTIFICATION PAGE - CONTINUED
Certification Regarding Organizational Support
By electronically signing the Certification Pages, the Authorized Organizational Representative (or equivalent) is certifying that there is organizational support for the proposal as required by Section 526 of the America COMPETES Reauthorization Act of 2010. This support extends to the portion of the proposal developed to satisfy the Broader Impacts Review Criterion as well as the Intellectual Merit Review Criterion, and any additional review criteria specified in the solicitation. Organizational support will be made available, as described in the proposal, in order to address the broader impacts and intellectual merit activities to be undertaken.
Certification Regarding Federal Tax Obligations
When the proposal exceeds $5,000,000, the Authorized Organizational Representative (or equivalent) is required to complete the following certification regarding Federal tax obligations. By electronically signing the Certification pages, the Authorized Organizational Representative is certifying that, to the best of their knowledge and belief, the proposing organization: (1) has filed all Federal tax returns required during the three years preceding this certification; (2) has not been convicted of a criminal offense under the Internal Revenue Code of 1986; and (3) has not, more than 90 days prior to this certification, been notified of any unpaid Federal tax assessment for which the liability remains unsatisfied, unless the assessment is the subject of an installment agreement or offer in compromise that has been approved by the Internal Revenue Service and is not in default, or the assessment is the subject of a non-frivolous administrative or judicial proceeding.
Certification Regarding Unpaid Federal Tax Liability
When the proposing organization is a corporation, the Authorized Organizational Representative (or equivalent) is required to complete the following certification regarding Federal Tax Liability: By electronically signing the Certification Pages, the Authorized Organizational Representative (or equivalent) is certifying that the corporation has no unpaid Federal tax liability that has been assessed, for which all judicial and administrative remedies have been exhausted or lapsed, and that is not being paid in a timely manner pursuant to an agreement with the authority responsible for collecting the tax liability.
Certification Regarding Criminal Convictions
When the proposing organization is a corporation, the Authorized Organizational Representative (or equivalent) is required to complete the following certification regarding Criminal Convictions: By electronically signing the Certification Pages, the Authorized Organizational Representative (or equivalent) is certifying that the corporation has not been convicted of a felony criminal violation under any Federal law within the 24 months preceding the date on which the certification is signed.
Certification Regarding Dual Use Research of Concern
By electronically signing the certification pages, the Authorized Organizational Representative is certifying that the organization will be or is in compliance with all aspects of the United States Government Policy for Institutional Oversight of Life Sciences Dual Use Research of Concern.
AUTHORIZED ORGANIZATIONAL REPRESENTATIVE SIGNATURE DATE
NAME
TELEPHONE NUMBER EMAIL ADDRESS FAX NUMBER
fm1207rrs-07
Page 3 of 3
COVER SHEET FOR PROPOSAL TO THE NATIONAL SCIENCE FOUNDATION
FOR CONSIDERATION BY NSF ORGANIZATION UNIT(S) - continued from page 1 (Indicate the most specific unit known, i.e., program, division, etc.)
Continuation Page
IIS - ROBUST INTELLIGENCE
TABLE OF CONTENTS
For font size and page formatting specifications, see PAPPG section II.B.2.
Total No. of Pages | Page No.* (Optional)*
Cover Sheet for Proposal to the National Science Foundation
Project Summary (not to exceed 1 page)
Table of Contents
Project Description (Including Results from Prior NSF Support) (not to exceed 15 pages) (Exceed only if allowed by a specific program announcement/solicitation or if approved in advance by the appropriate NSF Assistant Director or designee)
References Cited
Biographical Sketches (Not to exceed 2 pages each)
Budget (Plus up to 3 pages of budget justification)
Current and Pending Support
Facilities, Equipment and Other Resources
Special Information/Supplementary Documents (Data Management Plan, Mentoring Plan and Other Supplementary Documents)
Appendix (List below.)
(Include only if allowed by a specific program announcement/solicitation or if approved in advance by the appropriate NSF Assistant Director or designee)
Appendix Items:
*Proposers may select any numbering mechanism for the proposal. The entire proposal, however, must be paginated. Complete both columns only if the proposal is numbered consecutively.
(Page counts from the original form, in order: 1, 17, 5, 7, 5, 5, 0)
Figure 1: Mobile behavior capture for animals. (From National Geographic)
MRI: Development of a Mobile Human Behavior Capture System
Instrument Location: Carnegie Mellon University, as well as deployments in the field, such as in subjects' homes. System development will take place in CMU's Motion Capture Lab and in various mechanical and electronics fabrication areas in CMU's Robotics Institute.
Instrument Type: A behavior capture system, similar to a motion capture system but more general.
Research Activities to be Enabled
We propose building a ground-breaking Mobile Behavior Capture system that goes far beyond current labo-
ratory markerless motion capture systems used in research and movie production. We will build on our success
with CMU’s Motion Capture Lab and Panoptic Studio (a markerless motion capture facility, Figures 3 and 4).
This system will be transformative by offering a combination of new capabilities: capturing behavior in natural environments outside the laboratory by being portable; tracking moving behavior with multiple system-wide "foveas"; enabling interactive behavioral experiments involving humans, robots, and human-robot interaction; performing markerless capture as well as tracking markers; spatial and temporal scalability, so that fine as well as gross behaviors, and quick as well as long-duration behaviors, can be captured; and multi-modal capture of coordinated motion, sound, contact, interaction force, and physiological measurements. Our system will support research
on human behavior and human-technology interaction, diagnosis and therapy for people with disabilities, and
robot programming and learning techniques. This is exciting because it allows us to build more realistic models
of human behavior, perform more sophisticated robot experiments, and make a real difference in people’s lives
through better rehabilitation and therapy in place.
Why get out of the lab? Behavior of animals in a zoo is quite different from behavior of animals in the wild,
and robots are being used to extend behavior capture to natural environments (Figure 1). The same is true of
humans: behavior in a lab or motion capture studio is often unnatural. Behavior is shaped by its context. For
example, in order to assist older adults to live independently for as long as possible, we need to understand how
specific individuals behave in their own homes. Figure 2 shows the Aware Home, a house we built to capture
behavior. Unfortunately, it became clear that subjects still treated the house as a laboratory and not as their own
home.
Why measure mobile behavior? Much behavior, especially social behavior, is expressed while on the move,
such as behavior on sidewalks, hallways, stairs, elevators, and outdoors in general, so mobile measurement
Figure 2: Left: The Aware Home: a house we built for behavior capture. Right: Social interactions measured
by our CareMedia system in a hallway of an Alzheimer’s care facility.
Figure 3: Left and Middle: Capturing deformation typically involves many markers and cumbersome camera arrangements. Right: The CMU Panoptic Studio, a video-based capture area within a 6m diameter dome.
is also important (Figure 2). Instrumenting a large volume leads to poor spatial resolution, unless one has a
Hollywood movie-sized budget.
Why interactive? Most motion measurement systems are not real time, and results are typically available the
next day or week. We will build a system that can track selected quantities in real time, such as the motion of a
hand, or behavior transitions. The real time portion of the system will support interactive behavioral experiments
with humans, as well as experiments involving real time control of robots.
Why markerless? Soft materials such as human skin (Figure 3), liquids, and granular materials pose chal-
lenges to measurement systems that rely on markers [30]. Imagine putting markers on vegetables being cut up
during food preparation, or on Alzheimer’s patients to assess their needs for assistance in dressing, feeding, or
cleaning themselves. We have found that such patients immediately focus on and pick at the markers.
Why scalable? Currently, we cannot get high resolution images of details like facial expressions or finger
movements if we allow subjects to move around significantly. We need to pre-plan where the high resolution
measurements should be, instead of following the subjects. Handling quick captures is relatively easy, in that
we can start capture well before an event of interest and stop capture well after it. Capturing long-duration behaviors, and capturing continuously 24/7, creates a flood of data. We can compress the data in real time using standards like H.264, but to reduce the flood further we need compression and data forgetting that are tuned to the experiment being performed or the hypothesis being tested.
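To make the idea of experiment-tuned data forgetting concrete, the sketch below (our own simplified illustration; the event detector, window length, and subsampling rate are hypothetical parameters rather than parts of an existing system) keeps full-rate data around frames flagged by an experiment-specific detector and retains only a sparse subsample elsewhere:

```python
import numpy as np

def thin_capture(frames, event_score, threshold=0.8, keep_window=30, keep_every=100):
    """Keep all frames near detected events; elsewhere keep only 1 frame in `keep_every`.

    frames      : list of per-frame records (images, sensor packets, ...)
    event_score : per-frame score from an experiment-specific detector (0..1)
    """
    n = len(frames)
    keep = np.zeros(n, dtype=bool)
    keep[::keep_every] = True                                # sparse background sample
    for i in np.flatnonzero(np.asarray(event_score) > threshold):
        lo, hi = max(0, i - keep_window), min(n, i + keep_window + 1)
        keep[lo:hi] = True                                   # full rate around each event
    return [f for f, k in zip(frames, keep) if k]

# Toy example: 10,000 frames with one synthetic "event" around frame 5,000.
frames = list(range(10_000))
score = np.zeros(10_000)
score[4_990:5_010] = 1.0
kept = thin_capture(frames, score)
print(f"kept {len(kept)} of {len(frames)} frames")
```

In practice the per-frame score would come from whatever the hypothesis under test cares about, for example falls, tremor episodes, or moments of social contact.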
Why multi-modal? Human manipulation behavior and also social interaction often involves touching and
forces. We need ways to measure contact and forces, especially in situations where we can’t easily use kinematic
measurements to estimate them. A more complete measurement and understanding of physiological variables
helps us better understand human motion, and also other factors such as human emotion.
What is captured? Our system will include measurement-at-a-distance components such as cameras, ther-
mal imaging, microphone arrays, and radar imaging and velocity measurement, contact measurement compo-
nents such as traditional strain gage instrumentation and resistive and capacitive touch and force sensors as well
as optically measured deformation of elastic materials in contact [29], and physiological measurement compo-
nents that range from current worn devices such as Fitbits and heart monitors to electromyographic sensing and
ultrasound and radar imaging of internal tissues such as muscles and bones [7, 23]. Novel challenges include in-
tegrating this wide range of multi-modal sensors, capturing behavior involving soft deformable objects, liquids,
and granular materials (such as salt, sugar, and flour used in cooking), tracking internal tissue movement using
ultrasound and radar, creating a system that is easy to deploy, calibrate, use, and maintain, and that provides
results quickly and conveniently, and, most importantly, is accepted or ignored by subjects.
Who will use the system? A good predictor of who will use the proposed Mobile Behavior Capture system is who already uses CMU's Motion Capture Lab and Panoptic Studio. Research, courses, and independent
student projects from the Robotics Institute (robot and drone testing, robot programming studies such as robots
learning from imitation or demonstration, human-robot interaction studies, human motion capture for animation,
Figure 4: Tracking humans in the Panoptic Studio.
computer vision research and obtaining ground truth measurements, humanoid robot real time control studies
as well as ground truth measurements, and human behavior research), Drama (research and teaching how to act
for and use motion capture), Art (research and teaching animation), the Entertainment Technology Center (ETC)
(research and teaching how to use motion capture), and Biomedical Engineering (research on disease, therapies,
and medical devices) use the Motion Capture Lab and Panoptic Studio. Disney Research Pittsburgh and Boston
Dynamics are two companies that made use of the Motion Capture Lab recently. Perhaps the biggest users so
far are from the computer graphics, animation, and vision communities worldwide. Data made available on the
web has been acknowledged in several hundred papers.
Where has funding come from? To predict future funding, we will review past funding. The Motion Cap-
ture Lab and various versions of the Panoptic Studio have helped us obtain the following NSF funding (total
$18,516,823). The titles of these awards give an indication of the breadth of research supported by our ex-
isting capture facilities: NSF Young Investigator: Coordination and Control of Dynamic Physical Systems,
0196047, $13,932; Data-Driven Control of Humanoid Robots, 0196089, $254,501; PostDoc: Parallel Search
Algorithms for Automating the Animation of Human Motion, 0196221, $7,740; CADRE: Digital Muybridge:
A Repository for Human Motion Data, 0196217, $1,253,648; Programming Entertainment Robots 0203912,
$66,000; ITR: CareMedia: Automated Video and Sensor Analysis for Geriatric Care, 0205219, $2,131,000;
ITR: Providing Intuitive Access to Human Motion Databases, 0205224, $528,000; Collaborative Research Re-
sources: An Experimental Platform for Humanoid Robotics Research, 0224419, $1,015,000; CISE Research
Instrumentation: Data-Driven Modeling for Real-Time Interaction and Animation, 0242482, $48,394; ITR:
Human Activity Monitoring Using Simple Sensors, 0312991, $338,646; ITR: Collaborative Research: Using
Humanoids to Understand Humans, 0325383, $1,484,667; ITR Collaborative Research: Indexing, Retrieval,
and Use of Large Motion Databases; 0326322, $1,454,000; Collaborative Research: DHB: Human Dynamics
of Robot-Supported Collaborative Work, 0624275, $544,000; Data-Driven Animation of Skin Deformations,
0702556, $349,000; Exploring the Uncanny Valley, 0811450, $362,000; Approximate Dynamic Programming
Using Random Sampling, 0824077, $348,199; II-EN The Human Virtualization Studio: From Distributed
Sensor to Interactive Audiovisual Environment, 0855163, $600,000; RI: Small: Spacetime Reconstruction of
Dynamic Scenes from Moving Cameras, 0916272, $445,771; RI: Medium: Collaborative Research: Trajec-
tory Libraries for Locomotion on Rough Terrain, 0964581, $699,879; CPS: Medium: Collaborative Research:
Monitoring Human Performance with Wearable Accelerometers, 0931999, $1,206,078; Collaborative Research:
Computational Behavioral Science: Modeling, Analysis, and Visualization of Social and Communicative Be-
havior, 1029549, $1,531,518; EAGER: 3D Event Reconstruction from Social Cameras, 1353120, $216,000;
RI: Medium: Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior,
1563807, $1,000,000; SCH: EXP: Monitoring Motor Symptoms in Parkinson’s Disease with Wearable Devices,
1602337, $678,850; RI: Small: Optical Skin For Robots: Tactile Sensing and Whole Body Vision, 1717066,
$440,000; and NRI: INT: Individualized Co-Robotics, 1734449, $1,500,000.
This list does not include larger group grants such as IGERT: Interdisciplinary Research Training in Assistive
Technology, 0333420, $3,718,105; and Quality of Life Technology Engineering Research Center, 0540865,
$29,560,917. Disney Research Pittsburgh also provided substantial funding, as did DARPA.
Where will new funding come from? We expect the transformative capabilities of this facility (portability, mobility, multiple foveation, real-time operation, markerless capture, scalability, and multi-modality) to attract new sources of funding for new types of research, as well as further NSF support. As an example of relevant funding, this proposal complements a recently funded NSF Expeditions project that includes CMU, "Computational Photo-Scatterography: Unraveling Scattered Photons for Bio-imaging," which will develop optical wearable devices to make real-time physiological measurements. The proposed capture system would be useful for each
of Atkeson’s current awards: “RI: Medium: Combining Optimal and Neuromuscular Controllers for Agile
and Robust Humanoid Behavior,” 1563807, $1,000,000, to assess robot performance and support new kinds of
feedback for robot control; “RI: Small: Optical Skin For Robots: Tactile Sensing and Whole Body Vision,”
1717066, $440,000, to evaluate robot skin as well as provide ground truth data; and “NRI: INT: Individualized
Co-Robotics," 1734449, $1,500,000, to assess exoskeleton performance and support new kinds of feedback for
exoskeleton control. It would also be useful for Hodgins’ project “SCH: EXP: Monitoring Motor Symptoms in
Parkinson's Disease with Wearable Devices," 1602337, $678,850, to provide additional monitoring information
as well as ground truth data. Atkeson is involved in a study of cheetah motor control in South Africa using
motion capture. Although it is not realistic to suggest this capture system would be sent to Africa, it is clear
that a portable motion capture system would be very useful for animal studies in the wild. In terms of the im-
mediate future, CMU and University of Pittsburgh faculty are also preparing an NSF Science and Technology
Center (STC) proposal on Understanding Action Through High Definition Behavioral Analysis. The goal is to
use high quality behavioral monitoring of humans and animals to better understand the neuro-biological basis
of behavior. This proposal has been selected and will be submitted by CMU this year. In addition to funding
similar to what we have received previously, we expect the coordinated visual, contact, and physiological capture to open new funding sources in the areas of wearable devices for medical, entertainment, and other purposes; soft robotics, including robot skin and sensors; and support of a national facility for exoskeleton and robot evaluation. We hope to be more successful in seeking NIH funding for wearable devices
for preventive medicine and therapy.
Who will be the future users? We expect the Mobile Behavior Capture system to be used in similar ways as
well as new ways we cannot predict that take advantage of its transformative capabilities. We expect the Mo-
bile Behavior Capture system to be a national facility, with CMU hosting visiting researchers, and researchers
worldwide using our data. We expect our data repositories to be widely used by diverse research communities
worldwide, as our current repositories are.
Specific users during development: The development process will be stimulating for the co-PIs, the graduate
students supported by this award, and other students of the co-PIs or involved in the development of this system.
These researchers will largely be from the Robotics Institute and the Machine Learning Department.
Specific anticipated users: As the new system comes online, we expect the pattern of usage described
above to continue: tens of faculty, postdocs, and graduate students, and hundreds of undergraduates in various
courses and projects, as well as visitors and a large number of researchers using the captured data. In addition
to the co-PIs and their students, we expect users from the human-robot interaction (HRI) research community
such as Henny Admoni, a new faculty member in the Robotics Institute (RI). We expect users from the robot
manipulation research community such as Katharina Muelling, Oliver Kroemer and David Held, also new fac-
ulty in the RI. Nancy Pollard, who works on robot hand design, will be a user. Carmel Majidi, soft robotics,
is a likely user. The researchers involved in the NSF STC proposal on Understanding Action Through High Definition Behavioral Analysis are likely users: Deva Ramanan, RI, Machine Vision; Mike Tarr, Psychology,
Human behavior; Rita Singh, Language Technologies Institute (LTI), Human verbal behavior; LP Morency, LTI,
Human affect from video and audio; Maysam Chamanazar, Biomedical Engineering, Devices/recording tech-
nology; Marios Savides, ECE, Behavior/biometrics/face and posture recognition; Pulkit Grover, ECE, Devices
including dense array EEG; Hae Young Noh, Civil Engineering, Vibration sensors for detecting human actions;
Brooke Feeney, Psychology, Human social behavior; Nathan Urban, U. Pitt. Neurobiology; Avniel Ghuman, U.
Pitt. Neurosurgery; Julie Fiez, U. Pitt. Psychology; and Doug Weber, U. Pitt. BioEngineering, Behavior and
brain-computer interfaces. Rory Cooper and other members of the University of Pittsburgh School of Health
and Rehabilitation Science will be users. This is in addition to continued use by the CMU Drama and Art
Departments and the ETC.
Prior Work and Results from Prior NSF Support
Before discussing NSF support in the last five years, we would like to present earlier research done with NSF
support. 30 years ago Atkeson developed special purpose video hardware to track colored markers in real time,
leading to a spinoff company [14]. This system was used to measure human movement as well as supporting re-
search on visually guided robot behavior and robot learning by watching human teachers. It was clear, however,
that many other aspects of human behavior needed to be measured beyond just optical tracking of movement.
20 years ago at Georgia Tech, Atkeson co-led the construction of the Aware Home (Figure 2), which was in-
strumented with cameras and radio-frequency tracking systems to support research on how technology can help
older adults and people with disabilities live independently in their own homes as long as possible, as well as
other human-technology interaction issues [24]. At CMU he developed prototype behavior measurement sys-
tems using floor vibration and elements of commonly available home security systems such as motion detectors.
Based on this work, it became clear that to observe natural behavior, one had to capture it “in the wild”, rather
than in a zoo or laboratory setting. Living in someone else’s home for a few days of a study, or even sleeping
in a sleep lab for a night, leads to unnatural behavior. Also, many subjects, especially older adults or people
with disabilities, were not willing or able (due to mobility or transportation issues) to travel to or participate in
a lab study, no matter how naturalistic it was. People change their behavior when they are out of their natural
environment (e.g., home, work, school, etc.). Robots and smart or assistive environments, and other intelligent
interactive systems need to be developed and tested with actual end users in their natural environments.
As an example of the philosophy that we must capture behavior 24/7 in the wild, 15 years ago Atkeson
helped lead the instrumentation of a skilled nursing facility (an Alzheimer’s unit) as part of the CMU Care-
Media project [1]. Patients in this unit were not able to describe other medical problems or side effects of
medications they were already taking. We developed a camera and microphone network to measure behavior
such as locomotion, eating, and social interaction to try to identify medical problems and drug side effects (Fig-
ure 2). This deployment forced us to address challenges such as automatically processing large amounts of data
to find rare or sparse events, as well as acceptance, privacy, and ethical issues. This work is closely related
to the CMU/Pitt work on using facial capture to track medical issues such as depression, which is best done
continuously 24/7 in the wild [25].
This work led to educational support in the form of an NSF IGERT on Interdisciplinary Research Training
Opportunities in Assistive Technology at CMU and the University of Pittsburgh (PIs Atkeson and Cooper). One
of our guiding principles was that students learn more when they need to leave the academic campus and collect
data and test their systems in the real world. The work also led to a CMU/Pitt NSF Engineering Research Center
on Quality of Life Technology.
15 years ago Hodgins established the CMU Motion Capture Lab, which has collected and provided move-
ment data for a wide range of research ([11], which has been acknowledged in several hundred papers). With
the support of the NSF Engineering Research Center on Quality of Life Technology, this lab made available
movement data and other forms of behavior capture [12]. Similarly, the 10 year old CMU Panoptic Studio built
by Sheikh, an enclosed space instrumented with hundreds of cameras of several types, has provided data on and
software for measuring social interactions ([17], Figures 3, 4, and 5). Narasimhan at CMU is currently develop-
ing outdoor behavior capture devices to be installed on light and sign poles throughout the city of Pittsburgh [8].
A prototype of the capture station is in front of Newell Simon Hall at CMU. One goal is to provide a test
bed for many types of researchers including urban planning and policy as well as smart transportation
projects such as CMU’s Traffic 21 [5], CMU’s real-time bus tracking [22], and Pittsburgh’s many autonomous
Figure 5: OpenPose is widely used human tracking software that came out of the work on the Panoptic Studio [6,
3, 18].
transportation companies. Another goal is to extend current automobile traffic behavior capture techniques to
pedestrians and street life.
CMU has some of the best facilities for optically tracking movement (the geometry of behavior) in the world.
However, multi-modal sensing technology has advanced greatly in the past decade driven by the popularity of
smart phones, and we still have not achieved our goal of ubiquitous unobtrusive 24/7 behavior capture in the
wild.
In terms of a large NSF equipment award, Hodgins was PI and Atkeson was a co-PI for “Collaborative
Research Resources: An Experimental Platform for Humanoid Robotics Research”, 0224419, $1,015,000. This
equipment award was for the development of a humanoid robot in collaboration with a company, Sarcos. This
development was successful, and the robot continues to be an important component of CMU’s research in hu-
manoid robotics. Approximately 10 students, 50 papers, several NSF awards totaling approximately $3,500,000,
and significant DARPA support were enabled by this equipment award.
The most relevant recent award for Atkeson is: (a) NSF award: IIS-1717066 (PI: Atkeson); amount: $440,000; period: 8/1/17 - 7/31/20.
(b) Title: RI: Small: Optical Skin For Robots: Tactile Sensing and Whole Body Vision
(c) Summary of Results: This recent grant is supporting work on developing optical approaches for tactile sensing
as well as whole body vision (eyeballs all over the body).
Intellectual Merit: This project will enable robots to feel what they touch. The key idea is to put cameras
inside the body of the robot, looking outward at the robot skin as it deforms, and also through the robot skin
to see nearby objects as they are grasped or avoided. This approach addresses several challenges: 1) achieving
close to human resolution (a million biological sensors) using millions of pixels, 2) reducing occlusion during
grasping and manipulation, and detecting obstacles before impact, and 3) protecting expensive electronics and
wiring while allowing replacement of worn out or damaged inexpensive skin. Technical goals for the project
include first building and then installing on a robot a network of about 100 off-the-shelf small cameras (less than
1 cubic centimeter) that is capable of collecting information, deciding what video streams to pay attention to, and
processing the video streams to estimate forces, slip, and object shape. A transformative idea is to aggressively
distribute high resolution imaging over the entire robot body. This reduces occlusion, a major issue in perception
for manipulation. Given the low cost of imaging sensors, there is no longer a need to restrict optical sensing
to infrared range finders (single pixel depth cameras), line cameras, or low resolution area cameras. Building
a camera network of hundreds of cameras on a mobile skin, and building a multi-modal sensing skin, will be
highly synergistic with developing the proposed mobile behavior capture system.
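As a rough illustration of the kind of inference such an optical skin supports (this is not the project's algorithm; the linear stiffness model, marker layout, and contact threshold are assumptions made only for this sketch), a normal force can be estimated from how far skin markers are displaced by a contact:

```python
import numpy as np

def estimate_normal_force(markers_rest, markers_pressed, stiffness_n_per_mm=0.5):
    """Rough normal-force estimate from marker displacement on an elastic skin.

    markers_rest, markers_pressed : (N, 2) arrays of marker positions (mm)
    stiffness_n_per_mm            : calibrated effective stiffness (assumed linear)
    """
    disp = np.linalg.norm(markers_pressed - markers_rest, axis=1)  # per-marker displacement
    indentation = disp.max()                                       # deepest point of contact
    in_contact = disp > 0.2 * indentation                          # markers inside the contact patch
    return stiffness_n_per_mm * indentation, int(in_contact.sum())

# Toy example: 50 markers on a 30 mm patch, 10 of them pushed 1.5 mm by a contact.
rest = np.random.default_rng(0).uniform(0, 30, size=(50, 2))
pressed = rest.copy()
pressed[10:20, 1] += 1.5
force, n_contact = estimate_normal_force(rest, pressed)
print(f"~{force:.2f} N spread over {n_contact} markers")
```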
Broader Impacts: Robots with better sensing can more safely help people. In terms of outreach, we are
developing a robot museum, as described in the broader impacts portion of this proposal.
Development of Human Resources. The project involves one graduate student. We have weekly individual
meetings and weekly lab meetings. The graduate student is performing research, making presentations to our
group, and will give conference presentations and lectures in courses. We will put the graduate student in a
position to be a success in academia and industry.
(d) Publications resulting from this NSF award: [31].
(e) Other research products: We have made instructions on how to build our tactile sensors available on the
web.
(f) Renewed support. This proposal is not for renewed support.
The most relevant recent award for Sheikh is: (a) NSF award: IIS-1353120 (PI: Sheikh); amount: $216,000;
period: 9/15/13 - 8/30/15.
(b) Title: EAGER: 3D Event Reconstruction from Social Cameras
(c) Summary of Results: This award supported work on combining information from uninstrumented unsyn-
chronized mobile cameras.
Intellectual Merit: This EAGER project helped establish a new area of visual analysis by providing the
requisite framework for social activity understanding in 3D rather than in 2D. It explored the use of social
cameras to reconstruct and understand social activities in the wild. Users naturally direct social cameras at
areas of activity they consider significant, by turning their heads towards them (with wearable cameras) or by
pointing their smartphone cameras at them. The core scientific contribution of this work is the joint analysis
of both the 3D motion of social cameras (that encodes group attention) and the 3D motion in the scene (that
encodes social activity) towards understanding the social interactions in a scene. A number of internal models
(such as maximizing rigidity or minimizing effort) for event reconstruction were investigated to address the
ill-posed inverse problems involved.
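For context, the sketch below shows the standard geometric building block underlying such reconstructions, linear (DLT) triangulation of a point from two calibrated views; it is illustrative only and is not the social-camera method developed under this award, which also estimates the cameras' own motion and applies internal models such as rigidity and minimal effort:

```python
import numpy as np

def triangulate(P_list, x_list):
    """Linear (DLT) triangulation of one 3D point from two or more calibrated views.

    P_list : list of 3x4 camera projection matrices
    x_list : list of (u, v) pixel observations, one per camera
    """
    rows = []
    for P, (u, v) in zip(P_list, x_list):
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    X = Vt[-1]                       # null-space direction = homogeneous 3D point
    return X[:3] / X[3]

# Two synthetic "social cameras" observing the same point (1, 2, 10).
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])                   # camera at the origin
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])   # camera shifted 1 unit in x
X_true = np.array([1.0, 2.0, 10.0, 1.0])
x1 = P1 @ X_true; x1 = x1[:2] / x1[2]
x2 = P2 @ X_true; x2 = x2[:2] / x2[2]
print(triangulate([P1, P2], [x1, x2]))                          # ~ [1. 2. 10.]
```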
Broader Impacts: The ability to analyze social videos in 3D space and time provides useful tools for
almost any activity that involves social groups working together, such as citizen journalism, search-and-rescue
team coordination, or collaborative assembly teams. The project was integrated with education through teaching
and student training, and collaborated with industry.
Development of Human Resources. The project supported one graduate student.
(d) Publications resulting from this NSF award: [4, 26, 10, 2, 16, 20].
(e) Other research products: This work contributed to the publicly available Openpose software (Figure 5).
(f) Renewed support. This proposal is not for renewed support.
The most relevant recent award for Hodgins is: (a) NSF award: IIS-1602337 (PI: Hodgins); amount: $678,850; period: 9/1/16 - 8/31/19.
(b) Title: SCH: EXP: Monitoring Motor Symptoms in Parkinson’s Disease with Wearable Devices
(c) Summary of Results: This project aims to promote a paradigm shift in Parkinson's Disease (PD) management through in-home monitoring using wearable accelerometers and machine learning (Figure 10). Novel algorithms and experimental protocols are being developed to allow robust detection and assessment of PD motor symptoms in daily living environments.
Intellectual Merit: Specifically, this project develops algorithms for weakly-supervised learning, time se-
ries analysis, and personalization of classifiers. This project collects long-term (several weeks), in-home data
where the participants’ actions are natural and unscripted. Participants use a cell phone app to label their
own data, marking segments of time as containing or not containing the occurrence of a PD motor symptom.
This project extends multiple-instance learning algorithms for learning from weakly-labeled data in time series.
Additional major technical challenges include detection of subtle motor symptoms and local minima during
optimization. To further increase robustness and generalization, this project explores the use of personalization
algorithms to learn person-specific models of motor symptoms from unsupervised data. The proposed tech-
niques for weakly-supervised learning and personalization are general, and they can be applied to other human
sensing problems.
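To illustrate the weak-label setting (a toy sketch built on our own assumptions, not the project's actual features or algorithms), the standard multiple-instance assumption treats each self-labeled time segment as a bag of short windows and scores the bag by its most symptomatic window:

```python
import numpy as np

def window_features(signal, win=200, hop=100):
    """Split a 1-D accelerometer trace into overlapping windows (the MIL instances)."""
    starts = range(0, len(signal) - win + 1, hop)
    return np.stack([signal[s:s + win] for s in starts])

def bag_score(windows, instance_scorer):
    """Standard MIL assumption: a labeled segment (bag) is positive if ANY window is,
    so the bag score is the maximum over its instance scores."""
    return max(instance_scorer(w) for w in windows)

def fit_threshold(bag_scores, bag_labels):
    """Pick the bag-score threshold that best separates the weakly labeled segments."""
    candidates = np.unique(bag_scores)
    accuracy = [np.mean((bag_scores >= t) == bag_labels) for t in candidates]
    return candidates[int(np.argmax(accuracy))]

def scorer(w):
    """Hypothetical instance scorer: mean high-frequency energy (tremor proxy)."""
    return float(np.abs(np.diff(w)).mean())

# Toy data: 20 segments; positive ones contain a brief synthetic tremor burst.
rng = np.random.default_rng(1)
bags, labels = [], []
for label in [0, 1] * 10:
    sig = rng.normal(0, 0.1, 2000)
    if label:
        i = rng.integers(0, 1500)
        sig[i:i + 300] += 0.5 * np.sin(np.arange(300) * 2.0)
    bags.append(window_features(sig))
    labels.append(label)

scores = np.array([bag_score(b, scorer) for b in bags])
labels = np.array(labels)
thr = fit_threshold(scores, labels)
print("bag-level accuracy:", np.mean((scores >= thr) == labels))
```

The project's learned, personalized classifiers replace the hand-built scorer and threshold used here and must handle far subtler symptoms.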
Broader Impacts: This project aims to promote a paradigm shift in PD management.
This disease poses a serious threat to the elderly population, affecting as many as one million Americans.
Costs associated with PD, including treatment, social security payments, and lost income from inability to
work, are estimated to be nearly $25 billion per year in the United States alone. The current state-of-the-art
in PD management suffers from several shortcomings: (1) frequent clinic visits are a major contributor to the
high cost of PD treatment and are inconvenient for the patient, especially in a population for which traveling
is difficult; (2) inaccurate patient self-reports and 15-20 minute clinic visits do not provide enough information for
doctors to accurately assess their patients, leading to difficulties in monitoring patient symptoms and medication
response; and (3) motor function assessments are subjective, making it difficult to monitor disease progression.
Furthermore, because they must be performed by a trained clinician, it is infeasible to do frequent motor function
assessments. This project explores how we can do better using wearable devices to monitor this disease.
Development of Human Resources. The project supported one graduate student.
Figure 6: Left: Two visible light cameras combining multiple lenses and image chips. Right: Three RGBD
cameras combining infrared illumination, a visible light camera, and an infrared camera (a single camera, a
stereo pair or a time-of-flight depth measurement).
(d) Publications resulting from this NSF award: [32].
(e) Other research products: None yet.
(f) Renewed support. This proposal is not for renewed support.
There is no prior NSF support for Katerina Fragkiadaki.
State of the Art in Mobile Human Behavior Capture
Robot cameras were pioneered in the making of 2001: A Space Odyssey and the first Star Wars movie [28].
Robotic pan/tilt mounts for cameras are now commonly available. There are commercial sources for pan/tilt
robot cameras on trolleys to provide mobile capture for television and movie studios as well as mobile capture
for sports events (such as track and field events). These trolleys are limited to tracks on the ground, floor, or
ceilings. Companies such as Bot&Dolly, Ross, Telemetrics, Mark Roberts Motion Control, and Camerobot
Systems sell cameras mounted on robot arms for six degree of freedom camera position and orientation control
(within the workspace of the robot arm, which is often quite limited). Consumer-level robots such as Jibo and
mobile robots such as Pepper can serve as (slow) camera platforms. Telepresence robots also serve as camera
platforms. There are a few “selfie-bots” that can move autonomously. Remotely controlled and autonomous
flying drones with cameras are widely available. Omnidirectional cameras are used to capture for virtual reality
playback where only orientation can be controlled by the viewer. There are marker-based motion capture sys-
tems such as Vicon that offer a real time tracking option as well as provide support for capturing simultaneous
physiological measurements such as EMG. Multi-modal cameras are common in RGBD cameras (Figure 6),
which typically have a visible light camera, an infrared illuminator or pattern projector, one or more infrared
cameras, and a microphone, and in the form of robot heads, which typically combine some of visible light
stereo imaging, infrared depth measurement, visible light and infrared illumination, microphone arrays, and
infrared lidar. Self-driving cars add radar to the mix. However, we are not aware of any integrated system with
multiple mobile camera platforms that combines the attributes we will develop: markerless interactive scalable
multi-modal behavior capture. One technological wave we are surfing is driven by smart phones, and the race
to provide smart phone-based sensing such as optical, depth (Figure 7), thermal, ultrasound, and radar imaging
(Figure 8). Another technological wave is the development of miniaturized “soft” electronics for wearable sen-
sors (Figure 12). We are adopting the philosophy of including as many of these sensors as we can, limited by
cost, so that the system can support a wide range of studies we are planning, as well as those we cannot predict
at this time.
Research Needs
We will expand on the discussion in the initial section of what motivates each of the desired attributes of the
system we will develop: portable, self-mobile, markerless, interactive, scalable, multi-modal behavior capture. This
proposal is based on our experience with our Panoptic Studio (Figures 3 and 4) and our Motion Capture Lab. We
are frustrated with the limited sensing volume provided by a 6m diameter dome, or any marker-less capture sys-
tem. The scale of the behavior and the number of available cameras define the camera arrangement, achievable
sensing volume, and resulting spatial resolution. We cannot simultaneously capture fine scale behavior such
Figure 7: Depth images from time of flight imagers.
as facial expressions, finger movements, or gestures like a slight shrug, while capturing large scale behavior
such as dance, running, or just walking around. Our cameras are fixed, with a fixed orientation, and a fixed
lens setting. Often the behavioral context is unnatural: it is like being in a children’s playhouse or treehouse, or
living in the currently popular “tiny houses”. We are also tired of putting hundreds of tiny markers on subjects
in the Motion Capture Lab to capture facial expressions and skin deformation. This process takes a long time at
the start and end of each capture session, is cumbersome for the subject, and fundamentally limits the resolution
of what can be captured.
We propose to build a Behavior Capture System to address the above needs, as well as other needs that are
not met by our current behavior capture systems: portability and flexibility, so behavior can be captured in its natural domain (such as in the home, or at a sports event or concert) rather than in a lab setting; spatial scalability, so that fine as well as gross behaviors can be captured; temporal scalability, so that long-duration or continuous 24/7 behaviors can be captured; mobility during capture, so the fovea(s) of the system can accurately track a moving locus of behavior; and greater multi-modality, to more fully capture human and robot behavior, including devices to capture touch, contact force, and physiological behavior as well as behavior at a distance. We will
go beyond capturing rigid bodies to capturing the behavior of deformable bodies, liquids, and granular materi-
als. We will integrate measurements of internal tissue movement and changes along with other physiological
measurements.
Description of the Research Instrument
Our design involves the following components; an illustrative configuration sketch for a single panel follows this list.
a) A modular system of Multi-Modal Cameras (MMCams): arrays of measurement-at-a-distance components, including multiple coordinated visible light imaging devices, optical depth measurement devices, microphone arrays, thermal imaging, radar imaging and distance and velocity measurement, other radio frequency and electric and magnetic field measurements, and lighting for night and low-light situations.
b) The MMCams can be assembled, calibrated, and synchronized into various size groups to match the
capture environment. We will use the modular panels of the Panoptic Studio dome as the starting point for
our design. These panels hold a mix of high resolution and low resolution cameras, a depth camera, and
synchronization hardware (Figure 3).
c) The MMCams can be mounted on robots, vehicles, drones, and even the subjects themselves to track
mobile behavior. We plan to purchase omnidirectional mobile robot bases to explore this capability (Figure 9).
d) Additional contact measurement components which include traditional strain gage instrumentation and
Figure 8: Phone-based thermal imagers (2), an ultrasound imager, and phone-based radar.
Figure 9: Left: A Segway omnidirectional robot base. Middle: Currently available small cameras such as the
Naneye (1x1x1mm). Right: Stereo Naneye to also measure depth.
resistive and capacitive touch and force sensors embedded in manipulated objects and surfaces such as furniture,
appliances and their controls, floors, and walls, as well as extending our work on optically measured deforma-
tion of materials in contact [29]. These components will also be mountable on subjects. We have extensive
experience attaching small accelerometers, gyroscopes, and magnetic field sensors to humans and robots, for
example. We also will explore mounting very small cameras on subjects to augment the capture system’s views
and reduce occlusion (Figure 9).
e) Additional physiological measurement components which generalize from current worn devices such as
Fitbit and heart monitors to include electromyographic activity and recently developed ultrasound imaging of
internal tissues such as muscle fiber movement [7, 23]. We will extend newly available radar and ultrasound
chips and cell phone-based devices such as those used for gesture recognition to physiological measurements
such as breathing, heartbeat, and muscle state [9]. Hodgins' work on monitoring Parkinson's Disease using
wearable devices is an example of using physiological sensors (Figure 10) [32]. Markvicka and Majidi at CMU
are developing small bandaid-like physiological sensors that we plan to use (Figure 12) [15].
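The sketch below gives a flavor of how a single MMCam panel might be described in software; the sensor names, rates, and fields are illustrative assumptions only, not a specification of the hardware listed above:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Sensor:
    kind: str             # "rgb", "depth", "thermal", "radar", "mic_array", ...
    description: str      # free-text hardware description
    rate_hz: float        # nominal sample or frame rate
    sync: str = "hw"      # "hw" (shared trigger/clock) or "sw" (timestamp alignment)

@dataclass
class MMCamPanel:
    panel_id: str
    mount: str                                  # "dome", "mobile_base", "drone", "worn"
    sensors: List[Sensor] = field(default_factory=list)

# One illustrative panel mixing several measurement-at-a-distance modalities.
panel = MMCamPanel(
    panel_id="panel_03",
    mount="mobile_base",
    sensors=[
        Sensor("rgb", "high-resolution color camera", 30.0),
        Sensor("rgb", "low-resolution color camera", 120.0),
        Sensor("depth", "time-of-flight imager", 30.0),
        Sensor("thermal", "phone-class thermal imager", 9.0, sync="sw"),
        Sensor("mic_array", "4-channel microphone array", 48_000.0),
    ],
)
print(f"{panel.panel_id}: {len(panel.sensors)} sensors on {panel.mount}")
```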
Real-time tracking will be included to support capture of mobile behaviors as well as studies of interactive robots and real-time robot control and learning. Integration of many multimodal sensors will be a major emphasis. This includes time synchronization as well as convenient calibration and rectification of the different measurements. For example, we will combine optical, audio, contact and force, and vibration measurements to understand how an older adult with a motor disability such as tremor can more effectively use current tablet and phone technology. People with motor difficulties repeatedly "press" or touch the wrong area of the screen and are unable to undo or recover. Many older adults cannot operate the smart-phone interface required by ride services such as Uber, even though they depend on those services after losing their driver's license.
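The kind of timestamp bookkeeping that this multimodal integration requires can be illustrated with a toy sketch (not the system software itself): resampling one independently clocked stream onto a reference camera clock by nearest-timestamp matching. The function name and tolerance below are illustrative assumptions.

```python
# A minimal sketch of fusing independently timestamped sensor streams onto a
# common clock by nearest-neighbor matching in time.
import numpy as np

def align_to_reference(ref_times, stream_times, stream_values, max_skew=0.005):
    """For each reference timestamp, pick the closest sample from another
    stream, provided it lies within max_skew seconds; otherwise return NaN."""
    idx = np.searchsorted(stream_times, ref_times)
    idx = np.clip(idx, 1, len(stream_times) - 1)
    left, right = stream_times[idx - 1], stream_times[idx]
    nearest = np.where(ref_times - left < right - ref_times, idx - 1, idx)
    aligned = stream_values[nearest].astype(float)
    aligned[np.abs(stream_times[nearest] - ref_times) > max_skew] = np.nan
    return aligned

# Toy example: 25 Hz camera timestamps and an irregularly sampled force sensor.
rng = np.random.default_rng(0)
cam_t = np.arange(0.0, 1.0, 0.04)
force_t = np.sort(rng.uniform(0.0, 1.0, 100))
force_v = np.sin(2 * np.pi * force_t)
print(align_to_reference(cam_t, force_t, force_v)[:5])
```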
Our existing software base is described in [13]. We have developed a method to automatically reconstruct the full-body motion of multiple interacting people. Our method does not rely on a 3D template model or any subject-specific assumption such as body shape, color, height, or body topology. It works robustly in challenging social-interaction scenes with an arbitrary number of people, producing temporally coherent, time-varying body structures. Furthermore, the method is free from error accumulation and thus enables capture of long-term group interactions (e.g., more than 10 minutes). Our algorithm is designed to fuse the weak
Figure 10: Left: A wearable accelerometer system. Right: A Parkinson’s patient whose tremor is being
monitored by cameras and wearable accelerometers (red circles) [32].
Figure 11: Several levels of proposals generated by our method. (a) Images from up to 480 views. (b) Per-
joint detection score maps. (c) Node proposals generated after non-maxima suppression. (d) Part proposals by
connecting a pair of node proposals. (e) Skeletal proposals generated by piecing together part proposals. (f)
Labeled 3D patch trajectory stream showing associations with each part trajectory. In (c-f), colors indicate the joint or part labels shown below the figure.
perceptual processes in the large number of views by progressively generating skeletal proposals from low-level appearance cues, and a framework for temporal refinement associates body parts with a reconstructed dense 3D trajectory stream (Figure 11). Our system and method are the first to reconstruct the full-body motion of more than five people engaged in social interaction without using markers. We also empirically demonstrate the impact of the number of views in achieving this goal.
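The following Python sketch caricatures the staged-proposal idea (score maps, non-maxima suppression, node proposals, part proposals) on a single toy view; it is a simplification for exposition, with hypothetical function names, and omits the multi-view triangulation, skeletal assembly, and temporal refinement of the actual method in [13].

```python
# A highly simplified sketch of the staged proposal pipeline: per-joint score
# maps -> node proposals (NMS) -> part proposals connecting pairs of nodes.
import numpy as np

def node_proposals(score_map, thresh=0.5):
    """2D non-maxima suppression on a per-joint detection score map."""
    peaks = []
    h, w = score_map.shape
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            v = score_map[y, x]
            if v > thresh and v == score_map[y-1:y+2, x-1:x+2].max():
                peaks.append((x, y, v))
    return peaks

def part_proposals(nodes_a, nodes_b, max_len=40.0):
    """Connect pairs of node proposals for two joints into candidate parts,
    scored by detection confidence and a simple length prior."""
    parts = []
    for xa, ya, sa in nodes_a:
        for xb, yb, sb in nodes_b:
            length = np.hypot(xa - xb, ya - yb)
            if length < max_len:
                parts.append(((xa, ya), (xb, yb), sa * sb))
    return sorted(parts, key=lambda p: -p[2])

# Toy usage with random score maps for two joints (e.g. shoulder and elbow).
rng = np.random.default_rng(0)
shoulder, elbow = rng.random((64, 64)), rng.random((64, 64))
parts = part_proposals(node_proposals(shoulder, 0.95), node_proposals(elbow, 0.95))
print(f"{len(parts)} part proposals")
```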
Intellectual Merit
We will address several intellectual and practical challenges in the development of this system. Because the system is a combination of moving sensor panels, a one-time initial calibration is not adequate. A critical challenge is to continuously calibrate the system so that information can be integrated across panels. We will make this problem easier by marking the panels and the environment and by dedicating sensors on each panel to continuously track neighboring panels and static fiducial markers. We will use optimization to continuously estimate all the panel locations by minimizing the error in fitting these measurements, as well as the measurements of the subjects, which also provide calibration information. We will also explore Simultaneous Localization and Mapping (SLAM) techniques from robotics. Another challenge for a distributed system is synchronization (sharing a global clock). We will use wired connections rather than wireless links where necessary to simplify synchronization. We will also use wired umbilicals to supply power to the mobile robots rather than relying on cumbersome and potentially dangerous batteries.
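As a toy example of the calibration-by-optimization idea (not the planned implementation), the sketch below recovers 2D panel positions from noisy range measurements to known fiducials with a standard nonlinear least-squares solver; the real system would estimate full 6-DoF poses and fold in many more measurement types.

```python
# Toy sketch: estimate moving panel positions by least-squares fitting of
# range measurements to fixed fiducial markers.
import numpy as np
from scipy.optimize import least_squares

fiducials = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0], [5.0, 5.0]])  # known
true_panels = np.array([[1.0, 2.0], [3.5, 4.0]])                        # unknown

def ranges(panels):
    """Range from every panel to every fiducial, flattened."""
    return np.linalg.norm(panels[:, None, :] - fiducials[None, :, :], axis=2).ravel()

rng = np.random.default_rng(1)
measured = ranges(true_panels) + rng.normal(0, 0.01, 8)  # noisy readings

def residual(flat_panels):
    return ranges(flat_panels.reshape(-1, 2)) - measured

# Start both panels at the center of the capture volume and refine.
estimate = least_squares(residual, x0=np.full(4, 2.5)).x.reshape(-1, 2)
print(np.round(estimate, 2))  # should be close to true_panels
```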
Measuring small details such as finger movements, facial expressions, and subtle social cues requires high resolution, but monitoring a large space limits resolution if coverage is uniform. We plan to address this challenge with movable cameras and computer-controlled lenses. The MMCam components will be mounted on separate pan/tilt mounts with computer-controlled zoom and focus, and the camera panels will also be pointable. The mobile bases will be steerable. This introduces a new challenge: we must process enough information in real time to provide guidance to all these actuators. We will simplify the problem by using depth cameras to reduce the computational load of tracking objects in 3D, and thermal cameras to make it easier to find and track human body parts. We will also take advantage of the continual improvement of GPUs to process more of the complex visible-light imagery in real time.
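A minimal sketch of the pointing geometry follows, assuming a simple pan/tilt mount frame with x forward and z up; the function name and frame convention are illustrative, and the deployed controller would add zoom, focus, and actuator rate limits.

```python
# Compute pan and tilt angles that aim a camera mount at a tracked 3D point
# expressed in the mount frame (x forward, y left, z up).
import math

def pan_tilt_to_target(x, y, z):
    """Return (pan, tilt) in degrees to aim the optical axis at (x, y, z)."""
    pan = math.degrees(math.atan2(y, x))
    tilt = math.degrees(math.atan2(z, math.hypot(x, y)))
    return pan, tilt

print(pan_tilt_to_target(2.0, 0.5, 1.0))  # e.g. a subject 2 m ahead, slightly left and above
```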
Even with pointable cameras, occlusion is a major challenge. In addition, variation in human appearance and configuration is immense. Clothing and skin sliding over muscle are difficult cases, since the visible texture moves relative to the underlying limb. Fixed cameras are usually pointed at the center of a capture area, yet multiple people in such an area tend to move to the edges, where resolution is poorer, just as people in an elevator move to the walls. We will extend our work on the Panoptic Studio to handle these and other issues that arise.
Practical challenges include: How do we minimize the cost and hassle of deploying the system? How do
we maximize the acceptance and minimize the invasiveness of the system for subjects? How do we minimize
the cost and hassle of analyzing the data?
We expect our behavior capture work to support the development of a great deal of other behavior recognition and monitoring technology, both in research and for consumers. From a machine learning point of view, our data can be used to train learning systems that process data from many fewer sensors. This may lead to less expensive ways to recognize and track human behavior using devices such as phones, sensors placed in the environment, and wearable technology.
We also expect our system to be used to design and prototype, and potentially train operators for, future
monitoring systems in care facilities (hospitals, nursing homes, assisted living, and homes in general).
Justification for a Development Proposal
How will the end result of the effort be a stable shared-use research instrument, rather than technology development, a device, a product, or a technique/protocol? We hope our description above and our track record with building the Panoptic Studio will convince you that we can create a stable shared-use research instrument.
What significant new capabilities, not available in an instrument provided by a vendor, will the new instrument provide? This is discussed above.
In what way does the instrument development require design and development work that must be undertaken or has been undertaken in-house, rather than through readily available/published designs found in the literature? There are companies (particularly in the movie and special-effects business) that can help in setting up many cameras (movie sets) or capturing an expensive stunt. There are robotic camera systems (as described in the state-of-the-art section). However, these companies are only interested in capturing beautiful images and videos, not in data or accurate measurement, and they are out of the typical academic's price range. We have the expertise to combine imaging, contact, and physiological measurements into useful integrated data.
To what extent does the instrument development require/benefit from a team of scientists/engineers/technicians that bring a variety of skills to the project? This work combines expertise in computer vision, robotics, hardware, instrumentation, and physiological sensors.
For what activities does the instrument development require a significant number of person-hours, more so than simple "assembly" of purchased parts? See the challenges listed in the section on Intellectual Merit.
To what extent does the instrument development require time-frames for completion that are longer than are required for plug-and-play or assembled instruments? We expect to spend a full year designing the measurement-at-a-distance component and a full year constructing that part. We expect to spend another year perfecting and implementing the contact and physiological sensing systems. The final year focuses on evaluation and refinement of the integrated system.
Does the instrument development require the use of a machine shop or a testbed to fabricate/test unique components? Yes.
Does the instrument development effort involve risks in achieving the required specifications, and what is the risk mitigation plan? Risks are listed in the section on Intellectual Merit. Risk mitigation is achieved by using a wide variety of sensors to simplify computation, and by dedicating some sensors to continuous self-calibration of the system.
Management Plan
The mobile behavior capture system will be placed (deployable) in a number of rooms at CMU, including
the existing Motion Capture Lab. The system will be operated as needed. We expect it will be used daily, with
downtimes caused by a need to transport the system long distances to a subject’s home or a rehabilitation facility,
for example. The system will be maintained by the developers. The computers will be maintained by CMU’s
Information Technology staff. We will allocate instrument time in the same way as for our current facilities: a simple web-based signup system. The students supported by the CMU cost-sharing funding will assist new users. As the system comes online, we will advertise it both locally and nationally on appropriate email lists.
Organization of the project team: We will have three tracks: measurement at a distance component
development, system integration, and wearable and deployed sensor design and fabrication. Development of
the measurement-at-a-distance MMCams will be led by Katerina Fragkiadaki, a new professor who works on
machine learning for computer vision. System integration, development of the robotics aspects of the system, and development of the contact, force, and physiological sensors will be led by the PI, Chris Atkeson. We will use the technical expertise developed in building the Panoptic Studio (Sheikh) and the Motion Capture Lab (Hodgins).
Figure 12: Smart adhesive patch. Photographs of three variations of the smart adhesive patch including, from
left to right, an accelerometer, pulse oximeter, and pressure sensor. Each patch includes a coin cell battery,
power regulation, wireless processor, and a digital (or analog) sensor that is assembled on a medical grade
adhesive. (Left) An adhesive bandage (Tough-Strips, Band-Aid) is included for size comparison [15].
The project requires expertise in robot design, system integration, and multi-modal sensor design, areas in which Atkeson has extensive experience.
In the first year we will perform the detailed design of the system, and build prototypes of the MMCams
mounted on omnidirectional mobile robots (see budget justification for more information). We will be able to
evaluate these prototypes by operating them in conjunction with the existing Motion Capture Lab and Panoptic
Studio. In the second year we will build the complete measurement-at-a-distance system and continue prototyping and evaluating the other components of the system. In the third year we will build the complete set of other sensors:
contact, force, and physiological measurement systems. We will both integrate and evaluate the system. In the
fourth year we will focus on evaluating our system by testing a variety of experiment designs and other usage
scenarios, and remedying any deficiencies we find. We will do extensive testing against ground truth data as
well as our existing systems, the Motion Capture Lab and the Panoptic Studio.
During the development process the graduate students performing the development will be closely super-
vised by the co-PIs. We will have weekly project meetings where we perform activities such as design reviews,
code walkthroughs, performance assessment, API design, and overall project coordination. The development
group will invite users to these meetings and listen carefully to user feedback.
There are four co-PIs involved in the project, and approximately four graduate students (depending on
recruitment). The graduate students are supported by $800,000 (approximately 8 student-years) of CMU cost-sharing support. The co-PIs are supported by other projects.
The design of the mobile behavior capture system is described in the section “Description of the Research
Instrument and Needs". For the measurement-at-a-distance system we will use the construction techniques used to build the Panoptic Studio and mobile robots in the CMU Robotics Institute. We use both our own machine shop and web interfaces to order parts fabricated by others. In year 4 we will certify and then commission the
system.
Project activities for the measurement-at-a-distance track include: designing and building the MMCams, camera panels, and mobile robots; mounting the cameras on the robots; creating the power and signal wiring; getting the computers functioning; writing the software; analyzing the data; and designing and building calibration and test fixtures.
Include a description of parts and materials, deliverables and estimated schedules and costs for each phase of the project as appropriate.
The total estimated costs for each year are described in the budget section. In the first year we expect to purchase five computers at approximately $12,000 each (quote from Exxact). Each of these computers is equipped with four state-of-the-art GPUs. We expect the computers and GPUs we actually purchase a year from now to cost about the same but be even more powerful. To network these computers we will buy an Infiniband network switch for approximately $8,000 (quote from Dell). We also plan to purchase two mobile bases, which we estimate will cost approximately $40,000 each (quote from Segway). This mobile base is one of the few omnidirectional bases we have found that is fast enough to keep up with human walking (1.3 m/s) and strong enough to carry up to two of the above computers and 8 MMCams. A year from now we will again survey available mobile bases. We have estimated the cost of MMCams by pricing visible
light cameras (a cluster of 6 cameras attached to an NVIDIA board with a TX2 Jetson GPU: Leopard Imaging
LI-JETSON-KIT-IMX477CS-X $1600) and time of flight depth cameras (Basler tof640-20gm 850nm $2340).
There are additional costs for synchronization hardware and other wiring. We have based this cost estimate on
costs we saw building the Panoptic Studio.
In the first year we will also begin to prototype contact, force, and physiological sensors. The individual components are relatively inexpensive (in the hundreds of dollars; consumer-level thermal, ultrasound, and radar imaging sensors are typically $200-300). We have based the total cost for this year on our historical costs for this type of development.
We have provided a total estimate for costs involving components of less than $5,000 each as "fabricated equipment". For fabricated equipment we have based our estimates on our historical costs for developing this type of equipment.
In year 2 we will build the full measurement-at-a-distance system, adding 15 computers at $12,000 each (quote from Exxact) and 8 mobile bases at $40,000 each (quote from Segway). Additional funds are requested for 64 MMCams and for continuing development of contact, force, and physiological sensors.
In years 3 and 4 we will focus on building out the full system, developing ground-truth testing equipment, and fixing any design flaws. We expect to develop custom electronics for the contact, force, and physiological sensors.
We have attempted to reduce risk as much as possible. We are reusing much of the Panoptic Studio design for the MMCam panel and synchronization hardware. We will dedicate hardware on the mobile elements to directly measure the location of other elements and of fiducials placed around the capture volume, rather than trying to infer camera position only from image data. We have extensive experience with contact and force sensors, and we will take advantage of Atkeson's work on the project "RI: Small: Optical Skin For Robots: Tactile Sensing and Whole Body Vision." We will benefit from the assistance of Carmel Majidi, an expert in soft sensors in the CMU Mechanical Engineering Department. We will use our weekly meetings to assess new risks and, quarterly, to re-analyze and modify the project plan to keep it within scope, schedule, and budget.
We have had great success making public most data collected in the Motion Capture Lab (mocap.cs.cmu.edu and kitchen.cs.cmu.edu) and Panoptic Studio (domedb.perception.cs.cmu.edu).
Data made available on the web has been acknowledged in several hundred papers.
Broader Impacts of the Proposed Work
Impact on the research community of interest, and how will we attract and involve other researchers? We
expect to continue to make capture data available on the web. Data made available so far has been acknowledged
in several hundred papers, mostly from the computer graphics, animation, and vision communities worldwide.
This form of usage is freely available to all, including those from non-Ph.D. and/or minority-serving institutions.
We will host visitors who wish to use our facilities, as we do now. As we have described in this proposal, this
instrument development will result in a mobile behavior capture device that is not only unique across CMU, but
also worldwide, making a substantial improvement in our capabilities to conduct leading-edge research as well
as leading edge research training.
Outreach: A major outreach initiative led by Atkeson is the creation of a physical and virtual Robot Mu-
seum. So far we have created physical exhibits on juggling robots, robot actuation (gears vs. direct drive),
mobile robots, soft robots, Steve Jacobsen and Sarcos, robots in literature, legged robots, computer graphics
(Ivan Sutherland), and AI (Newell and Simon). Our next major initiatives are 1) to develop cell phone apps that
trigger off augmented reality (AR) tags and robot pictures in halls to provide a self-guided tour of the Robotics
Institute, and 2) use virtual reality (VR) to provide access to our collection from anywhere in the world. We want
anyone to be able to design, build, debug, evaluate, and repair a historical robot in virtual reality. The impact
of our outreach will be increased by a new Disney TV show based on the characters from the Disney movie
Big Hero 6, including the inflatable medical robot Baymax inspired by Atkeson’s work on inflatable robots. We
have coordinated our outreach activities with the larger outreach efforts of CMU’s Robotics Institute to scale
up reach and effectiveness. Our technologies are shared through publication, and our papers and software are available electronically.
Is student participation in development appropriate? This project will engage several graduate students in
instrument development activities. We miss the old days when students had to make their own oscilloscopes.
We believe this is an excellent way for a new student to get to know the field, while dealing with a concrete set
of problems. We believe this instrument development work will inspire students to ask new questions and enrich
the rest of their graduate and future careers. An excellent example is the students who participated in developing the Panoptic Studio. Their work led to many papers, participation in workshops sharing information
between related groups, and prominence in their fields [13, 21, 4, 19, 27, 26, 10, 16, 20].
Participation of Underrepresented Groups: Two of the co-PIs of this proposal are female. We expect this
will encourage female students to participate in the instrument development. Because one focus of the system is rehabilitation and therapy, and because we will work with the University of Pittsburgh School of Health and Rehabilitation Sciences, we also expect participation by faculty and students with disabilities. In terms of
more general outreach to underserved populations, we will make use of ongoing efforts in the Robotics Institute
and CMU-wide. These efforts include supporting minority visits to CMU, recruiting at various conferences and
educational institutions, and providing minority fellowships. As the Robotics Institute PhD admissions chair in
2016, Atkeson led a process which resulted in 31% of acceptances going to female applicants. As a member of
the Robotics Institute faculty hiring committee in 2016, Atkeson participated in a process that led to 10 out of
18 interviewees being female. Half of the faculty hired were women. As the head of Robotics Institute hiring
in 2018, Atkeson is leading a process in which 9 out of 20 interviewees are female. Atkeson is assisting efforts
at CMU to raise money for fellowships for students who will help us in our efforts to serve diverse populations
and communities, including our own.
Dissemination Plan: For a more complete description of our dissemination plan, see our Data Management
Plan. We will maintain a public website to freely share our captured data with video material. We will present
our work at conferences and publish it in journals, and will use these vehicles to advertise our work to potential
collaborators in science and industry.
Technology Transfer: The best way to transfer technology is by having students go to industry. Three recent
students work at Boston Dynamics, transferring our work in robotics to commercial applications; one recent student and a recent postdoc work on self-driving cars at Uber; one recent student works on self-driving cars at Apple; and one recent student works on humanoid robotics at the Toyota Research Institute. An older former student is the CTO of the Amazon drone effort. Several older former students work at Google. We are thrilled
that we and our students are part of the robotics revolution. Sheikh leads a Facebook/Oculus research lab in
Pittsburgh as well as being a professor at CMU, which is another form of technology transfer.
References
[1] A. J. Allin, C. G. Atkeson, H. Wactlar, S. Stevens, M. J. Robertson, D. Wilson, J. Zimmerman, and
A. Bharucha. Toward the automatic assessment of behavioral disturbances of dementia. In Fifth International Conference on Ubiquitous Computing (UbiComp'03), 2nd International Workshop on Ubiquitous Computing for Pervasive Healthcare Applications, 2003.
[2] Ido Arev, Hyun Soo Park, Yaser Sheikh, Jessica Hodgins, and Ariel Shamir. Automatic video-editing of
footage from multiple social cameras. In ACM SIGGRAPH, 2014.
[3] Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh. Realtime multi-person 2d pose estimation using
part affinity fields. In CVPR, 2017.
[4] Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh. Realtime multi-person 2d pose estimation using
part affinity fields. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[5] CMU. Traffic21. traffic21.heinz.cmu.edu. [Online; accessed Oct-16-2017].
[6] CMU-Perceptual-Computing-Lab. Openpose: A real-time multi-person keypoint detection and multi-
threading C++ library. github.com/CMU-Perceptual-Computing-Lab/openpose. [Online;
accessed Oct-16-2017].
[7] D. J. Farris and G. S. Sawicki. Human medial gastrocnemius force–velocity behavior shifts with locomo-
tion speed and gait. In Proc. Natl. Acad. Sci. USA, volume 109, pages 977–982, 2012.
[8] K. Fatahlian. LED street light research project part II: New findings. repository.cmu.edu/architecture/117. [Online; accessed Oct-16-2017].
[9] Google. Soli. atap.google.com/soli. [Online; accessed Oct-16-2017].
[10] Paulo Gotardo, Tomas Simon, Yaser Sheikh, and Iain Matthews. Photogeometric scene flow for high-detail
dynamic 3d reconstruction. In International Conference on Computer Vision (ICCV), 2015.
[11] J. K. Hodgins. CMU graphics lab motion capture database. mocap.cs.cmu.edu. [Online; accessed
Oct-16-2017].
[12] J. K. Hodgins. Grand challenge data collection. kitchen.cs.cmu.edu. [Online; accessed Oct-16-
2017].
[13] Hanbyul Joo, Hao Liu, Lei Tan, Lin Gui, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei Nobuhara,
and Yaser Sheikh. Panoptic studio: A massively multiview system for social motion capture. In The IEEE International Conference on Computer Vision (ICCV), 2015.
[14] Newton Labs. Cognachrome. www.newtonlabs.com/cognachrome. [Online; accessed Oct-16-
2017].
[15] Eric Markvicka. Soft-Matter Robotic Materials. PhD thesis, Carnegie Mellon University, 2018.
[16] Varun Ramakrishna, Daniel Munoz, Drew Bagnell, Martial Hebert, and Yaser Sheikh. Pose machines:
Articulated pose estimation via inference machines. In European Conference on Computer Vision (ECCV), 2014.
[17] Y. Sheikh. CMU panoptic dataset. domedb.perception.cs.cmu.edu. [Online; accessed Oct-16-
2017].
[18] Tomas Simon, Hanbyul Joo, Iain Matthews, and Yaser Sheikh. Hand keypoint detection in single images
using multiview bootstrapping. In CVPR, 2017.
[19] Tomas Simon, Hanbyul Joo, Iain Matthews, and Yaser Sheikh. Hand keypoint detection in single images
using multiview bootstrapping. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[20] Tomas Simon, Jack Valmadre, Iain Matthews, and Yaser Sheikh. Separable spatiotemporal priors for
convex reconstruction of time-varying 3d point clouds. In European Conference on Computer Vision (ECCV), 2014.
[21] Tomas Simon, Jack Valmadre, Iain Matthews, and Yaser Sheikh. Kronecker-markov prior for dynamic 3d
reconstruction. IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 39(11):2201–
2214, 2017.
[22] A. Steinfeld. Tiramisu transit. www.tiramisutransit.com. [Online; accessed Oct-16-2017].
[23] J. M. D. Taylor, A. S. Arnold, and J. M. Wakeling. Quantifying achilles tendon force in vivo from ultra-
sound images. Journal of Biomechanics, 49(14):3200–3208, 2016.
[24] Georgia Tech. Aware home. www.awarehome.gatech.edu. [Online; accessed Oct-16-2017].
[25] F. De La Torre. Intraface. www.humansensing.cs.cmu.edu/intraface. [Online; accessed
Oct-16-2017].
[26] Minh Vo, Srinivas Narasimhan, and Yaser Sheikh. Spatiotemporal bundle adjustment for dynamic 3d
reconstruction. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[27] Shih-En Wei, Varun Ramakrishna, Takeo Kanade, and Yaser Sheikh. Convolutional pose machines. In
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016.
[28] Wikipedia. Motion control photography. https://en.wikipedia.org/wiki/Motion_control_photography. [Online; accessed Feb-4-2018].
[29] A. Yamaguchi and C. G. Atkeson. Combining finger vision and optical tactile sensing: Reducing and
handling errors while cutting vegetables. In IEEE-RAS International Conference on Humanoid Robotics,
2016.
[30] A. Yamaguchi and C. G. Atkeson. Stereo vision of liquid and particle flow for robot pouring. In IEEE-RAS International Conference on Humanoid Robotics, 2016.
[31] Akihiko Yamaguchi and Christopher G. Atkeson. Implementing tactile behaviors using fingervision. In
IEEE-RAS International Conference on Humanoid Robotics, 2017.
[32] Ada Zhang, Alexander Cebulla, Stanislav Panev, Jessica Hodgins, and Fernando de la Torre. Weakly-
supervised learning for Parkinson's disease tremor detection. In IEEE Eng. Med. Biol. Soc., pages 143–
147, 2017.
BIOGRAPHICAL SKETCH
No Bio Data Provided
KATERINA FRAGKIADAKI
Assistant Professor
Machine Learning Department, Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213
267-528-9476
[email protected], www.cs.cmu.edu/~katef
Professional Preparation
National Technical University of Athens, EECS, Diploma, 2007
University of Pennsylvania, CIS, M.S., 2011
University of Pennsylvania, CIS, Ph.D., 2013
EECS, UC Berkeley, Postdoctoral Fellow, 2013–2015
Google Research, Postdoctoral Fellow, October 2015–December 2016

Appointments
Assistant Professor, MLD, CMU, September 2016–present
Products
Five Closely Related Products to the Proposed Project
1. Tung F., Tung W., Yumer E., Fragkiadaki K., 2017, Self-supervised Learning of Motion Capture, Neural Information Processing Systems (NIPS)
2. Tung Hsiao-Yu F., Harley A., Seto W., Fragkiadaki K., 2017, Adversarial Inverse Graphics Networks: Learning 2D-to-3D Lifting and Image-to-Image Translation from Unpaired Supervision, International Conference on Computer Vision (ICCV)
3. Vijayanarasimhan S., Ricco S., Schmid C., Sukthankar R., Fragkiadaki K., 2017, SfM-Net: Learning of Structure and Motion from Video, arXiv
4. Fragkiadaki K., Salas M., Arbelaez P., Malik J., 2014, Grouping-based Low-Rank Trajectory Completion and 3D Reconstruction, Neural Information Processing Systems (NIPS)
5. Fragkiadaki K., Levine S., Felsen P., Malik J., 2015, Recurrent Network Models for Human Dynamics, IEEE International Conference on Computer Vision (ICCV)
Five Other Products
1. Carreira J., Agrawal P., Fragkiadaki K., Malik J., 2016, Human Pose Estimation with Iterative Error Feedback, IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
2. Fragkiadaki K., Agrawal P., Levine S., Malik J., 2016, Learning Visual Predictive Models of Physics for Playing Billiards, International Conference on Learning Representations (ICLR)
3. Ying C., Fragkiadaki K., 2017, Depth-Adaptive Computational Policies for Efficient Visual Tracking, EMMCVPR
4. Fragkiadaki K., Arbelaez P., Felsen P., Malik J., 2015, Learning to Segment Moving Objects in Videos, IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
5. Felsen P., Fragkiadaki K., Malik J., Efros A., 2015, Learning Feature Hierarchies from Long Term Trajectory Associations in Videos, Transfer and Multitask Learning Workshop in NIPS
Synergistic Activities
• Creation of a new course in Fall 2017 at MLD, CMU, covering recent advances in language grounding. Course Title: Language Grounding in Vision and Control (Undergraduate+Graduate)
• Creation of a new course in Spring 2017 at MLD, CMU, covering recent advances in deep reinforcement learning and deep robotic learning. Course Title: Deep Reinforcement Learning and Control (Undergraduate+Graduate)
• Area Chair for CVPR 2018
• Organizer: The 11th Perceptual Organization for Computer Vision Workshop, CVPR 2016, "The Role of Feedback in Recognition and Segmentation", which brought together human and computer vision scientists to investigate the incorporation of feedback in visual architectures
• Best Ph.D. Thesis, Computer and Information Science Department, University of Pennsyl-
vania, 2013.
SUMMARY PROPOSAL BUDGET (NSF Form 1030)
Organization: Carnegie-Mellon University. PI/PD: Christopher Atkeson. No salaries, fringe benefits, travel, or participant support costs are requested; all requested funds are for equipment and for equipment maintenance (Other Direct Costs, line G6), with indirect costs computed on a Modified Total Direct Costs (MTDC) base at a rate of 58.10%.

Year 1: Equipment: 1 network switch $8,000; 2 mobile bases $80,000; 5 computers $60,000; fabricated equipment (mobile capture system) $200,000; total equipment $348,000. Other direct costs (maintenance) $1,000. Total direct costs $349,000. Indirect costs (58.10% of $1,000 MTDC base) $581. Amount of this request $349,581. Cost sharing $200,000.

Year 2: Equipment: 15 computers $180,000; 8 mobile bases $320,000; fabricated equipment (mobile capture system) $300,000; total equipment $800,000. Other direct costs (maintenance) $2,000. Total direct costs $802,000. Indirect costs (58.10% of $2,000 MTDC base) $1,162. Amount of this request $803,162. Cost sharing $200,000.

Year 3: Equipment: fabricated equipment (mobile capture system) $382,303; total equipment $382,303. Other direct costs (maintenance) $10,000. Total direct costs $392,303. Indirect costs (58.10% of $10,000 MTDC base) $5,810. Amount of this request $398,113. Cost sharing $200,000.

Year 4: Equipment: fabricated equipment (mobile capture system) $300,000; total equipment $300,000. Other direct costs (maintenance) $10,000. Total direct costs $310,000. Indirect costs (58.10% of $10,000 MTDC base) $5,810. Amount of this request $315,810. Cost sharing $200,000.

Cumulative: Total equipment $1,830,303. Other direct costs $23,000. Total direct costs $1,853,303. Total indirect costs $13,363. Amount of this request $1,866,666. Cost sharing $800,000.
Budget Justification
This proposal is to fabricate equipment, a mobile behavioral capture system, with a total project cost of $2,666,666 over 4 years. CMU is providing cost sharing of $200,000 in graduate student support per year, totaling $800,000. The source of the cost sharing is the Dean's office of the School of Computer Science of Carnegie Mellon University.
Because the design of the fabricated equipment will be finalized in the first year using the latest available technology, we have prepared this budget using prices of comparable, currently available technology. We provide quotes for individual components that cost more than $5,000 as supplementary documents.
In the first year we expect to purchase five computers at approximately $12,000 each (quote from Exxact).
Each of these computers is equipped with four state-of-the-art GPUs. We expect the computers and GPUs we actually purchase a year from now to cost about the same but be even more powerful. To network these computers we will buy an Infiniband network switch for approximately $8,000 (quote from Dell). We also plan to purchase two mobile bases, which we estimate will cost approximately $40,000 each (quote from Segway). This mobile base is one of the few omnidirectional bases we have found that is fast enough to keep up with human walking (1.3 m/s) and strong enough to carry up to two of the above computers and 8 MMCams. A year from now we will again survey available mobile bases. We have estimated the cost
of MMCams by pricing visible light cameras (a cluster of 6 cameras attached to an NVIDIA board with a TX2
Jetson GPU: Leopard Imaging LI-JETSON-KIT-IMX477CS-X $1600) and time of flight depth cameras (Basler
tof640-20gm 850nm $2340). There are additional costs for synchronization hardware and other wiring. We
have based this cost estimate on costs we saw building the Panoptic Studio.
In the first year we will also begin to prototype contact, force, and physiological sensors. The individual components are relatively inexpensive (in the hundreds of dollars; consumer-level thermal, ultrasound, and radar imaging sensors are typically $200-300). We have based the total cost for this year on our historical costs for this type of development.
We have also included $1,000 in maintenance costs, which also pays for installing some computers in an appropriately cooled computer room.
In year 2 we will build the full measurement-at-a-distance system, adding 15 computers at $12,000 each (quote from Exxact) and 8 mobile bases at $40,000 each (quote from Segway). Additional funds are requested for 64 MMCams and for continuing development of contact, force, and physiological sensors. There will be additional
maintenance costs.
In years 3 and 4 we will focus on building out the full system, developing ground-truth testing equipment, and fixing any design flaws. We expect to develop custom electronics for the contact, force, and physiological sensors. At this point we will begin to pay for a full year of maintenance on the full system. For fabricated equipment with low-cost components, we have based our estimates on our historical costs for developing this type of equipment.
Equipment
Line D reflects the equipment needed for the project.
Other Direct Costs
Line G6 reflects the costs associated with maintenance of the equipment for the project.
Indirect Costs
Indirect Costs on this proposal have been calculated at our current proposed or negotiated rate for all fiscal years
in accordance with the OMB Uniform Guidance on Cost Principles, Audit, and Administrative Requirements
for Federal Awards. The modified total direct cost base (MTDC) amount used in calculating the indirect costs
is the total direct costs, excluding capital equipment, charges for tuition remission, and participant support.
Overhead Rate: 58.10% (capped rate for grants and cooperative agreements)
Requested Table

ITEM                     YEAR 1               YEAR 2               YEAR 3               YEAR 4               TOTAL
                         NSF       Cost       NSF       Cost       NSF       Cost       NSF       Cost       NSF        Cost
                         Request   Sharing    Request   Sharing    Request   Sharing    Request   Sharing    Request    Sharing
1 computers              60,000    0          180,000   0          0         0          0         0          240,000    0
2 mobile bases           80,000    0          320,000   0          0         0          0         0          400,000    0
3 network switch         8,000     0          0         0          0         0          0         0          8,000      0
4 fabricated equipment   200,000   0          300,000   0          382,303   0          300,000   0          1,182,303  0
5 maintenance            1,000     0          2,000     0          10,000    0          10,000    0          23,000     0
6 other                  581       0          1,162     0          5,810     0          5,810     0          13,363     0
7 graduate students      0         200,000    0         200,000    0         200,000    0         200,000    0          800,000
TOTAL                    349,581   200,000    803,162   200,000    398,113   200,000    315,810   200,000    1,866,666  800,000
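The arithmetic behind this table can be checked with a few lines of Python, shown below purely as a consistency check (all figures come from the table above): indirect costs are 58.1% of the maintenance line, which is the only item in the MTDC base, and the yearly totals sum to the requested $1,866,666.

```python
# Consistency check of the requested budget: equipment is excluded from the
# MTDC base, so indirect costs apply only to the maintenance line.
years = {  # equipment total and maintenance (MTDC base) per year
    1: {"equipment": 60_000 + 80_000 + 8_000 + 200_000, "maintenance": 1_000},
    2: {"equipment": 180_000 + 320_000 + 300_000,        "maintenance": 2_000},
    3: {"equipment": 382_303,                             "maintenance": 10_000},
    4: {"equipment": 300_000,                             "maintenance": 10_000},
}

grand_total = 0
for year, b in years.items():
    indirect = round(0.581 * b["maintenance"])   # 58.10% of the MTDC base
    total = b["equipment"] + b["maintenance"] + indirect
    grand_total += total
    print(f"Year {year}: direct {b['equipment'] + b['maintenance']:>9,}  "
          f"indirect {indirect:>6,}  total {total:>9,}")

print(f"Requested total: {grand_total:,}")  # expected 1,866,666
```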
Current and Pending Support (See GPG Section II.D.8 for guidance on information to include on this form.)
The following information should be provided for each investigator and other senior personnel. Failure to provide this information may delay consideration of this proposal. Other agencies (including NSF) to which this proposal has been/will be submitted.

Investigator: Christopher Atkeson

Support: Current / Pending / Submission Planned in Near Future / *Transfer of Support
Project/Proposal Title: RI: Medium: Combining Optimal and Neuromuscular Controllers for Agile and Robust Humanoid Behavior
Source of Support: National Science Foundation
Total Award Amount: $1,000,000
Total Award Period Covered: 08/01/16 – 07/31/19
Location of Project: Carnegie Mellon University
Person-Months Per Year Committed to the Project: Cal: Acad: Sumr: 1.0

Support: Current / Pending / Submission Planned in Near Future / *Transfer of Support
Project/Proposal Title: NSF: INT: Individualized Co-Robotics
Source of Support: National Science Foundation
Total Award Amount: $1,500,000
Total Award Period Covered: 09/01/17 – 08/31/20
Location of Project: Carnegie Mellon University
Person-Months Per Year Committed to the Project: Cal: Acad: Sumr: 1.0

Support: Current / Pending / Submission Planned in Near Future / *Transfer of Support
Project/Proposal Title: MRI: Development of a Mobile Human Behavior Capture System (this proposal)
Source of Support: National Science Foundation
Total Award Amount: $1,866,666
Total Award Period Covered: 09/01/18 – 08/31/22
Location of Project: Carnegie Mellon University
Person-Months Per Year Committed to the Project: NO EFFORT (Cal: Acad: Sumr:)

*If this project has previously been funded by another agency, please list and furnish information for the immediately preceding funding period. NSF Form 1239. USE ADDITIONAL SHEETS AS NECESSARY.
CURRENT AND PENDING SUPPORT
Investigator: Katerina Fragkiadaki
Other agencies (including NSF) to which this proposal has been/will be submitted: None

Support: Pending
Title: RI: Small: Large Scale Imitation Learning and Language Understanding from Narrated Videos
Source of Support: National Science Foundation
Total Award Amount: $499,835
Period of Performance: 9/1/2018 to 8/31/2021
Location of Project: Carnegie Mellon University
Number of Person-Months: 1.2 SU

Support: Pending
Title: MRI: Development of a Mobile Human Behavior Capture System
Source of Support: National Science Foundation
Total Award Amount: $1,866,666
Period of Performance: 9/1/2018 to 8/31/2022
Location of Project: Carnegie Mellon University
Number of Person-Months: N/A
Current and Pending Support (See GPG Section II.D.8 for guidance on information to include on this form.)
The following information should be provided for each investigator and other senior personnel. Failure to provide this information may delay consideration of this proposal. Other agencies (including NSF) to which this proposal has been/will be submitted.

Investigator: Jessica Hodgins

Support: Current / Pending / Submission Planned in Near Future / *Transfer of Support
Project/Proposal Title: SCH: EXP: Monitoring Motor Symptoms in Parkinson's Disease with Wearable Devices
Source of Support: National Science Foundation
Total Award Amount: $678,850
Total Award Period Covered: 09/01/16 – 08/31/19
Location of Project: Carnegie Mellon University
Person-Months Per Year Committed to the Project: Cal: Acad: Sumr: 1.00

Support: Current / Pending / Submission Planned in Near Future / *Transfer of Support
Project/Proposal Title: CMLH: In-Home Movement Therapy Data Collection
Source of Support: UPMC
Total Award Amount: $300,000
Total Award Period Covered: 06/01/17 – 05/31/18
Location of Project: Carnegie Mellon University
Person-Months Per Year Committed to the Project: Cal: Acad: 1.25 Sumr: 0.40

Support: Current / Pending / Submission Planned in Near Future / *Transfer of Support
Project/Proposal Title: Affective State Estimation From Wearable Sensors
Source of Support: Sony Corporation
Total Award Amount: $155,689
Total Award Period Covered: 08/21/17 – 03/31/18
Location of Project: Carnegie Mellon University
Person-Months Per Year Committed to the Project: Cal: Acad: 0.75 Sumr: 0.25

Support: Current / Pending / Submission Planned in Near Future / *Transfer of Support
Project/Proposal Title: MRI: Development of a Mobile Human Behavior Capture System (this proposal)
Source of Support: National Science Foundation
Total Award Amount: $1,866,666
Total Award Period Covered: 09/01/18 – 08/31/22
Location of Project: Carnegie Mellon University
Person-Months Per Year Committed to the Project: NO EFFORT (Cal: Acad: Sumr:)

*If this project has previously been funded by another agency, please list and furnish information for the immediately preceding funding period. NSF Form 1239. USE ADDITIONAL SHEETS AS NECESSARY.
Current and Pending Support (See GPG Section II.D.8 for guidance on information to include on this form.)
Please note that Dr. Yaser Sheikh is a Research Professor, and not tenure-track faculty with teaching responsibilities. As such, his effort is supported solely by research funding. Other agencies (including NSF) to which this proposal has been/will be submitted.

Investigator: Yaser Sheikh

Support: Current / Pending / Submission Planned in Near Future / *Transfer of Support
Project/Proposal Title: Data-driven 3D Event Browsing from Multiple Mobile Cameras
Source of Support: Office of Naval Research
Total Award Amount: $498,499
Total Award Period Covered: 05/01/2015 – 04/30/2018
Location of Project: Carnegie Mellon University
Person-Months Per Year Committed to the Project: Cal: Acad: Sumr: 0.69

Support: Current / Pending / Submission Planned in Near Future / *Transfer of Support
Project/Proposal Title: Robust Automatic Activity Detection for a Multi-Camera Streaming Video Environment
Source of Support: DoI/IARPA
Total Award Amount: $9,972,762
Total Award Period Covered: 09/20/2017 – 09/19/2021
Location of Project: Carnegie Mellon University
Person-Months Per Year Committed to the Project: Cal: Acad: 2.25 Sumr: 0.75

Support: Current / Pending / Submission Planned in Near Future / *Transfer of Support
Project/Proposal Title: MRI: Development of a Mobile Human Behavior Capture System (this proposal)
Source of Support: National Science Foundation
Total Award Amount: $1,866,666
Total Award Period Covered: 09/01/2018 – 08/31/2022
Location of Project: Carnegie Mellon University
Person-Months Per Year Committed to the Project: NO EFFORT (Cal: Acad: Sumr:)

*If this project has previously been funded by another agency, please list and furnish information for the immediately preceding funding period. NSF Form 1239. USE ADDITIONAL SHEETS AS NECESSARY.
CMU Facilities
Motion Capture Lab
The 1700-square-foot Motion Capture Lab provides a resource for behavior capture of humans as well as for measuring and controlling robot behavior in real time. It includes a Vicon optical motion capture system with sixteen 200 Hz, 4-megapixel cameras (MX-40). In addition to traditional motion
capture, the Vicon system can be used in real time to track robot motion, and provide the equivalent
of very high quality inertial feedback. In addition to capturing motion, we have instrumentation to
capture contact forces at the hands and feet (one force gauge (IMADA DPS-44), one ATI Industrial
Automation Mini85 wrist force torque sensor, and two AMTI AccuSway PLUS force plates that mea-
sure the six-axis contact force and torque at a rate of 1 kHz), and also electromyographic activity
(EMG, a measure of muscle activation, Aurion ZeroWire (wireless) system with 16 pairs of electrodes
at a rate of 5 kHz). A high-speed video camera is also used to capture skin deformation at 1 kHz.
Behavior capture goes beyond motion capture with this capture of forces and muscle activation.
The Humanoid Robotics Lab (located in the Motion Capture Lab described above) provides a state
of the art full-sized humanoid for research and education. We have developed a hydraulic humanoid
in collaboration with Sarcos (with NSF equipment funding). We use this robot because of the speed,
power, and achievable joint compliance of the hydraulics and the range of motion of the joints.
Panoptic Studio
The Panoptic Studio is a multiview capture system with 521 heterogeneous sensors, consisting of 480 VGA cameras, 31 HD cameras, and 10 Kinect v2 RGB+D sensors, distributed over the surface of a geodesic sphere with a 5.49 m diameter (Figure 1). The large number of lower-resolution VGA cameras at unique viewpoints provides a large capture volume that is robust to occlusion and places no restriction on the subjects' viewing direction. The HD views provide details (zoom) of the scene. Multiple Kinects provide initial point clouds used to generate a dense trajectory stream.
The structure consists of pentagonal panels, hexagonal panels, and trimmed base panels. Our
design was modularized so that each hexagonal panel houses a set of 24 VGA cameras. The HD
cameras are installed at the center of each hexagonal panel, and projectors are installed at the center of
each pentagonal panel. Additionally, a total of 10 Kinect v2 RGB+D sensors are mounted at heights
of 1 and 2.6 meters, forming two rings with 5 evenly spaced sensors each.
System Architecture: Figure 2 shows the architecture of our system. The 480 cameras are ar-
ranged modularly with 24 cameras in each of 20 standard hexagonal panels on the dome. Each module
in each panel is managed by a Distributed Module Controller (DMC) that triggers all cameras in the
module, receives data from them, and consolidates the video for transmission to the local machine.
Each individual camera is a global-shutter CMOS sensor with a fixed focal length of 4.5 mm that captures VGA-resolution (640×480) images at 25 Hz. Each panel produces an uncompressed video stream at 1.47 Gbps, and thus, for the entire set of 480 cameras the data rate is approximately 29.4
Gbps. To handle this stream, the system pipeline has been designed with a modularized communi-
cation and control structure. For each subsystem, the clock generator sends a frame counter, trigger
signal, and the pixel clock signal to each DMC associated with a panel. The DMC uses this timing
information to initiate and synchronize capture of all cameras within the module. Upon trigger and exposure, each of the 24 camera heads transfers image data back via the camera interconnect to the DMC, which consolidates the image data and timing from all cameras. This composite data is then transferred via optical interconnect to the module node, where it is stored locally. Each module node
has a dual purpose: it serves as a distributed RAID storage unit and participates as a multicore computational node in a cluster. Each module node contains 3 HDDs configured as RAID-0 to provide sufficient write speed without data loss, totaling 60 HDDs across the 20 modules. All the local nodes of our system are on
a local network on a gigabit switch. The acquisition is controlled via a master node that the system
operator can use to control all functions of the studio. Similar to the VGA cameras, HD cameras are
modularized, and each pair of cameras is connected to a local node machine via SDI cables. Each local node saves the data from its two cameras to two separate RAID storage units. Each RGB+D
sensor is connected to a dedicated capture node that is mounted on the dome exterior. To capture at
rates of approximately 30 Hz, the nodes are equipped with two SSD drives each and store color, depth,
and infrared frames as well as body and face detections from the Kinect SDK. A separate master node
controls and coordinates the 10 capture nodes via the local network.

Figure 2: Modularized system architecture. The studio houses 480 VGA cameras synchronized to a central clock system and controlled by a master node. 31 synchronized HD cameras are also installed with a second clock system. The VGA clock and HD clock are temporally aligned by recording them as a stereo signal. 10 RGB-D sensors are also located in the studio. All the sensors are calibrated to the same coordinate system.
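The per-panel and aggregate data rates quoted above follow directly from the frame size, frame rate, and camera count. The short Python sketch below reproduces that arithmetic; the 8-bit raw (Bayer) pixel format is an assumption, chosen because it matches the quoted figures.

# Back-of-the-envelope check of the capture bandwidth quoted above.
# Assumes 8-bit raw (Bayer) pixels per VGA frame, which is consistent
# with the quoted per-panel and aggregate rates.

WIDTH, HEIGHT = 640, 480       # VGA resolution
FPS = 25                       # VGA frame rate (Hz)
BITS_PER_PIXEL = 8             # assumed raw sensor output
CAMERAS_PER_PANEL = 24
NUM_PANELS = 20

bits_per_frame = WIDTH * HEIGHT * BITS_PER_PIXEL
panel_gbps = bits_per_frame * FPS * CAMERAS_PER_PANEL / 1e9
total_gbps = panel_gbps * NUM_PANELS
module_mbytes_per_s = panel_gbps * 1e9 / 8 / 1e6   # sustained write load per module node

print(f"per-panel stream:  {panel_gbps:.2f} Gbps")          # ~1.47 Gbps
print(f"all 480 cameras:   {total_gbps:.1f} Gbps")           # ~29.5 Gbps (quoted as ~29.4)
print(f"per-module writes: {module_mbytes_per_s:.0f} MB/s")  # motivates RAID-0 over 3 HDDs per module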
Temporal Calibration for Heterogeneous Sensors: Synchronizing the cameras is necessary to
use geometric constraints (such as triangulation) across multiple views. In our system, we use hard-
ware clocks to trigger cameras at the same time. Because the frame rates of the VGA and HD cameras
are different (25 fps and 29.97 fps, respectively), we use two separate hardware clocks to achieve shutter-level synchronization among all VGA cameras and, independently, among all HD cameras. To
precisely align the two time references, we record the timecode signals generated from the two clocks
as a single stereo audio signal, which we then decode to obtain a precise alignment at sub-millisecond
accuracy.
Time alignment with the Kinect v2 streams (RGB and depth) is achieved with a small hardware
modification: each Kinect's microphone array is rewired to instead record an LTC timecode signal. This timecode signal is the same one produced by the genlock and timecode generator used to synchronize the HD cameras, and it is distributed to each Kinect via a distribution amplifier. We process the Kinect audio to decode the LTC timecode, yielding temporal alignment between the recorded Kinect data (which is timestamped by the capture API for accurate relative timing between color, depth, and audio frames) and the HD video frames. Empirically, we have confirmed the temporal alignment obtained by this method to be of at least millisecond accuracy.
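As an illustration of how decoded timecodes can be aligned, the sketch below converts LTC timecodes to seconds on their respective timelines and computes the offset between two clocks. The function names, frame rates, and example timecodes are illustrative assumptions; decoding the raw LTC audio itself is not shown, and this is not the studio's actual capture software.

# Minimal sketch: align two sensor streams given already-decoded LTC timecodes.
# Timecodes are (hours, minutes, seconds, frames) tuples; decoding the raw
# LTC audio signal is assumed to have happened upstream.

def ltc_to_seconds(hours, minutes, seconds, frames, fps):
    """Convert an LTC timecode to seconds on that clock's timeline."""
    return hours * 3600 + minutes * 60 + seconds + frames / fps

def clock_offset(tc_a, fps_a, tc_b, fps_b):
    """Offset (seconds) between two timebases, given timecodes recorded at
    the same physical instant (e.g., the two channels of one stereo track)."""
    return ltc_to_seconds(*tc_b, fps_b) - ltc_to_seconds(*tc_a, fps_a)

# Hypothetical example: VGA clock at 25 fps, HD clock at 29.97 fps.
offset = clock_offset((0, 10, 3, 12), 25.0, (0, 10, 3, 17), 29.97)
print(f"offset between the two timebases: {offset * 1000:.1f} ms")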
Spatial Calibration: We use structure from motion (SfM) to calibrate all of the 521 cameras. To
easily generate feature points for SfM, five projectors are also installed on the geodesic dome. For
calibration, they project a random pattern on a white structure (we use a portable white tent), and
multiple scenes (typically three) are captured by moving the structure within the dome. We perform
SfM for each scene separately and perform a bundle adjustment by merging all the matches from each
scene. We use the VisualSfM software with 1 distortion parameter to produce an initial estimate and
a set of candidate correspondences, and subsequently run our own bundle adjustment implementation
with 5 distortion parameters for the final refinement. The computation takes about 12 hours for 6 scenes (521 images each) on a 6-core machine. In this calibration process, we use only the color cameras of the Kinects. We additionally calibrate the transformation between the color and depth sensor of each Kinect with a standard checkerboard pattern, bringing all cameras into alignment within a single global coordinate frame.
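For reference, the sketch below shows the reprojection of a 3D point through a camera with the common 5-parameter radial-tangential distortion model (k1, k2, p1, p2, k3). The parameterization is an assumption, and this is only the residual-evaluation step that a bundle adjustment would repeat per observation, not the optimization itself.

import numpy as np

def project_point(X_world, R, t, K, dist):
    """Project a 3D point into pixel coordinates using a pinhole camera
    with 5-parameter radial-tangential distortion (k1, k2, p1, p2, k3).
    A minimal sketch of the reprojection evaluated inside bundle adjustment."""
    k1, k2, p1, p2, k3 = dist
    # Rigid transform into the camera frame, then perspective division.
    Xc = R @ X_world + t
    x, y = Xc[0] / Xc[2], Xc[1] / Xc[2]
    # Radial and tangential distortion of the normalized coordinates.
    r2 = x * x + y * y
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    # Apply the intrinsics to obtain pixel coordinates.
    u = K[0, 0] * x_d + K[0, 2]
    v = K[1, 1] * y_d + K[1, 2]
    return np.array([u, v])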
Software: A more detailed description of the software is presented in [1]. We have developed a
method to automatically reconstruct full body motion of interacting multiple people. Our method does
not rely on a 3D template model or on subject-specific assumptions such as body shape, color, height, or body topology. Our method works robustly in a variety of challenging social interaction scenes with an arbitrary number of people, producing temporally coherent, time-varying body structures. Furthermore, our method is free from error accumulation and thus enables capture of long-term group interactions (e.g., more than 10 minutes).
Our algorithm fuses the weak perceptual signals from the large number of views by progressively generating skeletal proposals from low-level appearance cues, and it includes a temporal refinement framework that associates body parts with a reconstructed dense 3D trajectory stream. Our system and method are the first to reconstruct the full-body motion of more than five people engaged in social interactions without using markers. We also empirically demonstrate the impact of the number of views on achieving this goal.
Our algorithm is composed of two major stages. The first stage takes as input calibrated and synchronized images from multiple views at a single time instant and produces 3D body skeletal proposals for multiple human subjects. The second stage further refines the output of the first stage using a dense 3D patch trajectory stream, producing temporally stable 3D skeletons and an associated set of labeled 3D patch trajectories for each body part that describe subtle surface motions.

Figure 3: Several levels of proposals generated by our method. (a) Images from up to 480 views. (b) Per-joint detection score maps. (c) Node proposals generated after non-maxima suppression. (d) Part proposals formed by connecting pairs of node proposals. (e) Skeletal proposals generated by piecing together part proposals. (f) Labeled 3D patch trajectory stream showing associations with each part trajectory. In (c)-(f), colors indicate joint and part labels (shown below the figure).
In the first stage, a 2D pose detector is run independently on all 480 VGA views at each time instant, generating detection score maps for each body joint; these per-view score maps are then fused across the calibrated views into a 3D score map per joint. Our approach generates several levels of proposals from these maps, as shown in Figure 3. A set of node proposals for each joint is generated by non-maxima suppression of the 3D score map, where the k-th node proposal is a putative 3D position of that anatomical landmark. A part proposal is a putative body part connecting two node proposals. As the output of the first stage, our algorithm produces skeletal proposals; a skeletal proposal is generated by finding an optimal combination of part proposals using dynamic programming.
After reconstructing skeletal proposals at each time t independently, we associate skeletons belonging to the same identity across time and generate skeletal trajectory proposals, which are sets of part trajectory proposals (a part moving across time).
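As a rough illustration of the node-proposal step, the sketch below extracts local maxima from a per-joint 3D score volume. The voxel grid, threshold, and window size are illustrative assumptions, not the parameters of the actual system.

import numpy as np
from scipy.ndimage import maximum_filter

def node_proposals(score_volume, threshold=0.5, window=3):
    """Return voxel coordinates and scores of local maxima in a per-joint
    3D detection score volume (a sketch of non-maxima suppression)."""
    # A voxel is a local maximum if it equals the max over its neighborhood.
    local_max = maximum_filter(score_volume, size=window) == score_volume
    candidates = np.argwhere(local_max & (score_volume > threshold))
    scores = score_volume[tuple(candidates.T)]
    order = np.argsort(-scores)            # strongest proposals first
    return candidates[order], scores[order]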
In the second stage, we refine the skeletal trajectory proposals generated in the first stage using
dense 3D patch trajectories. To produce evidence of the motion of different anatomical landmarks, we
compute a set of dense 3D trajectories which we refer to as a 3D patch trajectory stream, by tracking
each 3D patch independently. Each patch trajectory is initiated at an arbitrary time (every 20th frame in our results) and tracked for a limited duration (30 frames backward and forward in our results). Our method associates each part trajectory with a set of patch trajectories, and these trajectories determine the rigid transformation of that part between consecutive times t and t+1. The labeled 3D trajectories associated with each part provide surface deformation cues and also help refine reconstruction quality by reducing motion jitter, filling in missing parts, and detecting erroneous parts.
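The per-part rigid transformation between consecutive frames can be estimated from the associated patch trajectories with a standard least-squares fit. The sketch below uses the SVD-based (Kabsch) solution on corresponding 3D points; it is a generic illustration, not necessarily the exact estimator used in our pipeline.

import numpy as np

def rigid_transform(P, Q):
    """Least-squares rigid transform (R, t) mapping points P (Nx3) at time t
    onto corresponding points Q (Nx3) at time t+1, via the SVD-based
    (Kabsch) solution."""
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    H = (P - cP).T @ (Q - cQ)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:               # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = cQ - R @ cP
    return R, t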
Social Interaction Dataset: We publicly share a novel dataset that is the largest available for full-body motion capture in terms of the number of views (521), total duration (3+ hours), and number of subjects per scene (up to 8). Our dataset is distinct from previously presented
datasets in that it captures the natural interactions of groups without controlling their behavior or appearance, and it contains motions rich in social signals. The system described here provides empirical data of unprecedented resolution, with the promise of facilitating data-driven exploration of scientific conjectures about the communication code of social behavior. All the data and output are
publicly shared on our website (https://domedb.perception.cs.cmu.edu).
References

[1] Hanbyul Joo, Hao Liu, Lei Tan, Lin Gui, Bart Nabbe, Iain Matthews, Takeo Kanade, Shohei
Nobuhara, and Yaser Sheikh. Panoptic studio: A massively multiview system for social motion
capture. In The IEEE International Conference on Computer Vision (ICCV), 2015.