Traditional Beliefs: Beyond Specification is Enforcement


Traditional Beliefs: Beyond Specification is Enforcement. Bradley Malin, School of Computer Science, Carnegie Mellon University. October 31, 2005.

We know how to protect privacy: If You Encrypt, They Must Acquit (cryptography, secure storage); Make Strong Barriers (authentication, network security, intrusion detection); Inform Collectors and Users (policy specification, auditing); Don't Share Identity (federal agencies, data brokers, crediting).

Security for Privacy? Authentication: login with password. Authorization: allowed to read/write data. Encryption: to avoid eavesdropping. But Data Can Re-identify! "Can I see some anonymous data?" ... "Ah! I know who this is!"

Ah-ha! It's the Data: Define a Privacy Policy. Cheer for the many benefits! Procedure: specifies how data can (and cannot) be used. Logical Cognition: demands active involvement and thought regarding information. Standardization: equal opportunity. Legal Enforcement.

L. Cranor. Web Privacy with P3P. O'Reilly & Associates, Sebastopol, CA. 2002. W. Stufflebeam, et al. Specifying privacy policies with P3P and EPAL: lessons learned. Workshop on Privacy in the Electronic Society. 2003.

Why? Legal Aspects. United States, at the federal and state level: Privacy Act of 1974; privacy on the WWW; Financial (GLB); Medical (HIPAA); Minors (COPPA); Educational (FERPA); wiretap and surveillance laws. Europe: Data Directive 95/46 and the Safe Harbor arrangement with the US.

Let's Consider FERPA (the Buckley Amendment), the Family Educational Rights and Privacy Act. It applies to schools receiving funds from the US Dept. of Education: if a school permits the release of students' educational records without written consent of parents, it risks refusal of federal funding. Parents or eligible students have rights to inspect/review the student's education records held by the school and to request that the school correct records believed to be inaccurate or misleading.

FERPA: Schools may disclose, without consent, "directory" information, such as: name, date and place of birth, address, honors and awards, telephone number, and dates of attendance. Schools must alert parents/students about directory information and allow a request not to disclose it. Schools must notify parents and eligible students annually of their rights under FERPA.

FERPA in Practice: Many schools' privacy policies state that they choose not to post any (or minimal) directory information on their students. Example, MIT: "7.2 School, department and lab web pages - Faculty, staff and students must exercise caution in posting directory and other information to a web page that is accessible to MIT and/or to the public. Students have the right to withhold directory and other information from public distribution. Faculty and staff must receive permission to post personal information and identification photographs to web pages."
FERPA in the Face of Technology: RosterFinder, a software program that finds online name lists. It leverages the Google API. Applied to gather undergraduate information, it discovered many directories of undergraduates online that were not supposed to be there: improper communication and privacy policy enforcement. L. Sweeney. Finding Lists of People on the Web. ACM Computers and Society, 34(1), April 2004.

Precision of rosters found by RosterFinder (M: manually; R: RosterFinder; Pos: ranked position; Tot: total number of docs). Increasing ability to gather data and infringe on privacy! But this can also automate policy enforcement.

Great, you Defined a Privacy Policy? But Wait a Minute. Consider some of the limitations: need a robust language (P3P and EPAL are the beginning); scope of world / interaction; syntax, not semantics; need enforcement. Enter data privacy: WHERE does data come from? WHAT does data reveal? HOW do we prove data does not reveal more than specified? L. Sweeney. Finding Lists of People on the Web. ACM Computers and Society, 34(1), April 2004.

What is Data Privacy? The study of computational solutions for releasing data such that the data remains practically useful while aspects of its subjects are not revealed. Privacy protection ("data protectors"): release information such that entity-specific properties (e.g. identity) are controlled; restrict what can be learned (inference control). Data linkage ("data detectives"): combining disparate pieces of entity-specific information to learn more about an entity.

Privacy Is Complex. [Diagram, courtesy of Michael Shamos, relating: public policy, law, disclosure control, privacy specification, anonymity (de-identification), human interface, organizational practices, enforcement & implementation, security, trusted hardware, crypto, privacy-preserving data mining, audit & accountability.]

Data Privacy is Interdisciplinary. Data. Data. Data. [Table rating how heavily each research area (AI, learning, theory, database, language, security, IS) bears on topics such as anonymity, rights management, databases, and ubiquitous computing, from "some" to "heavy".] AI primarily concerns knowledge representation and semantics; learning focuses on data mining algorithms; theory includes zero-knowledge proofs and multi-party computations.

What kind of data? Field-structured databases, text documents, genomic, image, video, network (physical or social), communications. All kinds!

Information Explosion (Sweeney 1997): increase in technological capability for collection, storage, and transfer; decrease in cost. [Charts: growth in active web servers and growth in available disk storage, 1983-2003; Global Disk Storage Per Person (GDSP) ~ (hard drive space) / (world population). Table comparing storage, population, and person-time per page for 1983, 1993 (first WWW conference), and 2000.]

Anonymity & De-identification. Anonymous: data cannot be manipulated or linked to identify an individual. De-identified: all explicit identifiers, such as name, address, and phone number, are removed, generalized, or replaced with made-up values. Does anonymous = de-identified?

HIPAA (Health Insurance Portability & Accountability Act). Rationale: inconsistent state laws promulgated unnecessary difficulties in standardization, transfer, and sharing of health-related information. A covered entity may not use or disclose protected health information. Exceptions: to the individual to whom the information corresponds; with consent, to carry out treatment, payment, or health care operations; if consent is not required, same as above, but not with respect to psychotherapy notes.

Data Sharing Under HIPAA. Safe Harbor: data that can be given away requires removal of 18 direct and other quasi-identifiers, including name, address, ZIP code, phone number, and birthdate, with no geographic unit smaller than a state. Limited release: the recipient contractually agrees not to use or disclose the information for purposes other than prespecified research and will not identify or contact the individuals who are the subjects; may include specific geographic locations (i.e. ZIP code). Statistical or scientific standard (we'll return to this).

Healthcare Reform At Work: collect and disseminate hospital discharge data. Attributes recommended by the National Association of Health Data Organizations for disclosure (BUT this is outside the jurisdiction of HIPAA): patient ZIP code, patient birth date, patient gender, patient racial background, patient number, visit date, principal diagnosis codes (ICD-9), procedure codes, physician ID number, physician ZIP code, total charges.

Linkage: use a combination of attributes to determine the uniqueness of an entity in a dataset. A second dataset with identified subjects is used to make the re-identification by drawing inferences between the two datasets on the related attributes. The attributes do not have to be equal, but there must exist some ability to draw inferences between attributes.
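As an illustration of Safe Harbor-style de-identification, here is a toy sketch (not a compliant implementation): explicit identifiers are dropped and quasi-identifiers are generalized. The field names and generalization rules are hypothetical choices for the example.

```python
# Toy Safe Harbor-style redaction: drop direct identifiers, generalize
# quasi-identifiers. Field names here are hypothetical, and a real
# implementation must handle all 18 identifier categories.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}

def deidentify(record):
    out = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue                      # remove explicit identifiers
        elif field == "birthdate":        # keep only the year
            out["birth_year"] = value.split("/")[-1]
        elif field == "zip":              # generalize 5-digit ZIP to 3 digits
            out["zip3"] = value[:3] + "**"
        else:
            out[field] = value
    return out

record = {"name": "Ann", "birthdate": "1/2/1961", "zip": "02139",
          "phone": "555-0100", "diagnosis": "cardiac"}
print(deidentify(record))
# {'birth_year': '1961', 'zip3': '021**', 'diagnosis': 'cardiac'}
```

The point of the slides that follow is that even this redacted record can be re-identified when the surviving quasi-identifiers are linkable to an identified dataset.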

Linking to Re-identify Data. Medical data: ethnicity, visit date, diagnosis, procedure, medication, total charge. Voter list: name, address, date registered, party affiliation, date last voted. Shared between them: ZIP, birthdate, sex.

87% of the United States is RE-IDENTIFIABLE: {date of birth, gender, 5-digit ZIP} uniquely identifies 87.1% of the USA. Few fields are needed to uniquely identify individuals. L. Sweeney. Weaving technology and policy to maintain confidentiality. Journal of Law, Medicine, and Ethics. 1997. L. Sweeney. Uniqueness of Simple Demographics in the U.S. Population. Data Privacy Laboratory Technical Report. 2000.

Uniqueness varies by place. ZIP 60623: 112,167 people, 11% (not 0%): an insufficient number above the age of 55 living there. ZIP 11794: 5,418 people, primarily between 19 and 24 (4,666 of 5,418, or 86%): only 13%.
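The linkage attack can be sketched in a few lines: join the de-identified release to an identified dataset on the quasi-identifier triple, and any unique match is a re-identification. The records below are hypothetical toy data, not real individuals.

```python
# Hypothetical toy data: de-identified medical records joined to an
# identified voter list on the quasi-identifiers {birthdate, sex, ZIP}.
medical = [
    {"birthdate": "1/2/1961", "sex": "F", "zip": "02139", "diagnosis": "cardiac"},
    {"birthdate": "7/14/1961", "sex": "M", "zip": "02139", "diagnosis": "cancer"},
]
voters = [
    {"name": "Ann", "birthdate": "1/2/1961", "sex": "F", "zip": "02139"},
    {"name": "Abe", "birthdate": "7/14/1961", "sex": "M", "zip": "02139"},
]

QI = ("birthdate", "sex", "zip")

def link(released, identified):
    index = {}
    for row in identified:
        index.setdefault(tuple(row[f] for f in QI), []).append(row["name"])
    for row in released:
        matches = index.get(tuple(row[f] for f in QI), [])
        if len(matches) == 1:            # unique match => re-identification
            yield matches[0], row["diagnosis"]

print(list(link(medical, voters)))
# [('Ann', 'cardiac'), ('Abe', 'cancer')]
```

When a quasi-identifier combination matches several voters, the attacker learns less; the 87% figure says that for most Americans the combination is unique.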

Chain of Links: voter list → {DOB, gender, ZIP} → medical data. So what do you do?

DNA Data: mutation analysis, prediction and risk, pharmaco-genomic relations, familial relations. ATCGATCGAT...

DNA - Discharge Inferences Exist: inferences between DNA (ATCGATCGAT...) and discharge attributes (ethnicity, visit date, diagnosis, procedure, medication, total charge; ZIP, birthdate, sex) can lead to re-identification. B. Malin and L. Sweeney. Determining the identifiability of DNA database entries. In Proceedings of the AMIA Annual Symposium. 2000: 547-551.

Genotype-Phenotype Relations: genotype-phenotype relationships can be inferred out of both DNA and medical databases. Medical database: diagnosis → disease phenotype → phenotype with genetic trait. DNA database: genomic DNA → disease sequences. B. Malin and L. Sweeney. Determining the identifiability of DNA database entries. In Proceedings of the AMIA Annual Symposium. 2000: 547-551.

False Protection Example. A clinical table (name, address, diagnosis, treatment: John Doe, 1 Some Way, 33321, 123; Jane Doh, 2 No Way, 82132, 912) is stored apart from the DNA sequences (accta..., agctt...).

DNA Re-identification: many deployed genomic privacy technologies leave DNA susceptible to re-identification. DNA is re-identified by automated methods, such as genotype-phenotype (G-P) inference, linkage, and prediction, which relate the sequences back to clinical records via ICD-9 codes (3334, 277).

DNA Re-identification: many deployed genomic privacy technologies leave DNA susceptible to re-identification. DNA is re-identified by automated methods, such as genotype-phenotype (G-P) inference: from sequence accta... infer cystic fibrosis; from agctt... infer Huntington's disease. Linking the inferred diseases to the clinical table (John Doe, 1 Some Way, ICD-9 3334; Jane Doh, 2 No Way, 277) yields a unique re-identification!

Longitudinal Genomic Learning Model: clinical profiles and diagnoses → clinical phenotype state mapping → classify profile visits → constrain profile state alignment → DNA predictions. B. Malin and L. Sweeney. Inferring genotype from clinical phenotype through a knowledge-based algorithm. In Proceedings of the Pacific Symposium on Biocomputing. 2002: 41-52.

Learning DNA from Phenotype. Example: Huntington's disease. There exists a strong correlation between age of onset and the DNA mutation (number of CAG repeats): fitted curve y = -21.48 ln(x) + 122.66, R^2 = 0.889. Given longitudinal clinical info, age of onset was accurately inferred in 20 of 22 cases. [Charts: size of repeat vs. age of onset; actual, min, and max predicted age per individual.]

So What Do We Do? Some say, "You Can't Release Any Data."
Others* say, "Privacy is Dead, Get Over It." The tension for the data holder and recipient: accuracy, quality, and risk vs. distortion and anonymity. Data holder's table: Ann, 1/2/61, 02139, cardiac; Abe, 7/14/61, 02139, cancer; Al, 3/8/61, 02138, liver. (* Others: see Larry Ellison (Oracle), Scott McNealy (Sun Microsystems).)
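The Huntington's example above can be made concrete with the slide's fitted curve. This is just the regression from the talk evaluated numerically; it shows why an observed age of onset constrains the underlying CAG repeat count, and hence why longitudinal clinical data leaks genotype information.

```python
import math

# The slide's fitted curve relating CAG repeat length (x) to age of
# onset (y) for Huntington's disease: y = -21.48*ln(x) + 122.66 (R^2 = 0.889).
def predicted_age_of_onset(cag_repeats):
    return -21.48 * math.log(cag_repeats) + 122.66

for repeats in (40, 50, 60):
    print(repeats, round(predicted_age_of_onset(repeats), 1))
# 40 43.4
# 50 38.6
# 60 34.7
```

Because the curve is monotone, it can be inverted: a patient whose records show onset around age 43 most plausibly carries roughly 40 repeats.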

So What Do We Do? We say, "Share Data While Providing Guarantees of Anonymity." Computational solutions let the holder release, e.g.: A*, 1961, 0213*, cardiac; A*, 1961, 0213*, cancer; A*, 1961, 0213*, liver.

Example: A Camera-Happy World. Over 30 million cameras in the US; Manhattan has over 2,500 cameras; the average American is caught on camera 8-10 times a day. Over 4 million cameras in the UK; the average Londoner is caught more than 300 times a day. Some Camera Watch images: CMU Camera Watch Project, http://privacy.cs.cmu.edu/dataprivacy/projects/camwatch/index.html

Video Goal: modify video images so that (privacy) automated attempts to recognize faces fail, while (utility) knowledge learned from the data is useful.

The Good Side of Surveillance: homeland security monitoring (monitor the number of faces over time); early bioterrorism detection (monitor for respiratory distress). A solution to the problem enables sharing of data for specified purposes and protects rights as specified in policy, e.g. your identity won't be revealed unless you have done something illegal. L. Sweeney and R. Gross. Mining images from publicly-available cameras for homeland security. In Proceedings of the AAAI Spring Symposium. 2005.

Protection Post / During Capture: can we study video and image information for surveillance purposes with identity protection? Example: can we track people, but withhold identity?

A Solution: The Dot Approach. More detailed: silhouettes and coloring for tracking.

De-identifying People. Alternative de-identification: masking and environmental suppression (Andrew Senior, IBM): original; people removed; background removed; people as silhouettes. A. Senior, et al. Enabling video privacy through computer vision. IEEE Security and Privacy Magazine. May-June 2005; 3(3): 50-57. Andrew Senior: http://www.research.ibm.com/people/a/aws/

Can we make Video Privacy More Formal? De-identification for some uses can be achieved by replacing people with dots or replacing faces with blobs. In each case, de-identification is achieved, but not necessarily anonymity. What if we need to see what a face is expressing? Example use: tracking coughs (biosurveillance) or suspicious behavior in public spaces. De-identification, not anonymity: separating machines from humans.

Example: De-identification of Faces. Captured images are shown. Here is a known image of Bob. Which person is Bob?

Face Recognition: The Big Idea. An identification algorithm maps an unknown identity (the probe) to the name of a person in the gallery. Goal: limit the success of the recognition module.

PCA-Based Face Recognition Systems (Module 2: Eigenfaces / PCA). From a training set, compute the average face and a face space; project the gallery and the probe into the space and compare with a distance measure. Face recognition software finds the correct match at top rank roughly 70% of the time, with accuracy rising as more top-ranked candidates are considered. [Chart: % of samples where the correct match is found, at ranks 1, 5, 10, 25, 50, 150, and All.]
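The eigenfaces pipeline described above can be sketched in a few lines of NumPy. This is a minimal sketch, not the system from the talk: the "faces" here are random stand-in vectors rather than aligned photographs, and the dimensions are arbitrary.

```python
import numpy as np

# Minimal eigenfaces (PCA) recognition sketch with random stand-in
# "images"; a real system would use aligned face photos.
rng = np.random.default_rng(0)
gallery = rng.normal(size=(10, 64))      # 10 gallery "faces", 64 pixels each

mean_face = gallery.mean(axis=0)
centered = gallery - mean_face
# Principal components ("eigenfaces") via SVD of the centered data
_, _, Vt = np.linalg.svd(centered, full_matrices=False)
eigenfaces = Vt[:5]                      # keep the top 5 components

def project(img):
    return eigenfaces @ (img - mean_face)

def identify(probe):
    # nearest gallery face in eigenspace
    dists = [np.linalg.norm(project(probe) - project(g)) for g in gallery]
    return int(np.argmin(dists))

probe = gallery[3] + rng.normal(scale=0.1, size=64)   # noisy copy of face 3
print(identify(probe))   # 3: the probe is matched despite the noise
```

The de-identification question in the slides is exactly how to perturb the probe (or gallery) so that `identify` fails for any such recognizer.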

Eigenvectors (i.e. "concepts") come from the characteristic equation det(A - λI) = 0, where A is the covariance matrix.

De-identification: T-mask. Example continued: captured images are de-identified with a T-shaped mask. Here is a known image of Bob. Which person is Bob? Automated recognition fails! [Chart: % of samples where the correct match is found, unaltered vs. T-bar, at ranks 1, 5, 10, 25, 50, 150, and All.]

De-identification: pixel reduction. Example continued: captured images are de-identified by reducing pixels. Here is a known image of Bob. Which person is Bob?

De-identification: pixel reduction. Pixelation makes automated recognition easier! [Charts: % of samples where the correct match is found when both pictures, probe and gallery, are pixelated.]

Why Try These Crazy Things? Many people and organizations claim they work: Gaussian blur, pixelation. J. Alexander and J. Smith. Engineering privacy in public: confounding face recognition. Third Privacy Enhancing Technologies Workshop. 2003. M. Boyle, C. Edwards, and S. Greenberg. The effects of filtered video on awareness and privacy. ACM Conference on Computer Supported Cooperative Work. 2000.

But Why Should We Care? Policy Sidebar: the European Data Directive holds that collected video and images cannot be released unless they have been sufficiently protected, and contends that pixelation is sufficient for identity protection.

More De-identification Ideas! Single bar mask, T-mask, black blob, mouth only, grayscale, black & white, ordinal data, threshold pixelation, negative grayscale / black & white, random grayscale / black & white, "Mr. Potato Head."
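Pixelation itself is a one-liner per block: replace each b×b tile with its mean. The sketch below shows the operation on a tiny synthetic "image"; as the slides note, this kind of ad hoc masking can be defeated, and can even help a matcher trained on pixelated galleries.

```python
import numpy as np

# Pixelation by block averaging: each b x b block is replaced by its mean.
def pixelate(img, b):
    h, w = img.shape
    out = img.astype(float).copy()
    for i in range(0, h, b):
        for j in range(0, w, b):
            out[i:i+b, j:j+b] = img[i:i+b, j:j+b].mean()
    return out

img = np.arange(16).reshape(4, 4).astype(float)
print(pixelate(img, 2))
# [[ 2.5  2.5  4.5  4.5]
#  [ 2.5  2.5  4.5  4.5]
#  [10.5 10.5 12.5 12.5]
#  [10.5 10.5 12.5 12.5]]
```

Block averaging is a deterministic, invertible-in-distribution transform: a recognizer can simply pixelate its gallery the same way and match in the reduced space, which is why the chart shows recognition getting easier.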

Ad Hoc Methods = Poor Protection. [Chart: percent identified under thresholding, T = 65 vs. T = 150, across threshold levels 40-255; recognition is not defeated.] Random Changes to Grayscale Images. [Chart: identification rate vs. number of pixels randomly changed (R = 30, R = 90), for a randomly changed gallery vs. originals and for originals vs. a randomly changed probe set.]

Don't be Naïve. Again, de-identified ≠ anonymous. Masks can be removed and trained against. In some cases naïve de-identification even harms privacy: pixelation and blur may improve recognition performance. Time to get logical.

k-Protection Models. k-anonymity: for every record, there are at least k individuals to whom it refers (realized upon release). k-Same: for every face, there are at least k people to whom that face refers; no face actually refers to a single real person. E. Newton, L. Sweeney, and B. Malin. Preserving privacy by de-identifying facial images. IEEE Transactions on Knowledge and Data Engineering. 2005; 17(2): 232-243.

Formal Models of Anonymity. Population universe: Jcd, Jwq, Jxy, Dan, Don, Dave. Subjects: Ann, Abe, Al. Private information: Ann, 1/2/61, 02139, cardiac; Abe, 7/14/61, 02139, cancer; Al, 3/8/61, 02138, liver. Null-map: replace identities with unrelated pseudonyms (Jcd, Jwq, Jxy). Wrong-map: Al, 3/8/1961, 02138, cardiac; Ann, 1/2/1961, 02139, cancer; Abe, 7/14/1961, 02139, liver. k-anonymity: A*, 1961, 0213*, cardiac; A*, 1961, 0213*, cancer; A*, 1961, 0213*, liver.
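Checking the k-anonymity condition on a release is mechanical: every quasi-identifier combination must appear at least k times. A minimal sketch, using the slide's generalized table as toy data:

```python
from collections import Counter

# Does a release satisfy k-anonymity over the chosen quasi-identifiers?
# Every quasi-identifier combination must occur at least k times.
def is_k_anonymous(rows, quasi_ids, k):
    counts = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return all(c >= k for c in counts.values())

release = [
    {"name_initial": "A*", "birth_year": "1961", "zip": "0213*", "dx": "cardiac"},
    {"name_initial": "A*", "birth_year": "1961", "zip": "0213*", "dx": "cancer"},
    {"name_initial": "A*", "birth_year": "1961", "zip": "0213*", "dx": "liver"},
]
qi = ("name_initial", "birth_year", "zip")
print(is_k_anonymous(release, qi, 3))   # True
print(is_k_anonymous(release, qi, 4))   # False
```

Note this checks k-anonymity (k within the release itself); k-map, discussed next, instead requires each released record to match at least k entities in the outside population.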

Model Examples. k-map: for each record in the release, the record must refer to at least k entities in the population. k-anonymity: k within the release itself. Example release: A*, 1963, 0213*, cardiac; A*, 1961, 0213*, cancer; A*, 1964, 0213*, liver.

Subexample: Population Registers. A register of six people: Gil, Hal, Jim, Ken, Len, Mel. There are three colors, with frequencies 1 red, 3 green, and 2 blue, and 2 types of figures, with 2 of one type and 4 of the other. The combinations of color and figure labeled Hal and Len are each unique.

Formal Protection Example: to achieve k-map with k=2, agents for Gil, Hal, and Ken merge their info; information released about any of them results in the same merged image.

Ranking of Faces: how does everyone rank against each other? Who is closest? Who is farthest?

k-Anonymity: Face Style! A face dataset is k-anonymized if each probe image maps to at least k gallery images. The k-Same algorithm: from the face DB, a similarity function selects a subset (e.g. S1...S5), and an average function produces the released face. With no privacy protection, each probe maps to a single gallery face.

Example of 2-Same. k-Same example (more depth): k-Same-Pixel and k-Same-Eigen results for k = 2, 3, 5, 10, 50, 100.

Guarantee: image sets de-identified using k-Same are k-anonymized. Performance of k-Same algorithms: [chart of percent correct at top rank for expected[k-Same], k-Same-Pixel, and k-Same-Eigen as k grows from 1 to 10]. The upper bound on recognition performance is 1/k, and we can guarantee this bound for ANY recognition system.

Some Intuition: Blurring. Some Intuition: Pixelation. [Original faces vs. progressively blurred and pixelated versions.]
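The k-Same idea (group faces into clusters of at least k and release each cluster's average in place of every member) can be sketched on plain vectors. This toy version uses a greedy nearest-neighbor grouping as an assumed stand-in for the paper's selection step; real k-Same operates on pixel or eigenspace representations of faces.

```python
import numpy as np

# Toy k-Same sketch: greedily group each face with its k-1 nearest
# unassigned neighbors and release the group average in place of every
# member, so any released face corresponds to at least k originals.
def k_same(faces, k):
    faces = np.asarray(faces, dtype=float)
    unassigned = set(range(len(faces)))
    out = np.empty_like(faces)
    while unassigned:
        i = min(unassigned)
        pool = sorted(unassigned - {i},
                      key=lambda j: np.linalg.norm(faces[j] - faces[i]))
        group = [i] + pool[:k - 1]
        if len(unassigned) - len(group) < k:   # avoid a leftover group < k
            group = list(unassigned)
        avg = faces[group].mean(axis=0)
        for j in group:
            out[j] = avg
            unassigned.discard(j)
    return out

faces = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
print(k_same(faces, 2))    # two tight pairs, each replaced by its average
```

Because every released vector is the average of k originals, a recognizer that returns the nearest released face can be correct for at most a 1/k fraction of probes, which is the bound the slides state.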

Some Intuition: k-same k-same Algorithm Concerns Guarantee Image sets de-identified using k-same are k-anonymized Original K = 5 K = 15 But Changes in face expression Changes in gender Noticeable blurring Face DB Extending k-same to k-same-select Expression k-same-select Results Similarity Function DB Subset Average Function Original Gender & Expression Data Utility(ies) Original Gender Classification: Ad Hoc Expression Classification Small performance decrease for blurring Noticeable decrease for pixelation Similar results similar to gender classification 16

Expression Classification: k-Same decreases data utility; k-Same-Select increases data utility (gallery vs. probe). Demonstration Time! k-Same Demo: http://privacy.cs.cmu.edu/dataprivacy/projects/video/datainfo.html

Some Parting Thoughts. Security + policy does not guarantee privacy. Privacy is not dead, but it requires intelligence. An interdisciplinary approach is necessary: understand policy & law, understand the technology, and understand the goals of data use.

Thanks! malin@cs.cmu.edu. Some slides adapted from: Ralph Gross, Elaine Newton, Michael Shamos, Latanya Sweeney. More information: http://privacy.cs.cmu.edu and http://www.cs.cmu.edu/~malin