Thursday, April 13, 2017

Data Masking using OEM Cloud Control

Data masking is a technique whereby we mask the sensitive data in the cloned copies of production databases (cloned for testing purposes) so that testing databases don’t contain any sensitive information like credit card numbers or other personal information data like email address or phone numbers etc. Once we create a clone of production database, we mask the data before handing over to the users.
In this article I will explain how we perform data masking using OEM Cloud Control 12c and 13c because method of masking data is almost same in both versions.
I have an database target registered with OEM which contains a schema salman. There are 2 tables in this schema with sensitive information. Following is a sample script to create these tables and populate with the data
SQL> create user salman identified by salman default tablespace users quota unlimited on users;

User created.

SQL> grant connect, create table to salman;

Grant succeeded.

SQL> conn salman/salman

create table departments(dept_id number, department_name varchar2(20));
alter table departments add constraint pk_department primary key (dept_id);
insert into departments values (100, 'Finance');
insert into departments values (200, 'HR');
insert into departments values (300, 'MIS');
insert into departments values (400, 'Facility');

create table customers_purchases
(cust_id number,
purchase_id number,
credit_card_number varchar2(20),
paid_amount number,
purchase_date date,
constraint fk_customers foreign key (cust_id) references customers(cust_id));

insert into customers_purchases values(100,1,'2345323456786543',200,sysdate -10);
insert into customers_purchases values(100,2,'2345323456786543',250,sysdate -14);
insert into customers_purchases values(100,3,'2345323456786543',400,sysdate -20);
insert into customers_purchases values(200,1,'9473648593846253',300,sysdate -11);
insert into customers_purchases values(200,2,'9473648593846253',600,sysdate -5);
insert into customers_purchases values(200,3,'9473648593846253',220,sysdate -6);
insert into customers_purchases values(300,1,'8574637485967564',2920,sysdate -7);
insert into customers_purchases values(300,2,'8574637485967564',5430,sysdate -8);
insert into customers_purchases values(400,1,'3454657685645342',7560,sysdate -9);
insert into customers_purchases values(400,2,'3454657685645342',1245,sysdate -40);
insert into customers_purchases values(400,3,'3454657685645342',2000,sysdate -90);

Now I have 2 tables one of which contains sensitive personal information of customers and other table contains sensitive credit card and other payments information. Both tables are also in parent child relation and later we will see how OEM automatically identifies related table and how it masks data in the columns involved in parent child relationship.

Data Masking Step by Step
First step is to create an application data model (ADM). During ADM creation, we identify the schema tables and sensitive columns that require to be masked. While creating ADM, OEM will automatically identify parent child relationships of tables.
For OEM 12c 
Log into the OEM using SYSMAN or any other privileged user and select Enterprise -> Quality Management -> Application Data Modeling.

For OEM 13c

On next page, click on Create

On this page, provide name of your ADM and select source database by clicking on search icon. My target database name is dbmask which is registered as dbmask_test (target name) in the OEM. Once done, click on continue.

On this screen, we need to provide database credentials to log into the database. I am providing credentials and also storing credentials as Named Credentials because later we would need to use these credentials again. Click Login button.

After clicking Login, we see left pane populated with all schemas in the database. Here I selected “salman” schema and clicked the highlighted arrow button to move it to right pane. This schema is now selected for ADM. Click Continue

A job will be created to build the ADM. Click submit

We can click on the link “View Job Details” to see the status of the job, otherwise just click on the refresh button at the right corner of the page again and again until status of the job changes to SUCCEEDED (if it fails, check the job from Job Activity page to find out the cause  of failure). Once job completes, lock icon beside the ADM name will be no longer visible.

Select/click this newly created ADM and click on Edit

On next screen, use the credentials created above to log into the database.

Next screen shows 3 tabs, first tab lists all the tables of schema (salman), second tab shows tables with relationships.

If we click on third tab, we can either select sensitive columns (columns to be masked) or we can also let OEM do this for us as can be seen in bellow 3 screenshots. Second screen bellow shows the step where we select the sensitive column type that we want OEM to find for us. OEM will go through the tables’ data to find out which column has sensitive data.

Select sensitive columns types and click continue

Click on submit

Job is submitted. Once job completes, click on “Discover Results” to see the discovered sensitive columns. Job may take longer time based on the number of tables and size of data.

Following screen shows that OEM has discovered the columns for us.

Second option is to click “Add” from the Sensitive Columns tab to manually select the sensitive columns as auto discovery job is still not intelligent to find our needed columns and may miss the columns that we want to mask. After clicking “Add”, following screen appears where we click on search icon to select application (SALMAN) and then clicking on Search button to list all tables and columns from where we can select all required columns and click OK. Alternatively you can do this columns selection table by table by providing table name and/or column names and then clicking search.

After selecting the columns and clicking OK from the above screen, we see a list of all selected columns but you will notice an extra column CUSTOMERS_PURCHASES.CUS_ID. This is selected automatically because parent table’s column CUSTOMERS.CUST_ID was selected by me, and after the data masking, data in this column would change and both tables should have identical data to maintain same referential integrity and therefore OEM added this column automatically. Click Save and Return button.

Our ADM has been created successfully.

If we have multiple databases with same schema structure and we want to use same ADM for that database, we can associate this ADM with another OEM target database by clicking Actions -> Associated Databases, and then clicking on Add button. See following 2 screenshots.

Next step is to create Data Masking Definition (DMD). Select Enterprise->Quality Management -> Data Masking Definition.

Click on Create form right side of the screen.

Provide name for the DMD and click on search icon to select the ADM and in the end click Add button.

Log into the database using named credentials already created in a previous steps.

Click on the search icon to select the schema and then click search button to list all the sensitive columns defined in the ADM. Here we can select each column one by one and then click on “Define Format” And “Add” button, or we can select multiple columns which have same data type to add data masking format and then click on “Define Format And Add”. Here I have selected VARCHAR2 columns together and later I will do same for NUMBER columns.

From this screen, we select format to mask these columns. I have selected “Random Strings” to update the existing data in my columns. We can also click on IMPORT button to import any customized data masking format created by us (not covered in this article).

Once you click “add” from above screen, following screen appears where format has been added. Specify the length of the string that will be used to replace the existing data during masking process. End length cannot be more than the size of the columns.

Click on “Add” once again to select other 3 NUMBER type columns which still require masking format to be added for them. I selected “Random Digits” masking format for these 3 columns with start length 5 and end length 5.

Following is what I finally see after adding masking formats for all the columns. Foreign key columns are automatically added even if we don’t select those. Click OK.

Next step is to generate the Data Masking Script. Click on Generate Script after selecting the DMD just created.

Select “Mask In-Database”, which means that data will be masked in existing database because this is already a clone of the production database. If we want script to create a script for datapump export which will mask as well as import the data in a new database (which means building a data masked clone database using datapump). During data masking, temporary objects may be created in default tablespace or in a different tablespace. Click Submit.

Once script generation completes, status of DMD will be “Script Generated”. Now last step is to perform data masking. Click on “Schedule Job” button to schedule a job to perform masking.

Select named credentials on the name screen. Click Login.

Click the checkbox “The select target is not a production database”. This is just to confirm that you don’t execute it on production accidentally. Select already created named credentials for “Database Credentials”, and also “Host Credentials”. For my case, I did not have any host credentials already created for this database host, so I clicked on “New” and provided credentials to log into the host during job execution, click Submit.
If we have associated a different database with the ADM as explained above, that database would also be available in drop down for the data masking job to be submitted. Under Data “Masking Option” in this screen, you can see the location and name of the script (data masking script) copied on the host. On the same location you can also see output log file after data masking job completes.

Monitor the job by clicking on the “View Job Details” on next screen. This job may take longer based on the data this job has to mask.

Once job completes, we can see the changed data in the tables.
I had following data before masking.

Data after masking can be seen bellow. We can also see that values in CUST_ID in both tables are masked in a way that referential integrity remains valid

We can see that values in CUST_ID in both tables are masked in a way that referential integrity remains valid

No comments: