Title: | United States Copyright Office Product Management Division SR Audit Data Dataset Cleaning Algorithms |
---|---|
Description: | Intended for use by Business Analysts in the United States Copyright Office Product Management Division. Includes algorithms for the United States Copyright Office Product Management Division SR Audit Data dataset. The algorithm takes in the SR Audit Data Excel file and reformats the spreadsheet so that the values and variables fit the format of the online database. Support functions in this package include clean_str(), which cleans instances of the variable AUDIT_LOG; clean_data_to_excel(), which cleans and outputs the reorganized SR Audit Data dataset in Excel format; clean_data_to_dataframe(), which cleans and stores the reorganized SR Audit Data dataset in a data frame; format_from_excel(), which reads in the Excel file output by clean_data_to_excel() and returns the data as a dictionary that uses FIELD types as keys and NON-FIELD types as the values of those keys; format_from_dataframe(), which reads in the data frame output by clean_data_to_dataframe() and returns the data as a dictionary that uses FIELD types as keys and NON-FIELD types as the values of those keys; support_function(), which takes the dictionary output by either format_from_dataframe() or format_from_excel() and returns the data as a data frame formatted according to the original U.S. Copyright Office SR Audit Data online database. The main function of this package is clean_format_all(), which takes in an Excel file and writes the formatted data to a new Excel file and a new text file according to the format of the U.S. Copyright Office SR Audit Data online database. |
Authors: | Frederick Liu [aut, cre] |
Maintainer: | Frederick Liu <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.3 |
Built: | 2024-11-01 11:21:32 UTC |
Source: | https://github.com/cran/uscoauditlog |
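The Description lists a step-by-step route (clean, format, rebuild) next to the one-call clean_format_all(). The following is a minimal sketch of the data-frame route, under stated assumptions: the package is installed under the name uscoauditlog (taken from the Source URL), its functions are exported, and run_stepwise() is a hypothetical wrapper that is not part of the package. It returns NULL instead of failing when the package or the input file is missing.

```r
# Hypothetical wrapper (not part of the package): chains the three
# support functions in the order the Description lists them.
run_stepwise <- function(path) {
  # Assumption: the package is installed as "uscoauditlog".
  if (!requireNamespace("uscoauditlog", quietly = TRUE)) {
    message("Package 'uscoauditlog' is not installed; skipping.")
    return(NULL)
  }
  if (!file.exists(path)) {
    message("Input file not found: ", path)
    return(NULL)
  }
  # Step 1: clean the raw spreadsheet into a data frame.
  cleaned <- uscoauditlog::clean_data_to_dataframe(path)
  # Step 2: build the dictionary keyed by FIELD types.
  formatted <- uscoauditlog::format_from_dataframe(cleaned)
  # Step 3: rebuild a data frame in the online database layout.
  uscoauditlog::support_function(formatted)
}

# With a path that does not exist, the wrapper returns NULL.
result <- run_stepwise("does_not_exist.xlsx")
```

For routine use, the package documentation recommends clean_format_all() instead; the stepwise route is mainly useful for inspecting intermediate results.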
Cleans and outputs the reorganized SR Audit Data dataset as a data frame
clean_data_to_dataframe(filename)
filename | Name of the input .xlsx file |
Returns a data frame containing the cleaned data.
## Not run:
## Read in the original Excel file
filename = "data.xlsx"
clean_data_to_dataframe(filename)
## End(Not run)
Cleans and outputs the reorganized SR Audit Data dataset in .xlsx format
clean_data_to_excel(filename)
filename | Name of the input .xlsx file |
Returns an Excel sheet containing the cleaned data.
## Not run:
filename = "data.xlsx"
clean_data_to_excel(filename)
## End(Not run)
Takes in a .xlsx file and writes the formatted data to a new .xlsx file and a new .txt file according to the format of the U.S. Copyright Office SR Audit Data online database.
clean_format_all(excelfile)
excelfile | The original raw SR Audit Data spreadsheet |
Returns an Excel sheet and a text file containing the cleaned and formatted data, consistent with the format of the U.S. Copyright Office SR Audit Data online database.
# This is the main function. Users should only need this function for data cleaning.
## Not run:
excelfile = "data.xlsx"
clean_format_all(excelfile)
## End(Not run)
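Because clean_format_all() is the recommended entry point, it can help to check the input path before calling it. The following is a minimal sketch, not part of the package: is_usable_xlsx() is a hypothetical helper, and the call assumes the package is installed as uscoauditlog with clean_format_all() exported.

```r
# Hypothetical helper (not part of the package): TRUE only for a single
# character path that ends in .xlsx and points at an existing file.
is_usable_xlsx <- function(path) {
  is.character(path) &&
    length(path) == 1 &&
    grepl("\\.xlsx$", path, ignore.case = TRUE) &&
    file.exists(path)
}

excelfile <- "data.xlsx"
if (is_usable_xlsx(excelfile) &&
    requireNamespace("uscoauditlog", quietly = TRUE)) {
  # Assumption: the package is installed and exports clean_format_all().
  uscoauditlog::clean_format_all(excelfile)
} else {
  message("Skipping: '", excelfile,
          "' is not a usable .xlsx file or the package is not installed.")
}
```

The check is deliberately shallow: it looks only at the file name and its existence, not at whether the workbook contains the expected SR Audit Data columns.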
Cleans instances of the variable AUDIT_LOG from the U.S. Copyright Office SR Audit Data spreadsheet
clean_str(str)
str | A single value of the variable AUDIT_LOG |
Returns a cleaned string version of a value of the variable AUDIT_LOG.
str = "2*J15*Owner2*L12*LAAS2*K10*2*C110*SR_STAT_ID2*N14*Open2*O16*Closed"
clean_str(str)
Reads in the data frame returned by the clean_data_to_dataframe function and returns the data as a dictionary that uses FIELD types as keys and NON-FIELD types as the values of those keys
format_from_dataframe(dataframedata)
dataframedata | The cleaned data frame returned by the function clean_data_to_dataframe |
Returns a vector dictionary that contains the formatted version of the cleaned data.
## Not run:
filename = "data.xlsx"
dataframedata = clean_data_to_dataframe(filename)
format_from_dataframe(dataframedata)
## End(Not run)
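R has no built-in dictionary type, so a key-to-values mapping is commonly held in a named structure. As an illustration only, the sketch below uses a named list; the keys and values are invented (loosely echoing tokens in the clean_str() example string) and are not the package's actual output or internal representation.

```r
# Toy dictionary as a named list. Keys and values are invented for
# illustration and do not come from real SR Audit Data.
audit_dict <- list(
  Owner      = c("LAAS", ""),
  SR_STAT_ID = c("Open", "Closed")
)

# Look up the values stored under one key.
audit_dict[["SR_STAT_ID"]]

# List every key.
names(audit_dict)

# Flatten into a long data frame with one row per key/value pair.
long <- data.frame(
  key   = rep(names(audit_dict), lengths(audit_dict)),
  value = unlist(audit_dict, use.names = FALSE),
  stringsAsFactors = FALSE
)
```

A named list keeps lookups by key simple and allows each key to hold a different number of values, which is why it is a common stand-in for a dictionary in R.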
Reads in the Excel file written by the clean_data_to_excel function and returns the data as a dictionary that uses FIELD types as keys and NON-FIELD types as the values of those keys
format_from_excel(filename)
filename | The cleaned .xlsx sheet output by the function clean_data_to_excel |
Returns a vector dictionary that contains the formatted version of the cleaned data.
## Not run:
filename = "data.xlsx"
filename = clean_data_to_excel(filename)
format_from_excel(filename)
## End(Not run)
Takes the dictionary returned by either the format_from_dataframe or the format_from_excel function and returns the data as a data frame formatted according to the original U.S. Copyright Office SR Audit Data online database.
support_function(data)
data | The dictionary returned by the format_from_dataframe or format_from_excel function |
Returns a formatted data frame according to the original U.S. Copyright Office SR Audit Data online database.
## Not run:
filename = "data.xlsx"
dataframedata = clean_data_to_dataframe(filename)
data = format_from_dataframe(dataframedata)
support_function(data)
## End(Not run)