Week 4 – Verify and report on your cleaning results


1. Verification and reporting come directly before the data-cleaning process.

Answers

·        True

·        False

Explanation: False. Verification and reporting come after data cleaning, not before it. Once the data has been cleaned, the analyst verifies that the cleaning was done correctly and that the resulting data is accurate and reliable, and then reports on the cleaning effort.

2. What is the first step in the verification process?

Answers

·        Compare cleaned data with the original, uncleaned dataset and compare it to what is there now

·        Create a chronological list of modifications made to the data

·        Determine the quality of the data

·        Inform others of your data-cleaning effort

Explanation: The first step in verification is to go back to the original, uncleaned dataset and compare it with the cleaned data. The comparison shows whether the cleaning missed errors, introduced new problems, or removed data that should have been kept, before you move on to the remaining verification checks.
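A minimal sketch of that comparison in Python, assuming each dataset fits in a dictionary keyed by a row ID. The data and the compare_datasets helper are hypothetical. It lists rows that were removed or added and every cell whose value changed, which is the raw material for judging whether the cleaning did what was intended.

```python
# Hypothetical data: each row is keyed by an ID, and each row maps column names to values.
original = {
    1: {"name": " Ana ", "country": "usa"},
    2: {"name": "Bo", "country": "USA"},
    3: {"name": "Cy", "country": "Canada"},
}
cleaned = {
    1: {"name": "Ana", "country": "USA"},
    2: {"name": "Bo", "country": "USA"},
}


def compare_datasets(before, after):
    """Return (removed_ids, added_ids, changed_cells) between two keyed datasets.

    Only the columns present in the original row are compared.
    """
    removed = sorted(set(before) - set(after))
    added = sorted(set(after) - set(before))
    changed = []
    for row_id in sorted(set(before) & set(after)):
        for column, old_value in before[row_id].items():
            new_value = after[row_id].get(column)
            if new_value != old_value:
                changed.append((row_id, column, old_value, new_value))
    return removed, added, changed


removed, added, changed = compare_datasets(original, cleaned)
print("Rows removed:", removed)
print("Rows added:", added)
for row_id, column, old, new in changed:
    print(f"Row {row_id}, {column}: {old!r} -> {new!r}")
```

Each reported difference should be something you meant to change; anything else is a lead to investigate.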

3. Which of the following functions automatically remove extra spaces when cleaning data?

Answers

·        SNIP

·        REMOVE

·        TRIM

·        CLEAR

Explanation: TRIM removes leading, trailing, and repeated spaces from text. Removing stray spaces keeps values consistent, so that, for example, "New York" and "New York " are not treated as two different values during analysis or presentation.
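The spreadsheet behavior can be approximated in Python as below. This is a sketch under an assumption: like Excel's TRIM, it handles only the ordinary space character, not tabs or non-breaking spaces. The name sheet_trim is a hypothetical helper, not a built-in.

```python
import re


def sheet_trim(text: str) -> str:
    """Approximate spreadsheet TRIM (assumption: ordinary spaces only).

    Strips spaces from both ends, then collapses each run of two or more
    internal spaces into a single space.
    """
    return re.sub(" {2,}", " ", text.strip(" "))


print(repr(sheet_trim("  New   York  ")))
```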

4. What tool can a data analyst use to figure out how many identical errors occur in a dataset?

Answers

·        CASE

·        COUNTA

·        CONFIRM

·        COUNT

Explanation: COUNTA counts the non-empty cells in a range, whether they contain numbers or text. When a column is summarized by COUNTA in a pivot table, each distinct value gets its own count, so the row for a misspelling or other error shows how many times that identical error occurs. COUNT, by contrast, counts only numeric values.

5. Fill in the blank: A data analyst uses the CASE statement to consider one or more _____, then returns a value.

Answers

·        additions

·        conditions

·        identifications

·        changes

Explanation: A data analyst uses the CASE statement to consider one or more conditions in order and return a value when a condition is met.

6. What is the process of tracking changes, additions, deletions, and errors during data cleaning?

Answers

·        Recording

·        Observation

·        Cataloging

·        Documentation

Explanation: Documentation is the process of tracking changes, additions, deletions, and errors during data cleaning. Keeping this record makes the cleaning transparent and reproducible, and it shows how the data evolved from its raw state to its cleaned state.

7. Fill in the blank: While cleaning data, a data analyst can use a changelog to keep a chronological list of changes they make. They can refer to it during the _____ period if there are errors or questions.

Answers

·        presenting

·        verification

·        documentation

·        visualization

Explanation: A changelog is a chronological list of the changes an analyst makes while cleaning data. During the verification period, the analyst can refer to it to trace the source of an error or to answer questions about how the data was changed.

8. Reviewing version history is an effective way to view a changelog in SQL.

Answers

·        True

·        False

Explanation: How a changelog is viewed in SQL depends on the software. Where the SQL tool or query repository keeps a version history, reviewing it shows what changed in a query, schema, or stored procedure and when, and together with query comments and commit messages it can serve as a changelog. This record helps with troubleshooting and with protecting data integrity.

Shuffle Q/A 1

9. In what step of the data-cleaning process do you find mistakes before you begin analyzing the data?

Answer

·        Confirming

·        Publishing

·        Verifying

·        Processing

Explanation: Verifying is the step in which you check the cleaned data for mistakes before analysis begins. Confirming the quality and integrity of the data at this point means that remaining flaws or inconsistencies are found and fixed before they can affect the results, which makes the analysis more accurate and trustworthy.

10. During the data cleaning process you find a significant amount of data that contains irrelevant spaces. Which function do you use to remove leading, trailing, or repeated spaces?

Answer

·        CUT

·        DELETE

·        TRIM

·        TIDY

Explanation: TRIM removes leading, trailing, and repeated spaces. In spreadsheet applications such as Excel and Google Sheets, TRIM strips spaces from both ends of the text and reduces repeated spaces between words to a single space. SQL's TRIM removes only leading and trailing characters, and Python uses the str.strip() method rather than a TRIM function.
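To see the SQL behavior concretely, the sketch below runs SQLite's trim, ltrim, and rtrim functions through Python's built-in sqlite3 module on a made-up value. Other databases may differ in details, so treat this as an illustration of SQLite only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Pass the same sample value to all three functions.
row = conn.execute(
    "SELECT trim(?), ltrim(?), rtrim(?)",
    ("  New   York  ",) * 3,
).fetchone()
conn.close()

for label, value in zip(("trim", "ltrim", "rtrim"), row):
    print(f"{label}: {value!r}")
```

The three spaces inside "New   York" survive all three calls; collapsing them needs a spreadsheet-style TRIM or an extra replace step.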

11. A data analyst is checking for errors in a dataset. They want to determine how many times the name of a country is in the dataset using a pivot table. What function can they use to find this count?

Answer

·        COUNTA

·        CHECK

·        COUNT

·        CASE

Explanation: Country names are text, so the pivot table should summarize them with COUNTA, which counts every non-empty value, text or numeric. COUNT counts only numbers and would return zero for a text column. Grouping rows by country and summarizing with COUNTA shows how many times each distinct country name appears, which quickly surfaces misspellings and other inconsistent entries.
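A pivot table that summarizes a column with COUNTA is, in effect, a tally of each distinct non-empty value. A rough Python equivalent, using collections.Counter on a hypothetical country column with one misspelling ("Frnace") and one empty cell (None):

```python
from collections import Counter

# Hypothetical country column; None stands for an empty cell.
countries = ["France", "Japan", "Frnace", "France", None, "Japan", "France"]

# Like COUNTA, skip empty cells and count every non-empty value.
counts = Counter(value for value in countries if value is not None)

for country, n in sorted(counts.items()):
    print(country, n)
```

The misspelled name appears as its own row with its own count, just as it would in the pivot table.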

12. You’re writing the SQL query below and need to change “World Wide Web” to “www”. What function would you use to accomplish this task?

SELECT
  _____
    WHEN 'World Wide Web' THEN 'www'
  END AS some_column
FROM
  some_table

Answer

·        THEN

·        CASE

·        ELSE

·        WHEN
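CASE is the missing keyword. The quiz fragment leaves out the expression CASE should test, so the sketch below assumes the column is named some_column and adds an ELSE branch so other values pass through unchanged (without ELSE, non-matching rows would come back as NULL). It runs the query against a throwaway in-memory SQLite table via Python's sqlite3 module; the table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE some_table (some_column TEXT)")
conn.executemany(
    "INSERT INTO some_table VALUES (?)",
    [("World Wide Web",), ("email",), ("World Wide Web",)],
)

rows = conn.execute(
    """
    SELECT
      CASE some_column
        WHEN 'World Wide Web' THEN 'www'
        ELSE some_column          -- keep every other value unchanged
      END AS some_column
    FROM some_table
    ORDER BY rowid
    """
).fetchall()
conn.close()
print(rows)
```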

13. What should a data analyst actively track throughout the data cleaning process?

Answer

·        Additions, changes, and queries

·        Errors, deletions, and notes

·        Changes, resolutions, and deletions

·        Errors, additions, and deletions

14. A data analyst is in the verification process and needs to verify the modifications that they have made to the data. What could the analyst reference to find the changes they made throughout data cleaning?

Answer

·        Changelog

·        Notepad

·        Spreadsheet

·        Metadata

15. A data analyst commits a query to the repository as a new and improved query. Then, they specify the changes they made and why they made them. This scenario is part of what process?

Answer

·        Reporting data

·        Visualizing data

·        Communicating with stakeholders

·        Creating a changelog

Explanation: This scenario describes creating a changelog. In SQL, a changelog is often built through version control: each time you commit an improved query to the repository, you specify what you changed and why, and those commit messages form a chronological record of the query's history. This record supports collaboration and keeps the cleaning process transparent.

16. The data collected for an analysis project has just been cleaned. What are the next steps for a data analyst? Select all that apply.

Answer

·        Reporting

·        Certification

·        Validation

·        Verification

17. As a data analyst, you will need to keep the big picture in mind throughout any project when verifying data cleaning. What must the analyst do to take a big picture view of the project? Select all that apply.

Answer

·        Consider the data

·        Consider the goal

·        Consider the business problem

·        Consider the reporting

18. During the verification process, you find that you missed a few leading spaces during data cleaning. What function can you use to eliminate these spaces?

Answer

·        TRIM

·        TIDY

·        CUT

·        CROP

Explanation: TRIM removes leading spaces, along with trailing and repeated ones, so it cleans up the spaces missed earlier. If you need to remove only leading spaces in SQL, the LTRIM function does that.

Shuffle Q/A 2

19. Which SQL tool considers one or more conditions, then returns a value as soon as a condition is met?

Answer

·        THEN

·        WHEN

·        CASE

·        ELSE

Explanation: CASE evaluates its WHEN conditions in order and returns the value for the first condition that is met; if none is met, it returns the ELSE value, or NULL when there is no ELSE. WHEN, THEN, and ELSE are parts of a CASE statement rather than separate tools. CASE lets you build conditional logic, such as recoding or correcting values, directly into a query.
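The "first condition met wins" rule matters when conditions overlap. In this sketch, run through Python's sqlite3 module on hypothetical scores, 95 satisfies both WHEN branches, but CASE returns the result of the first one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical scores; the two WHEN conditions overlap on purpose.
query = """
    SELECT score,
      CASE
        WHEN score >= 90 THEN 'A'
        WHEN score >= 80 THEN 'B'
        ELSE 'other'
      END AS grade
    FROM (SELECT 95 AS score UNION ALL SELECT 85 UNION ALL SELECT 70)
    ORDER BY score DESC
"""
rows = conn.execute(query).fetchall()
conn.close()
print(rows)
```

Reordering the WHEN branches would change the result for 95, which is why the most specific condition usually goes first.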

20. Fill in the blank: Documentation is the process of tracking _____ during data cleaning. Select all that apply.

Answer

·        additions

·        deletions

·        changes

·        inactivity

21. Fill in the blank: A changelog contains a _____ list of modifications made to a project.

Answer

·        random

·        approximate

·        chronological

·        synchronized

Explanation: A changelog is a document that provides a chronological summary of all of the changes that have been made to a project.
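A changelog can be as simple as a list of dated entries. The sketch below is a hypothetical Python class, not a standard tool: it records additions, deletions, changes, and error fixes and prints them in chronological order. The dates and entries are made up for illustration.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Changelog:
    """A minimal, hypothetical changelog of data-cleaning modifications."""
    entries: list = field(default_factory=list)

    def record(self, when: date, change_type: str, description: str) -> None:
        # change_type is free text, e.g. "addition", "deletion", "change", "error fix".
        self.entries.append((when, change_type, description))

    def report(self) -> str:
        # Sort by date; sorted() is stable, so same-day entries keep recording order.
        ordered = sorted(self.entries, key=lambda entry: entry[0])
        return "\n".join(f"{d.isoformat()}  [{t}] {desc}" for d, t, desc in ordered)


log = Changelog()
log.record(date(2024, 3, 1), "deletion", "Removed 12 duplicate customer rows")
log.record(date(2024, 3, 1), "change", "Applied TRIM to the city column")
log.record(date(2024, 3, 4), "error fix", "Corrected misspelled country 'Frnace' to 'France'")
print(log.report())
```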

22. You start a complex project that will take more than a year to complete. You need to document modifications made to your queries throughout the project. What is the correct way to store these modifications?

Answer

·        Creating a changelog

·        Creating a notepad

·        Visualizing data

·        Creating a spreadsheet

Explanation: Create a changelog. For a long-running project, the changelog for queries is often kept in a version control system such as Git: each commit records a set of changes along with a message explaining what changed and why. Over a year or more, this gives you an accurate, well-organized history of how your queries developed and makes it easier to collaborate with others on the project.

23. Fill in the blank: A process to confirm that a data-cleaning effort was well-executed and the resulting data is accurate and reliable is known as _____.

Answer

·        verification

·        publishing

·        manipulation

·        processing

Explanation: Verification is the process of confirming that a data-cleaning effort was well executed and that the resulting data is accurate and reliable.

24. A data analyst is in the verification step. They consider the business problem, the goal, and the data involved in their analytics project. What scenario does this describe?

Answer

·        Reporting on the data

·        Considering the stakeholders

·        Seeing the big picture

·        Visualizing the data

Explanation: Considering the business problem, the goal, and the data together is what it means to see the big picture during verification. Keeping all three in view helps the analyst confirm not only that the cleaned data is correct, consistent, and reliable, but also that it serves the purpose of the project before analysis continues.

25. During data cleaning, you find an error in a username where the ID number was accidentally joined to the user’s last name. You need to figure out if this username has been entered incorrectly more than once in your dataset. If you use a pivot table, what function can you use to determine the number of times this error occurs in your dataset?

Answer

·        CASE

·        COUNT

·        COUNTA

·        CHECK

Explanation: Usernames are text, so summarize the username column with COUNTA in the pivot table. COUNTA counts every non-empty value, including text, and the pivot table shows a count for each distinct username; the row for the malformed username, with the ID number joined to the last name, tells you how many times the error occurs. COUNT would not work here because it counts only numeric values.
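In SQL, the closest equivalent of this pivot table is a GROUP BY query. Note that SQL's COUNT(column) counts every non-NULL value, text included, so it behaves like the spreadsheet COUNTA rather than the spreadsheet COUNT. A sketch using Python's sqlite3 module with made-up usernames, where "smith1042" stands in for a username whose ID was joined to the last name:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?)",
    [("smith1042",), ("jones_2231",), ("smith1042",), ("lee_3307",)],
)
# One row per distinct username, with the number of times it occurs.
rows = conn.execute(
    "SELECT username, COUNT(username) AS occurrences "
    "FROM users GROUP BY username ORDER BY username"
).fetchall()
conn.close()
print(rows)
```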

26. You’re working with a dataset that contains categorical variables. You notice that some of the strings are misspelled or are not capitalized. What function can you use to fix these errors when a condition is met?

Answer

·        ELSE

·        CASE

·        WHEN

·        THEN

Explanation: CASE lets you replace a value when a condition is met, such as swapping a known misspelling for the correct spelling. For capitalization problems, you can combine CASE with string functions such as UPPER or LOWER so that comparisons and results are consistent. ELSE, WHEN, and THEN are parts of a CASE statement rather than standalone functions.
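A sketch of that approach, run through Python's sqlite3 module on a made-up category column: LOWER makes the comparison case-insensitive, one WHEN branch also maps a known misspelling ("Electornics") to the correct value, and ELSE leaves unmatched values such as "Books" untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (category TEXT)")
conn.executemany(
    "INSERT INTO products VALUES (?)",
    [("Electronics",), ("electronics",), ("Electornics",), ("Toys",), ("TOYS",), ("Books",)],
)
rows = conn.execute(
    """
    SELECT DISTINCT
      CASE
        WHEN LOWER(category) IN ('electronics', 'electornics') THEN 'Electronics'
        WHEN LOWER(category) = 'toys' THEN 'Toys'
        ELSE category              -- leave anything else as-is
      END AS clean_category
    FROM products
    ORDER BY clean_category
    """
).fetchall()
conn.close()
print(rows)
```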

27. A data analyst uses a changelog while cleaning data. What process does a changelog support?

Answer

·        Illumination

·        Examination

·        Disclosure

·        Documentation

Explanation: A changelog supports documentation. It records, in chronological order, the changes made to the data during cleaning, which helps you understand how the data developed over time, retrace which alterations were made, and keep the cleaning process transparent and reproducible. This is especially useful in collaborative or long-term projects.

28. A changelog is essential for storing chronological modifications made during the data cleaning process. When will an analyst refer to the information in the changelog to certify data integrity?

Answer

·        Documentation

·        Verification

·        Presenting

·        Visualization

Shuffle Q/A 3

29. Fill in the blank: As a data analyst, you should always create a _____ to track your additions, deletions, errors, and changes to a query.

Answer

·        notepad

·        database

·        changelog

·        spreadsheet

Explanation: A changelog gives you a chronological record of the additions, deletions, errors, and changes you make to a query, so you should create one whenever you clean data.

30. Fill in the blank: TRIM is a function that removes _____ spaces in data. Select all that apply.

Answer

·        repeated

·        trailing

·        leading

·        inner

31. While verifying cleaned data, a data analyst encounters a misspelled name. Which function can they use to determine the number of misspelled occurrences in the dataset?

Answer

·        CASE

·        CHECK

·        COUNTA

32. At what point during the analysis process does a data analyst use a changelog?

Answer

·        While cleaning the data

·        While visualizing the data

·        While gathering the data

·        While reporting the data

33. Your manager points out an error in a product ID number in your dataset. The Product IDs can be numbers like 42 or text like "CAD-425". Using a pivot table, what function can you use to find how many times this error occurs in the dataset?

Answer

·        COUNT

·        CHECK

·        COUNTA

·        CASE

Explanation: Because the product IDs mix numbers such as 42 with text such as "CAD-425", summarize the product ID column with COUNTA in the pivot table. COUNTA counts every non-empty value, text included, so the pivot table shows how many times each product ID, including the erroneous one, appears. COUNT counts only numeric values and would skip the text IDs.
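The difference between COUNT and COUNTA is what makes COUNTA the right choice here. The Python sketch below mimics the two spreadsheet functions on a hypothetical product ID column; count_like and counta_like are illustrative helpers, and "CAD-42S" is an invented example of a mistyped ID.

```python
from collections import Counter

# Hypothetical product ID column mixing numbers and text; None stands for an empty cell.
product_ids = [42, "CAD-425", 42, "CAD-42S", None, "CAD-425", "CAD-42S"]


def count_like(values):
    """Mimic spreadsheet COUNT: count only numeric values."""
    return sum(1 for v in values if isinstance(v, (int, float)))


def counta_like(values):
    """Mimic spreadsheet COUNTA: count every non-empty value, text included."""
    return sum(1 for v in values if v is not None)


# Tally each non-empty ID, as a pivot table summarized by COUNTA would.
tally = Counter(v for v in product_ids if v is not None)

print("COUNT:", count_like(product_ids))
print("COUNTA:", counta_like(product_ids))
print("Occurrences of CAD-42S:", tally["CAD-42S"])
```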

34. While reviewing your coworker’s data cleaning process, you find a few cases of trailing spaces in the data. What function can you use to remove these spaces?

Answer

·        REMOVE TRAILING

·        DELETE

·        CUT

·        TRIM

35. Which of the following queries considers one or more conditions and returns a value as soon as that condition is met?

Answer

·        SELECT * WHEN CASE COLUMN = VARIABLE

·        SELECT * CASE IF COLUMN = VARIABLE

·        SELECT * CASE WHEN COLUMN = VARIABLE

·        SELECT * IF CASE COLUMN = VARIABLE

Explanation: Of the options, only SELECT * CASE WHEN COLUMN = VARIABLE puts the keywords in the order a CASE statement uses: CASE, then WHEN followed by a condition. CASE evaluates its conditions in order and returns the specified result as soon as one is met, which makes it a flexible tool for conditional logic in queries. The options are abbreviated: a complete statement would also need a THEN result, an END keyword, and a comma after *.

36. Fill in the blank: Once data is clean, a data analyst moves on to _____ and verification.

Answer

·        processing

·        confirming

·        publishing

·        reporting

Explanation: Once the data has been cleaned, a data analyst moves on to reporting and verification: confirming that the cleaning was done correctly and communicating what was done.

37. A data analyst is starting a large-scale project. The project will be crucial to business success, and the data analyst needs to keep the big picture at the forefront when verifying their data cleaning. What is the first step in the verification process?

Answer

·        Create a chronological list of modifications made to the data

·        Compare cleaned data with the original, uncleaned dataset and compare it to what is there now

·        Inform others of the data-cleaning effort

·        Determine the quality of the data

Explanation: As in any verification, the first step is to compare the cleaned data with the original, uncleaned dataset. Reviewing the two side by side shows what the cleaning changed and whether any common problems remain. For a project this important, that comparison is the foundation for confirming that the data is ready for analysis and aligned with the project's goals.

38. You use SQL to clean your data. You make comments whenever you modify your queries to keep track of any changes. What documentation will this practice help you create when you’re done cleaning the data?

Answer

·        A changelog

·        A query repository

·        A new dataset

·        A database

Explanation: Commenting your SQL queries each time you modify them produces a changelog. The comments record what was changed during data cleaning and why, which helps you understand how the queries evolved, diagnose problems, and keep the cleaning process transparent and reproducible.

39. A data analyst is starting a large-scale project that is crucial to business success. The data analyst needs to remember the big picture when verifying their data cleaning. What is involved in taking a big-picture view of the project? Select all that apply.

Answer

·        Consider the reporting

·        Consider the business problem

·        Consider the stakeholders

·        Consider the goal
