Week 3 – Regular Expressions

1. When using regular expressions, which of the following expressions uses a reserved character that can represent any single character?

Answers

·        re.findall(fu$, text)

·        re.findall(f*n, text)

·        re.findall(^un, text)

·        re.findall(f.n, text)

Explanation: The dot (.) is the reserved character that represents any single character in a regular expression. It is known as the wildcard, and by default it matches any single character except a newline. The pattern f.n therefore matches "fan", "fun", or "f1n".
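
As a minimal sketch of the wildcard behaviour described above, the snippet below runs the f.n pattern against a made-up sample string (the text variable is an assumption for illustration, not part of the quiz).

```python
import re

# Made-up sample text; "f\nn" contains a newline between "f" and "n".
text = "fan fun f1n f\nn fin"

# By default the dot matches any single character except a newline.
matches = re.findall(r"f.n", text)
print(matches)  # ['fan', 'fun', 'f1n', 'fin']

# With re.DOTALL the dot also matches the newline.
matches_dotall = re.findall(r"f.n", text, flags=re.DOTALL)
print(matches_dotall)
```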

2. Which of the following is NOT a function of the Python regex module?

Answers

·        re.grep()

·        re.match()

·        re.search()

·        re.findall()

3. The circumflex [^] and the dollar sign [$] are anchor characters. What do these anchor characters do in regex?

Answers

·        Match the start and end of a word.

·        Match the start and end of a line

·        Exclude everything between two anchor characters

·        Represent any number and any letter character, respectively
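
A short sketch of how the two anchors behave in Python, using a made-up three-line string. Note that in Python's re module the anchors apply to the whole string unless the re.MULTILINE flag is passed, in which case they apply to each line.

```python
import re

# Made-up sample with three lines: "sun", "under", "fun".
text = "sun\nunder\nfun"

# Without flags, ^ and $ anchor to the start and end of the whole string.
start_default = re.findall(r"^un", text)  # "sun" does not start with "un"
end_default = re.findall(r"un$", text)    # the string ends with "fun"

# With re.MULTILINE, ^ and $ anchor to the start and end of every line.
start_lines = re.findall(r"^un", text, flags=re.MULTILINE)
end_lines = re.findall(r"un$", text, flags=re.MULTILINE)

print(start_default, end_default)  # [] ['un']
print(start_lines, end_lines)      # ['un'] ['un', 'un']
```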

4. When using regex, some characters represent particular types of characters. Some examples are the dollar sign, the circumflex, and the dot wildcard. What are these characters collectively known as?

Answers

·        Special characters

·        Anchor characters

·        Literal characters

·        Wildcard characters

Explanation: These characters are collectively called special characters, also known as metacharacters. The group includes the dollar sign ($), the circumflex (^), and the dot wildcard (.). Special characters have a specific meaning within a regex pattern: they may specify quantifiers, positions in the string, or classes of characters. Unless they are escaped, they do not match themselves literally.
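
To illustrate the difference between a special character and its literal counterpart, here is a small sketch. The sample strings are assumptions for illustration; re.escape is the standard-library helper that backslash-escapes special characters.

```python
import re

# Unescaped, "." is a wildcard, so the pattern also matches "5x00".
wildcard_hit = re.search(r"5.00", "5x00") is not None

# Escaped, "\." only matches a literal period, so "5x00" no longer matches.
literal_hit = re.search(r"5\.00", "5x00") is not None

# re.escape() escapes every special character in a string for us.
text = "Price: $5.00 (approx.)"
found = re.findall(re.escape("$5.00"), text)

print(wildcard_hit, literal_hit, found)  # True False ['$5.00']
```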

5. What is grep?

Answers

·        An operating system

·        A command for parsing strings in Python

·        A command-line regex tool

·        A type of special character

Explanation: grep is a command-line utility on Unix and Unix-like operating systems that searches for text patterns inside files. The name comes from "global regular expression print". It searches the input using the regular expression you supply and prints every line that matches.
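
grep itself is a shell tool, but its core idea (keep only the lines that match a pattern) can be sketched in Python. The grep_lines helper and the sample list below are hypothetical and only approximate what grep does.

```python
import re

def grep_lines(pattern, lines):
    """Return the lines that contain a match for pattern (hypothetical helper)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Made-up input standing in for the lines of a file.
sample = ["thon", "Python", "Pygmalion", "cobra"]
hits = grep_lines(r"^Py.*n$", sample)
print(hits)  # ['Python', 'Pygmalion']
```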

6. The check_web_address function checks if the text passed qualifies as a top-level web address, meaning that it contains alphanumeric characters (which includes letters, numbers, and underscores), as well as periods, dashes, and a plus sign, followed by a period and a character-only top-level domain such as ".com", ".info", ".edu", etc. Fill in the regular expression to do that, using escape characters, wildcards, repetition qualifiers, beginning and end-of-line characters, and character classes.

import re

def check_web_address(text):
    pattern = ___
    result = re.search(pattern, text)
    return result != None

print(check_web_address("gmail.com")) # True
print(check_web_address("www@google")) # False
print(check_web_address("www.Coursera.org")) # True
print(check_web_address("web-address.com/homepage")) # False
print(check_web_address("My_Favorite-Blog.US")) # True

Answers

·        pattern = r"^\w.*\.[a-zA-Z]*$"
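
For reference, here is the function with the answer filled in, followed by a stricter variant. The stricter pattern and the name check_web_address_strict are assumptions added for illustration; they are not part of the quiz. The quiz answer passes all five test cases but also accepts characters the question does not list (for example "@"), because .* matches anything.

```python
import re

def check_web_address(text):
    # Quiz answer: a word character, then anything, then a literal dot,
    # then letters only up to the end of the string.
    pattern = r"^\w.*\.[a-zA-Z]*$"
    return re.search(pattern, text) is not None

def check_web_address_strict(text):
    # Hypothetical stricter variant: only word characters, periods, plus
    # signs and dashes before the final dot, and at least one letter after it.
    pattern = r"^[\w.+-]+\.[a-zA-Z]+$"
    return re.search(pattern, text) is not None

print(check_web_address("gmail.com"))                 # True
print(check_web_address("web-address.com/homepage"))  # False
print(check_web_address("a@b.com"))                   # True
print(check_web_address_strict("a@b.com"))            # False
```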

7. The check_time function checks for the time format of a 12-hour clock, as follows: the hour is between 1 and 12, with no leading zero, followed by a colon, then minutes between 00 and 59, then an optional space, and then AM or PM, in upper or lower case. Fill in the regular expression to do that. How many of the concepts that you just learned can you use here?

import re

def check_time(text):
    pattern = ___
    result = re.search(pattern, text)
    return result != None

print(check_time("12:45pm")) # True
print(check_time("9:59 AM")) # True
print(check_time("6:60am")) # False
print(check_time("five o'clock")) # False

Answers

·        pattern = r"^(1[012]|[1-9]):[0-5][0-9] ?[APap][Mm]$"
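
The answer can be checked by filling it into the function. The sketch below does that and adds a few extra inputs (assumptions, not from the quiz) to show how each part of the pattern rejects invalid times.

```python
import re

def check_time(text):
    # (1[012]|[1-9])  hour 1-12 with no leading zero
    # :[0-5][0-9]     colon, then minutes 00-59
    #  ?              optional single space
    # [APap][Mm]      AM or PM in upper or lower case
    pattern = r"^(1[012]|[1-9]):[0-5][0-9] ?[APap][Mm]$"
    return re.search(pattern, text) is not None

print(check_time("12:45pm"))  # True
print(check_time("9:59 AM"))  # True
print(check_time("6:60am"))   # False
print(check_time("13:00pm"))  # False
print(check_time("09:30am"))  # False
```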

8. The contains_acronym function checks the text for the presence of 2 or more characters or digits surrounded by parentheses, with at least the first character in uppercase (if it's a letter), returning True if the condition is met, or False otherwise. For example, "Instant messaging (IM) is a set of communication technologies used for text-based communication" should return True since (IM) satisfies the match conditions. Fill in the regular expression in this function:

import re

def contains_acronym(text):
    pattern = ___
    result = re.search(pattern, text)
    return result != None

print(contains_acronym("Instant messaging (IM) is a set of communication technologies used for text-based communication")) # True
print(contains_acronym("American Standard Code for Information Interchange (ASCII) is a character encoding standard for electronic communication")) # True
print(contains_acronym("Please do NOT enter without permission!")) # False
print(contains_acronym("PostScript is a fourth-generation programming language (4GL)")) # True
print(contains_acronym("Have fun using a self-contained underwater breathing apparatus (Scuba)!")) # True

Answers

·        pattern = r"\(\w.*\w\)"
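
The answer above passes the five test cases, but it does not enforce the uppercase rule from the question, since \w also matches lowercase letters. The sketch below shows the quiz answer next to a stricter alternative; the stricter pattern and the name contains_acronym_strict are assumptions for illustration.

```python
import re

def contains_acronym(text):
    # Quiz answer: "(", a word character, anything, a word character, ")".
    pattern = r"\(\w.*\w\)"
    return re.search(pattern, text) is not None

def contains_acronym_strict(text):
    # Hypothetical stricter variant: the first character must be an
    # uppercase letter or a digit, followed by one or more letters or digits.
    pattern = r"\([A-Z0-9][a-zA-Z0-9]+\)"
    return re.search(pattern, text) is not None

print(contains_acronym("Instant messaging (IM) is popular"))  # True
print(contains_acronym("Details follow (see below)"))         # True
print(contains_acronym_strict("Details follow (see below)"))  # False
```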

 

9. What does the "r" before the pattern string in re.search(r"Py.*n", sample.txt) indicate?

Answers

·        Raw strings

·        Regex

·        Repeat

·        Result

Explanation: The "r" placed in front of the pattern string in re.search(r"Py.*n", sample.txt) marks the string as a raw string literal. In a raw string, Python does not interpret backslashes as escape characters; they are passed through unchanged to the regex engine.

For instance, to match a single backslash without the "r" prefix you must write re.search("\\\\", sample.txt), because Python and the regex engine each consume one level of escaping. With the "r" prefix you can write re.search(r"\\", sample.txt), which is easier to read and needs no extra escaping.

The pattern shown here uses a raw string literal, which simplifies handling the backslash sequences that are common in regular expressions.
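
A small sketch of the two spellings, assuming a made-up string that contains a single backslash. Both patterns are the same two-character regex; only the way they are written in source code differs.

```python
import re

# Made-up sample: the string holds one backslash, i.e. C:\temp
path = "C:\\temp"

# Without the r prefix, four backslashes are needed in the source code.
plain = re.search("\\\\", path)

# With the r prefix, two backslashes are enough.
raw = re.search(r"\\", path)

# Both literals produce the same two-character pattern.
same_pattern = "\\\\" == r"\\"
print(same_pattern, plain.group() == raw.group())  # True True
```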

10. What does the plus character [+] do in regex?

Answers

·        Matches plus sign characters

·        Matches one or more occurrences of the character before it

·        Matches the end of a string

·        Matches the character before the  [+] only if there is more than one

Explanation: The plus sign (+) is a quantifier. It matches one or more occurrences of the preceding character or group. For instance, the pattern a+ matches any run of one or more consecutive 'a' characters.

The square brackets in the question only set the character apart and are not part of the pattern. Inside an actual pattern, square brackets define a character class, where most special characters are taken literally, so [+] matches a literal '+' in the text instead of acting as a quantifier. Escaping it as \+ has the same effect.
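
The difference between + as a quantifier and + as a literal can be seen in a short sketch; the sample text is made up for illustration.

```python
import re

text = "a aa aaa 1+1=2"

# As a quantifier: one or more consecutive "a" characters.
runs = re.findall(r"a+", text)

# Inside a character class, or escaped, "+" is a literal plus sign.
in_class = re.findall(r"[+]", text)
escaped = re.findall(r"\+", text)

print(runs)               # ['a', 'aa', 'aaa']
print(in_class, escaped)  # ['+'] ['+']
```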

11. Fill in the code to check if the text passed includes a possible U.S. zip code, formatted as follows: exactly 5 digits, and sometimes, but not always, followed by a dash with 4 more digits. The zip code needs to be preceded by at least one space, and cannot be at the start of the text.

import re

def check_zip_code(text):
    result = re.search(r"___", text)
    return result != None

print(check_zip_code("The zip codes for New York are 10001 thru 11104.")) # True
print(check_zip_code("90210 is a TV show")) # False
print(check_zip_code("Their address is: 123 Main Street, Anytown, AZ 85258-0001.")) # True
print(check_zip_code("The Parliament of Canada is at 111 Wellington St, Ottawa, ON K1A0A9.")) # False

Answers

·        result = re.search(r"(?<!^)\s\d{5}(-\d{4})?", text)
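
Here is the answer filled into the function, together with a slightly stricter variant. The variant and the name check_zip_code_strict are assumptions for illustration: adding \b at the end rejects runs of more than five digits, which the quiz answer accepts.

```python
import re

def check_zip_code(text):
    # Quiz answer: whitespace that is not at the very start of the text,
    # five digits, then an optional dash and four more digits.
    return re.search(r"(?<!^)\s\d{5}(-\d{4})?", text) is not None

def check_zip_code_strict(text):
    # Hypothetical variant: the trailing \b stops a longer digit run
    # such as "123456" from counting as a five-digit zip code.
    return re.search(r"(?<!^)\s\d{5}(-\d{4})?\b", text) is not None

print(check_zip_code("90210 is a TV show"))      # False
print(check_zip_code("AZ 85258-0001."))          # True
print(check_zip_code("call 123456 now"))         # True
print(check_zip_code_strict("call 123456 now"))  # False
```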

12. We're working with a CSV file, which contains employee information. Each record has a name field, followed by a phone number field, and a role field. The phone number field contains U.S. phone numbers, and needs to be modified to the international format, with "+1-" in front of the phone number. Fill in the regular expression, using groups, to use the transform_record function to do that.

import re

def transform_record(record):
    new_record = re.sub(___)
    return new_record

print(transform_record("Sabrina Green,802-867-5309,System Administrator"))
# Sabrina Green,+1-802-867-5309,System Administrator

print(transform_record("Eli Jones,684-3481127,IT specialist"))
# Eli Jones,+1-684-3481127,IT specialist

print(transform_record("Melody Daniels,846-687-7436,Programmer"))
# Melody Daniels,+1-846-687-7436,Programmer

print(transform_record("Charlie Rivera,698-746-3357,Web Developer"))
# Charlie Rivera,+1-698-746-3357,Web Developer

Answers

·        new_record = re.sub(r",([\d-]+)", r",+1-\1", record)
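
With the answer filled in, the function can be run as a quick check. The comments describe how the capturing group and the \1 backreference work together.

```python
import re

def transform_record(record):
    # ,([\d-]+)  a comma followed by a captured run of digits and dashes;
    #            the role field never matches because it starts with a letter.
    # ,+1-\1     the comma, the "+1-" prefix, then the captured phone number.
    return re.sub(r",([\d-]+)", r",+1-\1", record)

print(transform_record("Sabrina Green,802-867-5309,System Administrator"))
# Sabrina Green,+1-802-867-5309,System Administrator
print(transform_record("Eli Jones,684-3481127,IT specialist"))
# Eli Jones,+1-684-3481127,IT specialist
```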

 

13. The multi_vowel_words function returns all words with 3 or more consecutive vowels (a, e, i, o, u). Fill in the regular expression to do that.

import re

def multi_vowel_words(text):
    pattern = ___
    result = re.findall(pattern, text)
    return result

print(multi_vowel_words("Life is beautiful"))
# ['beautiful']

print(multi_vowel_words("Obviously, the queen is courageous and gracious."))
# ['Obviously', 'queen', 'courageous', 'gracious']

print(multi_vowel_words("The rambunctious children had to sit quietly and await their delicious dinner."))
# ['rambunctious', 'quietly', 'delicious']

print(multi_vowel_words("The order of a data queue is First In First Out (FIFO)"))
# ['queue']

print(multi_vowel_words("Hello world!"))
# []

Answers

·        pattern = r"\b\w*[aeiou]{3,}\w*\b"
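
Filling the answer into the function gives the following runnable sketch. Because the pattern has no capturing groups, re.findall returns the whole matched words.

```python
import re

def multi_vowel_words(text):
    # \b\w*        start of a word, then any leading word characters
    # [aeiou]{3,}  three or more consecutive lowercase vowels
    # \w*\b        the rest of the word up to the word boundary
    pattern = r"\b\w*[aeiou]{3,}\w*\b"
    return re.findall(pattern, text)

print(multi_vowel_words("Life is beautiful"))
# ['beautiful']
print(multi_vowel_words("Obviously, the queen is courageous and gracious."))
# ['Obviously', 'queen', 'courageous', 'gracious']
print(multi_vowel_words("Hello world!"))
# []
```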

 

14. When capturing regex groups, what datatype does the groups method return?

Answers

·        A string

·        A tuple

·        A list

·        A float

Explanation: The groups method of a match object returns a tuple. The tuple contains the substrings captured by each capturing group in the regular expression.

For instance, if a pattern has two capturing groups, calling groups on its match returns a two-element tuple holding the substring matched by each group.
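
A minimal sketch of the two-group case; the pattern and the sample name are made up for illustration.

```python
import re

# Two capturing groups: last name and first name.
match = re.search(r"^(\w+), (\w+)$", "Lovelace, Ada")

captured = match.groups()
print(captured)        # ('Lovelace', 'Ada')
print(type(captured))  # <class 'tuple'>

# A tuple can be unpacked directly into variables.
last_name, first_name = captured
```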

15. The transform_comments function converts comments in a Python script into those usable by a C compiler. This means looking for text that begins with a hash mark (#) and replacing it with double slashes (//), which is the C single-line comment indicator. For the purpose of this exercise, we'll ignore the possibility of a hash mark embedded inside of a Python command, and assume that it's only used to indicate a comment. We also want to treat repetitive hash marks (##), (###), etc., as a single comment indicator, to be replaced with just (//) and not (#//) or (//#). Fill in the parameters of the substitution method to complete this function:

import re

def transform_comments(line_of_code):
    result = re.sub(___)
    return result

print(transform_comments("### Start of program"))
# Should be "// Start of program"

print(transform_comments(" number = 0 ## Initialize the variable"))
# Should be " number = 0 // Initialize the variable"

print(transform_comments(" number += 1 # Increment the variable"))
# Should be " number += 1 // Increment the variable"

print(transform_comments(" return(number)"))
# Should be " return(number)"

Answers

·        result = re.sub(r"#+", r"//", line_of_code)
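
The completed function is shown below as a runnable sketch. The #+ quantifier is what collapses a run of hash marks into a single // marker.

```python
import re

def transform_comments(line_of_code):
    # "#+" matches one or more consecutive hash marks as a single unit,
    # so "###" is replaced once with "//" rather than three times.
    return re.sub(r"#+", r"//", line_of_code)

print(transform_comments("### Start of program"))
# // Start of program
print(transform_comments(" number = 0 ## Initialize the variable"))
#  number = 0 // Initialize the variable
print(transform_comments(" return(number)"))
#  return(number)
```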

 

16. The convert_phone_number function checks for a U.S. phone number format: XXX-XXX-XXXX (3 digits followed by a dash, 3 more digits followed by a dash, and 4 digits), and converts it to a more formal format that looks like this: (XXX) XXX-XXXX. Fill in the regular expression to complete this function.

import re

def convert_phone_number(phone):
    result = re.sub(___)
    return result

print(convert_phone_number("My number is 212-345-9999.")) # My number is (212) 345-9999.
print(convert_phone_number("Please call 888-555-1234")) # Please call (888) 555-1234
print(convert_phone_number("123-123-12345")) # 123-123-12345
print(convert_phone_number("Phone number of Buckingham Palace is +44 303 123 7300")) # Phone number of Buckingham Palace is +44 303 123 7300

Answers

·        result = re.sub(r"\b(\d{3})-(\d{3})-(\d{4})\b", r"(\1) \2-\3", phone)

 

Explanation: In this example, the regular expression r"\b(\d{3})-(\d{3})-(\d{4})\b" matches the structure of a U.S. phone number, with one capturing group for each of the three sets of digits. The \b word boundaries keep longer digit runs such as 123-123-12345 from matching. The string r"(\1) \2-\3" passed to re.sub is the replacement string, in which the backreferences \1, \2, and \3 stand for the three captured groups.

When run on a U.S. phone number such as "123-456-7890", this code produces "(123) 456-7890".
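
The completed function, as a runnable sketch, with the quiz's own test strings plus the "123-456-7890" example.

```python
import re

def convert_phone_number(phone):
    # Three capturing groups for the digit sets; \b on both sides keeps
    # the pattern from matching inside a longer run of digits.
    return re.sub(r"\b(\d{3})-(\d{3})-(\d{4})\b", r"(\1) \2-\3", phone)

print(convert_phone_number("123-456-7890"))                # (123) 456-7890
print(convert_phone_number("My number is 212-345-9999."))  # My number is (212) 345-9999.
print(convert_phone_number("123-123-12345"))               # 123-123-12345
```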
