Link Search Menu Expand Document

Counting words

Table of contents

  1. Total Count
  2. Frequency Count

Programming in C is a very important baseline skill for CS 162. This exercise should make sure you’re comfortable with the basics of the language. In particular, you need to be fluent in working with structs, linked data structures (e.g., lists), pointers, arrays, typedef and such, which CS 61C may have touched only lightly.

You will be writing a program called words. words is a program that counts (1) the total amount of words and (2) the frequency of each word in a file(s). It then prints the results to stdout. Like most UNIX utilities in the real world, your program should read its input from each of the files specified as command line arguments, printing the cumulative word counts. If no file is provided, your program should read from stdin.

In C, header files (suffixed by .h) are how we engineer abstractions. They define the objects, types, methods, and—most importantly—documentation. The corresponding .c file provides the implementation of the abstraction. You should be able to write code with the header file without peeking under the covers at its implementation.

In this case, words/word_count.h provides the definition of the word_count struct, which we will use as a linked list to keep track of a word and its frequency. This has been typedef’d into WordCount. This means that instead of typing out struct word_count, we can use WordCount as shorthand. The header file also gives us a list of functions that are defined in words/word_count.c. Part of this assignment is to write code for these functions in words/word_count.c

We have provided you with a compiled version of sort_words so that you do not need to write the wordcount_sort function. However, you may still need to write your own comparator function (i.e. wordcount_less). The Makefile links this in with your two object files, words.o and word_count.o.

Note that words.o is an ELF formatted binary. As such you will need to use a system which can run ELF executables to test your program (such as the CS 162 VM). Windows and OS X do NOT use ELF and as such should not be used for testing.

For this section, you will be making changes to words/main.c and words/word_count.c. After editing these files, cd into the words directory and run make in the terminal. This will create the words executable. (Remember to run make after making code changes to generate a fresh executable). Use this executable (and your own test cases) to test your program for correctness. The autograder will use a series of test cases to determine your final score for this section.

For the below examples, suppose we have a file called words.txt that contains the following content:

abc def AaA
bbb zzz aaa

Total Count

Your first task will be to implement total word count. When executed, words will print the total number of words counted to stdout. At this point, you will not need to make edits to word_count.c. A complete implementation of the num_words() function can suffice.

A word is defined as a sequence of contiguous alphabetical characters of length greater than one. All words should be converted to their lower-case representation and be treated as not case-sensitive. The maximum length of a word has been defined at the top of main.c.

After completing this part, running ./words words.txt should print:

The total number of words is: 6

Frequency Count

Your second task will be to implement frequency counting. Your program should print each unique word as well as the number of times it occurred. This should be sorted in order of frequency (low first). The alphabetical ordering of words should be used as a tie breaker. The wordcount_sort function has been defined for you in wc_sort.o. However, you will need to implement the wordcount_less function in main.c.

You will need to implement the functions in word_count.c to support the linked list data structure (i.e. WordCount a.k.a. struct word_count). The complete implementation of word_count.c will prove to be useful when implementing count_words() in main.c. After completing this part, running ./words -f words.txt should print:

1 abc
1 bbb
1 def
1 zzz
2 aaa

Hint: You can run the code below to verify the basic functionality of your program (don’t treat this as a testing spec though):

cat <filename>
| tr " " "\n"
| tr -s "\n"
| tr "[:upper:]" "[:lower:]"
| tr -d -C "[:lower:]\n"
| sort
| uniq -c
| sort -n