Programming in C is a very important baseline skill for CS 162. This exercise should make sure you’re comfortable with the basics of the language. In particular, you need to be fluent in working with structs, linked data structures (e.g., lists), pointers, arrays,
typedef and such, which CS 61C may have touched only lightly.
You will be writing a program called
words is a program that counts (1) the total amount of words and (2) the frequency of each word in a file(s). It then prints the results to
stdout. Like most UNIX utilities in the real world, your program should read its input from each of the files specified as command line arguments, printing the cumulative word counts. If no file is provided, your program should read from
In C, header files (suffixed by
.h) are how we engineer abstractions. They define the objects, types, methods, and—most importantly—documentation. The corresponding
.c file provides the implementation of the abstraction. You should be able to write code with the header file without peeking under the covers at its implementation.
In this case,
words/word_count.h provides the definition of the
word_count struct, which we will use as a linked list to keep track of a word and its frequency. This has been
WordCount. This means that instead of typing out
struct word_count, we can use
WordCount as shorthand. The header file also gives us a list of functions that are defined in
words/word_count.c. Part of this assignment is to write code for these functions in
We have provided you with a compiled version of
sort_words so that you do not need to write the
wordcount_sort function. However, you may still need to write your own comparator function (i.e.
Makefile links this in with your two object files,
words.o is an ELF formatted binary. As such you will need to use a system which can run ELF executables to test your program (such as the CS 162 VM). Windows and OS X do NOT use ELF and as such should not be used for testing.
For this section, you will be making changes to
words/word_count.c. After editing these files,
cd into the
words directory and run
make in the terminal. This will create the
words executable. (Remember to run
make after making code changes to generate a fresh executable). Use this executable (and your own test cases) to test your program for correctness. The autograder will use a series of test cases to determine your final score for this section.
For the below examples, suppose we have a file called
words.txt that contains the following content:
abc def AaA bbb zzz aaa
Your first task will be to implement total word count. When executed,
words will print the total number of words counted to stdout. At this point, you will not need to make edits to
word_count.c. A complete implementation of the
num_words() function can suffice.
A word is defined as a sequence of contiguous alphabetical characters of length greater than one. All words should be converted to their lower-case representation and be treated as not case-sensitive. The maximum length of a word has been defined at the top of
After completing this part, running
./words words.txt should print:
The total number of words is: 6
Your second task will be to implement frequency counting. Your program should print each unique word as well as the number of times it occurred. This should be sorted in order of frequency (low first). The alphabetical ordering of words should be used as a tie breaker. The
wordcount_sort function has been defined for you in
wc_sort.o. However, you will need to implement the
wordcount_less function in
You will need to implement the functions in
word_count.c to support the linked list data structure (i.e.
struct word_count). The complete implementation of
word_count.c will prove to be useful when implementing
main.c. After completing this part, running
./words -f words.txt should print:
1 abc 1 bbb 1 def 1 zzz 2 aaa
Hint: You can run the code below to verify the basic functionality of your program (don’t treat this as a testing spec though):
cat <filename> | tr " " "\n" | tr -s "\n" | tr "[:upper:]" "[:lower:]" | tr -d -C "[:lower:]\n" | sort | uniq -c | sort -n