Understanding Data Structures
In 1976, Niklaus Wirth published a book titled 'Algorithms + Data Structures = Programs.' In the 2020 decade, that statement still stands true.
Data structure is the arrangement of data in memory. They are essential for organizing, processing, retrieving, accessing, and storing data.
Data structures usually work together with algorithms. They hold the data while algorithms solve problems using the data.
Data structures are one of the foundational disciplines in computer science. They are pivotal to programming and a must-know for every programmer. If you've had to apply for a programming job, you will notice data structures take a huge chunk of the interviews.
Software development is about solving problems, getting the best results in the shortest time possible, and with fewer resources. Just like a carpenter needs a hammer and nails to work so does a programmer need data structures for his job. Coding is just one aspect of a programmer's job. Tech companies spend more time designing the best algorithms to save computation power, server time, loading time, etc. It saves them time and money.
It's easy to make mistakes that only come out much later, after you've already implemented a lot of code. You'll realize, 'Oh I should have used a different type of data structure.' Start over from scratch.
New, young programmers jump right into coding and frameworks, but for the actual job what they need is a good knowledge of data structures. Every real-world problem deals with data, and operations on that data. During interviews, interviewers check to see if an interviewee can solve their problems using the best algorithms.
There are different data structures for different problems. For example, if you had to search for a particular word in the dictionary, going through the pages one by one won't do. Instead, you look for the first letter of the word you are searching for and then work your way alphabetically. This is an example of a binary search. If you had to design a Linkedin connections algorithm, you would need a Graph data structure to place connections into first, second, or third.
Data Structures and data types
- Primitive data structures
- Non-primitive data structures.
Primitive data structures are primitive data types. These data types are defined by the programming language. Strings, integers, floats, and pointers are an example of primitive data structures.
Non-primitive data structures are not defined by the programming language. They are created by the programmer and are derived from primitive data structures. They are also called 'reference variables' or 'object references' because they reference a memory location that stores data. Non-primitive methods have methods attached to them to perform operations.
Non-primitive data structures are further divided into linear and non-linear data structures. Linear data structures are data arranged sequentially. Values/data elements are connected in a straight line, whether vertical or horizontal. Accessing or indexing an element is done in a sequential manner using a loop. Arrays, Stacks, Linked-lists, and Queues are examples of linear data structures.
Non-linear data structures are arranged in random or tree-like structures with nodes attaching one element to another. Non-linear data structures take a hierarchical form, accessing data through nodes and points. Graphs, Hash tables, Trees, and Heap are examples of non-linear data structures.
Detailed Look at some types of data structures and how they work.
Arrays
Arrays are the most common and basic non-primitive data structures. In some languages like JavaScript, they are called arrays while in others like Python they are called lists. No matter what they are called, they can be identified by their structure and operations.
Arrays are used to store multiple values in a single variable. They are usually homogenous. Their values are enclosed in square brackets. Each element of an array has an index number, and these elements can be accessed or retrieved using their index number. Array indexes start with 0.
There are two types of arrays. One dimensional array and multi-dimensional arrays. Multidimensional arrays are nested arrays or arrays within arrays. Multidimensional arrays are usually used to form matrices.
We could find the min and max values of a range of numbers using an array like this in python,
max_min = [5, 10, 19, 4, 3]
print(max(max_min), min(max_min))
# 19 3
We could also sort a list of names in ascending or descending orders.
countries = ['Ethiopia', 'Uganda', 'Venezuela', 'America', 'Canada']
countries.sort()
print(countries)
#['America', 'Canada', 'Ethiopia', 'Uganda', 'Venezuela']
Hashtables
Hashtables are also called dictionaries, search tables, maps, associative arrays or hash. This structure is non-linear and elements are stored in a key/value pair. The address or index value of the data element is generated from a hash function.
Hashtables are different from arrays because rather than index elements with positions, it uses a key. This makes it perform faster insertion and search of elements because each element is accessed with its key.
The performance of a hashtable is usually based on the hash function, size of the hash table, and collision handling method. Obvious examples of hashtable implementation include variables of telephone books of names/phone numbers, scheduling of dates/events, and identification of people/physical traits.
Here's an example of a hashtable. It contains a person's personal data.
name = {"Name":"Alicia", "Surname": "Knowles", "Age":39, "State":"California", "Hair Color":"Black", "Weight": "56kg"}
#to get the state
print(name['State'])
The key is 'State' and the value is 'California'.
Graph
This is a non-linear data structure that has nodes and edges. Nodes are also referred to as vertices and they correspond to objects. Edges are the connections between objects. Graphs are used to show relationships. A graph can also have weight – which indicates the strength between each connection of the node.
There are two types of graphs – directed and undirected graphs. A directed graph contains an ordered pair of nodes while an undirected graph contains an unordered pair of nodes. Undirected graphs do not have a specific direction to represent edges, while directed graphs have a specific direction of vertexes. Undirected graphs usually have two-way relationships as edges can be transverse in both directions. Directed graphs usually have a one-way relationship and edges can be transverse in one way.
A real-world implementation of graph is the friends modeling on Facebook.
Here is a graph in Python that has six nodes
graph = {
'A': ['B', 'C'],
'B': ['C', 'D'],
'C': ['D'],
'D': ['C'],
'E': ['F'],
'F': ['C']
}
This graph is represented using the adjacency list and the dictionary (hashtable) data structure. The keys are the node of the graph.
Stacks and Queues
Stack is a data structure that can be thought of as a physical stack or pile. It is a linear data structure where elements are removed or added at one end only. Stack's major feature is that it supports LIFO (Last In First Out). Think of a stack of plates placed in a vertical order. If you need to take a plate out, the one at the top will be taken first, and that is the plate that was placed last. This is how Stack works. The element which is inserted last is accessed first.
Queue is a linear data structure like Stack, but the difference is that Queue uses FIFO (First In First Out). Think of a line at a coffee shop. The first person in the line gets to be attended to first. Every other person just stands behind the last person they see. Both Queue and Stack access elements in a sequential manner.
The undo, redo function in most applications is a real-life example of Stack. User actions are put into a stack and when a user 'undo' an action, the most recent action is pushed out. Your computer or browser also keeps track of your 'most recently seen' files, browser history, or images using Stack.
Queue is used to transfer data asynchronously. It is also used to share resources amongst different processes like CPU scheduling, Disk scheduling, IO buffers, and keyboard buffers.
A common example of Queue would also be the network printer printing documents from different computers in a FIFO manner.
In Python, we can create stacks using the deque module from collections.
from collections import deque
newstack = deque()
newstack.append('x')
newstack.append('y')
newstack.append('z')
print(newstack)
#deque(['x', 'y', 'z'])
print(myStack.pop())
#z
print(myStack.pop())
#y
print(myStack.pop())
#x
Operations on data structures
Different operations can be carried on data structures. They include,
- Traversing: Visiting each node of a data structure tree.
- Insertion: Inserting new elements into the data structure.
- Deletion: Delete elements from a data structure.
- Searching: Finding the location of an element.
- Sorting: Arrangement of elements of a data structure in some kind of order.
- Merging: Combine two sorted arrays into one single subarray.
- Popping: To remove elements from a structure.
- Pushing: To add elements to a structure.
Data structures are also classified as static and dynamic data structures. Static data structures have a fixed memory size and their content cannot be altered. The size of a static data structure is allocated at compile time.
Dynamic data structure has a flexible memory size and it either increases or decreases. The size is allocated at run time.
CONCLUSION
Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming.
Data structure is the heart of programming. They make the difference between a bad system and a good one. In this article, we focused on four data structures, as a guide to other types of data structures and their implementation.