Human Activity Recognition (HAR) refers to the capability of machines to identify various activities performed by a user. The knowledge acquired from these systems/algorithms is integrated into many applications where the associated device uses it to identify actions or gestures and perform predefined tasks in response.
We are interested in classifying human activities based on accelerometer data. We will be using a publicly available dataset called UCI-HAR. The dataset is available to download here. Just for your reference, a YouTube video of the authors collecting participants' accelerometer data is also available here.
We will use the raw accelerometer data within the inertial_signals folder. The provided script, `CombineScript.py`, organizes and sorts the accelerometer data, establishing separate classes for each category and compiling participant data into these classes. The `MakeDataset.py` script reads through all the participant data and creates a single dataset, which is then split into train, test, and validation sets. We focus on the first 10 seconds of activity, which translates to the initial 500 data samples at a sampling rate of 50 Hz.
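As a quick sanity check on the 500-sample figure, here is a minimal sketch of that windowing, assuming each recording has been loaded as a NumPy array of shape (T, 3) for the x/y/z axes (the array below is a random stand-in, not real data):

```python
import numpy as np

SAMPLING_RATE = 50                 # Hz, from the dataset description
N_SAMPLES = SAMPLING_RATE * 10     # 10 seconds -> 500 samples

signal = np.random.randn(1280, 3)  # stand-in for one (T, 3) x/y/z recording
window = signal[:N_SAMPLES]        # keep only the initial 10 seconds
print(window.shape)                # (500, 3)
```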
1. Place `CombineScript.py` and `MakeDataset.py` in the same folder that contains the UCI dataset. Ensure you have moved into that folder before running the scripts; if you run them from a different folder, you will have to adjust the paths inside the scripts to make them work.
2. Run `CombineScript.py` and provide the paths to the test and train folders in the UCI dataset. This will create a folder called `Combined`, which will contain all the data from all the participants. This is how most datasets are organized; you may encounter similar dataset structures in the future.
3. Run `MakeDataset.py` and provide the path to the `Combined` folder. This will create a dataset containing the train, test, and validation sets. You can use this dataset to train your models.
Zero-shot prompting involves providing a language model with a prompt or a set of instructions that allows it to generate text or perform a task without any explicit training data or labeled examples. The model is expected to generate high-quality text or perform the task accurately based solely on the prompt and its internal knowledge.
Few-shot prompting is similar to zero-shot prompting, but it involves providing the model with a limited number of labeled examples or prompts that are relevant to the specific task or dataset. The model is then expected to generate high-quality text or perform the task accurately based on the few labeled examples and its internal knowledge.
You have been provided with a Python notebook that demonstrates how to use zero-shot and few-shot prompting with a language model (LLM). The example in the notebook involves text-based tasks, but LLMs can also be applied to a wide range of tasks (students interested in learning more can read here and here).
Queries will be provided in the form of featurized accelerometer data and the model should predict the activity performed.
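For intuition, here is a minimal sketch of what such a prompt might look like; the feature names, values, and labeled examples are purely illustrative and are not taken from the provided notebook. Dropping the labeled examples would turn this into a zero-shot query.

```python
# Hypothetical featurized examples: (mean, std) of acceleration magnitude.
few_shot_examples = [
    ("mean=1.02, std=0.05", "STANDING"),
    ("mean=1.01, std=0.48", "WALKING"),
    ("mean=1.05, std=0.91", "WALKING_UPSTAIRS"),
]
query_features = "mean=1.00, std=0.51"

prompt = "Classify the activity from the accelerometer features.\n\n"
for features, label in few_shot_examples:
    prompt += f"Features: {features}\nActivity: {label}\n\n"
prompt += f"Features: {query_features}\nActivity:"  # the LLM completes this

print(prompt)  # send this string to the LLM used in the notebook
```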
For this exercise, marks will not depend on the numbers you get but on the process you followed. Utilize apps like Physics Toolbox Suite on your smartphone to collect your data in .csv/.txt format. Ensure at least 15 seconds of data is collected, trimming the edges to obtain 10 seconds of relevant data (a trimming sketch follows below). Also record a video of yourself while recording data; this video will be required in some future assignments. Collect 3-5 samples per activity class.
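A minimal trimming sketch, assuming the app exports a CSV with a time column (in seconds) and one column per axis; the file and column names here are hypothetical, so check your app's actual export format:

```python
import pandas as pd

df = pd.read_csv("walking_sample1.csv")        # e.g. columns: time, ax, ay, az

total = df["time"].iloc[-1] - df["time"].iloc[0]
start = df["time"].iloc[0] + (total - 10) / 2  # centre a 10-second window
trimmed = df[(df["time"] >= start) & (df["time"] <= start + 10)]
trimmed.to_csv("walking_sample1_trimmed.csv", index=False)
```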
Complete the decision tree implementation in tree/base.py. The code should be written in Python and not use existing libraries other than the ones shared in class or already imported in the code. Your decision tree should work for four cases: i) discrete features, discrete output; ii) discrete features, real output; iii) real features, discrete output; iv) real features, real output. Your model should accept real inputs only (for discrete inputs, you may convert the attributes into one-hot encoded vectors). Your decision tree should be able to use InformationGain with Entropy or GiniIndex as the splitting criterion for discrete output, and InformationGain with MSE as the splitting criterion for real output. Your code should also be able to plot/display the decision tree. [2.5 marks]
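To make the splitting criteria concrete, here is a minimal sketch of entropy, Gini index, and MSE as impurity measures; the function names and signatures are illustrative, not the ones required by the assignment code. Information gain is then the parent's impurity minus the size-weighted impurity of the children.

```python
import numpy as np

def entropy(y):
    """H(y) = -sum_k p_k * log2(p_k), with p_k the class proportions.
    Assumes labels are non-negative integers."""
    p = np.bincount(y) / len(y)
    p = p[p > 0]                   # treat 0 * log(0) as 0
    return -np.sum(p * np.log2(p))

def gini_index(y):
    """G(y) = 1 - sum_k p_k^2 over the class proportions."""
    p = np.bincount(y) / len(y)
    return 1.0 - np.sum(p ** 2)

def mse(y):
    """Variance of y; the impurity measure for real-valued outputs."""
    return np.mean((y - np.mean(y)) ** 2)

def information_gain(y, left_mask, criterion):
    """Parent impurity minus size-weighted impurity of the two children."""
    left, right = y[left_mask], y[~left_mask]
    w = len(left) / len(y)
    return criterion(y) - (w * criterion(left) + (1 - w) * criterion(right))
```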
You should be editing the following files:

- `metrics.py`: Complete the performance metrics functions in this file.
- `usage.py`: Run this file to check your solutions.
- `tree/` (directory): Module for the decision tree.
  - `base.py`: Complete the decision tree class.
  - `utils.py`: Complete all utility functions.
  - `__init__.py`: Do not edit this.

You should run `usage.py` to check your solutions.
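For orientation, here is a minimal sketch of the kind of per-class metrics `metrics.py` asks for; the signatures below are illustrative, not the required ones:

```python
import numpy as np

def accuracy(y_hat, y):
    """Fraction of predictions that match the true labels."""
    return np.mean(np.asarray(y_hat) == np.asarray(y))

def precision(y_hat, y, cls):
    """Of all predictions of `cls`, the fraction that are correct."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    predicted = y_hat == cls
    return np.mean(y[predicted] == cls) if predicted.any() else 0.0

def recall(y_hat, y, cls):
    """Of all true instances of `cls`, the fraction that are recovered."""
    y_hat, y = np.asarray(y_hat), np.asarray(y)
    actual = y == cls
    return np.mean(y_hat[actual] == cls) if actual.any() else 0.0
```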
Generate your dataset using the following lines of code:

```python
from sklearn.datasets import make_classification
import matplotlib.pyplot as plt

X, y = make_classification(
    n_features=2, n_redundant=0, n_informative=2,
    random_state=1, n_clusters_per_class=2, class_sep=0.5)

# For plotting
plt.scatter(X[:, 0], X[:, 1], c=y)
plt.show()
```
a) Show the usage of your decision tree on the above dataset. The first 70% of the data should be used for training purposes and the remaining 30% for test purposes. Show the accuracy, per-class precision and recall of the decision tree you implemented on the test dataset. [0.5 mark]
b) Use 5-fold cross-validation on the dataset. Using nested cross-validation, find the optimum depth of the tree (see the sketch below). [1 mark]
You should be editing `classification-exp.py` for the code containing the above experiments.
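A minimal sketch of the nested cross-validation in part (b), assuming X and y come from the make_classification snippet above and that your DecisionTree class exposes fit/predict and a max_depth parameter (all assumptions about your implementation):

```python
import numpy as np
from sklearn.model_selection import KFold

outer = KFold(n_splits=5, shuffle=True, random_state=1)
for fold, (train_idx, test_idx) in enumerate(outer.split(X)):
    inner = KFold(n_splits=5, shuffle=True, random_state=1)
    depth_scores = {}
    for depth in range(1, 9):                     # candidate depths (illustrative)
        scores = []
        for tr, val in inner.split(train_idx):    # split the outer-train indices
            tree = DecisionTree(max_depth=depth)  # your tree/base.py class
            tree.fit(X[train_idx[tr]], y[train_idx[tr]])
            y_hat = tree.predict(X[train_idx[val]])
            scores.append(np.mean(np.asarray(y_hat) == y[train_idx[val]]))
        depth_scores[depth] = np.mean(scores)
    best_depth = max(depth_scores, key=depth_scores.get)
    # Retrain at best_depth on train_idx and evaluate on test_idx (omitted).
    print(f"outer fold {fold}: best depth = {best_depth}")
```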
a) Show the usage of your decision tree for the automotive efficiency problem. [0.5 marks]
b) Compare the performance of your model with the decision tree module from scikit-learn (see the sketch below). [0.5 marks]
You should be editing `auto-efficiency.py` for the code containing the above experiments.
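For reference, a minimal sketch of the scikit-learn side of the comparison; the data source, column handling, 70/30 split, and max_depth=5 are illustrative assumptions:

```python
import pandas as pd
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

# Assumed source: the UCI auto-mpg data file ('?' marks missing horsepower;
# the tab-separated car-name column is discarded via comment="\t").
url = ("https://archive.ics.uci.edu/ml/machine-learning-databases/"
       "auto-mpg/auto-mpg.data")
cols = ["mpg", "cylinders", "displacement", "horsepower", "weight",
        "acceleration", "model_year", "origin"]
df = pd.read_csv(url, names=cols, na_values="?", comment="\t",
                 sep=" ", skipinitialspace=True).dropna()

X, y = df.drop(columns=["mpg"]), df["mpg"]
split = int(0.7 * len(df))                       # first 70% train, rest test
sk_tree = DecisionTreeRegressor(max_depth=5).fit(X.iloc[:split], y.iloc[:split])
rmse = mean_squared_error(y.iloc[split:], sk_tree.predict(X.iloc[split:])) ** 0.5
print(f"sklearn RMSE: {rmse:.2f}")  # report your tree's RMSE alongside this
```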
Create some fake data to run experiments on the runtime complexity of your decision tree algorithm. Create a dataset with N samples and M binary features. Vary M and N to plot the time taken for: 1) learning the tree, 2) predicting for test data. How do these results compare with the theoretical time complexity for decision tree creation and prediction? You should do the comparison for all four cases of decision trees (see the sketch below). [1 mark]
You should be editing `experiments.py` for the code containing the above experiments.
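A minimal sketch of the timing loop for one of the four cases (discrete features, discrete output); the N/M ranges and the DecisionTree constructor arguments are illustrative assumptions about your implementation:

```python
import time
import numpy as np

for N in [100, 500, 1000, 5000]:
    for M in [5, 10, 20, 40]:
        X = np.random.randint(0, 2, size=(N, M))   # N samples, M binary features
        y = np.random.randint(0, 2, size=N)        # discrete output case

        t0 = time.perf_counter()
        tree = DecisionTree(max_depth=5)           # your tree/base.py class
        tree.fit(X, y)
        fit_time = time.perf_counter() - t0

        t0 = time.perf_counter()
        tree.predict(X)
        predict_time = time.perf_counter() - t0

        print(f"N={N:5d} M={M:3d}  fit={fit_time:.4f}s  predict={predict_time:.4f}s")
```

Averaging several repetitions per (N, M) pair gives smoother curves; repeat the loop for the other three feature/output combinations.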
You must answer the subjective questions (visualization, timing analysis, displaying plots) by creating `Asst#<task-name>_<Q#>.md`.