Gathering & Analysing the Data

Thomas Cummings - 17th May 2024

As this is a research project focused on machine learning, defining the data requirements and collecting the data successfully are central to the project's success. The research will use a mixed methods approach, since “the overall purpose and central premise of mixed methods studies is that the use of quantitative and qualitative approaches in combination provides a better understanding of research problems and complex phenomena than either approach alone” (Molina-Azorin, 2016). Specifically, it will follow an exploratory sequential mixed methods strategy, which “involves a first phase of qualitative data collection and analysis, followed by a second phase of quantitative data collection and analysis that builds on the results of the first qualitative phase” (Creswell, 2009).

In this research project, the initial qualitative phase will be an online survey of veterinarians and veterinary nurses, run through Google Forms, to capture professional perceptions of bias in canine body condition scores (BCS). This will support the need for an objective solution identified in the current research literature, and “the rationale for this approach lies in first exploring a topic before deciding what variables need to be measured” (SAGE, 2019). The survey will aim to determine: how confident respondents feel in performing BCS assessments; whether any breeds are more difficult to evaluate; how much relevant training they have received; their perceptions of vet and owner bias; and their visual classification of some example images. Whether the survey is qualitative or quantitative in nature will depend on the style and volume of the questions asked, as “qualitative data analysis involves identifying meaningful quotations, coding them with relevant topics, and possibly developing larger themes” (SAGE, 2019), whereas the codings themselves can be used as quantitative data if there are enough responses to draw conclusions. Although the survey for this research project is yet to be developed, the aim will be to ask open-ended questions of a small number of relevant professionals, producing quotations that give an overview of the topic.

In addition, although BCS is ordinal and therefore mainly quantitative data, it also has a qualitative aspect because of the opinion and bias inherent in manually classifying BCS from photographs in order to label the dataset for training. This is a key ethical issue in machine learning, as the algorithms will reproduce any bias in the manual labelling, which could be a concern for canine health if the project deliverable under- or over-classified BCS. Whilst this research only attempts to determine theoretically whether machine learning algorithms can classify BCS, it will still require a disclaimer advising users to consult a veterinarian before making changes to a dog's diet or exercise. A real-world application could mitigate the bias concerns through more robust labelling of the data, for example by averaging multiple manual classifications, carrying out physical BCS evaluations and using other metrics such as weight and DEXA body scans. This is outside the scope of this research project, and the visual classification method will be sufficient for determining the theoretical feasibility of a broader system.
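The idea of combining several manual classifications into one label can be illustrated with a short sketch. It is not part of this project's scope; it assumes each photograph has been scored independently by several raters on a 9-point scale, and it uses the low median as the "average" because BCS is ordinal. The function name, the dog identifiers and the scores are all hypothetical.

```python
from statistics import median_low


def consensus_bcs(scores):
    """Combine several raters' BCS scores for one dog into a single label.

    The low median is used so the result is always one of the scores
    actually given, which suits ordinal data better than a mean.
    """
    if not scores:
        raise ValueError("at least one score is required")
    return median_low(scores)


# Hypothetical ratings: dog identifier -> scores from independent raters
# (a 9-point BCS scale is assumed here).
ratings = {
    "dog_001": [5, 6, 5],
    "dog_002": [7, 8, 8, 6],
}

labels = {dog: consensus_bcs(scores) for dog, scores in ratings.items()}
print(labels)
```

A mean could be substituted if a continuous target were wanted, but it would produce values that do not correspond to any category on the scale.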

The main image data for classification will be collected from participants who voluntarily complete the submission form on the project website. The form will require a lateral and a dorsal photograph, together with answers to a few questions (age, breed, sex and neutered status) to help ensure a good distribution of data. Guidance on how to take the photographs will be provided on the website, and the research project will be advertised in selected social media dog groups as well as by leafleting dog owners.

Once the data has been received, it will need to be pre-processed before it is used to train the models. This will include removing unusable data (e.g. images that don’t match the requirements) and cleaning the data (e.g. breed spelling mistakes), followed by statistical analysis to check that the data is evenly distributed across the classes. If it is not, this may require adjusting the project scope, actively sampling from underrepresented classes or using techniques to balance the class sizes. Finally, image manipulation techniques will be required to normalise the data for use in the models, such as ensuring uniform image sizes and possibly grayscaling or similar manipulations. At this point the machine learning modelling and training will be completed and, as this is a quantitative dominant research project, this will make up the majority of the analysis while still “recognising that some qualitative aspects are usefully studied” (Walliman, 2017).
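Two of the pre-processing steps described above, correcting breed spelling mistakes and checking the class balance, could be sketched as follows. This is only an illustration using the Python standard library: the list of known breeds, the similarity cutoff and the example labels are assumptions, and inverse-frequency class weighting is just one of several possible balancing techniques.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical list of accepted breed names; the real list would come
# from the options used on the submission form.
KNOWN_BREEDS = ["labrador retriever", "golden retriever", "border collie"]


def clean_breed(raw, known=KNOWN_BREEDS, cutoff=0.8):
    """Map a free-text breed entry to the closest known breed, or None."""
    name = " ".join(raw.lower().split())  # lower-case and collapse spaces
    match = get_close_matches(name, known, n=1, cutoff=cutoff)
    return match[0] if match else None


def class_weights(labels):
    """Inverse-frequency weights so that rarer classes count for more."""
    counts = Counter(labels)
    total = len(labels)
    n_classes = len(counts)
    return {cls: total / (n_classes * n) for cls, n in counts.items()}


print(clean_breed("Labradour  Retreiver"))
print(class_weights([4, 4, 4, 5, 5, 6]))
```

Entries that return None would need manual review rather than being silently dropped, since a low similarity score may simply mean the breed is missing from the known list.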

Two separate methods for the quantitative analysis will be considered. Firstly, an object detection algorithm will be trained to locate and extract the contour of the dog from each image. This will allow points such as those for the abdomen and thorax to be selected automatically and measurements to be taken between their x- and y-coordinates. These measurements can then be used to calculate various numerical ratios, before further statistical analysis is completed to determine the principal components, allowing the most effective predictive model to be developed.
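As a rough illustration of turning contour coordinates into a ratio, the sketch below measures the vertical depth of a lateral silhouette at two positions along the body and divides one by the other. It assumes the contour is already available as a list of (x, y) pixel coordinates and that the thorax and abdomen lie at fixed fractions of the body length; those fractions, the function names and the toy contour are hypothetical, and the project may select its landmark points quite differently.

```python
def depth_at(contour, x_target, tol=1.0):
    """Vertical extent (max y minus min y) of contour points near a column."""
    ys = [y for x, y in contour if abs(x - x_target) <= tol]
    if len(ys) < 2:
        raise ValueError(f"no contour points near x={x_target}")
    return max(ys) - min(ys)


def abdomen_thorax_ratio(contour, thorax_frac=0.35, abdomen_frac=0.70):
    """Ratio of abdominal depth to thoracic depth on a lateral contour.

    thorax_frac and abdomen_frac are assumed positions along the body
    length, measured from the left-most contour point.
    """
    xs = [x for x, _ in contour]
    x_min, x_max = min(xs), max(xs)
    length = x_max - x_min
    thorax = depth_at(contour, x_min + thorax_frac * length)
    abdomen = depth_at(contour, x_min + abdomen_frac * length)
    return abdomen / thorax


# Toy contour: a flat top line and a belly line that tucks up halfway along.
top = [(x, 0) for x in range(101)]
belly = [(x, 40 if x < 50 else 30) for x in range(101)]
ratio = abdomen_thorax_ratio(top + belly)
print(ratio)
```

Several ratios of this kind, taken from both the lateral and dorsal photographs, would form the numerical features passed on to the later statistical analysis.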

Secondly, a deep learning neural network will be trained on the pixel values of the Labrador Retriever images, learning characteristics of the data and updating the model weights through backpropagation to produce a classifier. Training will need to be repeated methodically in order to optimise the hyperparameters and the general network architecture.
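One methodical way to repeat training over different hyperparameters is an exhaustive grid search, sketched below. This is only one possible approach and is not confirmed as the method the project will use. The evaluate function is a stand-in for "train the network with these settings and return a validation score"; here it is a toy formula so that the example runs, and the hyperparameter names and values are assumptions.

```python
from itertools import product


def grid_search(evaluate, grid):
    """Try every combination in grid and return the best (params, score)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_evaluate(params):
    """Placeholder score; a real version would train and validate a model."""
    return (-abs(params["learning_rate"] - 0.01)
            - abs(params["batch_size"] - 32) / 1000)


# Hypothetical search space.
grid = {
    "learning_rate": [0.1, 0.01, 0.001],
    "batch_size": [16, 32, 64],
}

best_params, best_score = grid_search(toy_evaluate, grid)
print(best_params)
```

Because each combination means a full training run, the grid has to stay small; random search or a staged search over one hyperparameter at a time are common alternatives when it does not.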

Finally, the model outputs can be analysed by comparing accuracy metrics for each model and by determining whether a weighted ensemble method would improve the overall accuracy. Once this has been determined for estimating Labrador Retriever BCS, further testing will be completed using the data from other breeds in order to determine the extent of generalisability. Further information on evaluating the models and research project can be found here.
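The comparison between individual models and a weighted ensemble could look something like the sketch below. It assumes that both models output a row of class probabilities per image, which the project has not confirmed, and all of the numbers are invented purely to show the mechanics.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def weighted_ensemble(prob_a, prob_b, weight_a=0.5):
    """Blend two models' class probabilities and return predicted classes.

    weight_a is the weight given to model A; model B gets 1 - weight_a.
    """
    preds = []
    for row_a, row_b in zip(prob_a, prob_b):
        blended = [weight_a * a + (1 - weight_a) * b
                   for a, b in zip(row_a, row_b)]
        preds.append(max(range(len(blended)), key=blended.__getitem__))
    return preds


# Invented example: four images, three BCS classes (indices 0, 1, 2).
y_true = [0, 1, 2, 1]
ratio_model = [[0.6, 0.3, 0.1], [0.5, 0.4, 0.1],
               [0.1, 0.2, 0.7], [0.2, 0.7, 0.1]]
cnn_model = [[0.3, 0.5, 0.2], [0.1, 0.8, 0.1],
             [0.2, 0.3, 0.5], [0.3, 0.6, 0.1]]

for w in (1.0, 0.0, 0.5):  # model A only, model B only, equal blend
    preds = weighted_ensemble(ratio_model, cnn_model, weight_a=w)
    print(w, accuracy(y_true, preds))
```

In practice the weight would be chosen on a validation set rather than the test set, and accuracy would be reported alongside metrics that respect the ordinal nature of BCS.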

References

Creswell, J.W. (2009). Research Design: Qualitative, Quantitative and Mixed Methods Approaches, 3rd edn. London: Sage Publications Ltd. Available from https://www.ucg.ac.me/skladiste/blog_609332/objava_105202/fajlovi/Creswell.pdf

Molina-Azorin, J.F. (2016). ‘Mixed methods research: an opportunity to improve our studies and our research skills’, European Journal of Management and Business Economics, 25(2), pp. 37-38. http://dx.doi.org/10.1016/j.redeen.2016.05.001

SAGE (2019). ‘Learn to use an exploratory sequential mixed method design for instrument development’, Sage Publications Ltd. Available from https://parsmodir.com/wp-content/uploads/2020/10/exploratory-method.pdf

Walliman, N. (2017). Research Methods: the Basics, 2nd edn. London: Taylor & Francis Group.
