To train artificial intelligence (AI) systems and algorithms, machine learning (ML) scientists need data. They collect many data sets and use them to build AI systems based on human behavior and on people's interactions with the technology they use in everyday life.
Whether the data covers a range of liver diagnoses and diseases, cannabis consumer survey results, or actively or passively collected speech samples, AI systems always need training data to ensure that their algorithms produce the right results. Such data is often difficult to obtain. And even when ML scientists have obtained a data set, how can they be sure that it meets the requirements of the AI system?
3 types of machine learning
There are basically three types of ML techniques for building AI systems:
- Supervised learning: in this approach, scientists provide the algorithm with a set of labeled data, such as text, numbers, or images, and then calibrate the algorithm to recognize a specific set of inputs as a specific thing. For example, imagine that you give an algorithm a set of dog images, each of which has a set of features that correspond to the characteristics of dogs. The input may also include a series of images that are not dogs, such as photos of cats, pigeons, polar bears, pickaxes, or snow shovels, along with the corresponding features of these non-dog images. Then, when you show the algorithm a photo of a dog that it has never seen before, it can tell, based on what it has learned about the features and properties of the photos, that the photo is in fact a picture of a dog. The algorithm succeeds if it correctly identifies dog images as dogs and rejects images that are not dogs.
- Unsupervised learning: this approach attempts to find classes of similar objects in a data set based on the properties of each object. Once scientists give the algorithm a set of input data with specific parameters and values, the algorithm tries to find common features and group the objects accordingly. For example, scientists can feed the algorithm thousands of flower pictures with different tags, such as color, stem length, or preferred soil. The algorithm succeeds if it can group all flowers of the same type together.
- Reinforcement learning: this approach trains the algorithm through a series of positive and negative feedback loops. In classic laboratory studies, behavioral psychologists used this kind of feedback loop to train pigeons. Reinforcement learning is also how many pet owners train their pets to follow simple commands, such as sit or stay, and then reward or reprimand them. In machine learning, scientists show the algorithm a series of images. As the algorithm classifies the images, for example as penguins, the scientists confirm the model when it correctly identifies a penguin and adjust it when it gets the answer wrong. If you have heard about Twitter bots going awry, this is usually an example of reinforcement learning in which the bot has learned to identify examples incorrectly, even though the system believes it is correct.
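To make the first approach in the list above (learning from labeled examples) more concrete, here is a minimal sketch in Python. It is not taken from any particular ML library: the two numeric features, the sample values, and the nearest-centroid rule are all assumptions chosen for brevity. Real image classifiers work on far richer features.

```python
from math import dist
from statistics import mean

# Hypothetical labeled training data: (feature vector, label).
# The two features are invented for illustration only.
TRAINING = [
    ((0.9, 0.8), "dog"),
    ((0.8, 0.9), "dog"),
    ((0.7, 0.7), "dog"),
    ((0.2, 0.1), "not_dog"),
    ((0.1, 0.3), "not_dog"),
    ((0.3, 0.2), "not_dog"),
]


def train(examples):
    """Compute one centroid (mean feature vector) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*vectors))
        for label, vectors in by_label.items()
    }


def classify(centroids, features):
    """Return the label whose centroid is closest to the new example."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


centroids = train(TRAINING)
# An example that does not appear in the training data.
prediction = classify(centroids, (0.85, 0.75))
print(prediction)
```

The quality of `prediction` depends entirely on how accurate and representative the labels in `TRAINING` are.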
Although all ML techniques are useful and applicable in a variety of contexts, the remainder of this article will focus on the role of supervised learning in user experience.
All data is not equal
Getting good training data is the Achilles' heel of many ML scientists. Where do you get this kind of data? There are many sources that give you access to thousands of free data sets. Google has recently introduced a search tool that makes it easy to find publicly available data sets for ML applications. It should be noted, however, that many of these data sets are very esoteric, for example "leading anti-aging brands in the United States, 2018 turnover." Nevertheless, data sets are becoming more available.
However, many data sets relevant to ML applications have the following limitations:
- They may not have exactly what ML researchers are looking for — for example, older people crossing the street.
- They may not be annotated with the relevant or meaningful metadata required for ML use.
- Other ML researchers may already have used them over and over again.
- They may not be very clean. For example, they may have many missing values, they may not be representative of the population, or they may not contain enough examples.
As many researchers say, not all data is equal. The built-in assumptions and the context associated with a data set are often overlooked. If scientists do not pay enough attention to the hygiene of a data set before feeding it to an ML system, the AI may never learn or, worse, may learn incorrectly, as described above. In cases where data quality is suspect, it is difficult to determine whether what the algorithm has learned is real or accurate. That is a big risk.
Given what we know about machine learning and the risks and limitations of data sets, how can we reduce this risk? The answer involves user experience.
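Some of the hygiene problems listed above (missing values, unbalanced populations, too few examples) can be checked mechanically before training. The following Python sketch shows one way to do that. The field names, the sample records, and the `min_per_label` threshold are hypothetical, and passing such a check does not prove that a data set is fit for purpose.

```python
from collections import Counter

# Hypothetical records; None marks a missing value.
RECORDS = [
    {"age_group": "18-34", "clip_seconds": 12.0, "label": "crossing"},
    {"age_group": "18-34", "clip_seconds": None, "label": "crossing"},
    {"age_group": "35-64", "clip_seconds": 9.5, "label": "not_crossing"},
    {"age_group": "18-34", "clip_seconds": 11.0, "label": None},
    {"age_group": "65+", "clip_seconds": 14.2, "label": "crossing"},
]


def audit(records, group_field, label_field, min_per_label):
    """Summarize missing values, group shares, and under-filled labels."""
    fields = sorted({key for record in records for key in record})
    # Count missing values per field.
    missing = {
        field: sum(1 for record in records if record.get(field) is None)
        for field in fields
    }
    # Share of records per demographic group, to compare with the target population.
    groups = Counter(
        r[group_field] for r in records if r.get(group_field) is not None
    )
    group_share = {group: count / len(records) for group, count in groups.items()}
    # Labels that have fewer examples than the agreed minimum.
    labels = Counter(
        r[label_field] for r in records if r.get(label_field) is not None
    )
    too_few = sorted(label for label, n in labels.items() if n < min_per_label)
    return {
        "missing": missing,
        "group_share": group_share,
        "labels_below_minimum": too_few,
    }


report = audit(RECORDS, "age_group", "label", min_per_label=2)
print(report)
```

A report like this only flags symptoms. Deciding whether the group shares are acceptable still requires knowing the population the AI system is meant to serve.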
User experience and Machine Learning
Although not all data sets concern human behavior, the vast majority do. It is therefore essential to understand the behavior that the data captures. Over the past decade, several companies have asked us to collect the accurate examples and attribute tags needed to train or test AI algorithms (in some cases, thousands of them). Here are some examples of the data sets we've worked with:
- Video examples of people doing indoor and outdoor activities
- Speech and text samples from doctors and nurses who perform clinical inquiries
- Video examples that record the presence or absence of people in a room
- Examples of video and audio of people approaching the front door
- Thumb-print samples from certain populations
Please note that none of this data was publicly available. We had to capture every required data set through targeted research with specific research intentions and goals. At first glance, the sheer volume of data that must be captured for ML applications seems to call for non-UX techniques.
For many scientists and researchers, the simple answer to this challenge is to use quantitative data collection techniques. However, the clients who commissioned our projects had noticed a serious drawback of these methods: low data integrity. Our project sponsors realized that the underlying data must be accurate. This is especially true when the nuances of recorded human experience need careful consideration. We had to capture and observe behavior in context, not just ask for a number on a five-point scale, as is often the case with quantitative data collection.
Capturing behavior is the province of user experience, and it requires rigorous research and formal protocols. We have learned that user experience is extremely well suited to collecting and coding these data elements, thanks to our research methods and our expertise in understanding and coding human behavior.
User Experience Measures Behavior
To measure behavior, follow these steps:
- Identify the goal. To set up the conditions for capturing user behavior, you first need to understand what the ML researchers really need. What is the goal? What is a good sample? What variation between cases is acceptable? What are the core cases and what are the edge cases? For example, if we want 10,000 photos of people smiling, is there an objective definition of a smile? Does a crooked smile count? With teeth, without teeth? Which age groups of subjects? Facial hair or clean-shaven? Different hairstyles? And so on. Both the cases to include and the cases to exclude are elements that ML researchers must define clearly and that all parties must agree on.
- Collect data. Next, plan the data collection. One of the strengths of UX researchers is the ability to design and run large-scale research programs involving people. Many ML researchers lack the training and experience to collect large amounts of behavioral data in person, effectively and efficiently. User research practice, by contrast, is primarily about establishing the conditions required to obtain objective data. The ability to recruit participants, secure facilities, obtain informed consent, instruct participants, and collect, store, and transfer data is crucial. In addition, UX researchers can collect all required metadata and attach it to the examples for additional context. UX researchers are skilled in sorting, collecting, and categorizing data, as shown by a skill set that includes qualitative coding and by the many tools that support this kind of analysis.
- Make another round of tagging. After the initial data collection, you may need to organize and run a crowdsourcing program, for example through Amazon Mechanical Turk, to further refine the collected data. For example, when collecting voice samples of how a person in a noisy cafe might order a skinny, extra-hot, triple-shot latte, there may be several features of interest in any given sample. In such cases, we may employ several researchers or coders to review each sample, transcribe it, and evaluate it for clarity and completeness. These coders would then have to resolve all observed differences to ensure clean coding.
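The last step above, in which several coders tag the same samples and then resolve their differences, can be supported with a small script. The sketch below is an assumption about how one might do this, not a description of our actual tooling. It lists the samples on which two coders disagree and computes Cohen's kappa, a standard chance-corrected agreement statistic that the text does not itself prescribe. The sample IDs and tags are hypothetical.

```python
from collections import Counter

# Hypothetical tags assigned by two coders to the same voice samples.
# Both dictionaries are assumed to cover exactly the same sample IDs.
CODER_A = {"s1": "clear", "s2": "clear", "s3": "unclear", "s4": "clear"}
CODER_B = {"s1": "clear", "s2": "unclear", "s3": "unclear", "s4": "clear"}


def find_disagreements(coder_a, coder_b):
    """Return the sample IDs on which the two coders assigned different tags."""
    return sorted(sid for sid in coder_a if coder_a[sid] != coder_b[sid])


def cohens_kappa(coder_a, coder_b):
    """Agreement between two coders, corrected for agreement expected by chance."""
    ids = sorted(coder_a)
    n = len(ids)
    # Proportion of samples on which the coders agree.
    observed = sum(coder_a[sid] == coder_b[sid] for sid in ids) / n
    # Agreement expected if each coder tagged at random with their own tag frequencies.
    freq_a = Counter(coder_a[sid] for sid in ids)
    freq_b = Counter(coder_b[sid] for sid in ids)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # Both coders used a single identical tag throughout.
    return (observed - expected) / (1 - expected)


disagreements = find_disagreements(CODER_A, CODER_B)
kappa = cohens_kappa(CODER_A, CODER_B)
print(disagreements, kappa)
```

The samples in `disagreements` are the ones the coders would need to discuss and resolve. A low `kappa` may suggest that the tagging definitions agreed in the first step need tightening.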
These are just a few of the many reasons why UX researchers are uniquely positioned to bridge the gap between ML scientists and the capture, interpretation, and use of human behavior data sets for integration with AI algorithms. Applying user experience in this domain can help us guard against the limitations of the data sets available for AI and avoid the use of ambiguous, useless, or incorrect data whose limitations may not be obvious, whether these stem from problems with the data itself or from the inner workings of the algorithm. UX researchers are well placed to help ML researchers gather clean data sets for training and testing AI algorithms.