- Augmented Reality (AR)
Augmented reality (AR) is a live, direct or indirect view of a real-world physical environment whose elements are augmented (or supplemented) by computer-generated sensory input such as sound, video, graphics, or GPS data. It is related to a more general concept called mediated reality, in which a computer modifies a view of reality. Various technologies are used to render augmented reality, including optical projection systems, monitors, wearable devices, and display systems worn on the body. In each case, the technology works by enhancing a person's current perception of reality.
Big Tech’s race toward augmented reality (AR) becomes more competitive every day. This month, Meta launched the latest version of its headset, the Quest 3. Early next year, Apple plans to release its first headset, the Vision Pro. Advertisements for each platform emphasize games and entertainment that merge virtual and physical: a digital board game placed on a coffee table, a movie screen projected on the seats of an airplane.
However, some researchers are more curious about other uses of AR. The Makeability Lab at the University of Washington is applying these budding technologies to help people with disabilities. This month, lab researchers will present multiple projects that implement AR (via headsets and phone apps) to make the world more accessible.
Researchers at the lab will debut RASSAR, an app that can scan homes to highlight accessibility and safety issues, on Oct. 23 at the ASSETS ’23 conference in New York.
Shortly after, on Oct. 30, other teams from the lab will present early research at the UIST ’23 conference in San Francisco. One app helps headsets better understand natural language, and the other aims to make tennis and other ball sports accessible to users with low vision.
UW News spoke with the lead authors of the three studies, Xia Su and Jae (Jaewook) Lee, both UW doctoral students in the Paul G. Allen School of Computer Science and Engineering, about their work and the future of AR for accessibility.
What is AR and how is it typically used today?
Jae Lee: I think a commonly accepted answer is that you wear a headset or use your phone to overlay virtual objects on a physical environment. Many people probably know augmented reality from “Pokémon Go,” where you superimpose these Pokémon onto the physical world. Now Apple and Meta are introducing “mixed reality,” or passthrough AR, which further blends the physical and virtual worlds through cameras.
Xia Su: Something I’ve also noticed lately is that people are trying to expand the definition beyond glasses and phone screens. There could be AR audio, which manipulates your hearing, or devices that try to manipulate your smell or touch.
Many people associate AR with virtual reality and get caught up in the discussion about the metaverse and gaming.
How does it apply to accessibility?
JL: AR as a concept has been around for several decades. But in Jon Froehlich’s lab, we combine AR with accessibility research. A headset or phone may be able to tell how many people are in front of us, for example. For people who are blind or have low vision, that information could be critical to the way they perceive the world.
XS: There are actually two different routes to AR accessibility research. The most common is trying to make AR devices more accessible to people. The other, less common approach is to ask: How can we use AR or VR as tools to improve the accessibility of the real world? That’s what we’re focused on.
JL: As AR glasses become less bulky and cheaper, and as AI and computer vision advance, this research will become increasingly important. But widespread AR, even when it is used for accessibility, raises many questions. How do you address the privacy of bystanders? We, as a society, understand that vision technology can be beneficial for blind and low-vision people. But we may also not want to include facial recognition technology in apps for privacy reasons, even if it helps someone recognize their friends.
Let’s talk about the papers you have coming out.
First, can you explain your RASSAR application?
XS: It is an app that people can use to scan their indoor spaces and detect possible accessibility and safety problems in homes. It’s possible because some iPhones now have lidar (light detection and ranging) scanners that measure the depth of a space, so we can reconstruct the space in 3D. We combine this with computer vision models to highlight ways to improve safety and accessibility. To use it, someone (perhaps a parent childproofing a home, or a caregiver) scans a room with their smartphone and RASSAR detects accessibility issues. For example, if a desk is too high, a red button will appear on it. If the user taps the button, they get more information about why the height of that desk is an accessibility issue, along with possible fixes.
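The desk example above amounts to comparing a measured height against a limit. A minimal Python sketch of that kind of check follows. It is not RASSAR's code: the `DetectedObject` type, the `MAX_HEIGHT_M` table, and every numeric limit are hypothetical values chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class DetectedObject:
    label: str       # object class reported by the (assumed) vision model
    height_m: float  # height above the floor, e.g. from the lidar scan


# Hypothetical limits for illustration only; not RASSAR's actual rules.
MAX_HEIGHT_M = {"desk": 0.86, "light switch": 1.22}


def find_issues(objects):
    """Return a message for every object that exceeds its height limit."""
    issues = []
    for obj in objects:
        limit = MAX_HEIGHT_M.get(obj.label)
        if limit is not None and obj.height_m > limit:
            issues.append(
                f"{obj.label} is {obj.height_m:.2f} m high; limit is {limit:.2f} m"
            )
    return issues


scan = [DetectedObject("desk", 0.95), DetectedObject("light switch", 1.10)]
issues = find_issues(scan)
print(issues)
```

Here only the desk is flagged, because it is the only object above its assumed limit. Each returned message could back one of the red buttons described above.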
JL: Ten years ago, it would have been necessary to review 60 pages of PDF files to fully check a home’s accessibility. We bring that information together in an application.
And this is something that anyone will be able to download to their phones and use?
XS: That’s the ultimate goal. We already have a demo. This version relies on lidar, which is currently available only on certain iPhone models. But if you have such a device, it is very simple to use.
JL: This is an example of how advances in hardware and software allow us to create applications quickly. Apple announced RoomPlan, which creates a 3D floor plan of a room, when they added the lidar sensor. We use it in RASSAR to understand the overall layout. Being able to build on that allows us to create a prototype very quickly.
So RASSAR is almost deployable now. The other areas of research you present are in earlier stages of development.
Can you tell me about GazePointAR?
JL: It’s an app deployed on an AR headset that lets people speak more naturally with voice assistants like Siri or Alexa. There are all these pronouns we use when we speak that are difficult for computers to understand without visual context. I can ask, “Where did you buy it?” But what is “it”? A voice assistant has no idea what I’m talking about. With GazePointAR, the glasses observe the environment around the user and the app tracks the user’s gaze and hand movements. Then the model tries to make sense of all these inputs: the spoken words, the hand movements, the user’s gaze. Finally, using a large language model, GPT, it tries to answer the question.
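The pronoun problem Lee describes can be sketched as a substitution step that runs before a question reaches a language model. The Python below is an illustrative assumption, not GazePointAR's actual pipeline: `resolve_query`, its parameters, and the rule of preferring a pointed-at object over a gazed-at one are all hypothetical.

```python
import re

# Pronouns this sketch tries to resolve; a real system would handle more.
PRONOUNS = ("this", "that", "it")


def resolve_query(query, gaze_target=None, pointed_target=None):
    """Replace the first ambiguous pronoun with the object being referenced.

    Assumption: a pointed-at object is a stronger cue than a gazed-at one.
    """
    referent = pointed_target or gaze_target
    if referent is None:
        return query  # no visual context, so leave the question unchanged
    pattern = r"\b(" + "|".join(PRONOUNS) + r")\b"
    return re.sub(pattern, f"the {referent}", query, count=1, flags=re.IGNORECASE)


prompt = resolve_query("Where did you buy it?", gaze_target="blue mug")
print(prompt)
```

The rewritten question could then be passed to a language model. Questions with several pronouns ("this or this?") would need gaze data over time, which this sketch does not attempt.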
How do you detect what the movements are?
JL: We are using a headset called HoloLens 2, developed by Microsoft. It has a gaze tracker that watches your eyes and tries to guess what you’re looking at. It also has hand-tracking capabilities. In a paper we wrote based on this work, we noted that many problems remain. For example, people don’t use just one pronoun at a time, but several. We will say, “Which is more expensive, this or this?” To answer that, we need data over time. But again, you may run into privacy issues if you want to track someone’s gaze or field of view over time: What information are you storing, and where is it stored? As technology improves, we certainly need to stay alert to these privacy concerns, especially in computer vision.
This is difficult even for humans, right? I can ask, “Can you explain that?” while pointing at several equations on a whiteboard, and you won’t know which one I mean.
What applications do you see for this?
JL: Being able to use natural language would be important. But if you extend this to accessibility, there’s a chance that a blind or low-vision person could use this to get a description of what’s around them. The question “Is there something dangerous in front of me?” is also ambiguous for a voice assistant. But with GazePointAR, ideally, the system could say, “There may be dangerous objects, such as knives and scissors.” Or people with low vision could make out a shape, point to it, and then ask the system what “it” is more specifically.
And finally, you are working on a system called ARTennis. What is it and what prompted this research?
JL: This is even further into the future than GazePointAR. ARTennis is a prototype that uses an AR headset to make tennis balls stand out more for players with low vision. The ball in play is marked by a red dot and has a crosshair of green arrows around it. Professor Jon Froehlich has a family member who wants to play sports with his children but does not have the residual vision necessary to do so. We thought that if it worked for tennis, it would also work for many other sports, since tennis has a small ball that appears to shrink as it moves away. If we can track a tennis ball in real time, we can do the same with a larger, slower basketball.
One of the co-authors of the paper has low vision, plays squash a lot, and wanted to try this application and give us his feedback. We did a lot of brainstorming sessions with him and he tested the system. The red dot and green crosshair are the design he came up with to improve the sense of depth perception.
What’s stopping this from being something people can use right away?
JL: Well, like GazePointAR, it’s based on a HoloLens 2 headset that costs $3,500. So that’s a different kind of accessibility issue. It also runs at about 25 frames per second, and for humans to perceive it as real time it needs to be about 30 frames per second. Sometimes we can’t keep up with the speed of the tennis ball. We will expand the paper and include basketball to see whether people prefer different designs for different sports. Without a doubt, the technology will get faster. So our question is: What will be the best design for the people who use it?
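To put the frame-rate gap in concrete terms, the two figures Lee cites can be converted into a per-frame time budget. This is plain arithmetic on those numbers, not a measurement of ARTennis, and the helper name `frame_time_ms` is our own.

```python
def frame_time_ms(fps):
    """Time available to process one frame, in milliseconds."""
    return 1000.0 / fps


current = frame_time_ms(25)  # roughly the rate cited for the prototype
target = frame_time_ms(30)   # the rate cited as needed for real time
savings_needed = current - target

print(f"current: {current:.1f} ms, target: {target:.1f} ms, "
      f"to trim: {savings_needed:.1f} ms per frame")
```

In other words, the per-frame processing time would have to drop from 40 ms to about 33.3 ms, a cut of roughly 6.7 ms.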