This project detects multiple objects in augmented reality and generates labels on top of them using Core ML.
Getting Started with ARKit
Augmented reality (AR) describes user experiences that add 2D or 3D elements to the live view from a device’s camera in a way that makes those elements appear to inhabit the real world. ARKit combines device motion tracking, camera scene capture, advanced scene processing, and display conveniences to simplify the task of building an AR experience. You can create many kinds of AR experiences with these technologies using the front or rear camera of an iOS device.
Overview
Step 1:
In Xcode, create a new project and select “Augmented Reality App” from the application templates. Then click “Next”.
Step 2:
Enter your desired project name and make sure you’re signed in with an Apple ID so you can select yourself for “Team”. This will be used later for testing your app on a connected iPhone. For “Content Technology”, make sure SceneKit is selected, then click “Next”.
Step 3:
Double-check that you are listed as having a “Signing Certificate” under “Signing & Capabilities” after selecting the project in the sidebar.
This is what your base code should look like for ViewController.swift:
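The Xcode “Augmented Reality App” (SceneKit) template generates roughly the following ViewController.swift; the exact contents can vary slightly between Xcode versions:

```swift
import UIKit
import SceneKit
import ARKit

class ViewController: UIViewController, ARSCNViewDelegate {

    @IBOutlet var sceneView: ARSCNView!

    override func viewDidLoad() {
        super.viewDidLoad()

        // Set the view's delegate
        sceneView.delegate = self

        // Show statistics such as fps and timing information
        sceneView.showsStatistics = true

        // Create the template's sample scene and set it on the view
        let scene = SCNScene(named: "art.scnassets/ship.scn")!
        sceneView.scene = scene
    }

    override func viewWillAppear(_ animated: Bool) {
        super.viewWillAppear(animated)

        // Create a session configuration and run the view's session
        let configuration = ARWorldTrackingConfiguration()
        sceneView.session.run(configuration)
    }

    override func viewWillDisappear(_ animated: Bool) {
        super.viewWillDisappear(animated)

        // Pause the view's session
        sceneView.session.pause()
    }
}
```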
Step 4:
Add “import Vision” to the imports at the top of ViewController.swift. This framework is used to identify objects in the camera feed so they can be labeled.
The Vision framework performs face and face landmark detection, text detection, barcode recognition, image registration, and general feature tracking. Vision also allows the use of custom Core ML models for tasks like classification or object detection.
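After this step, the imports at the top of ViewController.swift should look something like this:

```swift
import UIKit
import SceneKit
import ARKit
import Vision   // runs the Core ML model against camera frames
```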
Step 5:
Download the Resnet50 Core ML model and drag it into the root folder of your project. Then add the setup for the Resnet50.mlmodel and the Vision request handler, for example as shown below.
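A minimal sketch of that setup. The property names visionRequests and resnetModel are illustrative choices (they are reused in later steps), and Resnet50 is the class Xcode generates automatically from the Resnet50.mlmodel file you added:

```swift
// Array of Vision requests that will be performed on each tapped frame
private var visionRequests = [VNRequest]()

// Wrap the generated Resnet50 Core ML model in a Vision model
// (add `import CoreML` if MLModelConfiguration is not found)
private let resnetModel: VNCoreMLModel = {
    do {
        let coreMLModel = try Resnet50(configuration: MLModelConfiguration()).model
        return try VNCoreMLModel(for: coreMLModel)
    } catch {
        fatalError("Failed to load the Resnet50 model: \(error)")
    }
}()
```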
Step 6:
Add registerGestureRecognizers() to the viewDidLoad() function. This will be used to place labels when the user taps the screen. Also be sure to remove the default parameter from SCNScene() so the template’s sample scene isn’t loaded, and set showsStatistics to false.
Be sure to enable plane detection within the viewWillAppear() function before running the view’s session.
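A sketch of what viewDidLoad() and viewWillAppear(_:) might look like after these changes; the choice of horizontal plane detection is an assumption:

```swift
override func viewDidLoad() {
    super.viewDidLoad()
    sceneView.delegate = self
    sceneView.showsStatistics = false   // statistics turned off
    sceneView.scene = SCNScene()        // empty scene: default parameter removed
    registerGestureRecognizers()        // tap handling for placing labels (Step 7)
}

override func viewWillAppear(_ animated: Bool) {
    super.viewWillAppear(animated)
    let configuration = ARWorldTrackingConfiguration()
    configuration.planeDetection = .horizontal   // enable plane detection
    sceneView.session.run(configuration)
}
```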
Step 7:
Add a private registerGestureRecognizers() function to the ViewController class.
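A minimal sketch of that function; the selector screenTap(recognizer:) is the handler added in Step 8:

```swift
private func registerGestureRecognizers() {
    // Forward taps on the AR scene view to the screenTap(recognizer:) handler
    let tapGesture = UITapGestureRecognizer(target: self,
                                            action: #selector(screenTap(recognizer:)))
    sceneView.addGestureRecognizer(tapGesture)
}
```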
Step 8:
The screenTap() function is called whenever the user taps on the screen.
It records the touch location (centered on the tapped object) and grabs the current frame from the AR session, i.e. what is currently in view of the camera.
A hit test against the scene’s detected feature points is performed at the touch location; if the results are empty the function returns early, otherwise the first result is kept for later use.
A Core Video “pixelBuffer” is an image buffer that holds pixels in main memory.
The frame’s pixel buffer is required as the input for the Vision request.
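A sketch of the tap handler under those assumptions; the stored hitTestResult property is reused when placing the label in Step 9, and performVisionRequest(pixelBuffer:) is added in Step 10:

```swift
// Most recent hit-test result, kept so displayPredictions() can place the label
private var hitTestResult: ARHitTestResult?

@objc private func screenTap(recognizer: UITapGestureRecognizer) {
    guard let sceneView = recognizer.view as? ARSCNView,
          let currentFrame = sceneView.session.currentFrame else { return }

    // Touch location on screen, centered on the tapped object
    let touchLocation = recognizer.location(in: sceneView)

    // Hit test against detected feature points; return early if nothing was hit
    let hitTestResults = sceneView.hitTest(touchLocation, types: .featurePoint)
    guard let hitTestResult = hitTestResults.first else { return }
    self.hitTestResult = hitTestResult

    // The frame's Core Video pixel buffer is the input for the Vision request
    let pixelBuffer = currentFrame.capturedImage
    performVisionRequest(pixelBuffer: pixelBuffer)
}
```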
Step 9:
Add the displayPredictions() function.
It generates a node at the point the label refers to, using the text passed in from the prediction.
The stored hit-test result contains everything related to the position where the user tapped; its coordinates are used to place a small sphere, and the prediction text is placed above that sphere anchor.
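A sketch of displayPredictions(text:); the sphere radius, text size, and vertical offset are illustrative values:

```swift
private func displayPredictions(text: String) {
    guard let hitTestResult = self.hitTestResult else { return }
    let transform = hitTestResult.worldTransform

    // Small sphere marking the tapped point in the real world
    let sphere = SCNSphere(radius: 0.01)
    sphere.firstMaterial?.diffuse.contents = UIColor.orange
    let sphereNode = SCNNode(geometry: sphere)
    sphereNode.position = SCNVector3(transform.columns.3.x,
                                     transform.columns.3.y,
                                     transform.columns.3.z)

    // Prediction text floating just above the sphere
    let textGeometry = SCNText(string: text, extrusionDepth: 0)
    textGeometry.firstMaterial?.diffuse.contents = UIColor.white
    textGeometry.font = UIFont.systemFont(ofSize: 1.0)
    let textNode = SCNNode(geometry: textGeometry)
    textNode.scale = SCNVector3(0.01, 0.01, 0.01)
    textNode.position = SCNVector3(sphereNode.position.x,
                                   sphereNode.position.y + 0.02,
                                   sphereNode.position.z)

    sceneView.scene.rootNode.addChildNode(sphereNode)
    sceneView.scene.rootNode.addChildNode(textNode)
}
```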
Step 10:
Add performVisionRequest(). This function is where the label identification happens.
It creates a Vision model from the Core ML model and a request that runs through that model, handling errors with an early return when the results are nil.
The request produces an observation of what Core ML believes the item is, along with a confidence rating.
It then takes the center of the current frame, crops the image so it contains the touched object, feeds the cropped image to Core ML, and attempts to identify the item.
Finally, it populates the vision requests array (an array of VNRequest) and performs the requests on the frame’s pixel buffer.
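A sketch of performVisionRequest(pixelBuffer:), assuming the resnetModel and visionRequests properties from Step 5 and the displayPredictions(text:) helper from Step 9:

```swift
private func performVisionRequest(pixelBuffer: CVPixelBuffer) {
    // Build a request that runs through the ResNet-50 Vision model
    let visionModel = self.resnetModel
    let request = VNCoreMLRequest(model: visionModel) { [weak self] request, error in
        // Handle errors or missing results with an early return
        guard error == nil, let self = self,
              let observations = request.results as? [VNClassificationObservation],
              let topObservation = observations.first else { return }

        // The top observation is what Core ML believes the item is, with its confidence
        let prediction = "\(topObservation.identifier) (\(Int(topObservation.confidence * 100))%)"
        DispatchQueue.main.async {
            self.displayPredictions(text: prediction)
        }
    }

    // Crop to the center of the frame so the touched object fills the model's input
    request.imageCropAndScaleOption = .centerCrop
    self.visionRequests = [request]

    // Perform the request array against the camera frame's pixel buffer
    let imageRequestHandler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
    DispatchQueue.global().async {
        try? imageRequestHandler.perform(self.visionRequests)
    }
}
```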