Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2018511095
Abstract: Techniques are described for headlessly completing application tasks in the background of a digital personal assistant. The method comprises receiving audio input via a microphone. Natural language processing may be performed on the speech input to determine a user voice command. The user voice command may include a request to perform a task of an application. The application can execute the task as a background process without the application's user interface appearing. A response can be provided to the user through the digital personal assistant's user interface, based on a received state associated with the task, such that the response comes from within the context of the digital personal assistant's user interface without revealing the application's user interface.
Headless task completion in the digital personal assistant
[0001]
As computing technology advances, increasingly powerful computing devices become available. For example, computing devices are increasingly adding features such as speech recognition. Speech can be an effective way for users to communicate with computing devices, and voice-controlled applications, such as voice-controlled digital personal assistants, have been developed.
[0002]
Digital personal assistants may be used to perform tasks or services for an individual. For example, the digital personal assistant may be a software module running on a mobile device or desktop computer. Examples of tasks or services performed by the digital personal assistant include retrieving weather conditions and forecasts, sports scores, traffic directions and conditions, local and/or national news, and stock quotes; managing the user's schedule by creating new schedule entries and reminding the user of upcoming events; and storing and retrieving reminders.
[0003]
However, the digital personal assistant likely cannot perform every task that the user may want to perform. Thus, there is ample opportunity for improvement in the technology associated with voice-controlled digital personal assistants.
[0004]
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the Detailed Description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
[0005]
Techniques and tools are described for headlessly completing application tasks in the background of a digital personal assistant. For example, a method is performed by a computing device comprising a microphone. The method may comprise receiving, by a voice-controlled digital personal assistant, digital voice input generated by a user. The digital voice input may be received via the microphone. Natural language processing may be performed using the digital voice input to determine a user voice command. The user voice command may comprise a request to perform a predefined function of a third-party voice-enabled application. The predefined function can be identified using a data structure that defines the functions supported by the available third-party voice-enabled applications that use voice input. The third-party voice-enabled application may be caused to perform the predefined function as a background process without the user interface of the third-party voice-enabled application appearing on the display of the computing device. A response may be received from the third-party voice-enabled application indicating a state associated with the predefined function. The user interface of the voice-controlled digital personal assistant can provide a response to the user based on the received state associated with the predefined function, such that the response comes from within the context of the voice-controlled digital personal assistant's user interface without revealing the user interface of the third-party voice-enabled application.
[0006]
As another example, a computing device may be provided that includes a processing unit, memory, and one or more microphones to perform the operations described herein. For example, a method performed by the computing device may include receiving, via the one or more microphones, speech input generated by a user. Speech recognition may be performed using the speech input to determine a spoken command. The spoken command can comprise a request to perform a task of a third-party application. The task may be identified using a data structure that defines the tasks of the third-party application that can be invoked by spoken command. It can be determined whether the task of the third-party application can be performed headlessly. The third-party application may be run as a background process that performs the task headlessly when it is determined that the task is headlessly executable. A response may be received from the third-party application indicating a status associated with the task. The user interface of the voice-controlled digital personal assistant can provide a response to the user based on the received state associated with the task, such that the response comes from within the context of the voice-controlled digital personal assistant's user interface without revealing the user interface of the third-party application.
[0007]
As another example, a computing device may be provided that includes a processing unit and memory to perform the operations described herein. For example, the computing device can perform operations for completing a task of a voice-enabled application within the context of a voice-controlled digital personal assistant. The operations may comprise receiving, at the voice-controlled digital personal assistant, digital voice input generated by a user. The digital voice input may be received via a microphone. Natural language processing may be performed using the digital voice input to determine a user voice command. The user voice command may comprise a request to perform a task of the voice-enabled application. The task may be identified using an extensible data structure that maps user voice commands to tasks of the voice-enabled application. It can be determined whether the task of the voice-enabled application is a foreground task or a background task. When the task is determined to be a background task, the voice-enabled application can be caused to execute the task as a background task within the context of the voice-controlled digital personal assistant, without the user interface of the voice-enabled application being exposed. A response may be received from the voice-enabled application. The response can indicate a state associated with the task. The response may be provided to the user based on the received state associated with the task. The response may be provided within the context of the voice-controlled digital personal assistant, without the user interface of the voice-enabled application being exposed, when the task is determined to be a background task.
[0008]
As described herein, various other features and advantages may be incorporated into the
techniques as desired.
[0009]
FIG. 1 illustrates an example of a system for headlessly completing application tasks in the
background of a digital personal assistant.
[0010]
FIG. 2 illustrates an exemplary software architecture for headlessly completing application tasks in the background of a digital personal assistant.
[0011]
FIG. 3 is an illustration of an example state machine for an application interfacing with a digital personal assistant.
[0012]
FIG. 4 is an example of a command definition that may be used to create a data structure to enable an interface between an application and a digital personal assistant.
[0013]
FIG. 5 is an exemplary sequence diagram illustrating the communication of multiple threads used to perform a task of an application headlessly from within a digital personal assistant.
[0014]
FIG. 6 is a flowchart of an exemplary method for headlessly completing application tasks in the background of a digital personal assistant.
[0015]
FIG. 7 is a flowchart of an example method for determining whether to warm up an application
while the user is speaking to a digital personal assistant.
[0016]
FIG. 8 is a diagram of an exemplary computing system in which some described embodiments may be implemented.
[0017]
FIG. 9 is an illustration of an example mobile device that can be used with the techniques described herein.
[0018]
FIG. 10 is an illustration of an example cloud-supported environment that may be used in conjunction with the techniques described herein.
[0019]
Overview. As users become more comfortable with digital personal assistants, users may prefer to perform more actions within the context of the digital personal assistant. However, providers of digital personal assistants cannot anticipate or develop every application that a user may want to use. Thus, it may be desirable for the digital personal assistant to be able to invoke or launch third-party applications created by entities other than the provider of the digital personal assistant.
[0020]
In a typical solution, the user interface of the application is exposed when the digital personal assistant launches the application and program control passes from the digital personal assistant to the application. Once the application's user interface is displayed, the user can check the status of the request and can perform additional tasks from within the application. To return to the digital personal assistant's user interface, the user must exit the application before control can be returned to the digital personal assistant.
[0021]
As one particular example of using a digital personal assistant on a cell phone, the user may request that a video be added to the user's queue using a video application installed on the cell phone. For example, the user can say "Movie application, add Movie-X to my queue" to the digital personal assistant's user interface. After the command is uttered and recognized by the assistant, the assistant can start the video application, which will present the video application's user interface. The video may be added to the user's queue, and the queue may be presented to the user as confirmation that the video has been added. The user can continue to use the video application, or can close the video application to return to the user interface of the digital personal assistant.
[0022]
When the digital personal assistant transfers control to the application, loading the application and its user interface into memory can take a considerable amount of time. The delay can potentially affect user productivity, for example by delaying the user's completion of a subsequent task and/or interrupting the user's train of thought. For example, the user's attention may be diverted to closing the application before returning to the digital personal assistant's user interface. Furthermore, by transferring control to the application, context information available to the digital personal assistant may not be available to the application. For example, the digital personal assistant may know the identity and contact information of the user's spouse, the location of the user's home or office, or the location of the user's day-care provider, but the application may not have access to that contextual information.
[0023]
In the techniques and solutions described herein, the digital personal assistant can determine whether a task of a third-party application can be performed in the background, so that the actions to perform the task occur within the context of the digital personal assistant without the user interface of the voice-enabled application being exposed. Thus, the user can experience a given set of tasks as being performed within the context of the digital personal assistant, as opposed to the context of the application performing the task. Furthermore, when the application's tasks are performed in the background, the device's power consumption can potentially be reduced (and battery life extended) because the application's user interface is not loaded into memory.
[0024]
Applications can be registered with the digital personal assistant to extend the list of native capabilities offered by the assistant. The application may be installed on the device or may be called as a service over a network (such as the Internet). A schema definition may allow the application to register voice commands whose requested tasks are headlessly activated when the user requests those commands/tasks. For example, the application may include a voice command definition (VCD) file accessible by the digital personal assistant, where the VCD file identifies the tasks that may be launched headlessly. The definition can specify that a task of the application is always headlessly activated, or that the task is headlessly activated only under certain circumstances. For example, the application can choose to perform a task headlessly when the user issues the task on a device that does not have a display surface (such as a wireless fitness band), or when the user is connected to a Bluetooth headset and operating in hands-free mode.
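For illustration only, the following sketch shows how an assistant might read a simplified VCD-style definition and decide whether a command can be launched headlessly. The XML layout loosely follows the command definition 400 described below with reference to FIG. 4; the function names and exact file contents are assumptions, not taken from the patent.

    # Minimal sketch: reading a VCD-style command definition and checking
    # whether a command may be launched headlessly. The "ActivationType"
    # attribute mirrors command definition 400 (FIG. 4); everything else
    # is illustrative.
    import xml.etree.ElementTree as ET

    VCD_XML = """
    <VoiceCommands>
      <AppName>MovieAppService</AppName>
      <Command Name="Add" ActivationType="background">
        <ListenFor>[please] add {movieName} to my queue</ListenFor>
      </Command>
    </VoiceCommands>
    """

    def is_headless(vcd_xml, command_name):
        root = ET.fromstring(vcd_xml)
        for command in root.iter("Command"):
            if command.get("Name") == command_name:
                # Default activation type is foreground when omitted.
                return command.get("ActivationType", "foreground") == "background"
        raise KeyError(command_name)

    print(is_headless(VCD_XML, "Add"))  # True: run as a background process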
[0025]
The application can provide responses indicating the progress, failure, or successful completion of the requested task, and status-related output can be provided through the digital personal assistant's user interface. Applications can return many different types of data to the digital personal assistant, including, for example, display text, text that can be read aloud, deep links back into the application, links to web pages or websites, and hypertext markup language (HTML) based web content. Data from the application may be presented through the assistant's user interface as if coming from the assistant's native functionality.
[0026]
If the user provides the application a request that may have multiple meanings or results, the application can provide the digital personal assistant with a list of choices, and the assistant's user interface can be used to disambiguate between the choices. When the user provides the application a request that may be destructive or important (e.g., when a banking application is asked to perform a balance transfer), the assistant's confirmation interface may be used to confirm the request before completing the destructive or critical task.
[0027]
The application can be speculatively loaded, or warmed up, while the command is being spoken. For example, when the user completes the phrase "movie application" from the command "movie application, add Movie-X to my queue", memory may be allocated, and various subroutines of the installed movie application may be retrieved from storage and loaded into the allocated memory, ready for use when the command is completed. When the application is a web service, warm-up may include, for example, establishing a communication session and obtaining user-specific information from a database at the remote server. By warming up the application, the time to respond to the user can potentially be shortened, so that the interaction feels more natural and the user can move quickly to the next task and be more productive.
[0028]
Using the techniques herein, a user desiring to add a video to the user's queue with the video application can have a different experience than with the typical solution of launching the video application and passing control to the application. In this example, the video application's command to add a video to the queue may be defined as headless in a command data structure, such as a VCD file. The video application can be warmed up when the user says "Video application" from the command "Video application, add Movie-X to my queue", so that the response time to the user can be reduced. When the command is complete, the video may be added to the user's queue using the video application, but without bringing up the video application's user interface. The video can be added to the user's queue, and the digital personal assistant can confirm (using the assistant's user interface) that the video has been added. The user can experience a quick response time and can perform fewer steps to complete the task (e.g., the video application does not have to be closed).
[0029]
Exemplary System Including a Digital Personal Assistant. FIG. 1 is a system diagram illustrating an example of a system 100 for headlessly completing a task 112 of a voice-enabled application 110 in the background of a digital personal assistant 120. The voice-enabled application 110 and the digital personal assistant 120 may be software modules installed on a computing device 130. The computing device 130 may be, for example, a desktop computer, laptop, cell phone, smartphone, wearable device (such as a watch or wireless electronic band), or tablet computer. The computing device 130 may include a command data structure 140 for identifying applications and application tasks that may be launched by the digital personal assistant 120. An application can be activated by the digital personal assistant 120 in the foreground (where the application's user interface appears when the application is launched) and/or in the background (where the application's user interface does not appear when the application is launched). For example, some tasks of an application may be launched in the foreground while different tasks of the same application are launched in the background. The command data structure 140 can define how the application and/or its tasks should be launched from the digital personal assistant 120.
[0030]
The computing device 130 may include a microphone 150 for converting sound into an electrical signal. The microphone 150 may be a dynamic, condenser, or piezoelectric microphone using electromagnetic induction, a change in capacitance, or piezoelectricity, respectively, to generate the electrical signal from air-pressure vibrations. The microphone 150 can include an amplifier, one or more analog or digital filters, and/or an analog-to-digital converter to produce digital voice input. The digital voice input can include a reproduction of the user's voice, such as when the user instructs the digital personal assistant 120 to accomplish a task. The computing device 130 may also include a touch screen or keyboard (not shown) to allow the user to enter text input.
[0031]
Digital voice input and/or text input may be processed by a natural language processing module 122 of the digital personal assistant 120. For example, the natural language processing module 122 can receive the digital voice input and translate the words uttered by the user into text. The extracted text can be semantically analyzed to determine a user voice command. By analyzing the digital voice input and taking action in response to spoken commands, the digital personal assistant 120 can be controlled by voice. For example, the digital personal assistant 120 can compare the extracted text to a list of potential user commands to determine which command most likely matches the user's intent. The match may be based on statistical or probabilistic methods, decision trees or other rules, other suitable matching criteria, or combinations thereof. The potential user commands may be native commands of the digital personal assistant 120 and/or commands defined in the command data structure 140. Thus, by defining commands in the command data structure 140, the scope of tasks that the digital personal assistant 120 can perform on behalf of the user can be extended. The potential commands may include performing the task 112 of the voice-enabled application 110, which may be defined as a headless or background task in the command data structure 140.
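The patent does not prescribe a particular matching algorithm. As one hedged illustration, a simple word-overlap scorer over the known commands might look like this; the scoring function and threshold are assumptions.

    # Illustrative intent matching: score each known command against the
    # recognized text and pick the best match above a confidence threshold.
    def score(utterance, command_phrase):
        u = set(utterance.lower().split())
        c = set(command_phrase.lower().split())
        return len(u & c) / len(c) if c else 0.0

    def match_command(utterance, commands, threshold=0.6):
        # commands: mapping of command id -> example phrase, e.g. drawn from
        # the command data structure 140 plus the assistant's native commands.
        best_id, best_score = None, 0.0
        for cmd_id, phrase in commands.items():
            s = score(utterance, phrase)
            if s > best_score:
                best_id, best_score = cmd_id, s
        return best_id if best_score >= threshold else None

    commands = {"MovieAppService.Add": "add movie to my queue",
                "Weather.Forecast": "what is the weather forecast"}
    print(match_command("add Movie-X to my queue", commands))
    # -> MovieAppService.Add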
[0032]
The natural language processing module 122 may generate a stream of text while the utterance is being processed, so that intermediate strings of text can be analyzed before the user's utterance is complete. Thus, if the user begins a command with the name of an application, the application may be identified early in the speech, and the application may be warmed up before the user completes the command. Warming up the application can include fetching the application's instructions from relatively slow non-volatile memory (such as a hard disk drive or flash memory) and storing the instructions in relatively fast volatile memory (such as main memory or cache memory).
[0033]
When the digital personal assistant 120 determines that the command is associated with a task of an application, the task may be performed. If the digital personal assistant 120 determines that the application's task is to be performed as a background process (e.g., by analyzing the definitions in the command data structure 140), the application may be run in the background. An application such as the voice-enabled application 110 may communicate with the digital personal assistant 120. For example, the application may sequence through a set of states related to task completion, and the application's state may be communicated to the digital personal assistant 120. For example, the application may start in an "initial" state, transition to a "progress" state while the task is being performed, and then transition to a "final" state when the task is completed.
[0034]
The digital personal assistant 120 can report on the progress of the task via a user interface 124. The user interface 124 can communicate information to the user in various ways, such as presenting text, graphics, or hyperlinks on a display of the computing device 130, generating audio output from a speaker of the computing device 130, or generating other sensory output such as vibration from an electric motor connected to an off-center weight of the computing device 130. For example, the user interface 124 may cause a spinning wheel to be presented on the display screen of the computing device 130 while the task is in progress. As another example, the user interface 124 can generate simulated speech indicating successful completion of the task when the task is in its final state and has completed successfully. By using the user interface 124 of the digital personal assistant 120 to report the status of the task, the response can come from within the context of the user interface 124 without the application's user interface popping up.
[0035]
It should be noted that the voice-enabled application 110 may be created by the manufacturer of the digital personal assistant 120 or by a third party different from the manufacturer. Interoperation of the digital personal assistant 120 and the voice-enabled application 110 may be achieved by adhering to a software contract between the applications and by defining functions within the command data structure 140. The voice-enabled application 110 may be capable of operating as a stand-alone application and/or as a component of the digital personal assistant 120. As a stand-alone application, the voice-enabled application 110 may be launched outside of the digital personal assistant 120 as a foreground process, for example by tapping or double-clicking an icon associated with the voice-enabled application 110 that is displayed on the display screen of the computing device 130. The voice-enabled application 110 can present a user interface when launched, and the user can interact with that user interface to perform tasks. The interaction may be voice input only, or other input modes such as text input or gestures may be used. The application called by the digital personal assistant 120 may be installed on the computing device 130 or may be a web service.
[0036]
The digital personal assistant 120 can invoke a web service, such as web service 162, running on a remote server computer 160. A web service is a software function provided at a network address over a network, such as network 170. The network 170 can include a local area network (LAN), a wide area network (WAN), the Internet, an intranet, a wired network, a wireless network, a cellular network, combinations thereof, or any network suitable for providing a channel for communication between the computing device 130 and the remote server computer 160. It should be understood that the network topology shown in FIG. 1 is simplified and that multiple networks and networking devices may be utilized to interconnect the various computing systems disclosed herein. The web service 162 may be invoked as part of the kernel or main portion of the digital personal assistant 120. For example, the web service 162 may be called as a subroutine of the natural language processing module 122. Additionally or alternatively, the web service 162 may be an application defined within the command data structure 140 and may be capable of being launched headlessly from the digital personal assistant 120.
[0037]
Exemplary Software Architecture Including a Digital Personal Assistant. FIG. 2 is a diagram illustrating an exemplary software architecture 200 for headlessly completing application tasks in the background of the digital personal assistant 120. When an application task is performed headlessly, the task may be performed in the background, and the application's user interface is not displayed as a result of the task being performed. Rather, the user interface of the digital personal assistant 120 may be used to provide output to and/or receive input from the user, so that the user interacts within the context of the digital personal assistant 120 rather than the context of the application. Thus, tasks of the application that are performed headlessly can run in the background for the duration of the task's execution, and the application's user interface is never exposed. A computing device, such as the computing device 130, may execute software organized according to the architecture 200 for the digital personal assistant 120, an operating system (OS) kernel 210, and an application 230.
[0038]
The OS kernel 210 generally provides an interface between the software and hardware components of the computing device 130. The OS kernel 210 can include components for rendering (e.g., rendering visual output to a display, generating audio output for speakers, and generating vibration output for an electric motor), components for networking, components for process management, components for memory management, components for location tracking, and components for speech recognition and other input processing. The OS kernel 210 can manage user input, output, storage access functions, network communication functions, memory management functions, process management functions, and other functions for the computing device 130. The OS kernel 210 can provide access to such functionality to the digital personal assistant 120 and the application 230 via, for example, various system calls.
[0039]
A user can generate user input (such as voice, tactile, or motion input) to interact with the digital personal assistant 120. The digital personal assistant 120 may be made aware of user input through the OS kernel 210, which may include functionality for composing messages in response to user input. The messages may be used by the digital personal assistant 120 or other software. The user input may include tactile input such as touch-screen input, button presses, or key presses. The OS kernel 210 may include functions for recognizing tactile input to a touch screen, button input, key-press input, finger gestures, and the like. The OS kernel 210 can receive input from the microphone 150 and can include functionality to recognize commands and/or words uttered in the voice input. The OS kernel 210 can receive input from an accelerometer and can include functionality to recognize an orientation or movement, such as a shake.
[0040]
The user interface (UI) input processing engine 222 of the digital personal assistant 120 can wait for user-input event messages from the OS kernel 210. The UI event messages may indicate voice input, a panning gesture, flick gesture, drag gesture, or other gesture on the device's touch screen, a tap on the touch screen, keystroke input, a shake gesture, or another UI event (e.g., directional button or trackball input). The UI input processing engine 222 can translate the UI event messages from the OS kernel 210 into information sent to the control logic 224 of the digital personal assistant 120. For example, the UI input processing engine 222 can include natural language processing functionality and can indicate that a particular application name has been spoken or typed, or that a voice command has been given by the user. Alternatively, the natural language processing functionality may be included in the control logic 224.
[0041]
The control logic 224 can receive information from various modules of the digital personal assistant 120, such as the UI input processing engine 222, a personal information store 226, and the command data structure 140, and can use the received information to make decisions and perform actions. For example, the control logic 224 can decide whether it should execute a task on behalf of the user, such as by parsing the stream of spoken text to determine whether a voice command has been given.
[0042]
The control logic 224 can wait for an entire user command to be voiced before acting on the command, or the control logic 224 can begin acting on the command before it is completed, while the command is still being voiced. For example, the control logic 224 may parse an intermediate string of the spoken command and attempt to match the string to one or more applications defined in the command data structure 140. When the probability that an application will be called exceeds a threshold, the application can be warmed up so that the application can respond to the user more quickly. Multiple applications and/or functions may be speculatively warmed up in anticipation of being called, and an application may be stopped if it is determined that the application will not be called. For example, when a user begins a spoken command with the name of a particular application, that application may be warmed up because there is a high probability that the particular application will be invoked. As another example, a partial command string may be limited to a small set of applications defined in the command data structure 140, and the set of applications can be warmed up in parallel when the partial command string matches. Specifically, the command data structure 140 may include only two applications with a command having the word "take", such as a camera application having a "take a photo" command and a memo application having a "take a note" command. The control logic 224 can initiate warm-up of both the camera application and the note application when the word "take" is recognized, and can then stop the note application when the complete command "take a photo" is recognized. Warming up an application can include allocating memory, prefetching instructions, establishing a communication session, retrieving information from a database, starting a new execution thread, generating an interrupt, or other appropriate application-specific behavior. The control logic 224 can invoke services of the OS kernel 210 during warm-up, for example process management services, memory management services, and network services.
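For illustration, the prefix-driven speculative warm-up described above might be sketched as follows; the warm_up and stop hooks on the application objects are hypothetical.

    # Illustrative speculative warm-up: as each partial transcript arrives,
    # warm up every application whose command is still a plausible match and
    # stop applications that have been ruled out.
    def plausible(partial, command_phrase):
        return command_phrase.lower().startswith(partial.lower().strip())

    def on_partial_transcript(partial, registry, warmed):
        # registry: command phrase -> application object (from structure 140)
        for phrase, app in registry.items():
            if plausible(partial, phrase):
                if app not in warmed:
                    app.warm_up()          # hypothetical hook
                    warmed.add(app)
            elif app in warmed:
                app.stop()                 # ruled out; free its resources
                warmed.discard(app)

    # "take" matches both "take a photo" and "take a note", so both apps warm
    # up in parallel; "take a photo" then rules out (stops) the note app.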
[0043]
The uttered text can include context information, and the control logic 224 can resolve the context information so that the user voice command becomes context-free. The context information may include the current location, the current time, the orientation of the computing device 130, and personal information stored in a personal information store 226. The personal information can include user relationships, such as the names of the user's spouse or children; user-specific locations, such as home, work, school, day-care, or doctor's addresses; information from the user's contact list or calendar; the user's favorite color, restaurant, or mode of transportation; important birthdays, anniversaries, or other dates; and other user-specific information. The user can give a command containing context information, and the control logic 224 can translate the command into a context-free command. For example, the user can give the command "Bus app, tell me the buses to get home within the next hour". In this example, the context information in the command includes the current date and time, the current position, and the location of the user's home.
[0044]
The control logic 224 can obtain the current time from the OS kernel 210, which can maintain or have access to a real-time clock. The control logic 224 may obtain current position data of the computing device 130 from the OS kernel 210, which may obtain the current position data from a local component of the computing device 130. For example, the location data may be determined from Global Positioning System (GPS) data, by triangulation between towers of a cellular network, by reference to the physical locations of nearby Wi-Fi routers, or by another mechanism. The control logic 224 may obtain the user's home location from the personal information store 226. The personal information store 226 may be kept on auxiliary or other non-volatile storage of the computing device 130. Thus, the control logic 224 can receive personal information via the OS kernel 210, which can access storage resources (e.g., the personal information store 226). When the context information can be resolved, the command can be translated into a context-free command. For example, if it is 6:00 p.m. on a Friday, the user is at 444 Main Street, and the user's home is at 128 Pleasant Drive, then the context-free command may be "Bus app, tell me the buses that pass near 444 Main Street and near 128 Pleasant Drive between 6:00 p.m. and 7:00 p.m. on Friday".
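A hedged sketch of this resolution step follows, using the bus-app example. The lookup sources noted in the comments follow the text; the function and field names are illustrative.

    # Illustrative context resolution: gather the context a command needs and
    # emit a context-free query for the application. Lookups are hypothetical.
    from datetime import datetime, timedelta

    def resolve_context(personal_info, current_location, now):
        return {
            "origin": current_location,              # from GPS/Wi-Fi/cell towers
            "destination": personal_info["home"],    # from personal store 226
            "earliest": now,                         # from real-time clock
            "latest": now + timedelta(hours=1),      # "within the next hour"
        }

    now = datetime(2015, 1, 9, 18, 0)                # a Friday, 6:00 p.m.
    query = resolve_context({"home": "128 Pleasant Drive"},
                            "444 Main Street", now)
    # -> buses passing near 444 Main Street and near 128 Pleasant Drive
    #    between 6:00 p.m. and 7:00 p.m. on Friday
    print(query)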
[0045]
The user command may be performed by the control logic 224 (such as when the command is a native command of the digital personal assistant 120), by an application 230 installed on the computing device 130 (such as when the command is associated with the application 230), or by the web service 162 (such as when the command is associated with the web service 162). The command data structure 140 can specify which commands are associated with which application and whether a command is to be executed in the foreground or background. For example, the command data structure 140 may map user voice commands to functions supported by the available third-party voice-enabled applications.
[0046]
The control logic 224 may cause a predefined function 232 of the application 230 to be performed when the control logic 224 determines that the user command is associated with the predefined function 232 of the application 230. If the control logic 224 determines that the predefined function 232 of the application 230 should be performed as a background process, the predefined function 232 can be performed in the background. For example, the control logic 224 can send a request 240 to the predefined function 232 by generating an interrupt (e.g., via a process management component of the OS kernel 210), writing to shared memory, writing to a message queue, passing a message, or starting a new execution thread. The application 230 can execute the predefined function 232 and return a reply 242 to the control logic 224 by generating an interrupt, writing to shared memory, writing to a message queue, or passing a message. The reply may include the status of the application 230 and/or other information responsive to the user command.
[0047]
The control logic 224 may cause the web service 162 to be called when the control logic 224 determines that the command is associated with the web service 162. For example, a request 260 may be sent to the web service 162 via a networking component of the OS kernel 210. The networking component can format the request (e.g., by encapsulating the request in network packets according to the protocol of the network 170) and forward it to the web service 162 via the network 170 so that the user command is executed. The request 260 may involve multiple steps, such as opening a communication channel (e.g., a socket) between the control logic 224 and the web service 162 and sending information related to the user command. The web service 162 may respond to the request 260 with a response that is sent via the network 170 and forwarded by the networking component to the control logic 224 as a reply 262. The response from the web service 162 may include the status of the web service 162 and other information responsive to the user command.
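The patent does not specify a wire protocol. As an assumption-laden sketch, the request 260 / reply 262 exchange could be carried over HTTP using the standard library, with an invented endpoint and message shape:

    # Illustrative request 260 / reply 262 exchange with a web service over
    # HTTP. The URL and message fields are hypothetical.
    import json
    import urllib.request

    def call_web_service(command, data_item):
        payload = json.dumps({"command": command, "item": data_item}).encode()
        req = urllib.request.Request(
            "https://example.com/appservice",    # hypothetical endpoint
            data=payload,
            headers={"Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req, timeout=5) as resp:
            # The reply includes the service's state plus any response data,
            # e.g. {"state": "final", "status": "success", "ttsString": "..."}
            return json.load(resp)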
[0048]
The control logic 224 may generate output to be presented to the user (with the help of a UI output rendering engine 228 and the rendering components of the OS kernel 210) based on the response from the application. For example, the command data structure 140 can map the state received from the function to the response provided by the voice-controlled digital personal assistant 120 to the user. In general, the control logic 224 can provide high-level output commands to the UI output rendering engine 228, which can generate low-level output primitives for the rendering components of the OS kernel 210, including visual output on a display, speech and/or audio output via speakers or headphones, and vibration output from an electric motor. For example, the control logic 224 may send a text-to-speech command with a string of text to the UI output rendering engine 228, which can generate digital audio data simulating speech.
[0049]
The control logic 224 may determine what information to provide to the user based on the state of the application. The state may correspond to initiation, processing, confirmation, disambiguation, or termination of a user command. The command data structure 140 can map the states of the application to the different responses provided to the user. The types of information that may be provided include, for example, display text, simulated speech, deep links back into the application, links to web pages or websites, and hypertext markup language (HTML) based web content.
[0050]
Exemplary Application States. FIG. 3 is a diagram of an exemplary state machine 300 for an application that interfaces with the digital personal assistant 120 in a headless manner. The application can be started in either a warm-up state 310 or an initial state 320. The warm-up state 310 may be entered when the digital personal assistant 120 warms up the application, such as when the application name is known but the spoken command is not yet complete. The application remains in the warm-up state 310 until the warm-up operation is complete. When the warm-up operation is complete, the application may transition to the initial state 320.
[0051]
The initial state 320 can be entered after the warm-up state 310 completes or after a user command has been provided to the application by the digital personal assistant 120. During the initial state 320, the user command is processed by the application. If the command is unambiguous but takes more than a predetermined time (such as 5 seconds) to complete, the state may transition to a progress state 330 while the command is being executed. The state may transition to a confirmation state 340 if the command is unambiguous but may result in a significant or destructive action being performed. If the command is somewhat ambiguous but the ambiguity can be resolved by selecting between several options, the state can transition to a disambiguation state 350. If the command is ambiguous and cannot be clarified using a few options, the state may transition to a final state 360, such as a failure or redirection state. If the command cannot be executed, the state may transition to a final state 360, such as a failure state. If the command can be completed in less than the predetermined amount of time and it is not desired to request confirmation from the user, the state may transition to a final state 360, such as a success state. It should be noted that the final state 360 can be a single state with multiple statuses (such as success, failure, redirection, and timeout) or a group of final states with statuses of success, failure, redirection, and timeout.
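A compact sketch of the state machine 300 and the transitions out of the initial state 320 follows; the Command fields used to classify the command are illustrative stand-ins for the application's own logic.

    # Illustrative encoding of state machine 300 and the transitions out of
    # the initial state 320. The Command fields are hypothetical stand-ins.
    from dataclasses import dataclass, field
    from enum import Enum

    class State(Enum):
        WARM_UP = 310
        INITIAL = 320
        PROGRESS = 330
        CONFIRMATION = 340
        DISAMBIGUATION = 350
        FINAL = 360      # status: success, failure, redirection, or timeout

    @dataclass
    class Command:
        is_ambiguous: bool = False
        options: list = field(default_factory=list)  # candidate interpretations
        is_destructive: bool = False
        estimated_seconds: float = 0.0

    def from_initial(cmd):
        if cmd.is_ambiguous:
            # A few options can be offered to the user; otherwise give up.
            return State.DISAMBIGUATION if cmd.options else State.FINAL
        if cmd.is_destructive:
            return State.CONFIRMATION      # confirm before acting
        if cmd.estimated_seconds > 5:
            return State.PROGRESS          # long-running: report progress
        return State.FINAL                 # quick and safe: complete directly

    print(from_initial(Command(is_destructive=True)))  # State.CONFIRMATION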
[0052]
The progress state 330 may indicate that the action of the user command is being performed or attempted. Information may be provided to the user during the progress state 330 by the application sending text-to-speech (TTS) strings or graphical user interface (GUI) strings to the digital personal assistant 120, so that the information can be presented using the digital personal assistant 120's user interface. Additionally or alternatively, default information (such as a spinning wheel, an hourglass, and/or a cancel button) may be presented to the user during the progress state 330 using the interface of the digital personal assistant 120.
[0053]
While in the progress state 330, the application can monitor the progress of the operation and determine whether the application can remain in the progress state 330 or should transition to the final state 360. In one embodiment, the application may start a timer (e.g., 5 seconds), and if the application does not make sufficient progress before the timer expires, the state can transition to a final state 360, such as a timeout state. If the application is making sufficient progress, the timer may be restarted and the progress checked again when the next timer expires. The application may have a maximum time limit for staying in the progress state 330, and if the maximum time limit is exceeded, the state may transition to a final state 360, such as a timeout state. When the actions associated with the user command complete (successfully or unsuccessfully), the state may transition to the appropriate final state 360. The user can terminate the application while it is in the progress state 330 by giving a command to the user interface of the digital personal assistant 120. For example, the user can press or click a "cancel" or "back" button on the display, or say "cancel". Canceling the command can cause the digital personal assistant 120 to stop the application and display the home screen of the digital personal assistant 120 or exit.
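For illustration, the progress timer could be implemented as a simple watchdog; the task hooks are hypothetical, while the restartable 5-second timer and the maximum time limit follow the text.

    # Illustrative watchdog for the progress state 330: restart a short timer
    # on each sign of progress; time out on a stall or an overall limit.
    # The task object's hooks (progress, done, succeeded) are hypothetical.
    import time

    def monitor_progress(task, interval=5.0, max_total=60.0):
        start = time.monotonic()
        last = task.progress()
        deadline = start + interval
        while not task.done():
            now = time.monotonic()
            if now - start > max_total:
                return "timeout"            # maximum time limit exceeded
            if now >= deadline:
                current = task.progress()
                if current == last:
                    return "timeout"        # no progress before timer expired
                last = current
                deadline = now + interval   # sufficient progress: restart timer
            time.sleep(0.1)
        return "success" if task.succeeded() else "failure"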
[0054]
The confirmation state 340 can indicate that the application is waiting for confirmation from the user before completing a task. When the digital personal assistant 120 detects that the application is in the confirmation state 340, a yes/no prompt may be presented to the user using the digital personal assistant 120's user interface. The application can provide the digital personal assistant 120 with a TTS string posing a question having a yes or no answer. The digital personal assistant 120 can speak the application's provided TTS string and can listen for a yes/no answer. If the user's response does not resolve to a yes or no answer, the digital personal assistant 120 can continue to ask the user the question a predefined number of times (such as three times). If all attempts are exhausted, the digital personal assistant 120 can say "Sorry, I do not know. Tap below to select an answer." and can stop listening. If the user taps yes or no, the digital personal assistant 120 can send the user's selection to the application. If the user taps the microphone icon, the digital personal assistant 120 can again attempt to recognize a spoken answer (e.g., by resetting a counter that counts the number of spoken-answer attempts). The digital personal assistant 120 can loop until there is a match, or until the user cancels or hits the back button on the display screen. If the application receives an affirmative acknowledgment from the digital personal assistant 120, the application can attempt to complete the task. If the task completes successfully, the state can transition to the final state 360 with a success status. If the task fails to complete successfully or the application is canceled, the state can transition to the final state 360 with a failure status. If the task takes more than the predetermined time to complete, the state may transition to the progress state 330 while the task is being performed.
[0055]
The disambiguation state 350 may indicate that the application is waiting for the user to disambiguate between a limited number of options (such as ten or fewer) before completing a task. The application can provide the digital personal assistant 120 with a TTS string, a GUI string, and/or a list of items for the user to select from. The list of items may be provided via a template with one or more pieces of information for each item, such as a title, description, and/or icon. The digital personal assistant 120 can present the list of items to the user using the information provided by the application. The digital personal assistant 120 can prompt for and listen for a selection from the user. The user can select from the list using flexible or inflexible selection. Inflexible selection means that the user can select from the list in only one way; flexible selection means that the user can select from the list in several different ways. For example, the user may select from the list based on the enumerated numerical order of the items, such as by saying "first" or "second" to select the first or second item, respectively. As another example, the user may select from the list based on spatial relationships between the items, such as "the top item," "the bottom item," "the right item," or "the second item from the bottom." As another example, the user can select from the list by saying the title of the item.
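A hedged sketch of flexible selection follows, resolving a spoken answer by ordinal, spatial reference, or title; the phrase lists are illustrative.

    # Illustrative flexible selection for the disambiguation state 350:
    # accept ordinals ("first"), spatial references ("top", "bottom"),
    # or item titles.
    ORDINALS = {"first": 0, "second": 1, "third": 2, "fourth": 3, "fifth": 4}

    def resolve_selection(answer, items):
        a = answer.lower()
        for word, idx in ORDINALS.items():
            if word in a and idx < len(items):
                return idx
        if "top" in a:
            return 0
        if "bottom" in a or "last" in a:
            return len(items) - 1
        # Check longer titles first so "Movie-X III" is not matched as
        # its prefix "Movie-X I".
        for idx, title in sorted(enumerate(items), key=lambda p: -len(p[1])):
            if title.lower() in a:
                return idx
        return None  # unresolved: the assistant re-prompts the user

    items = ["Movie-X I", "Movie-X II", "Movie-X III"]
    print(resolve_selection("the second one", items))      # 1
    print(resolve_selection("Movie-X III please", items))  # 2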
[0056]
As a specific example of disambiguation, the user can say to the digital personal assistant 120, "Movie application, add Movie-X to my queue". However, there may be three versions of Movie-X, such as the original and two sequels: Movie-X I, Movie-X II, and Movie-X III. In response to the spoken command, the digital personal assistant 120 can launch the movie application in the background with a command to add Movie-X to the queue. The movie application can search for Movie-X and determine that three versions exist. Thus, the movie application can transition to the disambiguation state 350 and can send the three alternative options to the digital personal assistant 120. The digital personal assistant 120 can present the three choices to the user via the user interface, and the user can select one from the list. When a proper selection is made by the user, the digital personal assistant 120 can send the response to the movie application, and the correct movie can be added to the queue.
[0057]
If the user's response cannot be resolved to an item on the list, the digital personal assistant 120 can continue to ask the user the question a predefined number of times. If all attempts are exhausted, the digital personal assistant 120 can say "Sorry, I do not know. Tap below to select an answer." and can stop listening. If the user taps one of the items on the displayed list, the digital personal assistant 120 can send the user's selection to the application. If the user taps the microphone icon, the digital personal assistant 120 can again attempt to recognize a spoken answer (e.g., by resetting a counter that counts the number of spoken-answer attempts). The digital personal assistant 120 can loop until there is a match, or until the user cancels or hits the back button on the display screen. If the application receives a valid response from the digital personal assistant 120, the application can attempt to complete the task. If the task requires user confirmation before taking action, the state can transition to the confirmation state 340. If the task completes successfully, the state can transition to the final state 360 with a success status. If the task fails to complete successfully or the application is canceled, the state can transition to the final state 360 with a failure status. If the task takes more than the predetermined time to complete, the state may transition to the progress state 330 while the task is being performed.
[0058]
The example state machine 300 can be extended with additional or alternative states to enable various multi-turn conversations between the user and the application. Disambiguation (via the disambiguation state 350) and confirmation (via the confirmation state 340) are specific examples of multi-turn conversation. In general, in a multi-turn conversation, the headless application can request additional information from the user without revealing its user interface. Rather, the information is obtained from the user by the digital personal assistant 120 on behalf of the application. Thus, the digital personal assistant 120 can act as a conduit between the user and the application.
[0059]
The final state 360 can indicate that the application successfully completed the task, failed to complete the task, timed out, or that the application should be launched in the foreground (redirection). As described above, the final state 360 may be a single state with multiple statuses (e.g., success, failure, redirection, and timeout) or a group of final states (e.g., success, failure, redirection, and timeout states). The application may provide the digital personal assistant 120 with a TTS string, a GUI string, a list of items (provided via a template), and/or launch parameters. The digital personal assistant 120 can use its user interface to present the information provided by the application to the user. Additionally or alternatively, the digital personal assistant 120 can present predefined or boilerplate responses associated with different situations. For example, if a timeout occurs or the task fails, the digital personal assistant 120 can say "Sorry. I could not do it for you. Could you try again later?". As another example, if the application is requesting redirection, the digital personal assistant 120 can say "Sorry, <appName> is not responding.", and the digital personal assistant 120 can try to launch the application in the foreground using the original voice command and the launch parameters (if launch parameters were provided by the application). As another example, if the application successfully completes the task, the digital personal assistant 120 can say "I did it for you.".
[0060]
Exemplary Command Definitions. FIG. 4 is an example of a command definition 400 conforming to a schema that may be used to create data structures, such as the command data structure 140, to enable the interface between third-party applications and the digital personal assistant 120. The command definition 400 may be written in various languages, such as Extensible Markup Language (XML) or a subset of XML defined by a schema. For example, a schema can define the structure of command definitions, such as the legal elements, the hierarchy of elements, the legal and optional attributes of each element, and other suitable criteria. The command definition 400 may be used by the digital personal assistant 120 to help parse the user's speech into components such as an application, a command or task, and data items or slots, where the data items are optional. For example, the command "MovieAppService, add MovieX to my queue" may be parsed into an application ("MovieAppService"), a command ("Add"), and a data item ("MovieX"). The command definition 400 can include elements for defining an application name, the application's tasks or commands, alternative wordings for natural language processing, and responses associated with the different application states.
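To illustrate this parse, a minimal sketch might split the utterance into application, command, and data items as follows; the registry contents mirror the example, and the matching is deliberately simplistic.

    # Illustrative parse of an utterance into application, command, and data
    # items (slots), following the structure of command definition 400.
    import re

    # app name -> {command name -> ListenFor-style phrase with a {slot}}
    REGISTRY = {"MovieAppService": {"Add": "add {movieName} to my queue"}}

    def phrase_to_regex(phrase):
        # The phrase is assumed to contain no regex metacharacters.
        return "^" + re.sub(r"\{(\w+)\}", r"(?P<\1>.+)", phrase) + "$"

    def parse_utterance(utterance):
        app, _, rest = utterance.partition(",")
        app, rest = app.strip(), rest.strip()
        for cmd, phrase in REGISTRY.get(app, {}).items():
            m = re.match(phrase_to_regex(phrase), rest, re.IGNORECASE)
            if m:
                return app, cmd, m.groupdict()
        return None

    print(parse_utterance("MovieAppService, add MovieX to my queue"))
    # -> ('MovieAppService', 'Add', {'movieName': 'MovieX'})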
[0061]
One or more applications may be defined in the command definition 400. An application may be a third-party application or another application installed on a computing device or web server. The information associated with an application may be defined by elements that define the application. For example, an application name may be defined by an <AppName> element, and the elements following an <AppName> element may be associated with that preceding <AppName> element. In the command definition 400, the application name is "MovieAppService", and the elements following the <AppName> element are associated with the "MovieAppService" application.
[0062]
Commands following the application name are the application's commands. Commands may be identified in a <Command> element. The attributes of the command element can include the name of the command (e.g., "Name") and the activation type of the command (e.g., "ActivationType"). For example, the activation type may be "foreground" for commands to be activated in the foreground and "background" for commands to be activated in the background. The "ActivationType" attribute may be optional, with the default activation type being foreground.
[0063]
The <ListenFor> element may be nested within a <Command> element and may be used to define one or more ways in which a command may be voiced. Optional words, or carrier words, may be provided as hints to the digital personal assistant 120 when performing natural language processing. Carrier words are identified within square brackets []. Data items are identified within curly braces {}. In the command definition 400, there are generally two alternative ways to invoke the "Add" command, defined by two <ListenFor> elements. For example, saying either "Add MovieX to my queue" or "Add MovieX to my MovieAppService queue" can be used to cause the digital personal assistant 120 to launch the MovieAppService "Add" command in the background. Predefined phrases may be identified by the keyword "builtIn" within a set of curly braces: {builtIn:<phrase identifier>}.
[0064]
The <Feedback> element may be nested within a <Command> element and may be used to define a phrase to be spoken to the user when the digital personal assistant 120 successfully recognizes the command uttered by the user. Additionally or alternatively, the <Feedback> element can define a text string to be displayed to the user while the spoken command is being parsed by the digital personal assistant 120.
[0065]
The <Response> element may be nested within a <Command> element and may be used to define one or more responses provided to the user by the digital personal assistant 120. Each response is associated with a state of the application, as defined by a "State" attribute. The states may be final states, such as success or failure, or intermediate states, such as progress. Multiple types of responses may be defined, such as <DisplayString> for text displayed on the screen, <TTSString> for text spoken to the user, <AppDeepLink> for a deep link back into the application, and <WebLink> for a non-deep link to a web page or website. The response defined by the <Response> element may be augmented with additional response information provided by the application.
[0066]
Exemplary Sequence Diagram FIG. 5 is an exemplary sequence diagram illustrating the
communication of multiple execution threads (510, 520, and 530) to perform the functions of a
third party application from within the digital personal assistant 120 headlessly. It is 500. UI
thread 510 and control thread 520 may be parallel threads of the multi-threaded embodiment of
digital personal assistant 120. The UI thread 510 may be mainly responsible for capturing input
from the user interface of the digital personal assistant 120 and displaying output to the user
interface of the digital personal assistant 120. For example, speech input, haptic input, and / or
text input may be captured by UI thread 510. In one embodiment, the UI thread 510 can perform
natural language processing on the input and can match the user's spoken commands to the
commands in the command data structure 140. When the voiced command is determined to
match a command in command data structure 140, the command may be communicated to
control thread 520 for further processing. In an alternative embodiment, UI thread 510 may capture utterances as text input, individual words may be conveyed to control thread 520, and control thread 520 may perform natural language processing on the input to match the user's spoken commands with the commands in the command data structure 140.
[0058]
The control thread 520 can communicate with the application, track the progress of the
application, and be primarily responsible for interfacing with the UI thread 510. For example,
control thread 520 may be notified by UI thread 510 that the user has spoken to the user
interface of digital personal assistant 120. A word or command may be received by control
thread 520, and control thread 520 may notify UI thread 510 when a user command is
recognized by control thread 520. The UI thread 510 may indicate to the user that progress is
11-04-2019
24
being made to the command through the digital personal assistant 120 user interface. The UI
thread 510 or control thread 520 can determine that the command should be headlessly invoked
by retrieving the attributes of the command from the command data structure 140. The control
thread 520 can either start a new thread when the command is to be headlessly launched, or
communicate with an existing thread, such as the AppService thread 530. It may be desirable for
the AppService thread 530 to be an existing thread, rather than having the control thread 520
start a new thread, in order to reduce the response time to the user. For example, the AppService
thread 530 may be started when warming up the application or during boot up of the computing
device 130.
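As a concrete illustration of this hand-off, the following minimal Python sketch assumes a queue-based exchange between the control thread and a worker standing in for the AppService thread; the thread roles mirror the description, while the queue protocol and the sentinel used for shutdown are assumptions made for the example.

```python
# Sketch: the AppService worker is started ahead of time (e.g., at warm-up or
# device boot) and blocks on a queue, so the control thread only pays the cost
# of an enqueue per recognized command instead of per-request thread startup.
import queue
import threading

requests = queue.Queue()

def app_service_loop():
    while True:
        command = requests.get()      # blocks until a command arrives
        if command is None:           # sentinel used here to shut down
            break
        # ... perform the requested function headlessly ...
        print("AppService executing:", command)

app_service = threading.Thread(target=app_service_loop, daemon=True)
app_service.start()                   # started during warm-up, not per request

# Later, on the control thread, once a spoken command has been recognized:
requests.put({"app": "MovieAppService", "command": "Add", "item": "MovieX"})
requests.put(None)                    # shut the worker down for this demo
app_service.join()
```

Because the worker is already blocked on the queue, dispatching a recognized command costs only an enqueue, which is the response-time benefit attributed above to reusing an existing thread.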
[0059]
The AppService thread 530 may execute on computing device 130 or may execute on a remote
server such as remote server computer 160. The AppService thread 530 may be mainly
responsible for completing the function specified by the user command. The AppService thread
530 can maintain a state machine (such as state machine 300) to track the progress of execution
of a function, and can provide state updates to the control thread 520. By providing state
updates to control thread 520, AppService thread 530 may be headless, and output to the user is
provided by digital personal assistant 120 and not by the user interface of AppService thread
530.
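On the AppService side, the state reporting might look like the sketch below, which complements the queue hand-off above; the state names come from the state machine description, while the callback used to deliver updates is an illustrative assumption.

```python
# Sketch: a headless task that reports state transitions to the control thread
# instead of drawing any UI of its own. State names follow the description;
# the report_state callback is an illustrative assumption.
def add_to_queue(movie_title, report_state):
    report_state("initial")
    report_state("progress")      # e.g., while contacting the movie service
    try:
        # ... call the service that actually updates the user's queue ...
        report_state("final", f"{movie_title} was added to your queue.")
    except OSError as err:
        report_state("failure", str(err))

add_to_queue("MovieX", lambda state, detail=None: print(state, detail or ""))
```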
[0060]
Control thread 520 can track the progress of an application (eg, AppService thread 530) by
receiving state updates from the application and checking if the application is progressing. For
example, each time the control thread 520 communicates with the AppService thread 530 (sends information to or receives information from the AppService thread 530), it can start a timer of a predefined duration (e.g., 5 seconds). If the timer expires before the AppService thread 530 responds, the control thread 520 can indicate to the UI thread 510 that the application failed to respond, and the UI thread 510 can present a failure message to the user through the digital personal assistant 120 user interface. AppService thread 530 may be terminated or ignored by control thread 520 after the timer has expired. Alternatively, if the AppService thread 530 responds before the timer expires, the timer may be reset if another response is expected from the application (such as when the application responds with a progress state), or the timer may be canceled (such as when the application has completed its function (final state) or when a user response is required (confirmation or disambiguation state)).
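The progress check described here behaves like a resettable watchdog. The sketch below is a minimal Python rendering of that pattern, assuming the five-second duration from the example above; the class shape and callback are illustrative assumptions.

```python
# Sketch: a resettable watchdog. The control thread (re)arms it on every
# exchange with the AppService thread; if it fires, the UI thread is told
# that the application failed to respond.
import threading

class Watchdog:
    def __init__(self, seconds, on_timeout):
        self.seconds = seconds
        self.on_timeout = on_timeout
        self._timer = None

    def arm(self):
        # Start, or restart, the countdown (e.g., on each progress state).
        self.cancel()
        self._timer = threading.Timer(self.seconds, self.on_timeout)
        self._timer.daemon = True
        self._timer.start()

    def cancel(self):
        # Called on final, confirmation, or disambiguation states.
        if self._timer is not None:
            self._timer.cancel()

watchdog = Watchdog(5.0, lambda: print("application failed to respond"))
watchdog.arm()       # armed when the command is sent to the AppService thread
watchdog.cancel()    # canceled here immediately, just for the demo
```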
[0061]
When control thread 520 receives a confirmation or disambiguation state from AppService thread 530, control thread 520 may indicate to UI thread 510 that a confirmation or disambiguation has been requested of the user. The UI thread 510 can present the user with a confirmation or disambiguation option via the digital personal assistant 120 user interface. When the user responds or fails to respond, the UI thread 510 can provide the control thread 520 with the user's response or the absence of a response. The control thread 520 can pass the user's response to the AppService thread 530 so that the AppService thread 530 can perform the function. If the user fails to respond, the control thread 520 can terminate the AppService thread 530.
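A minimal sketch of that exchange follows, assuming the user's answer arrives on a queue and that silence beyond a timeout counts as a failure to respond; the timeout value and prompt text are illustrative assumptions.

```python
# Sketch: prompt the user to confirm, wait a bounded time for the answer,
# and either forward it to the AppService thread or terminate the task.
import queue

def handle_confirmation(prompt, answers, forward, terminate, timeout=10.0):
    print(prompt)                          # shown/spoken via the assistant UI
    try:
        answer = answers.get(timeout=timeout)
        forward(answer)                    # let the AppService thread proceed
    except queue.Empty:
        terminate()                        # the user failed to respond

answers = queue.Queue()
answers.put("yes")                         # simulate the user's reply
handle_confirmation(
    "Add MovieX to your queue?",
    answers,
    forward=lambda a: print("forwarded to AppService:", a),
    terminate=lambda: print("AppService terminated"),
)
```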
[0062]
The UI thread 510 can display various types of output via the digital personal assistant 120 user interface. For example, the UI thread 510 can generate audio output such as digital simulated speech output from text. The digital simulated speech can be sent to an audio processing chip that can convert it into an analog signal (e.g., using a digital-to-analog converter) that can be output as sound through a speaker or headphones. As another example, UI thread 510 can provide visual output such as images, animations, text output, and hyperlinks for viewing by a user on a display screen of computing device 130. If a hyperlink is tapped or clicked, the UI thread 510 can start a browser application to display the website corresponding to the selected hyperlink. As another example, UI thread 510 can generate haptic output, for example, by sending a vibration signal to an electric motor that can cause computing device 130 to vibrate.
[0063]
Exemplary Method for Headless Task Completion FIG. 6 is a flow chart of an exemplary method 600 for headlessly completing an application's task in the background of the digital personal assistant 120. At 610, voice input generated by a user may be received by
digital personal assistant 120. Voice input may be captured locally at computing device 130 or
remotely from computing device 130. As an example, audio input generated by the user may be
captured locally by the microphone 150 of the computing device 130 and digitized by an analog
to digital converter. As another example, voice input generated by the user may be remotely
captured by a microphone connected wirelessly (eg, by a Bluetooth® companion device) to the
computing device 130. Digital personal assistant 120 may be controlled by voice and / or text
entered at the digital personal assistant 120 user interface.
[0064]
At 620, natural language processing of speech input may be performed to determine the user's
speech commands. The user voice command may include a request to perform predefined
functions of the application, such as a third party voice enabled application. Predefined features
may be identified using data structures that define the applications and features of the
applications supported by the digital personal assistant 120. For example, compatible
applications may be identified in a command definition file, such as command definition 400. By
using extensible command definitions to define the functionality of third party applications that
can be headlessly executed by digital personal assistant 120, digital personal assistant 120
can enable the user to perform more tasks from within the digital personal assistant 120 user interface.
[0065]
At 630, the digital personal assistant 120 can cause the application to headlessly perform the predefined functions without the user interface of the application appearing on the display of the computing device 130. The digital personal assistant 120 can decide to run the application headlessly because the application is defined as headless in the command data structure 140, or because the user is using the computing device in hands-free mode, where running the application in the foreground could potentially be distracting. For example, digital personal assistant 120 can invoke a web service to perform a predefined function of the application. As another example, digital personal assistant 120 may start a new thread on computing device 130 to perform a predefined function of the application after user commands have been determined. As another example, the digital personal assistant 120 can communicate with existing threads, such as threads started during application warm-up, to perform predefined functions of the application. Predefined functions can be implemented as background processes. The application
can monitor the progress of predefined functions, for example by tracking the state of predefined
functions.
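The decision described here reduces to two signals. The helper below is a hedged sketch, assuming the activation type read from the command data structure 140 and a hands-free flag are available as plain values; the function itself is not from the source.

```python
# Sketch: decide whether to run a command headlessly. A "background"
# activation type, or hands-free use (where a foreground UI could distract
# the user), keeps the application's own UI off the display.
def should_run_headless(activation_type, hands_free):
    return activation_type == "background" or hands_free

print(should_run_headless("background", hands_free=False))  # True
print(should_run_headless("foreground", hands_free=True))   # True
print(should_run_headless("foreground", hands_free=False))  # False
```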
[0066]
At 640, a response may be received from the application indicating a state associated with the
predefined function. For example, states may include warm-up states, initial states, progress
states, confirmation states, disambiguation states, and final states. The response may include additional information such as a templated list, a text string, a text-to-speech string, an image, a hyperlink, or any other suitable information that may be displayed to the user via the digital personal assistant 120 user interface.
[0067]
At 650, the user interface of the digital personal assistant 120 can provide a response to the user
based on the received status associated with the predefined function. In this way, responses can
come from within the context of the user interface of digital personal assistant 120 without
revealing the user interface of the application. Additionally, the verification and disambiguation
capabilities of digital personal assistant 120 may be used to confirm and / or clarify user
commands of the application.
[0068]
Exemplary Method for Determining Whether to Warm Up the Application FIG. 7 is a flowchart of an exemplary method 700 for determining whether to warm up the application while the user is speaking to the digital personal assistant 120. At 710, the user can type or speak to the digital personal assistant 120. The user's text or speech can be analyzed using natural language processing techniques, and individual words can be recognized from the speech. Individual words can be analyzed within the intermediate phrases in which they were spoken. For example, the user can say, "Hi, Assistant, MyApp, ...". The word "hi" is a carrier word and may be skipped. The word "assistant" may be used to inform the digital personal assistant 120 that the user is requesting to perform an action. The word "MyApp" may be interpreted as an application.
[0069]
At 720, the typed or spoken words can be compared to the native functionality of the digital
personal assistant 120 and the functionality provided in the extensible command definition.
Collectively, the native functions and the functions defined in the command definition may be
referred to as "known AppService". The spoken words can be parsed and compared to known
AppService when the word is spoken. In other words, analysis of the utterance may occur before
the entire phrase is spoken or typed by the user. If none of the known AppServices match, then at 730 the digital personal assistant 120 can open a web browser to a search engine web page having a search string corresponding to the unrecognized spoken phrase. Program control may be transferred to the web browser so that the user can refine the web search and/or view the results. However, if a known AppService matches, the method 700 can continue at 740.
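A minimal sketch of this routing step is shown below, assuming the known AppServices are held in a simple set and that the fallback builds a search-engine URL from the unrecognized phrase; the set contents and the search URL are illustrative assumptions.

```python
# Sketch: match the utterance word by word against known AppService names;
# if nothing matches, fall back to a web search for the unrecognized phrase.
import urllib.parse
import webbrowser

KNOWN_APP_SERVICES = {"movieappservice", "textapp"}

def route(utterance):
    for word in utterance.lower().split():
        if word in KNOWN_APP_SERVICES:
            return f"dispatch to {word}"        # continue at 740
    query = urllib.parse.quote_plus(utterance)   # unrecognized: step 730
    webbrowser.open(f"https://www.example.com/search?q={query}")
    return "transferred to web browser"

print(route("MovieAppService add MovieX to my queue"))
```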
[0070]
At 740, it can be determined whether the AppService application is a foreground task or a
background task. For example, the command definition can include an attribute that defines the
AppService application as a foreground application or background application. If the AppService
application is a foreground task, at 750, the AppService application may be launched in the
foreground and control may be transferred to the AppService application to complete the
command. If the AppService application is a background task, method 700 may continue with
concurrent steps 760 and 770.
[0071]
At 760, the digital personal assistant 120 can provide the user with information regarding
speech analysis. In particular, the digital personal assistant 120 can generate an output for the
on-going screen of the digital personal assistant 120 user interface. Output can be defined, for
example, in a <Feedback> element nested within a <Command> element of a command definition.
The output may be a text string and may be continually updated as the user continues to speak.
[0072]
At 770, the digital personal assistant 120 can warm up the AppService application without
waiting for the user to finish speaking. Warming up the AppService application may include allocating memory, prefetching instructions, establishing a communication session, obtaining information from a database, starting a new execution thread, generating an interrupt, or performing other suitable application-specific actions. Applications can be warmed up based on speculative functions. For example, instructions corresponding to speculative functions may be fetched even if the functions are not known for sure. By warming up the application before the user finishes speaking the command, the time to respond to the user can potentially be reduced.
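A minimal sketch of speculative warm-up follows, assuming warm-up is triggered the moment an application name is tentatively recognized in the partial results; the warm_up body is a stub standing in for the application-specific actions listed above.

```python
# Sketch: warm up an application as soon as its name shows up in the partial
# recognition results, without waiting for the user to finish speaking.
import threading

def warm_up(app_name):
    # Stub for application-specific actions: allocating memory, prefetching
    # instructions, opening a session, priming a database connection, etc.
    print(f"warming up {app_name} speculatively")

def on_partial_recognition(words_so_far, known_apps, warmed):
    threads = []
    for word in words_so_far:
        name = word.lower()
        if name in known_apps and name not in warmed:
            warmed.add(name)                      # warm each app only once
            t = threading.Thread(target=warm_up, args=(name,))
            t.start()
            threads.append(t)
    return threads

warmed = set()
for t in on_partial_recognition(["hi", "assistant", "MovieAppService"],
                                {"movieappservice"}, warmed):
    t.join()
```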
[0073]
At 780, the digital personal assistant 120 can continue to parse partial speech recognition results
until speech is complete. The end of speech may be detected based on the command being
parsed and / or based on the user's pause for longer than a predetermined time. For example, the
end of the command "MovieAppService, add MovieX to my queue" may be detected when the
word "queue" is recognized. As another example, if the end of the "TextApp, notify my wife that I
will be late for dinner" end of command is more difficult to detect as the command ends with a
data item of unknown length There is. Thus, pause may be used to indicate to the digital personal
assistant 120 that the command has been completed.
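A minimal sketch of such endpointing, assuming the recognizer timestamps each word and that a fixed pause threshold marks the end of a command; the threshold value is an illustrative assumption.

```python
# Sketch: treat the command as complete either when the parsed phrase is
# complete or when the user has paused longer than a threshold.
import time

PAUSE_THRESHOLD = 1.5  # assumed seconds of silence that end a command

class Endpointer:
    def __init__(self):
        self.last_word_time = time.monotonic()

    def on_word(self, word):
        self.last_word_time = time.monotonic()   # recognizer emitted a word

    def speech_ended(self, phrase_complete):
        paused = time.monotonic() - self.last_word_time > PAUSE_THRESHOLD
        return phrase_complete or paused

endpointer = Endpointer()
endpointer.on_word("queue")
print(endpointer.speech_ended(phrase_complete=True))  # True: fully parsed
```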
[0074]
At 790, the end of the spoken command may be detected and the final speech recognition result
may be passed to the application. The application and digital personal assistant 120 can
communicate with each other to complete the spoken command, as described with reference to
the above figures.
[0075]
Computing System FIG. 8 shows a generalized example of a suitable computing system 800 in
which the described innovations can be implemented. The computing system 800 is not intended to suggest any limitation as to scope of use or functionality, as the innovations may be implemented in a variety of general purpose or special purpose computing systems.
[0076]
Referring to FIG. 8, computing system 800 includes one or more processing units 810, 815 and
memories 820, 825. In FIG. 8, this basic configuration 830 is included within the dashed line.
Processing units 810, 815 execute computer-executable instructions. The processing unit may be
a general purpose central processing unit (CPU), an application specific integrated circuit (ASIC),
or any other type of processor. In a multi-processing system, multiple processing units execute
computer-executable instructions to increase processing power. For example, FIG. 8 shows a
central processing unit 810 as well as a graphics processing unit or coprocessing unit 815.
Tangible memory 820, 825 may be volatile memory (eg, registers, cache, RAM), non-volatile memory (eg, ROM, EEPROM, flash memory, etc.), or some combination of the two, accessible by the processing unit. Memories 820, 825 store software 880 that implements one or more
innovations described herein in the form of computer-executable instructions suitable for
execution by a processing unit.
[0077]
The computing system can have additional features. For example, computing system 800
includes storage 840, one or more input devices 850, one or more output devices 860, and one
or more communication connections 870. An interconnection mechanism (not shown), such as a
bus, controller or network interconnects the components of computing system 800. Operating
system software (not shown) typically provides an operating environment for other software
executing within computing system 800 and coordinates activities of components of computing
system 800.
[0078]
Tangible storage 840 may be removable or non-removable, and includes magnetic disks, magnetic tape or cassettes, CD-ROMs, DVDs, or any other medium that may be used to store information and that may be accessed within computing system 800. Storage
840 stores instructions for software 880 that implements one or more innovations described
herein.
[0079]
Input device 850 may be a touch input device such as a keyboard, mouse, pen, or trackball, an audio input device, a scanning device, or another device that provides input to computing system 800. For video encoding, input device 850 may be a camera, video card, TV tuner card, or similar device that accepts video input in analog or digital form, or a CD-ROM or CD-RW that reads video samples into computing system 800. Output device 860 may be a display, a printer, a
speaker, a CD writer, or another device that provides output from computing system 800.
[0080]
Communication connection 870 enables communication with another computing entity via a
communication medium. The communication medium conveys computer-executable instructions,
audio or video input or output, or other data in a modulated data signal. A modulated data
signal is a signal that has one or more of its characteristics set or changed to encode information
in the signal. By way of example, and not limitation, communication media may use electrical
carriers, optical carriers, RF carriers, or other carriers.
[0081]
The innovation may be described in the general context of computer-executable instructions,
such as those included in program modules executed on a computing system on a target real
processor or virtual processor. Generally, program modules include routines, programs, libraries,
objects, classes, components, data structures, etc. that perform particular tasks or implement
particular abstract data types. The functionality of the program modules may be combined or
divided among the program modules as desired in various embodiments. Computer executable
instructions for program modules may be executed within a local computing system or a
distributed computing system.
[0082]
The terms "system" and "device" are used interchangeably herein. Unless the context clearly
dictates otherwise, neither term implies any limitation on the type of computing system or
computing device. In general, a computing system or computing device may be local or distributed, and may include any combination of special purpose and/or general purpose hardware and software that implements the functionality described herein.
[0083]
For the sake of presentation, the detailed description uses terms such as "determine" and "use" to
describe computer operations in a computing system. These terms are high-level abstractions of
computer-implemented operations and should not be confused with acts performed by humans.
The actual computer operation corresponding to these terms will vary depending on the
implementation.
[0084]
Mobile Device FIG. 9 is a system diagram illustrating an example mobile device 900 including
various optional hardware and software components, shown generally at 902. While any
component 902 in the mobile device can communicate with any other component, not all
connections are shown for ease of explanation. The mobile device may be any of a variety of
computing devices (eg, cell phones, smart phones, handheld computers, personal digital
assistants (PDAs), etc.) and can enable wireless two-way communication with one or more mobile communication networks, such as a cellular network, satellite network, or other network.
[0085]
The illustrated mobile device 900 can include a controller or processor 910 (eg, a signal processor, microprocessor, ASIC, or other control and processing logic) for performing tasks such as signal coding, data processing, input/output processing, power control, and/or other functions. Operating system 912 can control the assignment and use of component 902 and can
support digital personal assistant 120 and one or more application programs 914. The
application programs may include common mobile computing applications (eg, email
applications, calendars, contact managers, web browsers, messaging applications, video
applications, banking applications), or any other computing application. The application program
914 can include an application having tasks that can be performed headlessly by the digital
personal assistant 120. For example, tasks may be defined in command data structure 140. A
function 913 for accessing the application store may also be used to obtain and update the
application program 914.
[0086]
The illustrated mobile device 900 can include a memory 920. Memory 920 may include nonremovable memory 922 and / or removable memory 924. Non-removable memory 922 may
include RAM, ROM, flash memory, hard disk, or other known memory storage technology.
Removable memory 924 may include flash memory or other known memory storage technology
such as a Subscriber Identity Module (SIM) card, or a "smart card", as is known in GSM
communication systems. Memory 920 may be used to store data and / or code for executing
operating system 912 and application 914. Exemplary data may include web pages, text, images, sound files, video data, or other data sets sent to and/or received from one or more network servers or other devices via one or more wired or wireless networks. Memory
920 may be used to store a subscriber identifier, such as an International Mobile Subscriber
Identity (IMSI), and a device identifier, such as an International Mobile Equipment Identity (IMEI).
Such an identifier may be sent to the network server to identify the user and the device.
[0087]
Mobile device 900 can support one or more input devices 930, such as touch screen 932, microphone 934, camera 936, physical keyboard 938, and/or trackball 940, and one or more output devices, such as speaker 952 and display 954. Other possible output
devices (not shown) can include piezoelectric or other haptic output devices. Some devices can
provide more than one input / output function. For example, touch screen 932 and display 954
may be combined into a single input / output device.
[0088]
The input device 930 can include a natural user interface (NUI). An NUI is any interface technology that allows the user to interact with the device in a "natural" way, free from the artificial constraints imposed by input devices such as a mouse, keyboard, remote control, and the like. Examples of NUI methods include those relying on speech recognition, touch and stylus recognition, gesture recognition both on screen and adjacent to the screen, air gestures, head and eye tracking, voice and speech, vision, touch, gestures, and machine intelligence. Other examples of an NUI include motion gesture detection using accelerometers/gyroscopes, face recognition, 3D displays, head, eye, and gaze tracking, and immersive augmented reality and virtual reality systems, all of which provide a more natural interface, as well as techniques (EEG and related methods) for sensing brain activity using electric field sensing electrodes. Thus, in one specific example, operating system 912 or application 914 may comprise speech recognition software as part of a speech user interface that allows the user to operate device 900 via speech commands. Additionally, the device 900 may comprise input devices and software that allow user interaction via the user's spatial gestures, such as detecting and interpreting gestures to provide input to a gaming application.
[0089]
Wireless modem 960 may be coupled to an antenna (not shown) and may support bi-directional
communication between processor 910 and external devices, as is well understood in the art.
Modem 960 is shown generically and may include a cellular modem for communicating with the mobile communication network 904 and/or other radio-based modems (eg, Bluetooth 964 or Wi-Fi 962). The wireless modem 960 is typically configured to communicate with one or more cellular networks, such as a GSM network for data and voice communication, within a single cellular network, between cellular networks, or between the mobile device and the public switched telephone network (PSTN).
[0090]
The mobile device can include at least one input / output port 980, a power supply 982, a satellite navigation system receiver 984, such as a Global Positioning System (GPS) receiver, an accelerometer 986, and/or a physical connector 990, which may be a USB port, an IEEE 1394 (FireWire) port, and/or an RS-232 port. The illustrated components 902 are not required or all-inclusive, as any component may be removed and other components may be added.
[0091]
Cloud Support Environment FIG. 10 shows a generalized example of a suitable cloud support
environment 1000 in which the described embodiments, techniques, and technologies may be
implemented. In the example environment 1000, various types of services (eg, computing
services) are provided by the cloud 1010. For example, cloud 1010 can comprise a collection of centrally located or distributed computing devices that provide cloud-based services to various types of users and devices connected via a network such as the Internet.
Implementation environment 1000 may be used in different ways to accomplish computing
tasks. For example, some tasks (eg, processing user input and presenting a user interface) may be
performed on a local computing device (eg, connected devices 1030, 1040, 1050) and others
Tasks (eg, storage of data to be used in subsequent processing) may be performed at cloud 1010.
[0092]
In the exemplary environment 1000, the cloud 1010 provides services for connected devices
1030, 1040, 1050 with various screen capabilities. Connected device 1030 represents a device
having computer screen 1035 (e.g., a medium sized screen). For example, the connected device
1030 may be a personal computer such as a desktop computer, a laptop, a notebook, a netbook, or the like. Connected device 1040 represents a device having a mobile device screen 1045 (eg, a
small sized screen). For example, the connected device 1040 may be a mobile phone, a
smartphone, a personal digital assistant, a tablet computer, etc. The connected device 1050
represents a device having a large screen 1055. For example, connected device 1050 may be a
television screen (eg, smart television) or another device connected to a television (eg, a set top
box or game console). One or more of the connected devices 1030, 1040, 1050 can include
touch screen functionality. The touch screen can accept input in different ways. For example, a
capacitive touch screen detects touch input when an object (eg, a fingertip or stylus) distorts or
interrupts the current flowing across the surface. As another example, the touch screen can use
an optical sensor to detect touch input when a beam from the sensor is interrupted. Physical contact with the surface of the screen is not necessary for some touch screens to detect input. Devices without screen capabilities may also be used in the exemplary environment
1000. For example, cloud 1010 can provide services for one or more computers (eg, server
computers) that do not have a display.
[0093]
Services may be provided by cloud 1010 via service provider 1020 or via other providers of
online services (not shown). For example, cloud services may be customized to the screen size,
display capabilities, and / or touch screen capabilities of a particular connected device (eg,
connected devices 1030, 1040, 1050).
[0094]
In the exemplary environment 1000, the cloud 1010 at least partially uses the service provider
1020 to provide the techniques and solutions described herein to various connected devices
1030, 1040, 1050. For example, service provider 1020 can provide a centralized solution for
various cloud based services. Service provider 1020 may manage service subscriptions for users
and / or devices (eg, for connected devices 1030, 1040, 1050 and / or their respective users).
[0095]
Exemplary Implementations Although the operations of some of the disclosed methods are described in a particular, sequential order for convenient presentation, it should be understood that this manner of description encompasses rearrangement, unless a particular ordering is required by specific language set forth below. For example, the operations described sequentially may,
in some cases, be rearranged or performed simultaneously. Moreover, for the sake of simplicity,
the attached drawings may not depict the various ways in which the disclosed method may be
used in conjunction with other methods.
[0096]
Any of the disclosed methods may be implemented as computer-executable instructions or a computer program product stored on one or more computer readable storage media and executed on any available computing device (eg, smart phones or other mobile devices that include computing hardware). A computer readable storage medium may be any available tangible medium that may be accessed within a computing environment (eg, an optical media disc such as one or more DVDs or CDs, volatile memory components (such as DRAM or SRAM), or non-volatile memory components (such as flash memory or a hard drive)). By way of example, with reference to FIG. 8, computer readable storage media include
memory 820 and 825 and storage 840. By way of example, referring to FIG. 9, a computer
readable storage medium includes memory and storage 920, 922 and 924. The term computer
readable storage medium does not include signals and carriers. In addition, the term computer
readable storage medium does not include communication connections (e.g., 870, 960, 962 and
964).
[0097]
Any of the computer readable instructions for implementing the disclosed techniques, as well as
any data created and used during the implementation of the disclosed embodiments, may be
stored on one or more computer readable storage media. Computer-executable instructions may
be, for example, part of a dedicated software application or a software application accessed or downloaded via a web browser or other software application (such as a remote computing application). Such software may be executed, for example, on a single local computer (eg, any suitable commercially available computer) or in a network environment (eg, the Internet, a wide area network, a local area network, a client-server network (such as a cloud computing network), or other such network) using one or more network computers.
[0098]
For clarity, only certain selected aspects of the software-based implementation will be described.
Other details known in the art are omitted. For example, the disclosed technology is not limited
to any particular computer language or program. For example, the disclosed technology may be
implemented by software written in C++, Java, Perl, JavaScript, Adobe Flash, or any other
suitable programming language. Likewise, the disclosed technology is not limited to any
particular computer or hardware type. The specific details of suitable computers and hardware
are well known and need not be described in detail in this disclosure.
[0099]
Furthermore, any of the software-based embodiments (eg, comprising computer-executable
instructions for causing a computer to perform any of the disclosed methods) may be uploaded,
downloaded, or remotely accessed via suitable communication means. Such suitable communication means include, for example, the Internet, the World Wide Web, intranets, software applications, cables (including fiber optic cables), magnetic communication, electromagnetic communication (including RF communication, microwave communication, and infrared communication), electronic communication, or other such communication means.
[0100]
The disclosed methods, devices, and systems should not be construed as limiting in any way.
Instead, the present disclosure is directed to all novel and non-obvious features and aspects of
the various disclosed embodiments, alone and in various combinations and subcombinations with
one another. The disclosed methods, apparatus, and systems are not limited to any particular aspect or combination thereof, nor do the disclosed embodiments require that any one or more particular advantages be present or that any particular problem be solved.
[0101]
The techniques from any of the examples may be combined with the techniques described in any
one or more other examples. In view of the many possible embodiments to which the principles of the disclosed technology may be applied, it should be recognized that the illustrated embodiments are examples of the disclosed technology and should not be taken as a limitation on the scope of the disclosed technology.