1 Introduction
Machine teaching studies optimal control of machine learners (Zhu et al., 2018; Zhu, 2015). In controls language, the plant is the learner, the state is the model estimate, and the input is the (not necessarily i.i.d.) training data. The controller wants to use the fewest training items, a quantity known as the teaching dimension (Goldman and Kearns, 1995), to force the learner to learn a target model. For example, in adversarial learning, an attacker may minimally poison the training data to force a learner to learn a nefarious model (Biggio et al., 2012; Mei and Zhu, 2015). Conversely, a defender may immunize the learner by injecting adversarial training examples into the training data (Goodfellow et al., 2014). In education systems, a teacher may optimize the training curriculum to enhance learning by a student modeled as a learning algorithm (Sen et al., 2018; Patil et al., 2014).

Machine teaching problems are either batch or sequential, depending on the learner. The majority of prior work studied batch machine teaching, where the controller performs one-step control by giving the batch learner an input training set. Modern machine learning, however, extensively employs sequential learning algorithms. We thus study sequential machine teaching: what is the shortest training sequence that forces a learner to go from an initial model w_0 to a target model w*? Formally, at time t the controller chooses an input u_t from an input set U. The learner then updates the model according to its learning algorithm. This forms a dynamical system:

(1a) w_{t+1} = f(w_t, u_t)

The controller has full knowledge of the dynamics f and wants to minimize the terminal time T subject to w_T = w*.
As a concrete example, we focus on teaching a gradient descent learner of least squares:

(1b) w_{t+1} = w_t - η (x_t^T w_t - y_t) x_t

with input u_t = (x_t, y_t) and input set U = {(x, y) : ||x|| <= 1, |y| <= 1}. We caution the reader not to trivialize the problem: (1) is a nonlinear dynamical system due to the multiplicative interaction between the state w_t and the input x_t. A previous best attempt to solve this control problem, by Liu et al. (2017), employs a greedy control policy which at step t chooses the input that minimizes the distance between w_{t+1} and w*. One of our observations is that this greedy policy can be substantially suboptimal. Figure 1 shows three teaching problems and the number of steps needed to arrive at w* using different methods. Our optimal control method NLP found shorter teaching sequences than the greedy policy (lengths 151, 153, 259 for NLP vs. 219, 241, 310 for GREEDY, respectively). This and other experiments are discussed in Section 4.
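To make the setup concrete, here is a minimal simulation sketch of the learner dynamics (1), assuming the update w_{t+1} = w_t - η(x_t^T w_t - y_t)x_t with constraints ||x|| <= 1 and |y| <= 1. The teacher shown is an illustrative straight-toward-target heuristic, not the optimal controller; the target w_star and stepsize are hypothetical values.

```python
import numpy as np

def learner_step(w, x, y, eta):
    """One gradient-descent step on the squared loss (x^T w - y)^2 / 2."""
    return w - eta * (w @ x - y) * x

# Illustrative teacher: show a unit example pointing at the target, with the
# label chosen so the update moves w toward w_star (a straight-toward-target
# heuristic, not the optimal controller analyzed in the paper).
eta = 0.1
w_star = np.array([0.6, 0.3])   # hypothetical target model
w = np.zeros(2)                 # initial model w_0
steps = 0
while np.linalg.norm(w_star - w) > 1e-9 and steps < 500:
    d = w_star - w
    x = d / np.linalg.norm(d)                       # ||x|| = 1
    y = np.clip(w @ x + np.linalg.norm(d), -1, 1)   # |y| <= 1
    w = learner_step(w, x, y, eta)
    steps += 1
```

Counting `steps` for different policies is exactly the quantity the teaching methods below compete on.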
1.1 Main Contributions
Our main contribution is to show how tools from optimal control theory may be brought to bear on the machine teaching problem. Specifically, we show that:


Optimal teaching sequences can be vastly more efficient than those obtained via common heuristics. We present two optimal approaches: an exact method (NLP) and a continuous approximation (CNLP). The two agree when the stepsize η is small, but CNLP is more scalable because its runtime does not depend on the length of the training sequence. These results are shown in Section 4.
We begin with a survey of the relevant optimal control theory and algorithms literature in Section 2.
2 Time-Optimal Control
To study the structure of the optimal control, we consider the continuous gradient-flow approximation of gradient descent, which holds in the limit η → 0. In this section, we present the corresponding canonical time-optimal control problem and summarize some of the key theoretical and computational tools that have been developed over the past 60 years to address it. For a more detailed exposition of the theory, we refer the reader to modern references on the topic (Kirk, 2012; Liberzon, 2011; Athans and Falb, 2013).
This section is self-contained, and we use notation consistent with the control literature (x for the state instead of w, u for the input instead of (x, y), and t_f for the terminal time instead of T). We revert to machine learning notation in Section 3. Consider the following boundary value problem:

(2) dx/dt = f(x(t), u(t)), u(t) ∈ U for 0 <= t <= t_f, with x(0) = x_0 and x(t_f) = x_f

The function x is called the state and u is called the input. Here, U is a given constraint set that characterizes admissible inputs. The initial and terminal states x_0 and x_f are fixed, but the terminal time t_f is free. If an admissible u together with a state x satisfies the boundary value problem (2) for some choice of t_f, we call (x, u) a trajectory of the system. The objective in a time-optimal control problem is to find an optimal trajectory, which is a trajectory with minimal t_f.
Established approaches for solving time-optimal control problems can be grouped into three broad categories: dynamic programming, indirect methods, and direct methods. We now summarize each approach.
2.1 Dynamic Programming
Consider the value function V, where V(x) is the minimum time required to reach the target x_f starting from the state x. The Hamilton–Jacobi–Bellman (HJB) equation gives necessary and sufficient conditions for optimality and takes the form:

(3) min_{u ∈ U} { 1 + ∇V(x)^T f(x, u) } = 0

together with the boundary condition V(x_f) = 0. If the solution to this differential equation is V, then the optimal input is given by the minimizer:

(4) u*(x) ∈ argmin_{u ∈ U} ∇V(x)^T f(x, u)
A nice feature of this solution is that the optimal input depends only on the current state x. In other words, HJB produces an optimal feedback policy.
Unfortunately, the HJB equation (3) is generally difficult to solve. Even if the minimization has a closed-form solution, the resulting differential equation is often intractable. We remark that the optimal V may not be differentiable; for this reason, one looks for so-called viscosity solutions, as described by Liberzon (2011); Tonon et al. (2017) and references therein.
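Numerically, the dynamic programming viewpoint is often made tractable by discretizing the state space. The sketch below runs grid-based value iteration on a toy minimum-time problem (the scalar system dx/dt = u with |u| <= 1 and target x = 0, whose true value function is V(x) = |x|). This is a generic scheme in the spirit of Tsitsiklis (1995), not a method used later in this paper.

```python
import numpy as np

# Grid-based value iteration for a toy minimum-time problem:
# scalar dynamics x' = u, |u| <= 1, target x = 0.  True V(x) = |x|.
h = 0.01                                   # grid spacing = dt * |u|
xs = np.arange(-1.0, 1.0 + h / 2, h)       # state grid on [-1, 1]
V = np.full(xs.size, np.inf)
target = np.argmin(np.abs(xs))             # grid index of x = 0
V[target] = 0.0                            # boundary condition V(0) = 0

for _ in range(xs.size):                   # enough sweeps to propagate
    for i in range(xs.size):
        for j in (i - 1, i + 1):           # one step with u = -1 or u = +1
            if 0 <= j < xs.size:
                V[i] = min(V[i], h + V[j]) # Bellman update; stage cost = h
```

After convergence, V approximates the minimum time from each grid state, and the minimizing neighbor at each state recovers the feedback policy (4).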
2.2 Indirect Methods
Also known as “optimize then discretize”, indirect approaches start with necessary conditions for optimality obtained via the Pontryagin Maximum Principle (PMP). The PMP may be stated and proved in several different ways, most notably using the Hamiltonian formalism from physics or using the calculus of variations. Here is a formal statement.
[PMP] Consider the boundary value problem (2), where f and its Jacobian with respect to x are continuous. Define the Hamiltonian H(x, u, p) = 1 + p^T f(x, u). If (x, u) is an optimal trajectory, then there exists some function p (called the "costate") such that the following conditions hold.


a) x and p satisfy the following system of differential equations for 0 <= t <= t_f, with boundary conditions x(0) = x_0 and x(t_f) = x_f:

(5a) dx/dt = f(x, u)
(5b) dp/dt = -(∂f/∂x)^T p

b) For all 0 <= t <= t_f, an optimal input satisfies:

(6) u(t) ∈ argmin_{v ∈ U} H(x(t), v, p(t))

c) Zero Hamiltonian along optimal trajectories:

(7) H(x(t), u(t), p(t)) = 0 for all 0 <= t <= t_f
In comparison to HJB, which must be solved for all x, the PMP applies only along optimal trajectories. Although the differential equations (5) may still be difficult to solve, they are simpler than the HJB equation and therefore tend to be more amenable to both analytical and numerical approaches. Solutions to HJB and PMP are related via p(t) = ∇V(x(t)).
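A standard indirect technique is single shooting: guess the initial costate, integrate the state–costate equations (5) forward with the input from the minimum condition (6), and check the terminal state. The sketch below does this for a textbook time-optimal problem, the double integrator, where the minimum condition yields a bang-bang input; the system and the costate guess are standard illustrations, not this paper's learner.

```python
import numpy as np

# Single-shooting sketch of the PMP for the double integrator
# x' = v, v' = u with |u| <= 1.  Here H = 1 + p1*v + p2*u, so the
# costates obey p1' = 0, p2' = -p1, and the minimizing input is
# bang-bang: u = -sign(p2).
def shoot(p0, z0, T, dt=1e-3):
    """Propagate state and costate forward from a guessed initial costate p0."""
    (x, v), (p1, p2) = z0, p0
    for _ in range(int(round(T / dt))):
        u = -np.sign(p2)               # input from the minimum condition (6)
        x, v = x + dt * v, v + dt * u
        p2 = p2 - dt * p1              # p1 stays constant
    return np.array([x, v])

# The guess p0 = (1, 1) makes p2 change sign at t = 1, reproducing the
# known optimal bang-bang switch for steering (1, 0) to (0, 0) in T = 2.
miss = shoot(p0=(1.0, 1.0), z0=(1.0, 0.0), T=2.0)   # residual to target (0, 0)
```

In general, a root-finder over the costate guess and the terminal time drives this miss distance to zero; Section 3 applies the same PMP machinery to the learner dynamics.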
2.3 Direct Methods
Also known as "discretize then optimize", direct approaches solve a sparse nonlinear program whose variables are the state and input evaluated at a discrete set of timepoints. An example is collocation, which uses basis functions such as piecewise polynomials to interpolate the state between timepoints. For contemporary surveys of direct and indirect numerical approaches, see Rao (2009); Betts (2010).

3 Teaching Least Squares: Insight From Pontryagin
In this section, we specialize time-optimal control to least squares. To recap, our goal is to find the minimum number of steps T such that some control sequence u_0, ..., u_{T-1} drives the learner (1) from the initial state w_0 to the target state w*. The constraint set is U = {(x, y) : ||x|| <= 1, |y| <= 1}. This is a nonlinear discrete-time time-optimal control problem for which no closed-form solution is available.
On the corresponding continuous-time control problem, applying Theorem 2.2 we obtain the following necessary conditions for optimality for all 0 <= t <= t_f. (The state, costate, and input in Theorem 2.2 are written x, p, u, which is conventional controls notation; for this problem we use w for the state, p for the costate, and (x, y) for the input, which is machine learning notation.)

(8a) dw/dt = -(x^T w - y) x
(8b) dp/dt = (x^T p) x
(8c) w(0) = w_0, w(t_f) = w*
(8d) (x, y) ∈ argmin over ||x̄|| <= 1, |ȳ| <= 1 of -(x̄^T w - ȳ)(x̄^T p)
(8e) (x^T w - y)(x^T p) = 1
For any trajectory (w, p, x, y) satisfying (8), there exists another trajectory of the form (w, p, x̃, 1) with the same state and costate. So we may set y(t) ≡ 1 without any loss of generality.
Proof.
In fact, Proposition 3 holds for trajectories of the discrete-time learner (1) as well. For a proof, see the appendix.
Applying Proposition 3, the conditions (8d) and (8e) may be combined to yield the following quadratically constrained quadratic program (QCQP) equation:

(9) min over ||x|| <= 1, |y| <= 1 of -(x^T w - y)(x^T p) = -1

where we have omitted the explicit time dependence for clarity. Note that (9) constrains the tuples (w, p, x, y) that can occur along an optimal trajectory: in addition to solving the minimization on the left-hand side to find (x, y), we must also ensure that the minimum equals -1. We now characterize the solutions of (9) by examining five distinct regimes of the solution space, which depend on the relationship between w and p, as well as which transitions between regimes are admissible.
Regime I (origin): w = 0 and p ≠ 0.
This regime occurs when the teaching trajectory passes through the origin, and one can obtain closed-form solutions for the optimal input. Here both dw/dt and dp/dt are positively aligned with p. Therefore, Regime I necessarily transitions from Regime II and into Regime III, provided it occurs neither at the beginning nor at the end of the teaching trajectory.
Regime II (positive alignment): p = βw for some β > 0, with w ≠ 0.
This regime occurs when w and p are positively aligned. Again we have closed-form solutions. Here both dw/dt and dp/dt are negatively aligned with w; thus Regime II necessarily transitions into Regime I and can never be entered from any other regime.
Regime III (negative alignment inside the origin-centered ball): p = βw for some β < 0, with w ≠ 0 lying inside a certain ball centered at the origin.
This regime occurs when w and p are negatively aligned and w is inside the ball. Again, closed-form solutions exist. Regime III necessarily transitions from Regime I and into Regime IV.
Regime IV (negative alignment outside the origin-centered ball): p = βw for some β < 0, with w outside the ball.
In this case, the optimal x is uniquely determined by w and p, but the optimal y is not unique: any feasible y consistent with (9) can be chosen. Regime IV can only be entered from Regime III and cannot transition into any other regime. In other words, once the teaching trajectory enters Regime IV, it cannot escape. Another interesting property of Regime IV is that we know exactly how fast the norm ||w|| changes; as a result, once the trajectory enters Regime IV, we know exactly how long it will take to reach w*, if it can reach it at all.
Regime V (general position): w and p are linearly independent.
This case covers the remaining possibilities for the state and costate variables. To characterize the solutions in this regime, we first introduce new coordinates. Let (e_1, e_2) be the orthonormal basis for span{w, p} such that w = ||w|| e_1 and p = γ_1 e_1 + γ_2 e_2 for some γ_2 ≠ 0. Note that γ_2 ≠ 0 because w and p are assumed to be linearly independent in this regime. We can therefore express any input uniquely as x = a e_1 + b e_2 + c e_3, where e_3 is an out-of-plane unit vector orthogonal to both e_1 and e_2, and a, b, c are suitably chosen coefficients. Substituting these definitions, (9) becomes

(10) min over a² + b² + c² <= 1, |y| <= 1 of -(a||w|| - y)(γ_1 a + γ_2 b)

Now observe that the objective is linear in b and does not depend on c. The objective is genuinely linear in b because a||w|| - y ≠ 0; otherwise the entire objective would be zero, contradicting (9). Since the feasible set is convex, the optimum must occur at the boundary of the feasible set in the variables b and c. Therefore, c = 0 and the norm constraint is active. This is profound, because it implies that in Regime V the optimal input necessarily lies in the 2D plane span{w, p}. In light of this fact, we can pick a more convenient parametrization. Let a = cos θ and b = sin θ. Equation (10) becomes:

(11) min over θ, |y| <= 1 of -(||w|| cos θ - y)(γ_1 cos θ + γ_2 sin θ)

This objective function has at most four critical points, of which only one is the global minimum, and we can find it numerically. Last but not least, Regime V does not transition from or into any other regime.
Intrinsic low-dimensional structure of the optimal control solution.
As hinted at in the analysis of Regime V, the optimal control sometimes lies in the 2D subspace spanned by w and p. In fact, this holds not only in Regime V but for the whole problem. In particular, we make the following observation: there always exists a globally optimal trajectory of (8) that lies in a 2D subspace. The detailed proof can be found in the appendix. An immediate consequence of Theorem 3 is that if w_0 and w* are linearly independent, we only need to consider trajectories confined to the subspace span{w_0, w*}. When w_0 and w* are aligned, trajectories are still 2D, and any 2D subspace containing w_0 and w* may be chosen; all such choices are equivalent.
This insight is extremely important because it enables us to restrict our attention to 2D trajectories even though the dimensionality of the original problem may be huge. It allows us not only to solve the necessary conditions induced by the PMP more accurately, but also to parametrize the indirect and direct approaches of Sections 2.2 and 2.3 so that they solve this intrinsically 2D problem more efficiently.
Multiplicity of solution candidates.
The PMP conditions are only necessary for optimality, so the optimality conditions (8) need not have a unique solution. We illustrate this phenomenon in Figure 3. We used a shooting approach (Section 2.2) to propagate different choices of the initial costate p(0) forward in time. It turns out that two choices lead to trajectories ending at w*, and they do not have equal total times. So in general, the PMP identifies candidate optimal trajectories, which can be thought of as local minima of this highly nonlinear optimization problem.
4 Numerical Methods
While the PMP yields necessary conditions for time-optimal control, as detailed in Section 3, there is no closed-form solution in general. We now present and discuss four numerical methods: CNLP and NLP are different implementations of time-optimal control, while GREEDY and STRAIGHT are heuristics.
CNLP:
This approach solves the continuous gradient-flow limit of the machine teaching problem using a direct approach (Section 2.3). Specifically, we used the NLOptControl package (Febbo, 2017), an implementation of the pseudospectral method GPOPS-II (Patterson and Rao, 2014) written in the Julia programming language using the JuMP modeling language (Dunning et al., 2017) and the IPOPT interior-point solver (Wächter and Biegler, 2006). The main tuning parameters for this software are the integration scheme and the number of mesh points; we selected the trapezoidal integration rule for most simulations. We used CNLP to produce the trajectories in Figures 1 and 2.
NLP:
A naïve approach to optimal control is to find the minimum T for which there is a feasible input sequence that drives the learner to w*. For fixed T, the feasibility subproblem is a nonlinear program over the state and input variables, constrained by the learner dynamics. Recall that w_0 is given, and one can fix y_t = 1 for all t by Proposition 3. For our learner (1), the feasibility problem is

(12) find x_0, ..., x_{T-1}
s.t. w_{t+1} = w_t - η (x_t^T w_t - 1) x_t, ||x_t|| <= 1 for t = 0, ..., T-1
     w_T = w*

As in the CNLP case, we modeled and solved the subproblems (12) using JuMP and IPOPT. We also tried Knitro, a state-of-the-art commercial solver (Byrd et al., 2006), which produced similar results. We stress that such feasibility problems are difficult; IPOPT and Knitro can handle only moderately sized T. For our specific learner (1), the optimal control and state trajectories are 2D, as discussed in Section 3; we therefore reparametrized (12) to work in 2D.
On top of this, we run a binary search over positive integers to find the minimum T for which the subproblem (12) is feasible. Subject to solver numerical stability, this minimum T together with its feasible solution is the time-optimal control. While NLP is conceptually simple and correct, it requires solving many subproblems whose variable and constraint counts grow with T, making it less stable and scalable than CNLP.
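The outer loop of this scheme can be sketched generically. The `feasible` oracle below is a hypothetical stand-in for solving one subproblem (12) with a nonlinear solver; the search assumes feasibility is monotone in T (if T steps suffice, so do T + 1, e.g. because the teacher can append inputs that leave the learner's state unchanged).

```python
def min_feasible_horizon(feasible, lo=1, hi=1):
    """Binary search for the smallest horizon T with feasible(T) == True.

    `feasible` is a predicate standing in for one solve of the fixed-T
    feasibility subproblem.  Assumes monotonicity: feasible(T) implies
    feasible(T + 1).
    """
    # Exponential search to bracket an upper bound first.
    while not feasible(hi):
        lo, hi = hi + 1, hi * 2
    # Standard bisection on the bracket [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

In the paper's pipeline, each call to the oracle corresponds to one JuMP/IPOPT solve, so minimizing the number of probed horizons matters.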
GREEDY:
We restate the greedy control policy initially proposed by Liu et al. (2017). It has the advantage of being computationally efficient and readily applicable to different learning algorithms (i.e., different dynamics). Specifically, for the least squares learner (1) and the current state w_t, GREEDY solves the following optimization problem to determine the next teaching example (x_t, y_t):

(13) min over (x, y) ∈ U of ||w_{t+1} - w*||²
s.t. w_{t+1} = w_t - η (x^T w_t - y) x

The procedure repeats until w_t is sufficiently close to w*. We used the Matlab function fmincon to solve the above problem at each iteration. We point out that the optimization problem is not convex. Moreover, the greedy step w_{t+1} - w_t does not necessarily point in the direction of w* - w_t. This is evident in Figures 1 and 6.
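A minimal sketch of one GREEDY step follows, with random search over the input set standing in for the constrained nonconvex solver (the problem is nonconvex, so any local method is a heuristic here); the target, stepsize, and sample count are illustrative values.

```python
import numpy as np

def greedy_step(w, w_star, eta, n_samples=8000):
    """One approximate GREEDY step: search the input set
    {(x, y): ||x|| <= 1, |y| <= 1} for the pair whose update lands
    closest to w_star.  Random search stands in for the constrained
    solver (fmincon) used in the paper."""
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, size=(n_samples, w.size))
    nrm = np.linalg.norm(x, axis=1, keepdims=True)
    x = np.where(nrm > 1.0, x / nrm, x)          # project into the unit ball
    y = rng.uniform(-1.0, 1.0, size=n_samples)
    w_next = w - eta * (x @ w - y)[:, None] * x  # all candidate updates
    return w_next[np.argmin(np.linalg.norm(w_next - w_star, axis=1))]

w_star = np.array([0.6, 0.3])   # illustrative teaching task
w = np.zeros(2)
for _ in range(10):
    w = greedy_step(w, w_star, eta=0.1)
```

Each step locally shrinks the distance to w*, but as discussed below, chaining such locally optimal steps can still yield a much longer sequence than the optimal control.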
STRAIGHT:
We describe an intuitive control policy: at each step, move in a straight line toward w* as far as possible, subject to the input constraints. This policy is less greedy than GREEDY because it may not reduce the distance ||w - w*|| as much at each step. The per-step optimization is a 1D line search over the step length α:

(14) max α >= 0
s.t. w_t + α (w* - w_t) / ||w* - w_t|| = w_t - η (x^T w_t - y) x for some (x, y) ∈ U

The line search (14) can be solved in closed form.
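One such closed form can be derived under the input set assumed above. Writing x = c·d̂ along the move direction d̂ and s = d̂^T w, the step length along d̂ is η(cy - c²s); maximizing over |c| <= 1, |y| <= 1 gives η(1 - s) when s <= 1/2 and η/(4s) otherwise. The sketch below implements this; it is a derivation under the stated assumptions, not a quote of the paper's formula.

```python
import numpy as np

def straight_step(w, w_star, eta):
    """One STRAIGHT step: move from w toward w_star as far as the input
    set allows.  Sketch under the assumed constraints ||x|| <= 1,
    |y| <= 1 and the update w' = w - eta*(x^T w - y)*x."""
    d = w_star - w
    dist = np.linalg.norm(d)
    if dist < 1e-12:
        return w                     # already at the target
    d_hat = d / dist
    s = d_hat @ w
    # Max of eta*(c*y - c*c*s) over |c| <= 1, |y| <= 1:
    alpha = eta / (4 * s) if s > 0.5 else eta * (1 - s)
    return w + min(alpha, dist) * d_hat   # never overshoot the target
```

Note the η/(4s) branch: the farther w already extends along the move direction, the shorter the largest admissible straight step, which is the same compression phenomenon analyzed in Section 4.1.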
4.1 Comparison of Methods
We ran a number of experiments to study the behavior of these numerical methods. In all experiments, the learner is gradient descent on least squares (1), and the control constraint set is U = {(x, y) : ||x|| <= 1, |y| <= 1}. Our first observation is that CNLP has a number of advantages:

CNLP’s continuous optimal state trajectory matches NLP’s discrete state trajectories, especially on learners with a small stepsize η. This is expected, since the continuous optimal control problem is obtained from the discrete one in the limit η → 0. Figure 4 shows one teaching task, comparing CNLP’s optimal state trajectory with NLP’s on four gradient descent learners with different η values. The NLP optimal teaching sequences vary drastically in length T, but their state trajectories quickly overlap with CNLP’s optimal trajectory.

CNLP is quick to compute, while NLP’s runtime grows as the learner’s η decreases. Table 1 presents wall-clock times. With a small η, the optimal control takes more steps (larger T); consequently, NLP must solve nonlinear programs with more variables and constraints. In contrast, CNLP’s runtime does not depend on η.

CNLP can be used to approximately compute the “teaching dimension”, i.e., the minimum number of sequential teaching steps for the discrete problem. Recall that CNLP produces an optimal terminal time t_f. When the learner’s η is small, the discrete teaching dimension T is related to it by T ≈ t_f / η. This is also supported by Table 1.
That said, it is not trivial to extract a discrete control sequence from CNLP’s continuous control function, which hinders CNLP’s utility as an optimal teacher.
Table 1: NLP vs. CNLP on three teaching tasks: teaching sequence length T for learner stepsizes η = 0.02 and η = 0.001, with wall-clock times.
η: 0.02 | 0.001
Task 1: T = 75 | T = 1499; wall-clock times 0.013s, 0.14s, 59.37s, 4.1s
Task 2: T = 76 | T = 1519; wall-clock times 0.008s, 0.11s, 53.28s, 2.37s
Task 3: T = 128 | T = 2570; wall-clock times 0.012s, 0.63s, 310.08s, 2.11s
Table 2: Teaching sequence length T.
NLP | STRAIGHT | GREEDY
148 | 161 | 233
221 | 330 | 721
292 | 867 | 2667
346 | 2849 | 10581
Our second observation is that NLP, being the discrete-time optimal control, produces shorter teaching sequences than GREEDY or STRAIGHT. This is not surprising, and we have already presented three teaching tasks in Figure 1 where NLP has the smallest T. In fact, there exist teaching tasks on which GREEDY and STRAIGHT perform arbitrarily worse than the optimal teaching sequence found by NLP. A case study is presented in Table 2: in this set of experiments, we fix w_0 and make the task progressively harder across the rows. As the difficulty increases, the ratios of teaching sequence length between STRAIGHT and NLP, and between GREEDY and NLP, grow at an exponential rate.
We now dig deeper and present an intuitive explanation of why GREEDY requires more teaching steps than NLP. The fundamental issue is the nonlinearity of the learner dynamics (1) in the input. For any state w, define the one-step reachable set R(w) = {w - η(x^T w - y)x : (x, y) ∈ U}. Figure 5 shows a sample of such reachable sets. The key observation is that the starting state w is often quite close to the boundary of its own reachable set. In other words, there is often a compressed direction, from w to the closest boundary of R(w), along which the learner makes minimal progress. The GREEDY scheme falls victim to this phenomenon.
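The compressed-direction effect can be checked numerically. Under the assumed update and input set, the largest move outward along w = (a, 0) is η·max(y·x₁ - a·x₁²) = η/(4a), so the achievable progress shrinks as ||w|| grows. The Monte Carlo probe below (with illustrative values of η and a) samples the one-step reachable set and compares against this bound.

```python
import numpy as np

# Monte Carlo probe of the one-step reachable set R(w) from w = (a, 0).
# Under the assumed update w' = w - eta*(x^T w - y)*x with ||x|| <= 1,
# |y| <= 1, progress outward along w is at most eta/(4a): the
# "compressed" direction gets tighter as ||w|| grows.
rng = np.random.default_rng(0)
eta, a = 0.1, 5.0
w = np.array([a, 0.0])

x = rng.uniform(-1.0, 1.0, size=(200_000, 2))
nrm = np.linalg.norm(x, axis=1, keepdims=True)
x = np.where(nrm > 1.0, x / nrm, x)             # project into the unit ball
y = rng.uniform(-1.0, 1.0, size=200_000)
moves = -eta * (x @ w - y)[:, None] * x         # sampled one-step moves

outward = moves[:, 0].max()                     # best progress along +w
bound = eta / (4 * a)                           # max of eta*(y*x1 - a*x1^2)
```

The sampled maximum approaches but never exceeds η/(4a), which is why a state far from the origin sits near the boundary of its own reachable set in the outward direction.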
Figure 6 compares NLP and GREEDY on a teaching task chosen to have short teaching sequences in order to minimize clutter. GREEDY starts by eagerly descending the slope, and indeed this quickly brings it closer to w*. Unfortunately, it also arrives at the axis. For states on the axis, the compressed direction is horizontally outward. Therefore, subsequent GREEDY moves are relatively short, leading to a large number of steps to reach w*. Interestingly, STRAIGHT is often better than GREEDY because it tends to avoid the axis and its compressed direction.
We illustrate the optimal inputs in Figure 7, which compares the input norms produced by STRAIGHT, GREEDY, and NLP with the continuous input produced by CNLP. The heuristic approaches eventually take steps of smaller magnitude as they approach w*, while NLP and CNLP maintain a maximal input norm the whole way.
5 Concluding Remarks
Techniques from optimal control are underutilized in machine teaching, yet they have the power to provide better-quality solutions as well as useful insight into the structure of those solutions.
As seen in Section 3, optimal trajectories for the least squares learner are fundamentally 2D, and there is a taxonomy of regimes that dictates their behavior. We also saw in Section 4 that the continuous CNLP solver provides a good approximation to the true discrete trajectory when η is small. CNLP is also more scalable than solving the discrete NLP directly, because NLP becomes computationally intractable as T gets large (equivalently, as η gets small), whereas the runtime of CNLP is independent of T.
A drawback of both NLP and CNLP is that they produce trajectories rather than policies. In practice, an open-loop teaching sequence will not reach the intended w* exactly, due to the accumulation of small numerical errors as we iterate. To find a control policy, which is a map from the state w to the input (x, y), we discussed the possibility of solving HJB (Section 2.1), which is computationally expensive.
An alternative to solving HJB is to precompute the desired trajectory via CNLP and then use modelpredictive control (MPC) to find a policy that tracks the reference trajectory as closely as possible. Such an approach is used in Liniger et al. (2015), for example, to design controllers for autonomous race cars, and would be an interesting avenue of future work for the machine teaching problem.
Finally, this paper presents only a glimpse of what is possible using optimal control. For example, the PMP is not restricted to time-optimal control problems: it can also handle state- and input-dependent running costs, pointwise or integral constraints on states and inputs, conditional constraints, and even problems where the goal is to reach a target set rather than a target point.
References
 Athans and Falb (2013) M. Athans and P. L. Falb. Optimal control: An introduction to the theory and its applications. Courier Corporation, 2013.
 Betts (2010) J. T. Betts. Practical methods for optimal control and estimation using nonlinear programming, volume 19. SIAM, 2010.

Biggio et al. (2012) B. Biggio, B. Nelson, and P. Laskov. Poisoning attacks against support vector machines. In J. Langford and J. Pineau, editors, 29th Int’l Conf. on Machine Learning (ICML). Omnipress, 2012.
Byrd et al. (2006) R. H. Byrd, J. Nocedal, and R. A. Waltz. Knitro: An integrated package for nonlinear optimization. In Large Scale Nonlinear Optimization, pages 35–59. Springer, 2006.

Dunning et al. (2017) I. Dunning, J. Huchette, and M. Lubin. JuMP: A modeling language for mathematical optimization. SIAM Review, 59(2):295–320, 2017. doi: 10.1137/15M1020575.
Febbo (2017) H. Febbo. NLOptControl.jl, 2017. URL: https://github.com/JuliaMPC/NLOptControl.jl.
Goldman and Kearns (1995) S. Goldman and M. Kearns. On the complexity of teaching. Journal of Computer and System Sciences, 50(1):20–31, 1995.
Goodfellow et al. (2014) I. J. Goodfellow, J. Shlens, and C. Szegedy. Explaining and harnessing adversarial examples. arXiv e-prints, Dec. 2014.
 Kao et al. (2004) C. Y. Kao, S. Osher, and J. Qian. Lax–Friedrichs sweeping scheme for static Hamilton–Jacobi equations. Journal of Computational Physics, 196(1):367–391, 2004.
 Kirk (2012) D. E. Kirk. Optimal control theory: An introduction. Courier Corporation, 2012.
 Liberzon (2011) D. Liberzon. Calculus of variations and optimal control theory: A concise introduction. Princeton University Press, 2011.
Liniger et al. (2015) A. Liniger, A. Domahidi, and M. Morari. Optimization-based autonomous racing of 1:43 scale RC cars. Optimal Control Applications and Methods, 36(5):628–647, 2015.
 Liu et al. (2017) W. Liu, B. Dai, A. Humayun, C. Tay, C. Yu, L. B. Smith, J. M. Rehg, and L. Song. Iterative machine teaching. In International Conference on Machine Learning, pages 2149–2158, 2017.

Mei and Zhu (2015) S. Mei and X. Zhu. Using machine teaching to identify optimal training-set attacks on machine learners. In The Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
Patil et al. (2014) K. Patil, X. Zhu, L. Kopec, and B. Love. Optimal teaching for limited-capacity human learners. In Advances in Neural Information Processing Systems (NIPS), 2014.
Patterson and Rao (2014) M. A. Patterson and A. V. Rao. GPOPS-II: A MATLAB software for solving multiple-phase optimal control problems using hp-adaptive Gaussian quadrature collocation methods and sparse nonlinear programming. ACM Transactions on Mathematical Software (TOMS), 41(1):1, 2014.
 Rao (2009) A. V. Rao. A survey of numerical methods for optimal control. Advances in the Astronautical Sciences, 135(1):497–528, 2009.
 Sen et al. (2018) A. Sen, P. Patel, M. A. Rau, B. Mason, R. Nowak, T. T. Rogers, and X. Zhu. Machine beats human at sequencing visuals for perceptualfluency practice. In Educational Data Mining, 2018.
 Tonon et al. (2017) D. Tonon, M. S. Aronna, and D. Kalise. Optimal control: Novel directions and applications, volume 2180. Springer, 2017.
 Tsitsiklis (1995) J. N. Tsitsiklis. Efficient algorithms for globally optimal trajectories. IEEE Transactions on Automatic Control, 40(9):1528–1538, 1995.
 Wächter and Biegler (2006) A. Wächter and L. T. Biegler. On the implementation of a primaldual interior point filter line search algorithm for largescale nonlinear programming. Mathematical Programming, 106(1):25–57, 2006.
Zhu (2015) X. Zhu. Machine teaching: an inverse problem to machine learning and an approach toward optimal education. In The Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI “Blue Sky” Senior Member Presentation Track), 2015.
Zhu et al. (2018) X. Zhu, A. Singla, S. Zilles, and A. N. Rafferty. An Overview of Machine Teaching. arXiv e-prints, Jan. 2018. https://arxiv.org/abs/1801.05927.
6 Appendix
Proof of modified Proposition 3.
We need to show that for any admissible input pair (x, y), there exists another admissible pair of the form (αx, 1) that produces the same update. That is, we seek α such that

-η (x^T w - y) x = -η ((αx)^T w - 1) (αx)

This simplifies to

(15) (x^T w) α² - α - (x^T w - y) = 0

The discriminant of the quadratic (15) is

1 + 4 (x^T w)(x^T w - y) = (2 x^T w - y)² + (1 - y²) >= 0

So there always exists a real solution α. Moreover, writing f(α) for the left-hand side of (15), f(1) = y - 1 <= 0 and f(-1) = y + 1 >= 0, so there must be a real root in [-1, 1]. Hence ||αx|| <= ||x|| <= 1, and the new pair is admissible.
Proof of Theorem 3.
We showed in Section 3 that Regime V trajectories are 2D. We also argued that solutions reaching w* via Regimes III–IV are not unique and need not be 2D. We now show that it is always possible to construct a 2D solution.
We begin by characterizing the set of targets w* reachable via Regimes III–IV. Recall from Section 3 that the transition between Regimes III and IV occurs when the state leaves the origin-centered ball. If t_1 is the time at which this transition occurs, then for t <= t_1 the closed-form Regime III solution leads to a straight-line trajectory from the origin to the transition point w(t_1).
Now consider the part of the trajectory in Regime IV, where t > t_1. As derived in Section 3, Regime IV trajectories change the norm ||w|| at a known rate, regardless of the direction of motion. If our trajectory reaches w*, then we can deduce via integration that
(16) ||w*|| is determined by ||w(t_1)|| and the elapsed time t_f - t_1.
Suppose w(t), for t_1 <= t <= t_f, is a trajectory that reaches w*. Refer to Figure 8. The reachable set at time t_f is a spherical sector whose boundary is attained by a trajectory of maximal curvature. We now derive this fact.
Let θ_max be the largest possible angle between w(t_1) and any reachable terminal state w(t_f), where we have fixed ||w(t_f)||. Define θ(t) to be the angle between w(t_1) and w(t).
An alternative expression for the rate of change of θ is the projection of dw/dt onto the orthogonal complement of w, normalized by ||w||, where we use the fact that the speed ||dw/dt|| is known in Regime IV. Bounding θ(t_f) by the time integral of this rate and maximizing over admissible inputs in the final step yields an upper bound
(17) θ(t_f) <= θ_max
whose integrand depends only on ||w|| and t, but not on the particular trajectory. One can also verify that this upper bound is achieved by a max-curvature input choice, where the terminal state is any vector that satisfies (16) and makes angle θ_max with w(t_1). Any terminal state with this norm but a smaller angle θ < θ_max can also be reached by using the max-curvature control until a suitably chosen time t_2, and then using the Regime IV straight-line control for t > t_2. This piecewise path is illustrated in Figure 8.
Our constructed trajectory lies in the 2D span of w(t_1) and w*. This shows that every reachable w* can be reached via a 2D trajectory.