
4354 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 25, NO. 9, SEPTEMBER 2016

Pedestrian Behavior Modeling From Stationary Crowds With Applications to Intelligent Surveillance

Shuai Yi, Hongsheng Li, and Xiaogang Wang, Member, IEEE

Abstract: Pedestrian behavior modeling and analysis is important for crowd scene understanding and has various applications in video surveillance. Stationary crowd groups are a key factor influencing pedestrian walking patterns but have mostly been ignored in the literature. They play different roles for different pedestrians in a crowded scene and can change over time. In this paper, a novel model is proposed to model pedestrian behaviors by incorporating stationary crowd groups as a key component. Through inference on the interactions between stationary crowd groups and pedestrians, our model can be used to investigate pedestrian behaviors. The effectiveness of the proposed model is demonstrated through multiple applications, including walking path prediction, destination prediction, personality attribute classification, and abnormal event detection. To evaluate our model, two large pedestrian walking route datasets are built. The walking routes of around 15 000 pedestrians from two crowd surveillance videos are manually annotated. The datasets will be released to the public and will benefit future research on pedestrian behavior analysis and crowd scene understanding.

Index Terms: Pedestrian behavior modeling, stationary crowd groups, crowd video surveillance.

I. INTRODUCTION

PEDESTRIAN behavior modeling and analysis is important in video surveillance and has drawn increasing attention in recent years. Because of the increasing demand for security in public spaces, automatically analyzing and understanding pedestrian behaviors in scenes with high population density, such as train stations, shopping malls, and airports, is of great interest to researchers and authorities.
It can be used for various applications, including pedestrian walking path prediction [1], [2], traffic flow segmentation [3]-[5], crowd counting [6], [7], crowd segmentation [8], [9], and abnormal event detection [10]-[14].

Manuscript received September 14, 2015; revised January 20, 2016; accepted July 6, 2016. Date of publication July 11, 2016; date of current version August 2, 2016. This work was supported in part by the Ph.D. Programs Foundation of China under Grant 20130185120039, in part by the Hong Kong Innovation and Technology Support Programme under Grant ITS/221/13FP, in part by the National Natural Science Foundation of China under Grant 61371192 and Grant 61301269, and in part by the General Research Fund through the Research Grants Council, Hong Kong, under Grant CUHK14206114, Grant CUHK14205615, Grant CUHK419412, and Grant CUHK14203015. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Peter Tay. (Corresponding authors: Hongsheng Li and Xiaogang Wang.) The authors are with the Department of Electronic Engineering, The Chinese University of Hong Kong, Hong Kong (e-mail: syi@ee.cuhk.edu.hk; hsli@ee.cuhk.edu.hk; xgwang@ee.cuhk.edu.hk). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TIP.2016.2590322

Fig. 1. (a)-(b) Two video frames. (c)-(d) Energy maps calculated from (a) and (b) using the proposed model. Pedestrians are more likely to walk through regions in warm colors. (e) Illustration of multiple roles of a stationary crowd group. It can serve as a source (blue lines), a destination (red lines), or an obstacle (black lines). (f) Energy map calculated from (b) without modeling stationary crowd groups.

Pedestrian behavior modeling is challenging, especially for scenes with crowds. Pedestrian decision making is complex and has high inter-person variance.
Even given the same source-destination pair, one thousand pedestrians might generate one thousand different walking paths. The problem becomes even more complicated when pedestrians are affected by crowds. Studies [15], [16] show that people behave differently when encountering crowds: each individual's behavior is constrained, and people behave more aggressively. Moreover, social groups can form, which might greatly influence pedestrian walking patterns.

A. Stationary Crowd Groups in Pedestrian Models

Previous studies [2], [17]-[19] have shown that the walking behavior of an individual can be influenced by a variety of factors, including scene layout (e.g., entrances, exits, walls, and obstacles), inter-person variations in the choice of source and destination, and interactions with other moving pedestrians. However, an important factor, i.e., stationary crowd groups, is missing from the literature on modeling pedestrian behaviors. We argue that stationary crowd groups have considerable influence on pedestrians and are crucial in pedestrian behavior modeling. As shown in Fig. 1(d), the walking path of a pedestrian (black curve) is affected by a stationary crowd group. Without modeling the stationary crowd group, however, it is difficult to explain why the pedestrian detours when approaching the destination, as shown in Fig. 1(f).

1057-7149 © 2016 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

YI et al.: PEDESTRIAN BEHAVIOR MODELING FROM STATIONARY CROWDS 4355

Studies show that stationary crowd groups have greater influence on pedestrian behaviors than moving crowds [20]-[22]. A pedestrian usually changes walking speed rather than direction to avoid collision with moving crowds. However, when moving crowds become stationary, the walking pedestrian is forced to change his or her direction, and the walking path is influenced significantly. As shown in Fig. 1(e), a stationary crowd group can play multiple roles for different pedestrians. For pedestrians that are leaving or joining the group, it can be regarded as the source or the destination (blue and red curves, respectively). For other pedestrians moving near the group, it can be regarded as an obstacle (black curves). Although both stationary crowd groups and fixed scene obstacles can block traffic, a pedestrian can choose to walk through a stationary crowd group or detour around it, while scene obstacles are solid and cannot be penetrated. As shown in Fig. 1(a)-(d), the spatial distribution of stationary crowd groups might change over time, which leads to dynamic variations in traffic patterns. Therefore, static models cannot be used to model stationary crowd groups. In our work, stationary crowd groups are introduced for the first time as a factor in modeling pedestrian behaviors. Both pedestrians walking through groups and pedestrians walking past them can be well modeled. The proposed model can be dynamically updated over time to adapt to changes in stationary crowd groups.

B. Pedestrian Behavior Understanding

By modeling the crowd scene and the interactions between stationary and moving pedestrians, valuable statistical information about the scene can be obtained. This information can guide model design and inspire useful applications, leading to a better understanding of pedestrian behaviors. From the learned model parameters, we observe that stationary crowd groups have a greater influence on pedestrian walking paths than moving crowds, which shows the importance of monitoring stationary groups in a traffic surveillance system. Moreover, by modeling the interactions between stationary groups and moving pedestrians, a personality attribute is proposed to classify pedestrians into different categories. This attribute is a key factor that makes each individual behave differently. Another interesting observation is that people are more likely to behave conservatively when the scene is not crowded. In contrast, a crowded scene leads to aggressive walking patterns because of the lack of space.

C. Method Overview and Contributions

A novel model is proposed in this paper to simulate the decision making process of pedestrians and to generate the most probable pedestrian walking paths. A general energy map is first computed to model the traveling difficulty at each location of the scene. Pedestrians tend to choose walking paths through regions with higher energy values, i.e., regions that are easier to travel through. Multiple influence factors, including scene layout, moving pedestrians, and stationary groups, are all considered. The influence weights of the different factors are then learned from observed walking paths to reflect their importance. Different pedestrians may behave differently in the same situation; this variation is modeled by personalized energy maps, which are generated from the general energy map and a personality parameter for each pedestrian.
Finally, given a source and a destination, the fast marching algorithm [23], [24] is used to generate an optimal walking path based on the energy map.

A preliminary version of this work was reported in [25]. Besides extending the introduction, related work, and experiment sections, we make three significant changes in this paper. First, a new outdoor dataset (Dataset II) with 2,049 pedestrian walking path annotations is added and will be released to the public; all experiments are also evaluated on the new dataset, and new experimental results are reported. Second, a new application, pedestrian travel time estimation based on the pedestrian behavior model, is introduced. Lastly, more recent methods are compared on all the proposed applications, including path prediction, destination prediction, travel time estimation, and pedestrian personality attribute analysis.

The contributions of this work can be summarized in the following three aspects.

1) A novel method is proposed to model pedestrian behaviors by including stationary crowd groups as a key component. By modeling the interactions between stationary crowd groups and pedestrians, our model can be used to investigate pedestrian behaviors.
2) Two large pedestrian walking path datasets are built. The walking routes of more than 15,000 pedestrians from two crowd videos are annotated.
3) The effectiveness of the proposed model is demonstrated by multiple applications on the proposed datasets, including pedestrian walking path prediction (Section V-A), pedestrian destination prediction (Section V-B), pedestrian travel time prediction (Section V-C), pedestrian personality attribute estimation (Section V-D), pedestrian classification based on the personality attribute (Section V-E), statistical analysis of the personality attribute (Section V-F), and abnormal event detection (Section V-G).

II. RELATED WORK

A. Motion Pattern Modeling

Much work has been done on modeling crowd motion patterns and segmenting traffic flows. Lagrangian coherent structures, proposed by Ali and Shah [26], and the Lie algebra representation, proposed by Lin et al. [27], were used for flow field computation and segmentation. Topic models have been widely used [3], [28] for crowd flow modeling and estimation, and spatio-temporal dependencies among motion patterns can be incorporated into topic models [29]-[31]. Motion patterns can also be discovered by clustering trajectories [32]-[37]. Shao et al. [38], [39] characterized the generic properties of crowd systems by modeling coherently moving crowd groups.

All these methods mainly focus on learning general motion patterns but cannot model the decision making process of each individual. Because pedestrian decision making is very complex and has high variance, it is difficult for these models to simulate or predict exact pedestrian behaviors. The limited observations of pedestrian positions and speeds are, on their own, insufficient input for training a robust walking behavior model. To learn a detailed behavior model for each individual, we should focus on the inner factors that influence the pedestrian decision making process, so that the model complexity can be reduced.

B. Agent-Based Models

Agent-based models [17] fall into a different category, in the sense that they model the decision making process of individuals. A typical example is the social force model [18], which was originally proposed for crowd simulation [40] and was later used in tracking [41], interaction analysis [42], and abnormal event detection [10]. A mixture model of dynamic agents (MDA), which can learn its parameters automatically, was proposed by Zhou et al. [2]. Typical agent-based models also include the self-driven particle model [43] and the reciprocal velocity obstacle model [44]. Existing agent-based methods use pre-defined rules to model each individual's walking behavior, which can be used for simulation and prediction. However, they have two main shortcomings compared with our model. First, most of these agent-based models are static and cannot be dynamically updated over time, whereas the influence factors change and pedestrian interactions need to be updated over time. For example, the emergence and dispersal of stationary groups influence the walking patterns of pedestrians. Second, most existing agent-based methods cannot model personality, which is a key factor that makes each individual behave differently [45], [46].
Even in the same scene, a conservative pedestrian is more likely to take a long way around an obstacle, while an aggressive one might walk straight through a group. In our work, a personality attribute is proposed to describe the high variance of individuals' behavior caused by personality differences. In this way, subjective factors can be included in the behavior model.

C. Local Models Versus Global Models

Most previous methods [2], [17], [18], [40]-[42] are local models: decisions are made based on the local environment and interactions with nearby people. These models may make reasonable decisions when there are few people in the scene and the scene is not complex. However, when applied to complex crowded scenes, these local models might be confused, because many factors must be considered and many trade-offs must be made. For example, when a stationary group is blocking the main road, a pedestrian is likely to change walking direction immediately after noticing the group, rather than keep going forward until reaching it. This kind of early action can only be modeled with a global understanding of the scene. In fact, pedestrian decision making is a joint optimization over the whole scene and requires global modeling. Once a pedestrian enters the scene, he/she has a clear goal and knows where to go, and chooses an optimized walking path based on a global understanding of the whole scene instead of just focusing on nearby pedestrians. Potential map modeling methods [47] can be used for global modeling, and globally optimized paths can be generated. In our method, a jointly optimized model that simulates the pedestrian decision making process is proposed.

D. Walking Path Datasets

Pedestrian walking route data with accurate annotations is very important for pedestrian behavior analysis. Several datasets were proposed by existing works [48], [49].
However, the videos in most of these datasets are not long enough, not crowded enough, or do not contain enough pedestrians, and therefore cannot be used in our study. Recently, Alahi et al. [50] built a large-scale crowd dataset for forecasting pedestrian destinations. However, this dataset only provides trajectories of moving pedestrians without video frames, and cannot be used to study the influence of stationary crowds on pedestrian behaviors. Therefore, we built new crowd datasets with both manually annotated trajectories and video frames.

III. PEDESTRIAN BEHAVIOR MODELING

Human walking path selection is similar to water flow: a pedestrian usually selects the most convenient and efficient path to the destination. Based on this assumption, a general scene energy map $M$ is proposed to model the traveling difficulty at every location of the scene. Higher energy values indicate that pedestrians are energetic at these locations and can travel through them more easily. More pedestrians tend to choose walking paths through regions with higher energy values, and therefore the probability of observing pedestrians at these locations should be higher. Lower energy values indicate locations with a lower probability of pedestrian occurrence. For example, areas near an obstacle or inside a stationary crowd group are difficult to walk through, so the probability of observing pedestrians there is lower. In our model, the influences of scene layout, moving pedestrians, and stationary groups are considered. These influence factors are chosen based on previous methods [2], [17]-[19] and our own observations. Different factors may have different effects on pedestrian decision making. The three factors are modeled separately but can be jointly optimized. Their influence weights are learned from training data and reflect the importance of these factors.
Factorizing the model into these influence factors helps us better understand the inner rules of pedestrian walking behavior. For example, understanding the influence of scene layout on pedestrian behavior may help us design better public areas. Personalized energy maps $M_P$ are generated based on the general energy map $M$ and a personality parameter $P$, and can be viewed as different pedestrians' interpretations of the general map $M$. Given a source and a destination, the fast marching algorithm [23], [24] is used to generate an optimal walking path in the energy map. The flowchart of the proposed model is shown in Fig. 2.

Fig. 2. System flowchart of the proposed pedestrian behavior model. Three influence factors, i.e., scene layout, moving pedestrians, and stationary groups, are first extracted from the input frames. Afterwards, three energy maps are generated from the three corresponding influence factors separately. The general energy map is obtained by combining the multiple energy maps. Personalized maps are then calculated from the general energy map by using different personality values.

A. General Energy Map Modeling

There are three main steps in our pipeline to compute the general energy map $M$ from the input frames. First, as shown in the second column of Fig. 2, three influence factors, i.e., Scene Layout (SL), Moving Pedestrians (MP), and Stationary Groups (SG), are separated from the input video frames. SL is segmented by finding all the unreachable locations. MP is the location and motion information generated by the keypoint tracker. SG is the region segmented by the algorithm proposed in [22]. Then, the corresponding energy maps (the third column of Fig. 2) are computed from the three influence factors separately. Exponential-family functions are adopted to model the energy maps because of their convenient mathematical properties, especially when calculating the gradients and closed-form solutions of the log-likelihoods. Moreover, with exponential functions, the energy values are bounded within [0, 1], so the energy map can be viewed as a probability distribution after proper normalization. Modeling details are given in Sections III-B, III-C, and III-D.
Finally, the general energy map $M$ is modeled by element-wise multiplication of the three energy maps:
$$M(x;\Theta) = f_{SL}(x;\theta_1)\, f_{MP}(x;\theta_2)\, f_{SG}(x;\theta_3,\theta_4), \tag{1}$$
where $f_{SL}$, $f_{MP}$, and $f_{SG}$ are the three energy maps and $\Theta = [\theta_1, \theta_2, \theta_3, \theta_4]^T$ are the weighting parameters of the different terms. $M$ is also, up to normalization, a probability map and can be used as the probability of a pedestrian appearing at each location. With the exponential form, different influence terms can be easily merged by multiplication. Moreover, the terms can be treated as independent and modeled separately, and such exponential-based models are easy to extend by introducing more terms.

B. Influence of Scene Layout

A pedestrian's walking behavior is constrained by the scene layout. Pedestrians cannot walk freely in a scene because of walls and other static obstacles, and therefore they cannot be observed at some locations. Moreover, people tend to keep a distance from these obstacles and are unlikely to walk very close to them, so the probability of observing a pedestrian decreases near obstacle regions. The Scene Layout influence map is therefore modeled as
$$f_{SL}(x;\theta_1) = \exp\left(-\frac{\theta_1}{d_1(x, SL)}\right), \tag{2}$$
where $SL$ is the set of unreachable locations occupied by scene obstacles, and $d_1(x, SL) = \min_{y \in SL} \|x - y\|_2^2$ measures the distance from the current location $x$ to its nearest scene obstacle location $y$. $\theta_1$ is the influence bandwidth (which can also be viewed as the importance) of the scene layout term. If $x \in SL$, there is an obstacle at location $x$ and $d_1(x, SL) = 0$. In this case, $f_{SL}(x;\theta_1)$ equals 0 (the limit as $d_1 \to 0^+$), which means that pedestrians cannot appear at location $x$. When $x \notin SL$, $d_1(x, SL) > 0$, and $f_{SL}(x;\theta_1)$ approaches 0 as $x$ approaches scene obstacles. An example of a scene layout map is shown in Fig. 3.

C. Influence of Moving Pedestrians

The interaction with other moving pedestrians is another factor to be considered.
A pedestrian tends to keep a certain distance from others. As a result, there is a probability drop around the regions occupied by pedestrians. The Moving Pedestrian influence map is modeled as
$$f_{MP}(x;\theta_2) = \exp\left(-\sum_{i=1}^{m} \frac{\theta_2}{d_2(x, MP_i)}\right), \tag{3}$$
where $MP_i$ ($i \in \{1, \ldots, m\}$) is the $i$th moving pedestrian. The spatial location of $MP_i$ at the current time $t$ is denoted as $y_t^i$, and $y_{t+1}^i$ denotes the estimated spatial location of $MP_i$ at time $t+1$. The term
$$d_2(x, MP_i) = \left(\|x - y_t^i\| + \|x - y_{t+1}^i\|\right)^2 - \|y_t^i - y_{t+1}^i\|^2$$
measures the distance from the current location $x$ to the moving pedestrian $MP_i$, and $\theta_2$ is the influence bandwidth of the moving pedestrian term. The locations and short tracklets of moving pedestrians are generated by the keypoint tracker [51]. The tracker may fail in long-term tracking, but accurate short tracklets are easier to obtain and are sufficient for creating the moving pedestrian map. We use the same distance metric as the social force model [40]. An example of a moving pedestrian influence map is shown in Fig. 4.

Fig. 3. An example of scene layout influence maps. (a) Scene background. (b) The energy values along the white horizontal lines in (c) and (d). The color bar indicates the energy values displayed in (c) and (d). (c)-(d) Two scene influence maps calculated by setting $\theta_1$ to 0.01 and 0.05, respectively. Energy drops near the scene boundaries.

Fig. 4. An example of moving pedestrian influence maps. (a) A video frame. (b) The energy values along the white horizontal lines in (c) and (d). (c)-(d) Two moving pedestrian influence maps calculated by setting $\theta_2$ to 0.01 and 0.05. Energy drops around moving pedestrians.

Fig. 5. An example of stationary crowd group influence maps. (Left) Three stationary crowd group influence maps calculated from the same frame using different $\theta_3$ and $\theta_4$. (Right) The energy values along two vertical lines (a) and (b) in (Left). Comparing the two red curves, we notice that the stationary group regions may have nonzero energy values when $\theta_4$ is nonzero. Different groups may have different energy values due to density differences. By setting $\theta_4 = 0$, the differences disappear and the energy values inside the groups become zero.

D. Influence of Stationary Crowd Groups

Stationary crowd groups are modeled in two aspects.
First, for pedestrians that bypass a stationary crowd group, the group acts similarly to a scene obstacle: it exerts a repulsive force around its region that keeps moving pedestrians away. Second, for pedestrians that walk through a stationary crowd group, there should be a penalty inside the group region. This is the key difference from the scene layout factor, whose obstacles cannot be walked through. The penalty is related to crowd density: it is more difficult to walk through denser stationary crowds. The Stationary Group influence map is modeled as
$$f_{SG}(x;\theta_3,\theta_4) = \exp\left(-\sum_{i=1}^{n} \frac{\theta_3}{d_3(x, SG_i) + \theta_4 d_4(SG_i)}\right), \tag{4}$$
where $SG_i$ ($i \in \{1, \ldots, n\}$) is the $i$th stationary crowd group region, $d_3(x, SG_i) = \min_{y \in SG_i} \|x - y\|_2^2$ measures the distance from $x$ to the region $SG_i$, $\theta_3$ is the influence bandwidth of the stationary crowd group term, and $d_4(SG_i) \in (0, +\infty)$ measures the sparsity of $SG_i$. $d_4$ is the average distance between group members, so a larger $d_4$ means a lower crowd density. The weight $\theta_4$ controls the influence of group sparsity on the estimation result.

The stationary crowd group regions are detected automatically with the approach proposed in [22], which consists of the following main steps. Foreground pixels are first encoded into multiple codewords via an $L_0$-norm optimization pipeline. The stationary time of foreground pixels is then accumulated for each codeword separately. Finally, stationary crowd groups are detected by thresholding the stationary time and clustering the stationary pixels.

If $x \in SG_i$, the location $x$ is inside $SG_i$ and $d_3(x, SG_i) = 0$. The contribution of $SG_i$ to $f_{SG}(x;\theta_3,\theta_4)$ is then constant inside the group, equal to $\exp\left(-\theta_3 / (\theta_4 d_4(SG_i))\right)$, and is positively correlated with the group sparsity $d_4(SG_i)$.
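To make the three influence terms concrete, the following is a minimal pure-Python sketch that evaluates $M(x;\Theta) = f_{SL} f_{MP} f_{SG}$ from (1)-(4) at a single location. It is an illustrative assumption, not the authors' implementation: locations are integer grid coordinates, all distances are computed by brute force, $d_4(SG_i)$ is taken as the mean pairwise Euclidean distance between group members (one reading of "average distance between group members"), and a zero distance is treated as an infinite energy contribution so that the map value becomes exactly 0. All function and variable names are hypothetical.

```python
import itertools
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def d1(x, obstacles):
    """d1(x, SL): squared distance from x to its nearest obstacle location."""
    return min(sq_dist(x, y) for y in obstacles)


def d2(x, y_t, y_next):
    """d2(x, MP_i), built from the pedestrian's positions at t and t+1."""
    s = math.dist(x, y_t) + math.dist(x, y_next)
    # Non-negative by the triangle inequality; clamp floating-point round-off.
    return max(s * s - sq_dist(y_t, y_next), 0.0)


def d3(x, region):
    """d3(x, SG_i): squared distance from x to the group region (0 inside)."""
    return min(sq_dist(x, y) for y in region)


def d4(members):
    """d4(SG_i): mean pairwise distance between members (assumes >= 2)."""
    pairs = list(itertools.combinations(members, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)


def term(theta, d):
    """theta / d, with a zero distance mapped to an infinite energy."""
    return math.inf if d <= 1e-12 else theta / d


def energy(x, obstacles, movers, groups, theta):
    """Sum of the SL, MP and SG terms, i.e. -log M(x; Theta)."""
    t1, t2, t3, t4 = theta
    e = term(t1, d1(x, obstacles))
    e += sum(term(t2, d2(x, y_t, y_next)) for y_t, y_next in movers)
    e += sum(term(t3, d3(x, region) + t4 * d4(members))
             for region, members in groups)
    return e


def general_map(x, obstacles, movers, groups, theta):
    """M(x; Theta) = f_SL * f_MP * f_SG; exactly 0 where a term is infinite."""
    return math.exp(-energy(x, obstacles, movers, groups, theta))


# Hypothetical toy scene: a wall in column 0, one moving pedestrian,
# and one two-member stationary group.
wall = [(0, y) for y in range(5)]
movers = [((4, 4), (4, 3))]                       # (y_t, y_{t+1})
groups = [([(2, 1), (2, 2)], [(2, 1), (2, 2)])]   # (region cells, members)
theta = (0.05, 0.05, 0.5, 0.5)

inside = general_map((2, 1), wall, movers, groups, theta)
far = general_map((4, 0), wall, movers, groups, theta)
```

In this toy scene the map is 0 on the wall and at the moving pedestrian's current position, lies strictly between 0 and 1 inside the group, and is larger at a free location away from all three factors. Setting $\theta_4 = 0$ drives the value inside the group to 0, which matches the behavior described for Fig. 5.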
Inside the group (with $\theta_4 > 0$), $f_{SG}(x;\theta_3,\theta_4)$ lies in the range (0, 1), which means that the probability of observing a pedestrian walking through the group region decreases because of the stationary group but remains larger than 0. If $x \notin SG_i$, $x$ is outside $SG_i$ and $d_3(x, SG_i) > 0$; the influence value increases from the group boundary toward faraway locations. An example is shown in Fig. 5.

E. Personalized Energy Map Modeling

People might behave differently in the same situation, and pedestrian behaviors vary widely. For example, even under the same situation, the walking paths and speeds of individuals might be quite different. Such variation cannot be modeled by a single energy map. In our model, the personality parameter $P$ and the personalized map $M_P$ are used to model it. The general energy map $M$ models the behavior of general (average) pedestrians, and each influence factor has an influence bandwidth ($\theta_1$, $\theta_2$, $\theta_3$):
$$M(x;\Theta) = \exp\left(-E(x;\Theta)\right), \tag{5}$$
where
$$E(x;\Theta) = \frac{\theta_1}{d_1(x, SL)} + \sum_{i=1}^{m} \frac{\theta_2}{d_2(x, MP_i)} + \sum_{i=1}^{n} \frac{\theta_3}{d_3(x, SG_i) + \theta_4 d_4(SG_i)}$$
is the sum of the three channels described above. Different personalized energy maps $M_P$ are generated from the general energy map $M$ with different values of $P$:
$$M_P(x;\Theta) = \exp\left(-P \cdot E(x;\Theta)\right), \tag{6}$$
which can be regarded as a map transform based on the general energy map $M$ and the personality parameter $P$; equivalently, $M_P = M^P$. An example of personalized maps is shown in Fig. 6. If $P = 1$, the personalized energy map is the same as the general energy map. If the personality parameter $P$ is larger than 1 for a particular pedestrian, the influence bandwidths of all the terms ($\theta_1$, $\theta_2$, $\theta_3$) are effectively multiplied by $P$ for this individual. As shown in Fig. 6, when $P = 1.5$, the energy values at more locations near obstacles and stationary crowd groups decrease (turn blue) compared with the general energy map ($P = 1$). These locations are therefore less likely to be passed through by this pedestrian: he/she cares more about these influence factors and is likely to walk a longer way to avoid close contact with obstacles. On the other hand, if $P$ is smaller than 1 for a particular pedestrian, this pedestrian walks aggressively and cares less about obstacles.

Fig. 6. An example of personalized maps. (Left) Three personalized maps calculated from the same frame using different $P$ ($P$ is set to 0.5, 1.0, and 1.5 from top to bottom). (Right) The energy values along lines (a) and (b) in (Left).

Fig. 7. An example of path generation. The red point represents a source. Green points represent destinations. Black curves are optimal walking routes calculated by (7).

F. Path Prediction

To predict pedestrian walking paths, the Fast Marching algorithm [23], [24] is utilized.
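The personalized transform (6) and the path search can be sketched together as follows. This is a simplified stand-in, not the authors' fast marching implementation. It assumes that $E(x;\Theta)$ has already been evaluated on a grid, treats $M_P$ as a travel speed so that crossing a cell costs $1/M_P$ (an assumed mapping from energy to cost), and swaps fast marching for Dijkstra's algorithm on an 8-connected grid. Dijkstra's algorithm solves a related minimal-cost path problem along graph edges rather than through the Eikonal update, so it carries some grid bias. The source cell is assumed to have $M_P > 0$, and the energy grid in the usage example, including the column standing in for a stationary group, is made up for illustration. All names are hypothetical.

```python
import heapq
import math


def personalized_map(e_grid, p):
    """M_P = exp(-P * E), eq. (6), applied cell by cell to a grid of E values."""
    return [[math.exp(-p * e) for e in row] for row in e_grid]


def shortest_path(speed, src, dst):
    """Dijkstra on an 8-connected grid, a discrete stand-in for fast marching.

    Moving between neighbouring cells costs the step length times the mean
    of 1/speed over the two cells; cells with speed 0 are unreachable.
    Returns the list of (row, col) cells from src to dst, or None.
    """
    rows, cols = len(speed), len(speed[0])
    cost = {src: 0.0}
    prev = {}
    heap = [(0.0, src)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == dst:
            break                      # first pop of dst is optimal
        if d > cost[(r, c)]:
            continue                   # stale heap entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr, dc) == (0, 0) or not (0 <= nr < rows and 0 <= nc < cols):
                    continue
                if speed[nr][nc] <= 0.0:
                    continue           # M_P = 0: cell cannot be entered
                step = math.hypot(dr, dc) * 0.5 * (
                    1.0 / speed[r][c] + 1.0 / speed[nr][nc])
                if d + step < cost.get((nr, nc), math.inf):
                    cost[(nr, nc)] = d + step
                    prev[(nr, nc)] = (r, c)
                    heapq.heappush(heap, (d + step, (nr, nc)))
    if dst not in cost:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical 5 x 7 grid of E(x; Theta) values: E = 0.1 (easy to cross)
# everywhere except rows 1-3 of column 3, which mimic a stationary group.
e_grid = [[0.1] * 7 for _ in range(5)]
for row in (1, 2, 3):
    e_grid[row][3] = 2.0

src, dst = (2, 0), (2, 6)
aggressive = shortest_path(personalized_map(e_grid, 0.2), src, dst)
conservative = shortest_path(personalized_map(e_grid, 3.0), src, dst)
```

With a small $P$ the group cells remain cheap enough that the cheapest route goes straight through them. With a large $P$ they become so expensive that the route detours around the group, which mirrors the aggressive and conservative behaviors described for personalized maps.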
Given the source $x_s$ and the destination $x_d$, an optimal path $T$ is calculated based on the general energy map $M$ or the personalized energy map $M_P$ as
$$T = f_{FM}(M, x_s, x_d) \quad \text{or} \quad T = f_{FM}(M_P, x_s, x_d), \tag{7}$$
where $T$ is the most efficient and probable route from $x_s$ to $x_d$ according to $M$ or $M_P$. Several examples are shown in Fig. 7. When a personalized map $M_P$ is used, the optimal path is specific to that individual. When the general map $M$ is used, the optimal path can be regarded as the average path of ordinary pedestrians.

G. Learning Model Parameters

The annotated pedestrian locations are used as training samples to learn the model parameters $\Theta = [\theta_1, \theta_2, \theta_3, \theta_4]^T$. Since exponential-family functions are used to model the factor maps, the values of an energy map $M$ can be converted into probabilities of a pedestrian appearing at each location. A general energy map $M$ is built based on (1). By dividing by the normalization term $Z(\Theta)$, the energy map $M(x;\Theta)$ is transformed into a probability distribution,
$$p(x;\Theta) = \frac{1}{Z(\Theta)} M(x;\Theta), \tag{8}$$
where $Z(\Theta) = \int M(x;\Theta)\,dx$. All observed pedestrian locations are regarded as independent samples, and the model parameters are optimized by maximizing the likelihood of the training samples. This assumption is widely used in various learning methods, and it is reasonable for our problem, where the number of training samples is large and correlations among pedestrians can be ignored. Given $X = \{x_1, \ldots, x_k, \ldots, x_K\}$ as $K$ independent observations of $x$, the likelihood of these observations is
$$p(X;\Theta) = \prod_{k=1}^{K} \frac{1}{Z(\Theta)} M(x_k;\Theta). \tag{9}$$
The parameters are updated by gradient ascent on the log-likelihood $\log p(X;\Theta)$ (equivalently, gradient descent on the negative log-likelihood):
$$\Theta^{new} = \Theta^{old} + \eta \left.\frac{\partial \log p(X;\Theta)}{\partial \Theta}\right|_{\Theta^{old}}, \tag{10}$$
where $\eta$ is the learning rate. Details of the gradient derivations can be found in [52]. Since the log-likelihood can be expressed as
$$\log p(X;\Theta) = \sum_{k=1}^{K} \log M(x_k;\Theta) - K \log Z(\Theta), \tag{11}$$
the gradients can be written as
$$\frac{\partial \log p(X;\Theta)}{\partial \Theta} = \sum_{k=1}^{K} \frac{\partial \log M(x_k;\Theta)}{\partial \Theta} - K \frac{\partial \log Z(\Theta)}{\partial \Theta} = K\left[\left\langle \frac{\partial \log M(x;\Theta)}{\partial \Theta} \right\rangle_X - \left\langle \frac{\partial \log M(x;\Theta)}{\partial \Theta} \right\rangle_{p(x;\Theta)}\right], \tag{12}$$
where $\langle \cdot \rangle_X$ denotes the expectation over the observations and $\langle \cdot \rangle_{p(x;\Theta)}$ denotes the expectation under the probability map $p(x;\Theta)$; the second equality uses $\partial \log Z(\Theta) / \partial \Theta = \langle \partial \log M(x;\Theta) / \partial \Theta \rangle_{p(x;\Theta)}$. As $M$ is a product of exponential functions and $\Theta$ is a four-dimensional parameter, $\partial \log M(x;\Theta) / \partial \Theta$ has the following closed-form solution:
$$\frac{\partial \log M(x;\Theta)}{\partial \Theta} = \begin{bmatrix} -\dfrac{1}{d_1(x, SL)} \\[2mm] -\displaystyle\sum_{i=1}^{m} \dfrac{1}{d_2(x, MP_i)} \\[2mm] -\displaystyle\sum_{i=1}^{n} \dfrac{1}{d_3(x, SG_i) + \theta_4 d_4(SG_i)} \\[2mm] \displaystyle\sum_{i=1}^{n} \dfrac{\theta_3\, d_4(SG_i)}{\left[d_3(x, SG_i) + \theta_4 d_4(SG_i)\right]^2} \end{bmatrix}. \tag{13}$$

TABLE I. DETAILS OF THE PROPOSED DATASETS AND SOME PREVIOUS DATASETS

Fig. 8. (a) Ten annotated pedestrian walking routes in Dataset I. (b) Ten annotations in Dataset II.

IV. PEDESTRIAN WALKING ROUTE DATASETS

A. Dataset Details

Pedestrian walking route data with accurate annotations can be used for model learning and evaluation. However, automatic tracking, especially in crowded scenes, is not accurate. Several existing datasets [48]-[50] have major limitations and cannot be used in our study. The videos in most of these datasets are not long enough, not crowded enough, or do not contain enough pedestrians. The dataset proposed in [50] is large but contains only trajectories without video frames, and thus cannot be used to evaluate our proposed method. Two new large-scale pedestrian walking route datasets are built in this work.¹ Accurate pedestrian walking routes from two crowd surveillance videos were manually annotated as ground truth. Dataset I is collected from the Grand Central Train Station in New York City [37], and Dataset II is collected from the Shanghai Expo in China [7]. Several example pedestrian walking routes are shown in Fig. 8. The details of the proposed datasets and a comparison with previous datasets are summarized in Table I.

¹Data can be downloaded from http://www.ee.cuhk.edu.hk/~syi/.

The proposed datasets have several advantages compared with existing ones.
Our datasets are much longer than any existing dataset with ground-truth trajectories. Long-term traffic flow changes can be observed in our datasets, and they contain rich information for training a complex model of pedestrian behaviors. Moreover, the proposed datasets are crowd surveillance datasets, and a large number of pedestrians are labeled in both of them. For Dataset I, an average of 123 pedestrians can be observed in each frame, and the most crowded frame contains 332 pedestrians. For Dataset II, the average number of pedestrians is 94, and the maximum number is 133. Many stationary social groups form in our datasets, and the walking pattern of each individual can be greatly influenced by crowds. These datasets are also well annotated: all 12,684 pedestrians in Dataset I and 2,409 pedestrians in Dataset II are manually annotated. For each individual, the complete trajectory from the time he/she enters the scene to the time he/she leaves is labeled. This large amount of accurately annotated data is crucial for comprehensive evaluation and convincing statistical analysis. Besides pedestrian behavior modeling, our datasets can be used in various research areas, such as pedestrian detection, individual tracking, crowd segmentation, density estimation, and pedestrian counting.

B. Statistical Analysis of the Annotated Data

Much statistical information can be obtained from the datasets, and such information is valuable for designing the supervised model. The influence of stationary crowds on pedestrian walking efficiency is analyzed in Fig. 9. We record the dynamic changes of (i) the percentage of stationary pedestrians, (ii) the average walking path length, and (iii) the average traveling time in Dataset I and Dataset II. Statistics (ii) and (iii) measure traveling efficiency, and larger values indicate lower efficiency. The strong correlations between (i) and (ii)-(iii) indicate that stationary crowds are a key factor hindering traffic efficiency. In contrast, the correlations between total crowd density and (ii)-(iii) are much weaker. If every pedestrian is moving, traffic flow is smooth and efficient even when the scene is very crowded. However, when stationary crowd groups appear, traffic efficiency may be dramatically reduced.

Fig. 9. Correlation analysis between stationary crowds and traffic efficiency in (a) Dataset I and (b) Dataset II. Three statistics, (i) percentage of stationary pedestrians, (ii) average walking path length, and (iii) average traveling time, are considered. In Dataset I, the correlations between (i) and (ii)-(iii) are +0.754 and +0.753, respectively, while the correlations between total crowd density and (ii)-(iii) are 0.061 and 0.121. In Dataset II, the correlations between (i) and (ii)-(iii) are +0.502 and +0.510, while the correlations between total crowd density and (ii)-(iii) are +0.218 and 0.181.

C. Learned Model Parameters and Analysis

The trajectories of moving pedestrians are used as training samples to learn the model parameters. The optimized parameters for Dataset I and Dataset II are shown in Table II. The learned parameters for the two datasets follow similar patterns, which indicates that the proposed model generalizes well. Comparing $\theta_3$ with $\theta_1$ and $\theta_2$, we observe that stationary crowd groups have a greater influence on pedestrian walking behaviors than the scene layout and moving pedestrians. The learned $\theta_4$ is greater than zero, which indicates that stationary crowd group density does influence pedestrian behaviors. From our study and analysis, we observe that a pedestrian is not sensitive to scene obstacles, because scene obstacles never move and he/she does not need to consider possible collisions with them. For moving pedestrians, a person may prefer to adjust his/her walking speed rather than change the pre-decided walking direction to avoid collisions with them; the walking path may change slightly, but the influence is not obvious. When stationary crowds emerge in front of a pedestrian, he/she has to change the walking route to bypass them. This is why the stationary crowd group influence weight $\theta_3$ is much larger than the scene layout weight $\theta_1$ and the moving pedestrian weight $\theta_2$.

TABLE II: THE LEARNED VALUES OF THE PARAMETERS FOR DATASET I AND DATASET II

V. APPLICATIONS

Based on the proposed model and its inference and learning algorithms, various applications can be implemented, and interesting characteristics of human walking behaviors can be revealed.

A. Prediction on Pedestrian Walking Paths

Given a source $x_s$ and a destination $x_d$, we predict an optimal walking route $\hat{T} = f_{FM}(M, x_s, x_d)$ using (7). In this application, we set $P = 1$, as no prior on the pedestrian's personality is given. An over-cost value $\eta$ is proposed to evaluate whether predictions match observations. For a particular walking route $T$, the walking cost of $T$ is calculated based on the energy map $M$ as

$$C(T, M) = \sum_{x \in T} \frac{1}{M(x) + \epsilon}, \tag{14}$$

where $x$ represents locations on the walking route $T$, $M(x)$ is the energy value at location $x$ on the energy map $M$, and $\epsilon$ is a small positive number that avoids a zero denominator. A good and reasonable walking route should lead to a small walking cost. The over-cost $\eta$ is then defined as

$$\eta = \frac{C(T_O, M) - C(\hat{T}, M)}{C(\hat{T}, M)}, \tag{15}$$

where $C(T_O, M)$ is the walking cost of the observed route $T_O$ based on the general energy map $M$, and $C(\hat{T}, M)$ is the cost of the optimized route $\hat{T}$. $\eta$ should be non-negative because $C(T_O, M)$ is no smaller than $C(\hat{T}, M)$, and a smaller $\eta$ indicates a better match to the observed ground truth. Two baselines are used to demonstrate the two main claims about the proposed stationary group factor, i.e.,
1) the stationary groups cannot be ignored, and 2) the stationary groups cannot simply be modeled as solid obstacles. For the first claim, we observe that stationary groups exert repulsive forces around the group regions and may keep moving pedestrians away; some stationary groups may force other pedestrians to change their walking paths. This claim is verified by setting $\theta_3 = 0$. When $\theta_3 = 0$, $f_{SG}(x; 0, \theta_4) = 1$, i.e., the whole stationary crowd group term is removed. For the second claim, we observe that some pedestrians may choose to walk through a stationary crowd group toward their destination if the group is not dense. Solid obstacles, however, cannot be walked through, which is the key difference between the stationary group factor and the scene layout factor. This claim is verified by setting $\theta_4 = 0$. When $\theta_4 = 0$,

$$f_{SG}(x; \theta_3, 0) = \exp\left(-\sum_{i=1}^{m} \frac{\theta_3}{d_3(x, SG_i)}\right),$$

which is quite similar to the scene layout term and the moving pedestrian term. Inside a stationary group region, $d_3(x, SG_i) = 0$, so $f_{SG}(x; \theta_3, 0) = 0$ and $M(x) = 0$. Zero probability means that the stationary group regions are totally inaccessible, i.e., the whole stationary crowd groups are regarded as solid obstacles.

The distributions of $\eta$ obtained by our method and the baselines on the two datasets are shown in Fig. 10. The average over-cost values of all pedestrians, $\eta_{1.0}$, for our method and the baselines are listed in Table III. Because abnormal pedestrians may have relatively high $\eta$ values, which should be removed to obtain a more reasonable evaluation, the average of the lowest 80% of the over-cost values, $\eta_{0.8}$, is also listed in Table III. From the results in Table III, we observe that the proposed model achieves better path prediction results than the baselines on both datasets. The inferior performance obtained by setting $\theta_3 = 0$ or $\theta_4 = 0$ shows that including the influence of stationary crowd groups is necessary when modeling pedestrian behaviors, and that stationary crowd groups should be modeled differently from scene obstacles.

TABLE III: OVER-COST VALUES OF PATH PREDICTION RESULTS BY OUR METHOD AND THE BASELINES. $\eta_{1.0}$ AND $\eta_{0.8}$ ARE RECORDED FOR BOTH DATASETS

TABLE IV: PATH PREDICTION RESULTS BY OUR METHOD AND THE COMPARISONS. $\epsilon_{1.0}$ AND $\epsilon_{0.8}$ ARE RECORDED FOR BOTH DATASETS

TABLE V: PARAMETER SENSITIVITY ANALYSIS. PATH PREDICTION RESULTS ($\epsilon_{1.0}$) BY DEVIATING FROM THE OPTIMAL PARAMETERS OBTAINED BY OUR ALGORITHM ON DATASET I

Fig. 10. Over-cost values $\eta$ calculated with our model and the two baselines on both datasets. (a) Over-cost values of 10,000 pedestrians in Dataset I. (b) Over-cost values of 1,200 pedestrians in Dataset II. The over-cost values are sorted in increasing order for each method.
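The evaluation in (14)-(15) can be sketched as follows, assuming the energy map is a 2-D NumPy array and routes are lists of (row, column) grid cells. The paper computes $\hat{T}$ with a fast-marching planner $f_{FM}$; here a node-weighted Dijkstra search on an 8-connected grid is swapped in as a discrete stand-in, because it minimizes exactly the cost in (14), so $\eta \ge 0$ holds for any connected grid route. All function names and the toy map are hypothetical illustrations, not the authors' code.

```python
import heapq

import numpy as np


def walking_cost(path, energy, eps=1e-6):
    """Eq. (14): C(T, M) = sum over locations x on T of 1 / (M(x) + eps)."""
    return float(sum(1.0 / (energy[r, c] + eps) for r, c in path))


def optimal_path(energy, src, dst, eps=1e-6):
    """Cheapest 8-connected grid route from src to dst under the cost of Eq. (14).

    Node-weighted Dijkstra, used here as a discrete stand-in for the
    fast-marching planner f_FM of Eq. (7); it is not the paper's solver.
    """
    rows, cols = energy.shape
    cell_cost = 1.0 / (energy + eps)
    dist = np.full(energy.shape, np.inf)
    prev = {}
    dist[src] = cell_cost[src]
    heap = [(dist[src], src)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if (r, c) == dst:
            break
        if d > dist[r, c]:
            continue  # stale queue entry
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (dr or dc) and 0 <= nr < rows and 0 <= nc < cols:
                    nd = d + cell_cost[nr, nc]
                    if nd < dist[nr, nc]:
                        dist[nr, nc] = nd
                        prev[(nr, nc)] = (r, c)
                        heapq.heappush(heap, (nd, (nr, nc)))
    path = [dst]
    while path[-1] != src:  # walk the predecessor links back to the source
        path.append(prev[path[-1]])
    return path[::-1]


def over_cost(observed, energy, eps=1e-6):
    """Eq. (15): relative extra cost of the observed route over the optimal one."""
    src, dst = tuple(observed[0]), tuple(observed[-1])
    best = walking_cost(optimal_path(energy, src, dst, eps), energy, eps)
    return (walking_cost(observed, energy, eps) - best) / best


def trimmed_mean(values, fraction=0.8):
    """eta_0.8-style statistic: mean of the smallest `fraction` of the values."""
    v = np.sort(np.asarray(values, dtype=float))
    k = max(1, int(round(fraction * len(v))))
    return float(v[:k].mean())


if __name__ == "__main__":
    # Toy map: a low-energy "stationary group" occupies the middle of column 2.
    M = np.ones((5, 5))
    M[1:4, 2] = 0.01
    straight = [(2, 0), (2, 1), (2, 2), (2, 3), (2, 4)]
    print(optimal_path(M, (2, 0), (2, 4)))
    print(round(over_cost(straight, M), 2))
```

The straight route through the low-energy region gets a large $\eta$, while the planner's own route has $\eta = 0$; $\eta_{1.0}$ is then the plain mean of the per-pedestrian values and $\eta_{0.8}$ is `trimmed_mean(values, 0.8)`.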
To further demonstrate the effectiveness of the proposed walking path prediction model, three existing methods are compared: the Social Force Model (SFM) [18], the Linear Trajectory Avoidance model (LTA) [41], and Temporal Information Modeling based prediction (TIM) [47]. For each pedestrian, the walking path is predicted using the proposed method and the compared ones, and the distance $\epsilon$ between the ground-truth walking path and the prediction result of each method is computed. The average distance over all pedestrians ($\epsilon_{1.0}$) and the average distance over the 80% of pedestrians with the smallest distances ($\epsilon_{0.8}$) are recorded for both datasets. The evaluation results of our method and the compared ones on Dataset I and Dataset II are shown in Table IV. From the results in Table IV, we observe that the proposed method achieves better path prediction results than SFM [18], LTA [41], and TIM [47]. This is because those methods are local models, in which each pedestrian makes decisions based on the local environment and interactions with nearby pedestrians, whereas pedestrian walking behavior is influenced by the whole scene and requires global modeling. Moreover, none of the compared approaches models the influence of stationary crowd groups.

Six examples of pedestrian walking path prediction are shown in Fig. 11. Examples (a)-(c) are from Dataset I and examples (d)-(f) are from Dataset II. The prediction results of examples (a), (b), (d), and (e) match the observations well, and $\eta$ is small ($\eta$ = 9.7%, 4.7%, 2.7%, and 5.9%). In examples (c) and (f), the pedestrians walk in unexpected patterns: the walking costs of the observations are much higher than those of the optimized paths, and therefore the over-costs are high ($\eta$ = 74.6% and 459.9%). Such activities can be regarded as abnormal, and $\eta$ can be used for anomaly detection.

We also investigated the sensitivity of the four learnable parameters $[\theta_1, \theta_2, \theta_3, \theta_4]$ on Dataset I.
Each parameter is deviated by ±1%, ±5%, and ±10% from its optimal value obtained by our algorithm, and the model is then evaluated on the pedestrian walking path prediction task. The path prediction results $\epsilon_{1.0}$ of all these parameter settings are reported in Table V. From the results, we observe that the proposed algorithm is robust to changes in the model parameters. For $\theta_1$ and $\theta_2$, even with a large deviation (±10%), the performance drop is not obvious. This is because the influences of the scene layout factor and the moving pedestrian factor on pedestrian walking behaviors are not that significant.
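The one-at-a-time perturbation protocol above can be sketched as follows. `evaluate` is a hypothetical callback standing in for the full path-prediction evaluation (for example, one returning $\epsilon_{1.0}$ on Dataset I); the parameter vector and toy metric in the usage lines are illustrative placeholders, not the learned values reported in Table II.

```python
import numpy as np


def sensitivity_table(theta_opt, evaluate, deviations=(0.01, 0.05, 0.10)):
    """Perturb one parameter at a time by +/- each relative deviation.

    theta_opt : optimal parameter vector [theta_1, ..., theta_4].
    evaluate  : hypothetical callback mapping a parameter vector to a
                path-prediction error (e.g. eps_1.0); lower is better.
    Returns {(parameter index, signed deviation): metric}.
    """
    theta_opt = np.asarray(theta_opt, dtype=float)
    table = {}
    for j in range(theta_opt.size):
        for d in deviations:
            for sign in (-1.0, 1.0):
                theta = theta_opt.copy()
                theta[j] *= 1.0 + sign * d  # all other parameters stay at the optimum
                table[(j, sign * d)] = evaluate(theta)
    return table


if __name__ == "__main__":
    # Illustrative placeholders only, NOT the learned values in Table II.
    theta_star = np.array([1.0, 1.0, 10.0, 2.0])
    toy_error = lambda th: 1.0 + float(np.sum((th - theta_star) ** 2))  # hypothetical metric
    for (j, d), err in sorted(sensitivity_table(theta_star, toy_error).items()):
        print(f"theta_{j + 1} {d:+.0%}: {err:.4f}")
```

Changing only one parameter per run isolates each factor's contribution, which is what allows the comparison between $\theta_1, \theta_2$ and $\theta_3, \theta_4$.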

The prediction performance is slightly more sensitive to $\theta_3$ and $\theta_4$ than to $\theta_1$ and $\theta_2$, because the stationary group factor is much more important than the scene layout factor and the moving pedestrian factor when modeling pedestrian behaviors. Small (±1%) and moderate (±5%) changes from the optimal $\theta_3$ and $\theta_4$ do not influence the prediction performance much. Even with a large deviation (±10%) from the optimal $\theta_3$ and $\theta_4$, $\epsilon_{1.0}$ is still better than that of the other compared methods. In summary, the prediction performance is robust to the values of $\theta_1$-$\theta_4$: small changes in these parameters do not lead to a significant performance drop.

Fig. 11. Examples of pedestrian walking path prediction results on Dataset I (a-c) and Dataset II (d-f). Sources and destinations are represented by the red and green dots. The black curves are observed pedestrian walking paths and the blue curves are prediction results. (a), (b), (d), and (e) are normal pedestrians, while (c) and (f) are abnormal cases.

TABLE VI: TOP-N ACCURACY OF DESTINATION PREDICTION ON BOTH DATASETS

Fig. 12. (a) Ten source/destination regions of the scene in Dataset I. (b) Seven source/destination regions of the scene in Dataset II.

B. Prediction of Pedestrian Destinations

The source $x_s$, the destination $x_d$, and the walking path $T$ are the three basic elements of pedestrian behaviors. In Section V-A, we predict $T$ based on $x_s$ and $x_d$. Given $x_s$ and part of the walking path, we can also predict the destination of the pedestrian. The source/destination regions $S_i$ ($i \in \{1, \dots, R\}$) are manually labeled in both datasets, as shown in Fig. 12; $R = 10$ for Dataset I and $R = 7$ for Dataset II. The first half of the observed trajectory, denoted $T_{FH}$, is used as input in this experiment. Given $x_s$ and $T_{FH}$, the task is to predict the destination index $i \in \{1, \dots, R\}$.
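The scoring rule used for this task, given as (16)-(17) in the next paragraph, ranks each region $S_i$ by how closely the first half of an optimized route into $S_i$ matches $T_{FH}$. A minimal sketch follows. The text describes $D(\cdot,\cdot)$ only as an $L_2$ distance between half trajectories, so resampling both halves to a fixed number of points by arc length, and taking the "first half" by point count, are our assumptions; the candidate routes for each region are assumed to be precomputed with a path planner, and all names are hypothetical.

```python
import numpy as np


def resample(traj, n=50):
    """Resample a polyline to n points equally spaced in arc length."""
    traj = np.asarray(traj, dtype=float)
    seg = np.linalg.norm(np.diff(traj, axis=0), axis=1)
    traj = traj[np.concatenate([[True], seg > 0])]  # drop repeated points
    s = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(traj, axis=0), axis=1))])
    if s[-1] == 0.0:
        return np.repeat(traj[:1], n, axis=0)
    t = np.linspace(0.0, s[-1], n)
    return np.column_stack([np.interp(t, s, traj[:, k]) for k in range(traj.shape[1])])


def first_half(traj):
    """Assumption: the 'first half' is taken by point count."""
    traj = np.asarray(traj, dtype=float)
    return traj[: max(2, (len(traj) + 1) // 2)]


def half_distance(a, b, n=50):
    """Assumed form of D(., .): mean point-wise L2 distance after resampling."""
    return float(np.linalg.norm(resample(a, n) - resample(b, n), axis=1).mean())


def destination_scores(observed_half, candidate_routes):
    """Eq. (16): L(i) = min over sampled x_d in S_i of D(T_FH, first half of T_hat(x_d)).

    candidate_routes[i] holds precomputed optimal routes from x_s to points in S_i.
    """
    return np.array([min(half_distance(observed_half, first_half(r)) for r in routes)
                     for routes in candidate_routes])


def in_top_n(scores, true_index, n):
    """Top-n hit: ground truth among the n smallest L(i); n = 1 is Eq. (17)."""
    return true_index in np.argsort(scores, kind="stable")[:n]


if __name__ == "__main__":
    observed = [(0, 0), (1, 0), (2, 0)]  # toy first half of an observed track
    candidates = [
        [[(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]],  # region S_1: straight ahead
        [[(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)]],  # region S_2: turn
        [[(0, 0), (1, 1), (2, 2), (3, 3), (4, 4)]],  # region S_3: diagonal
    ]
    L = destination_scores(observed, candidates)
    print(L, int(np.argmin(L)))
```

Sorting the scores once gives every top-n decision at the same time, which is how the top-n accuracy in Table VI is defined.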
For each destination region $S_i$, a score $L(i)$ is calculated as

$$L(i) = \min_{x_d \in S_i} D\big(T_{FH}, \hat{T}_{FH}(x_d)\big), \tag{16}$$

where $\hat{T}_{FH}(x_d)$ represents the first half of $\hat{T}(x_d)$, $\hat{T}(x_d)$ is the optimized walking route ending at $x_d$, and $D(\cdot,\cdot)$ represents the $L_2$ distance between the two half trajectories. $\hat{T}(x_d)$ can be calculated using $\hat{T}(x_d) = f_{FM}(M, x_s, x_d)$. A smaller $L(i)$ indicates that the pedestrian is more likely to go to destination $S_i$. The index of the estimated destination is then obtained as

$$\hat{i} = \arg\min_{i \in \{1, \dots, R\}} L(i). \tag{17}$$

The top-n accuracy (the ground truth is within the top-n predictions) is adopted for evaluation. Two existing approaches, i.e., the MDA model [2] and an unsupervised visual prediction approach (UVP) [54], are used for comparison. The MDA model can predict the most probable destinations of trajectories. For the UVP approach, destinations are inferred from the predicted paths. Moreover, the two baselines introduced in Section V-A are also used for comparison. Estimation results on both datasets are shown in Table VI. Our method performs better than the baselines and the compared methods. This is because only motion pattern information is modeled in MDA [2] and UVP [54], while the influences of stationary objects (e.g., stationary crowd groups) are ignored. Using our algorithm, the destinations of 48% of the pedestrians in Dataset I and 41% of those in Dataset II are correctly predicted at the first trial. Moreover, the destinations of 83% of the pedestrians in Dataset I and 80% of those in Dataset II are correctly predicted within three trials.

Four examples of destination prediction are shown in Fig. 13. Examples (a)-(b) are from Dataset I and examples (c)-(d) are from Dataset II. For most pedestrians, such as in examples (a) and (c), the destination prediction results match the observations well. For some special cases, such as examples (b) and (d), the walking paths are unusual, and it is difficult to estimate the correct destinations in the first few trials. These walking routes can be regarded as abnormal; in this way, destination prediction can be used for abnormal behavior detection.

Fig. 13. Four examples of pedestrian destination prediction. Examples (a) and (b) are from Dataset I and examples (c) and (d) are from Dataset II. Sources and destinations are represented by the red and green dots, respectively. (a1), (b1), (c1), and (d1) are observed walking routes, and the first halves (red parts) are used as input for destination prediction. (a2), (b2), (c2), and (d2) are prediction results. For each destination, the most probable walking route is predicted and shown, together with its computed value of $L(i)$. The estimated destination is the one with the minimal $L(i)$ (blue). In examples (b) and (d), our model makes wrong predictions due to the sudden and unexpected turning of the pedestrians.

C. Pedestrian Travel Time Estimation

Based on the predicted pedestrian walking paths, we can further estimate the average travel time between source-destination regions at each time point. Instead of estimating the travel time of a single pedestrian, we focus on the average travel time of many pedestrians within a temporal period. Estimating pedestrian travel time between entrances and exits has drawn great attention in surveillance applications because it indicates travel efficiency and cost. Based on travel time information, security administrators can take prompt action in response to a rapid increase in travel time due to traffic congestion; for example, some entrances may be blocked until the congested crowds disperse, or extra exits may be opened. Such information is also useful for travelers making plans. If an algorithm can automatically identify the regions causing the increase in travel time, such as a stationary group blocking the traffic, it helps administration staff solve the problem in a shorter time. Moreover, travel time is itself an important feature for describing each individual's behavior: we can determine whether a pedestrian is behaving normally based on his/her travel time between an entrance and an exit.

For each source-destination pair, an optimal walking route is generated based on the energy map at the current time point using the walking path prediction pipeline (Section V-A). The length of the optimal walking path and the numbers of moving and stationary persons along it are important for estimating the pedestrian travel time between the source-destination pair. Therefore, these three values for each source-destination pair are used as features to train a second-order polynomial regression model for travel time estimation. For each time point $t$, there may be multiple pedestrians coming from the source region $S$ and going to the destination region $D$ within a temporal window around $t$.
The average travel time of all these pedestrians, $T^t_{S,D}$, is used as ground truth to train the regression model. At test time, the pedestrian travel time of this source-destination pair at the current time point $t$ is estimated. For each source-destination traffic flow, all video frames are randomly divided into ten folds, and 10-fold cross-validation is adopted for evaluation. The average estimation error of travel time (ET) is used to evaluate the performance of the proposed method; it is calculated as

$$ET = E_{t,S,D}\left[\,\left|\hat{T}^t_{S,D} - T^t_{S,D}\right|\,\right], \tag{18}$$

where $\hat{T}^t_{S,D}$ is the estimation result, $T^t_{S,D}$ is the average travel time of the observed pedestrians, and $E_{t,S,D}$ denotes the expectation over all time points $t$ and all source-destination flows $(S, D)$.

Several compared approaches are designed based on existing computer vision techniques. (a) The social force model [18] can be used to simulate pedestrian behavior, and the simulated travel time serves as the estimate. (b) Person re-identification [55] can be applied, and the pedestrian travel time is calculated as the time interval between matched pedestrians in two frames. (c) Pedestrian tracking [56] can be applied, and the pedestrian travel time is calculated from the starting and ending points of trajectories.
(d) Based on the pedestrian path prediction approach [47], the travel time can be estimated from the temporal information provided with the path prediction result.

The estimation results of our method and the compared ones on Dataset I and Dataset II are shown in Table VII. From the comparison in Table VII, we can see that our method outperforms (a) the simulation method, (b) the re-identification method, (c) the tracking method, and (d) the prediction method. This is partly because these methods are not designed for the travel time estimation task. Specifically, (a) and (c) mainly focus on pedestrians' reactions to local environments, whereas travel time estimation requires global modeling of the whole scene rather than of local areas. Method (b) does not work well when pedestrians frequently occlude each other, which is quite common in crowded scenes, and person re-identification also becomes unreliable when many pedestrians have similar appearances. Method (d) does not take into account stationary crowd groups, which may block traffic and should have a great impact on pedestrian travel time.

TABLE VII: TRAVEL TIME ESTIMATION RESULTS ON DATASET I AND DATASET II BY OUR METHOD AND THE COMPARISONS. THE AVERAGE ESTIMATION ERROR OF TRAVEL TIME (ET) IS RECORDED IN SECONDS

D. Personality Attribute Estimation

As discussed in Section III-E, each individual's walking behavior can be modeled by the personality parameter $P$; thus $P$ can be regarded as a personality attribute describing pedestrian walking behavior. The personality attribute $P$ of each individual can be estimated as

$$\hat{P} = \arg\min_{P} D\big(T_O, \hat{T}(P)\big), \tag{19}$$

where $T_O$ is the observed trajectory of the current pedestrian, $\hat{T}(P) = f_{FM}(M_P(P), x_s, x_d)$ is the estimated walking path calculated using the personalized energy map $M_P$ in (7), and $D(\cdot,\cdot)$ represents the distance between the two trajectories. The estimated personality parameter $\hat{P}$ minimizes the difference between the observation $T_O$ and the estimated path $\hat{T}(P)$.

Examples of personality attribute estimation from the two datasets are shown in Fig. 14; example (a) is from Dataset I and example (b) is from Dataset II. The optimal walking paths $\hat{T}(P)$ are shown for different personality values, i.e., $P = 0.1$, $P = 1.0$, and $P = 1.5$. We can observe that smaller $P$ values lead to straighter walking routes, and such walking patterns are denoted as aggressive. Larger $P$ values lead to longer walking routes that keep away from stationary crowds, and the corresponding walking patterns are denoted as conservative.

Fig. 14. Examples of pedestrian personality attribute estimation from (a) Dataset I and (b) Dataset II. Sources and destinations are represented by the red and green dots, respectively. (a1) and (b1) are observed walking routes ($T_O$). The estimated walking paths ($\hat{T}(P)$) calculated with different personality values ($P$) and their corresponding personalized energy maps ($M_P$) are shown in (a2)-(a4) and (b2)-(b4). The pedestrian in example (a) walks in a conservative manner, because the observation shown in (a1) is most similar to the walking path in (a4), which has a large $P$ value. In contrast, the pedestrian in example (b) walks in an aggressive manner, because the observation shown in (b1) is most similar to the walking path in (b2), which has a small $P$ value.

E. Pedestrian Classification on Personality Attribute

From the personality estimation results, pedestrians can be classified into multiple classes based on the personality attribute $P$, since different $P$ values lead to different walking behaviors. The distributions of $P$ on both datasets are shown in Fig. 15, and similar patterns can be observed. Peak A in Fig. 15 represents aggressive pedestrians, who prefer to walk directly to their destinations. Conservative pedestrians are represented by peak B; they prefer to take a longer route to avoid close contact with others. The long tail of the distribution of $P$ represents pedestrians who take a very long route to their destination. "Conservative" no longer properly describes these pedestrians, and we define their behaviors as abnormal. All pedestrians can then be classified into three categories based on their walking behaviors: aggressive, conservative, and abnormal.

All pedestrians in both datasets are manually annotated into these three categories as ground truth, and Bayesian classifiers (which minimize the classification error) are used to classify pedestrians into the three categories. Two other methods [45], [57] are adopted for comparison on pedestrian personality classification. For [45], the individual pedestrian walking parameters based on personality trait theory are used directly as features to train a linear SVM classifier (denoted PTT). For [57], the relationships between trajectories based on stability analysis are used as features to train another linear SVM classifier (denoted SA). The leave-one-out evaluation results of the proposed model and the comparisons on both datasets are shown in Table VIII. Among all the annotated pedestrians, 87.43% of those in Dataset I and 79.56% of those in Dataset II are correctly classified using the proposed personality attribute $P$ as the feature. The classification accuracy on Dataset II is slightly lower than that on Dataset I, partly because the pedestrian density of Dataset II is larger than that of Dataset I, which makes it more difficult to model. The proposed personality feature achieves the best classification performance for most categories. The features of the compared methods are mainly generated from motion information, while the stationary structure is ignored. Moreover, the features proposed in [57] are designed for motion pattern analysis and stability analysis, which may not be suitable for the personality classification task.

Fig. 15. Personality value distribution on (a) Dataset I and (b) Dataset II. Two peaks, A and B, and a long tail can be observed. Peak B lies at $P > 1$ and peak A at $P < 1$. When $P = 1$, the personalized map degenerates into the general energy map.

TABLE VIII: ACCURACIES OF PEDESTRIAN CLASSIFICATION ON DATASET I AND DATASET II. THE TOTAL NUMBER OF PEDESTRIANS OF EACH CATEGORY AND THE CLASSIFICATION ACCURACY OF DIFFERENT METHODS ARE RECORDED

F. Statistical Analysis on Personality

We also explore the relationship between the personality value $P$ and the scene population density on Dataset I, since it is one hour long and long-term correlations can be well observed. The correlation between the two values on Dataset I is −0.44. This negative correlation indicates that pedestrians tend to have smaller personality values $P$, i.e., less conservative walking patterns, when the scene is more crowded. This finding is reasonable: when the scene is too crowded, the walking patterns of pedestrians are constrained, and there is not enough space for conservative walking patterns. To reach their destinations, close contact with each other is inevitable.

G. Abnormal Behavior Detection

Abnormal behaviors can be defined as unexpected observations that are significantly different from our predictions. Predictions of pedestrian walking paths (Section V-A) and destinations (Section V-B) can both be used for anomaly detection; several examples are shown in Fig. 11(c), (f) and Fig. 13(b), (d). Moreover, personality attribute estimation can also be used for anomaly detection, as introduced in Section V-E.

VI. CONCLUSION
In this paper, a novel pedestrian behavior model is proposed in which the influence of stationary crowd groups is modeled as a key component. We have shown the model's effectiveness in various applications, including pedestrian walking path prediction, destination prediction, travel time estimation, personality attribute estimation, pedestrian classification based on the personality attribute, and abnormal event detection. Two new pedestrian walking route datasets are proposed, which will benefit future studies on pedestrian behavior analysis.

REFERENCES

[1] G. Antonini, S. V. Martinez, M. Bierlaire, and J. P. Thiran, "Behavioral priors for detection and tracking of pedestrians in video sequences," Int. J. Comput. Vis., vol. 69, no. 2, pp. 159–180, 2006.
[2] B. Zhou, X. Wang, and X. Tang, "Understanding collective crowd behaviors: Learning a mixture model of dynamic pedestrian-agents," in Proc. IEEE CVPR, Jun. 2012, pp. 2871–2878.
[3] X. Wang, X. Ma, and W. E. L. Grimson, "Unsupervised activity perception in crowded and complicated scenes using hierarchical Bayesian models," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 3, pp. 539–555, Mar. 2009.
[4] B. Zhou, X. Tang, and X. Wang, "Measuring crowd collectiveness," in Proc. IEEE CVPR, Jun. 2013, pp. 3049–3056.
[5] B. Zhou, X. Tang, H. Zhang, and X. Wang, "Measuring crowd collectiveness," IEEE Trans. Pattern Anal. Mach. Intell., vol. 36, no. 8, pp. 1586–1599, Aug. 2014.
[6] A. B. Chan, Z.-S. J. Liang, and N. Vasconcelos, "Privacy preserving crowd monitoring: Counting people without people models or tracking," in Proc. IEEE CVPR, Jun. 2008, pp. 1–7.
[7] C. Zhang, H. Li, X. Wang, and X. Yang, "Cross-scene crowd counting via deep convolutional neural networks," in Proc. IEEE CVPR, Jun. 2015, pp. 833–841.
[8] L. Dong, V. Parameswaran, V. Ramesh, and I. Zoghlami, "Fast crowd segmentation using shape indexing," in Proc. IEEE ICCV, Oct. 2007, pp. 1–8.
[9] P. Tu, T. Sebastian, G. Doretto, N. Krahnstoever, J. Rittscher, and T. Yu, "Unified crowd segmentation," in Proc. ECCV, 2008, pp. 691–704.
[10] R. Mehran, A. Oyama, and M. Shah, "Abnormal crowd behavior detection using social force model," in Proc. IEEE CVPR, Jun. 2009, pp. 935–942.
[11] M. Moussaïd, D. Helbing, and G. Theraulaz, "How simple rules determine pedestrian behavior and crowd disasters," Proc. Nat. Acad. Sci. USA, vol. 108, no. 17, pp. 6884–6888, 2011.
[12] V. Mahadevan, W. Li, V. Bhalodia, and N. Vasconcelos, "Anomaly detection in crowded scenes," in Proc. IEEE CVPR, Jun. 2010, pp. 1975–1981.
[13] M. J. Roshtkhari and M. D. Levine, "Online dominant and anomalous behavior detection in videos," in Proc. IEEE CVPR, Jun. 2013, pp. 2611–2618.
[14] C. Lu, J. Shi, and J. Jia, "Abnormal event detection at 150 FPS in MATLAB," in Proc. IEEE ICCV, Dec. 2013, pp. 2720–2727.
[15] D. R. Forsyth, Group Dynamics. Boston, MA, USA: Cengage Learning, 2009.
[16] G. Lebon, The Crowd: A Study of the Popular Mind. London, U.K.: Macmillan, 1897.
[17] E. Bonabeau, "Agent-based modeling: Methods and techniques for simulating human systems," Proc. Nat. Acad. Sci. USA, vol. 99, no. suppl. 3, pp. 7280–7287, 2002.
[18] D. Helbing and P. Molnár, "Social force model for pedestrian dynamics," Phys. Rev. E, vol. 51, no. 5, p. 4282, 1995.
[19] B. Zhou, X. Tang, and X. Wang, "Learning collective crowd behaviors with dynamic pedestrian-agents," Int. J. Comput. Vis., vol. 111, no. 1, pp. 50–68, 2015.
[20] M. Moussaïd, N. Perozo, S. Garnier, D. Helbing, and G. Theraulaz, "The walking behaviour of pedestrian social groups and its impact on crowd dynamics," PLoS ONE, vol. 5, no. 4, p. e10047, 2010.
[21] S. Yi and X. Wang, "Profiling stationary crowd groups," in Proc. IEEE ICME, Jul. 2014, pp. 1–6.
[22] S. Yi, X. Wang, C. Lu, and J. Jia, "L0 regularized stationary time estimation for crowd group analysis," in Proc. IEEE CVPR, Jun. 2014, pp. 2219–2226.
[23] R. Kimmel, A. Amir, and A. M. Bruckstein, "Finding shortest paths on surfaces using level sets propagation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 17, no. 6, pp. 635–640, Jun. 1995.
[24] J. A. Sethian, "A fast marching level set method for monotonically advancing fronts," Proc. Nat. Acad. Sci. USA, vol. 93, no. 4, pp. 1591–1595, 1996.
[25] S. Yi, H. Li, and X. Wang, "Understanding pedestrian behaviors from stationary crowd groups," in Proc. IEEE CVPR, Jun. 2015, pp. 3488–3496.
[26] S. Ali and M. Shah, "A Lagrangian particle dynamics approach for crowd flow segmentation and stability analysis," in Proc. IEEE CVPR, Jun. 2007, pp. 1–6.
[27] D. Lin, E. Grimson, and J. Fisher, "Learning visual flows: A Lie algebraic approach," in Proc. IEEE CVPR, Jun. 2009, pp. 747–754.
[28] D. Kuettel, M. D. Breitenstein, L. Van Gool, and V. Ferrari, "What's going on? Discovering spatio-temporal dependencies in dynamic scenes," in Proc. IEEE CVPR, Jun. 2010, pp. 1951–1958.
[29] R. Emonet, J. Varadarajan, and J.-M. Odobez, "Extracting and locating temporal motifs in video scenes using a hierarchical non parametric Bayesian model," in Proc. IEEE CVPR, Jun. 2011, pp. 3233–3240.
[30] T. Hospedales, S. Gong, and T. Xiang, "A Markov clustering topic model for mining behaviour in video," in Proc. IEEE ICCV, Sep./Oct. 2009, pp. 1165–1172.
[31] T. M. Hospedales, J. Li, S. Gong, and T. Xiang, "Identifying rare and subtle behaviors: A weakly supervised joint topic model," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 12, pp. 2451–2464, Dec. 2011.
[32] W. Hu, D. Xie, Z. Fu, W. Zeng, and S. Maybank, "Semantic-based surveillance video retrieval," IEEE Trans. Image Process., vol. 16, no. 4, pp. 1168–1181, Apr. 2007.
[33] K. Kim, D. Lee, and I. Essa, "Gaussian process regression flow for analysis of motion trajectories," in Proc. IEEE ICCV, Nov. 2011, pp. 1164–1171.
[34] D. Makris and T. Ellis, "Learning semantic scene models from observing activity in visual surveillance," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 35, no. 3, pp. 397–408, Jun. 2005.
[35] B. T. Morris and M. M. Trivedi, "Trajectory learning for activity understanding: Unsupervised, multilevel, and long-term adaptive approach," IEEE Trans. Pattern Anal. Mach. Intell., vol. 33, no. 11, pp. 2287–2301, Nov. 2011.
[36] X. Wang, K. T. Ma, G.-W. Ng, and W. E. L. Grimson, "Trajectory analysis and semantic region modeling using nonparametric hierarchical Bayesian models," Int. J. Comput. Vis., vol. 95, no. 3, pp. 287–312, 2011.
[37] B. Zhou, X. Wang, and X. Tang, "Random field topic model for semantic region analysis in crowded scenes from tracklets," in Proc. IEEE CVPR, Jun. 2011, pp. 3441–3448.
[38] J. Shao, K. Kang, C. C. Loy, and X. Wang, "Deeply learned attributes for crowded scene understanding," in Proc. IEEE CVPR, Jun. 2015, pp. 4657–4666.
[39] J. Shao, C. C. Loy, and X. Wang, "Scene-independent group profiling in crowd," in Proc. IEEE CVPR, Jun. 2014, pp. 2227–2234.
[40] D. Helbing, I. Farkas, and T. Vicsek, "Simulating dynamical features of escape panic," Nature, vol. 407, no. 6803, pp. 487–490, 2000.
[41] S. Pellegrini, A. Ess, K. Schindler, and L. van Gool, "You'll never walk alone: Modeling social behavior for multi-target tracking," in Proc. IEEE ICCV, Sep./Oct. 2009, pp. 261–268.
[42] P. Scovanner and M. F. Tappen, "Learning pedestrian dynamics from the real world," in Proc. IEEE ICCV, Sep./Oct. 2009, pp. 381–388.
[43] T. Vicsek, A. Czirók, E. Ben-Jacob, I. Cohen, and O. Shochet, "Novel type of phase transition in a system of self-driven particles," Phys. Rev. Lett., vol. 75, no. 6, p. 1226, 1995.
[44] J. van den Berg, M. Lin, and D. Manocha, "Reciprocal velocity obstacles for real-time multi-agent navigation," in Proc. IEEE Int. Conf. Robot. Autom. (ICRA), May 2008, pp. 1928–1935.
[45] S. J. Guy, S. Kim, M. C. Lin, and D. Manocha, "Simulating heterogeneous crowd behaviors using personality trait theory," in Proc. ACM SIGGRAPH/Eurograph. Symp. Comput. Animation, 2011, pp. 43–52.
[46] L. A.
Pervin, The Science of Personality. London, U.K.: Oxford Univ. Press, 2003. [47] B. Cancela, A. Iglesias, M. Ortega, and M. G. Penedo, Unsupervised trajectory modelling using temporal information via minimal paths, in Proc. IEEE CVPR, Jun. 2014, pp. 2553 2560. [48] S. Ali and M. Shah, Floor fields for tracking in high density crowd scenes, in Proc. ECCV, 2008, pp. 1 14. [49] G. Shu, A. Dehghan, and M. Shah, Improving an object detector and extracting regions using superpixels, in Proc. IEEE CVPR, Jun. 2013, pp. 3721 3727. [50] A. Alahi, V. Ramanathan, and L. Fei-Fei, Socially-aware large-scale crowd forecasting, in Proc. IEEE CVPR, Jun. 2014, pp. 2211 2218. [51] J. Shi and C. Tomasi, Good features to track, in Proc. IEEE CVPR, Jun. 1994, pp. 593 600. [52] Y. Bengio, Learning deep architectures for AI, Found. Trends Mach. Learn., vol. 2, no. 1, pp. 1 127, 2009. [53] B. Benfold and I. Reid, Stable multi-target tracking in real-time surveillance video, in Proc. IEEE CVPR, Jun. 2011, pp. 3457 3464. [54] J. Walker, A. Gupta, and M. Hebert, Patch to the future: Unsupervised visual prediction, in Proc. IEEE CVPR, Jun. 2014, pp. 3302 3309. [55] W. Li, R. Zhao, T. Xiao, and X. Wang, DeepReID: Deep filter pairing neural network for person re-identification, in Proc. IEEE CVPR, Jun. 2014, pp. 152 159. [56] C. Tomasi and T. Kanade, Detection tracking point features, School Comput. Sci., Carnegie Mellon Univ., Pittsburgh, PA, USA, Tech. Rep. CMU-CS-91-132, 1991. [57] B. Solmaz, B. E. Moore, and M. Shah, Identifying behaviors in crowd scenes using stability analysis for dynamical systems, IEEE Trans. Pattern Anal. Mach. Intell., vol. 34, no. 10, pp. 2064 2070, Oct. 2012. Shuai Yi received the B.Eng. degree in electronic engineering from Tsinghua University in 2012. He is currently pursuing the Ph.D. degree with the Department of Electronic Engineering, The Chinese University of Hong Kong. 
His research interests include computer vision and machine learning, specifically for crowd analysis, and video surveillance.

Hongsheng Li received the bachelor's degree in automation from the East China University of Science and Technology in 2006, and the master's and Ph.D. degrees in computer science from Lehigh University, Pennsylvania, in 2010 and 2012, respectively. He is currently a Research Assistant Professor with the Department of Electronic Engineering, The Chinese University of Hong Kong. His research interests include computer vision, medical image analysis, and machine learning.

Xiaogang Wang received the B.S. degree from the University of Science and Technology of China in 2001, the M.S. degree from The Chinese University of Hong Kong in 2003, and the Ph.D. degree from the Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, in 2009. He is currently an Associate Professor with the Department of Electronic Engineering, The Chinese University of Hong Kong. His research interests include computer vision and machine learning.